Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 147 (2019) 400–406
www.elsevier.com/locate/procedia
Abstract
Deep learning has recently achieved great success in many areas due to its strong capacity for data processing. For
instance, it has been widely used in financial areas such as stock market prediction, portfolio optimization, financial
information processing and trade execution strategies. Stock market prediction is one of the most popular and valuable
areas in finance. In this paper, we propose a novel Generative Adversarial Network (GAN) architecture with a Multi-Layer
Perceptron (MLP) as the discriminator and a Long Short-Term Memory (LSTM) network as the generator for forecasting the
closing price of stocks. The generator is built with an LSTM to mine the data distributions of stocks from given stock
market data and to generate data in the same distributions, whereas the discriminator, designed as an MLP, aims to
discriminate between real stock data and generated data. We choose daily data on the S&P 500 Index and several stocks
over a wide range of trading days and try to predict the daily closing price. Experimental results show that our novel
GAN achieves promising performance in closing price prediction on real data compared with other machine learning and
deep learning models.
© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 2018 International Conference on Identification,
Information and Knowledge in the Internet of Things.
Keywords: Deep Learning; Stock Prediction; Generative Adversarial Networks; Data Mining.
1. Introduction
The prediction of stock market returns is one of the most important and challenging issues in this domain. Many
analyses and assumptions in the financial area suggest that the stock market is predictable. Technical analysis in stock
investment theory is a methodology for forecasting the direction of prices through the study of past market data. A
meaningful assumption named Mean Reversion states that deviations of a stock price are temporary and that the price
tends to move toward its average over time. Moreover, this assumption has a further development called Moving Average
Reversion (MAR), which supposes that the average price is the mean of the price over a past window of time, e.g. five
days [7]. Based on the views mentioned above, we propose a new deep learning model to forecast the daily closing price.
The main contributions of this paper can be summarized as follows:
• A novel Generative Adversarial Network (GAN) architecture with a Long Short-Term Memory (LSTM) network
as the generator and a Multi-Layer Perceptron (MLP) as the discriminator is proposed. The model is trained in an
end-to-end way to predict the daily closing price given the stock data of several past days.
• We try to generate data in the same distributions as the daily stock data through the adversarial learning system,
instead of only utilizing traditional regression methods for price forecasting.
2. Related Work
Stock market prediction can be seen as a time series forecasting problem, and one of the classical algorithms is
the Autoregressive Integrated Moving Average (ARIMA) [2]. ARIMA performs well on linear and stationary time
series, but it does not perform well on the nonlinear and non-stationary data found in stock markets. To address this
problem, one approach [9] combines ARIMA with a Support Vector Machine (SVM). The idea is that the forecast consists
of a linear part and a nonlinear part, so that the linear part can be predicted with ARIMA and the nonlinear part with
SVM. Moreover, another approach [6] combines the wavelet basis with SVM: it decomposes the stock data with a wavelet
transformation and uses SVM for forecasting. Subsequently, an Artificial Neural Network (ANN) was combined
with ARIMA to predict the nonlinear part of the stock price data [1]. The hybrid of wavelet transformation and ANN
demonstrated that effective features should be extracted for the training of ANN [3]. A Convolutional Neural Network
(CNN) was also used to forecast stock prices from the limit order book [12], with the number of orders and the prices of
10 bid/ask orders transformed into a 2D array. In addition, specially designed RNNs have been applied to forecasting
stock data [10, 11]. News and events in the financial area were extracted and represented as dense vectors for
stock prediction [4]. Besides, reinforcement learning is another popular method, which improves trading strategies
by fusing Q-learning and dynamic programming [8].
3. Our Methodology
3.1. Principle
GAN is a framework that trains two models against each other as in a zero-sum game [5]. In the adversarial process, the
generator can be seen as a cheater that generates data similar to the real data, while the discriminator plays the role
of a judge that distinguishes real data from generated data. They can reach an ideal point at which the discriminator is
unable to differentiate the two types of data. At this point, the generator has captured the data distribution.
Based on this principle, we propose our GAN architecture for the prediction of the stock closing price.
The generator of our model is designed by LSTM with its strong ability in processing time series data. We choose
the daily data in the last 20 years with 7 financial factors to predict the future closing price. The 7 factors of the stock
data in one day are High Price, Low Price, Open Price, Close Price, Volume, Turnover Rate and Ma5 (the average of
closing price in past 5 days). The 7 factors are valuable and significant in price prediction with the theory of technical
analysis, Mean Reversion, or MAR. Therefore, these factors can be used as 7 features of the stock data for the price
prediction. Suppose our input is $X = \{x_1, \ldots, x_t\}$, which consists of the daily stock data of $t$ days. Each $x_k$ in $X$ is a
vector composed of 7 features as follows:
\[ [x_{k,i}]_{i=1}^{7} = [x_{k,\mathrm{High}}, x_{k,\mathrm{Low}}, x_{k,\mathrm{Open}}, x_{k,\mathrm{Close}}, x_{k,\mathrm{TurnoverRate}}, x_{k,\mathrm{Volume}}, x_{k,\mathrm{Ma5}}]. \tag{1} \]
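To make the input layout concrete, the following is a minimal sketch of how the windows $X$ and their targets $x_{t+1}$ could be assembled from daily records. The dictionary keys, the helper names `add_ma5` and `make_windows`, and the choice to include the current day in the 5-day average are assumptions made for illustration; the paper does not specify them.

```python
from typing import Dict, List, Tuple

# Feature order follows Eq. (1); the key names are hypothetical column names.
FEATURES = ["High", "Low", "Open", "Close", "TurnoverRate", "Volume", "Ma5"]

Vector = List[float]


def add_ma5(days: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the daily records with an added "Ma5" entry.

    Assumption: Ma5 on day k averages the closing prices of days k-4..k,
    so the first four days are dropped because they lack a full window.
    """
    enriched = []
    for k in range(4, len(days)):
        ma5 = sum(d["Close"] for d in days[k - 4:k + 1]) / 5.0
        enriched.append({**days[k], "Ma5": ma5})
    return enriched


def make_windows(days: List[Dict[str, float]], t: int = 5) -> List[Tuple[List[Vector], Vector]]:
    """Pair every window X = [x_1, ..., x_t] with its target x_{t+1}."""
    vectors = [[d[name] for name in FEATURES] for d in days]
    return [(vectors[i:i + t], vectors[i + t]) for i in range(len(vectors) - t)]
```

Each element returned by `make_windows` holds `t` seven-dimensional vectors and the vector of the following trading day.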
The architecture of the generator is shown in Fig. 1. For simplicity, we have omitted the details of the LSTM. With
the generator, we extract the output $h_t$ of the LSTM and feed it into a fully connected layer with 7 neurons to generate
$\hat{x}_{t+1}$. The vector $\hat{x}_{t+1}$ aims to approximate $x_{t+1}$, and its component $\hat{x}_{t+1,\mathrm{Close}}$ is the prediction of the closing
price on day $t+1$.
Kang Zhang et al. / Procedia Computer Science 147 (2019) 400–406
[Fig. 1: architecture of the generator. The LSTM output $h_t$ is mapped to $\hat{x}_{t+1}$.]
The output of the generator $G(X)$ is defined as follows:
\[ h_t = g(X), \tag{2} \]
\[ G(X) = \hat{x}_{t+1} = \delta(W_h^{\top} h_t + b_h), \tag{3} \]
where g(·) denotes the output of LSTM and ht is the output of the LSTM with X = {x1 , ..., xt } as the input. δ stands
for the Leaky Rectified Linear Unit (ReLU) activation function. Wh and bh denote the weight and bias in the fully
connected layer. We also use dropout as a regularization method to avoid overfitting. Furthermore, we can continue to
predict x̂t+2 with x̂t+1 and X.
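The fully connected output layer and the multi-step prediction described above can be sketched as follows. This is an illustration only: the negative slope of the Leaky ReLU, the shape convention for `W_h`, and the sliding of a fixed-length window in `roll_forward` are assumptions, and `generator` stands for any callable that maps a window to the next-day vector (the LSTM itself is not implemented here).

```python
from typing import Callable, List, Sequence

Vector = List[float]


def leaky_relu(z: float, slope: float = 0.01) -> float:
    # The slope value is an assumption; the paper does not state it.
    return z if z > 0 else slope * z


def dense_head(h_t: Sequence[float], W_h: Sequence[Sequence[float]], b_h: Sequence[float]) -> Vector:
    """Apply the Leaky ReLU to W_h^T h_t + b_h, with W_h of shape (len(h_t), len(b_h))."""
    return [
        leaky_relu(sum(h_t[i] * W_h[i][j] for i in range(len(h_t))) + b_h[j])
        for j in range(len(b_h))
    ]


def roll_forward(window: List[Vector], generator: Callable[[List[Vector]], Vector], steps: int) -> List[Vector]:
    """Predict several days ahead by feeding each prediction back as input."""
    window = list(window)
    predictions = []
    for _ in range(steps):
        x_hat = generator(window)
        predictions.append(x_hat)
        # Assumption: drop the oldest day so the window length stays fixed.
        window = window[1:] + [x_hat]
    return predictions
```

In the paper's setting `b_h` would have 7 entries, one per feature; the sketch works for any output size.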
The purpose of the discriminator is to constitute a differentiable function $D$ that classifies the input data. The
discriminator is expected to output 0 for fake data and 1 for real data. Here, we choose an
MLP as our discriminator with three hidden layers $h_1$, $h_2$, $h_3$ containing 72, 100 and 10 neurons, respectively. The Leaky
ReLU is used as the activation function in the hidden layers and the sigmoid function is used in the output layer.
In addition, the cross entropy loss is chosen as the loss function to optimize the MLP. In particular, we concatenate
$X = \{x_1, \ldots, x_t\}$ and $\hat{x}_{t+1}$ to get $\{x_1, \ldots, x_t, \hat{x}_{t+1}\}$ as the fake data $X_{fake}$. Similarly, we concatenate $X = \{x_1, \ldots, x_t\}$
and $x_{t+1}$ to get $\{x_1, \ldots, x_t, x_{t+1}\}$ as the real data $X_{real}$. The output of the discriminator is defined as follows:
\[ D(X_{fake}) = \sigma(d(X_{fake})), \tag{4} \]
\[ D(X_{real}) = \sigma(d(X_{real})), \tag{5} \]
where $d(\cdot)$ denotes the output of the MLP and $\sigma$ denotes the sigmoid activation function. For both $X_{fake}$ and $X_{real}$ the
output is a single scalar. Fig. 2 shows the architecture of the discriminator.
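A forward pass through a discriminator of this shape can be sketched in plain Python. The hidden-layer sizes (72, 100, 10) come from the text; flattening the $t+1$ daily vectors into one input vector, the random initialization, and the Leaky ReLU slope are assumptions, and `init_mlp` and `discriminator` are hypothetical helper names.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def leaky_relu(z: float, slope: float = 0.01) -> float:
    return z if z > 0 else slope * z


def sigmoid(z: float) -> float:
    # Numerically stable form for large negative inputs.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_mlp(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Create small random weights for consecutive layer sizes (assumed init)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
        layers.append((W, [0.0] * n_out))
    return layers


def discriminator(sequence: Sequence[Sequence[float]], layers: List[Layer]) -> float:
    """Return a score in (0, 1) for a sequence of t+1 daily vectors."""
    a = [v for day in sequence for v in day]  # assumption: flatten the sequence
    for idx, (W, b) in enumerate(layers):
        z = [sum(a[i] * W[i][j] for i in range(len(a))) + b[j] for j in range(len(b))]
        is_output = idx == len(layers) - 1
        a = [sigmoid(v) for v in z] if is_output else [leaky_relu(v) for v in z]
    return a[0]


# Usage with t = 5 days of 7 features: the input has (5 + 1) * 7 = 42 values.
X = [[0.1 * k] * 7 for k in range(5)]
layers = init_mlp([42, 72, 100, 10, 1])
score_real = discriminator(X + [[0.5] * 7], layers)  # X_real = X with x_{t+1} appended
score_fake = discriminator(X + [[0.4] * 7], layers)  # X_fake = X with x_hat_{t+1} appended
```

With untrained weights both scores sit near 0.5; training would push `score_real` toward 1 and `score_fake` toward 0.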
With the two models mentioned above, we propose our GAN architecture. According to [5], in the two-player
minimax game, both G and D try to optimize a value function. Similarly, we can define the optimization of our value
function V(G,D) as follows:
\[ \min_{G} \max_{D} V(G, D) = \mathbb{E}[\log D(X_{real})] + \mathbb{E}[\log(1 - D(X_{fake}))]. \tag{6} \]
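As a quick check of the ideal point described in Section 3.1: if the discriminator cannot tell the two inputs apart, it outputs $1/2$ for both $X_{real}$ and $X_{fake}$, and the value function evaluates to

```latex
\[
V(G, D) = \log\tfrac{1}{2} + \log\left(1 - \tfrac{1}{2}\right) = -\log 4 \approx -1.386 .
\]
```

This is the value at the global optimum of the minimax game reported in [5].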
Fig. 2. Discriminator designed using an MLP with Xfake and Xreal as the inputs.
We define the generator loss Gloss and discriminator loss Dloss to optimize the value function. Particularly, we
combine the Mean Square Error (MSE) with the generator loss of a classical GAN to constitute the Gloss of our
architecture. The Gloss and Dloss are as follows:
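\[ D_{loss} = -\frac{1}{m}\sum_{i=1}^{m} \log D(X^{i}_{real}) - \frac{1}{m}\sum_{i=1}^{m} \log\left(1 - D(X^{i}_{fake})\right), \tag{7} \]
\[ g_{MSE} = \frac{1}{m}\sum_{i=1}^{m} \left(\hat{x}^{i}_{t+1} - x^{i}_{t+1}\right)^{2}, \tag{8} \]
\[ g_{loss} = \frac{1}{m}\sum_{i=1}^{m} \log\left(1 - D(X^{i}_{fake})\right), \tag{9} \]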
The loss function $G_{loss}$ combines $g_{MSE}$ and $g_{loss}$, weighted by $\lambda_1$ and $\lambda_2$, respectively:
\[ G_{loss} = \lambda_1 g_{MSE} + \lambda_2 g_{loss}. \tag{10} \]
$\lambda_1$ and $\lambda_2$ are hyper-parameters that we set manually. Fig. 3 shows the architecture of our GAN. We feed $X_{fake}$ and
$X_{real}$ rather than $\hat{x}_{t+1}$ and $x_{t+1}$ into the discriminator because we expect the discriminator to capture the correlation
and time series information between $x_{t+1}$ and $X$.
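The losses can be computed from a batch of discriminator outputs as in the sketch below. The weighted sum in `g_total` follows the description of $G_{loss}$ as a combination of $g_{MSE}$ and $g_{loss}$ with $\lambda_1$ and $\lambda_2$. Summing the squared error over the 7 features, the small constant `EPS`, and the default weights of 1.0 are assumptions of this sketch, not values from the paper.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0); not part of the paper's formulas


def d_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Eq. (7): cross entropy of the discriminator over a batch of size m."""
    m = len(d_real)
    real_term = sum(math.log(p + EPS) for p in d_real) / m
    fake_term = sum(math.log(1.0 - p + EPS) for p in d_fake) / m
    return -real_term - fake_term


def g_mse(x_hat: Sequence[Sequence[float]], x_true: Sequence[Sequence[float]]) -> float:
    """Eq. (8): batch mean of the squared error (summed over features, assumed)."""
    m = len(x_hat)
    return sum(
        sum((a - b) ** 2 for a, b in zip(pred, true))
        for pred, true in zip(x_hat, x_true)
    ) / m


def g_adv(d_fake: Sequence[float]) -> float:
    """Eq. (9): adversarial part of the generator loss."""
    return sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)


def g_total(x_hat, x_true, d_fake, lam1: float = 1.0, lam2: float = 1.0) -> float:
    """Weighted sum of g_MSE and g_loss; the default weights are placeholders."""
    return lam1 * g_mse(x_hat, x_true) + lam2 * g_adv(d_fake)
```

For an undecided discriminator (all outputs 0.5), `d_loss` equals $2\log 2$, which matches the value $-\log 4$ of the value function with the sign flipped.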
4. Experiments
We evaluate our model on real stock data, including the Standard & Poor's 500 Index (S&P 500), the Shanghai
Composite Index in China, International Business Machines (IBM) from the New York Stock Exchange (NYSE),
Microsoft Corporation (MSFT) from the National Association of Securities Dealers Automated Quotations (NASDAQ), and
Ping An Insurance Company of China (PAICC). All the stock data can be downloaded from Yahoo Finance. We select the
trading dates within the last 20 years (almost 5000 data points for each stock). Some examples of the
stock features are shown in Tab. 1. The trading dates are not continuous because there is no trading on weekends and
holidays.
[Fig. 3: architecture of the proposed GAN. Panel labels: Real Data, Generator (G), Discriminator (D), "Is D correct?", Fine Tune.]
Note that normalization is necessary and is a key point in achieving competitive results. Following the MAR
assumption mentioned above, we normalize the data as follows:
\[ \tilde{x}_i = \frac{x_i - \mu_t}{\tau_t}, \tag{11} \]
where $\mu_t$ and $\tau_t$ are the mean and standard deviation of $X$. We select $t = 5$ empirically because we attempt to predict
the data of the next day from the data of the past week (there is no trading on weekends). For instance, we compute the
mean and standard deviation of the data of 5 days to normalize those data. Afterwards, the normalized data are used to
predict the data on the 6th day. The data in both the training and testing periods are processed in the same way.
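The window-wise normalization can be sketched as below. Computing the statistics separately for each of the 7 features, using the population standard deviation, and replacing a zero deviation with 1.0 are assumptions of this sketch; `denormalize` is a hypothetical helper for mapping a normalized prediction back to price units.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

Vector = List[float]


def window_stats(window: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Per-feature mean and standard deviation over the t days of a window."""
    columns = list(zip(*window))
    mu = [fmean(col) for col in columns]
    # Assumption: a constant feature gets deviation 1.0 to avoid dividing by zero.
    tau = [pstdev(col) or 1.0 for col in columns]
    return mu, tau


def normalize(vec: Sequence[float], mu: Sequence[float], tau: Sequence[float]) -> Vector:
    """Eq. (11), applied feature by feature."""
    return [(v - m) / s for v, m, s in zip(vec, mu, tau)]


def denormalize(vec: Sequence[float], mu: Sequence[float], tau: Sequence[float]) -> Vector:
    """Invert Eq. (11) to recover values in the original units."""
    return [v * s + m for v, m, s in zip(vec, mu, tau)]
```

A predicted sixth-day vector produced in normalized units would be passed through `denormalize` with the statistics of its own 5-day window before the closing price is read off.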
Our purpose is to predict these 7 factors and obtain the closing price of the next day from the data of the past t days.
We predict all 7 factors of the next day because the generator aims to mine the distributions of the real
data, and the closing price can then be read from the generated data. The data are separated into two parts for training
and testing. We choose the first 90%-95% of the stock data for training and the remaining 5%-10% (about 250-500 data
points) for testing.
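Because the split keeps the earliest data for training, it can be written as a simple cut of the chronologically ordered samples. The function name and the rounding rule are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T], test_fraction: float = 0.1) -> Tuple[List[T], List[T]]:
    """Keep the earliest samples for training and the latest ones for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    n_test = max(1, round(len(samples) * test_fraction))
    cut = len(samples) - n_test
    return list(samples[:cut]), list(samples[cut:])
```

With about 5000 samples per stock, test fractions of 0.05 and 0.1 give 250 and 500 test samples, matching the range stated above.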
The loss in the training period can be seen in Fig. 4. There is a significant adversarial process between the
discriminator and the generator during training, and both are optimized during this process.
We evaluate the forecasting performance of our model by the following statistical indicators: Mean Absolute Error
(MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE) and Average Return (AR).
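Let $y_k$ and $\hat{y}_k$ denote the real closing price and the predicted closing price on the $k$-th day. The indicators are
given as follows:
\[ \mathrm{MAE} = \frac{1}{N}\sum_{k=1}^{N} \left|\hat{y}_k - y_k\right|, \tag{12} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N} \left(\hat{y}_k - y_k\right)^{2}}, \tag{13} \]
\[ \mathrm{MAPE} = \frac{1}{N}\sum_{k=1}^{N} \left|\frac{\hat{y}_k - y_k}{y_k}\right|, \tag{14} \]
\[ \mathrm{AR} = \frac{1}{N-1}\sum_{k=1}^{N-1} \left(y_{k+1} - y_k\right), \quad \text{if } \hat{y}_{k+1} > \hat{y}_k. \tag{15} \]
In (15), a day contributes to the sum only when the model predicts a rise for the following day.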
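The four indicators translate directly into code. In this sketch MAPE is returned as a fraction rather than a percentage, and AR is read as summing the realized price change only over days on which a rise is predicted, then dividing by $N - 1$; both readings are assumptions about the intended definitions.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean Absolute Error, Eq. (12)."""
    return sum(abs(p - a) for a, p in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root Mean Square Error, Eq. (13)."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(y, y_hat)) / len(y))


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, Eq. (14), as a fraction (not times 100)."""
    return sum(abs((p - a) / a) for a, p in zip(y, y_hat)) / len(y)


def average_return(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Average Return, Eq. (15): realized change on days with a predicted rise."""
    n = len(y)
    gained = sum(y[k + 1] - y[k] for k in range(n - 1) if y_hat[k + 1] > y_hat[k])
    return gained / (n - 1)
```

For example, with real prices `[10, 11, 12, 13]` and predictions `[10, 12, 11, 12]`, a rise is predicted on two of the three transitions, so `average_return` gives 2/3.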
We compute the mean of RMSE over our five datasets as the average evaluation. MAE, MAPE and AR are averaged
in the same way. Support Vector Regression (SVR), ANN and LSTM are classical methods for stock market prediction
and we choose them as the baselines to compare with our model. The prediction results are shown in Tab. 2 with the
best results in boldface. Low MAE, RMSE and MAPE indicate that the predicted closing price is close
to the real data. AR shows the daily average return of these stocks based on the four prediction methods. We can see
that our method achieves competitive performance compared with the other methods.
Fig. 5 shows an example of prediction by four methods on PAICC with the same training steps. From Fig. 5 we
can see that the best performance in matching the trend line of the real price is achieved by our method.
5. Conclusion
We have explored stock market prediction and attempted to capture the distributions of real stock data
with our proposed GAN. In future work, we plan to explore how to extract more valuable and influential financial
factors from stock markets and to optimize our model to learn the data distributions more accurately, so that our
method can obtain a higher precision of trend or price prediction in the stock market.
Fig. 5. Illustration of price prediction by our GAN and some compared models on PAICC.
Acknowledgements
This work was supported by the National Key R&D Program of China under Grant 2016YFC1401004, the National
Natural Science Foundation of China (NSFC) under Grant No. 61170312 and 61633021, the Science and
Technology Program of Qingdao under Grant No. 17-3-3-20-nsh, the CERNET Innovation Project under Grant No.
NGII20170416, the State Key Laboratory of Software Engineering under Grant No. SKLSE2012-09-14, and the
Fundamental Research Funds for the Central Universities of China.
References
[1] Areekul, P., Senjyu, T., Toyama, H., Yona, A., 2010. A hybrid arima and neural network model for short-term price forecasting in deregulated
market. IEEE Transactions on Power Systems.
[2] Box, G.E.P., Jenkins, G.M., 1976. Time series analysis: Forecasting and control. Journal of Time 31, 238–242.
[3] Chandar, S.K., Sumathi, M., Sivanandam, S.N., 2016. Prediction of stock market price using hybrid of wavelet transform and artificial neural
network. Indian Journal of Science & Technology 9.
[4] Ding, X., Zhang, Y., Liu, T., Duan, J., 2015. Deep learning for event-driven stock prediction, in: Proceedings of the Twenty-Fourth International
Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pp. 2327–2333.
[5] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., Bengio, Y., 2014. Generative adversar-
ial nets, in: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014,
December 8-13 2014, Montreal, Quebec, Canada, pp. 2672–2680.
[6] Huang, S., Wang, H., 2006. Combining time-scale feature extractions with svms for stock index forecasting, in: Neural Information Processing,
13th International Conference, ICONIP 2006, Hong Kong, China, October 3-6, 2006, Proceedings, Part III, pp. 390–399.
[7] Li, B., Hoi, S.C.H., 2012. On-line portfolio selection with moving average reversion, in: Proceedings of the 29th International Conference on
Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012.
[8] Nevmyvaka, Y., Feng, Y., Kearns, M.J., 2006. Reinforcement learning for optimized trade execution, in: Machine Learning, Proceedings of
the Twenty-Third International Conference (ICML 2006), Pittsburgh, Pennsylvania, USA, June 25-29, 2006, pp. 673–680.
[9] Pai, P.F., Lin, C.S., 2005. A hybrid arima and support vector machines model in stock price forecasting. Omega 33, 497–505.
[10] Rather, A.M., Agarwal, A., Sastry, V.N., 2015. Recurrent neural network and a hybrid model for prediction of stock returns. Expert Syst. Appl.
42, 3234–3241.
[11] Saad, E.W., Prokhorov, D.V., Wunsch II, D.C., 1998. Comparative study of stock trend prediction using time delay, recurrent and probabilistic neural
networks. IEEE Trans. Neural Networks 9, 1456–1470.
[12] Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., Iosifidis, A., 2017. Forecasting stock prices from the limit order book
using convolutional neural networks, in: 19th IEEE Conference on Business Informatics, CBI 2017, Thessaloniki, Greece, July 24-27, 2017,
Volume 1: Conference Papers, pp. 7–12.