Stock Prediction Based On Genetic Algorithm Feature Selection and Long Short-Term Memory Neural Network
ABSTRACT In the financial market, a large number of indicators are used to describe changes in stock price, which provides a good data basis for stock price forecasting. Different stocks are affected by different factors owing to their different industry types and regions. It is therefore very important to find a multi-factor combination suited to a particular stock in order to predict its price. This paper proposes using a Genetic Algorithm (GA) for feature selection and develops an optimized Long Short-Term Memory (LSTM) neural network stock prediction model. First, we use the GA to obtain a factor importance ranking. Then, the optimal combination of factors is obtained from this ranking by trial and error. Finally, we use the optimal factor combination with the LSTM model for stock prediction. Thorough empirical studies on the China Construction Bank dataset and the CSI 300 stock dataset demonstrate that the GA-LSTM model outperforms all baseline models for time series prediction.
INDEX TERMS Deep learning, feature selection, genetic algorithm, machine learning, optimization methods, forecasting.
I. INTRODUCTION
With the rapid development of the social economy, the number of listed companies is increasing, and the stock market has become one of the hot topics in the financial field. The changing trend of stocks often affects the direction of many economic behaviors to a certain extent [1], so stock price prediction has received more and more attention from scholars. Stock market data are non-linear, noisy, complex and time-dependent, so scholars have done a great deal of research on stock prediction methods [2].

The traditional approach is to build a linear prediction model based on historical stock data. Bowden et al. [3] proposed using the ARIMA method to build an autoregressive model to predict stock prices. Although this method has some advantages in computational efficiency, its assumptions of statistical distribution and stationarity limit its ability to model non-linear and non-stationary financial time series, and outliers in the research data also have a great impact on the prediction results.

There are many factors affecting stock prices. With the increasing maturity of statistical techniques in the financial field, financial scholars have mined a large number of stock market impact factors and quantified them into specific data for the study of stock price trends. The availability of massive financial data makes the application of machine learning algorithms possible, and more and more researchers have begun to use non-linear machine learning models to predict stock prices. Nair et al. [4] proposed a decision tree system based on rough sets. This method combines the advantages of rough sets and decision trees, but it is prone to overfitting on data sets with a large amount of noise, which affects the predicted stock trend. In theory, an artificial neural network can learn any non-linear relationship and is less disturbed by noisy data, so it has been widely used in the field of time series prediction. Penman [5] and Nottola et al. [6] each carried out a series of prediction studies using neural networks and achieved better stock prediction accuracy than decision trees. However, neural networks are prone to local optima in practical applications.

The associate editor coordinating the review of this manuscript and approving it for publication was Bilal Alatas.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ VOLUME 9, 2021
S. Chen, C. Zhou: Stock Prediction Based on GA Feature Selection and LSTM Neural Network
In contrast, support vector machines (SVM), based on structural risk minimization, greatly reduce the possibility of the model falling into local optima. Cao et al. [7] established an SVM stock prediction model, which effectively improved the generalization ability of the model. Ensemble learning is characterized by fast operation, strong anti-interference ability and high accuracy. Deng et al. [8] showed experimentally that, after parameter optimization, a random forest stock prediction model achieves higher prediction accuracy than SVM.

With the development of artificial intelligence technology, deep learning has attracted extensive attention due to its excellent performance in machine translation [9], speech emotion recognition [10], image recognition [11] and other areas. Compared with traditional statistical models, deep neural networks (DNN) can analyze deep and complex non-linear relationships through layered feature representation, which suits the multi-factor, unstable and complex non-linear nature of stock data analysis [12]. Tsantekidis et al. [13] proposed a stock prediction model based on a convolutional neural network (CNN) and compared it with other classical models to verify the effectiveness of the convolutional model in stock prediction. However, because stock data form time series, the convolutional neural network is not the most suitable neural network model for stock prediction. Selvin et al. [14] proposed three stock prediction models based on CNN, recurrent neural network (RNN) and LSTM deep learning networks respectively, and compared the performance of the three models by predicting the stock prices of listed companies. They concluded that the LSTM neural network is most suitable for forecasting the stock market with time series due to its long-term memory.

For multivariate financial time series prediction, effective feature selection is very important. Feature selection has many benefits: (i) it reduces the training time of the model; (ii) it helps simplify the complexity of forecasters; (iii) it improves the accuracy of the model; and (iv) it avoids over-fitting by eliminating unnecessary variables from the feature set [15]. Traditional feature selection methods mainly include filter methods, embedded methods and wrapper methods. Yu et al. [16] improved model prediction accuracy by using PCA for dimension-reduction extraction of feature data combined with an SVM model. Qin et al. [17] proposed a dual-stage attention-based recurrent neural network (DA-RNN) for feature extraction and sequential prediction. In the first stage, they introduced an input attention mechanism to adaptively extract relevant driving series at each time step by referring to the previous encoder hidden state. In the second stage, they used a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, their model makes predictions effectively. According to the changes of influence information in different time stages, Zheng et al. [18] designed a specific attention network and successfully learned the dynamic influence of changes in multiple non-predictive time series on the target series over time. Although these methods can effectively capture temporal features, they cannot determine an effective multi-factor combination. As the number of factors increases, the factors tend to exhibit collinearity and interfere with each other. GA works well for feature selection, and population-based GA can effectively handle the problems of noise and collinearity [19]. Therefore, this paper proposes using GA for feature selection over multiple factors, and applies the result to an LSTM neural network stock prediction model. Experimental comparison shows that this method achieves remarkable results in improving the accuracy of stock prediction.

The remainder of this paper is organized as follows: Section 2 describes the methodologies used in this study. Section 3 describes the GA-LSTM two-stage stock price prediction model. Section 4 describes the whole experimental process; in that section, the best combination of feature factors is determined through feature selection and experimental comparison. Section 5 summarizes the findings and provides suggestions for further research.

II. RESEARCH METHODOLOGY
A. GENETIC ALGORITHM
GA [20] is an adaptive heuristic search algorithm based on the ideas of natural selection and genetic evolution. It is widely used to find approximate optimal solutions to optimization problems with large search spaces, and can be effectively applied to the selection of optimization features. GA encodes a potential solution of a problem as an individual, and each individual is an entity with chromosome characteristics. A number of such individuals together form a population, and the optimization process of GA is carried out on the population [21]. As the main carrier of genetic material, a chromosome is a collection of multiple genes. Its internal expression is a combination of certain genes, which determines the external expression of the individual's traits. For example, the characteristic of black hair is determined by a combination of certain genes in the chromosome that control this characteristic. Therefore, the mapping from phenotype to genotype must be implemented at the beginning, i.e., the encoding work. Because of the complexity of copying the genetic code, we tend to simplify it, usually in the form of binary strings [22]. Chromosomes closer to the optimal solution have a better chance of reproducing. After the initial population is generated, each generation of evolution, following the principle of survival of the fittest, produces better and better approximate solutions. In each generation, individuals are selected according to their fitness in the problem domain, and the population representing the new solution set is generated by combining crossover and mutation with the help of genetic operators. This process results in population evolution, much like natural evolution; the population then becomes better suited
to the environment than the previous generation [23]. After decoding, the optimal individual in the last generation of the population can be used as the approximate optimal solution to the problem.

FIGURE 1. Flow chart of GA. Solution Enc: Solution encoding. Fitness Eval: Fitness evaluation. Mutat: Mutation. Terminal Cond: Termination condition.

GA processing can be divided into seven stages: solution encoding, initialization, fitness evaluation, termination condition checking, selection, crossover and mutation [24]. Fig. 1 shows the whole process of GA. {α1, α2, ..., αn} represents the original feature set. First, a binary encoding is designed for each chromosome β that represents a potential solution to the problem, i.e., the binary encoding of each chromosome represents a feature combination. In the initialization phase, the population size is set and a random original population {β1, β2, ..., βn} is generated. Then the fitness of each chromosome is calculated according to the pre-set fitness function, an evaluation index used to assess chromosome performance. In GA, the definition of the fitness function is a key factor affecting performance [25]. The calculated fitness values are used to retain excellent solutions for further reproduction: high-performing chromosomes are more likely to be selected multiple times, while low-performing ones are more likely to be eliminated. After several rounds of selection, crossover and mutation operations, we obtain the optimal chromosome β̂. In this paper, we adopt the coefficient of determination r² as the fitness function of GA. The coefficient of determination reflects what percentage of the fluctuation of Y can be described by the fluctuation of X, i.e., the degree to which the characteristic variable X explains the target value Y. It can be defined as follows:

r² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)²    (1)

where the coefficient of determination is represented by r², y is the label value, ŷ is the predicted value, ȳ is the average value, and the value range of r² is [0, 1]. The larger r² is, the stronger the ability of this chromosome's X to explain Y, and the more likely the chromosome is to be passed on to the next generation. The crossover and mutation processes are of great significance to GA: exchanging corresponding parts of chromosome chains and changing gene combinations to produce new offspring increases the genetic diversity of the population.

B. LONG SHORT-TERM MEMORY NEURAL NETWORK
RNN is a kind of recursive neural network that takes sequence data as input, recurses along the direction of the sequence, and connects all nodes in a chain. Because of its memory, RNN has achieved good results on short-sequence models. However, as the input sequence becomes longer, the number of layers in the network increases greatly, which easily causes problems such as vanishing gradients [26].

LSTM is a special deep RNN. Its special gated neural unit structure greatly enhances the memory capacity of the model, and solves the vanishing-gradient problem caused by excessively long input sequences in the learning process of a traditional recurrent neural network. Fig. 2 [27] shows the network structure of LSTM. The LSTM network saves all the information before each time step in the neural unit of the current time step, and each neural unit is controlled by the input gate, forget gate and output gate [28]. The input gate controls the input information of the neural unit at the current moment, the forget gate controls the historical information stored in the neural unit at the previous moment, and the output gate controls the output information of the neural unit at the current moment. The purpose of this design is to allow the LSTM model to selectively remember the more important historical information.

FIGURE 2. LSTM internal structure.

The update calculation of LSTM is as follows:

it = σ(Wi xt + Ui ht−1 + bi)    (2)
ft = σ(Wf xt + Uf ht−1 + bf)    (3)
ot = σ(Wo xt + Uo ht−1 + bo)    (4)
C̃t = tanh(Wc xt + Uc ht−1 + bc)    (5)
Ct = ft × Ct−1 + it × C̃t    (6)
ht = ot × tanh(Ct)    (7)

where σ represents the sigmoid activation function in the LSTM network, Wf, Wi, Wc and Wo are the weight matrices of the forget gate, input gate, update gate and output gate respectively, and bf, bi, bc and bo are the corresponding biases of the forget gate, input gate, update gate and output gate. Finally, the output at the current moment and the updated cell state at the current moment are calculated.
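The gate updates in Eqs. (2)–(7) can be sketched directly in NumPy. This is a minimal illustration of one time step, not the trained model from the paper; the `lstm_step` helper, the dimensions and the random weights are assumptions chosen for demonstration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step following Eqs. (2)-(7).
    W, U, b are dicts keyed by gate name: 'i', 'f', 'o', 'c'."""
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])    # input gate,  Eq. (2)
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])    # forget gate, Eq. (3)
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])    # output gate, Eq. (4)
    c_hat = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate,   Eq. (5)
    c_t = f_t * c_prev + i_t * c_hat                          # cell state,  Eq. (6)
    h_t = o_t * np.tanh(c_t)                                  # output,      Eq. (7)
    return h_t, c_t

# Illustrative dimensions: 5 input features, hidden size 4
rng = np.random.default_rng(0)
n_in, n_h = 5, 4
W = {k: rng.normal(size=(n_h, n_in)) for k in 'ifoc'}
U = {k: rng.normal(size=(n_h, n_h)) for k in 'ifoc'}
b = {k: np.zeros(n_h) for k in 'ifoc'}
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```

Because the hidden output is a product of a sigmoid and a tanh, every component of h lies strictly inside (−1, 1), which is what lets the cell state, rather than the output, carry long-term information.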
III. GA-LSTM TWO-STAGE STOCK PRICE PREDICTION MODEL
This paper proposes a two-stage stock price prediction model combining GA with an LSTM deep learning network. The experiment is divided into the following two stages.

The first stage uses GA to rank the importance of the factors. The specific steps are as follows:

(i) Binary encoding of chromosomes and random initialization of the population. We denote the population of GA by POP as follows:

        | a1,1  a1,2  ...  a1,k |
POP =   | a2,1  a2,2  ...  a2,k |        (8)
        | ...   ...   ...  ...  |

of the original feature set {α1, α2, ..., αn}. These multi-factor combinations are used as the input features of LSTM to predict the stock price ŷt at time t.

TABLE 1. Parameters of the GA model.

TABLE 2. Parameters of the LSTM model.

IV. EXPERIMENT
A. EXPERIMENTAL DATA
According to China's financial market, this paper determines an original factor combination consisting of 40 stock factors, including market factors, technical factors and financial factors. In this paper, 2490 pieces of historical data of China Construction Bank and the CSI 300 stock from January 1, 2010 to April 1, 2020 are obtained through the JoinQuant quantitative platform [29]. In order to eliminate the dimensional influence between indexes and accelerate the gradient descent search for the optimal solution, we normalize the data. The normalization principle is as follows:

x̃ = (x − min(x)) / (max(x) − min(x))        (10)

where min(x) and max(x) are the minimum and maximum values of x respectively.

B. FACTOR IMPORTANCE RANKING
In this subsection, GA is used to carry out 100 iterations over all 40 factors, and the factor importance ranking is obtained. Fig. 4 shows the details of the factor importance ranking on the China Construction Bank dataset. The top factors EMA, SMA, DKX, low, OSC, MAUDL, CHO, BIAS, UDL, FSL, EPS, MACD, CCI, OBV, DIF1 and BBI have strong factor importance, while factor importance decreases gradually after high. The MACHO, open and MASS factors have almost no influence on the stock price. Fig. 5 shows the details of the factor importance ranking on the CSI 300 stock dataset. The top factors OSC, MASS, UDL, DKX, EMA, MACD, SMA, OBV, DIF1, MAUDL, eps, MACHO, WVAD, MA, BIAS and FSL have strong factor importance, while factor importance decreases gradually after MAFSL. The MATRIX, PSY and AMO factors have almost no influence on the stock price.

C. ANALYSIS OF EXPERIMENTAL RESULTS
In this experiment, the 2490 historical data points of China Construction Bank and the CSI 300 stock from January 1, 2010 to April 1, 2020 were fed into the LSTM model, and the data were processed by mean filling and normalization. The model was trained on the first 80% of the data and tested on the last 20%, and the MSE on the test set was used for model evaluation. In particular, the original factor set and the top 30, 20, 10 and 5 factor subsets in the factor importance ranking were used as the input features.
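The preprocessing and evaluation steps described in this section can be sketched as follows. This is a minimal sketch on toy data, not the paper's JoinQuant pipeline: the helper names and the synthetic array are assumptions, while the formulas follow Eq. (10) for normalization, the chronological 80%/20% split, and Eq. (1) for the r² fitness used by the GA.

```python
import numpy as np

def minmax_normalize(x):
    """Min-max normalization as in Eq. (10): scales each column to [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def chronological_split(data, train_frac=0.8):
    """Train on the first 80% of the series, test on the last 20%."""
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

def r_squared(y, y_hat):
    """Coefficient of determination used as the GA fitness, Eq. (1)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy example: 10 observations of 3 factors (illustrative, not real market data)
raw = np.arange(30, dtype=float).reshape(10, 3)
scaled = minmax_normalize(raw)
train, test = chronological_split(scaled)
print(scaled.min(), scaled.max())  # 0.0 1.0
print(len(train), len(test))       # 8 2
```

Note that the split is chronological rather than random: shuffling before splitting would leak future information into training, which is why the paper trains on the first 80% of the series and tests on the last 20%.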
cases, so GA is proposed for feature selection to select the factors more suitable for the current scene. Combined with the LSTM deep learning network model, the complex non-linear relationship between factors and stocks is mined to predict stock prices.

Although the stock price prediction model proposed in this paper can effectively improve prediction accuracy and has strong robustness, there are still some shortcomings. Firstly, we only use Chinese stock data for the experiments, so further research can include data from different stock markets. Secondly, in the design of the model parameters in this paper, trial and error was usually adopted rather than a systematic method for finding the optimal parameter settings, such as the selection of the number of factors. An improvement would be to combine the model with other machine learning techniques to find the optimal parameters and improve the interpretability of the model. In addition, when the control parameters of GA are set, such as the crossover rate, mutation rate and number of factor combinations, a variety of suitable combinations can be derived to improve the performance of the research.

APPENDIX
Table 4 shows the definitions of the 40 original factors used in the experiment.

REFERENCES
[1] E. F. Fama and K. R. French, ''Common risk factors in the returns on stocks and bonds,'' J. Financial Econ., vol. 33, no. 1, pp. 3–56, Feb. 1993.
[2] P. Ding, Quantitative Investment: Strategy and Technology. Beijing, China: Electronics Industry Press, 2012, pp. 122–127.
[3] N. Bowden and J. E. Payne, ''Short term forecasting of electricity prices for MISO hubs: Evidence from ARIMA-EGARCH models,'' Energy Econ., vol. 30, no. 6, pp. 3186–3197, Nov. 2008.
[4] B. B. Nair, V. P. Mohandas, and N. R. Sakthivel, ''A decision tree-rough set hybrid system for stock market trend prediction,'' Int. J. Comput. Appl., vol. 6, no. 9, pp. 1–6, Sep. 2010.
[5] H. Penman, ''Application study of BP neural network and logistic regression for stock investment,'' Expert Syst., vol. 5, no. 12, pp. 21–22, Dec. 1989.
[6] F. Li and C. Liu, ''Application study of BP neural network on stock market prediction,'' in Proc. 9th Int. Conf. Hybrid Intell. Syst., Beijing, China, 2009, pp. 2–5.
[7] L. J. Cao and F. H. Tay, ''Support vector machine with adaptive parameters in financial time series forecasting,'' IEEE Trans. Neural Netw., vol. 14, no. 6, pp. 5–10, Nov. 2003.
[8] J. Deng and L. Li, ''Application of parameter optimization stochastic forest in stock prediction,'' Software, vol. 41, no. 1, pp. 178–182, Jan. 2020.
[9] M. R. Costa-jussà, A. Allauzen, L. Barrault, K. Cho, and H. Schwenk, ''Introduction to the special issue on deep learning approaches for machine translation,'' Comput. Speech Lang., vol. 46, pp. 367–373, Nov. 2017.
[10] H. M. Fayek, M. Lech, and L. Cavedon, ''Evaluating deep learning architectures for speech emotion recognition,'' Neural Netw., vol. 5, no. 3, pp. 23–28, Aug. 2017.
[11] J. Xing, K. Li, W. Hu, C. Yuan, and H. Ling, ''Diagnosing deep learning models for high accuracy age estimation from a single image,'' Pattern Recognit., vol. 66, no. 1, pp. 106–116, Jun. 2017.
[12] D. Lv, D. Wang, M. Li, and Y. Xiang, ''DNN models based on dimensionality reduction for stock trading,'' Intell. Data Anal., vol. 24, no. 1, pp. 19–45, Feb. 2020.
[13] A. Tsantekidis, N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, and A. Iosifidis, ''Forecasting stock prices from the limit order book using convolutional neural networks,'' in Proc. IEEE 19th Conf. Bus. Informat. (CBI), Beijing, China, Jul. 2017, pp. 10–15.
[14] S. Selvin, R. Vinayakumar, E. A. Gopalakrishnan, V. K. Menon, and K. P. Soman, ''Stock price prediction using LSTM, RNN and CNN-sliding window model,'' in Proc. Int. Conf. Adv. Comput., Commun. Informat. (ICACCI), Beijing, China, Sep. 2017, pp. 234–239.
[15] U. F. Siddiqi, S. M. Sait, and O. Kaynak, ''Genetic algorithm for the mutual information-based feature selection in univariate time series data,'' IEEE Access, vol. 8, pp. 9597–9609, 2020.
[16] H. Yu, R. Chen, and G. Zhang, ''A SVM stock selection model within PCA,'' Procedia Comput. Sci., vol. 31, no. 1, pp. 406–412, May 2014.
[17] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. W. Cottrell, ''A dual-stage attention-based recurrent neural network for time series prediction,'' in Proc. 26th Int. Joint Conf. Artif. Intell., Melbourne, VIC, Australia, Aug. 2017, pp. 2627–2633.
[18] J. Hu and W. Zheng, ''Multistage attention network for multivariate time series prediction,'' Neurocomputing, vol. 18, no. 383, pp. 122–137, Mar. 2020.
[19] G. Li, P. Liu, C. Le, and B. Zhou, ''A novel hybrid meta-heuristic algorithm based on the cross-entropy method and firefly algorithm for global optimization,'' Entropy, vol. 21, no. 5, p. 494, May 2019.
[20] Y. C. Li, L. N. Zhao, and S. J. Zhou, ''Review of genetic algorithm,'' Adv. Mater. Res., vol. 179, no. 180, pp. 365–367, Aug. 2011.
[21] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI, USA: Univ. of Michigan Press, 1975, p. 183.
[22] G. Armano, M. Marchesi, and A. Murru, ''A hybrid genetic-neural architecture for stock indexes forecasting,'' Inf. Sci., vol. 170, no. 1, pp. 3–33, Feb. 2005.
[23] I. Sekaj and V. Veselý, ''Robust output feedback controller design: Genetic algorithm approach,'' IMA J. Math. Control Inf., vol. 22, no. 3, pp. 257–265, Sep. 2005.
[24] S. Jadhav, H. He, and K. Jenkins, ''Information gain directed genetic algorithm wrapper feature selection for credit rating,'' Appl. Soft Comput., vol. 69, no. 1, pp. 35–37, Aug. 2018.
[25] K. Deb, ''Multi-objective genetic algorithms: Problem difficulties and construction of test problems,'' Evol. Comput., vol. 7, no. 3, pp. 205–230, Sep. 1999.
[26] D. Wang and J. Fang, ''Research on optimization of big data construction engineering quality management based on RNN-LSTM,'' Complexity, vol. 15, no. 2, pp. 15–20, Jun. 2018.
[27] Y. Kim, J.-H. Roh, and H. Kim, ''Early forecasting of rice blast disease using long short-term memory recurrent neural networks,'' Sustainability, vol. 10, no. 2, p. 34, Dec. 2017.
[28] S. Hochreiter and J. Schmidhuber, ''Long short-term memory,'' Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[29] JoinQuant Quantitative Platform. Accessed: Apr. 15, 2020. [Online]. Available: https://www.joinquant.com/

SHILE CHEN received the bachelor's degree in electrical engineering from the Zhejiang University of Media and Communications, in 2017. He is currently pursuing the master's degree in computer science and technology with Zhejiang Normal University, Jinhua, China. His research interests include machine learning, deep learning, social networking, natural language processing, and finance. He has won scholarships from Zhejiang Normal University twice.

CHANGJUN ZHOU was born in Shangrao, China, in 1977. He received the Ph.D. degree in mechanical design and theory from the School of Mechanical Engineering, Dalian University of Technology, Dalian, in 2008. He is currently a Professor with Zhejiang Normal University. His research interests include pattern recognition, intelligence computing, and DNA computing. He has published 60 papers in these areas.