0% found this document useful (0 votes)
13 views5 pages

Ic3 2019 8844891

This document discusses an ensemble machine learning approach for stock market prediction. It summarizes previous research applying individual machine learning techniques like SVR, LSTM, and multiple regression for stock prediction. It then proposes a weighted ensemble model combining SVR, LSTM and multiple regression to attain more accurate predictions with reduced variance compared to individual models. The key idea is that combining predictions from different models provides a less noisy overall prediction than any single model alone.

Uploaded by

thumuvsreddy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views5 pages

Ic3 2019 8844891

This document discusses an ensemble machine learning approach for stock market prediction. It summarizes previous research applying individual machine learning techniques like SVR, LSTM, and multiple regression for stock prediction. It then proposes a weighted ensemble model combining SVR, LSTM and multiple regression to attain more accurate predictions with reduced variance compared to individual models. The key idea is that combining predictions from different models provides a less noisy overall prediction than any single model alone.

Uploaded by

thumuvsreddy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

Ensemble Learning Approach for Enhanced Stock

Prediction
Shikha Mehta, Priyanka Rana, Shivam Singh, Ankita Sharma, Parul Agarwal
Department of Computer Science and Engineering
Jaypee Institute of Information Technology, Noida, India
[email protected], [email protected],[email protected],[email protected], [email protected]

Abstract—Stock market prediction is the technique for [2]. Time series modeling is a challenging task as the means
deciding the future estimation of an organization stock or other and the standard deviation of the series changes over a period
money-related instrument exchanged on a monetary trade. of time and thus making the correlation between past and
Fruitful forecasts of stock market lead to high investment present unpredictable. In literature numerous time series
gains. Analysis of stock market data has always been a hot area techniques forecasting techniques such as Autoregressive[3],
of research due to the large number of factors affecting the Autoregressive Moving Average, Moving Average (MA)[4],
stock market. In the last few years, researchers have utilized Exponential Moving Average (EMA), etc. have been applied
machine learning techniques for learning the trends of stock substantially for financial time series predictions attaining
market in order to improve the accuracy of predictions.
good results [3]. However, their performance varied
However, authors have applied these techniques individually
drastically with the change in the time period and sources of
and compared their results. Since the aggregated opinion of a
group of models is relatively less noisy as compared to the time series such as stocks, indexes, and currencies. With the
single opinion of one of the models, this paper presents an rising complexity of the problem, conventionally time series
ensemble machine learning approach for predicting the stock techniques are proving inefficient to deal with the challenges
market. The weighted ensemble model is built using weighted of the digital era. This motivated the researchers to explore
support vector regression (SVR), Long-short term memory machine learning algorithms for predicting the bulls and
(LSTM) and Multiple Regression. From the results it is bears and analyze the behavior of stock markets for making
observed that ensemble learning approach is able to attain trade decisions.
maximum accuracy with reduced variance and hence better
predictions.
In literature, a number of machine learning algorithms
have been applied for predicting market stocks. Yu, Chen,
Keywords—Long Short Term Memory; Multiple Regression; and Zhang [5] applied support vector machine for non-linear
Ensemble approach; Support Vector Regression. classification of market stocks. Authors created a stock
selection model and applied PCA (principal component
I. INTRODUCTION analysis) on a financial dataset in order to extract small
dimensional relevant features for successful classification.
A nation's capital market generally relies upon the width However, accuracy was achieved nearly 62% only.
and profundity of the stock base. The financial growth of a Moghaddam, Moghaddam, and Esfandyari[6] employed
nation to a great extent relies upon the extension and artificial neural network (ANN) for prediction of a stock
advancement of long term capital. The extraordinary market index. ANN is considered as one of the efficient
renaissance seen everywhere around the world is because of classification technique, thus applied on NASDAQ stock
offers and stocks. Enthusiasm for the buying and selling of exchange rate forecasting. Authors used a back propagation
offers isn't just appeared by huge organizations and financial network for daily prediction of stock prices. Dash and Dash
specialists but also by little speculators, people, salaried [7] developed a framework for the analysis of market stocks
individuals, settled salary gathering, etc. Offers are bought using machine learning algorithms. The decision for stock
when the costs are low in the market and sold when they are trading was forecasted with their ANN and decision support
high[1]. The edge is the benefit to the financial specialist. model and compared with SVM, Naïve Bayesian, K-Nearest
Regardless of whether there is a minor enhancement in the neighbor, and decision tree model. Chang et. al. [8] utilized
execution of securities exchange expectation, still incredible Ensemble neural network model for training and testing of
benefit can be accomplished. stock market data. In order to determine stock turning points
Forecasting the trends of the stock market has always in stock trading, intelligent piecewise linear representation
been a relevant area of research in every decade. The methods are used and then training of turning points was
technological advancements, social, economic and political done through ensemble neural networks. Forecasting of
factors, international policies, etc. pose new challenges for turning points was made through this approach and was
financial market prediction in every era. The mushrooming profitable as compared to other approaches. Another work on
usage of WWW has further escalated the issues by financial trading through machine learning was performed by
shortening the life cycle of products and services. Exploring Gerlein et. al [9]. This work is an empirical analysis
time series financial data has always attracted researchers describing the merits and demerits of financial trading
due to it's non-stationary and pseudo-chaotic in nature[1], through existing tools and techniques. However, the work

978-1-7281-3591-5/19/$31.00 ©2019 IEEE


did not propose any new method for good stock predictions. unsupervised learning, the model finds the patterns existing
Qian and Rasheed[10] applied various machine learning in a given class. Ensemble model combines the
classifiers such ask-nearest neighbor, decision tree and predictions/results of various machine learning models in
artificial neural network for forecasting stocks. Accuracy of order to improve the overall performance of a system [15].
only 65% was achieved through collaborated models. The basic idea behind the ensemble method is to unite the
Nunno[11] performed stock market prediction through diverse perspectives obtained from different models to make
regression techniques. The work describes types of better prediction quality[16]. Prediction of various machine
regression methods such as linear and polynomial regression. learning models can be combined in three ways. These
Support vector regression was found to be the most effective methods are max voting, averaging and weighted
model for prediction of stocks in the market. However, it averaging[16]. In max voting methods, various machine
needs some advancement to get better results. Another work learning classifiers make a prediction for every data point.
of support vector regression (SVR) for stock market Prediction made for each data point by each classifier is
prediction was made by Meesad and Rasel[2]. Data is considered as one vote. The final prediction of class for a
preprocessed through different types of windowing operators data point is the one with the majority of the votes i.e. class
and then fed to the SVR model. The model is applied to real- given by the majority of the models. This is mainly used for
time data for a popular company named Dhaka stock classification algorithms. In the averaging method, the
exchange. Narayanan and Govindarajan [1]applied SVM and prediction is made by taking the mean of results obtained
Naïve Bayes model on time series data i.e. stock market from multiple models. It is mainly used for regression
prediction. Authors proposed two new models i.e. AdaSVM models. The weighted average method is an extension of the
and AdaNaive for analyzing stock market data. The averaging method in which each and every model is assigned
performance of the proposed algorithm is compared is SVM a weight which signifies the importance of a model for
and Naïve Bayes and it was observed that the proposed making the final predictions.
algorithm gives better efficacy as compared to existing
algorithms. Tsai et. al. [12] predicted prices of various stocks Ensemble approach is a machine learning method that
through different classifiers. Financial time series forecasting consolidates a few base models with the end goal to create
is made through machine learning techniques by Sung et. al. one ideal model. The ensemble has a different number of
[13]. The direction of stock prices through different learners which are called base learners. The ability of
classifiers is also predicted by Ballingset. al. [14]. Although, generalization of an ensemble is generally more grounded
a number of classifiers have been applied for predicting than that of base learners. Combining learners are engaging
prices of the stock market, yet an efficient technique in light of the fact that it can support powerless learners who
assembling the benefits of the best machine learning model is are somewhat superior to strong learners and can make
missing. precise predictions. "Base learners" are likewise called as
"powerless learners". Stock prediction is a supervised
Most of the machine learning techniques has attained classification problem where forecasting of stock prices are
considerable results; each individual technique has its own made for the future. In this paper, ensemble learning model
merits and demerits. To tackle the concerns of individual combines the decisions of three base learners namely,
algorithms or to take benefits or advantages of all algorithms, Support Vector Regression(SVR), Multiple Regression and
ensemble strategies are gaining more importance[9]. Long short-term memory network (LSTM) for stock market
Techniques such as Deep Neural Networks, SVMs, prediction through the weighted averaging method.
Recurrent Neural Networks, and Ensembles have become
popular not only for their prediction capacity but also for the Ensemble model improves accuracy and robustness over
fact that they can now be actually trained in a stable and single model methods. Ensemble learners are used in this
timely manner. This is due to the increase in computation research paper because it has overcome the limitations of a
power and the advent of new training methods, such as single hypothesis. The target function may not be
Long-Short team memory for Recurrent Neural Networks. implementable with individual classifiers but may be
Ensemble strategies overcome limitations of Neural approximated by model weighted averaging. Thus this
method has been deployed for training stock prediction data.
Networks and classification algorithms. In this paper, the
Ensemble learning algorithm has been deployed with In the next subsection, a method for constructing ensemble
machine learning classifier namely Support Vector learning and algorithm used in ensemble learning are
Regression, Multiple Regression and Long short-term discussed.
memory network (LSTM) for the purpose of stock A. Ensemble Model Construction
prediction. Ensemble model presented here improves
accuracy and robustness over single model methods. It also Ensemble learning is a technique of combining multiple
overcomes the limitations of a single hypothesis. The target machine learning algorithms in order to yield better
function may not be implementable with individual predictions. An ensemble model is built in two stages. In the
classifiers but may be approximated by model averaging. initial step, all the base learners are formulated. Each of
Thus, this method is more beneficial for predicting the stocks these learners is produced in a parallel style where the
in the market. generation of a learner has an impact with respect to the
other learner. In the following stage, decisions of these base
II. ENSEMBLE MODEL FOR STOCK PREDICTION learners are consolidated in two ways- majority voting and
Machine learning is the specialized form of data mining weighted averaging. The popular combination method used
in which models are learned in a supervised or unsupervised is majority voting for classification and weighted averaging
fashion. In supervised learning, a model is learned by for regression.
training it over a given set of examples. Thereafter trained
model is used to predict the class of the new instance. In
B. Algorithms Used 3) Long short-term memory network(LSTM)

The ensemble learning algorithm is formulated using LSTMs are an exceptional type of recurrent neural
three machine learning algorithms such as Support Vector network (RNN) that are skilled for learning dependencies of
Regression (SVR), Multiple Regression and Long short-term the long term given by Hochreiter & Schmidhuber[20].
memory network (LSTM). These are the popular technique used in a large number of
applications and is used to surpass the dependency problem
1) Multiple Regression of long terms[21]. This is a type of RNN which contains a
long sequence repeating part of a neural network. Unlike
Regression [11] is a technique for calculating target RNN with single layer neural network in each module,
values based on the independent predictors. This technique is LSTM has a different structure for the neural network in
generally utilized for determining and finding the relation each module. The outcome obtained in the output block is
between two variables. Regression methods generally vary in fed as input in next iterations. LSTMs are successful in
two different ways, one the number of independent factors overcoming the problem of vanishing gradients.
and second, kind of relationship between the independent
and dependent factors. Regression consists of a number of The core of LSTM is the cell state which runs
independent variables and there exists a linear relationship horizontally through neural network modules. LSTM
between the independent(x) and dependent(y) variable consists of gates through which information flow can be
known as Linear Regression. regulated. These gates consist of pointwise multiplications
with the sigmoid neural network layer. Sigmoid layer implies
Multiple regression is an advanced version of simple that output can be from 0 to 1. Zero means no information is
linear regression. It is used for discovering the relation allowed and one means all information is allowed to transmit
between at least two continuous factors. One is predictor or from cell state.
independent variable and others are responses or dependent
variables. It looks for a statistical relationship but not a 4) Proposed Ensemble Model
deterministic relationship. The relation between factors is
said to be deterministic if one variable can be accurately The ensemble approach proposed in this paper uses a
expressed in terms of others[17]. Multiple regression for ‘y’ weighted averaging method as this method gives more
dependent variable can be expressed by equation (1): promising results as compared to average and majority
voting methods. Initially, the prices of the stocks are
=( ∗ )+( ∗ )+( ∗ )…+ ( ∗ )+Є (1)
forecasted through the base learners i.e. Support Vector
Variable ‘y’ is linearly dependent on ‘k’ independent Regression, LSTM, Multiple Regression. Thereafter in the
variable a1, a2, a3 ... ak. Regression coefficients are expressed second step on the basis of accuracy obtained from base
as β1,β2,β3 ...βk. Є is the tolerance error i.e. difference in learners, weights are assigned to them. In the third step
observed and fitted value. The motive of the multiple average of weighted accuracies of the base, learners are
regression algorithms is to find the best values for a1, a2, a3 ... obtained as final output to the ensemble approach.
ak such that the model is trained well. The model is trained in
a number of epochs to best fit the model with known data. Let A1, w1 is the weight and accuracy of Multiple
regression, A2, w2 is the weight and accuracy of Support
2) Support Vector Regression(SVR) vector regression and A3, w3 is the weight and accuracy of
LSTM, then accuracy of ensemble model is calculated using
Support Vector Machine [18][19] is likewise utilized as a Equation 3.
regression strategy, keeping all the other features intact. The
∗ ∗
same principle of SVM is employed by Support Vector = (3)
Regression (SVR), with some minor contrasts[2]. Error value
yields a real number which makes it exceptionally hard to The next section proves the performance efficacy of the
predict the outcome of current data, which has infinite proposed ensemble model against individual base learners
possibilities. In the case of SVR, epsilon (margin of through experimental results and analysis.
tolerance) i.e. a parameter of the regression model is adjusted
with respect to SVM[20][4]. The basic motive behind SVR III. EXPERIMENTAL RESULTS AND ANALYSIS
remains same as that of SVM i.e. maximize the margin and
minimize the error reserving the tolerated error part. The performance evaluation of different machine
learning algorithms i.e. base learners such as Support Vector
In the present research work, non-linear support vector Regression, Multiple Regression and Long short-term
regression is used to predict the stock data. Non-linear SVM memory network (LSTM) is made on the basis of accuracy
uses kernel function that reconstructs the data into space of against the proposed ensemble model. Experiments are
high dimensional attributes so that linear separation can be conducted in python spyder 3.1 version. Inbuilt libraries of
performed. The Gaussian radial basis kernel function is more python used are – Numpy, Scikit-learn, Pandas, and Keras.
preferred over polynomial kernel function as it gives better All techniques are implemented on a computer system with
accuracy. The Equation of Gaussian radial basis kernel 4GB RAM and intel core i5 processor. The parameter values
function[2]is given in Equation 2. used in base learners are shown in Table I.
, = − ‖ − ‖ (2)
Where is parameter defining the width of bell curve, xi
and xj are feature vector of input space.
TABLE I. Parameter values of Base Learners
Algorithms Parameters Values
Multiple Regression Test_size 0.1
Random_state 4
Kernel Rbf
Support Vector
C Le3
Regression
Gamma 0.1
Init Uniform
Activation ReLu
Loss MSE
Long Short Term Optimizer ADAM
Memory Batch_size 512
Nb_epoch 500
Validation_Split 0.1
Verbose 0
Fig.1. Predicted prices of stocks using Multiple Regression
Yahoo Stock (1996-2016) dataset is used for evaluation of
base learners as well as ensemble approach. Yahoo finance
dataset has been collected from Yahoo stock data [22]. The
data is collected from April 1996 to April 2016. The stock
market took a toll during 2007-2008 financial crises. During
this time companies went in loss and stock data of
companies absolutely volatile (unpredictable). Training
machine learning model using this data would cause the
system to be less accurate. This is because of a lack of trend
during the crisis period. So, such data is avoided that can
result in uncertain behavior. Yahoo stock prediction dataset
consists of 8 features and 5039 instances. Features of yahoo
dataset are as follows:
Fig.2. Predicted prices of stocks using support vector regression
1. Date
2. Date Value
3. Open
4. High
5. Low
6. Close
7. Volume
8. Adj Close

In the stock market prediction dataset, prices of stocks


are defined for different dates. Fig.1, Fig. 2, Fig. 3 and Fig. 4
illustrate the line graphs that depict the prices of stock
predicted from multiple regression, support vector
regression, LSTM and proposed ensemble model
respectively. The x-axis of the line graph denotes time in a
number of days where 1 unit is 20 days. Y-axis denotes
Fig.3. Predicted prices of stocks using LSTM
prices where 1 unit is 2 rupees. The predicted value of each
learner is compared with the target/desired output value It can be observed from the above graphs that predicted
which is already defined in dataset. The desired output and the actual value largely deviates in LSTM learner. In
enables predictors (learners) to calculate its performance case of support vector regression, predicted results better
efficacy through accuracy. Red line in the graph denotes the than LSTM. However, prices of stocks predicted by multiple
prices of stocks predicted by the learner and blue line regression give good overlap of predicted value with actual
denotes the actual stock price. value. Best results are obtained through our proposed
ensemble model. Prices predicted by the proposed model
The weight values in base learners are assigned on the closely mirror the actual prices of stock. It can also be
basis of prediction accuracy obtained. Multiple regression is noticed from plots (given in fig 1to fig 3) that peak of
assigned highest weight i.e. 3 it gives better accuracy in predicted graphs gets dilute from LSTM, SVR, multiple
contrast to the other two. Support vector regression is regression to ensemble model. This is because the proposed
assigned 2 and long short term memory network 1 i.e. least model takes the best of all learners. It assigns the highest
weight. On the basis of this, the weighted average method is weight to the learner with maximum accuracy and lowest
applied in an ensemble model proposed here. weight to the minimum accuracy learner.
support vector regression,” 2013 Int. Conf. Informatics, Electron.
Vision, ICIEV 2013, pp. 1–6, 2013.
[3] M. Asad, “Optimized Stock market prediction using ensemble
learning,” 9th Int. Conf. Appl. Inf. Commun. Technol. AICT 2015 -
Proc., pp. 263–268, 2015.
[4] H. Qu and Y. Zhang, “A New Kernel of Support Vector Regression
for Forecasting High-Frequency Stock Returns,” Math. Probl. Eng.,
vol. 2016, pp. 1–9, 2016.
[5] H. Yu, R. Chen, and G. Zhang, “A SVM stock selection model within
PCA,” Procedia Comput. Sci., vol. 31, pp. 406–412, 2014.
[6] A. H. Moghaddam, M. H. Moghaddam, and M. Esfandyari, “Stock
market index prediction using artificial neural network,” J. Econ.
Financ. Adm. Sci., vol. 21, no. 41, pp. 89–93, 2016.
[7] R. Dash and P. K. Dash, “A hybrid stock trading framework
integrating technical analysis with machine learning techniques,” J.
Financ. Data Sci., vol. 2, no. 1, pp. 42–57, 2016.
[8] P. C. Chang, C. H. Liu, C. Y. Fan, J. L. Lin, and C. M. Lai, “An
ensemble of neural networks for stock trading decision making,”
Fig.4. Predicted prices of stocks using the proposed ensemble model Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell.
Lect. Notes Bioinformatics), vol. 5755 LNAI, pp. 1–10, 2009.
Further results are established through accuracy rate [9] E. A. Gerlein, M. McGinnity, A. Belatreche, and S. Coleman,
obtained through predicted and actual value shown in Table “Evaluating machine learning classification for financial trading: An
II. The accuracy rate is computed by multiplying the empirical approach,” Expert Syst. Appl., vol. 54, pp. 193–207, 2016.
[10] B. Qian and K. Rasheed, “Stock market prediction with multiple
outcomes of SVR, LSTM, Multiple Regression with the classifiers,” Appl. Intell., vol. 26, no. 1, pp. 25–33, 2007.
respective weights that were assigned to them on the basis of [11] L. Nunno, “Stock Market Price Prediction Using Linear and
their accuracies (shown in Equation 4). The output is then Polynomial Regression Models,” pp. 1–6, 2014.
divided by the sum of all weights assigned to the base [12] C. F. Tsai, Y. C. Lin, D. C. Yen, and Y. M. Chen, “Predicting stock
learner. returns by classifier ensembles,” Appl. Soft Comput., vol. 11, no. 2,
pp. 2452–2459, 2011.
[13] M.-C. Sung, T. Ma, M.-W. Hsu, J. E. V. Johnson, and S. Lessmann,
= 100 − ( )
∗ 100 (4) “Bridging the divide in financial market forecasting: machine learners
vs. financial economists,” Expert Syst. Appl., vol. 61, pp. 215–234,
2016.
[14] M. Ballings, D. Van den Poel, N. Hespeels, and R. Gryp, “Evaluating
multiple classifiers for stock price direction prediction,” Expert Syst.
TABLE II. Accuracy rate of base learners and ensemble model on stock
Appl., vol. 42, no. 20, pp. 7046–7056, 2015.
market prediction
[15] C. Zhang, Y. Ma, and Editors, Ensemble machine learning: methods
Algorithm Accuracy Rate and applications. Springer Science & Business Media, 2012., 2012.
[16] T. G. Dietterich, “Ensemble Methods in Machine Learning,” pp. 1–
Multiple Regression 99.02 15, 2007.
[17] D. Enke, M. Grauer, and N. Mehdiyev, “Stock market prediction with
Support Vector Regression 98.56
Multiple Regression, Fuzzy type-2 clustering and neural networks,”
Long Short Term Memory Network 97.63 Procedia Comput. Sci., vol. 6, pp. 201–206, 2011.
(LSTM) [18] “support vector Regression.” [Online]. Available:
https://www.saedsayad.com/support_vector_machine_reg.htm.
Proposed Ensemble Model 99.12 [Accessed: 28-Apr-2019].
[19] F. E. H. Tay and L. Cao, “Application of support vector machines in
financial time series forecasting,” Int. J. Manag. Sci., vol. 29, pp.
309–317, 2001.
Hence it has been observed from the above results that [20] C. Lin, “Large-scale Linear Support Vector Regression,” Jmlr, vol.
the proposed ensemble model gives better results as 13, pp. 3323–3348, 2012.
compared existing base learners. [21] K. Greff, R. K. Srivastava, J. Koutn’\ik, B. R. Steunebrink, and J.
Schmidhuber, “LSTM: Search Space Odyssey,” CoRR, vol.
IV. CONCLUSION abs/1503.0, no. 10, pp. 2222–2232, 2015.
[22] “Yahoo Stock Data.” [Online]. Available:
The paper presented an ensemble model based on various https://github.com/ranapriyanka1604/Ensemble-Approach-to-Stock-
machine learning models i.e. multiple regression, support Prediction/blob/master/yahoostock.csv. [Accessed: 28-Apr-2019].
vector regression and long short term memory networks. The
model applies the weighted average method based on the
accuracies obtained from prediction of Yahoo stock data.
Experimental results depict that though accuracy rates of the
algorithms are not enhanced much (because accuracy rate of
Multiple regression is 99.02 and ensemble learning is 99.12),
but the deviation between the actual and predicted price has
significantly reduced in ensemble model as compared to any
other of the three algorithms for any particular day (shown in
graphs). Therefore, ensemble learning is a more appropriate
model than other single models and can be further used for
other classification problems.
REFERENCES
[1] B. Narayanan and M. Govindarajan, “Prediction of Stock Market
using Ensemble Model,” Int. J. Comput. Appl., vol. 128, no. 1, pp.
18–21, 2015.
[2] P. Meesad and R. I. Rasel, “Predicting stock market price using

You might also like