What-If Analysis Template
h_t = f(h_{t−1}, x_t; θ)    (1)

In the case of an RNN, the learned model always has the same input size, because it is specified in terms of transitions from one state to another. The architecture also uses the same transition function, with the same parameters, at every time step. LSTM is a special kind of RNN, introduced in 1997 by Hochreiter and Schmidhuber [20]. In the LSTM architecture, the usual hidden layers are replaced with LSTM cells. The cells are composed of various gates that control the flow of information. An LSTM cell consists of an input gate, a cell state, a forget gate, and an output gate; it also contains a sigmoid layer, a tanh layer, and pointwise multiplication operations. The gates and their functions are as follows:

• Input gate: controls how much of the current input enters the cell.
• Cell state: runs through the entire network; information can be added to it or removed from it with the help of the gates.
• Forget gate layer: decides the fraction of the stored information to be allowed through.
• Output gate: produces the output generated by the LSTM.
• Sigmoid layer: generates numbers between zero and one, describing how much of each component should be let through.
• Tanh layer: generates a new candidate vector, which is added to the state.

The cell state is updated based on the outputs from the gates. Mathematically, the update can be represented by the following equations:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)    (2)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)    (3)
c_t = tanh(W_c · [h_{t−1}, x_t] + b_c)    (4)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)    (5)
h_t = o_t ∗ tanh(c_t)    (6)

where x_t is the input vector, h_t the output vector, c_t the cell state vector, f_t the forget gate vector, i_t the input gate vector, o_t the output gate vector, and W, b are the parameter matrices and bias vectors.

Convolutional neural networks (CNNs) are a specialized kind of neural network for processing data that has a known, grid-like topology. This includes time-series data, which can be thought of as a 1-D grid, and image data, which can be thought of as a 2-D grid of pixels. The network employs a mathematical operation called convolution, hence the name. Convolution is a specialized kind of linear operation, and convolutional networks use it instead of general matrix multiplication in at least one of their layers.

The motivation behind using these three models is to identify whether any long-term dependency exists in the given data, which can be inferred from the performance of the models. RNN and LSTM architectures are capable of identifying long-term dependencies and using them for future prediction, whereas CNN architectures focus mainly on the given input sequence and do not use any previous history or information during the learning process. The motivation behind testing the models with data from other companies is to check for interdependencies among the companies and to understand the market dynamics.

The training data was normalized, and the test data was subjected to the same normalization. After obtaining the predicted output, denormalization was applied and the percentage error was calculated against the available true labels using (7):

e_p = |X^i_real − X^i_predicted| / X^i_real × 100    (7)

where e_p is the error percentage, X^i_real is the i-th real value and X^i_predicted is the i-th predicted value. The error percentage gives the magnitude of the error present in the output.

III. RESULTS AND DISCUSSION

The experiment was carried out for three different deep learning models. The maximum error percentage obtained for each model is given in Table I. From the table it is clear that CNN gives more accurate results than the other two models. This is because CNN does not depend on any previous information for prediction; it uses only the current window, which enables the model to follow the dynamical changes and patterns occurring in that window. RNN and LSTM, in contrast, use information from previous lags to predict future instances. Since the stock market is a highly dynamical system, the patterns and dynamics existing within it are not always the same. This causes learning problems for the RNN and LSTM architectures, and hence these models fail to capture the dynamical changes accurately.

TABLE I: ERROR PERCENTAGE

COMPANY   RNN    LSTM   CNN
Infosys   3.90   4.18   2.36
TCS       7.65   7.82   8.96
Cipla     3.83   3.94   3.63

For comparison we used ARIMA, a linear model commonly used for forecasting. The error percentages obtained for the three companies are as follows:

TABLE II: ERROR PERCENTAGE - ARIMA

COMPANY   Error Percentage
Infosys   31.91
TCS       21.16
Cipla     36.53

From Table I and Table II it is clear that the deep learning models outperform ARIMA.
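As a concrete illustration, equations (2)–(6) can be sketched as a single LSTM cell step in NumPy. One caveat: equation (4) above gives the candidate value, and in the standard formulation of Hochreiter and Schmidhuber [20] the cell state itself is then updated as c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t; that update is included below. The dimensions and random parameters are illustrative assumptions, not the configuration used in this work.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following equations (2)-(6).

    W maps each gate name to a weight matrix acting on [h_{t-1}, x_t];
    b maps each gate name to a bias vector.
    """
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # (2) forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # (3) input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])    # (4) candidate state
    o_t = sigmoid(W["o"] @ z + b["o"])      # (5) output gate
    c_t = f_t * c_prev + i_t * c_hat        # standard cell-state update [20]
    h_t = o_t * np.tanh(c_t)                # (6) output
    return h_t, c_t

# Toy dimensions: 1 input feature, 4 hidden units (illustrative only).
rng = np.random.default_rng(0)
n_in, n_hid = 1, 4
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in [0.5, 0.6, 0.55]:                  # a short normalized price window
    h, c = lstm_step(np.array([x]), h, c, W, b)
```

Because h_t = o_t ∗ tanh(c_t), with o_t in (0, 1), each component of the output stays within (−1, 1) regardless of the parameters.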
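The convolution operation that replaces general matrix multiplication can be illustrated with a minimal 1-D example; the window values and kernel weights below are made up for illustration, not taken from the trained model:

```python
import numpy as np

# A short stock-price window (illustrative values) and a small 1-D kernel.
window = np.array([100.0, 101.0, 103.0, 102.0, 104.0])
kernel = np.array([0.25, 0.5, 0.25])

# "Valid" 1-D convolution: each output element is the dot product of the
# kernel with one local patch of the input -- a sparse, weight-shared
# linear map, unlike a dense matrix multiplication over the whole window.
out = np.array([kernel @ window[i:i + len(kernel)]
                for i in range(len(window) - len(kernel) + 1)])
# out -> [101.25, 102.25, 102.75]
```

The same three weights are reused at every position, which is what lets the network react to local patterns in the current window rather than to any global history.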
Fig. 2: Plot for Real value vs Predicted value for INFOSYS using RNN
Fig. 3: Plot for Real value vs Predicted value for INFOSYS using LSTM
Fig. 5: Plot for Real value vs Predicted value for TCS using RNN
Fig. 6: Plot for Real value vs Predicted value for TCS using LSTM
Fig. 7: Plot for Real value vs Predicted value for TCS using CNN
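The preprocessing and evaluation pipeline described earlier (normalization fitted on the training data, denormalization of the predictions, and the error percentage of equation (7)) can be sketched as follows. The paper does not specify which normalization was used, so min–max scaling is an assumption here, and all numbers are illustrative:

```python
import numpy as np

train = np.array([95.0, 100.0, 105.0, 110.0])   # illustrative training prices
test_true = np.array([108.0, 112.0])            # illustrative true test labels

# Fit the normalization on the training data only, then reuse it on test data.
lo, hi = train.min(), train.max()

def normalize(x):
    return (x - lo) / (hi - lo)

def denormalize(x):
    return x * (hi - lo) + lo

# Stand-in for model output in normalized space (hypothetical values).
pred_norm = np.array([0.80, 1.05])
pred = denormalize(pred_norm)                   # back to the price scale

# Equation (7): e_p = |X_real - X_predicted| / X_real * 100
e_p = np.abs(test_true - pred) / test_true * 100
```

Fitting the scaler on the training data alone mirrors the paper's setup: the test data is transformed with the same parameters, so no information from the test period leaks into training.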
Fig. 10: Plot for Real value vs Predicted value for CIPLA using CNN

2014 to October-14-2014; even then, the testing accuracy for Infosys is lower when compared to the other companies. This shows that whatever trend Infosys exhibits during the period of July to October 14 is not present as such in the test data (from October-16-2014 to November-28-2014), i.e., there is a change in the dynamics. This accounts for the difference in the error percentage for Infosys when compared to the other models. The model is also capable of predicting the stock price of companies other than Infosys, which shows that the pattern or dynamics identified by the model is common to the other companies as well.

IV. CONCLUSION

We propose a deep-learning-based formalization for stock price prediction. It is seen that deep neural network architectures are capable of capturing the hidden dynamics and making predictions. We trained the model using the data of Infosys and were able to predict the stock prices of Infosys, TCS and Cipla. This shows that the proposed system is capable of identifying some interrelations within the data. It is also evident from the results that the CNN architecture is capable of identifying the changes in trends. For the proposed methodology, CNN is identified as the best model. It uses the information given at a particular instant for prediction. Even though the other two models are used in many other time-dependent data analyses, they do not outperform the CNN architecture in this case. This is due to the sudden changes that occur in stock markets. The changes occurring in the stock market may not always follow a regular pattern or the same cycle. Depending on the companies and the sectors, the trends that exist and the period of their existence will differ. The analysis of these types of trends and cycles would give more profit to investors. To analyze such information we must use networks like CNN, as they rely on the current information.

REFERENCES

[1] A. V. Devadoss and T. A. A. Ligori, "Forecasting of stock prices using multi layer perceptron," Int. J. Comput. Algorithm, vol. 2, pp. 440–449, 2013.
[2] J. G. De Gooijer and R. J. Hyndman, "25 years of time series forecasting," International Journal of Forecasting, vol. 22, no. 3, pp. 443–473, 2006.
[3] V. K. Menon, N. C. Vasireddy, S. A. Jami, V. T. N. Pedamallu, V. Sureshkumar, and K. Soman, "Bulk price forecasting using Spark over NSE data set," in International Conference on Data Mining and Big Data. Springer, 2016, pp. 137–146.
[4] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015.
[5] G. Batres-Estrada, "Deep learning for multivariate financial time series," ser. Technical Report, Stockholm, May 2015.
[6] P. Abinaya, V. S. Kumar, P. Balasubramanian, and V. K. Menon, "Measuring stock price and trading volume causality among Nifty50 stocks: The Toda–Yamamoto method," in Advances in Computing, Communications and Informatics (ICACCI), 2016 International Conference on. IEEE, 2016, pp. 1886–1890.
[7] J. Heaton, N. Polson, and J. Witte, "Deep learning in finance," arXiv preprint arXiv:1602.06561, 2016.
[8] H. Jia, "Investigation into the effectiveness of long short term memory networks for stock price prediction," arXiv preprint arXiv:1603.07893, 2016.
[9] Y. Bengio, I. J. Goodfellow, and A. Courville, "Deep learning," Nature, vol. 521, pp. 436–444, 2015.
[10] H. White, Economic Prediction Using Neural Networks: The Case of IBM Daily Stock Returns, ser. Discussion Paper, Department of Economics, University of California, San Diego, 1988.
[11] B. G. Malkiel, "Efficient market hypothesis," in The New Palgrave: Finance. Norton, New York, pp. 127–134, 1989.
[12] X. Ding, Y. Zhang, T. Liu, and J. Duan, "Deep learning for event-driven stock prediction," in IJCAI, 2015, pp. 2327–2333.
[13] J. Roman and A. Jameel, "Backpropagation and recurrent neural networks in financial analysis of multiple stock market returns," in System Sciences, 1996. Proceedings of the Twenty-Ninth Hawaii International Conference on, vol. 2. IEEE, 1996, pp. 454–460.
[14] M.-C. Chan, C.-C. Wong, and C.-C. Lam, "Financial time series forecasting by neural network using conjugate gradient learning algorithm and multiple linear regression weight initialization," in Computing in Economics and Finance, vol. 61, 2000.
[15] J. Roman and A. Jameel, "Backpropagation and recurrent neural networks in financial analysis of multiple stock market returns," in System Sciences, 1996. Proceedings of the Twenty-Ninth Hawaii International Conference on, vol. 2. IEEE, 1996, pp. 454–460.
[16] E. W. Saad, D. V. Prokhorov, and D. C. Wunsch, "Comparative study of stock trend prediction using time delay, recurrent and probabilistic neural networks," IEEE Transactions on Neural Networks, vol. 9, no. 6, pp. 1456–1470, 1998.
[17] O. Hegazy, O. S. Soliman, and M. A. Salam, "A machine learning model for stock market prediction," arXiv preprint arXiv:1402.7351, 2014.
[18] K.-j. Kim and I. Han, "Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index," Expert Systems with Applications, vol. 19, no. 2, pp. 125–132, 2000.
[19] Y. Kishikawa and S. Tokinaga, "Prediction of stock trends by using the wavelet transform and the multi-stage fuzzy inference system optimized by the GA," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 83, no. 2, pp. 357–366, 2000.
[20] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.