Masters Dissertation
Robust Stock Price Prediction Using Machine Learning and Deep Learning Models
By
Sidra Mehtab
(Reg. No: 182341810028)
Certificate of Approval
This is to certify that the dissertation (minor) titled "A Time Series Analysis-Based Stock Price Prediction Using Machine Learning and Deep Learning Models", being submitted in partial fulfillment of the requirements for the Master of Science in Data Science and Analytics course, and carried out at the NSHM Knowledge Campus, Kolkata, India, embodies the work done under my supervision during the period August 2019 – May 2020. To the best of my knowledge and belief, the results presented in this work have not been submitted elsewhere for the award of any other degree or diploma.
Jaydip Sen
Kolkata, INDIA
Acknowledgement
I would like to express my sincere gratitude to my supervisor Prof. Jaydip Sen, Professor & Head, School of Computing and Analytics, NSHM Knowledge Campus, Kolkata, India. Prof. Sen, despite his busy schedule, always found time to guide me whenever I approached him for help. I also acknowledge with thanks the kind cooperation and support that I received from the support staff in the High-Performance Computing Lab in executing some of the heavy-duty programs in my work; their help has been invaluable. I would also like to thank all my classmates and friends for their help and constructive criticisms that were instrumental in improving the quality of the work in this dissertation. Last, but not least, without the encouragement, support, and cooperation that I received, this work could never have been possible.
Sidra Mehtab
Abstract
Prediction of future movement of stock prices has always been a challenging task
for researchers. While the advocates of the efficient market hypothesis (EMH)
believe that it is impossible to design any predictive framework that can accurately
predict the movement of stock prices, there are seminal works in the literature which clearly demonstrate that the seemingly random movement patterns in the time series of a stock price can be predicted with a high level of accuracy. The design of
such predictive models requires the choice of appropriate variables, right
transformation methods of the variables, and tuning of the parameters of the
models. In this dissertation, I present a very robust and accurate framework of stock
price prediction that consists of an agglomeration of statistical, machine learning,
and deep learning models. I have used stock price data, collected at five-minute intervals, of a very well-known company that is listed on the National Stock Exchange (NSE) of India. The granular data is aggregated into three slots in
a day, and the aggregated data is used for training and building the forecasting
models. We contend that the agglomerative approach of model building that uses a
combination of statistical, machine learning, and deep learning approaches, can
very effectively learn from the volatile and random movement patterns in a stock
price data. This effective learning will lead to the building of very robust models that can be deployed for short-term forecasting of stock prices and prediction of stock movement patterns. We build eight classification and eight
regression models based on statistical and machine learning approaches. In addition
to these models, two deep learning-based regression models using a long- and short-term memory (LSTM) network and a convolutional neural network (CNN) have
also been built. Extensive results have been presented on the performance of these
models, and results are critically analyzed. We have also identified some interesting
future scope of work.
Table of Contents

Chapter No. Description
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Methodology
4 Machine Learning Models
    Classification Models
        1. Logistic Regression
        2. K-Nearest Neighbor
        3. Decision Tree
        4. Bagging
        5. Boosting
        6. Random Forest
        7. Artificial Neural Network
        8. Support Vector Machine
    Regression Models
        1. Multivariate Regression
        2. Multivariate Adaptive Regression Spline
        3. Decision Tree
        4. Bagging
        5. Boosting
        6. Random Forest
        7. Artificial Neural Network
        8. Support Vector Machines
5 Deep Learning Models
    Performance Metrics
        1. Sensitivity
        2. Specificity
        3. Positive Predictive Value
        4. Negative Predictive Value
        5. Classification Accuracy
        6. F1 Score
    Results of Machine Learning Classification Models
        1. Logistic Regression Results
        2. K-Nearest Neighbor Classification Results
        3. Decision Tree Classification Results
        4. Bagging Classification Results
        5. Boosting Classification Results
        6. Random Forest Classification Results
        7. Artificial Neural Network Classification Results
        8. Support Vector Machine Classification Results
List of Figures

Fig No Description of Figure
1(a) Logistic Regression – actual vs predicted probabilities of open_perc (Case I)
1(b) Logistic Regression for classification – lift curve (Case I)
1(c) Logistic Regression for classification – ROC curve (Case I)
2(a) Logistic Regression – actual vs predicted probabilities of open_perc (Case II)
2(b) Logistic Regression for classification – lift curve (Case II)
2(c) Logistic Regression for classification – ROC curve (Case II)
3(a) Logistic Regression – actual vs predicted probabilities of open_perc (Case III)
3(b) Logistic Regression for classification – lift curve (Case III)
3(c) Logistic Regression for classification – ROC curve (Case III)
4(a) Decision Tree for classification (Case I)
4(b) Decision Tree for classification (Case II)
4(c) Decision Tree for classification (Case III)
5(a) Bagging for classification – actual vs predicted classes of open_perc (Case I)
5(b) Bagging for classification – actual vs predicted classes of open_perc (Case II)
5(c) Bagging for classification – actual vs predicted classes of open_perc (Case III)
6(a) Boosting for classification – actual vs predicted classes of open_perc (Case I)
6(b) Boosting for classification – actual vs predicted classes of open_perc (Case II)
6(c) Boosting for classification – actual vs predicted classes of open_perc (Case III)
7(a) ANN classification model (Case I)
7(b) ANN classification – actual vs predicted classes of open_perc (Case I)
8(a) ANN classification model (Case II)
8(b) ANN classification – actual vs predicted classes of open_perc (Case II)
9(a) ANN classification model (Case III)
9(b) ANN classification – actual vs predicted classes of open_perc (Case III)
10(a) Multivariate Regression – time-varying actual and predicted values of open_perc (Case I)
10(b) Multivariate Regression – relationship between actual and predicted open_perc (Case I)
11(a) Multivariate Regression – time-varying actual and predicted values of open_perc (Case II)
11(b) Multivariate Regression – relationship between actual and predicted open_perc (Case II)
11(c) Multivariate Regression – time-varying residuals (Case II)
12(a) Multivariate Regression – time-varying actual and predicted open_perc (Case III)
12(b) Multivariate Regression – relationship between actual and predicted open_perc (Case III)
12(c) Multivariate Regression – time-varying residuals (Case III)
13(a) MARS – time-varying actual and predicted values of open_perc (Case I)
13(b) MARS – relationship between actual and predicted values of open_perc (Case I)
13(c) MARS – time-varying residuals (Case I)
14(a) MARS – time-varying actual and predicted values of open_perc (Case II)
14(b) MARS – relationship between actual and predicted values of open_perc (Case II)
14(c) MARS – time-varying residuals (Case II)
15(a) MARS – time-varying actual and predicted values of open_perc (Case III)
15(b) MARS – relationship between actual and predicted values of open_perc (Case III)
15(c) MARS – time-varying residuals (Case III)
16(a) Decision Tree regression model (Case I)
16(b) Decision Tree regression – time-varying actual and predicted open_perc (Case I)
16(c) Decision Tree regression – relationship between actual and predicted open_perc (Case I)
16(d) Decision Tree regression – time-varying residuals (Case I)
17(a) Decision Tree regression model (Case II)
17(b) Decision Tree regression – time-varying actual and predicted open_perc (Case II)
17(c) Decision Tree regression – relationship between actual and predicted open_perc (Case II)
17(d) Decision Tree regression – time-varying residuals (Case II)
18(a) Decision Tree regression model (Case III)
18(b) Decision Tree regression – time-varying actual and predicted open_perc (Case III)
18(c) Decision Tree regression – relationship between actual and predicted open_perc (Case III)
18(d) Decision Tree regression – time-varying residuals (Case III)
19(a) Bagging regression – time-varying actual and predicted values of open_perc (Case I)
19(b) Bagging regression – relationship between actual and predicted open_perc (Case I)
19(c) Bagging regression – time-varying residuals (Case I)
20(a) Bagging regression – time-varying actual and predicted values of open_perc (Case II)
20(b) Bagging regression – relationship between actual and predicted open_perc (Case II)
20(c) Bagging regression – time-varying residuals (Case II)
21(a) Bagging regression – time-varying actual and predicted values of open_perc (Case III)
21(b) Bagging regression – relationship between actual and predicted open_perc (Case III)
21(c) Bagging regression – time-varying residuals (Case III)
22(a) Boosting regression – time-varying actual and predicted values of open_perc (Case I)
22(b) Boosting regression – relationship between actual and predicted values of open_perc (Case I)
22(c) Boosting regression – time-varying residuals (Case I)
23(a) Boosting regression – time-varying actual and predicted values of open_perc (Case II)
23(b) Boosting regression – relationship between actual and predicted values of open_perc (Case II)
23(c) Boosting regression – time-varying residuals (Case II)
24(a) Boosting regression – time-varying actual and predicted values of open_perc (Case III)
24(b) Boosting regression – relationship between actual and predicted values of open_perc (Case III)
24(c) Boosting regression – time-varying residuals (Case III)
25(a) Random Forest regression – time-varying actual and predicted values of open_perc (Case I)
25(b) Random Forest – relationship between actual and predicted values of open_perc (Case I)
25(c) Random Forest regression – time-varying residuals (Case I)
26(a) Random Forest regression – time-varying actual and predicted values of open_perc (Case II)
26(b) Random Forest – relationship between actual and predicted values of open_perc (Case II)
26(c) Random Forest regression – time-varying residuals (Case II)
27(a) Random Forest regression – time-varying actual and predicted values of open_perc (Case III)
27(b) Random Forest – relationship between actual and predicted values of open_perc (Case III)
27(c) Random Forest regression – time-varying residuals (Case III)
28(a) ANN regression model (Case I)
28(b) ANN regression – time-varying actual and predicted values of open_perc (Case I)
28(c) ANN regression – relationship between actual and predicted values of open_perc (Case I)
28(d) ANN regression – time-varying residuals (Case I)
29(a) ANN regression model (Case II)
29(b) ANN regression – time-varying actual and predicted values of open_perc (Case II)
29(c) ANN regression – relationship between actual and predicted values of open_perc (Case II)
29(d) ANN regression – time-varying residuals (Case II)
30(a) ANN regression model (Case III)
30(b) ANN regression – time-varying actual and predicted values of open_perc (Case III)
30(c) ANN regression – relationship between actual and predicted values of open_perc (Case III)
30(d) ANN regression – time-varying residuals (Case III)
31(a) SVM regression – time-varying actual and predicted values of open_perc (Case I)
31(b) SVM regression – relationship between actual and predicted values of open_perc (Case I)
31(c) SVM regression – time-varying residuals (Case I)
32(a) SVM regression – time-varying actual and predicted open_perc (Case II)
32(b) SVM regression – relationship between actual and predicted values of open_perc (Case II)
32(c) SVM regression – time-varying residuals (Case II)
33(a) SVM regression – time-varying actual and predicted values of open_perc (Case III)
33(b) SVM regression – relationship between actual and predicted values of open_perc (Case III)
33(c) SVM regression – time-varying residuals (Case III)
34(a) LSTM regression – stock data representation (Case I)
34(b) LSTM model architecture (Case I, Case II and Case III)
34(c) LSTM regression – training and validation error (Case I)
35(a) LSTM regression – stock data representation (Case II)
35(b) LSTM regression – training and validation error (Case II)
36(a) LSTM regression – stock data representation (Case III)
36(b) LSTM regression – training and testing error (Case III)
37 CNN regression – stock data representation
38 CNN model architecture – univariate multistep with one week's data as input (N = 5)
39 CNN model architecture – univariate multistep with two weeks' data as input (N = 10)
40 CNN model architecture – multivariate multistep with two weeks' data as input (N = 10)
41 CNN model architecture – multivariate sub-models with two weeks' data as input (N = 10)
List of Tables

Table No Description of Table
1 Logistic regression classification results
2 KNN classification results
3 Decision Tree classification results
4 Bagging classification results
5 Boosting classification results
6 Random Forest classification results
7 ANN classification results
8 SVM classification results
9 Multivariate Regression results
10 MARS regression results
11 Decision Tree regression results
12 Bagging regression results
13 Boosting regression results
14 Random Forest regression results
15 ANN regression results
16 SVM regression results
17 CNN regression results (Case I: univariate multi-step, N = 5)
18 CNN regression results (Case II: univariate multi-step, N = 10)
19 CNN regression results (Case III: multivariate multi-step, N = 10)
20 CNN regression results (Case IV: multiheaded CNN, N = 10)
21 Summary of the performance of the classification models in Case I
22 Summary of the performance of the classification models in Case II
23 Summary of the performance of the classification models in Case III
24 Summary of the performance of the regression models in Case I
25 Summary of the performance of the regression models in Case II
26 Summary of the performance of the regression models in Case III
Chapter 1
Introduction
Prediction of future movement patterns of stock prices has been a widely researched
area in the literature. While there are proponents of the efficient market hypothesis
who believe that it is impossible to predict stock prices, there are also propositions
that demonstrated that if correctly formulated and modeled, prediction of stock
prices can be done with a fairly high level of accuracy. The latter school of thought
focused on the construction of robust statistical, econometric, and machine learning
models based on the careful choice of variables and appropriate functional forms
or models of forecasting. Several propositions have been presented in the literature for stock price forecasting that follow a time series analysis and decomposition approach (Sen & Datta Chaudhuri, 2018a; Sen, 2018b; Sen, 2018c; Sen, 2018d; Sen & Datta Chaudhuri, 2017a; Sen & Datta Chaudhuri, 2017b; Sen, 2017c; Sen, 2017d; Sen & Datta Chaudhuri, 2017e; Sen & Datta Chaudhuri, 2016a; Sen & Datta Chaudhuri, 2016b; Sen & Datta Chaudhuri, 2016c; Sen & Datta Chaudhuri, 2016d; Sen & Datta Chaudhuri, 2015). There is also a considerable body of literature that deals with technical analysis of stock price movements. Propositions also exist for mining
stock price patterns using various important indicators like Bollinger Bands,
moving average convergence divergence (MACD), relative strength index (RSI),
moving average (MA), stochastic momentum index (SMI), etc. There are also well-
known patterns like head and shoulders pattern, inverse head and shoulders pattern,
triangle, flag, Fibonacci fan, Andrew's Pitchfork, etc., which are exploited by
traders for investing intelligently in the stock market. These approaches provide the user with visual manifestations of the indicators, which help ordinary investors understand which way stock prices are more likely to move in the near future.
In this thesis, we propose a granular approach to forecasting of stock price and the
price movement pattern by combining several statistical, machine learning, and deep learning methods of prediction based on technical analysis of stock prices. We
present several approaches for short-term stock price movement forecasting using
various classification and regression techniques and compare their performance in
prediction of stock price movement and stock price values. We believe this approach will provide useful information to stock market investors who are particularly interested in short-term investments for profit. This work is a modified and extended version of our previous work (Mehtab & Sen, 2019). In the present work, we have presented a predictive framework that aggregates eight classification and eight regression models, including a long- and short-term memory (LSTM)-based advanced deep learning model and four variants of convolutional neural network (CNN)-based forecasting models.
The objective of our work is to take stock price data at five-minute intervals from the National Stock Exchange (NSE) of India and develop a robust forecasting framework for the stock price movement. We contend that such a granular approach can model the inherent dynamics and can be fine-tuned for immediate forecasting of the stock price or stock price movement. Here, we are not addressing the problem of forecasting the long-term movement of the stock price. Rather, our framework is more relevant to a trade-oriented, short-term setting.
Chapter 2
Related Work
The literature attempting to prove or disprove the efficient market hypothesis can
be classified into three strands, according to the choice of variables and techniques
of estimation and forecasting. The first strand consists of studies using simple
regression techniques on cross-sectional data (Basu, 1983; Jaffe et al., 1989;
Rosenberg et al., 1985; Fama & French, 1995; Chui & Wei, 1998). The second
strand of the literature has used time series models and techniques to forecast stock
returns following economic tools like autoregressive integrated moving average
(ARIMA), Granger causality test, autoregressive distributed lag (ARDL) and
quantile regression (QR) to forecast stock prices (Jarrett & Kyper, 2011; Adebiyi
et al., 2014; Mondal et al., 2014; Mishra, 2016). The third strand includes work
using machine learning tools for the prediction of stock returns (Mostafa, 2010;
Dutta et al., 2006; Wu et al., 2008; Siddiqui & Abdullah, 2015; Jaruszewicz &
Mandziuk, 2004).
Among the recent propositions in the literature on stock price prediction, Mehtab and Sen have demonstrated how machine learning and long-
and short-term memory (LSTM)-based deep learning networks can be used for
accurately forecasting NIFTY 50 stock price movements in the National Stock
Exchange (NSE) of India (Mehtab & Sen, 2019). The authors used the daily stock
prices for three years during the period of January 2015 till December 2017 for
building the predictive models. The forecast accuracies of the models were then
evaluated based on their ability to predict the movement patterns of the close value
of the NIFTY index on a time horizon of one week. For the purpose of testing, the
authors used NIFTY 50 index values for the period of January 2018 till June 2019.
To further improve the predictive power of the models, the authors incorporated a
sentiment analysis module for analyzing the public sentiments on Twitter on
NIFTY 50 stocks. The output of the sentiment analysis module is fed into the
predictive model in addition to the past NIFTY 50 index values for building a
very robust and accurate forecasting model. The sentiment analysis module uses a
self-organizing fuzzy neural network (SOFNN) for handling non-linearity in a
multivariate predictive environment.
Mehtab and Sen recently proposed another approach to stock price and movement
prediction using convolutional neural networks (CNN) on a multivariate time series
(Mehtab & Sen, 2020). The predictive model proposed by the authors exploits the
learning ability of a CNN with a walk-forward validation ability so as to realize a
high level of accuracy in forecasting the future NIFTY index values, and their
movement patterns. Three different architectures of CNN are proposed by the
authors that differ in the number of variables used in forecasting, the number of
sub-models used in the overall system, and the size of the input data for training the
models. The experimental results clearly indicated that the CNN-based multivariate
forecasting model was highly accurate in predicting the movement of NIFTY index
values with a weekly forecast horizon.
The design of efficient predictive models and algorithms for accurately forecasting
the movement patterns of stock prices and stock returns has attracted considerable
attention and effort from the research community over a significantly long period.
Many of such propositions involve the application of various types of neural
networks. Neural networks have the ability to model nonlinearity in data, and this property has proven to be extremely effective in mining the complex patterns in stock price movements. Moreover, the ability to model nonlinearity can be
controlled adaptively by choosing a suitable number of hidden layers and the
number of nodes in such hidden layers (Hornik et al., 1989).
Mostafa showed how accurately neural network-based models could predict stock
market movements in Kuwait (Mostafa, 2010).
Kimoto et al. illustrated how neural network-based predictive models could be
applied to historical accounting data (Kimoto et al., 1990). In the model
construction process, the authors utilized various macroeconomic variables, and
then applied the model for forecasting the patterns of variations in stock return
movements.
Chen et al. have proposed an approach for constructing a model for predicting the
direction of return on the Taiwan Stock Exchange Index (Chen et al., 2003). The
authors contended that stock trading guided by robust forecasting models was more effective and usually led to a higher return on investment. For the purpose of
constructing a robust forecasting model, the authors built and trained a probabilistic
neural network (PNN) using historical stock market data. The forecasted output of
the model was applied to form various index trading strategies, and the
effectiveness of those strategies was compared with those generated by the buy and
hold strategy, the investment strategies formed using the output of a random walk
model, and the parametric generalized method of moments (GMM) with a Kalman
filter. The results showed that the investment strategies made using the output of
the PNN yielded the highest return on investment in the long run.
de Faria et al. illustrated a predictive model using a neural network and an adaptive
exponential smoothing (AES) method for forecasting the movements of the
principal index of the Brazilian stock market (de Faria et al., 2009). The authors
compared the forecasting performance of both the neural network and the
exponential smoothing models with a particular focus on the sign of the market
returns. While the simulation results showed that both methods were equally
efficient in predicting the index returns, the neural network model was found to be
more accurate in predicting the market movement than the adaptive exponential
smoothing method.
Leigh et al. proposed the use of linear regression and simple neural network models
for forecasting the stock market indices in the New York Stock Exchange during
the period 1981-1999 (Leigh et al., 2005). The proposed scheme by the authors
used a template matching mechanism based on statistical pattern recognition that
efficiently and accurately identified spikes in the trading volumes. A threshold limit
for the spike in volume was identified, and the days on which the traded volume
exhibited significant spikes were identified. A linear regression model was applied
to forecast the future change in price based on the historical price, traded volume,
and the prime interest rate.
Shen et al. proposed a novel scheme that was based on a tapped delay neural
network (TDNN) with an ability of adaptive learning and pruning for forecasting
on a non-linear time series of stock price values (Shen et al., 2007). The TDNN model was trained by a recursive least squares (RLS) technique that involved a tunable learning-rate parameter enabling faster network convergence. The
trained neural network model was optimized using a pruning algorithm that
reduced the possibility of overfitting of the model. The experimental results in a
simulated environment clearly showed that the pruned model had a reduced
complexity, faster execution, and improved prediction accuracy.
Ning et al. proposed a scheme of stock index prediction that was based on a chaotic
neural network (Ning et al., 2009). Data from Chinese stock markets, including the Shenzhen stock market, were used for building the model. The non-linear,
stochastic, and chaotic patterns in the stock market indices were learned by the
chaotic neural network, and the learnings of the chaotic neural network were
gainfully applied in forecasting future index values of the stock markets.
Hanias et al. conducted a study to predict the daily stock exchange price index of
the Athens Stock Exchange (ASE) using a neural network with backpropagation
(Hanias et al., 2012). The neural network was used to make multistep forecasting
for nine days and yielded a very low mean square error (MSE) value of 0.0024.
Liao et al. carried out a study on the stock market investment issues on the Taiwan
stock market (Liao et al., 2008). The scheme involved two phases. In the first phase,
the apriori algorithm was used to identify the association rules and knowledge
patterns about stock category association and possible stock category investment
collections. After the association rules were successfully mined, in the second
phase, the k-means clustering algorithm was used to identify the various clusters of
stocks based on their association patterns. The authors also proposed several
possible stock market portfolio alternatives under various clusters of stocks.
Zhu et al. hypothesized that there is a significant bidirectional nonlinear causality
between stock returns and trading volumes (Zhu et al., 2008). The authors proposed
the use of a neural network-based scheme for forecasting stock index movements.
The model was further enriched by the inclusion of different combinations of
indices and component stocks’ trading volumes as inputs. NASDAQ, DJIA, and
STI data of stock prices and volume of transactions were used in training the neural
network. The experimental results demonstrated that the neural networks augmented with trading volumes led to improvements in forecasting performance under different forecasting horizons.
Bentes et al. presented a study on the long memory and volatility clustering for the
S&P 500, NASDAQ 100, and Stoxx 50 indexes in order to compare the US and
European markets (Bentes et al., 2008). The authors compared the performance of
two different approaches. The first approach was based on the traditional
approaches using generalized autoregressive conditional heteroscedasticity
GARCH (1, 1), IGARCH (1, 1), and FIGARCH (1, d, 1) models, while the second approach exploited the concept of entropy from econophysics. In the second approach, three
different measures were considered by the authors in the study. The three measures
were Shannon, Renyi, and Tsallis measures. The results obtained using both the
approaches revealed the existence of nonlinearity and volatility in the S&P 500, NASDAQ 100, and Stoxx 50 indexes.
Chen et al. demonstrated how the random and chaotic behavior of stock price
movements can be very effectively modeled using a local linear wavelet neural
network (LLWNN) technique (Chen et al., 2005). The proposed wavelet-based
model was further optimized using a novel algorithm, which the authors referred to
as estimation of distribution algorithm (EDA). The purpose of the model was to
accurately predict the share price for the following trade day given the opening,
closing, and maximum values of the stock price for a particular day. The study
revealed an interesting observation - even for a time series that exhibited an
extremely high level of random fluctuations in its values, the model could extract
some very important features from the opening, closing and the maximum values
of the stock index that enabled an accurate prediction of its future behavior.
Dutta et al. illustrated how ANN models could be applied in forecasting Bombay
Stock Exchange’s SENSEX weekly closing values for the period of January 2002
to December 2003 (Dutta et al., 2006). The approach proposed by the authors
involved building two neural networks each consisting of three hidden layers, in
addition to the input and the output layers. The input values to the first neural
network were: (i) the weekly closing values, (ii) the 52-week moving average of
the weekly closing SENSEX values, (iii) the 5-week moving average of the closing
values, and (iv) the 10-week oscillator values for the past 200 weeks. On the other
hand, the second network was provided with the following input values: (i) weekly
closing value of SENSEX, (ii) the moving average of the weekly closing values
computed on the 52-week historical data, (iii) the moving average of the closing
values computed on the 5-week historical data, and (iv) the volatility of the
SENSEX records computed on 5-week basis over the past 200 weeks. The
forecasting performance of the two neural networks was compared using their root
mean square error (RMSE) and mean absolute error (MAE) values on the test data.
For the purpose of testing the networks, the weekly closing SENSEX values for the
period of January 2002 to December 2003 were used.
Hammad et al. demonstrated that an artificial neural network (ANN) model can be trained to converge to an optimal solution while maintaining a very high level of precision in the forecasting of stock prices (Hammad et al., 2009). The proposed scheme was based on a multi-layer feedforward neural network model that used the back-propagation algorithm. The model was used for forecasting Jordanian stock prices. The authors demonstrated simulations in MATLAB that were carried out on seven Jordanian companies from the service and manufacturing sectors.
The accuracy of the model in forecasting stock price movement was found to be
very high.
Tsai and Wang conducted a study to illustrate how Bayesian network-based
approaches could produce better forecasting results than traditional regression and
neural network-based approaches (Tsai & Wang, 2009). The authors proposed a
hybrid predictive model for stock price forecasting that combined a neural network-
based model with a decision-tree. The experimental results demonstrated that the
hybrid model had higher predictive power than the single ANN and the single
decision tree-based approach.
Tseng et al. utilized various approaches including the traditional time series
decomposition (TSD) model, Holt-Winters (H/W) exponential smoothing with trend
and seasonality models, Box-Jenkins (B/J) models using autocorrelation and partial
autocorrelation, and neural network-based models (Tseng et al., 2012). The authors
trained the models on the stock price data of 50 randomly chosen stocks during the
period: September 1, 1998 - December 31, 2010. For the purpose of training the
models, 3105 observations based on the closing prices of the stocks were used. The
testing of the model was carried out on data spanning over 60 trading days. The
study showed that the forecasting accuracies were higher for B/J, H/W, and
normalized neural network models. The errors associated with the time series
decomposition-based model and the non-normalized neural network models were
found to be higher.
Senol and Ozturan illustrated that ANN can be used to predict stock prices and their
direction of change (Senol & Ozturan, 2008). The results were promising, with an average forecast accuracy of 81%.
In the literature, a substantial number of contributions exist that are based on the
application of time series and fuzzy time series approaches for forecasting stock
price movements. Thenmozhi investigated the applicability of chaos theory in
modeling the nonlinear behavior of the Bombay Stock Exchange (BSE) time series
(Thenmozhi, 2006). The author used the return values of the BSE SENSEX time-
series data during the period August 1980 to September 1997, and showed that the
time series of the daily and the weekly return values exhibited nonlinearity and
weakly chaotic properties.
Fu et al. presented an approach that represented the data points in a financial time
series according to their importance (Fu et al., 2007). Using the ranked data points
based on their importance, a tree was constructed that enabled incremental updating
of data in the time series. The scheme facilitated representation of a large-sized time
series in different levels of details, and also enabled multi-resolution dimensionality
reduction. The authors have presented several evaluation methods of data point
importance, a novel method of updating a time series, and two dimensionality-reduction approaches. Extensive experimental results were also presented, demonstrating the effectiveness of all the propositions.
Phua et al. presented a predictive model using neural networks with genetic
algorithms for forecasting stock price movements in the Singapore Stock Exchange
(Phua et al., 2001). The forecasting accuracy of the predictive model was found to
be 81% on the test dataset indicating that the model was moderately effective in its
forecasting job.
Moshiri and Cameron described a back propagation-based neural network and a set
of econometric models to forecast inflation levels (Moshiri & Cameron, 2010). The
set of econometric models proposed by the authors included the following: (i) Box-
Jenkins autoregressive integrated moving average (ARIMA) model, (ii) vector
autoregression (VAR) model, and (iii) Bayesian vector autoregression (BVAR)
model. The forecasting accuracies of the three models were compared with the
hybrid back propagation network (BPN) model proposed by the authors. For the
purpose of testing the models, three different values of the forecasting horizon were
used: one month, two months, and twelve months. With the root mean square error
(RMSE) and the mean absolute error (MAE) as the two metrics, the authors
observed that the performance of the hybrid BPN was superior to the other
econometric models.
The major drawback of the existing propositions in the literature on stock price prediction is their inability to predict stock price movement over a short-term interval. The current work attempts to address this shortcoming by exploiting the learning ability of a gamut of machine learning models and two deep neural network architectures in stock price movement modeling and prediction.
Chapter 3
Methodology
month: This is a numeric variable that refers to the month for a given stock price
record. The twelve months are assigned numeric codes of 1 through 12, with the
month of January being coded as 1, and the month of December assigned with a
code of 12.
day_month: This numeric variable denotes the particular day of a given month to
which a stock price record corresponds. The value of this variable lies in the interval
[1, 31]. For instance, if the date for a stock price record is 22nd May 2013 then the
day_month variable for that record will be assigned a value of 22.
day_week: This is a numeric variable that corresponds to the day of the week for a
given stock price record. The five days in a week on which the stock market remains open are assigned numeric codes of 1 through 5, with Monday being coded as 1 and Friday being assigned a code of 5.
time: This numeric variable refers to the time slot to which a stock price record
belongs. There are three time slots in a day - morning, afternoon, and evening. The
slots are assigned codes 1, 2, and 3 respectively. For example, if a stock price record
refers to the time point 3:45 PM, the variable time will be assigned a value of 3 for
the stock price record.
open_perc: It is a numeric variable that is computed as the percentage change in the open price of the stock over two successive time slots. The computation of the variable is done as follows. Suppose we have two successive slots, S1 and S2, both of which consist of several records at five-minute intervals. Let the open price of the stock for the first record of S1 be X1 and that for S2 be X2. The open_perc for the slot S2 is then computed as (X2 - X1)/X2, expressed as a percentage.
low_perc: It is a numeric variable that is computed from the low values of two successive slots. For two successive slots S1 and S2, we first compute the mean of all the low values of the records in each slot. If L1 and L2 refer to the mean of the low values for S1 and S2 respectively, then low_perc for S2 is computed as (L2 - L1)/L2, expressed as a percentage.
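The slot-level variables can be derived directly from the raw five-minute records. The following is a minimal R sketch of the computation described above; the slot data frames s1 and s2 and their column names open and low are hypothetical stand-ins for the actual data structures used in this work.

```r
# Illustrative computation of open_perc and low_perc for a slot S2, given the
# previous slot S1. Each slot is assumed to be a data frame of five-minute
# records with (hypothetical) columns named 'open' and 'low'.
slot_features <- function(s1, s2) {
  x1 <- s1$open[1]                      # open price of the first record of S1
  x2 <- s2$open[1]                      # open price of the first record of S2
  open_perc <- 100 * (x2 - x1) / x2     # percentage change in the open value

  l1 <- mean(s1$low)                    # mean low value of slot S1
  l2 <- mean(s2$low)                    # mean low value of slot S2
  low_perc <- 100 * (l2 - l1) / l2      # percentage change in the mean low value

  c(open_perc = open_perc, low_perc = low_perc)
}
```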
After we compute the values of the above eleven variables for each slot for both the
stocks for the time frame of two years (i.e., 2013 and 2014), we develop the
forecasting framework. As mentioned earlier, we followed two broad approaches
in the forecasting of the stock movements - regression and classification.
In the regression approach, based on the historical movement of the stock prices we
predict the stock price in the next slot. We use open_perc as the response variable,
which is a continuous numeric variable. The objective of the regression technique
is to predict the open_perc value of the next slot given the stock movement pattern
and the values of the predictors till the previous slot. In other words, if the current
time slot is S1, the regression techniques will attempt to predict open_perc for the
next slot S2. If the predicted open_perc is positive, then it will indicate that there is
an expected rise in the stock price in S2, while a negative open_perc will indicate a
fall in the stock price in the next slot. Based on the predicted values, a potential
investor can make his/her investment strategy in stocks.
Case I: We used the data for the year 2013, which consisted of 19,385 records at five-minute intervals. These records were aggregated into 745 time slot records for
building the predictive model. We used the same dataset for testing the forecast
accuracy of the models for the stock of Godrej Consumer Products Ltd. and carried
out a comparative analysis of all the models.
Case II: We used the data for the year 2014, which consisted of 18,972 records at five-minute intervals. These granular data were aggregated into 725 time slot records
for building the predictive model. We used the same dataset for testing the forecast
accuracy of the models and carried out an analysis on the performance of the
predictive models.
Case III: We used the data for 2013 as the training dataset for building the models and tested the models using the data for the year 2014 as the test dataset. We, again,
carried out an analysis of the performance of different models in this approach.
We have built eight classification models and ten regression models for developing
our forecasting framework. The classification models are: (i) logistic regression,
(ii) k-nearest neighbor (iii) decision tree, (iv) bagging, (v) boosting, (vi) random
forest, (vii) artificial neural network, and (viii) support vector machines. For
measuring accuracy and effectiveness in these approaches, we use several metrics
such as: sensitivity, specificity, positive predictive value, negative predictive value,
classification accuracy, and F1 score. Sensitivity and positive predictive value are
also known as recall and precision respectively.
The ten regression methods that we built are: (i) multivariate regression, (ii)
multivariate adaptive regression spline, (iii) decision tree, (iv) bagging, (v)
boosting, (vi) random forest, (vii) artificial neural network, (viii) support vector
machine, (ix) long- and short-term memory network, (x) convolutional neural
network.
While all the classification techniques are machine learning-based approaches, two
regression techniques, i.e., long- and short-term memory (LSTM) network, and
convolutional neural network (CNN) – based approaches are deep learning
methods. For comparing the performance of the regression methods, we use several
metrics such as root mean square error (RMSE), correlation coefficient between
the actual and predicted values of the response variable, i.e., open_perc, and the
number of cases in which the predicted and the actual values of open_perc differed
in their signs.
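As a concrete illustration, the three regression metrics can be computed in R as sketched below; the vectors actual and predicted are assumed to hold the actual and forecast open_perc values for the test slots.

```r
# Sketch of the regression performance metrics used in this work, assuming
# 'actual' and 'predicted' are numeric vectors of open_perc values.
library(Metrics)                                       # provides rmse()

regression_metrics <- function(actual, predicted) {
  list(
    rmse        = rmse(actual, predicted),             # root mean square error
    correlation = cor(actual, predicted),              # actual vs predicted correlation
    sign_errors = sum(sign(actual) != sign(predicted)) # predictions with the wrong sign
  )
}
```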
Chapter 4
Machine Learning Models
The eight classification models that we built are discussed in detail in this chapter.
Decision Tree: The classification and regression tree (CART) algorithm produces
decision trees that are strictly binary so that there are exactly two branches for each
node. The algorithm recursively partitions the records in the training data set into
subsets of records with similar values for the target attributes. The trees are
constructed by carrying out an exhaustive search, at each node, over all available variables and all possible splitting values, and selecting the optimal split based on some goodness-of-split criterion. We used the tree function defined in the tree library
of R for classification of the stock records.
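A minimal R sketch of this step is given below. It assumes a data frame train whose response open_dir is a factor with levels "0" and "1" derived from the sign of open_perc; the data frame and column names are illustrative, not the exact objects used in the experiments.

```r
# CART-style classification with the tree library (hypothetical object names).
library(tree)

class_fit  <- tree(open_dir ~ ., data = train)               # recursive binary splits
class_pred <- predict(class_fit, newdata = test, type = "class")
table(actual = test$open_dir, predicted = class_pred)         # confusion matrix
```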
Multivariate Regression: In this regression approach, we used open_perc as the
response variable and the remaining ten variables as the predictors to build
predictive models for three cases mentioned earlier in Chapter 3. In all these cases,
we use the programming language R for data management, model construction,
testing of models, and visualization of results.
Case I: We use 2013 data as the training data set for building the model, and then
test the model using the same data set. For both the stocks, we used two approaches
of multivariate regression - (i) backward deletion and (ii) forward addition of
variables. Both approaches yielded the same results for the stock price data.
For the year 2013, we applied the vif function in the faraway library to detect the
collinear variables in order to get rid of the multicollinearity problem. The variance
inflation factor (VIF) values of the variables were found to be as follows: month =
1.003, day_month = 1.008, day_week = 1.002, time = 1.095, high_perc = 4372.547,
low_perc = 4369.694, close_perc = 165.436, vol_perc = 1.072, nifty_perc = 1.046,
range_diff = 156.198. Hence, it was clear that high_perc, low_perc, close_perc, and range_diff exhibited multicollinearity. We retained low_perc and range_diff for the model construction, since their VIF values were smaller than those of the other two variables (high_perc and close_perc), which were removed. Using the drop1 function in case of the
backward deletion technique, and the add1 function in case of the forward addition
technique, we identified the variables that were not significant in the model and did
not contribute to the information content of the model. For identifying the variables
that contributed least to the information contained in the model at each iteration,
we used the Akaike Information Criterion (AIC) - in the backward deletion process, the variable that had the least AIC value and a non-significant p-value at each iteration was removed from the model. On the other hand, the variable that had the lowest AIC and a significant p-value was added to the model at each iteration
for the forward addition technique. It was found that low_perc and range_diff were
the two predictors that finally remained in the regression model.
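The variable-screening procedure described above can be sketched in R as follows; the data frame train and the exact call sequence are assumptions for illustration, with the VIF screening and the AIC-based pruning mirroring the steps reported in this section.

```r
# Collinearity check followed by AIC-based backward deletion (illustrative).
library(faraway)                                    # provides vif()

full_model <- lm(open_perc ~ ., data = train)
vif(model.matrix(full_model)[, -1])                 # large VIF values flag collinearity

# Drop the collinear predictors with the larger VIFs (high_perc, close_perc),
# then examine the remaining terms with drop1() using AIC and F-tests.
reduced_model <- lm(open_perc ~ . - high_perc - close_perc, data = train)
drop1(reduced_model, test = "F")                    # candidates for backward deletion
```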
Case II: For the year 2014, the VIF values for the predictors were found to be as
follows: month = 1.007, day_month = 1.004, day_week = 1.007, time = 1.057,
high_perc = 1161.446, low_perc = 1331.035, close_perc = 115.161, vol_perc =
1.022, range_diff = 92.092, nifty_perc = 1.073. The variables high_perc, low_perc,
close_perc, and range_diff exhibited multicollinearity. As in Case I, we retained
low_perc and range_diff as their VIF values were smaller compared with the other
two. Use of backward deletion and forward addition methods both yielded the same
regression models as in Case I with low_perc and range_diff as the predictors and
open_perc as the response variable.
Case III: In this case, the model is identical to that in Case I. However, the model
is tested on the data for the year 2014. Hence, the performance results of the model are expected to be different. The performance results and their critical analysis are presented in Chapter 6.
Decision Tree: For building a regression model, we have used the same tree function in the tree library in R as we did in building the decision tree-based classification model. However, in this case, the response variable was kept as numeric and not converted to a factor variable, unlike in the classification
techniques. The predict function is used to predict the values of the response
variable. The functions cor and rmse defined in the library Metrics are used to
compute the correlation coefficient and the RMSE value for determining the
prediction accuracy of the models.
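A short R sketch of the regression variant follows; train and test are assumed data frames with open_perc kept as a numeric column.

```r
# Decision tree regression with the tree library and the accuracy measures used here.
library(tree)
library(Metrics)

reg_fit  <- tree(open_perc ~ ., data = train)      # response kept numeric (regression)
reg_pred <- predict(reg_fit, newdata = test)       # predicted open_perc values
cor(test$open_perc, reg_pred)                      # correlation coefficient
rmse(test$open_perc, reg_pred)                     # RMSE of the predictions
```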
Bagging: For carrying out regression on stock price data, we use the bagging function defined in the ipred library of R. The value of the parameter nbagg - which specifies the number of bootstrap samples - is taken as 100. We use the predict function to predict the response variable values and the rmse function in the Metrics library to compute the RMSE values of the predictions. The cor function in R
is used to compute the correlation between the original and the predicted values of
the response variable.
Boosting: We use the blackboost function defined in the mboost library in R for building regression models on the stock price data, unlike the classification case, where the boosting function of the adabag library in R was used. As in other cases of
regression, the predict and rmse functions are used to compute the predicted values
and the RMSE values in the regression model.
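The two ensemble regression models can be set up in R as sketched below; the data frames train and test are assumed, and only nbagg = 100 is taken from the description above, with all other settings left at their defaults.

```r
# Bagging (ipred) and gradient boosting on trees (mboost) for open_perc regression.
library(ipred)        # bagging()
library(mboost)       # blackboost()
library(Metrics)      # rmse()

bag_fit    <- bagging(open_perc ~ ., data = train, nbagg = 100)  # 100 bootstrap samples
bag_pred   <- predict(bag_fit, newdata = test)

boost_fit  <- blackboost(open_perc ~ ., data = train)             # boosted regression trees
boost_pred <- as.numeric(predict(boost_fit, newdata = test))

rmse(test$open_perc, bag_pred)                                    # accuracy of bagging
cor(test$open_perc, boost_pred)                                   # accuracy of boosting
```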
Artificial Neural Network: As in the case of classification, we use the neuralnet
function defined in the neuralnet library in R for regression on the stock price data.
The predictors are normalized using min-max normalization before building the
model. The compute function defined in the neuralnet library is used for computing
the predicted values, while the parameter hidden is used to change the number of
nodes in the hidden layer. The value of the parameter stepmax is set to 10^6 so as to allow a sufficiently large maximum number of iterations for the neuralnet function. The
parameter linear.output is by default set to TRUE, and hence it is not altered. For
the Godrej dataset, we needed only one node in the hidden layer for all the three
cases for building ANN regression models.
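The ANN regression setup can be sketched as follows. The choice of low_perc and range_diff as predictors here is purely illustrative (they are the two variables retained by the multivariate regression); the data frame train is assumed.

```r
# neuralnet-based regression with min-max normalized predictors (illustrative).
library(neuralnet)

normalize <- function(x) (x - min(x)) / (max(x) - min(x))   # min-max scaling
pred_cols <- c("low_perc", "range_diff")
train_n <- train
train_n[pred_cols] <- lapply(train[pred_cols], normalize)

nn_fit <- neuralnet(open_perc ~ low_perc + range_diff, data = train_n,
                    hidden = 1,             # one node in the hidden layer
                    stepmax = 1e6,          # upper bound on training iterations
                    linear.output = TRUE)   # linear output node for regression

nn_pred <- compute(nn_fit, train_n[pred_cols])$net.result   # predicted open_perc
```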
Support Vector Machine: For building the regression model using SVM, we use
the svm function defined in the e1071 library in R. The predict function is used for
predicting the response variable values using the regression model, and the rmse
function is used to compute the RMSE values for the predicted quantities.
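A corresponding e1071 sketch, under the same assumed train and test data frames:

```r
# SVM regression (eps-regression by default) and its RMSE on the test slots.
library(e1071)
library(Metrics)

svm_fit  <- svm(open_perc ~ ., data = train)
svm_pred <- predict(svm_fit, newdata = test)
rmse(test$open_perc, svm_pred)
```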
Chapter 5
Deep Learning Models
In this Chapter, we discuss two deep learning-based regression methods: (i) the
long- and short-term memory (LSTM) network, and (ii) the convolutional neural
networks (CNNs).
We use the LSTM network to predict the stock prices of Godrej Consumer Products based on a multivariate time series. For this purpose, we use the open price of the stock as the response variable, and the predictors chosen are high, low, close, volume, and the NIFTY index values.
However, unlike for the machine learning techniques, we don't compute the
differences between successive slots. Rather, we forecast the open value of the next
slot based on the predictor values in the previous slots. We used the mean absolute
error (MAE) as the loss function and the adaptive moment estimation (ADAM) as
the optimizer for evaluating the model performance in all the three cases. ADAM
computes adaptive learning rates for each parameter in the gradient descent
algorithm. In addition to storing an exponentially decaying average of the past
squared gradients, ADAM also keeps track of the exponentially decaying average
of the past gradients, which serves as the momentum in the learning process. Instead of behaving like a ball running down a steep slope, as plain momentum does, ADAM behaves like a heavy ball with a rough outer surface. This high level of friction results in ADAM's preference for a flat minimum in the error surface. Due to its ability to integrate adaptive learning rates with momentum, ADAM is found to perform very efficiently in optimizing large-scale networks.
This was the reason for our choice of ADAM as the optimizer in our LSTM
modelling. We trained the deep learning networks using different numbers of epochs and different batch sizes for the three cases and determined the optimum performance of the network under those parameter values. The sequential
constructor in the Tensorflow framework has been used to build the LSTM model.
The performance results of the LSTM models are presented in Chapter 6.
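A hedged sketch of such an LSTM model, written with the keras interface to TensorFlow in R, is shown below. The number of LSTM units, epochs, and batch size are placeholders rather than the tuned values, and x_train is assumed to be a three-dimensional array of shape (samples, timesteps, predictors).

```r
# Sequential LSTM regression model with MAE loss and the ADAM optimizer (sketch).
library(keras)

model <- keras_model_sequential() %>%
  layer_lstm(units = 50,
             input_shape = c(dim(x_train)[2], dim(x_train)[3])) %>%
  layer_dense(units = 1)                     # open value of the next slot

model %>% compile(loss = "mae",              # mean absolute error, as described above
                  optimizer = "adam")        # adaptive moment estimation

history <- model %>% fit(x_train, y_train,
                         epochs = 50, batch_size = 16,     # illustrative settings
                         validation_split = 0.1, verbose = 0)
```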
Convolutional Neural Networks: CNNs emerged from the study of the brain’s
visual cortex, and they have been used in image recognition since the 1980s. In the
last few years, thanks to the increase in computational power, the amount of available training data, and the tricks developed for training deep neural networks, CNNs have managed to achieve superhuman performance on some complex visual tasks. They power image search services, self-driving cars, automatic video classification
systems, and more. Moreover, CNNs are not restricted to visual perception: they
are also successful at many other tasks, such as voice recognition, natural language
processing, and complex time series analysis of financial data.
In the present work, we have exploited the power of CNN in forecasting the
univariate and multivariate time series data of Godrej Consumer Products stock.
CNNs have two major types of processing layers – convolutional layers and pooling
or subsampling layers. The convolutional layers read an input, such as a two-dimensional image or a one-dimensional signal, using a kernel (also referred to as a filter) that reads the data in small segments at a time and scans across the input data field. Each read is projected onto a filter map and represents an interpretation of the input. The pooling or the
subsampling layers take the feature map projections and distill them to the most
essential elements, such as using a signal averaging (average pool) or signal
maximizing process (max pool). The convolution and pooling layers are repeated
at depth, providing multiple layers of abstraction of the input signals. The output of
the final pooling layer is fed into one or more fully-connected layers that interpret
what has been read and maps this internal representation to a class value.
We use the power of CNN in multi-step time series forecasting in the following
way. The convolutional layers are used to read sequences of the input data, and
automatically extract features. The pooling layers are used for distilling the
extracted features, and in focusing attention on the most salient elements. The fully
connected layers are deployed to interpret the internal representation and output a
vector representing multiple time steps. The benefits that CNN provides in our time
series forecasting job are the automatic feature learning, and the ability of the model
to output a multi-step vector directly.
We exploit the power of CNN in forecasting the stock prices using the Godrej
Consumer Products data in two different ways. In the recursive forecast strategy, the model makes one-step predictions, and the outputs are fed back as inputs for subsequent predictions. In the other (direct) approach, we use the CNN to predict the entire output sequence as a single vector in one step. Using these two
approaches, we have built three different types of CNN models for multi-step time
series forecasting of stock prices as follows: (i) Multi-step time series forecasting
with univariate input data, (ii) Multi-step time series forecasting with multivariate
input data via channels – in this case, each input sequence is read as a separate
channel, like different channels of an image (e.g., red, green, and blue), (iii) multi-
step time series forecasting with multivariate input data via sub-models – in this
case, each input sequence is read by a different CNN sub-model and the internal
representations are combined before being interpreted and used to make a
prediction.
In the first case, we designed a CNN for multi-step time series forecasting using
only the univariate sequence of the open values. In other words, given some number
of prior days of open values, the model predicts the next standard week of stock
market operation. A standard week consists of five days – Monday to Friday. The
number of prior days used as the input defines the one-dimensional (1D) sequence of open values that the CNN will read and learn from for extracting features. There are several choices in deciding on the size and the nature of the input to the CNN for training, such as: (a) all prior days up to the week for which the open values are to be predicted, (b) only the prior five days (i.e., one week) before the week of prediction, (c) the prior two weeks (i.e., 10 days, as each week consists of 5 days), (d) the prior one month, and (e) the prior week together with the corresponding week to be predicted in the previous year. Since
there is no obvious best choice here, we have tested the performance of the model
on different input sizes and observed the performance of the model under each such
case. Based on the choice of the input, the training data, the test data, and the
prediction process of the model are accordingly designed.
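As an illustration of the first (univariate, N = 5) configuration, a minimal R keras sketch is given below. The filter count, dense-layer size, and loss function are assumptions for the sketch and not the tuned values used in the experiments.

```r
# Univariate multi-step CNN: one prior week of open values in, one standard
# week (five values) out, predicted directly as a vector.
library(keras)

n_in  <- 5     # number of prior days of open values used as input
n_out <- 5     # forecast horizon: one standard week (Monday to Friday)

model <- keras_model_sequential() %>%
  layer_conv_1d(filters = 16, kernel_size = 3, activation = "relu",
                input_shape = c(n_in, 1)) %>%        # read the 1D input sequence
  layer_max_pooling_1d(pool_size = 2) %>%            # distil the extracted features
  layer_flatten() %>%
  layer_dense(units = 10, activation = "relu") %>%   # interpret the representation
  layer_dense(units = n_out)                         # direct multi-step output vector

model %>% compile(loss = "mae", optimizer = "adam")
```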
Chapter 6
Performance Metrics
Sensitivity: It is the ratio of the number of true positives to the total number of
positives in the test dataset, expressed as a percentage. Here, positive refers to the
cases that belong to the target group (i.e., the class "1"). The term true positives refers to the number of positive cases that the model correctly identified. Sensitivity is also sometimes referred to as recall.
Specificity: It is the ratio of the number of true negatives to the total number of
negatives in the test dataset, expressed as a percentage. Here, negative refers to the
cases that belong to the non-target group (i.e., the class “0”). The term true negative
refers to the number of negative cases that the model correctly identified.
Positive Predictive Value: Positive predictive value (PPV), also sometimes referred to as precision, refers to the accuracy of the model in classifying the target group
cases among the total number of target group cases identified by it. It is computed
as the ratio of the number of correctly identified target group cases to the total
number of target group cases as identified by the model. Since the total number of
target group cases identified by the model is the sum of the number of true positive
cases and the number of false-positive cases, PPV is the ratio of the total number
of true positive cases to the sum of the number of true positive cases and the number
of false-positive cases, expressed as a percentage. The complement of PPV is also
called false discovery rate (FDR).
Negative Predictive Value: Negative predictive value (NPV) refers to the accuracy
of the model in classifying the non-target group cases among the total number of
non-target elements identified by it. It is computed as the ratio of the number of
correctly identified non-target group cases to the total number of non-target group
cases as identified by the model. Since the total number of non-target group cases
identified by the models is the sum of the number of true negative cases and the
number of false-negative cases, NPV is the ratio of the total number of true negative
cases to the sum of the number of true negative cases and the number of false-
negative cases, expressed as a percentage. The complement of NPV is also called
false omission rate (FOR).
Classification Accuracy (CA): It is the ratio of the total number of cases that are
correctly classified to the total number of cases in the dataset, expressed as a
percentage.
F1 Score: If the test data set is highly imbalanced, with the cases belonging to the
non-target group far outnumbering the target cases, sensitivity is usually found to
be very poor even with a very high classification accuracy. Hence, classification
accuracy is not considered a very robust and reliable metric. F1 score, which is
computed as the harmonic mean of the sensitivity and PPV, is found to be a very
robust metric, however.
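For concreteness, the metrics defined above can be computed from the four confusion-matrix counts as in the following minimal sketch (the function name is illustrative; all values are expressed as percentages, as in the definitions above).

```python
def classification_metrics(tp, fp, tn, fn):
    # All metrics are expressed as percentages, as in the definitions above.
    sensitivity = 100.0 * tp / (tp + fn)                # recall
    specificity = 100.0 * tn / (tn + fp)
    ppv = 100.0 * tp / (tp + fp)                        # precision; FDR = 100 - PPV
    npv = 100.0 * tn / (tn + fn)                        # FOR = 100 - NPV
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # classification accuracy
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv)    # harmonic mean of sensitivity and PPV
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}
```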
Classification Methods:
Fig 1(a), 1(b), and 1(c) present the classification performance, the lift curve, and
the ROC curve of the logistic regression-based classification model. In Fig 1(a), the
y-axis represents the actual classes of the records (either “0” or “1”) and the x-axis
denotes the probability that a case will belong to the class “1”. The threshold value
along the x-axis is by convention taken to be 0.5. Hence, all the cases which are
found to be lying on the level “0” along the y-axis and situated to the right of the
threshold value of 0.5 along the x-axis are misclassified. Similarly, all the points
which are on the level “1” along the y-axis, and are situated to the left of the
threshold value of 0.5 along the x-axis are also misclassified. It is evident from Fig
1(a) that the number of misclassified cases in Case I was very low. Fig 1(b) shows
that the lift curve is pulled up from the baseline indicating that the model was very
accurate in discriminating between the two classes. Fig 1(c) depicts the ROC curve
for the logistic regression model for Case I. The steepness of the curve makes it
evident that the model has been able to very effectively optimize the values of the
true positive rate (TPR) and the false positive rate (FPR). In Fig 1(c), the line
segment with red color presents the class “1” cases which are correctly classified,
while the blue line segment denotes the correctly classified cases which belong to
the class “0”. The portion of the ROC curve that is colored with yellow represents
those cases which actually belong to the class “0”, but the model wrongly classified
them to the class “1”. The “green” colored portion of the ROC curve depicts those
cases which are misclassified into the class “0”, while they actually belong to the
class “1”.
Fig 1(b): Logistic Regression for classification – lift curve (Case I)
Fig 1(c): Logistic Regression for classification – ROC curve (Case I)
Fig 2(a), Fig 2(b), and Fig 2(c) depict respectively the classification performance,
the lift curve, and the ROC curve of the logistic regression model for Case II. The
performance of the model, in this case, is similar to that of Case I. However, the
AUC value yielded by the model, in this case, was just marginally smaller than the
corresponding value in Case I.
Fig 2(a): Logistic Regression – actual vs predicted probabilities of open_perc (Case II)
Fig 2(b): Logistic Regression for classification – lift curve (Case II)
Fig 2(c): Logistic Regression for classification – ROC curve (Case II)
Fig 3(a): Logistic Regression – actual vs predicted probabilities of open_perc (Case III)
Fig 3(b): Logistic Regression for classification – lift curve (Case III)
Fig 3(a), Fig 3(b), and Fig 3(c) show the classification accuracy, the lift curve, and
the ROC curve for the logistic regression model in Case III. It is evident from Fig
3(c) that unlike in Case I and Case II, the classification model committed more
errors in classification. This case also yielded a lower AUC value of 0.9587.
Fig 3(c): Logistic Regression for classification – ROC curve (Case III)
Decision Tree Classification: We used the tree function defined in the tree library in the R programming language for building the decision tree-based classification models in all the three cases. The response variable open_perc was converted into a categorical type using the as.factor function for the purpose of classification. The
predict function in the tree library was used for predicting the classes of the
response variable open_perc for the records in the test dataset. For Case I and Case III the models were identical as they were trained on the 2013 data. However, while the model in Case I was tested on the 2013 data, the 2014 data was used for testing the model in Case III. For all these cases, we found high_perc, low_perc, and
close_perc were the three predictor variables that were used by the models to
construct the decision trees. However, in Case I, the predictor which was used for
splitting at the root node was close_perc, indicating that close_perc was the most
important predictor for classification in the 2013 dataset. However, for the 2014
dataset, high_perc was found to be the most discriminating one as the same was
used by the model for splitting at the root node. In Case I, the decision tree classifier
misclassified 8 cases out of a total of 419 cases which actually belonged to the class
“0”, while 16 cases were wrongly classified out of a total of 326 cases which were
actually the records of the class “1”.
In Case II, the model failed to correctly classify 17 cases out of a total of 396 cases
which were actually “0” class members, while 25 cases were misclassified out of a
total of 329 cases that actually belonged to the class “1”. In Case III, the model had
a more difficult task at hand. We found that we could not correctly classify 30 cases
out of a total of 396 cases which actually belonged to the class “0”, while 33 cases
were misclassified out of a total of 329 cases which actually belonged to the class “1”. Table 3 presents the performance results of the decision tree
classification models under three different cases. Fig 4(a), 4(b), 4(c) depict the
decision tree classifiers for Case I, Case II, and Case III respectively.
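The decision tree classification models in this work were built in R with the tree library, as described above. Purely as an illustration of the same workflow, a rough Python analogue using scikit-learn is sketched below; the data frame names are placeholders, the restriction to the three predictors named above mirrors the variables the fitted R trees actually used, and the encoding of the class “1” as a positive open_perc is an assumption made only for the sake of the example.

```python
from sklearn.tree import DecisionTreeClassifier

predictors = ["high_perc", "low_perc", "close_perc"]   # variables used by the fitted trees
X_train = train_2013[predictors]                       # Case III: fit on the 2013 data ...
y_train = (train_2013["open_perc"] > 0).astype(int)    # assumed encoding of class "1" vs "0"

clf = DecisionTreeClassifier().fit(X_train, y_train)
pred = clf.predict(test_2014[predictors])              # ... and classify the 2014 records
```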
Fig 4(c): Decision Tree for classification (Case III)
Fig 5(a): Bagging for classification – actual vs predicted classes of open_perc (Case I)
Fig 5(b): Bagging for classification – actual vs predicted classes of open_perc (Case II)
Fig 5(c): Bagging for classification – actual vs predicted classes of open_perc (Case III)
Fig 6(a): Boosting for classification – actual vs predicted classes of open_perc (Case I)
Boosting Classification: We have used the boosting function defined in the adabag
library in R programming language for building the boosting models for
classification under all the three cases. The response variable open_perc was
transformed into the categorical type using as.factor function so as to satisfy the
requirement of a classification model. The predict function was used for predicting
the class of the response variable in the test data records. For both Case I and Case
II, the boosting classification models were found to have yielded 100% accuracy in
all the metrics of classification as presented in Table 5. This is not surprising as in
both the cases the models were built and tested using the same dataset, and thus the learning of the models had been very accurate, aided by the weighted majority voting of an ensemble of a large number of decision tree classifiers. However, the
model faced more challenges in Case III in which the ensemble model was built
using the 2013 data, and the testing was done using the 2014 data. In Case III, we
found that the model misclassified 26 cases out of a total of 396 cases which
actually belonged to the class “0”, while among 329 cases which were actually of
the class “1”, 26 cases were incorrectly classified. Table 5 presents the performance
results of the boosting classification models for all three cases. Fig 6(a), Fig 6(b),
and Fig 6(c) depict the performance of the boosting classifier for Case I, Case II,
and Case III respectively. In these three figures, along the y-axis the actual classes
are plotted – there are two actual class levels “0” and “1”. The x-axis presents the
predicted probability that a case will belong to the class “1”. Hence, the data points
which are situated to the left side of the threshold value of 0.5 along the x-axis and
lying on the level “1” along the y-axis are the misclassified cases. Similarly, the points that are on the right side of the threshold value of 0.5 and lying on the level “0” along the y-axis are also misclassified cases. It is evident from Fig 6(a), Fig
6(b), and Fig 6(c) that boosting classifiers have performed very well in all the three
cases.
Fig 6(b): Boosting for classification – actual vs predicted classes of open_perc (Case II)
Fig 6(c): Boosting for classification – actual vs predicted classes of open_perc (Case III)
Fig 8(a): ANN classification model (Case II)
Fig 8(b): ANN classification – actual vs predicted classes of open_perc (Case II)
Fig 9(b): ANN classification – actual vs predicted classes of open_perc (Case III)
SVM Classification: We used the ksvm function defined in the kernlab library in R
programming language for building the SVM-based classification models. The
function ksvm was used with the parameter kernel set to vanilladot. It implies that
a linear kernel is used for building the SVM classification models. For Case I, the model found 120 support vectors. We found that out of a total number
of 430 cases which were actually “0” class records, 19 cases were misclassified as
“1”. On the other hand, 8 cases were wrongly classified as “0”, out of a total of 315
cases which were actually “1”. The training error for Case I was found to be 3.62%.
For Case II, the model found 156 support vectors in order to classify all the 725
records. Among 406 cases which actually belonged to the class “0”, 27 cases were
misclassified as “1”. On the other hand, 17 cases were wrongly classified as “0” out
of a total of 319 cases which were actually “1”. The training error for Case II was
found to be 6.07%. The SVM classification model found 116 support vector points
in Case III. The model misclassified 41 cases as “1” out of a total of 418 cases
which were actually “0”. On the other hand, out of a total of 307 cases which were
actually “1”, 19 cases were misclassified as “0”. Table 8 presents the results of the
SVM classification models for all three cases.
Regression Methods:
Table 9 presents the results of the multivariate regression models for all three cases.
Fig 10(a), (b) and (c) present some performance results of the multivariate
regression model for Case I. Fig 10(a) shows that the predicted values very closely
followed the pattern of the actual open_perc values, while Fig 10(b) exhibits that
there is a very strong linear relationship between the predicted and the actual values
of open_perc. The residuals of the model were found to be scattered and random
and exhibited no significant autocorrelation as depicted in Fig. 10(c). The
performance results of Case II are presented in Fig 11(a), (b), and (c). The predicted
and the actual values of the open_perc exhibited almost identical movement
patterns in this case as in Case I. The residuals did not show any significant
autocorrelations. Fig 12(a) shows how closely the predicted values of the open_perc followed the patterns exhibited by its actual values in Case III, while Fig
12(b) exhibits a strong linear relationship between them. Fig 12(c) depicts that the
residuals of the regression model for Case III were random and did not exhibit any
autocorrelations.
Fig 10(a): Multivariate Regression - time-varying actual and predicted values of open_perc (Case I)
Fig 10(b): Multivariate Regression - relationship between actual and predicted open_perc (Case I)
Fig 11(a): Multivariate Regression- time-varying actual and predicted values of open_perc (Case II)
Fig 11(b): Multivariate Regression - relationship between actual and predicted open_perc (Case II)
Fig 12(a): Multivariate Regression- time-varying actual and predicted open_perc (Case III)
Fig 12(b): Multivariate Regression - actual and predicted open_perc (Case III)
MARS: We used the earth function defined in the earth library in R programming
language for building MARS regression models in all the three cases. In Case I, in
the forward pass of the execution of the algorithm, seven terms were used in the
model building, as after the inclusion of the 8th term the change in the value of R2 was found to be only 5*10^-5, which was less than the threshold value of 0.001. After
the completion of the forward pass, both the generalized R-square (GRSq) and the
R2 converged to a common value of 0.993. During the backward pass, the algorithm
could not prune any term and all the seven terms used in the forward pass were
finally retained in the model. In Case I, the model retained three predictors out of
a total of ten predictors. The selected predictors in decreasing order of their
importance in the model were found to be: close_perc, high_perc, and low_perc.
The predictors which the algorithm did not use were: month, day_month, day_week,
time, vol_perc, nifty_perc, and range_diff. At the completion of the execution of
the algorithm, the values of some of the important metrics were as follows: (i)
generalized cross-validation (GCV): 0.0065, (ii) residual sum of square (RSS):
4.7006, (iii) GRSq: 0.9928, and (iv) R2: 0.9930. The seven terms that the MARS
algorithm used in Case I were found to be as follows: (i) the intercept of the model,
(ii) h(-0.83682 – high_perc), (iii) h(high_perc – 0.83682), (iv) h(-0.692841 –
low_perc), (v) h(low_perc – 0.692841), (vi) h(-2.11268 – close_perc), and (vii)
h(close_perc – 2.11268). In Case I, the MARS regression model yielded 9 cases
out of a total of 745 cases that exhibited mismatch in signs between the predicted
and the actual values of open_perc. The RMSE value for this case was 0.0794, while the mean of the absolute values of the actual open_perc was 0.6402. Hence, the ratio of the RMSE to the mean of the absolute values of the actual open_perc was 12.40 (expressed as a percentage). The correlation test yielded a value of the correlation coefficient as 0.99 with a t-statistic value of 325.41 and an associated p-value of 2.2*10^-16. This
indicated that there is a strong linear relationship between the predicted and the
actual values. Table 10 presents the results of the MARS regression model.
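In the terms listed above, h(.) denotes the standard MARS hinge function h(x) = max(0, x). As an illustration of how the fitted model is evaluated, the sketch below combines the seven retained Case I basis functions in a weighted sum; the coefficient vector beta reported by the earth summary is not reproduced in the text, so it is left as an input here.

```python
def h(x):
    # MARS hinge function: h(x) = max(0, x)
    return max(0.0, x)

def mars_case1(high_perc, low_perc, close_perc, beta):
    # beta[0] is the intercept; beta[1:] weight the six hinge terms listed above
    # (the coefficient values are reported by earth but not reproduced in the text).
    terms = [1.0,
             h(-0.83682 - high_perc),  h(high_perc - 0.83682),
             h(-0.692841 - low_perc),  h(low_perc - 0.692841),
             h(-2.11268 - close_perc), h(close_perc - 2.11268)]
    return sum(b * t for b, t in zip(beta, terms))
```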
In Case II, the algorithm used nine terms during its forward execution since the
change in the R2 value at the end of the 9th term was found to be only 0.0002, which
was less than the threshold value of 0.001. After the completion of the forward
pass, the values of GRSq and R2 were found to be 0.985 and 0.986 respectively.
During the backward pass of its execution, the algorithm could prune one term out
of the nine terms included in the forward pass. Hence, the algorithm used eight
terms in constructing the regression model. We also observed that the algorithm
retained four predictors out of a total of ten predictors available initially. The four
predictors which were retained in the model in the decreasing order of their
importance were found to be: low_perc, close_perc, range_diff and high_perc. At
the end of the execution of the backward pass of the algorithm, some important
metric values were noted: GCV: 0.0262, RSS: 18.2512, GRSq: 0.9852, and R2:
0.9858. The eight terms that the algorithm used in building the regression model in
Case II were: (i) the intercept of the model, (ii) h(0.3675 – high_perc), (iii)
h(high_perc – 0.3675), (iv) h(-2.6685 – low_perc), (v) h(low_perc – 2.6685), (vi)
h(0.3996 – close_perc), (vii) h(-1.8 – range_diff), and (viii) h(range_diff - -1.8). In
Case II, we found that 31 cases out of a total of 725 cases exhibited mismatched
signs between the predicted and the actual values of open_perc. With an RMSE
value of 0.1587 and the mean of the absolute values of the actual open_perc as
0.9286, their ratio was found to be 17.09. The correlation test yielded the value of the correlation coefficient as 0.99, with a t-statistic value of 223.87 and an associated p-value of 2.2*10^-16. The high value of the correlation coefficient and
the negligible support for the null hypothesis in the form of a very low p-value
indicated that there was a very strong linear relationship between the predicted and
the actual values of open_perc in Case II.
Fig 14(a): MARS- time-varying actual and predicted values of open_perc (Case II)
Fig 14(b): MARS – relationship between actual and predicted values of open_perc (Case II)
Fig 14(c): MARS - time-varying residuals (Case II)
Fig 15(a): MARS- time-varying actual and predicted values of open_perc (Case III)
Fig 15(b): MARS – relationship between actual and predicted values of open_perc (Case III)
In Case III, the MARS model of regression was identical to that of Case I. The
model was, however, tested on 2014 data. We observed that in Case III, the MARS model produced 46 cases out of a total of 725 cases that exhibited a sign mismatch
between the predicted and the actual open_perc values. The RMSE for this case
was found to be 0.1894, while the mean of the absolute values of the actual
open_perc was 0.9286. The ratio of the RMSE to the mean value was found to be
20.40. The correlation test on the predicted and the actual values of open_perc
yielded a correlation coefficient value of 0.99 with a t-statistic value of 187.13 and an associated p-value of 2.2*10^-16. The results indicated that, like in Case I and
Case II, the predicted and the actual values of open_perc exhibited a strong linear
relationship between them in Case III as well.
Decision Tree Regression: We used the tree function defined in the tree library in
R programming language to build a decision tree-based regression model. For Case
I, close_perc turned out to be the splitting variable at the root node. Other important
variables that led to splitting at nodes were high_perc and low_perc. Fig. 16(a)
depicts the decision tree model. RMSE for this case was 0.2263, and the mean of
the absolute values of the actual open_perc was found to be 0.6402. Among the
total of 745 cases, 100 cases exhibited sign mismatch between the predicted and the
actual values of open_perc. The correlation coefficient between the predicted and
the actual open_perc values turned out to be 0.97. The t-statistic for the correlation test yielded a value of 111.35 with a p-value of 2.2*10^-16, which indicated that there
was a strong linear relationship between the predicted and the actual open_perc
values. Fig 16(b), (c), and (d) depict different performance characteristics of the
decision tree-based regression model for Case I. Fig 16(b) depicts that except for a
few instances, the predicted values of open_perc very closely followed the pattern
exhibited by its actual values. Fig 16(c) shows that with the increase in the actual open_perc values, its predicted values also exhibited a stepwise upward trend. Fig
16(d) shows that residuals did not exhibit an autocorrelation among them. Table 11
depicts the results of the decision tree regression model.
Fig 16(b): Decision Tree regression - time-varying actual and predicted open_perc (Case I)
Fig 16(c): Decision Tree regression - actual and predicted open_perc (Case I)
Fig. 17(a) presents the decision tree regression model for Case II. In this case too, the variable close_perc was used for splitting at the root node, and the other two variables used for splitting at subsequent nodes were high_perc and low_perc.
This case yielded an RMSE value of 0.3440, and the mean of the absolute values
of the actual open_perc values was 0.9286. 126 cases out of a total of 725 cases
exhibited sign mismatch between their predicted and actual open_perc values. The
correlation coefficient between the actual and the predicted values of open_perc
was found to be 0.96, with a t-statistic value of the correlation test as 100.47 and its associated p-value as 2.2*10^-16. The correlation test indicated that the predicted
and the actual open_perc values were highly correlated. Fig 17 (b), (c), (d) show
that the regression model was effective in establishing a linear relationship between
the response variable, open_perc, and all other predictor variables.
Fig 17(b): Decision Tree regression - time-varying actual and predicted open_perc (Case II)
Fig 17(c): Decision Tree regression - actual and predicted open_perc (Case II)
Fig 17(d): Decision Tree regression – time-varying residuals (Case II)
The decision tree regression model for Case III was the same as that in Case I. The
decision tree model is presented in Fig 18(a). However, the performance of the
model yielded different results as it was tested on 2014 data unlike in Case I, in
which the model was tested on 2013 data. The correlation coefficient between the predicted and the actual values of open_perc for this case was found to be 0.10, with the t-statistic value of the correlation test as 2.8243 and its associated p-value as
0.00487. However, as expected, the RMSE for this case was higher than those in
the previous two cases. The RMSE was found to be 1.5407 with the mean of the
absolute values of the actual open_perc as 0.9286. This led to a very high value of 165.92 as their ratio. 346 cases out of a total of 725 cases exhibited a mismatch in sign between the predicted and the actual values of open_perc. The model was practically unable to predict accurately, as it had only a limited number of distinct predicted values to map onto the large set of continuously varying open_perc values for the year 2014.
Fig 18(b): Decision Tree regression - time-varying actual and predicted open_perc (Case III)
Fig 18(c): Decision Tree - relationship between actual and predicted open_perc (Case III)
Fig 18(b), (c), and (d) present the performance of the model in Case III. While the
behavior of the model was almost identical to that in the other two cases, Fig 18(b)
shows clearly that there were more deviations between the patterns exhibited by the
actual values and the predicted values of open_perc. This led to a significantly
higher RMSE in this case as compared to Case I and Case II.
Fig 19(a): Bagging regression - time-varying actual and predicted values of open_perc (Case I)
Fig 19(b): Bagging regression - relationship between actual and predicted open_perc (Case I)
Fig 19(c): Bagging regression – time-varying residuals (Case I)
Case II yielded an RMSE value of 0.2386 and the mean of the absolute values of
the actual open_perc as 0.9286. We found that 37 cases out of a total of 725 cases
yielded mismatch in sign between the predicted and the actual values of open_perc.
The RMSE value for Case III was found to be 0.3242. The mean of the absolute
values of the actual open_perc was 0.9286. We observed that 67 cases out of a total of 725 cases showed a mismatch in sign between the predicted and the corresponding actual values of open_perc. Table 12 presents the results of the bagging regression
model.
Fig 20(a): Bagging regression - time-varying actual and predicted values of open_perc (Case II)
Fig 20(b): Bagging regression - relationship between actual and predicted open_perc (Case II)
Fig 21(a): Bagging regression - time-varying actual and predicted open_perc (Case III)
Fig 21(b): Bagging regression - relationship between actual and predicted open_perc (Case III)
Fig 22(a): Boosting regression - time-varying actual and predicted open_perc (Case I)
Fig 22(b): Boosting regression - relationship between actual and predicted open_perc (Case I)
Fig 23(a): Boosting regression - time-varying actual and predicted open_perc (Case II)
Fig 23(b): Boosting regression - relationship between actual and predicted open_perc (Case II)
Fig 23(c): Boosting regression – time-varying residuals (Case II)
Fig 24(a): Boosting regression - time-varying actual and predicted open_perc (Case III)
Fig 24(b): Boosting regression - relationship between actual and predicted open_perc (Case III)
Fig 24(c): Boosting regression – time-varying residuals (Case III)
Table 14: Random Forest regression results
Case I Case II Case III
Metrics Training 2013 Training 2014 Test 2014
Correlation Coefficient 0.99 0.99 0.97
RMSE/Mean of Absolute Values of Actuals 16.26 10.82 32.02
Percentage of Mismatched Cases 0.00 2.62 6.48
Fig 25(a): Random Forest regression - time-varying actual and predicted open_perc (Case I)
Fig 25(b): Random Forest - relationship between actual and predicted open_perc (Case I)
Fig 25(c): Random Forest regression – time-varying residuals (Case I)
Fig 26(a): Random Forest regression - time-varying actual and predicted open_perc (Case II)
Fig 26(b): Random Forest - relationship between actual and predicted open_perc (Case II)
Fig 26(c): Random Forest regression – time-varying residuals (Case II)
Fig 25(a) depicts the way the predicted open_perc values superimposed on their
corresponding actual values for each of the 745 time slots in Case I. The linear relationship between the predicted and the actual open_perc values is presented in Fig 25(b). The residual values for the random forest regression model are
depicted in Fig 25 (c). These three graphs along with the numeric metrics presented
under Case I in Table 14 clearly indicate that the random forest regression very
effectively modeled the Case I of Godrej Consumer data.
Fig 27(a): Random Forest regression - time-varying actual and predicted open_perc (Case III)
Fig 27(b): Random Forest - relationship between actual and predicted open_perc (Case III)
Fig 26(a), (b), and (c) present various visual performance metrics of the random
forest regression model for Case II. It is evident from these figures that the predicted values of the open_perc very closely follow the patterns of the actual
values. Moreover, the residual values of the regression model exhibited randomness
and no significant autocorrelations were observed among them.
It is also evident from Fig 27(a), (b), and (c) that the random forest regression was
very effective in modeling Case III. Fig 27(b) indicates there are some deviations
from linearity at the head and the tail of the linear segment that exhibited a linear
relationship between the actual and the predicted values of open_perc. This
manifested in the form of a marginally higher value of the ratio of the RMSE to the mean of the absolute values of open_perc for Case III in random forest regression.
Fig 28(b): ANN regression - time-varying actual and predicted values of open_perc (Case I)
Fig 28(a) depicts the ANN regression model for Case I. Only one node is used in the hidden layer, as additional nodes in this layer would have led to an overfitted model. The link weights are written in black, while the bias values associated with the hidden layer node and the output layer node are written in blue. The input layer has one node for each input variable. Fig 28(b) shows how the predicted values of open_perc followed the variational patterns of its actual values, while Fig 28(c) exhibits the linear relationship between the predicted and the actual values of open_perc. From both these figures, it is evident that Case I was very elegantly modeled by ANN regression. Fig 28(d) shows
that the residuals are random and do not exhibit any autocorrelation. The correlation
for this case was found to be 0.99 and the percentage of cases that exhibited
mismatching signs in the predicted and the actual open_perc was only 1.07.
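The ANN regression models in this work were built in R. Purely as an illustration of the single-hidden-node architecture described above, a rough Python analogue using scikit-learn's MLPRegressor is sketched below; the data names are placeholders, and the hyperparameters shown are not those of the original model.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One node in the single hidden layer, as in the network described above.
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(1,), max_iter=5000))
ann.fit(X_train, y_train)        # X_train: predictor columns, y_train: open_perc
pred = ann.predict(X_test)
```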
Fig 28(c): ANN regression - relationship between actual and predicted open_perc (Case I)
Fig 28(d): ANN regression – time-varying residuals (Case I)
Fig 29(a) depicts the ANN regression model built for modeling Case II. Fig 29(b)
and Fig 29(c) clearly show that the predicted series for the open_perc very closely
followed the patterns of its corresponding actual values. The linearity of the
relationship between the predicted and actual values of open_perc is depicted in
Fig 29(c). Fig 29(d) shows that the residuals of the regression model did not exhibit
any autocorrelation.
Fig 29(b): ANN regression - time-varying actual and predicted open_perc (Case II)
Fig 30(a), (b), (c), and (d) depict the ANN regression model for Case III, the behavior of the predicted values of open_perc with respect to its actual values, and the residuals of the regression model. All these figures and the numerical metrics like the correlation
coefficient, the ratio of RMSE and the mean of the absolute values of the actual
open_perc, and the number of cases in which the predicted values had different
signs from its actual values, all showed that the model was very accurate.
Fig 29(c): ANN regression - relationship between actual and predicted open_perc (Case II)
Fig 29(d): ANN regression – time-varying residuals (Case II)
Fig 30(b): ANN regression - time-varying actual and predicted open_perc (Case III)
Fig 30(c): ANN regression - relationship between actual and predicted open_perc (Case III)
SVM Regression: In SVM regression we have used the svm function defined in the e1071 library of the R programming language. For all the three cases, the regression type used was eps-regression, and the SVM kernel was radial. The values of the parameters gamma and epsilon were both found to be 0.1. The algorithm found the
number of support vectors as 248, 265, and 246 for Case I, Case II, and Case III
respectively. The RMSE values for the three cases were found to be 0.3450, 0.2593,
and 0.7703 respectively. The mean of the absolute values of the actual open_perc was 0.6402 for Case I and 0.9286 for Case II and Case III. We computed the ratio of the RMSE values to the mean of the absolute
values of open_perc for all the three cases so as to get an idea about the magnitude
of RMSE with respect to mean of the actual open_perc values. We also identified
the cases which exhibited a difference in the signs in the actual and predicted values
of open_perc. These are the cases, where the regression model had failed to predict
the direction of the movement of the actual open_perc values.
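The SVM regression models were built with the svm function of the e1071 library as stated above. As an illustration only, a rough Python analogue using scikit-learn with the same kernel and parameter values reported above is sketched below; the data names are placeholders, and the cost parameter C is left at its default since it is not reported in the text.

```python
import numpy as np
from sklearn.svm import SVR

# Epsilon-SVR with a radial (RBF) kernel, gamma = 0.1 and epsilon = 0.1 as reported above.
svr = SVR(kernel="rbf", gamma=0.1, epsilon=0.1)
svr.fit(X_train, y_train)                    # predictors -> open_perc
pred = svr.predict(X_test)

# Count the cases in which the predicted direction differs from the actual direction.
sign_mismatch = int(np.sum(np.sign(pred) != np.sign(np.asarray(y_test))))
```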
Table 16: SVM regression results
Fig 31(a): SVM regression - time-varying actual and predicted open_perc (Case I)
For Case I, 2 cases out of 745 cases were found to have exhibited sign mismatch in
the actual and the predicted values of open_perc. 32 out of 725 cases were found to
have yielded sign mismatch in Case II. In Case III, the model faced more challenges
in prediction, and thus 95 cases out of 725 cases mismatched in sign in their actual
and predicted values of open_perc. The product moment correlation coefficient
values were also computed between the actual and the predicted values of
open_perc. The SVM regression results are presented in Table 16. For all the three
cases, SVM regression was found to have yielded quite encouraging results.
Fig 31(b): SVM regression - relationship between actual and predicted open_perc (Case I)
Fig 31(a) presents the variation of actual open_perc and its predicted values at 745
time slots for Case I. It is clear that in most of the cases, the predicted series has
been able to accurately predict the movement of the actual open_perc time series.
In Fig 31(b), we have plotted the predicted values of the open_perc as a function of its actual values. It can be easily observed that except for some points at the tail and the head, most of the points exhibit a strong linear relationship between the actual and the predicted values of open_perc for Case I. The residual plot in Fig 31(c) also depicts that most of the residuals are random within a small range, with very
few residuals exhibiting large positive or negative values.
Fig 32(a): SVM regression - time-varying actual and predicted open_perc (Case II)
Fig 32(b): SVM regression - relationship between actual and predicted open_perc (Case II)
Fig 32(a), (b), and (c) depict almost similar patterns as exhibited by Fig. 31(a), (b)
and (c) respectively, indicating an almost identical performance of SVM regression
in Case II as in Case I. In fact, if we closely observe the pattern of variation in Fig
32(a), we can see that the predicted open_perc series follows even more closely the
actual open_perc series, in this case. This can be verified by checking the ratio of the RMSE to the mean of the absolute values of the actual open_perc, which was much lower in Case II than it was in Case I.
Fig 32(c): SVM regression – time-varying residuals (Case II)
Fig 33(a): SVM regression - time-varying actual and predicted open_perc (Case III)
Fig 33(b): SVM regression - relationship between actual and predicted open_perc (Case III)
Fig 33(c): SVM regression – time-varying residuals (Case III)
However, Fig 33(a), (b), and (c) clearly show that Case III proved to be much more
challenging for the SVM regression model. The correlation coefficient between the
actual and the predicted values of the open_perc was found to be much lower in
this case, which can be easily verified in Fig 33(a) and Fig 33 (b). While Fig 33(a)
showed that the predicted time series in many time instances failed to follow the
pattern exhibited by the actual open_perc time series, Fig 33(b) exhibited
substantial nonlinearity between the predicted and the actual open_perc values. Fig
33(c), however, depicts that the residuals were randomly scattered and did not
exhibit any significant autocorrelation.
Fig 34(b): LSTM model architecture (Case I, Case II, and Case III)
For Case I, we first plot the open, high, low, close, volume, and the NIFTY time
series. In this case, there were 746 records in total. Fig 34(a) depicts the time series
for each of the attributes in Case I. All these six attributes (leaving out the time
attribute) are then normalized using the MinMaxScaler function defined in the sklearn.preprocessing module in Python. Out of the 746 records, the first 500 records
are used for training and the remaining 246 for the validation. The Sequential
function defined in Keras is used for building the LSTM and the model is compiled
using MAE as the loss function and ADAM as the optimizer. The model architecture is depicted in Fig. 34(b). The input layer consisted of six time series presented as six channels, and the output of the input layer is passed on to the LSTM layer
that expands the feature set to 50. The output of the LSTM is passed on to a dense
layer (i.e., a fully connected layer) that has 50 nodes in its input and 1 node at the
final output layer. The behavior of the training and the validation loss values is
studied for different values of epochs and batch sizes. With a batch size of 72 and
an epoch value of 100, the training and validation losses are found to have
converged to a very low value. Fig 34(c) presents the behavioral patterns of the
training and the validation losses in Case I. At the completion of the final epoch,
the RMSE value was 8.812, and Pearson’s product moment correlation coefficient
was 0.983 between the actual and the predicted open values. The training and
validation loss values were 0.0194 and 0.0252 respectively.
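A minimal Keras sketch consistent with the LSTM architecture and training settings described above is given below. Since the exact framing of the input windows is not spelled out in the text, the one-step-ahead framing used here (the six attributes of the current record predicting the next open value) is an assumption, and the variable names are placeholders.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense

# data: array of shape (n_records, 6) holding open, high, low, close, volume and NIFTY
scaled = MinMaxScaler().fit_transform(data)
X = scaled[:-1].reshape(-1, 1, 6)        # six attributes presented as six channels
y = scaled[1:, 0]                        # next (scaled) open value

X_train, X_val = X[:500], X[500:]        # first 500 samples for training, rest for validation
y_train, y_val = y[:500], y[500:]

model = Sequential()
model.add(LSTM(50, input_shape=(1, 6)))  # LSTM layer expanding the feature set to 50
model.add(Dense(1))                      # fully connected layer producing one output
model.compile(loss="mae", optimizer="adam")

history = model.fit(X_train, y_train, epochs=100, batch_size=72,
                    validation_data=(X_val, y_val), shuffle=False)
```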
Case II involved stock prices for the entire year 2014, and it consisted of 725 tuples. As in Case I, first the six attributes are plotted for all the 725 records. Fig 35(a) depicts the plots for the attributes – open, high, low, close, volume, and the NIFTY index. Similar to Case I, the raw values of these six attributes are normalized using the MinMaxScaler function. The LSTM model architecture for this case is exactly
identical to that of Case I and is represented in Fig. 34(b).
Fig 35(a): LSTM regression – stock data representation (Case II)
Fig 35(b): LSTM regression – training and validation error (Case II)
The first 500 records are used for model construction and the remaining 225 records
are utilized in validating the model. The validation loss converged with the training
loss at an epoch value of 40. However, it started increasing again with the increase
in epoch value. The validation loss converged finally with the training loss at an
epoch value of 100, and with a batch size of 72. The RMSE of the model was found to be 15.002, with a correlation value of 0.982 between the actual and the predicted
open values. The training and the validation loss were 0.0134 and 0.0301
respectively, after the completion of the last epoch. Fig 35(b) depicts the pattern of
variation of the training and the validation loss with different epochs in Case II.
Fig 36(a): LSTM regression – stock data representation (Case III)
Fig 36(b): LSTM regression – training and testing error (Case III)
In Case III, the LSTM model was built using the records of the year 2013, and then
the model was tested on the records of the year 2014. The raw dataset, in this case,
consisted of 1471 records in total, of which 746 records (those belonging to the year
2013) were used in building the model, and the remaining 725 records (those
belonging to the year 2014) were used for testing the model. Fig 36(a) presents the
plots of the open, high, low, close, volume, and the NIFTY time series for this case
with 1471 records. The LSTM model architecture in Case III remains identical to
those of Case I and Case II. The training and the test losses were found to have
converged at an epoch value of 60 with a batch size of 72. The RMSE and the
correlation values for this case were found to be 13.477 and 0.996 respectively. The
training and the test losses were 0.0116 and 0.0258 respectively. Fig 36(b) depicts
the patterns exhibited by the training and the testing losses with different values of
epoch.
CNN Regression: In Chapter 5, we briefly discussed the way we have used CNN regression to carry out multi-step forecasting of the open values of the Godrej
Consumer Products stock time series. We have followed a slightly different
approach in this case. We used Godrej Consumer Products stock price data for the
period December 31, 2012 (Monday) till January 9, 2015 (Friday). During this period, the stock price movements have been captured at 5 minutes interval of time.
At each slot the values of open, high, low, close, and volume are available. The
stock price data for the period December 31, 2012 till December 30, 2013 has been
used as the training dataset, and for the purpose of testing, the data for the period December 31, 2013 till January 9, 2015 has been used. The entire dataset has also
been arranged in the form of a weekly sequence: Monday to Friday. Fig. 37 depicts
the data at 5 minutes interval for the entire period under consideration. As
mentioned earlier in Chapter 5, we followed four different approaches to CNN
regression for the Godrej dataset. We describe them as under:
Fig 38: CNN model architecture – Univariate multistep with one week’s data as input (N = 5)
Case I: Univariate Forecasting with One-week prior data (N=5) – With one-week
prior data used for building a univariate forecasting model using a CNN, we had a
small amount of data and hence a very light model. We used only one convolution
layer with 16 filters and a kernel size of 3. In other words, it means that the input
sequence of five days is read with a convolutional operation in three time-steps at
a time and this operation is performed 16 times. A max pooling layer of size 2 is
used that reduces the size of the feature maps before the internal representation is
flattened to one long vector. This is then interpreted by a fully-connected layer
before the output layer (which is also fully connected) predicts the open values for
the next five days. Fig. 38 depicts the architecture of the CNN model for Case I.
Both for the convolution layer and the fully connected layer, the ReLU (Rectified
Linear Unit) function has been used as the activation function. The “ADAM”
implementation of the stochastic gradient descent algorithm has been used as the
optimizer with 20 epochs and a batch size of 4. The loss function used was mean
squared error (MSE). For computing the error in prediction, we used root mean
squared error (RMSE) as the metric. Because of the small batch size and the stochastic nature of the gradient descent algorithm, the model is expected to learn a slightly different mapping of the inputs to the outputs every time it is trained. This
implies that performance results will vary slightly in different runs.
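A minimal Keras sketch of the Case I architecture described above is given below. The number of nodes in the fully connected interpretation layer is not stated in the text, so the value used here is illustrative, and the training arrays are assumed to have been prepared with the weekly framing described in Chapter 5.

```python
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

n_input, n_out = 5, 5                      # one prior week as input, one week as output

model = Sequential()
model.add(Conv1D(filters=16, kernel_size=3, activation="relu",
                 input_shape=(n_input, 1)))
model.add(MaxPooling1D(pool_size=2))       # reduces the size of the feature maps
model.add(Flatten())                       # one long internal representation vector
model.add(Dense(10, activation="relu"))    # interpretation layer; 10 nodes is illustrative
model.add(Dense(n_out))                    # open values for the next five days
model.compile(loss="mse", optimizer="adam")

# model.fit(train_X, train_y, epochs=20, batch_size=4, verbose=0)
```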
We tested the model for 20 rounds and noted the performance of the model with
respect to its overall RMSE, the RMSE values for the individual days of a week
(i.e., Monday – Friday), the execution time of the model, and the ratio of the RMSE
to the mean value of the variable predicted (i.e., mean of the open value for the test
dataset). Table 17 depicts the results for the performance of the CNN model for
Case I. It may be noted that the mean value of open for the test data is 866.5875.
The training and the test data consisted of 19500 and 20250 records respectively.
The execution time for the models has been expressed in seconds. The model has
been executed on a system with an Intel i7 CPU with a clock speed of 2.60 GHz and 16 GB of random access memory (RAM), running on the Windows
10 operating system.
Fig 39: CNN model architecture – Univariate multistep with two weeks’ data as input (N = 10)
Case II: Univariate Forecasting with Two-week prior data (N=10) – The architecture of the model in this case is identical to that in Case I. However, the
model is fed with two weeks’ prior data (i.e., ten immediate past open values) for
the purpose of forecasting the open values of the subsequent week. Fig. 39 depicts
the architecture of the CNN model for Case II. Table 18 depicts the results for the
performance of the model for the 20 rounds of Case II. The execution time for the
model has been expressed in seconds. The system hardware and operating system details on which the model was tested have been mentioned earlier under Case I.
Table 18: CNN regression results (Case II: Univariate multi-step N=10)
Round_No Overall RMSE Monday Tuesday Wednesday Thursday Friday Exec_Time
1 5.307 3.9 4.6 5.6 5.7 6.4 85.386
2 6.042 3.9 6.2 6.4 6.8 6.6 85.845
3 5.378 4.0 5.0 5.6 5.6 6.3 84.247
4 5.278 3.5 4.6 5.5 5.9 6.3 108.214
5 5.592 4.4 5.0 5.7 6.0 6.7 88.994
6 8.852 6.6 8.7 9.6 10.4 8.5 83.484
7 5.294 3.8 5.3 5.5 5.7 6.0 85.899
8 6.061 4.7 4.9 5.9 6.8 7.5 88.373
9 5.229 3.9 4.9 5.4 5.8 5.8 88.297
10 5.857 5.6 5.3 5.3 5.9 7.1 84.636
11 5.227 4.8 4.7 5.6 5.4 5.7 101.964
12 8.797 7.8 8.4 9.0 9.5 9.1 87.828
13 7.190 5.3 6.6 7.9 8.5 7.2 81.649
14 5.697 4.9 5.7 5.8 5.9 6.1 84.072
15 5.314 3.6 4.5 5.6 6.3 6.1 87.396
16 5.186 3.6 4.8 5.3 5.7 6.2 81.858
17 7.110 9.3 6.0 6.5 7.1 6.2 88.401
18 5.356 3.7 4.5 5.2 6.0 6.8 84.992
19 5.210 4.1 4.6 5.3 5.6 6.2 84.526
20 6.889 6.7 6.6 7.1 6.9 7.1 82.084
Mean 6.043 4.905 5.545 6.190 6.575 6.695 87.407
SD 1.145 1.578 1.228 1.261 1.370 0.871 6.528
Min 5.186 3.500 4.500 5.200 5.400 5.700 81.649
Max 8.852 9.300 8.700 9.600 10.400 9.100 108.214
RMSE/Mean 0.0070 0.0057 0.0064 0.0071 0.0076 0.0077
Case III: Multivariate Forecasting with Two-week prior data (N=10) – In this case, with a multi-channel CNN approach, we used each of the five time series variables, open, high, low, close, and volume, for forecasting the next week’s open values. We
do this by providing each one-dimensional time series to the model as a separate
channel of input. In this case, CNN uses a separate kernel and reads each input
sequence onto a separate set of filter maps (i.e., feature maps), essentially learning
features from each input time series variable. Five input variables are used with two
weeks of prior data for the purpose of training the model. The increase in the
amount of data requires a larger and more sophisticated model that is trained for a
longer time. We used two convolutional layers with 32 filter maps with a kernel
size of 3, followed by a max pooling layer of size 2, then another convolutional
layer with 16 filter maps with a kernel size of 3, and a max pooling layer of size 1.
The fully connected layer that interprets the features is increased to 100 nodes and
the model is fit for 70 epochs with a batch size of 16 records. The activation function for all the layers has been chosen as ReLU, and the ADAM implementation of the stochastic gradient descent algorithm has been used as the optimizer. Fig.
40 depicts the architecture of the CNN model for the multivariate time series with
two weeks’ previous data as the input. Table 19 depicts the results of performance
for Case III. The execution time for each round of execution of the model has been
expressed in seconds.
Fig 40: CNN model architecture – Multivariate multistep with two weeks’ data as input (N = 10)
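A minimal Keras sketch consistent with the multi-channel architecture described above follows; the training arrays are placeholders, prepared so that each of the five variables forms one input channel.

```python
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

n_input, n_features, n_out = 10, 5, 5       # two weeks of data, five channels, one week out

model = Sequential()
model.add(Conv1D(32, 3, activation="relu", input_shape=(n_input, n_features)))
model.add(Conv1D(32, 3, activation="relu"))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(16, 3, activation="relu"))
model.add(MaxPooling1D(pool_size=1))
model.add(Flatten())
model.add(Dense(100, activation="relu"))    # the enlarged fully connected layer
model.add(Dense(n_out))                     # next week's open values
model.compile(loss="mse", optimizer="adam")

# model.fit(train_X, train_y, epochs=70, batch_size=16, verbose=0)
```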
Table 19: CNN regression results (Case III: Multivariate multi-step N=10)
Round_No Overall RMSE Monday Tuesday Wednesday Thursday Friday Exec_Time
1 8.466 7.9 6.7 7.4 9.6 10.1 134.286
2 5.510 4.2 5.2 5.6 6.1 6.3 122.020
3 6.003 4.7 5.6 6.3 6.4 6.8 111.729
4 6.718 5.6 6.4 7.0 7.2 7.3 116.582
5 5.602 4.4 5.2 5.7 6.1 6.4 131.704
6 6.056 4.3 5.9 6.2 6.7 6.8 113.107
7 5.585 4.3 5.0 5.7 5.9 6.8 137.406
8 6.220 4.6 6.6 6.5 6.4 6.8 113.899
9 5.710 4.5 5.4 5.9 6.1 6.4 113.717
10 6.101 5.1 6.0 6.2 6.3 6.7 111.018
11 6.708 6.7 6.6 6.5 6.6 7.1 131.253
12 5.398 4.1 5.0 5.5 5.9 6.2 139.208
13 7.956 8.3 6.4 6.4 7.8 10.2 114.197
14 7.061 6.5 6.3 6.4 8.4 7.5 115.814
15 5.870 4.5 5.6 6.1 6.4 6.5 116.391
16 6.070 4.7 5.7 6.2 6.5 7.0 113.899
17 5.977 5.2 5.4 6.0 6.4 6.7 116.089
18 5.760 4.8 5.0 6.5 6.1 6.2 114.035
19 5.647 4.9 5.2 5.7 6.0 6.3 112.558
20 5.547 4.4 5.1 5.6 5.9 6.5 116.758
Mean 6.198 5.185 5.715 6.17 6.64 7.03 119.784
SD 0.819 1.218 0.601 0.488 0.949 1.124 9.307
Min 5.398 4.1 5 5.5 5.9 6.2 111.018
Max 8.466 8.3 6.7 7.4 9.6 10.2 139.208
RMSE/Mean 0.0072 0.0060 0.0066 0.0071 0.0077 0.0081
Case IV: Forecasting with multivariate input data with sub-model (N=10) – In this
case, we constructed a separate sub-CNN model for each of the five input variables,
which we refer to as a multi-headed CNN model. The configuration of the model,
including the number of layers and their hyperparameters, is modified to optimize
the overall model performance. Two convolutional layers with 32 feature maps and
kernel size of 3 are used, followed by a max pooling layer of size 2. After the sub-model outputs are merged, two dense layers consisting of 200 and 100 nodes respectively are used, before the output layer receives the 100-node representation at its input and finally produces 5 output values through its 5 nodes. ReLU was the chosen
activation function and the optimizer was ADAM. The number of epochs and the
batch size was 25 and 16 respectively. The multi-headed model is specified using a
more flexible functional API for defining Keras models (Brownlee, 2019). The
program designed for this approach loops over each input variable, and creates a
sub-model that takes a one-dimensional sequence of 10 days (two weeks) of data
and outputs a flat vector containing a summary of the learned features from the
sequence. Each of these vectors is merged by concatenation to make one very long
vector that is then interpreted by some fully-connected layers before the forecast
for the next week is made. The model needs five arrays as input – one for each of the sub-models. We achieved this by creating a list of three-dimensional (3D) arrays.
Fig. 41 depicts the architecture of the CNN model for Case IV.
Fig 41: CNN model architecture – Multivariate sub-models with two weeks’ data as input (N = 10)
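A minimal Keras sketch of the multi-headed architecture described above, written with the functional API, is given below; the input arrays (one per variable) are placeholders.

```python
from keras.models import Model
from keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense, concatenate

n_input, n_vars, n_out = 10, 5, 5          # two weeks of data, five variables, one week out

inputs, heads = [], []
for _ in range(n_vars):                    # one sub-model (head) per input variable
    inp = Input(shape=(n_input, 1))
    x = Conv1D(32, 3, activation="relu")(inp)
    x = Conv1D(32, 3, activation="relu")(x)
    x = MaxPooling1D(pool_size=2)(x)
    x = Flatten()(x)                       # flat vector of learned features for this head
    inputs.append(inp)
    heads.append(x)

merged = concatenate(heads)                # one long vector combining all the heads
x = Dense(200, activation="relu")(merged)
x = Dense(100, activation="relu")(x)
outputs = Dense(n_out)(x)                  # next week's five open values

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss="mse", optimizer="adam")

# model.fit([x_open, x_high, x_low, x_close, x_volume], train_y, epochs=25, batch_size=16)
```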
Table 20 depicts the performance results of Case IV. The model execution time for
each of the 20 rounds has been expressed in seconds.
Table 20: CNN regression results (Case IV: Multiheaded CNN N=10)
Round_No Overall RMSE Monday Tuesday Wednesday Thursday Friday Exec_Time
1 5.765 5.3 5.5 5.8 5.7 6.4 88.420
2 9.371 8.5 7.9 8.7 11.8 11.1 101.872
3 6.480 5.8 6.4 7.1 6.6 6.5 88.159
4 6.632 4.7 5.8 7.3 7.5 7.4 89.144
5 6.458 6.8 5.5 6.2 6.4 7.3 83.641
6 5.013 3.6 5.3 5.0 5.3 5.6 87.065
7 5.271 4.3 5.0 5.0 5.4 6.4 91.820
8 5.393 2.8 4.3 5.2 5.7 7.7 96.391
9 5.112 3.7 4.9 5.3 5.5 5.9 83.661
10 4.923 3.6 4.6 5.0 5.3 5.8 80.920
11 7.133 3.8 6.1 8.0 7.7 8.9 100.523
12 5.198 3.8 4.7 5.4 5.7 6.0 85.323
13 6.544 4.8 6.8 6.4 7.2 7.3 84.169
14 6.782 4.8 5.8 7.1 7.6 8.1 84.272
15 5.502 3.9 5.3 6.0 5.8 6.1 84.145
16 5.343 5.7 4.7 5.3 5.4 5.5 84.995
17 4.853 3.3 4.3 4.7 5.6 5.9 104.519
18 6.018 6.2 4.5 5.5 5.9 7.6 94.429
19 6.114 4.1 7.2 6.6 6.1 6.1 83.603
20 6.205 3.4 5.7 5.3 7.4 8.1 99.166
Mean 6.006 4.645 5.515 6.045 6.48 6.985 89.812
SD 1.050 1.397 0.984 1.109 1.506 1.370 7.170
Min 4.853 2.800 4.300 4.700 5.300 5.500 80.920
Max 9.371 8.500 7.900 8.700 11.800 11.100 104.519
RMSE/Mean 0.0069 0.0054 0.0064 0.0070 0.0075 0.0081
From Tables 17, 18, 19, and 20, it is easy to observe that Case I, with the univariate multi-step walk-forward forecasting method and prior one week’s data as the input, is the most accurate model, yielding a ratio of RMSE to the mean of the actual values of 0.0065. This model is also found to be the fastest in execution, with a mean
execution time of 82.668 seconds, and with a small value of 4.392 of the standard
deviation of the execution time for 20 rounds. Case III with multivariate, multi-
step, walk-forward forecasting with prior two weeks’ data as the input is found to
have performed the worst among all the CNN models. This model yielded a value
of 0.0072 for the ratio of the RMSE to the mean of the actual values. The execution
time for Case III was also observed to be the longest with a mean execution time
of 119.784 seconds and a standard deviation of 9.307 for 20 rounds of execution of
the model. It is also noted that the ratio of the RMSE to the mean value of the response variable is the lowest for Monday and the highest for Friday; in fact, this ratio consistently increased from Monday through Friday.
We now present a summary of the performance results for all the machine learning-
based classification and regression results.
Table 22: Summary of the performance of the classification models in Case II
We observe that for both Case I and Case II, and on all the metrics, boosting performed the best among all the classification models. However, considering the fact that Case I and Case II exhibit only the training accuracies, the performance in Case
III should be considered as the most critical as it demonstrates the test accuracy of
a model. From Table 23, we find that ANN performed the best on sensitivity and
NPV while boosting outperformed all other models on specificity, PPV, and
classification accuracy. However, random forest was found to have performed best
on the F1 score, which is usually considered to be the most important metric in
classification. In Tables 21-26, the following abbreviations are used in the column
names: LR – Logistic Regression, KNN – K-Nearest Neighbor, DT- Decision Tree,
BAG – Bagging, BOOST – Boosting, RF – Random Forest, ANN – Artificial
Neural Networks, SVM – Support Vector Machines, LSTM – Long-and-Short-Term Memory.
Table 23: Summary of the performance of the classification models in Case III
In the summary of the regression results, in addition to presenting the performance of the machine learning models on all metrics and for all the three cases, we have also noted down the best performing machine learning model on each metric.
In Case I, multivariate regression, MARS, boosting, random forest, and ANN all
yielded the highest correlation coefficient value of 0.99. However, the correlation
coefficient was found to be 1.00 in the case of LSTM. For the ratio of the RMSE to
the mean of the absolute values of the open_perc values, MARS yielded the lowest
value of 12.41 among the machine learning models, while the corresponding value
for LSTM was 7.94. Both random forest and LSTM yielded no sign mismatch
among the predicted and the actual values of the open_perc.
Table 26: Summary of the performance of the regression models in Case III
In Case II, the highest value of the correlation coefficient was achieved by
multivariate regression, MARS, boosting, and random forest. LSTM outperformed
all the machine learning models on this metric by attaining a value of 1.00. The
RMSE to the mean ratio value of 10.82 was the least for random forest among the
machine learning models. However, the corresponding value yielded by LSTM was
Random forest produced only 2.62 percent of cases that mismatched in the signs of the actual and predicted open_perc values; for LSTM, however, all the cases had the same sign for the actual and the predicted open_perc values.
For Case III, while LSTM exhibited the best performance on all metrics,
multivariate regression and MARS yielded the same (the highest) value for the
correlation coefficient. For the metric RMSE to the mean ratio and the percentage
of the mismatched cases, multivariate regression produced the best results among
the machine learning models.
It may be noted that the CNN models worked on the stock price data of Godrej
Consumer Products during the period December 31, 2012 till January 9, 2015,
while the data were collected at 5 minutes interval of time for each day of a week:
Monday through Friday. Since all other models built in this work are based on stock
price data aggregated into three slots in a day, it is not wise to compare the
performance of the CNN model suites with the machine learning-based models and
the LSTM models. However, one can easily see that based on the ratio of the RMSE
to the mean of the actual values of the forecasted variable, all the CNN models
outperform the LSTM by a large margin. While the least value for the ratio of the
RMSE to the actual value of the forecasted variable for the LSTM model was found
to be 2.36, the corresponding value for the CNN suite was 0.0065.
Chapter 7
In this work, we have proposed a robust forecasting framework for stock price and
stock price movement pattern prediction with a very high level of accuracy. The
predictive model consists of eight classification and eight regression models based
on several machine learning approaches. In addition to that, the framework also
includes two deep learning models of regression using an LSTM network and a
suite of CNNs. All these models work on a short-term time horizon, and they have
the ability to forecast stock price movement and stock price on the basis of three time slots on a given day. We constructed the models, trained, validated, and finally
tested them using the historical stock prices of a company – Godrej Consumer
Products Ltd. The data is taken from the listed values of the stock in the National
Stock Exchange (NSE) of India during the period of two years – January 2013 till
December 2014. The stock price data were extracted from the NSE database at five
minutes interval of time using the Metastock tool. After its collection, the raw data were pre-processed, appropriate transformations (e.g., normalization, standardization, NA removal) were carried out, and a number of derived predictor variables were created based on the rich features of the stock data. While a number
of newly derived predictors were used in building the model, we used the
percentage change in the open values of the stock, called open_perc, as the response
variable. The five minutes interval granular data are also aggregated into three slots
on a given day so that the predictive models can be built to forecast the value of the
open_perc in the next slot given stock price data till the current slot. While the
classification-based models are used to predict the movement pattern of open_perc
values, the objective of the regression models is to accurately predict the value of
the open_perc. In addition to exploiting the machine learning algorithms for
building the eight classification and eight regression models, we also leveraged the
rich features of Tensorflow and Keras frameworks in building two extremely
powerful deep learning-based regression models using an LSTM network and a
suite of CNNs. For building the machine learning models, we used R programming
language, while for the LSTM-based deep learning regression model, and the suite
of four CNN models, Python programming has been used. The models are trained,
validated, and tested on the stock data and extensive results are produced and
critically analyzed. The results elicited a very interesting observation. While there
was not a single machine learning model that performed the best on all the metrics
on classification and regression, the deep learning model using an LSTM network
outperformed all the regression models on every metric that we considered. Since
the CNN models were built using stock price data collected at 5 minutes interval of
time while the machine learning models and the LSTM models were based on stock
price data collected at three slots in a day, it is not wise to compare the performance
of the CNN suite with the other models. However, it has been found that based on
the metric of the ratio of the RMSE to the mean of the actual values of the forecasted
variable, the CNN models are far more accurate than the machine learning models
and the LSTM-based deep learning model of regression.
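As an illustration of how such a Keras-based LSTM regressor can be assembled, a minimal sketch is given below. The look-back window, layer width, and training settings are assumptions made for the sake of the example and are not the exact hyperparameters used in this work.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Assumed shapes: a look-back of three slots and a single feature (open_perc).
look_back, n_features = 3, 1

model = Sequential([
    LSTM(64, input_shape=(look_back, n_features)),
    Dense(1)                       # regression output: next-slot open_perc
])
model.compile(optimizer="adam", loss="mse")

# Dummy arrays standing in for the prepared training windows.
X = np.random.rand(200, look_back, n_features)
y = np.random.rand(200)
model.fit(X, y, epochs=10, batch_size=16, verbose=0)
```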
In another recently published work, we have also studied the efficacy and accuracy
of a CNN-based deep learning regression model in time series forecasting (Mehtab
& Sen, 2020). It is now well known that deep learning models have a much higher
capability of extracting and learning features from time series data than their
machine learning counterparts. However, in order to exploit the power of deep
learning models, a very large volume of data is required. As future work, we intend
to explore a wide variety of hybrid deep learning models, such as univariate and
multivariate encoder-decoder LSTM models, CNN-LSTM models, convolutional LSTM
models, and generative adversarial networks (GANs), for forecasting stock price
movement patterns and stock price values; one possible CNN-LSTM arrangement is
sketched after this paragraph. We believe that an integrated approach to building
deep learning models that combines the power of LSTM, CNN, and GAN architectures
can be a very interesting direction for further work.
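As a purely illustrative example of the hybrid direction mentioned above, the sketch below stacks a one-dimensional convolutional feature extractor in front of an LSTM layer. The window length and layer sizes are arbitrary placeholders; no such model has been built or evaluated in this dissertation.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

# Arbitrary placeholder dimensions; not a model evaluated in this work.
window, n_features = 12, 1

cnn_lstm = Sequential([
    Conv1D(32, kernel_size=3, activation="relu",
           input_shape=(window, n_features)),   # local pattern extraction
    MaxPooling1D(pool_size=2),
    LSTM(32),                                    # temporal dependencies
    Dense(1)                                     # next-value regression
])
cnn_lstm.compile(optimizer="adam", loss="mse")
cnn_lstm.summary()
```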
References
Adebiyi, A., Adewumi, O., & Ayo, C.K. (2014). Stock Price Prediction Using the
ARIMA Model. Proceedings of the International Conference on Computer
Modelling and Simulation (UKSIM’14), March 2014, pp. 106 – 112,
Cambridge, UK. DOI: 10.1109/UKSim.2014.67.
Basalto, N., Bellotti, R., De Carlo, F., Facchi, P., & Pascazio, S. (2005). Clustering
Stock Market Companies via Chaotic Map Synchronization. Physica A:
Statistical Mechanics and its Applications, Vol 345, Nos 1 – 2, pp. 196-206,
January 2005. DOI: 10.1016/j.physa.2004.07.034.
Basu, S. (1983). The Relationship between Earnings Yield, Market Value, and
Return for NYSE Common Stocks: Further Evidence. Journal of Financial
Economics, Vol 12, No 1, pp. 129-156, June 1983. DOI: 10.1016/0304-
405X(83)90031-4.
Bentes, S. R., Menezes, R., & Mendes, D. A. (2008). Long Memory and Volatility
Clustering: Is the Empirical Evidence Consistent across Stock Markets?
Physica A: Statistical Mechanics and its Applications, Vol 387, No 15, pp.
3826-3830, June 2008. DOI: 10.1016/j.physa.2008.01.046.
Chen, A.-S., Leung, M. T., & Daouk, H. (2003). Application of Neural Networks to
an Emerging Financial Market: Forecasting and Trading the Taiwan Stock
Index. Computers and Operations Research, Vol 30, No 6, pp. 901-923.
DOI: 10.1016/S0305-0548(02)00037-0.
Chen, Y., Dong, X. & Zhao, Y. (2005). Stock Index Modeling using EDA Based
Local Linear Wavelet Neural Network. Proceedings of International
Conference on Neural Networks and Brain, 13 – 15 October 2005, Beijing,
China, pp. 1646 – 1650. DOI: 10.1109/ICNNB.2005.1614946.
Chui, A. & Wei, K. C. (1998). Book-to-Market Firm Size, and the Turn-of-the Year
Effect: Evidence from Pacific-Basin Emerging Markets. Pacific-Basin
Finance Journal, Vol 6, No 3-4, pp. 275-293, August 1998. DOI:
10.1016/S0927-538X(98)00013-4.
Dutta, G., Jha, P., Laha, A. K. & Mohan, N. (2006). Artificial Neural Network
Models for Forecasting Stock Price Index in the Bombay Stock Exchange.
Journal of Emerging Market Finance, Vol 5, No 3, pp. 283-295, December
2006. DOI: 10.1177/097265270600500305.
Fama, E.F. & French, K.R. (1995). Size and Book-to-Market Factors in Earning
and Returns. Journal of Finance, Vol 50, No 1, pp. 131-155, March 1995.
DOI: 10.1111/j.1540-6261.1995.tb05169.x
Fu, T-C., Chung, F-L., Luk, R., & Ng, C-M. (2008). Representing Financial Time
Series Based on Data Point Importance. Engineering Applications of
Artificial Intelligence, Vol 2, No 2, pp. 277-300, March 2008. DOI:
10.1016/j.engappai.2007.04.009.
Hanias, M., Curtis, P. & Thalassinos, J. (2012). Time Series Prediction with Neural
Networks for the Athens Stock Exchange Indicator. European Research
Studies Journal, Vol 15, No 2, pp. 23-32. DOI: 10.35808/ersj/351.
Jaffe, J., Keim, D. B., & Westerfield, R. (1989). Earnings Yields, Market Values,
and Stock Returns. Journal of Finance, Vol 44, No 1, pp. 135-148, March
1989. DOI: 10.1111/j.1540-6261.1989.tb02408.x
Kimoto, T., Asakawa, K., Yoda, M. & Takeoka, M. (1990). Stock Market
Prediction System with Modular Neural Networks. Proceedings of the IEEE
International Joint Conference on Neural Networks (IJCNN), 17-21 June
1990, San Diego, CA, USA. DOI: 10.1109/IJCNN.1990.137535.
Lahmiri, S. (2014). Wavelet Low- and High- Frequency Components as Features
for Predicting Stock Prices with Backpropagation Neural Networks. Journal
of King Saud University – Computer and Information Sciences, Vol 26, Issue
2, pp. 218-227. DOI: 10.1016/j.jksuci.2013.12.001.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-Based Learning
Applied to Document Recognition. Proceedings of the IEEE, Vol 86, No 11,
pp. 2278-2324, November 1998. DOI: 10.1109/5.726791
Leigh, W., Hightower, R. & Modani, N. (2005). Forecasting the New York Stock
Exchange Composite Index with Past Price and Interest Rate on Condition of
Volume Spike. Expert Systems with Applications, Vol 28, No 1, pp. 1-8,
January 2005. DOI: 10.1016/j.eswa.2004.08.001.
Liao, S-H., Ho, H-H., & Lin, H-W. (2008). Mining Stock Category Association and
Cluster on Taiwan Stock Market. Expert Systems with Applications, Vol 35,
Nos 1-2, pp. 19-29, July-August 2008. DOI: 10.1016/j.eswa.2007.06.001.
Mehtab, S. & Sen, J. (2020). Stock Price Prediction Using Convolutional Neural
Networks on a Multivariate Time Series. Proceedings of the 3rd National
Conference on Machine Learning and Artificial Intelligence (NCMLAI’20),
New Delhi, India, February 1, 2020.
Mehtab, S. & Sen, J. (2019). A Robust Predictive Model for Stock Price Prediction
Using Deep Learning and Natural Language Processing. Proceedings of the
7th International Conference on Business Analytics and Intelligence
(BAICONF’19), Bangalore, India, December 5 – 7, 2019. DOI:
10.2139/ssrn.3502624.
Mondal, P., Shit, L., & Goswami, S. (2014). Study of Effectiveness of Time Series
Modeling (ARMA) in Forecasting Stock Prices. International Journal of
Computer Science, Engineering and Applications, Vol 4, No 2, pp. 13-29,
April 2014. DOI: 10.5121/ijcsea.2014.4202
Phua, P. K. H., Ming, D., & Lin, W. (2001). Neural Network with Genetically
Evolved Algorithms for Stock Prediction. Asia-Pacific Journal of Operational
Research, Vol 18, No 1, pp. 103 – 107.
Rosenberg, B., Reid, K., & Lanstein, R. (1985). Persuasive Evidence of Market
Inefficiency. Journal of Portfolio Management, Vol 1, No 1, pp. 9 – 17. DOI:
10.3905/jpm.1985.409007.
Sen, J. (2018b). Stock Composition of Mutual Funds and Fund Style: A Time Series
Decomposition Approach towards Testing for Consistency. International
Journal of Business Forecasting and Marketing Intelligence, Vol 4, No 3, pp.
235-292. DOI: 10.1504/IJBFMI.2018.092781.
Sen, J. (2018c). A Study of the Indian Metal Sector Using Time Series
Decomposition-Based Approach. Book Chapter in Selected Studies on
Economics and Finance, Editors: Basar, S., Celik, A. A., & Bayramoglu, T.,
pp. 105-152, Cambridge Scholars Publishing, UK, March 2018.
Sen, J. (2018d). Stock Price Prediction Using Machine Learning and Deep Learning
Frameworks. Proceedings of the 6th International Conference on Business
Analytics and Intelligence (ICBAI’18), Bangalore, India, December 20-22,
2018.
Sen, J. & Datta Chaudhuri, T. (2017b). A Predictive Analysis of the Indian FMCG
Sector Using Time Series Decomposition-Based Approach. Journal of
Economics Library, Vol 4, No 2, pp. 206-226, June 2017. DOI:
10.1453/jel.v4i2.1282.
Sen, J. (2017c). A Time Series Analysis-Based Forecasting Approach for the Indian
Realty Sector. International Journal of Applied Economic Studies, Vol 5, No
4, pp. 8 – 27, August 2017.
Sen, J. (2017d). A Robust Analysis and Forecasting Framework for the Indian Mid
Cap Sector Using Time Series Decomposition. Journal of Insurance and
Financial Management, Vol 3, No 4, pp. 1-32, September 2017.
Sen, J. & Datta Chaudhuri, T. (2017e). A Robust Predictive Model for Stock Price
Forecasting. Proceedings of the 5th International Conference on Business
Analytics and Intelligence, Bangalore, India, December 11-13, 2017.
Sen, J. & Datta Chaudhuri, T. (2016a). Decomposition of Time Series Data of Stock
Markets and its Implications for Prediction -An Application for the Indian
Auto Sector. Proceedings of the 2nd National Conference on Advances in
Business Research and Management Practices (ABMRP’16), Kolkata, India,
January 8-9, 2016, pp. 1-28. DOI: 10.13140/RG.2.1.3232.0241.
Sen, J. & Datta Chaudhuri, T. (2016b). An Alternative Framework for Time Series
Decomposition and Forecasting and its Relevance for Portfolio Choice – A
Comparative Study of the Indian Consumer Durable and Small-Cap Sector.
Journal of Economics Library, Vol 3, No 2, pp. 303-326. DOI:
10.1453/jel.v3i2.787.
Senol, D. & Ozturan, M. (2008). Stock Price Direction Prediction Using Artificial
Neural Network Approach: The Case of Turkey. Journal of Artificial
Intelligence, Vol 1, pp. 70-77. DOI: 10.3923/jai.2008.70.77.
Shen, J., Fan, H. & Chang, S. (2007). Stock Index Prediction Based on Adaptive
Training and Pruning Algorithm. Advances in Neural Networks, Lecture
Notes in Computer Science, Springer-Verlag, Vol 4492, pp. 457-464. DOI:
10.1007/978-3-540-72393-6_55.
Tsai, C.-F. & Wang, S.-P. (2009). Stock Price Forecasting by Hybrid Machine
Learning Techniques. Proceedings of the International MultiConference of
Engineers and Computer Scientists, Vol 1.
Tseng, K-C., Kwon, O., & Tjung, L. C. (2012). Time Series and Neural Network
Forecast of Daily Stock Prices. Investment Management and Financial
Innovations, Vol 9, No 1, pp. 32-54.
Wang, Z., Yan, W., & Oates, T. (2016). Time Series Classification from Scratch
with Deep Neural Networks: A Strong Baseline. Proceedings of the 2017
IEEE International Joint Conference on Neural Networks (IJCNN),
Anchorage, Alaska, USA, May 14-19, 2017. DOI:
10.1109/IJCNN.2017.7966039.
Wu, Q., Chen, Y. & Liu, Z. (2008). Ensemble Model of Intelligent Paradigms for
Stock Market Forecasting. Proceedings of the IEEE 1st International
Workshop on Knowledge Discovery and Data Mining, pp. 205 – 208,
Washington, DC, USA. DOI: 10.1109/WKDD.2008.54
Zhang, D., Jiang, Q., & Li, X. (2007). Application of Neural Networks in Financial
Data Mining. International Journal of Computer, Electrical, Automation, and
Information Engineering, Vol 1, No 1, pp. 225-228. DOI:
10.5281/zenodo.1333234.
Zhu, X., Wang, H., Xu, L. & Li, H. (2008). Predicting Stock Index Increments by
Neural Networks: The Role of Trading Volume under Different Horizons.
Expert Systems Applications, Vol 34, No 4, pp. 3043–3054, May 2008. DOI:
10.1016/j.eswa.2007.06.023.