
Masters Dissertation Sidra Mehtab

The dissertation titled 'Robust Stock Price Prediction Using Machine Learning and Deep Learning Models' explores the development of a predictive framework for stock prices using various statistical, machine learning, and deep learning techniques. The study utilizes granular stock price data from a well-known company in India and presents a combination of models, including classification and regression approaches, to enhance prediction accuracy. The findings indicate that an agglomerative model-building approach can effectively capture the volatile patterns in stock price movements, leading to robust short-term forecasting capabilities.


See discussions, stats, and author profiles for this publication at:
https://www.researchgate.net/publication/342882552

Robust Stock Price Prediction Using Machine Learning and Deep Learning
Models

Thesis · July 2020


DOI: 10.13140/RG.2.2.24529.76641


2 authors:

Sidra Mehtab, NSHM Knowledge Campus
Jaydip Sen, Praxis Business School

All content following this page was uploaded by Sidra Mehtab on 12 July 2020.



MSc (Data Science) Dissertation Series

Robust Stock Price Prediction Using Machine Learning and Deep Learning Models
A dissertation submitted in partial fulfillment of the requirements for
the Master of Science in Data Science degree of the Maulana Abul
Kalam Azad University of Technology (MAKAUT)

By

Sidra Mehtab
(Reg. No: 182341810028)

Under the supervision of

Prof. Jaydip Sen


Professor & Head
School of Computing & Analytics

NSHM Knowledge Campus
124, B. L. Saha Road,
Kolkata - 700053, INDIA

June 2020

Certificate of Approval

Dated: June 15, 2020

This is to certify that the dissertation (minor) titled “A Time Series Analysis-Based
Stock Price Prediction Using Machine Learning and Deep Learning Models” being
submitted by Sidra Mehtab (Reg No: 182341810028) towards partial fulfillment of
the requirements for the Master of Science in Data Science and Analytics course of
Maulana Abul Kalam Azad University of Technology (MAKAUT), West Bengal,
India, and carried out at the NSHM Knowledge Campus, Kolkata, India, embodies
the work done under my supervision during the period August 2019 – May 2020.
To the best of my knowledge and belief, the results presented in this work have not
been published previously elsewhere.

Jaydip Sen

Professor & Head

School of Computing and Analytics

NSHM Knowledge Campus

Kolkata, INDIA

Acknowledgement

It gives me immense pleasure to express my heartfelt gratitude and indebtedness to
my supervisor Prof. Jaydip Sen, Professor & Head, School of Computing and
Analytics, NSHM Knowledge Campus, Kolkata, India. Prof. Sen, despite his
extremely busy academic and administrative schedule, always found time to
provide me with his valuable guidance, advice, and suggestions, whenever I
approached him for help. I also acknowledge with thanks the kind cooperation that
I received from the support staff in the High-Performance Computing Lab in the
School of Computing and Analytics, NSHM Knowledge Campus, Kolkata. The
cooperation and support that I received from the staff in executing some of the
heavy-duty programs in my work have been invaluable. I would also like to thank
all my classmates and friends for their help and constructive criticisms that were
instrumental in improving the quality of the work in this dissertation. Last, but not
least, I would like to thank my family members without whose constant
encouragement, support, and cooperation, this work could never have been
possible.

Sidra Mehtab

(Reg No: 182341810028 / Roll No: 23458118001)


MSc Data Science & Analytics (2018 – 2020)
NSHM Knowledge Campus,
Kolkata, India

Abstract

Predicting the future movement of stock prices has always been a challenging task
for researchers. While advocates of the efficient market hypothesis (EMH)
believe that it is impossible to design any predictive framework that can accurately
predict the movement of stock prices, there is seminal work in the literature that
clearly demonstrates that the seemingly random movement patterns in the time
series of a stock price can be predicted with a high level of accuracy. The design of
such predictive models requires the choice of appropriate variables, the right
transformation methods for the variables, and tuning of the parameters of the
models. In this dissertation, I present a robust and accurate framework of stock
price prediction that consists of an agglomeration of statistical, machine learning,
and deep learning models. I have used daily stock price data, collected at
five-minute intervals, of a very well-known company that is listed on the National
Stock Exchange (NSE) of India. The granular data is aggregated into three slots in
a day, and the aggregated data is used for training and building the forecasting
models. We contend that the agglomerative approach of model building, which uses a
combination of statistical, machine learning, and deep learning approaches, can
very effectively learn from the volatile and random movement patterns in stock
price data. This effective learning leads to robustly trained models that can be
deployed for short-term forecasting of stock prices and prediction of stock
movement patterns. We build eight classification and eight
regression models based on statistical and machine learning approaches. In addition
to these models, two deep learning-based regression models using a long- and
short-term memory (LSTM) network and a convolutional neural network (CNN) have
also been built. Extensive results are presented on the performance of these
models, and the results are critically analyzed. We have also identified some
interesting directions for future work.

Table of Contents
Chapter No. Description
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Methodology
4 Machine Learning Models
  Classification Models
    1. Logistic Regression
    2. K Nearest Neighbor
    3. Decision Tree
    4. Bagging
    5. Boosting
    6. Random Forest
    7. Artificial Neural Network
    8. Support Vector Machine
  Regression Models
    1. Multivariate Regression
    2. Multivariate Adaptive Regression Spline
    3. Decision Tree
    4. Bagging
    5. Boosting
    6. Random Forest
    7. Artificial Neural Network
    8. Support Vector Machines
5 Deep Learning Models
    1. Long- and Short-Term Memory Network
    2. Convolutional Neural Network
6 Performance Results and Analysis
  Performance Metrics
    1. Sensitivity
    2. Specificity
    3. Positive Predictive Value
    4. Negative Predictive Value
    5. Classification Accuracy
    6. F1 Score
  Results of Machine Learning Classification Models
    1. Logistic Regression Results
    2. K-Nearest Neighbor Classification Results
    3. Decision Tree Classification Results
    4. Bagging Classification Results
    5. Boosting Classification Results
    6. Random Forest Classification Results
    7. Artificial Neural Network Classification Results
    8. Support Vector Machine Classification Results
  Results of Machine Learning Regression Models
    1. Multivariate Regression Results
    2. Multivariate Adaptive Regression Spline Results
    3. Decision Tree Regression Results
    4. Bagging Regression Results
    5. Boosting Regression Results
    6. Random Forest Regression Results
    7. Artificial Neural Network Regression Results
    8. Support Vector Machine Regression Results
    9. Long- and Short-Term Memory Network Regression Results
    10. Convolutional Neural Network Regression Results
  Summary of Performance Results of the Models
7 Conclusion and Future Work
References

List of Figures
Fig No Description of Figure
1(a) Logistic Regression – actual vs predicted probabilities of open_perc (Case I)
1(b) Logistic Regression for classification – lift curve (Case I)
1(c) Logistic Regression for classification – ROC curve (Case I)
2(a) Logistic Regression – actual vs predicted probabilities of open_perc (Case II)
2(b) Logistic Regression for classification – lift curve (Case II)
2(c) Logistic Regression for classification – ROC curve (Case II)
3(a) Logistic Regression – actual vs predicted probabilities of open_perc (Case III)
3(b) Logistic Regression for classification – lift curve (Case III)
3(c) Logistic Regression for classification – ROC curve (Case III)
4(a) Decision Tree for classification (Case I)
4(b) Decision Tree for classification (Case II)
4(c) Decision Tree for classification (Case III)
5(a) Bagging for classification – actual vs predicted classes of open_perc (Case I)
5(b) Bagging for classification – actual vs predicted classes of open_perc (Case II)
5(c) Bagging for classification – actual vs predicted classes of open_perc (Case III)
6(a) Boosting for classification – actual vs predicted classes of open_perc (Case I)
6(b) Boosting for classification – actual vs predicted classes of open_perc (Case II)
6(c) Boosting for classification – actual vs predicted classes of open_perc (Case III)
7(a) ANN classification model (Case I)
7(b) ANN classification – actual vs predicted classes of open_perc (Case I)
8(a) ANN classification model (Case II)
8(b) ANN classification – actual vs predicted classes of open_perc (Case II)
9(a) ANN classification model (Case III)
9(b) ANN classification – actual vs predicted classes of open_perc (Case III)
10(a) Multivariate Regression – time-varying actual and predicted values of open_perc (Case I)

10(b) Multivariate Regression – relationship between actual and predicted open_perc (Case I)
11(a) Multivariate Regression – time-varying actual and predicted values of open_perc (Case II)
11(b) Multivariate Regression – relationship between actual and predicted open_perc (Case II)
11(c) Multivariate Regression – time-varying residuals (Case II)
12(a) Multivariate Regression – time-varying actual and predicted open_perc (Case III)
12(b) Multivariate Regression – relationship between actual and predicted open_perc (Case III)
12(c) Multivariate Regression – time-varying residuals (Case III)
13(a) MARS – time-varying actual and predicted values of open_perc (Case I)
13(b) MARS – relationship between actual and predicted values of open_perc (Case I)
13(c) MARS – time-varying residuals (Case I)
14(a) MARS – time-varying actual and predicted values of open_perc (Case II)
14(b) MARS – relationship between actual and predicted values of open_perc (Case II)
14(c) MARS – time-varying residuals (Case II)
15(a) MARS – time-varying actual and predicted values of open_perc (Case III)
15(b) MARS – relationship between actual and predicted values of open_perc (Case III)
15(c) MARS – time-varying residuals (Case III)
16(a) Decision Tree regression model (Case I)
16(b) Decision Tree regression – time-varying actual and predicted open_perc (Case I)
16(c) Decision Tree regression – relationship between actual and predicted open_perc (Case I)
16(d) Decision Tree regression – time-varying residuals (Case I)
17(a) Decision Tree regression model (Case II)
17(b) Decision Tree regression – time-varying actual and predicted open_perc (Case II)
17(c) Decision Tree regression – relationship between actual and predicted open_perc (Case II)

17(d) Decision Tree regression – time-varying residuals (Case II)
18(a) Decision Tree regression model (Case III)
18(b) Decision Tree regression – time-varying actual and predicted open_perc (Case III)
18(c) Decision Tree – relationship between actual and predicted open_perc (Case III)
18(d) Decision Tree regression – time-varying residuals (Case III)
19(a) Bagging regression – time-varying actual and predicted values of open_perc (Case I)
19(b) Bagging regression – relationship between actual and predicted open_perc (Case I)
19(c) Bagging regression – time-varying residuals (Case I)
20(a) Bagging regression – time-varying actual and predicted values of open_perc (Case II)
20(b) Bagging regression – relationship between actual and predicted open_perc (Case II)
20(c) Bagging regression – time-varying residuals (Case II)
21(a) Bagging regression – time-varying actual and predicted values of open_perc (Case III)
21(b) Bagging regression – relationship between actual and predicted open_perc (Case III)
21(c) Bagging regression – time-varying residuals (Case III)
22(a) Boosting regression – time-varying actual and predicted values of open_perc (Case I)
22(b) Boosting regression – relationship between actual and predicted values of open_perc (Case I)
22(c) Boosting regression – time-varying residuals (Case I)
23(a) Boosting regression – time-varying actual and predicted values of open_perc (Case II)
23(b) Boosting regression – relationship between actual and predicted values of open_perc (Case II)
23(c) Boosting regression – time-varying residuals (Case II)
24(a) Boosting regression – time-varying actual and predicted values of open_perc (Case III)
24(b) Boosting regression – relationship between actual and predicted values of open_perc (Case III)
24(c) Boosting regression – time-varying residuals (Case III)

25(a) Random Forest regression – time-varying actual and predicted values of open_perc (Case I)
25(b) Random Forest – relationship between actual and predicted values of open_perc (Case I)
25(c) Random Forest regression – time-varying residuals (Case I)
26(a) Random Forest regression – time-varying actual and predicted values of open_perc (Case II)
26(b) Random Forest – relationship between actual and predicted values of open_perc (Case II)
26(c) Random Forest regression – time-varying residuals (Case II)
27(a) Random Forest regression – time-varying actual and predicted values of open_perc (Case III)
27(b) Random Forest – relationship between actual and predicted values of open_perc (Case III)
27(c) Random Forest regression – time-varying residuals (Case III)
28(a) ANN regression model (Case I)
28(b) ANN regression – time-varying actual and predicted values of open_perc (Case I)
28(c) ANN regression – relationship between actual and predicted values of open_perc (Case I)
28(d) ANN regression – time-varying residuals (Case I)
29(a) ANN regression model (Case II)
29(b) ANN regression – time-varying actual and predicted values of open_perc (Case II)
29(c) ANN regression – relationship between actual and predicted values of open_perc (Case II)
29(d) ANN regression – time-varying residuals (Case II)
30(a) ANN regression model (Case III)
30(b) ANN regression – time-varying actual and predicted values of open_perc (Case III)
30(c) ANN regression – relationship between actual and predicted values of open_perc (Case III)
30(d) ANN regression – time-varying residuals (Case III)
31(a) SVM regression – time-varying actual and predicted values of open_perc (Case I)
31(b) SVM regression – relationship between actual and predicted values of open_perc (Case I)
31(c) SVM regression – time-varying residuals (Case I)

32(a) SVM regression – time-varying actual and predicted open_perc (Case II)
32(b) SVM regression – relationship between actual and predicted values of open_perc (Case II)
32(c) SVM regression – time-varying residuals (Case II)
33(a) SVM regression – time-varying actual and predicted values of open_perc (Case III)
33(b) SVM regression – relationship between actual and predicted values of open_perc (Case III)
33(c) SVM regression – time-varying residuals (Case III)
34(a) LSTM regression – stock data representation (Case I)
34(b) LSTM model architecture (Case I, Case II and Case III)
34(c) LSTM regression – training and validation error (Case I)
35(a) LSTM regression – stock data representation (Case II)
35(b) LSTM regression – training and validation error (Case II)
36(a) LSTM regression – stock data representation (Case III)
36(b) LSTM regression – training and testing error (Case III)
37 CNN regression – stock data representation
38 CNN model architecture – Univariate multistep with one week’s data as input (N = 5)
39 CNN model architecture – Univariate multistep with two weeks’ data as input (N = 10)
40 CNN model architecture – Multivariate multistep with two weeks’ data as input (N = 10)
41 CNN model architecture – Multivariate sub-models with two weeks’ data as input (N = 10)

List of Tables
Table No Description of Table
1 Logistic regression classification results
2 KNN classification results
3 Decision Tree classification results
4 Bagging classification results
5 Boosting classification results
6 Random Forest classification results
7 ANN classification results
8 SVM classification results
9 Multivariate Regression results
10 MARS regression results
11 Decision Tree regression results
12 Bagging regression results
13 Boosting regression results
14 Random Forest regression results
15 ANN regression results
16 SVM regression results
17 CNN regression results (Case I: Univariate multi-step N=5)
18 CNN regression results (Case II: Univariate multi-step N=10)
19 CNN regression results (Case III: Multivariate multi-step N=10)
20 CNN regression results (Case IV: Multiheaded CNN N=10)
21 Summary of the performance of the classification models in Case I
22 Summary of the performance of the classification models in Case II
23 Summary of the performance of the classification models in Case III
24 Summary of the performance of the regression models in Case I
25 Summary of the performance of the regression models in Case II
26 Summary of the performance of the regression models in Case III

Chapter 1

Introduction
Prediction of future movement patterns of stock prices has been a widely researched
area in the literature. While there are proponents of the efficient market hypothesis
who believe that it is impossible to predict stock prices, there are also propositions
that demonstrated that if correctly formulated and modeled, prediction of stock
prices can be done with a fairly high level of accuracy. The latter school of thought
focused on the construction of robust statistical, econometric, and machine learning
models based on the careful choice of variables and appropriate functional forms
or models of forecasting. Several propositions in the literature forecast future
stock values using time series analysis and decomposition (Sen & Datta
Chaudhuri, 2018a; Sen, 2018b; Sen, 2018c; Sen, 2018d; Sen & Datta Chaudhuri,
2017a; Sen & Datta Chaudhuri, 2017b; Sen, 2017c; Sen, 2017d; Sen & Datta
Chaudhuri, 2017e; Sen & Datta Chaudhuri, 2016a; Sen & Datta Chaudhuri, 2016b;
Sen & Datta Chaudhuri, 2016c; Sen & Datta Chaudhuri, 2016d; Sen & Datta
Chaudhuri, 2015). There is also an extensive body of literature that deals with
technical analysis of stock price movements. Propositions also exist for mining
stock price patterns using various important indicators like Bollinger Bands,
moving average convergence divergence (MACD), relative strength index (RSI),
moving average (MA), stochastic momentum index (SMI), etc. There are also well-
known patterns like head and shoulders pattern, inverse head and shoulders pattern,
triangle, flag, Fibonacci fan, Andrew's Pitchfork, etc., which are exploited by
traders for investing intelligently in the stock market. These approaches provide the
user with visual manifestations of the indicators which help the ordinary investors
to understand which way stock prices are more likely to move in the near future.
In this thesis, we propose a granular approach to forecasting of stock price and the
price movement pattern by combining several statistical, machine learning, and
deep learning methods of prediction on technical analysis of stock prices. We
present several approaches for short-term stock price movement forecasting using
various classification and regression techniques and compare their performance in
prediction of stock price movement and stock price values. We believe this
approach will provide useful information to investors in the stock market
who are particularly interested in short-term investments for profit. This work is a
modified and extended version of our previous work (Mehtab & Sen, 2019). In the
present work, we present a predictive framework that aggregates eight
classification and eight regression models, together with a long- and short-term
memory (LSTM)-based advanced deep learning model and four variants of
convolutional neural network (CNN)-based forecasting models.
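
To make the technical indicators named above more concrete, the following is a minimal sketch of three of them: a simple moving average, Bollinger Bands, and MACD. It is an illustration only and not the code used in this work. The function names are hypothetical, the default parameters (a 20-period window with 2 standard deviations for the bands, and 12/26/9 periods for MACD) are commonly used conventions assumed here, and the exponential moving average is seeded with the first observation.

```python
from statistics import mean, pstdev


def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i + 1 - window:i + 1]))
    return out


def bollinger_bands(values, window=20, k=2.0):
    """Return (lower, middle, upper): the SMA minus/plus k population std devs."""
    lower, middle, upper = [], [], []
    for i in range(len(values)):
        if i + 1 < window:
            lower.append(None)
            middle.append(None)
            upper.append(None)
            continue
        chunk = values[i + 1 - window:i + 1]
        m, s = mean(chunk), pstdev(chunk)
        lower.append(m - k * s)
        middle.append(m)
        upper.append(m + k * s)
    return lower, middle, upper


def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # assumption: seed with the first observation
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(values, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line (EMA of the MACD line)."""
    macd_line = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    return macd_line, ema(macd_line, signal)
```

On a steadily rising price series the fast EMA stays above the slow EMA, so the MACD line is positive; on a flat series the three Bollinger Bands coincide.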

The objective of our work is to take stock price data at five-minute intervals from
the National Stock Exchange (NSE) of India and develop a robust forecasting
framework for the stock price movement. We contend that such a granular approach
can model the inherent dynamics and can be fine-tuned for immediate forecasting
of stock price or stock price movement. Here, we are not addressing the problem of
forecasting the long-term movement of the stock price. Rather, our framework is
more relevant to short-term, trade-oriented decisions.
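
As an illustration of how five-minute records can be reduced to a few intraday slots, as the abstract describes, the sketch below groups bars into three slots per day. It rests on stated assumptions: the slot boundaries in `SLOTS`, the tuple layout of a bar, and the aggregation rule (first open, highest high, lowest low, last close, total volume) are all hypothetical choices made for this example, since this chapter does not specify them.

```python
from collections import defaultdict
from datetime import datetime, time

# Hypothetical slot boundaries (assumption; not specified in this chapter).
SLOTS = [
    ("morning", time(9, 0), time(11, 30)),
    ("afternoon", time(11, 30), time(13, 30)),
    ("evening", time(13, 30), time(16, 0)),
]


def slot_of(ts):
    """Return the slot name for a timestamp, or None if it falls outside all slots."""
    t = ts.time()
    for name, start, end in SLOTS:
        if start <= t < end:
            return name
    return None


def aggregate(bars):
    """Aggregate (timestamp, open, high, low, close, volume) bars per (date, slot).

    Each output record keeps the first open, highest high, lowest low,
    last close, and total volume of the bars that fall in that slot.
    """
    groups = defaultdict(list)
    for bar in sorted(bars, key=lambda b: b[0]):
        name = slot_of(bar[0])
        if name is not None:
            groups[(bar[0].date(), name)].append(bar)
    out = {}
    for key, rows in groups.items():
        out[key] = {
            "open": rows[0][1],
            "high": max(r[2] for r in rows),
            "low": min(r[3] for r in rows),
            "close": rows[-1][4],
            "volume": sum(r[5] for r in rows),
        }
    return out
```

Each trading day then contributes at most three records, one per slot, which is the granularity at which the forecasting models would be trained under this assumed scheme.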

The rest of the thesis is organized as follows. Chapter 2 presents a comprehensive
review of the literature on stock price movement modelling and prediction. In
Chapter 3, we present a detailed discussion on the methodology that we have
followed in this work. Chapter 4 provides a brief discussion on the working
principles of the classification and the regression models in machine learning that
we have used in this work. In Chapter 5, we provide a summary of the two families
of deep learning-based models that we have also used in our predictive framework:
an LSTM-based regression model and four variants of CNN-based forecasting
models. Chapter 6 presents a detailed discussion of the performance of the machine
learning and deep learning models. A comparative analysis of the performances of
the models is also presented in this chapter. Finally, Chapter 7 concludes the thesis.
Chapter 2

Related Work

The literature attempting to prove or disprove the efficient market hypothesis can
be classified into three strands, according to the choice of variables and techniques
of estimation and forecasting. The first strand consists of studies using simple
regression techniques on cross-sectional data (Basu, 1983; Jaffe et al., 1989;
Rosenberg et al., 1985; Fama & French, 1995; Chui & Wei, 1998). The second
strand of the literature has used time series models and econometric tools such as
autoregressive integrated moving average (ARIMA), the Granger causality test,
autoregressive distributed lag (ARDL), and quantile regression (QR) to forecast
stock returns and stock prices (Jarrett & Kyper, 2011; Adebiyi
et al., 2014; Mondal et al., 2014; Mishra, 2016). The third strand includes work
using machine learning tools for the prediction of stock returns (Mostafa, 2010;
Dutta et al., 2006; Wu et al., 2008; Siddiqui & Abdullah, 2015; Jaruszewicz &
Mandziuk, 2004).

Among the recent propositions in the literature on stock price
prediction, Mehtab and Sen have demonstrated how machine learning and long-
and short-term memory (LSTM)-based deep learning networks can be used for
accurately forecasting NIFTY 50 stock price movements in the National Stock
Exchange (NSE) of India (Mehtab & Sen, 2019). The authors used the daily stock
prices for three years during the period of January 2015 till December 2017 for
building the predictive models. The forecast accuracies of the models were then
evaluated based on their ability to predict the movement patterns of the close value
of the NIFTY index on a time horizon of one week. For the purpose of testing, the
authors used NIFTY 50 index values for the period of January 2018 till June 2019.
To further improve the predictive power of the models, the authors incorporated a
sentiment analysis module for analyzing the public sentiments on Twitter on
NIFTY 50 stocks. The output of the sentiment analysis module is fed into the
predictive model in addition to the past NIFTY 50 index values for building a
very robust and accurate forecasting model. The sentiment analysis module uses a
self-organizing fuzzy neural network (SOFNN) for handling non-linearity in a
multivariate predictive environment.

Mehtab and Sen recently proposed another approach to stock price and movement
prediction using convolutional neural networks (CNN) on a multivariate time series
(Mehtab & Sen, 2020). The predictive model proposed by the authors combines the
learning ability of a CNN with walk-forward validation to achieve a high level of
accuracy in forecasting future NIFTY index values and their movement patterns.
Three different architectures of CNN are proposed by the
authors that differ in the number of variables used in forecasting, the number of
sub-models used in the overall system, and the size of the input data for training the
models. The experimental results clearly indicated that the CNN-based multivariate
forecasting model was highly accurate in predicting the movement of NIFTY index
values with a weekly forecast horizon.
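
Walk-forward validation, mentioned above, can be sketched as follows. This is a generic illustration under stated assumptions and not the implementation from the cited work: the forecaster is a deliberately naive stand-in (it repeats the mean of the most recent observations) where a trained CNN would sit, and the function names and the choice of RMSE as the error measure are hypothetical.

```python
from math import sqrt


def naive_weekly_forecast(history, horizon):
    """Stand-in model: repeat the mean of the last `horizon` observations."""
    recent = history[-horizon:]
    avg = sum(recent) / len(recent)
    return [avg] * horizon


def walk_forward(series, n_train, horizon, forecast=naive_weekly_forecast):
    """Forecast `horizon` steps at a time, then fold the actual values into history.

    Returns the RMSE over all forecast points in the test segment.
    """
    history = list(series[:n_train])
    squared_errors = []
    for start in range(n_train, len(series) - horizon + 1, horizon):
        actual = series[start:start + horizon]
        predicted = forecast(history, horizon)
        squared_errors.extend((a - p) ** 2 for a, p in zip(actual, predicted))
        history.extend(actual)  # the forecaster only ever sees past data
    return sqrt(sum(squared_errors) / len(squared_errors))
```

With `horizon=5`, each iteration corresponds to forecasting one trading week and then revealing that week's actual values before the next forecast, so no future observation leaks into any prediction.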

The design of efficient predictive models and algorithms for accurately forecasting
the movement patterns of stock prices and stock returns has attracted considerable
attention and effort from the research community over a significantly long period.
Many such propositions involve the application of various types of neural
networks. Neural networks can model nonlinearity in data, a property that has
proven extremely effective in mining the complex patterns in stock price
movements. Moreover, this ability to model nonlinearity can be controlled
adaptively by choosing a suitable number of hidden layers and a suitable number
of nodes in each hidden layer (Hornik et al., 1989).

Mostafa showed how accurately neural network-based models could predict stock
market movements in Kuwait (Mostafa, 2010).

Kimoto et al. illustrated how neural network-based predictive models could be
applied to historical accounting data (Kimoto et al., 1990). In the model
construction process, the authors utilized various macroeconomic variables, and
then applied the model for forecasting the patterns of variations in stock return
movements.

Zhang et al. proposed the application of a multilayer backpropagation (BP) neural
network in financial data mining (Zhang et al., 2004). The proposed scheme was a
modified neural network-based forecasting model that carries out intelligent mining
tasks. The system was capable of robustly forecasting buy and sell signals
according to the predicted future trends in the stock market. The
simulation results on seven years of data of the Shanghai composite index indicated
that the return achieved by the system was about three times that achieved by the
buy-and-hold strategy.

Basalto et al. proposed an approach based on pairwise clustering to analyze the
Dow Jones Index companies in order to identify similar temporal behavior of the
traded stock prices (Basalto et al., 2005). The main goal of the authors was to
investigate and understand the dynamics that govern companies’ stock prices. The
proposed scheme deployed a pairwise version of the chaotic map algorithm that
used the correlation coefficients between the financial time series as
similarity measures for clustering the temporal patterns. The resultant dynamics of
such systems formed clusters of companies that belong to different industrial
branches. These clusters of companies can be gainfully exploited to optimize
portfolio construction.

Chen et al. proposed an approach for constructing a model for predicting the
direction of return on the Taiwan Stock Exchange Index (Chen et al., 2003). The
authors contended that stock trading guided by robust forecasting models was
more effective and usually led to a higher return on investment. For the purpose of
constructing a robust forecasting model, the authors built and trained a probabilistic
neural network (PNN) using historical stock market data. The forecasted output of
the model was applied to form various index trading strategies, and the
effectiveness of those strategies was compared with those generated by the
buy-and-hold strategy, the investment strategies formed using the output of a
random walk model, and the parametric generalized method of moments (GMM) with a
Kalman filter. The results showed that the investment strategies made using the
output of the PNN yielded the highest return on investment in the long run.

de Faria et al. illustrated a predictive model using a neural network and an adaptive
exponential smoothing (AES) method for forecasting the movements of the
principal index of the Brazilian stock market (de Faria et al., 2009). The authors
compared the forecasting performance of both the neural network and the
exponential smoothing models with a particular focus on the sign of the market
returns. While the simulation results showed that both methods were equally
efficient in predicting the index returns, the neural network model was found to be
more accurate in predicting the market movement than the adaptive exponential
smoothing method.
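
The comparison above turns on two ideas: a smoothing forecast and the accuracy of the predicted sign. The sketch below illustrates both in their simplest form. It uses plain (non-adaptive) simple exponential smoothing with a fixed `alpha`, which is a simplification assumed here and not the adaptive method of the cited study; the function names are hypothetical.

```python
def ses_forecasts(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the prediction of series[t] made from series[:t];
    the first forecast is seeded with the first observation.
    """
    forecasts = [series[0]]
    for value in series[:-1]:
        forecasts.append(alpha * value + (1 - alpha) * forecasts[-1])
    return forecasts


def sign_hit_rate(actual_returns, forecast_returns):
    """Fraction of periods in which forecast and actual returns share the same sign."""
    hits = sum(
        1
        for r, f in zip(actual_returns, forecast_returns)
        if (r > 0) == (f > 0)
    )
    return hits / len(actual_returns)
```

Two models can have similar error on the level of returns and still differ in `sign_hit_rate`, which is the distinction the study draws between predicting index returns and predicting market direction.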

Leigh et al. proposed the use of linear regression and simple neural network models
for forecasting the stock market indices in the New York Stock Exchange during
the period 1981-1999 (Leigh et al., 2005). The proposed scheme by the authors
used a template matching mechanism based on statistical pattern recognition that
efficiently and accurately identified spikes in the trading volumes. A threshold limit
for the spike in volume was identified, and the days on which the traded volume
exhibited significant spikes were identified. A linear regression model was applied
to forecast the future change in price based on the historical price, traded volume,
and the prime interest rate.

Shen et al. proposed a novel scheme that was based on a tapped delay neural
network (TDNN) with an ability of adaptive learning and pruning for forecasting
on a non-linear time series of stock price values (Shen et al., 2007). The TDNN
model was trained by a recursive least square (RLS) technique that involved a
tunable learning-rate parameter that enables faster network convergence. The
trained neural network model was optimized using a pruning algorithm that
reduced the possibility of overfitting of the model. The experimental results in a
simulated environment clearly showed that the pruned model had a reduced
complexity, faster execution, and improved prediction accuracy.

Ning et al. proposed a scheme of stock index prediction that was based on a chaotic
neural network (Ning et al., 2009). Data from a Chinese stock market and a
Shenzhen stock market were used for building the model. The non-linear,
stochastic, and chaotic patterns in the stock market indices were learned by the
chaotic neural network, and the learnings of the chaotic neural network were
gainfully applied in forecasting future index values of the stock markets.

Hanias et al. conducted a study to predict the daily stock exchange price index of
the Athens Stock Exchange (ASE) using a neural network with backpropagation
(Hanias et al., 2012). The neural network was used to make multistep forecasting
for nine days and yielded a very low mean square error (MSE) value of 0.0024.

Wu et al. proposed an ensemble model of prediction using support vector machines
(SVM) and artificial neural networks (ANN) for predicting stock prices (Wu et al.,
2008). The forecasting performance of the ensemble model was compared with
those of the SVM model and the ANN model. It was observed by the authors that
the ensemble approach produced more accurate results than the other two models.

Liao et al. carried out a study on the stock market investment issues on the Taiwan
stock market (Liao et al., 2008). The scheme involved two phases. In the first phase,
the apriori algorithm was used to identify the association rules and knowledge
patterns about stock category association and possible stock category investment
collections. After the association rules were successfully mined, in the second
phase, the k-means clustering algorithm was used to identify the various clusters of
stocks based on their association patterns. The authors also proposed several
possible stock market portfolio alternatives under various clusters of stocks.

Zhu et al. hypothesized that there is a significant bidirectional nonlinear causality
between stock returns and trading volumes (Zhu et al., 2008). The authors proposed
the use of a neural network-based scheme for forecasting stock index movements.
The model was further enriched by the inclusion of different combinations of
indices and component stocks’ trading volumes as inputs. NASDAQ, DJIA, and
STI data of stock prices and volume of transactions were used in training the neural
network. The experimental results demonstrated that the augmented neural
networks with trading volumes led to improvements in forecasting performance
under different terms of the forecasting horizon.

Bentes et al. presented a study on the long memory and volatility clustering for the
S&P 500, NASDAQ 100, and Stoxx 50 indexes in order to compare the US and
European markets (Bentes et al., 2008). The authors compared the performance of
two different approaches. The first approach was based on the traditional
generalized autoregressive conditional heteroscedasticity (GARCH) family of
models, namely GARCH(1, 1), IGARCH(1, 1), and FIGARCH(1, d, 1), while the second
approach exploited the concept of entropy from econophysics. In the second
approach, three different entropy measures were considered by the authors: the
Shannon, Renyi, and Tsallis measures. The results obtained using both
approaches indicated the existence of nonlinearity and volatility in the S&P 500,
NASDAQ 100, and Stoxx 50 indexes.

Chen et al. demonstrated how the random and chaotic behavior of stock price
movements can be very effectively modeled using a local linear wavelet neural
network (LLWNN) technique (Chen et al., 2005). The proposed wavelet-based
model was further optimized using a novel algorithm, which the authors referred to
as estimation of distribution algorithm (EDA). The purpose of the model was to
accurately predict the share price for the following trade day given the opening,
closing, and maximum values of the stock price for a particular day. The study
revealed an interesting observation - even for a time series that exhibited an
extremely high level of random fluctuations in its values, the model could extract
some very important features from the opening, closing and the maximum values
of the stock index that enabled an accurate prediction of its future behavior.

Hutchinson et al. proposed a non-parametric method for estimating the pricing
formula of a derivative that applied the network principles of learning (Hutchinson
et al., 1994). The variables that were used as the input to the model were: the present
fundamental asset price, the strike price, the time to maturity, etc. These variables
had a direct influence on the derivative price. The learning network mapped the
input values to its output values. For training the model, the authors used a dataset
consisting of the daily closing prices of S&P 500 futures and the options prices for
the 5-year period from January 1987 to December 1991. For the purpose of
understanding the efficacy and the efficiency of various models, the authors
compared the performance of four models: (i) ordinary least squares (OLS), (ii)
radial basis function (RBF) networks, (iii) multilayer feed-forward neural
networks, and (iv) the projection pursuit (PP). The simulation results showed that
among the four models, the non-parametric model proposed by the authors yielded
the most accurate forecasts on the derivative prices.

Dutta et al. illustrated how ANN models could be applied in forecasting Bombay
Stock Exchange’s SENSEX weekly closing values for the period of January 2002
to December 2003 (Dutta et al., 2006). The approach proposed by the authors
involved building two neural networks each consisting of three hidden layers, in
addition to the input and the output layers. The input values to the first neural
network were: (i) the weekly closing values, (ii) the 52-week moving average of
the weekly closing SENSEX values, (iii) the 5-week moving average of the closing
values, and (iv) the 10-week oscillator values for the past 200 weeks. On the other
hand, the second network was provided with the following input values: (i) weekly
closing value of SENSEX, (ii) the moving average of the weekly closing values
computed on the 52-week historical data, (iii) the moving average of the closing
values computed on the 5-week historical data, and (iv) the volatility of the
SENSEX records computed on 5-week basis over the past 200 weeks. The
forecasting performance of the two neural networks was compared using their root
mean square error (RMSE) and mean absolute error (MAE) values on the test data.
For the purpose of testing the networks, the weekly closing SENSEX values for the
period of January 2002 to December 2003 were used.

Hammad et al. demonstrated that an artificial neural network (ANN) model can be
trained to converge to an optimal solution while it maintains a very high level of
precision in the forecasting of stock prices (Hammad et al., 2009). The proposed
scheme was based on a multi-layer feedforward neural network model that used the
back-propagation algorithm. The model was used for forecasting the Jordanian
stock prices. The authors demonstrated simulations using MATLAB that were
carried on seven Jordanian companies from the service and manufacturing sectors.
The accuracy of the model in forecasting stock price movement was found to be
very high.

Tsai and Wang conducted a study to illustrate how Bayesian Network-based
approaches could produce better forecasting results than traditional regression and
neural network-based approaches (Tsai & Wang, 2009). The authors proposed a
hybrid predictive model for stock price forecasting that combined a neural network-
based model with a decision-tree. The experimental results demonstrated that the
hybrid model had higher predictive power than the single ANN and the single
decision tree-based approach.

Tseng et al. utilized various approaches including the traditional time series
decomposition (TSD) model, Holt-Winters (H/W) exponential smoothing with trend
and seasonality models, Box-Jenkins (B/J) models using autocorrelation and partial
autocorrelation, and neural network-based models (Tseng et al., 2012). The authors
trained the models on the stock price data of 50 randomly chosen stocks during the
period September 1, 1998 - December 31, 2010. For the purpose of training the
models, 3105 observations based on closing prices of the stocks were used. The
testing of the model was carried out on data spanning over 60 trading days. The
study showed that the forecasting accuracies were higher for B/J, H/W, and
normalized neural network models. The errors associated with the time series
decomposition-based model and the non-normalized neural network models were
found to be higher.

Senol and Ozturan illustrated that ANN can be used to predict stock prices and their
direction of changes (Senol & Ozturan, 2008). The result was promising with a
forecast accuracy of 81% on the average.

In the literature, a substantial number of contributions exist that are based on the
application of time series and fuzzy time series approaches for forecasting stock
price movements. Thenmozhi investigated the applicability of chaos theory in
modeling the nonlinear behavior of the Bombay Stock Exchange (BSE) time series
(Thenmozhi, 2006). The author used the return values of the BSE SENSEX time-
series data during the period August 1980 to September 1997, and showed that the
time series of the daily and the weekly return values exhibited nonlinearity and
weakly chaotic properties.

Fu et al. presented an approach that represented the data points in a financial time
series according to their importance (Fu et al., 2007). Using the ranked data points
based on their importance, a tree was constructed that enabled incremental updating
of data in the time series. The scheme facilitated representation of a large-sized time
series in different levels of details, and also enabled multi-resolution dimensionality
reduction. The authors presented several evaluation methods of data point
importance, a novel method of updating a time series, and two dimensionality
reduction approaches. Extensive experimental results were also presented
demonstrating the effectiveness of all propositions.

Phua et al. presented a predictive model using neural networks with genetic
algorithms for forecasting stock price movements in the Singapore Stock Exchange
(Phua et al., 2001). The forecasting accuracy of the predictive model was found to
be 81% on the test dataset indicating that the model was moderately effective in its
forecasting job.

Moshiri and Cameron described a back propagation-based neural network and a set
of econometric models to forecast inflation levels (Moshiri & Cameron, 2010). The
set of econometric models proposed by the authors included the following: (i) Box-
Jenkins autoregressive integrated moving average (ARIMA) model, (ii) vector
autoregression (VAR) model, and (iii) Bayesian vector autoregression (BVAR)
model. The forecasting accuracies of the three models were compared with the
hybrid back propagation network (BPN) model proposed by the authors. For the
purpose of testing the models, three different values of the forecasting horizon were
used: one month, two months, and twelve months. With the root mean square error
(RMSE) and the mean absolute error (MAE) as the two metrics, the authors
observed that the performance of the hybrid BPN was superior to the other
econometric models.

The major drawback of the existing propositions in literature for stock price
prediction is their inability to predict stock price movement in a short-term interval.
The current work attempts to address this shortcoming by exploiting the learning
ability of a gamut of machine learning models and two deep neural networks in
stock price movement modeling and prediction.

Chapter 3

Methodology

In Chapter 1, we mentioned that the goal of this work is to develop a robust
forecasting framework for the short-term price movement of stocks. We use the
Metastock tool for collecting data on the short-term price movement of stocks
(Metastock). In particular, we collected the stock data for the company Godrej
Consumer Products Ltd. The data was collected at 5-minute intervals for all the
days on which the National Stock Exchange (NSE) was operational during
the years 2013 and 2014. The raw data for the stock consisted of the following
variables: (i) date, (ii) time, (iii) open value of the stock, (iv) high value of the stock,
(v) low value of the stock, (vi) close value of the stock, and (vii) volume of the stock
traded in a given interval. The variable time refers to the time instant at which the
stock values were recorded. Since each record was collected at a 5-minute interval,
the time between two successive records in the raw data was 5 minutes. The raw
data in this format was collected for the stock of Godrej Consumer Products Ltd.
for two years. In addition to the seven variables in the raw data that we
have mentioned above, we also collected the NIFTY index at 5-minute intervals for
the same period of two years, in order to capture the overall market sentiment at
each time instant, so that more accurate and robust forecasts can be made using
the combined information of historical stock prices and the market sentiment index.
Therefore, the raw data now consists of eight variables. As a 5-minute
interval is too granular, we aggregate the raw data. We
break the total time interval in a day into three slots as follows: (1) morning slot
that covers the time interval 9:00 AM till 11:30 AM, (2) afternoon slot that covers
the time interval 11:35 AM till 1:30 PM and (3) evening slot that covers the time
interval 1:35 PM till the time of closure of NSE in a given day. Hence, the daily
stock information now consists of three records, each record containing stock price
information for a time slot.
Using the eight variables in the raw data, and incorporating the aggregation of data
using the time slots, we create eleven derived variables and compute their values.
These derived variables are used as the input variables for building the predictive
models for forecasting the stock price and the stock movement. We followed two
approaches to stock price forecasting - regression and classification. The difference
between these two approaches lies in the way the response variable open_perc was used
in the model building process. This point will be described in detail later in this
Chapter.

Following are the eleven derived variables that were computed:

month: This is a numeric variable that refers to the month for a given stock price
record. The twelve months are assigned numeric codes of 1 through 12, with the
month of January being coded as 1, and the month of December assigned with a
code of 12.

day_month: This numeric variable denotes the particular day of a given month to
which a stock price record corresponds. The value of this variable lies in the interval
[1, 31]. For instance, if the date for a stock price record is 22nd May 2013 then the
day_month variable for that record will be assigned a value of 22.

day_week: This is a numeric variable that corresponds to the day of the week for a
given stock price record. The five days in a week on which the stock market remains
open are assigned numeric codes of 1 through 5, with Monday being coded as 1,
while the Friday is assigned a code of 5.

time: This numeric variable refers to the time slot to which a stock price record
belongs. There are three time slots in a day - morning, afternoon, and evening. The
slots are assigned codes 1, 2, and 3 respectively. For example, if a stock price record
refers to the time point 3:45 PM, the variable time will be assigned a value of 3 for
the stock price record.
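
The four calendar and slot variables can be derived directly from the timestamp of a record. The following is a minimal Python sketch for illustration only (the models in this work were built in R); the function names time_slot and calendar_features are hypothetical, and the slot boundaries follow the slot definitions given earlier in this Chapter.

```python
from datetime import datetime, time

def time_slot(t):
    """Map a clock time to slot code 1 (morning), 2 (afternoon), or 3 (evening)."""
    if t <= time(11, 30):
        return 1
    if t <= time(13, 30):
        return 2
    return 3

def calendar_features(ts):
    """Return month, day_month, day_week, and time for one timestamp."""
    return {
        "month": ts.month,            # January = 1, ..., December = 12
        "day_month": ts.day,          # value in [1, 31]
        "day_week": ts.isoweekday(),  # Monday = 1, ..., Friday = 5
        "time": time_slot(ts.time()),
    }

# A record stamped 3:45 PM on 22nd May 2013 falls in the evening slot.
features = calendar_features(datetime(2013, 5, 22, 15, 45))
print(features)
```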

open_perc: This is a numeric variable that is computed as the percentage change in the
value of the open price of the stock over two successive time slots. The computation
of the variable is done as follows. Suppose we have two successive slots, S1 and
S2, each consisting of several records at five-minute intervals. Let the
open price of the stock for the first record of S1 be X1 and that for S2 be X2. The
open_perc for the slot S2 is computed as (X2 - X1)/X2 in terms of percentage.

high_perc: it is a numeric value that is computed as the difference between the
high values of two successive slots. The computation is identical to that of
open_perc except for the fact that high values are used in this case instead of the
open values.

low_perc: it is a numeric value that is computed as the difference between the low
values of two successive slots. For two successive slots S1 and S2, first we compute
the mean of all low values of the records in both the slots. If L1 and L2 refer to the
mean of the low values for S1 and S2 respectively, then low_perc for S2 is computed
as (L2 - L1)/L2 in terms of percentage.

close_perc: it is a numeric value that is computed as the difference between the
close values of two successive slots. Its computation is similar to the open_perc
variable, except for the fact that we use the close values in the slots instead of the
open values.

vol_perc: it is a numeric value that is computed as the difference between the
volume values of two successive slots. For two successive slots S1 and S2, we
compute the mean values of volume for both the slots, say V1 and V2 respectively.
Now, the vol_perc for S2 is computed as (V2 - V1)/V2 in terms of percentage.

nifty_perc: it is a numeric variable that is computed as a percentage change in the
NIFTY index over two successive time slots. The computation of the variable is
done as follows. We compute the means of the NIFTY index values for two
successive time slots S1 and S2. Let us assume the means are M1 and M2
respectively. Then the nifty_perc for the slot S2 is computed as (M2 - M1)/M2 in
terms of percentage.

range_diff: The value of this numeric variable is obtained by computing the
difference in the range values of two consecutive time slots. The range value for a
given slot is the difference between its high and low values. If S1 and S2 denote
two consecutive slots, and if H1, L1 and H2, L2 represent the high and
the low values of the slots S1 and S2 respectively, then the range value for S1 is R1 =
(H1 - L1) and for S2 is R2 = (H2 - L2). The range_diff for the slot S2 is computed as
(R2 - R1).
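
The slot-level computations described above can be illustrated with a short Python sketch (not the R code used in this work). The numbers are made up, the helper names perc_change and mean are hypothetical, and taking the slot range as the highest high minus the lowest low is an assumption, since the text does not state how the high and low of a slot are aggregated.

```python
def perc_change(prev, curr):
    """Percentage change as defined in the text: (curr - prev) / curr * 100."""
    return (curr - prev) / curr * 100.0

def mean(values):
    return sum(values) / len(values)

# Two hypothetical consecutive slots of 5-minute records (illustrative values).
s1 = {"open": [100.0, 101.0], "high": [102.0, 103.0], "low": [99.0, 100.0]}
s2 = {"open": [104.0, 103.0], "high": [106.0, 105.0], "low": [102.0, 101.0]}

# open_perc uses the open price of the first record in each slot.
open_perc = perc_change(s1["open"][0], s2["open"][0])

# low_perc uses the mean of the low values in each slot.
low_perc = perc_change(mean(s1["low"]), mean(s2["low"]))

# range_diff: slot range assumed here to be max(high) - min(low).
r1 = max(s1["high"]) - min(s1["low"])
r2 = max(s2["high"]) - min(s2["low"])
range_diff = r2 - r1

print(open_perc, low_perc, range_diff)
```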

After we compute the values of the above eleven variables for each slot of the
stock for the time frame of two years (i.e., 2013 and 2014), we develop the
forecasting framework. As mentioned earlier, we followed two broad approaches
in the forecasting of the stock movements - regression and classification.

In the regression approach, based on the historical movement of the stock prices we
predict the stock price in the next slot. We use open_perc as the response variable,
which is a continuous numeric variable. The objective of the regression technique
is to predict the open_perc value of the next slot given the stock movement pattern
and the values of the predictors till the previous slot. In other words, if the current
time slot is S1, the regression techniques will attempt to predict open_perc for the
next slot S2. If the predicted open_perc is positive, then it will indicate that there is
an expected rise in the stock price in S2, while a negative open_perc will indicate a
fall in the stock price in the next slot. Based on the predicted values, a potential
investor can make his/her investment strategy in stocks.

In the classification approach, the response variable open_perc is a discrete variable
belonging to one of two classes: “0” or “1”. For developing the classification-based
forecasting approaches, we converted open_perc into a categorical variable
that takes one of the two values “0” and “1”. The value “0” indicates a negative
open_perc value, and “1” indicates a positive open_perc value. Hence, if the
current slot is S1 and the forecast model expects a rise in the open_perc value in
the next slot S2, then the open_perc value for S2 will be “1”. An expected negative
value of open_perc in the next slot will be indicated by a “0” value for the
response variable.
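
As a minimal illustration in Python (not the R code used in this work), the conversion of open_perc into a class label and the pairing of each slot with the class of the following slot could look as follows. The helper name to_class is hypothetical, a zero value is mapped to class “0” following the convention stated for the logistic regression model in Chapter 4, and the one-slot shift is only one possible reading of how the predictors and the response are aligned.

```python
def to_class(open_perc):
    """Class 1 for a strictly positive open_perc, class 0 otherwise."""
    return 1 if open_perc > 0 else 0

# Hypothetical open_perc values for four consecutive slots.
open_perc = [0.8, -0.3, 0.0, 1.2]
labels = [to_class(v) for v in open_perc]

# Pair the value observed in slot t with the class of slot t + 1.
pairs = list(zip(open_perc[:-1], labels[1:]))
print(labels, pairs)
```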

For both classification and regression approaches, we experimented with three
cases which are described below.

Case I: We used the data for the year 2013 which consisted of 19,385 records at
five minutes interval. These records were aggregated into 745 time slot records for
building the predictive model. We used the same dataset for testing the forecast
accuracy of the models for the stock of Godrej Consumer Products Ltd. and carried
out a comparative analysis of all the models.

Case II: We used the data for the year 2014 which consisted of 18,972 records at
five minutes interval. These granular data were aggregated into 725 time slot records
for building the predictive model. We used the same dataset for testing the forecast
accuracy of the models and carried out an analysis on the performance of the
predictive models.

Case III: We used the data for 2013 as the training dataset for building the models
and tested the models using the data for the year 2014 as the test dataset. We, again,
carried out an analysis of the performance of different models in this approach.

We have built eight classification models and ten regression models for developing
our forecasting framework. The classification models are: (i) logistic regression,
(ii) k-nearest neighbor (iii) decision tree, (iv) bagging, (v) boosting, (vi) random
forest, (vii) artificial neural network, and (viii) support vector machines. For
measuring accuracy and effectiveness in these approaches, we use several metrics
such as: sensitivity, specificity, positive predictive value, negative predictive value,
classification accuracy, and F1 score. Sensitivity and positive predictive value are
also known as recall and precision respectively.
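
The six classification metrics follow from the four cells of the confusion matrix. The sketch below is an illustrative Python version (the results in this work were computed in R); the function name classification_metrics and the sample vectors are hypothetical, and the sketch assumes that every denominator is non-zero.

```python
def classification_metrics(actual, predicted):
    """Compute the six metrics from binary actual and predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1,
    }

actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```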
The ten regression methods that we built are: (i) multivariate regression, (ii)
multivariate adaptive regression spline, (iii) decision tree, (iv) bagging, (v)
boosting, (vi) random forest, (vii) artificial neural network, (viii) support vector
machine, (ix) long short-term memory network, and (x) convolutional neural
network.

While all the classification techniques are machine learning-based approaches, two
regression techniques, i.e., long short-term memory (LSTM) network, and
convolutional neural network (CNN)-based approaches, are deep learning
methods. For comparing the performance of the regression methods, we use several
metrics such as root mean square error (RMSE), correlation coefficient between
the actual and predicted values of the response variable, i.e., open_perc, and the
number of cases in which the predicted and the actual values of open_perc differed
in their signs.
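
The three regression metrics can be sketched in Python as follows (an illustration only, not the R code used in this work). The function names and sample values are hypothetical, the correlation is taken to be the Pearson coefficient, and a sign mismatch is counted only when the actual and the predicted values have strictly opposite signs; both are assumptions.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def sign_mismatches(actual, predicted):
    """Count the cases where actual and predicted values have opposite signs."""
    return sum(1 for a, p in zip(actual, predicted) if a * p < 0)

actual = [1.0, -2.0, 3.0, -4.0]
predicted = [2.0, -1.0, -3.0, -4.0]
print(rmse(actual, predicted), pearson(actual, predicted),
      sign_mismatches(actual, predicted))
```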

Chapter 4

Machine Learning Models

The eight classification models that we built are discussed in detail in this Chapter.

Logistic Regression: This being a classification technique, we transformed the
response variable open_perc to a discrete domain from a continuous domain. In
other words, we transformed the response variable into a categorical variable that
can assume values “0” or “1”. We converted all negative or zero values of
open_perc to the class “0”, and all non-zero positive values to class “1”. We used
the function glm in R for building the logistic regression model with three
parameters being passed in the function: (i) the first parameter is the formula which
is open_perc ~. to include open_perc as the response variable and all the remaining
variables as the predictors, (ii) the second parameter is family = binomial indicating
that the model is a binary logistic regression that involves two classes, and (iii) the third
parameter is the R data object containing the training data set. We used the predict
function in R to compute the probability of the test records belonging to the two
classes. We assumed a threshold value of 0.5 as the probability. In other words,
when the probability of a record belonging to a class exceeds 0.5, we assume that
the record belongs to that class.

K-Nearest Neighbor: The K-nearest neighbor (KNN) method is an example of
instance-based learning. The classification for a new unclassified
record is found simply by comparing it to the most similar records in the
training set. The value of k determines how many of the closest records in the
training data set are considered for classifying a test record. We have used
the R function knn defined in the library class to carry out KNN classification on the
stock price data. The data is normalized using min-max normalization before
applying the knn function so that all predictors are scaled down into the same range
of values. Different values of k were tried out for building the models and the value
of k = 3 was finally chosen. This value of k was found to produce the best
performance of the models with the minimum probability of model overfitting.
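
The two steps described above, min-max normalization followed by a majority vote among the k = 3 nearest training records, can be sketched in Python as follows. This is an illustration only and not the knn function of R; the function names and the toy data are hypothetical, and Euclidean distance is assumed.

```python
import math
from collections import Counter

def fit_min_max(rows):
    """Return the per-column minimum and maximum of the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_min_max(row, lo, hi):
    """Scale one row into [0, 1] using the training minima and maxima."""
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training records closest to the query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data with two predictors on very different scales.
train_x = [[1, 100], [2, 110], [3, 105], [8, 900], [9, 950], [10, 1000]]
train_y = [0, 0, 0, 1, 1, 1]
lo, hi = fit_min_max(train_x)
scaled = [apply_min_max(r, lo, hi) for r in train_x]

pred_low = knn_predict(scaled, train_y, apply_min_max([2.5, 120], lo, hi))
pred_high = knn_predict(scaled, train_y, apply_min_max([9, 920], lo, hi))
print(pred_low, pred_high)
```

Scaling the query with the minima and maxima of the training data keeps the test record on the same scale as the records it is compared with.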

Decision Tree: The classification and regression tree (CART) algorithm produces
decision trees that are strictly binary so that there are exactly two branches for each
node. The algorithm recursively partitions the records in the training data set into
subsets of records with similar values for the target attributes. At each node, the
algorithm carries out an exhaustive search over all available
variables and all possible splitting values, and selects the optimal split based on
a goodness-of-split criterion. We used the tree function defined in the tree library
of R for classification of the stock records.
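
The exhaustive search for a single binary split can be illustrated with the Python sketch below (not the tree function of R). The use of the Gini index as the goodness-of-split criterion is an assumption made for this example, since the criterion is not named in the text, and the function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 2.0 * p1 * (1.0 - p1)

def best_split(rows, labels):
    """Search every variable and every observed value for the best binary split.

    Returns (weighted impurity, variable index, threshold); records with a
    value <= threshold go to the left branch, the rest to the right branch.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # skip splits that leave one branch empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

rows = [[1, 5], [2, 3], [3, 8], [4, 1]]
labels = [0, 0, 1, 1]
split = best_split(rows, labels)
print(split)
```

Applying the same search recursively to the two resulting subsets yields the strictly binary tree described above.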

Bagging: Bootstrap Aggregation (Bagging) is an ensemble technique. It works as
follows: Given a set D of d tuples, for iteration i, a training set Di of d tuples is
sampled with replacement from the original set of tuples D. Each training set
represents a bootstrap sample. Since the samples are simple random samples with
replacement, it is possible that some records (i.e., tuples) in D may not get a chance
to be included in Di, while some tuples may occur in it more than once.
A classifier model Mi is trained on the information contained in each training set,
Di. For classifying an unknown tuple X in the out-of-sample set (i.e., in the test
dataset), each classifier, Mi is asked to return its class predictions. The classification
result of each of the trained classifier is considered as one vote. The bagging
classifier counts the votes and finally assigns the class with the maximum number
of votes to the tuple X. For carrying out classification on stock price data, we used
the bagging function defined in the ipred library of R. The value of the parameter
nbagg, which specifies the number of bootstrap samples, was taken as 25.
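
The bootstrap-and-vote procedure can be sketched in Python as follows. This is an illustration only and not the bagging function of ipred; the base classifier is a hypothetical one-variable threshold rule chosen to keep the example short, and the function names, data, and seed are assumptions.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Fit a one-variable rule: predict 1 when x > threshold, else 0."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(xs, ys, n_bags=25, seed=0):
    """Train one stump per bootstrap sample of the same size as the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Each model casts one vote; the class with the most votes wins."""
    votes = Counter(int(x > t) for t in models)
    return votes.most_common(1)[0][0]

xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys)
print(bagging_predict(models, 3.0), bagging_predict(models, -3.0))
```

An odd number of models, such as 25, avoids tied votes in a two-class problem.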

Boosting: Unlike bagging, boosting assigns weights to each tuple in a training
dataset. Based on the training dataset, k classification models are built iteratively.
However, all the classifiers are not given equal importance in the final classification
decision. Unlike bagging which uses simple majority voting among the classifiers,
boosting uses a weighted majority voting mechanism. After a classifier Mi is
constructed, the weights assigned to the tuples are updated before building the
subsequent classifier Mi+1. The tuples that were misclassified in the current round
are assigned higher weights, so that the classifier built in the next iteration
pays more attention to them. After the completion of the final round, the boosted classifier
model combines the weighted votes of each classifier, where the weights are
computed based on some functions of the classification accuracies of the results
reported by the individual classifier. Adaptive Boosting (AdaBoost) is a very
popular variant of Boosting for classification purpose that we have used. We used
the boosting function of the adabag library in R for the classification of stock price
data.
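
A minimal Python sketch of AdaBoost on one predictor is given below to illustrate the weighting scheme. It is not the boosting function of adabag: the base classifier is a hypothetical threshold rule, the classes are coded as -1 and +1 rather than “0” and “1”, and the function names and data are assumptions made for the example.

```python
import math

def fit_weighted_stump(xs, ys, w):
    """Choose the threshold and polarity with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            preds = [pol if x > t else -pol for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost_fit(xs, ys, rounds=3):
    """Build the classifiers iteratively, reweighting the tuples each round."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, pol = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this classifier
        ensemble.append((alpha, t, pol))
        # Misclassified tuples gain weight; correctly classified ones lose it.
        w = [wi * math.exp(-alpha * y * (pol if x > t else -pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted majority vote of all classifiers in the ensemble."""
    score = sum(alpha * (pol if x > t else -pol) for alpha, t, pol in ensemble)
    return 1 if score > 0 else -1

# No single threshold separates these classes, but three weighted stumps do.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(xs, ys, rounds=3)
print([adaboost_predict(ensemble, x) for x in xs])
```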

Random Forest: Random forest is an ensemble machine learning approach. The
algorithm first builds a large number of decision tree classifiers separately so that
the collection of the classifiers is a forest. The individual decision tree classifier
models are built based on a random selection of attributes at each node. The
splitting at each node is done by randomly selecting the feature and the feature
value for splitting to introduce as much randomness as possible. In other words,
each decision tree depends on the values of a random vector sampled
independently, and with the same distribution for all trees in the forest. The
objective of introducing so much randomness in building the decision tree models
is to avoid overfitting of the models during the training phase. During the
classification phase, each tree votes, and the most popular class is returned. We
have used the randomForest function defined in the randomForest library in R for
classification purposes of the stock price data.

Artificial Neural Network: An artificial neural network (ANN) is a connectionist
network that consists of nodes and their interconnecting links where the nodes are
arranged in several layers - an input layer, one or more hidden layers, and an output
layer. The nodes in the input layer correspond to the predictor variables (i.e.,
attributes) in the training dataset. The inputs are fed simultaneously into the units
making up the input layer. The input values pass through the respective nodes in
the input layer and are then weighted using the weights associated with the links
connecting the nodes and fed simultaneously to the second layer of nodes, known
as the hidden layer nodes. The outputs of the nodes in the first hidden layer are
weighted again using the corresponding link weights, and the resultant values are
provided as the inputs to a possible second hidden layer and so on. The weighted
outputs of the last hidden layer are input to units making up the output layer, which
produces the network's prediction for given tuples. We used the neuralnet function
defined in the neuralnet library in R for classifying the stock price data. The raw
data is normalized using the min-max normalization approach. Only the predictors
are normalized, the response variable: open_perc is kept unchanged. The parameter
hidden of the function neuralnet is changed to realize the different number of
hidden layers in the network. The parameter stepmax is set to the maximum value
of 10^6 so that the neuralnet function can run for the maximum number of
iterations. To carry out the classification exercise, the parameter
linear.output is set to FALSE in the neuralnet function.
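
The layer-by-layer computation described above can be illustrated with a small forward pass in Python. This is a sketch only and not the neuralnet function of R: the weights are arbitrary rather than trained, the logistic activation is assumed for all units including the output unit, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, layers):
    """Propagate inputs through the layers; each layer is (weights, biases)."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(unit_w, activations)) + b)
            for unit_w, b in zip(weights, biases)
        ]
    return activations

# Two min-max normalized predictors, one hidden layer of two units, one output.
layers = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                   # output layer
]
output = forward([0.2, 0.7], layers)
label = 1 if output[0] > 0.5 else 0
print(output, label)
```

Applying the activation at the output unit keeps the network output between 0 and 1, so that it can be compared against a threshold to obtain a class.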

Support Vector Machine: A support vector machine (SVM) is a machine learning
model for both classification and regression. When applied for classification, it can
classify both linear and nonlinear data. It uses a nonlinear mapping to transform the
original training data into a higher dimension. Within this new higher dimension, it
searches for the linear optimal hyperplane that separates the two classes. SVM
finds this hyperplane using support vectors which are the essential and the
discriminating training tuples to separate the two classes. We have used the ksvm
function defined in the kernlab library in R for carrying out the classification of the
stock price data. The function ksvm has an optional parameter called kernel which
is set to vanilladot in our implementation.

We now briefly discuss the regression models.

Multivariate Regression: In this regression approach, we used open_perc as the
response variable and the remaining ten variables as the predictors to build
predictive models for three cases mentioned earlier in Chapter 3. In all these cases,
we use the programming language R for data management, model construction,
testing of models, and visualization of results.

Case I: We use 2013 data as the training data set for building the model, and then
test the model using the same data set. For both the stocks, we used two approaches
of multivariate regression - (i) backward deletion and (ii) forward addition of
variables. Both approaches yielded the same results for the stock price data.

For the year 2013, we applied the vif function in the faraway library to detect the
collinear variables in order to get rid of the multicollinearity problem. The variance
inflation factor (VIF) values of the variables were found to be as follows: month =
1.003, day_month = 1.008, day_week = 1.002, time = 1.095, high_perc = 4372.547,
low_perc = 4369.694, close_perc = 165.436, vol_perc = 1.072, nifty_perc = 1.046,
range_diff = 156.198. Hence, it was clear that high_perc, low_perc, close_perc,
and range_diff exhibited multicollinearity. We retained low_perc and range_diff
for the model construction and removed the other two variables since their VIF
values were smaller than the other two. Using the drop1 function in case of the
backward deletion technique, and the add1 function in case of the forward addition
technique, we identified the variables that were not significant in the model and did
not contribute to the information content of the model. For identifying the variables
that contributed least to the information contained in the model at each iteration,
we used the Akaike Information Criterion (AIC): the variable that had the least AIC
value and a non-significant p-value at each iteration was removed from the model
in the backward deletion process. On the other hand, the variable that had
the lowest AIC and a significant p-value was added to the model at each iteration
for the forward addition technique. It was found that low_perc and range_diff were
the two predictors that finally remained in the regression model.
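For reference, the variance inflation factor of the j-th predictor is conventionally defined through the coefficient of determination obtained when that predictor is regressed on all the remaining predictors. The symbols $\mathrm{VIF}_j$ and $R_j^2$ below are our notation for this standard definition; they do not appear elsewhere in this work.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Under this definition, a VIF value of 4372.547 for high_perc corresponds to $R_j^2 \approx 0.9998$, i.e., high_perc is almost perfectly explained by the other predictors, whereas a value close to 1 (as for month or day_week) corresponds to $R_j^2$ close to 0.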

Case II: For the year 2014, the VIF values for the predictors were found to be as
follows: month = 1.007, day_month = 1.004, day_week = 1.007, time = 1.057,
high_perc = 1161.446, low_perc = 1331.035, close_perc = 115.161, vol_perc =
1.022, range_diff = 92.092, nifty_perc = 1.073. The variables high_perc, low_perc,
close_perc, and range_diff exhibited multicollinearity. As in Case I, we retained
low_perc and range_diff as their VIF values were smaller compared with the other
two. Use of backward deletion and forward addition methods both yielded the same
regression models as in Case I with low_perc and range_diff as the predictors and
open_perc as the response variable.

Case III: In this case, the model is identical to that in Case I. However, the model
is tested on data for the year 2014. Therefore, the performance results of the model are
expected to be different. The performance results and their critical analysis are
presented in Chapter 6.

Multivariate Adaptive Regression Spline: Multivariate Adaptive Regression
Spline (MARS) is a statistical machine learning approach for building robust
regression models. MARS works by splitting input variables into multiple basis
functions and then fitting a linear regression model to those basis functions. The
basis functions used by MARS are designed in pairs:
$f(x) = \begin{cases} x - t, & \text{if } x > t \\ 0, & \text{otherwise} \end{cases}$ and
$g(x) = \begin{cases} t - x, & \text{if } x < t \\ 0, & \text{otherwise} \end{cases}$. The main
characteristic property of the basis functions is that these functions are piecewise
linear functions. The value $t$ at which the two functions meet is called a knot. The
working principles of MARS are very similar to that of CART. Like CART, MARS
first builds a complex model involving a large number of basis functions, which are
separated from each other by a large number of knots. This phase of the algorithm
execution is called the forward pass of the model building. In the subsequent phase,
known as the backward pass, the algorithm prunes back unimportant terms (i.e.,
basis functions), which could not contribute significantly to the generalized $R^2$
values of the model. This phase essentially enables MARS to avoid a possible
overfitting model during the training phase. During the execution of the backward
phase, the algorithm computes the generalized cross-validation (GCV) values to
determine how well the model fits into the data while avoiding any possible
overfitting. Finally, the algorithm returns the model with the best cost/benefit ratio.
To fit a model using MARS in R, we use the function earth in the library earth.
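The paired basis functions described above can be illustrated with a small Python sketch. This is not the earth implementation used in this work; the function names hinge_pair and mars_predict, the knot position, and the coefficients are all hypothetical and chosen only to show how a fitted model of this form is evaluated.

```python
def hinge_pair(x, t):
    """Return the mirrored pair (max(0, x - t), max(0, t - x)) for knot t."""
    return max(0.0, x - t), max(0.0, t - x)


def mars_predict(x, intercept, terms):
    """Evaluate a one-variable, additive MARS-style model.

    terms is a list of (knot, coef_right, coef_left) tuples, where
    coef_right multiplies max(0, x - knot) and coef_left multiplies
    max(0, knot - x).
    """
    y = intercept
    for knot, coef_right, coef_left in terms:
        right, left = hinge_pair(x, knot)
        y += coef_right * right + coef_left * left
    return y


# Hypothetical model with a single knot at t = 1.0.
terms = [(1.0, 2.0, -0.5)]
for x in (0.0, 1.0, 3.0):
    print(x, mars_predict(x, 0.25, terms))
```

The model is linear on each side of the knot, with different slopes to the left and to the right; this is the piecewise linear behaviour referred to above.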

Decision Tree: For building a regression model, we have used the same tree
function in the tree library in R as we did in building the decision
tree-based classification model. However, in this case, the response variable was
kept as numeric and not converted to a factor variable, unlike in the classification
techniques. The predict function is used to predict the values of the response
variable. The cor function in R and the rmse function defined in the Metrics library
are used to compute the correlation coefficient and the RMSE value for determining the
prediction accuracy of the models.
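The two accuracy measures used for all the regression models can be sketched in Python as follows. This is only an illustration of the underlying formulas, not the R functions used in this work; the names rmse and pearson and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical actual and predicted response values.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(rmse(actual, predicted))     # 0.5
print(pearson(actual, predicted))  # approximately 0.894
```

A low RMSE together with a high correlation indicates that the predicted values track the actual values closely both in magnitude and in direction.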

Bagging: For carrying out regression on stock price data, we use the bagging function
defined in the ipred library of R. The value of the parameter nbagg, which specifies
the number of bootstrap samples, is taken as 100. We use the predict function in the ipred
library to predict the response variable values and the rmse function in the Metrics
library to compute the RMSE values of the predicted values. The cor function in R
is used to compute the correlation between the original and the predicted values of
the response variable.

Boosting: We use the blackboost function defined in the mboost library in R for
building regression models on the stock price data, in contrast to the boosting function of
the adabag library in R that we used for classification of the stock price data. As in other cases of
regression, the predict and rmse functions are used to compute the predicted values
and the RMSE values in the regression model.

Random Forest: We use the randomForest function defined in the randomForest
library in R for regression purposes. The response variable open_perc is kept as a
numeric variable and not converted to a factor variable as it was done in case of
random forest classification. The same predict and rmse functions are used as in
other regression methods.

Artificial Neural Network: As in the case of classification, we use the neuralnet
function defined in the neuralnet library in R for regression on the stock price data.
The predictors are normalized using min-max normalization before building the
model. The compute function defined in the neuralnet library is used for computing
the predicted values, while the parameter hidden is used to change the number of
nodes in the hidden layer. The value of the parameter stepmax is set to $10^6$ so as to
exploit the maximum number of iterations executed by the neuralnet function. The
parameter linear.output is by default set to TRUE, and hence it is not altered. For
the Godrej dataset, we needed only one node in the hidden layer for all the three
cases for building ANN regression models.

Support Vector Machine: For building the regression model using SVM, we use
the svm function defined in the e1071 library in R. The predict function is used for
predicting the response variable values using the regression model, and the rmse
function is used to compute the RMSE values for the predicted quantities.

Chapter 5

Deep Learning Models

In this Chapter, we discuss two deep learning-based regression methods: (i) the
long short-term memory (LSTM) network, and (ii) the convolutional neural
network (CNN).

Long Short-Term Memory Network: LSTM is a variant of recurrent neural
networks (RNNs), i.e., neural networks with feedback loops (Geron, 2019). In such
networks, output at the current time slot depends on the current inputs as well as
the previous state of the network. However, RNNs suffer from the problem that they
cannot capture long-term dependencies due to vanishing or exploding
gradients during backpropagation in learning the weights of the links (Geron, 2019).
LSTM networks overcome such problems, and hence such networks are quite
effective in forecasting in multivariate time series. LSTM networks consist of
memory cells that can maintain their states over time using memory and gating units
that regulate the information flow into and out of the memory. There are different
variants of gates used. The forget gates control what information to throw away
from memory. The input gates are meant for controlling the new information that
is added to the cell state from the current input. The cell state vector aggregates the
two components - the old memory from the forget gate, and the new memory from
the input gate. In the end, the output gates conditionally decide what to output from
the memory cells. The architecture of an LSTM network along with the
backpropagation through time (BPTT) algorithm for learning provides such
networks a very powerful ability to learn and forecast in a multivariate time series
framework. We use the Python programming language and the TensorFlow deep
learning framework for implementing LSTM networks and utilize those networks
to predict the stock prices of Godrej Consumer Products as a multivariate time series.
For this purpose, we use the open price of the stock as the response variable, and
the predictors chosen are high, low, close, volume, and the NIFTY index values.
However, unlike for the machine learning techniques, we do not compute the
differences between successive slots. Rather, we forecast the open value of the next
slot based on the predictor values in the previous slots. We used the mean absolute
error (MAE) as the loss function for evaluating the model performance and adaptive
moment estimation (ADAM) as the optimizer in all the three cases. ADAM
computes adaptive learning rates for each parameter in the gradient descent
algorithm. In addition to storing an exponentially decaying average of the past
squared gradients, ADAM also keeps track of the exponentially decaying average
of the past gradients, which serves as the momentum in the learning process. Whereas
plain momentum behaves like a ball running down a steep slope, ADAM
behaves like a heavy ball with a rough outer surface. This high level of
friction results in ADAM’s preference for a flat minimum in the error surface. Due
to its ability to integrate adaptive learning rates with momentum, ADAM is found
to perform very efficiently in optimizing the performance of large-scale networks.
This was the reason for our choice of ADAM as the optimizer in our LSTM
modelling. We trained the deep learning networks using different numbers of epochs
and different batch sizes for the three different cases and determined the
optimum performance of the network under those parameter values. The Sequential
constructor in the TensorFlow framework has been used to build the LSTM model.
The performance results of the LSTM models are presented in Chapter 6.
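The ADAM update described above, with its two exponentially decaying averages, can be sketched for a single parameter in plain Python. This is a toy illustration and not the TensorFlow optimizer used in this work; the function name adam_minimize, the hyperparameter values, and the quadratic objective are assumptions made only for the example.

```python
import math


def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a one-parameter function given its gradient, using ADAM."""
    m = 0.0  # decaying average of past gradients (momentum term)
    v = 0.0  # decaying average of past squared gradients
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for m and v starting at zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled per parameter by the root of v_hat.
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(theta)
```

In a network, the same update is applied to every weight independently, which is what gives each parameter its own adaptive learning rate.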

Convolutional Neural Networks: CNNs emerged from the study of the brain’s
visual cortex, and they have been used in image recognition since the 1980s. In the
last few years, thanks to the increase in computational power, the amount of
available training data, and the tricks for training deep neural networks, CNNs have
managed to achieve superhuman performance on some complex visual tasks. They
power image search services, self-driving cars, automatic video classification
systems, and more. Moreover, CNNs are not restricted to visual perception: they
are also successful at many other tasks, such as voice recognition, natural language
processing, and complex time series analysis of financial data.

A CNN is a biologically-inspired type of deep neural network that has gained
popularity due to its success in classification problems, e.g., image recognition
(LeCun et al., 1998) or time series classification (Wang et al., 2016). A CNN consists
of a sequence of convolutional layers, the output of which is connected only to local
regions in the input. This is achieved by sliding a filter, or weight matrix, over the
input, and at each point computing the dot product between the two (i.e., a
convolution between the input and filter). This structure allows the model to learn
filters that are able to recognize specific patterns in the input data. One recent
advance in CNNs for time series forecasting involves an undecimated
convolutional network for time series modelling based on an undecimated wavelet
transform (Mittelman, 2015). In another work, the authors propose to use an
autoregressive-type weighting system for forecasting financial time series, where
the weights are allowed to be data-dependent by learning them through a CNN
(Binkowski et al., 2017). In general, the literature on financial time series forecasting
with convolutional architectures is still scarce, as these types of networks are much
more commonly applied in classification problems. Intuitively, the idea of applying
CNNs to time series forecasting would be to learn filters that represent certain
repeating patterns in the series and use these to forecast the future values. Due to
the layered structure of CNNs, they might work well on noisy series, by discarding
in each subsequent layer the noise and extracting only the meaningful patterns, in
this way bearing similarities to neural networks which use wavelet transformed
time series (i.e., a split in high- and low-frequency components) as input (Aussem
& Murtagh, 1997; Lahmiri, 2014).

In the present work, we have exploited the power of CNN in forecasting the
univariate and multivariate time series data of Godrej Consumer Products stock.
CNNs have two major types of processing layers: convolutional layers and pooling
or subsampling layers. The convolutional layers read an input, such as a
two-dimensional image or a one-dimensional signal, using a kernel (also referred to as
the filter) that reads the data in small segments at a time and scans across the input
data field. Each read result is an interpretation of the input that is projected onto a
filter map. The pooling or
subsampling layers take the feature map projections and distill them to the most
essential elements, for example by using a signal averaging (average pool) or signal
maximizing (max pool) process. The convolution and pooling layers are repeated
at depth, providing multiple layers of abstraction of the input signals. The output of
the final pooling layer is fed into one or more fully-connected layers that interpret
what has been read and map this internal representation to a class value.
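The convolution and pooling operations described above can be illustrated for a one-dimensional signal with a short Python sketch. This is not the network used in this work; the function names conv1d and max_pool1d, the kernel, and the signal are hypothetical. As in most deep learning libraries, the "convolution" is computed as a sliding dot product without flipping the kernel.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal and take a dot product at each
    position (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def max_pool1d(features, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]


# Hypothetical signal and a kernel that responds to rising values.
signal = [1.0, 2.0, 4.0, 7.0, 6.0, 3.0, 1.0]
kernel = [-1.0, 0.0, 1.0]

feature_map = conv1d(signal, kernel)
pooled = max_pool1d(feature_map, 2)
print(feature_map)  # [3.0, 5.0, 2.0, -4.0, -5.0]
print(pooled)       # [5.0, 2.0]
```

In a trained CNN the kernel weights are learned rather than fixed by hand, and several such kernels are applied in parallel to produce multiple feature maps.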

We use the power of CNN in multi-step time series forecasting in the following
way. The convolutional layers are used to read sequences of the input data, and
automatically extract features. The pooling layers are used for distilling the
extracted features, and in focusing attention on the most salient elements. The fully
connected layers are deployed to interpret the internal representation and output a
vector representing multiple time steps. The benefits that CNN provides in our time
series forecasting job are the automatic feature learning, and the ability of the model
to output a multi-step vector directly.

We exploit the power of CNN in forecasting the stock prices using the Godrej
Consumer Products data in two different ways. In the recursive forecast
strategy, the model makes one-step predictions, and the outputs are fed as inputs for
subsequent predictions. In the other approach, we used CNNs to predict the entire
output sequence as a one-step prediction of the entire vector. Using these two
approaches, we have built three different types of CNN models for multi-step time
series forecasting of stock prices as follows: (i) Multi-step time series forecasting
with univariate input data, (ii) Multi-step time series forecasting with multivariate
input data via channels – in this case, each input sequence is read as a separate
channel, like different channels of an image (e.g., red, green, and blue), (iii) multi-
step time series forecasting with multivariate input data via sub-models – in this
case, each input sequence is read by a different CNN sub-model and the internal
representations are combined before being interpreted and used to make a
prediction.

In the first case, we designed a CNN for multi-step time series forecasting using
only the univariate sequence of the open values. In other words, given some number
of prior days of open values, the model predicts the next standard week of stock
market operation. A standard week consists of five days – Monday to Friday. The
number of prior days used as the input defines the one-dimensional (1D) data of
open_perc values that the CNN will read and learn for extracting features. There
are several choices in deciding on the size and the nature of the input to the CNN
for training such as: (a) all prior days till the week for which the open values are to be
predicted, (b) the prior five days (i.e., one week) only before the week of prediction,
(c) the prior two weeks (i.e., 10 days, as each week consists of 5 days), (d) prior
one month, (e) prior week and the week to be predicted in the previous year. Since
there is no obvious best choice here, we have tested the performance of the model
on different input sizes and observed the performance of the model under each such
case. Based on the choice of the input, the training data, the test data, and the
prediction process of the model are accordingly designed.
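The way in which the choice of input size shapes the training data can be sketched as follows. This is an illustrative Python fragment under our own assumptions: the function name make_windows is hypothetical, the windows are moved forward one day at a time (the original text does not specify the stride), and the series is a dummy sequence rather than actual stock prices.

```python
def make_windows(series, n_input, n_output=5):
    """Split a univariate series into (input, output) training pairs.

    Each input holds n_input consecutive prior values and each output
    holds the n_output values that immediately follow it.
    """
    pairs = []
    last_start = len(series) - n_input - n_output
    for start in range(last_start + 1):
        x = series[start:start + n_input]
        y = series[start + n_input:start + n_input + n_output]
        pairs.append((x, y))
    return pairs


# Dummy series standing in for 20 daily open values.
opens = list(range(1, 21))
pairs = make_windows(opens, n_input=10)  # two prior weeks as input
print(len(pairs))  # 6
print(pairs[0])    # ([1, ..., 10], [11, ..., 15])
```

Changing n_input to 5 would correspond to option (b) above, and to 10 to option (c); the number of available training pairs changes accordingly.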

The multi-step time series forecasting approach is essentially an autoregression
process. Whether univariate or multivariate, the prior time series data is used for
forecasting the values for the next week.
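The autoregressive character of the approach is easiest to see in the strategy where one-step predictions are fed back as inputs. The Python sketch below illustrates only that feedback loop; the function names recursive_forecast and mean_model are hypothetical, and the placeholder model (the mean of the window) merely stands in for a trained network.

```python
def recursive_forecast(history, one_step_model, n_input, horizon=5):
    """Forecast `horizon` steps by feeding each prediction back as input."""
    window = list(history[-n_input:])
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step_model(window)
        forecasts.append(y_hat)
        # Drop the oldest value and append the newest prediction.
        window = window[1:] + [y_hat]
    return forecasts


def mean_model(window):
    """Placeholder one-step model: predicts the mean of its input window."""
    return sum(window) / len(window)


history = [10.0, 12.0, 14.0, 16.0]
print(recursive_forecast(history, mean_model, n_input=2, horizon=3))
# [15.0, 15.5, 15.25]
```

In the alternative strategy, the model would instead return all five values of the coming week in a single call, so no prediction is ever reused as an input.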

Chapter 6

Performance Results and Analysis

In this Chapter, we provide a detailed discussion on the forecasting techniques that
we have used and the results obtained using those techniques. We first discuss the
classification techniques and then the regression techniques. For both the stocks
and for all the three cases, we computed the prediction accuracy of the classification
models using several metrics. We define the metrics below.

Sensitivity: It is the ratio of the number of true positives to the total number of
positives in the test dataset, expressed as a percentage. Here, positive refers to the
cases that belong to the target group (i.e., the class “1”). The term true positives
refers to the number of positive cases that the model correctly identified.
Sensitivity is also sometimes referred to as recall.

Specificity: It is the ratio of the number of true negatives to the total number of
negatives in the test dataset, expressed as a percentage. Here, negative refers to the
cases that belong to the non-target group (i.e., the class “0”). The term true negative
refers to the number of negative cases that the model correctly identified.

Positive Predictive Value: Positive predictive value (PPV), also sometimes referred
to as precision, refers to the accuracy of the model in classifying the target group
cases among the total number of target group cases identified by it. It is computed
as the ratio of the number of correctly identified target group cases to the total
number of target group cases as identified by the model. Since the total number of
target group cases identified by the model is the sum of the number of true positive
cases and the number of false-positive cases, PPV is the ratio of the total number
of true positive cases to the sum of the number of true positive cases and the number
of false-positive cases, expressed as a percentage. The complement of PPV is also
called the false discovery rate (FDR).

Negative Predictive Value: Negative predictive value (NPV) refers to the accuracy
of the model in classifying the non-target group cases among the total number of
non-target elements identified by it. It is computed as the ratio of the number of
correctly identified non-target group cases to the total number of non-target group
cases as identified by the model. Since the total number of non-target group cases
identified by the models is the sum of the number of true negative cases and the
number of false-negative cases, NPV is the ratio of the total number of true negative
cases to the sum of the number of true negative cases and the number of false-
negative cases, expressed as a percentage. The complement of NPV is also called
false omission rate (FOR).

Classification Accuracy (CA): It is the ratio of the total number of cases that are
correctly classified to the total number of cases in the dataset, expressed as a
percentage.

F1 Score: If the test data set is highly imbalanced, with the cases belonging to the
non-target group far outnumbering the target cases, sensitivity is usually found to
be very poor even with a very high classification accuracy. Hence, classification
accuracy is not considered a very robust and reliable metric. F1 score, which is
computed as the harmonic mean of the sensitivity and PPV, is found to be a very
robust metric, however.
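The six metrics defined above can be computed from the four cells of a confusion matrix, as the following Python sketch shows. The function name classification_metrics is hypothetical. The counts used in the example are those reported in this Chapter for the logistic regression model in Case I (419 actual “0” cases with 10 misclassified, and 326 actual “1” cases with 17 misclassified).

```python
def classification_metrics(tp, fn, tn, fp):
    """Return the six metrics defined above, each as a percentage."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # F1 is the harmonic mean of sensitivity and PPV.
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv)
    names = ("sensitivity", "specificity", "ppv", "npv", "ca", "f1")
    values = (sensitivity, specificity, ppv, npv, accuracy, f1)
    return {name: 100.0 * value for name, value in zip(names, values)}


# Logistic regression, Case I: 309 of 326 positives and 409 of 419
# negatives were correctly classified.
m = classification_metrics(tp=309, fn=17, tn=409, fp=10)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Because the sketch works with unrounded intermediate values, the last decimal of a derived quantity such as the F1 score may differ slightly from a figure computed from already rounded percentages.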

Classification Methods:

Logistic Regression: We used the glm function in the R programming language to build
logistic regression-based classification models for all the three cases. The response
variable was converted into categorical type by using the function as.factor before
we built the models. The parameter family was set to binomial in order to build a
binary logistic regression model. The predict function was used to predict the class
of the test data records. We also built the lift curve and the receiver operating
characteristic (ROC) curve of the model for each case. The output of the
performance function defined in the ROCR library was plotted to illustrate the ROC
curve of the model. The area under the curve (AUC) for each ROC curve was
computed using the auc function defined in the pROC library in the R programming
language.
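The meaning of the AUC value can be illustrated with a small Python sketch that uses the rank interpretation of the area under the ROC curve: the probability that a randomly chosen class “1” case receives a higher predicted probability than a randomly chosen class “0” case. This is not the pROC implementation used in this work; the function name auc_by_ranking and the sample labels and scores are hypothetical.

```python
def auc_by_ranking(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical class labels and predicted probabilities of class "1".
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auc_by_ranking(labels, scores))  # 8 of 9 pairs are ordered correctly
```

An AUC of 1 means that every class “1” case is ranked above every class “0” case, while a value of 0.5 corresponds to random ranking.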

Table 1 presents the performance results of the logistic regression classification
method. For Case I, out of 419 actual “0” cases, only 10 cases were misclassified
as “1”, while among 326 actual “1” cases, 17 cases were found to be wrongly
classified as “0”. The value of AUC for the ROC curve for Case I was 0.9934. For
Case II, 16 cases out of a total of 396 actual “0” cases were misclassified as “1”, and
17 out of 329 cases which were actually “1” were wrongly classified as “0”. The AUC
value for this case was found to be 0.9891. Case III yielded 42 cases which were
actually “0” but misclassified as “1” out of a total of 396 cases, while among 329
cases which were actually “1”, 26 cases were misclassified as “0”. The AUC value
for Case III was found to be 0.9587.

Fig 1(a), 1(b), and 1(c) present the classification performance, the lift curve, and
the ROC curve of the logistic regression-based classification model. In Fig 1(a), the
y-axis represents the actual classes of the records (either “0” or “1”) and the x-axis
denotes the probability that a case will belong to the class “1”. The threshold value
along the x-axis is by convention taken to be 0.5. Hence, all the cases which are
found to be lying on the level “0” along the y-axis and situated to the right of the
threshold value of 0.5 along the x-axis are misclassified. Similarly, all the points
which are on the level “1” along the y-axis, and are situated to the left of the
threshold value of 0.5 along the x-axis are also misclassified. It is evident from Fig
1(a) that the number of misclassified cases in Case I was very low. Fig 1(b) shows
that the lift curve is pulled up from the baseline indicating that the model was very
accurate in discriminating between the two classes. Fig 1(c) depicts the ROC curve
for the logistic regression model for Case I. The steepness of the curve makes it
evident that the model has been able to very effectively optimize the values of the
true positive rate (TPR) and the false positive rate (FPR). In Fig 1(c), the line
segment with red color presents the class “1” cases which are correctly classified,
while the blue line segment denotes the correctly classified cases which belong to
the class “0”. The portion of the ROC curve that is colored with yellow represents
those cases which actually belong to the class “0”, but the model wrongly classified
them to the class “1”. The “green” colored portion of the ROC curve depicts those
cases which are misclassified into the class “0”, while they actually belong to the
class “1”.

Table 1: Logistic regression classification results

Metrics        Case I                    Case II                   Case III
               Training Accuracy 2013    Training Accuracy 2014    Test Accuracy 2014
Sensitivity    94.79                     94.83                     92.10
Specificity    97.61                     95.96                     89.39
PPV            96.87                     95.12                     87.83
NPV            96.01                     95.72                     93.16
CA             96.38                     95.45                     90.62
F1 Score       95.82                     94.97                     89.91

Fig 1(a): Logistic Regression -- actual vs predicted probabilities of open_perc (Case I)

Fig 1(b): Logistic Regression for classification – lift curve (Case I)

Fig 1(c): Logistic Regression for classification – ROC curve (Case I)

Fig 2(a), Fig 2(b), and Fig 2(c) depict respectively the classification performance,
the lift curve, and the ROC curve of the logistic regression model for Case II. The
performance of the model, in this case, is similar to that of Case I. However, the
AUC value yielded by the model, in this case, was just marginally smaller than the
corresponding value in Case I.

Fig 2(a): Logistic Regression – actual vs predicted probabilities of open_perc (Case II)

Fig 2(b): Logistic Regression for classification – lift curve (Case II)

Fig 2(c): Logistic Regression for classification – ROC curve (Case II)

Fig 3(a): Logistic Regression – actual vs predicted probabilities of open_perc (Case III)

Fig 3(b): Logistic Regression for classification – lift curve (Case III)

Fig 3(a), Fig 3(b), and Fig 3(c) show the classification accuracy, the lift curve, and
the ROC curve for the logistic regression model in Case III. It is evident from Fig
3(c) that unlike in Case I and Case II, the classification model committed more
errors in classification. This case also yielded a lower AUC value of 0.9587.

Fig 3(c): Logistic Regression for classification – ROC curve (Case III)

KNN Classification: Table 2 presents the performance results of the KNN
classification method. For Case I, with the values of k = 1, 3, 5, 7, and 9, the
classification accuracy values were found to be 100, 93.42, 91.68, 92.35, and 92.08
respectively. We chose k = 3 in order to avoid the overfitted model with k = 1. In
this case, 419 cases were 0s and 326 cases were 1s. 15 cases of actual 0s
were misclassified as 1s, and 34 cases of actual 1s were misclassified as 0s. In Case
II, for k = 1, 3, 5, 7, and 9, the classification accuracy values were 100, 90.21, 85.10,
83.22, and 84.16 respectively. Again, k = 3 was chosen to avoid model overfitting.
28 cases of actual 0 were misclassified as 1, while 43 cases of actual 1 were
misclassified as 0. For Case III, the classification accuracy values were found to be
65.24, 65.10, 67.17, 68.69, and 67.44 for k = 1, 3, 5, 7, and 9 respectively. We chose
k = 3, for which 202 cases which were actually 0s were misclassified as 1s, while
51 cases of actual 1s were misclassified as 0s.
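The reason why k = 1 yields 100% accuracy whenever the model is tested on its own training data can be seen from the following Python sketch: each record is its own nearest neighbour. The function names knn_predict and training_accuracy and the small two-dimensional dataset are hypothetical and serve only as an illustration.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training
    points (squared Euclidean distance)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def training_accuracy(train_x, train_y, k):
    """Percentage of training points classified correctly when the
    model is tested on the data it was built from."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(train_x, train_y))
    return 100.0 * hits / len(train_x)


# Hypothetical data: two clusters, plus one class-1 point that lies
# inside the class-0 cluster (the last record).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (0.9, 1.1), (1.1, 0.9), (0.15, 0.15)]
train_y = [0, 0, 0, 1, 1, 1, 1]

print(training_accuracy(train_x, train_y, k=1))  # 100.0
print(training_accuracy(train_x, train_y, k=3))  # about 85.71
```

With k = 1 the model memorizes even the atypical record, whereas with k = 3 that record is outvoted by its neighbours; the lower training accuracy therefore reflects a smoother, less overfitted decision rule.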

Table 2: KNN classification results

Metrics        Case I                    Case II                   Case III
               Training Accuracy 2013    Training Accuracy 2014    Test Accuracy 2014
Sensitivity    89.57                     86.93                     84.50
Specificity    96.42                     92.93                     48.99
PPV            95.11                     91.08                     57.92
NPV            92.24                     89.54                     79.18
CA             93.42                     90.21                     65.10
F1 Score       92.26                     88.96                     68.73

Decision Tree Classification: We used the tree function defined in the tree library in
the R programming language for building the decision tree-based classification models
in all the three cases. The response variable open_perc is converted into a
categorical type using the as.factor function for the purpose of classification. The
predict function in the tree library was used for predicting the classes of the
response variable open_perc for the records in the test dataset. For Case I and Case
III, the models were identical as they were trained on the 2013 data. However, while
the model in Case I was tested on the 2013 data, the 2014 data was used for testing
the model in Case III. For all these cases, we found that high_perc, low_perc, and
close_perc were the three predictor variables that were used by the models to
construct the decision trees. In Case I, the predictor which was used for
splitting at the root node was close_perc, indicating that close_perc was the most
important predictor for classification in the 2013 dataset. However, for the 2014
dataset, high_perc was found to be the most discriminating one as the same was
used by the model for splitting at the root node. In Case I, the decision tree classifier
misclassified 8 cases out of a total of 419 cases which actually belonged to the class
“0”, while 16 cases were wrongly classified out of a total of 326 cases which were
actually the records of the class “1”.

Table 3: Decision Tree classification results

Metrics        Case I                    Case II                   Case III
               Training Accuracy 2013    Training Accuracy 2014    Test Accuracy 2014
Sensitivity    95.09                     92.40                     89.97
Specificity    98.09                     95.71                     92.42
PPV            97.48                     94.70                     90.80
NPV            96.25                     93.81                     91.73
CA             96.78                     94.21                     91.31
F1 Score       96.27                     93.54                     90.38

In Case II, the model failed to correctly classify 17 cases out of a total of 396 cases
which were actually “0” class members, while 25 cases were misclassified out of a
total of 329 cases that actually belonged to the class “1”. In Case III, the model had
a more difficult task at hand. The model could not correctly classify 30 cases
out of a total of 396 cases which actually belonged to the class “0”, while 33 cases
were misclassified out of a total of 329 cases which actually belonged to the
class “1”. Table 3 presents the performance results of the decision tree
classification models under the three different cases. Fig 4(a), 4(b), and 4(c) depict the
decision tree classifiers for Case I, Case II, and Case III respectively.

Fig 4(a): Decision Tree for classification (Case I)

Fig 4(b): Decision Tree for classification (Case II)

Fig 4(c): Decision Tree for classification (Case III)

Table 4: Bagging classification results

Metrics        Case I                    Case II                   Case III
               Training Accuracy 2013    Training Accuracy 2014    Test Accuracy 2014
Sensitivity    95.09                     95.44                     89.97
Specificity    98.09                     96.46                     92.42
PPV            97.48                     95.73                     90.78
NPV            96.25                     96.22                     91.73
CA             96.78                     96.00                     91.31
F1 Score       96.07                     95.58                     90.37

Bagging Classification: We used the bagging function defined in the ipred library in
the R programming language for building the bagging classification models for all the
three cases. We set the value of the parameter nbagg to 25 so that 25 decision trees
were created randomly, and a simple majority voting mechanism was applied in
constructing the classifier. In Case I, we found that the model failed to correctly
classify 8 cases out of a total of 419 cases which actually belonged to the class “0”,
while 16 cases out of a total of 326 cases which actually belonged to the class “1”
were also misclassified. In Case II, the model could not correctly classify 14 cases
out of a total of 396 cases that are of actual class “0”, while 15 cases out of a total
of 329 cases which belonged to the class “1” were misclassified. In Case III, 30
cases out of 396 actual “0” class cases were incorrectly classified by the model,
while 33 cases out of a total of 329 cases of the class “1” were also misclassified. The
performance results of the bagging classification model for all three cases are
presented in Table 4. Fig 5(a), Fig 5(b), and Fig 5(c) depict the classification
accuracy of the model in Case I, Case II, and Case III respectively. In all these three
figures, the y-axis represents the actual class labels, while the values along the x-
axis show the probabilities of the predicted class for the records. The cases which
are on the label “0” on the y-axis and have their probability values greater than 0.5
along the x-axis are the misclassified cases. In a similar line, those cases which are
lying on the label “1” along the y-axis and have their probability values less than
0.5 along the x-axis, are also misclassified.
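
The idea of bagging with simple majority voting can be sketched in a few lines. The following is a minimal Python illustration, not the ipred implementation: it uses one-split decision stumps instead of full decision trees, a single predictor, and made-up toy data, and the names train_stump, bagging and predict are hypothetical. The share of votes for the class “1” plays the role of the predicted probability that is compared against 0.5.

```python
import random


def train_stump(rows):
    """Fit a one-split classifier (a depth-1 tree) to (x, label) rows."""
    best = None
    for threshold, _ in rows:
        for above in (0, 1):  # label predicted when x > threshold
            errors = sum((above if x > threshold else 1 - above) != y
                         for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return lambda x: above if x > threshold else 1 - above


def bagging(rows, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        models.append(train_stump(sample))
    return models


def predict(models, x):
    """Simple majority vote; also return the share of votes for class 1."""
    share = sum(m(x) for m in models) / len(models)
    return (1 if share > 0.5 else 0), share


# Toy data (hypothetical): negative x belongs to class 0, positive x to class 1.
data = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0), (-0.2, 0),
        (0.3, 1), (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 1)]
models = bagging(data)
print(predict(models, -1.8), predict(models, 1.8))
```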

Fig 5(a): Bagging for classification – actual vs predicted classes of open_perc (Case I)

Fig 5(b): Bagging for classification – actual vs predicted classes of open_perc (Case II)
Fig 5(c): Bagging for classification – actual vs predicted classes of open_perc (Case III)

Table 5: Boosting classification results

Case I Case II Case III


Metrics Training Accuracy 2013 Training Accuracy 2014 Test Accuracy 2014
Sensitivity 100 100 92.10
Specificity 100 100 93.43
PPV 100 100 92.10
NPV 100 100 93.43
CA 100 100 92.83
F1 Score 100 100 92.10

Fig 6(a): Boosting for classification – actual vs predicted classes of open_perc (Case I)

Boosting Classification: We used the boosting function defined in the adabag
library in the R programming language for building the boosting models for
classification under all three cases. The response variable open_perc was
transformed into the categorical type using the as.factor function so as to satisfy
the requirement of a classification model. The predict function was used for
predicting the class of the response variable in the test data records. For both
Case I and Case II, the boosting classification models yielded 100% accuracy in all
the metrics of classification, as presented in Table 5. This is not surprising, since
in both cases the models were built and tested using the same dataset, and the
weighted majority voting over a large number of decision tree classifiers fitted the
training data very closely. However, the model faced more challenges in Case III, in
which the ensemble model was built using the 2013 data and the testing was done
using the 2014 data. In Case III, we found that the model misclassified 26 cases out
of a total of 396 cases which actually belonged to the class “0”, while among 329
cases which were actually of the class “1”, 26 cases were incorrectly classified.
Table 5 presents the performance results of the boosting classification models for
all three cases. Fig 6(a), Fig 6(b), and Fig 6(c) depict the performance of the
boosting classifier for Case I, Case II, and Case III respectively. In these three
figures, the actual classes are plotted along the y-axis; there are two actual class
levels, “0” and “1”. The x-axis presents the predicted probability that a case will
belong to the class “1”. Hence, the data points which are situated to the left of the
threshold value of 0.5 along the x-axis and lie on the level “1” along the y-axis are
the misclassified cases. Similarly, the points that are on the right side of the
threshold value of 0.5 and lie on the level “0” along the y-axis are also
misclassified cases. It is evident from Fig 6(a), Fig 6(b), and Fig 6(c) that the
boosting classifiers performed very well in all three cases.
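
The weighted majority voting mentioned above can be sketched as follows. This is a Python illustration under stated assumptions, not the adabag code: the weight formula is the common AdaBoost.M1 form log((1 - error) / error), the error rates are made up, and the function names are hypothetical. The weight share of the class “1” is the quantity that is compared against the threshold of 0.5.

```python
import math


def classifier_weight(error_rate):
    """AdaBoost.M1-style weight: more accurate weak learners get larger votes."""
    return math.log((1 - error_rate) / error_rate)


def weighted_vote(predictions, weights):
    """Return the winning class (0 or 1) and the weight share of class 1."""
    total = sum(weights)
    weight_for_one = sum(w for p, w in zip(predictions, weights) if p == 1)
    share = weight_for_one / total
    return (1 if share > 0.5 else 0), share


# Hypothetical weak learners with weighted error rates of 10%, 30% and 40%.
weights = [classifier_weight(e) for e in (0.10, 0.30, 0.40)]
label, share = weighted_vote([1, 0, 0], weights)
print(label, round(share, 3))
```

In this toy example the single accurate learner outvotes the two weaker learners, which is the behavior that distinguishes weighted voting from the simple majority voting used in bagging.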

Fig 6(b): Boosting for classification – actual vs predicted classes of open_perc (Case II)

Fig 6(c): Boosting for classification – actual vs predicted classes of open_perc (Case III)

Table 6: Random Forest classification results

Case I Case II Case III


Metrics Training Accuracy 2013 Training Accuracy 2014 Test Accuracy 2014
Sensitivity 94.48 93.01 91.19
Specificity 97.61 94.19 92.93
PPV 96.86 93.01 91.46
NPV 95.78 94.19 92.70
CA 96.24 93.66 92.14
F1 Score 95.66 93.01 91.32

Random Forest Classification: We used the randomForest function defined in the
randomForest library in the R programming language for building random forest-based
classification models. In all three cases, the random forest algorithm created 500
decision trees, using three predictors at each node in the decision trees for
carrying out the splitting task. In Case I, the model wrongly classified 10 cases as
class “1” cases out of a total of 419 cases which actually belonged to the class “0”.
On the other hand, 18 cases were misclassified as class “0” out of a total of 326
cases which were actually of the class “1”. The out-of-bag (OOB) estimate of the
error rate of the model in this case was 3.76%. In Case II, the model could not
correctly classify 23 cases out of a total of 396 cases that belonged to the actual
class “0”. On the other hand, 23 cases out of a total of 329 cases that actually
belonged to the class “1” were also misclassified. The OOB estimate of the error
rate of the classification model in this case was 6.34%. In Case III, the random
forest classification model was identical to that in Case I. However, the model was
tested on 2014 data, unlike the model in Case I that was tested on 2013 data. We
found that in Case III, the model misclassified 28 cases out of a total of 396 cases
which actually belonged to the class “0”. On the other hand, 29 cases were wrongly
classified out of a total of 329 actual “1” cases. The performance results of the
random forest classification model for all three cases are presented in Table 6.
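
The out-of-bag estimate relies on the fact that each tree is built on a bootstrap sample that leaves some records out. The Python sketch below (illustrative only; the function name out_of_bag_rows and the random seed are assumptions) simulates 500 bootstrap samples of 745 records and compares the average left-out fraction with its theoretical value (1 - 1/n)^n. It also shows the arithmetic behind the Case I error rate of 3.76%, which equals (10 + 18) / 745.

```python
import random


def out_of_bag_rows(n_rows, rng):
    """Draw one bootstrap sample and return the row indices it left out."""
    drawn = {rng.randrange(n_rows) for _ in range(n_rows)}
    return [i for i in range(n_rows) if i not in drawn]


rng = random.Random(42)
n_rows, n_trees = 745, 500
fractions = [len(out_of_bag_rows(n_rows, rng)) / n_rows for _ in range(n_trees)]
mean_oob = sum(fractions) / n_trees
expected = (1 - 1 / n_rows) ** n_rows  # chance that a given row is never drawn
print(f"mean OOB fraction: {mean_oob:.3f}, expected: {expected:.3f}")

# Error rate implied by the Case I counts reported in the text.
case_i_error = 100 * (10 + 18) / 745
print(f"Case I error rate: {case_i_error:.2f}%")
```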
Table 7: ANN classification results

Case I Case II Case III


Metrics Training Accuracy 2013 Training Accuracy 2014 Test Accuracy 2014
Sensitivity 95.40 93.62 99.70
Specificity 97.61 95.71 34.60
PPV 96.88 94.77 55.88
NPV 96.46 94.75 99.28
CA 96.64 94.76 64.14
F1 Score 96.13 94.19 71.62

ANN Classification: We used the neuralnet function defined in the neuralnet library
in the R programming language to build ANN classification models for all three
cases. The parameter linear.output was set to FALSE and the response variable
open_perc was converted into a categorical variable type by using the function
as.factor before the classification models were built. We found that only one node
in the hidden layer was sufficient to model the data, hence we passed the value of
the parameter hidden as 1 in the neuralnet function. In order to avoid any possible
scenario in which the backpropagation algorithm fails to converge, we set the
parameter stepmax to a large value of 10^6. In Case I, the ANN classification model
misclassified 10 cases out of a total of 419 cases as “1” while they actually belonged
to class “0”. On the other hand, 15 cases which were actually “1” were wrongly
classified as “0” out of a total of 326 cases. The ANN model for classification for
Case I is presented in Fig 7(a), and its performance in the classification task is
presented in Fig 7(b). Fig 7(b) plots the actual classes along the y-axis and the
predicted class probabilities along the x-axis. The points lying on the actual class
label “0” along the y-axis while having their predicted class probabilities greater
than 0.5 (i.e., those points on the “0” label lying on the right-hand side of the
threshold value of 0.5 along the x-axis) represent the misclassified cases.
Similarly, the points which are on the label “1” along the y-axis while having their
probabilities smaller than 0.5 (i.e., those points on the “1” label lying on the
left-hand side of the threshold value of 0.5 along the x-axis) are also misclassified
points. In Case II, the ANN classification model misclassified 17 cases as class “1”
out of 396 cases which actually belonged to class “0”. On the other hand, 21 cases
were misclassified as class “0” out of 329 cases which were actually class “1” cases.
Fig 8(a) and Fig 8(b) present the ANN classification model in Case II and its
performance in the classification task respectively. In Case III, the model was built
using 2013 data, hence it was identical to the model that was used in Case I.
However, since the model was tested on 2014 data, unlike in Case I in which the
model was tested on 2013 data, the performance results of the model in Case III
were very different. In fact, the model in Case III faced a much bigger challenge, as
there was a difference in the characteristics of the data in 2013 and 2014. We found
that in Case III, the model wrongly classified 259 cases as class “1” out of 396 cases
which actually belonged to the class “0”. On the other hand, only 1 case out of 329
cases which were actually of the class “1” was misclassified as a class “0” case. It is
evident that the model failed badly in classifying the class “0” cases, which resulted
in a very poor value of its specificity. The specificity in Case III was found to be
only 34.60%, while for Case I and Case II, the specificity values were 97.61% and
95.71% respectively. This clearly indicated that the ANN classification model
generalized poorly, possibly because it overfitted the 2013 data during the training
phase. This overfitted model failed to correctly classify the majority of the “0”
cases in the test data of 2014, which resulted in a very low specificity value. Fig
9(a) and Fig 9(b) present the ANN classification model for Case III and its
classification performance respectively. Table 7 presents the performance of the ANN
classification models in all three cases.
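
A network with a single hidden node and a non-linear output can be sketched as below. This is an illustrative Python sketch, not the neuralnet implementation: it assumes the logistic activation function for both the hidden node and the output node (the effect of linear.output = FALSE with the default activation), uses two made-up predictors, and the weights are hypothetical rather than the fitted values shown in Fig 7(a), 8(a), and 9(a).

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_bias, output_weight, output_bias):
    """One logistic hidden node followed by a logistic output node."""
    hidden = sigmoid(sum(w * v for w, v in zip(hidden_weights, x)) + hidden_bias)
    return sigmoid(output_weight * hidden + output_bias)


# Hypothetical weights for two illustrative predictors; not the fitted values.
params = dict(hidden_weights=[1.5, -0.8], hidden_bias=0.1,
              output_weight=6.0, output_bias=-3.0)
for x in ([1.2, 0.3], [-0.9, 0.4]):
    p = forward(x, **params)
    print(x, round(p, 3), int(p > 0.5))  # probability of class 1, predicted class
```

The output is a probability of the class “1”, which is compared against the threshold of 0.5 in the same way as in the figures described above.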

Fig 7(a): ANN classification model (Case I)

Fig 7(b): ANN classification – actual vs predicted classes of open_perc (Case I)

Fig 8(a): ANN classification model (Case II)

Fig 8(b): ANN classification – actual vs predicted classes of open_perc (Case II)

Fig 9(a): ANN classification model (Case III)

Fig 9(b): ANN classification – actual vs predicted classes of open_perc (Case III)

Table 8: SVM classification results

Case I Case II Case III


Metrics Training Accuracy 2013 Training Accuracy 2014 Test Accuracy 2014
Sensitivity 94.46 94.67 93.81
Specificity 95.58 93.35 90.19
PPV 94.17 91.79 87.54
NPV 98.09 95.71 95.20
CA 96.38 93.93 91.72
F1 Score 94.31 93.21 90.57

SVM Classification: We used the ksvm function defined in the kernlab library in the
R programming language for building the SVM-based classification models. The
function ksvm was used with the parameter kernel set to vanilladot, which implies
that a linear kernel is used for building the SVM classification models. For Case I,
the model found 120 support vectors. We found that out of a total of 430 cases which
were actually “0” class records, 19 cases were misclassified as “1”. On the other
hand, 8 cases were wrongly classified as “0” out of a total of 315 cases which were
actually “1”. The training error for Case I was found to be 3.62%. For Case II, the
model found 156 support vectors in order to classify all the 725 records. Among 406
cases which actually belonged to the class “0”, 27 cases were misclassified as “1”.
On the other hand, 17 cases were wrongly classified as “0” out of a total of 319
cases which were actually “1”. The training error for Case II was found to be 6.07%.
The SVM classification model found 116 support vectors in Case III. The model
misclassified 41 cases as “1” out of a total of 418 cases which were actually “0”. On
the other hand, out of a total of 307 cases which were actually “1”, 19 cases were
misclassified as “0”. Table 8 presents the results of the SVM classification models
for all three cases.
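
With a linear kernel, the SVM decision value is a weighted sum of dot products between the support vectors and the new record. The Python sketch below illustrates this textbook form; it is not the kernlab code, the support vectors and coefficients are made up, and the sign convention of the offset is an assumption that varies between implementations.

```python
def linear_kernel(u, v):
    """Plain dot product, i.e. the linear kernel."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, support_vectors, coefficients, offset):
    """f(x) = sum_i coef_i * k(sv_i, x) + offset; coef_i carries the class sign."""
    return sum(c * linear_kernel(sv, x)
               for sv, c in zip(support_vectors, coefficients)) + offset


# Hypothetical support vectors and coefficients, for illustration only.
svs = [[1.0, 1.0], [-1.0, -1.0]]
coefs = [0.5, -0.5]
for x in ([2.0, 0.5], [-0.3, -1.0]):
    f = decision_value(x, svs, coefs, offset=0.0)
    print(x, f, 1 if f > 0 else 0)  # decision value and predicted class
```

Only the support vectors enter the sum, which is why the number of support vectors reported for each case is a measure of the complexity of the fitted model.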

Regression Methods:

Multivariate Regression: We have already mentioned in Chapter 4 that the predictors
that were finally included in the multivariate regression models in all three cases
(Case I, Case II, and Case III) were low_perc and range_diff. For Case I, the
regression model yielded an adjusted R² value of 0.9919 and an F-statistic value of
4.58*10^4 with an associated p-value of 2.2*10^-16. This indicated that the
regression model was successfully able to establish a linear relationship between
the response variable open_perc and the predictor variables low_perc and
range_diff. The RMSE value yielded by the regression model for this case was found
to be 0.0853, and the mean of the absolute values of the actual open_perc was
0.6402. The ratio of the RMSE to the mean of the absolute values of the actual
open_perc was found to be 13.317%. 14 cases out of a total of 745 cases exhibited a
sign mismatch between the predicted and the actual values of open_perc. The
correlation test produced a correlation coefficient value of 0.99 with the p-value
of the t-statistic as 2.2*10^-16. This indicated that there is a strong linear
relationship between the predicted and the actual values of open_perc. The
Breusch-Pagan test yielded a test statistic value of 10.239 with a p-value of
0.005978. Hence, it was evident that the residuals were heteroscedastic. However,
the Durbin-Watson test of autocorrelation produced a test statistic value of 3.023
with an associated p-value of 1. Hence, the null hypothesis of no autocorrelation
among the residuals could not be rejected, and we conclude that the residuals do not
exhibit any significant autocorrelation. For Case II, the regression model yielded
an adjusted R² value of 0.9827 with the value of the F-statistic as 2.052*10^4. The
p-value of the F-statistic was found to be less than 2.2*10^-16, indicating a highly
significant F-statistic and a very good model fit. The RMSE value for Case II was
found to be 0.1749 with the mean of the absolute values of the actual open_perc as
0.9286. The ratio of the RMSE to the mean of the absolute values of the actual
open_perc was 18.84%. 39 cases out of a total of 725 cases were found to have a sign
mismatch between the predicted and the actual open_perc values. The correlation
test for this case yielded a correlation coefficient of 0.99 with the value of the
t-statistic as 202.74. The p-value of the t-statistic was 2.2*10^-16, indicating a
very strong linear relationship between the predicted and the actual open_perc
values. The Breusch-Pagan test yielded a test statistic value of 3.1877 with an
associated p-value of 0.203. It was thus evident that the residuals did not exhibit
significant heteroscedasticity. The Durbin-Watson test of autocorrelation produced
a test statistic value of 2.9005. The p-value of the Durbin-Watson test was found to
be 1, indicating that the null hypothesis of no autocorrelation among the residuals
could not be rejected. Hence, we conclude that the residuals in the regression model
in Case II did not exhibit any significant autocorrelation. The model in Case III
was the same as that in Case I. However, its performance results were different as
it was tested on 2014 data, unlike the model in Case I which was tested on 2013
data. The RMSE for Case III was found to be 0.1753 with the mean of the absolute
values of the actual open_perc equal to 0.9286. Thus, the ratio of the RMSE to the
mean of the absolute values of the actual open_perc values was found to be 18.88%.
We found that 39 cases out of a total of 725 cases exhibited a sign mismatch between
the predicted and the actual values of open_perc. The correlation test yielded a
correlation coefficient of 0.99 with the value of the t-statistic as 202.53 and an
associated p-value of 2.2*10^-16. This indicated that the predicted and the actual
values of open_perc exhibited a strong linear relationship. The Breusch-Pagan test
yielded a test statistic of 3.1877 with an associated p-value of 0.2031, thereby
indicating that the residuals were not heteroscedastic. The test statistic value
yielded by the Durbin-Watson test was found to be 2.9005 with an associated p-value
of 1. Hence, the null hypothesis of no autocorrelation among the residuals could not
be rejected, and we concluded that the residuals did not exhibit any significant
autocorrelation.
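
For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4: values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation. The Python sketch below computes the statistic only (not the p-value of the test) on two made-up residual series; the function name is hypothetical.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2) for a sequence of residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


# Toy residual series (hypothetical), chosen to show the two extremes.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips at every step
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]   # long runs of the same sign
print(durbin_watson(alternating), durbin_watson(persistent))
```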

Table 9 presents the results of the multivariate regression models for all three
cases. Fig 10(a), (b), and (c) present some performance results of the multivariate
regression model for Case I. Fig 10(a) shows that the predicted values very closely
followed the pattern of the actual open_perc values, while Fig 10(b) shows that
there is a very strong linear relationship between the predicted and the actual
values of open_perc. The residuals of the model were found to be scattered and
random and exhibited no significant autocorrelation, as depicted in Fig 10(c). The
performance results of Case II are presented in Fig 11(a), (b), and (c). As in Case
I, the predicted and the actual values of open_perc exhibited almost identical
movement patterns in this case. The residuals did not show any significant
autocorrelation. Fig 12(a) shows how closely the predicted values of open_perc
followed the patterns exhibited by its actual values in Case III, while Fig 12(b)
exhibits a strong linear relationship between them. Fig 12(c) depicts that the
residuals of the regression model for Case III were random and did not exhibit any
autocorrelation.
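
The two accuracy measures reported for every regression model, the ratio of the RMSE to the mean of the absolute values of the actual open_perc and the percentage of sign mismatches, can be sketched as follows. This is an illustrative Python sketch with made-up toy values; the function name regression_summary is hypothetical, and treating a pair as a sign mismatch only when the product of the two values is negative is an assumption about how zeros are handled.

```python
import math


def regression_summary(actual, predicted):
    """RMSE, RMSE as a percentage of mean |actual|, and percentage of sign mismatches."""
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    mean_abs = sum(abs(a) for a in actual) / n
    mismatches = sum(1 for a, p in zip(actual, predicted) if a * p < 0)
    return {
        "rmse": rmse,
        "ratio_pct": 100 * rmse / mean_abs,
        "mismatch_pct": 100 * mismatches / n,
    }


# Toy values (hypothetical), only to exercise the function.
actual = [0.5, -0.2, 1.0, -0.8]
predicted = [0.4, 0.1, 0.9, -0.7]
summary = regression_summary(actual, predicted)
print(summary)

# Ratio implied by the Case I figures in the text (RMSE 0.0853, mean 0.6402).
case_i_ratio = round(100 * 0.0853 / 0.6402, 2)
print(case_i_ratio)
```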

Table 9: Multivariate Regression results

Case I Case II Case III


Metrics Training 2013 Training 2014 Test 2014
Correlation Coefficient 0.99 0.99 0.99
RMSE/Mean of Absolute Values of Actuals 13.32 18.84 18.88
Percentage of Mismatched Cases 1.88 5.38 5.24

Fig 10(a): Multivariate Regression - time-varying actual and predicted values of open_perc (Case I)

Fig 10(b): Multivariate Regression - relationship between actual and predicted open_perc (Case I)

Fig 10(c): Multivariate Regression - time-varying residuals (Case I)

Fig 11(a): Multivariate Regression- time-varying actual and predicted values of open_perc (Case II)

Fig 11(b): Multivariate Regression - relationship between actual and predicted open_perc (Case II)

Fig 11(c): Multivariate Regression- time-varying residuals (Case II)

Fig 12(a): Multivariate Regression- time-varying actual and predicted open_perc (Case III)

Fig 12(b): Multivariate Regression - actual and predicted open_perc (Case III)

Fig 12(c): Multivariate Regression- time-varying residuals (Case III)

MARS: We used the earth function defined in the earth library in the R programming
language for building MARS regression models in all three cases. In Case I, seven
terms were used in the forward pass of the algorithm, since the inclusion of the 8th
term changed the value of R² by only 5*10^-5, which was less than the threshold
value of 0.001. After the completion of the forward pass, both the generalized
R-square (GRSq) and the R² converged to a common value of 0.993. During the backward
pass, the algorithm could not prune any term, and all the seven terms used in the
forward pass were finally retained in the model. In Case I, the model retained three
predictors out of a total of ten predictors. The selected predictors in decreasing
order of their importance in the model were found to be: close_perc, high_perc, and
low_perc. The predictors which the algorithm did not use were: month, day_month,
day_week, time, vol_perc, nifty_perc, and range_diff. At the completion of the
execution of the algorithm, the values of some of the important metrics were as
follows: (i) generalized cross-validation (GCV): 0.0065, (ii) residual sum of
squares (RSS): 4.7006, (iii) GRSq: 0.9928, and (iv) R²: 0.9930. The seven terms that
the MARS algorithm used in Case I were found to be as follows: (i) the intercept of
the model, (ii) h(-0.83682 - high_perc), (iii) h(high_perc - 0.83682), (iv)
h(-0.692841 - low_perc), (v) h(low_perc - 0.692841), (vi) h(-2.11268 - close_perc),
and (vii) h(close_perc - 2.11268). In Case I, the MARS regression model yielded 9
cases out of a total of 745 cases that exhibited a mismatch in signs between the
predicted and the actual values of open_perc. The RMSE value for this case was
0.0794, while the mean of the absolute values of the actual open_perc was 0.6402.
Hence, the ratio of the RMSE to the mean of the absolute values of the actual
open_perc was 12.41%. The correlation test yielded a correlation coefficient of 0.99
with the t-statistic value of 325.41 and an associated p-value of 2.2*10^-16. This
indicated that there is a strong linear relationship between the predicted and the
actual values. Table 10 presents the results of the MARS regression model.
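
The terms of the form h(.) listed above are hinge functions, h(z) = max(0, z), and a MARS model is an intercept plus a weighted sum of such terms. The Python sketch below illustrates this for a single predictor; it is not the earth implementation, and the knot, the coefficients, and the function names are hypothetical because the fitted coefficients are not reported in the text.

```python
def h(z):
    """Hinge function used by MARS: max(0, z)."""
    return max(0.0, z)


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum of coefficient * hinge terms.

    terms: list of (coefficient, sign, knot); sign = +1 gives h(x - knot),
    sign = -1 gives h(knot - x).
    """
    return intercept + sum(c * h(s * (x - k)) for c, s, k in terms)


# One mirrored pair of hinges at a hypothetical knot of 0.5, made-up coefficients.
terms = [(2.0, +1, 0.5), (-1.0, -1, 0.5)]
for x in (0.0, 0.5, 1.5):
    print(x, mars_predict(x, intercept=0.1, terms=terms))
```

Each mirrored pair lets the slope of the fitted function change at the knot, which is how MARS builds a piecewise linear approximation of the response.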

Table 10: MARS regression results

Case I Case II Case III


Metrics Training 2013 Training 2014 Test 2014
Correlation Coefficient 0.99 0.99 0.99
RMSE/Mean of Absolute Values of Actuals 12.41 17.09 20.40
Percentage of Mismatched Cases 1.21 4.28 6.34

Fig 13(a): MARS - actual and predicted values of open_perc (Case I)


Fig 13(b): MARS – relationship between actual and predicted values of open_perc (Case I)

Fig 13(c): MARS - time-varying residuals (Case I)

In Case II, the algorithm used nine terms during its forward execution, since the
change in the R² value at the end of the 9th term was found to be only 0.0002, which
was less than the threshold value of 0.001. After the completion of the forward
pass, the values of GRSq and R² were found to be 0.985 and 0.986 respectively.
During the backward pass of its execution, the algorithm pruned one term out of the
nine terms included in the forward pass. Hence, the algorithm used eight terms in
constructing the regression model. We also observed that the algorithm retained four
predictors out of a total of ten predictors available initially. The four predictors
which were retained in the model, in decreasing order of their importance, were
found to be: low_perc, close_perc, range_diff, and high_perc. At the end of the
execution of the backward pass of the algorithm, some important metric values were
noted: GCV: 0.0262, RSS: 18.2512, GRSq: 0.9852, and R²: 0.9858. The eight terms that
the algorithm used in building the regression model in Case II were: (i) the
intercept of the model, (ii) h(0.3675 - high_perc), (iii) h(high_perc - 0.3675),
(iv) h(-2.6685 - low_perc), (v) h(low_perc - 2.6685), (vi) h(0.3996 - close_perc),
(vii) h(-1.8 - range_diff), and (viii) h(range_diff - -1.8). In Case II, we found
that 31 cases out of a total of 725 cases exhibited mismatched signs between the
predicted and the actual values of open_perc. With an RMSE value of 0.1587 and the
mean of the absolute values of the actual open_perc as 0.9286, their ratio was found
to be 17.09%. The correlation test yielded a correlation coefficient of 0.99, with
the value of the t-statistic as 223.87 and an associated p-value of 2.2*10^-16. The
high value of the correlation coefficient and the negligible support for the null
hypothesis in the form of a very low p-value indicated that there was a very strong
linear relationship between the predicted and the actual values of open_perc in
Case II.

Fig 14(a): MARS- time-varying actual and predicted values of open_perc (Case II)

Fig 14(b): MARS – relationship between actual and predicted values of open_perc (Case II)

Fig 14(c): MARS - time-varying residuals (Case II)

Fig 15(a): MARS - time-varying actual and predicted values of open_perc (Case III)

Fig 15(b): MARS – relationship between actual and predicted values of open_perc (Case III)

In Case III, the MARS model of regression was identical to that of Case I. The
model was, however, tested on 2014 data. We observed that in Case III, the MARS
model yielded 46 cases out of a total of 725 cases that exhibited a sign mismatch
between the predicted and the actual open_perc values. The RMSE for this case was
found to be 0.1894, while the mean of the absolute values of the actual open_perc
was 0.9286. The ratio of the RMSE to the mean value was found to be 20.40%. The
correlation test on the predicted and the actual values of open_perc yielded a
correlation coefficient value of 0.99 with the value of the t-statistic as 187.13
and an associated p-value of 2.2*10^-16. The results indicated that, as in Case I
and Case II, the predicted and the actual values of open_perc exhibited a strong
linear relationship in Case III as well.

Fig 15(c): MARS - time-varying residuals (Case III)

Table 11: Decision Tree regression results

Case I Case II Case III


Metrics Training 2013 Training 2014 Test 2014
Correlation Coefficient 0.97 0.97 0.10
RMSE/Mean of Absolute Values of Actuals 35.35 37.04 165.92
Percentage of Mismatched Cases 13.42 17.38 47.72

Decision Tree Regression: We used the tree function defined in the tree library in
the R programming language to build a decision tree-based regression model. For
Case I, close_perc turned out to be the splitting variable at the root node. Other
important variables that led to splitting at nodes were high_perc and low_perc. Fig
16(a) depicts the decision tree model. The RMSE for this case was 0.2263, and the
mean of the absolute values of the actual open_perc was found to be 0.6402. Among
the total of 745 cases, 100 cases exhibited a sign mismatch between the predicted
and the actual values of open_perc. The correlation coefficient between the
predicted and the actual open_perc values turned out to be 0.97. The t-statistic of
the correlation test had a value of 111.35 with a p-value of 2.2*10^-16, which
indicated that there was a strong linear relationship between the predicted and the
actual open_perc values. Fig 16(b), (c), and (d) depict different performance
characteristics of the decision tree-based regression model for Case I. Fig 16(b)
shows that, except for a few instances, the predicted values of open_perc very
closely followed the pattern exhibited by its actual values. Fig 16(c) shows that
with the increase in the actual open_perc values, its predicted values also
exhibited a stepwise upward trend. Fig 16(d) shows that the residuals did not
exhibit any autocorrelation. Table 11 presents the results of the decision tree
regression model.

Fig 16(a): Decision Tree regression model (Case I)

Fig 16(b): Decision Tree regression - time-varying actual and predicted open_perc (Case I)
Fig 16(c): Decision Tree regression - actual and predicted open_perc (Case I)

Fig 16(d): Decision Tree regression – time-varying residuals (Case I)

Fig 17(a) presents the decision tree regression model for Case II. In this case too,
close_perc was the splitting variable at the root node, and the other two variables
used for splitting at subsequent nodes were high_perc and low_perc. This case
yielded an RMSE value of 0.3440, and the mean of the absolute values of the actual
open_perc values was 0.9286. 126 cases out of a total of 725 cases exhibited a sign
mismatch between their predicted and actual open_perc values. The correlation
coefficient between the actual and the predicted values of open_perc was found to be
0.96 with the t-statistic value of the correlation test as 100.47 and its associated
p-value as 2.2*10^-16. The correlation test indicated that the predicted and the
actual open_perc values were highly correlated. Fig 17(b), (c), and (d) show that
the regression model was effective in capturing the relationship between the
response variable, open_perc, and the predictor variables.

Fig 17(a): Decision Tree regression model (Case II)

Fig 17(b): Decision Tree regression - time-varying actual and predicted open_perc (Case II)

Fig 17(c): Decision Tree regression - actual and predicted open_perc (Case II)
Fig 17(d): Decision Tree regression – time-varying residuals (Case II)

Fig 18(a): Decision Tree regression model (Case III)

The decision tree regression model for Case III was the same as that in Case I. The
decision tree model is presented in Fig 18(a). However, the performance of the
model yielded different results as it was tested on 2014 data, unlike in Case I, in
which the model was tested on 2013 data. The correlation coefficient between the
predicted and the actual values of open_perc for this case was found to be only 0.10
with the t-statistic value of the correlation test as 2.8243 and its associated
p-value as 0.00487. As expected, the RMSE for this case was much higher than those
in the previous two cases. The RMSE was found to be 1.5407 with the mean of the
absolute values of the actual open_perc as 0.9286. This led to a very high value of
165.92% as their ratio. 346 cases out of a total of 725 cases exhibited a mismatch
in sign between the predicted and the actual values of open_perc. The model
predicted poorly because a regression tree can output only a limited number of
distinct values (one per leaf), which had to be mapped onto a large set of
continuously varying open_perc values for the year 2014.
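
The limitation described above follows from the way a regression tree predicts: every record that falls in the same leaf receives the same constant value. The Python sketch below uses a hand-written tree with hypothetical split points and leaf values (the fitted trees are those shown in the figures, not this one) to show that six different inputs yield only four distinct predictions.

```python
def tree_predict(close_perc, high_perc):
    """A hand-written regression tree: every leaf returns one constant value."""
    if close_perc < 0.0:
        return -1.2 if high_perc < -0.5 else -0.3
    return 0.4 if high_perc < 0.8 else 1.5


# Hypothetical (close_perc, high_perc) pairs; split points and leaf values
# above are made up for illustration.
inputs = [(-1.0, -0.9), (-0.4, 0.1), (0.2, 0.3),
          (0.25, 0.35), (1.1, 1.4), (2.0, 2.6)]
predictions = [tree_predict(c, hp) for c, hp in inputs]
print(predictions)
print(len(set(predictions)), "distinct predicted values for", len(inputs), "inputs")
```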
Fig 18(b): Decision Tree regression - time-varying actual and predicted open_perc (Case III)

Fig 18(c): Decision Tree - relationship between actual and predicted open_perc (Case III)

Fig 18(b), (c), and (d) present the performance of the model in Case III. While the
behavior of the model was almost identical to that in the other two cases, Fig 18(b)
shows clearly that there were more deviations between the patterns exhibited by the
actual values and the predicted values of open_perc. This led to a significantly
higher RMSE in this case as compared to Case I and Case II.

Fig 18(d): Decision Tree regression – time-varying residuals (Case III)


Table 12: Bagging regression results

Case I Case II Case III


Metrics Training 2013 Training 2014 Test 2014
Correlation Coefficient 0.96 0.98 0.97
RMSE/Mean of Absolute Values of Actuals 40.29 25.70 34.91
Percentage of Mismatched Cases 4.70 5.10 9.24

Bagging Regression: The bagging function defined in the ipred library of the R
programming language was used in building the bagging regression model. In Case I,
the RMSE value was found to be 0.2579 with the mean of the absolute values of the
actual open_perc as 0.6402. Among 745 total cases, 35 cases exhibited a sign
mismatch between their predicted and the corresponding actual values of open_perc.

Fig 19(a): Bagging regression - time-varying actual and predicted values of open_perc (Case I)

Fig 19(b): Bagging regression - relationship between actual and predicted open_perc (Case I)

Fig 19(c): Bagging regression – time-varying residuals (Case I)

Case II yielded an RMSE value of 0.2386, and the mean of the absolute values of the
actual open_perc was 0.9286. We found that 37 cases out of a total of 725 cases
yielded a mismatch in sign between the predicted and the actual values of open_perc.
The RMSE value for Case III was found to be 0.3242. The mean of the absolute values
of the actual open_perc was 0.9286. We observed that 67 cases out of a total of 725
cases showed a mismatch in sign between the predicted and the corresponding actual
values of open_perc. Table 12 presents the results of the bagging regression model.

Fig 20(a): Bagging regression - time-varying actual and predicted values of open_perc (Case II)

Fig 20(b): Bagging regression - relationship between actual and predicted open_perc (Case II)

Fig 20(c): Bagging regression – time-varying residuals (Case II)

Fig 21(a): Bagging regression - time-varying actual and predicted open_perc (Case III)

Fig 21(b): Bagging regression - relationship between actual and predicted open_perc (Case III)

Fig 21(c): Bagging regression – time-varying residuals (Case III)

Table 13: Boosting regression results

Case I Case II Case III


Metrics Training 2013 Training 2014 Test 2014
Correlation Coefficient 0.99 0.99 0.97
RMSE/Mean of Absolute Values of Actuals 23.40 17.35 41.51
Percentage of Mismatched Cases 0.81 4.69 6.90

Fig 22(a): Boosting regression - time-varying actual and predicted open_perc (Case I)

Fig 22(b): Boosting regression - relationship between actual and predicted open_perc (Case I)

Fig 22(c): Boosting regression – time-varying residuals (Case I)

Boosting Regression: We used the blackboost function defined in the mboost
library of the R programming language. In Case I, 6 of the 745 cases exhibited
mismatched signs between the predicted and the corresponding actual open_perc
values. The RMSE for this case was 0.1498, while the mean of the absolute
values of the actual open_perc was 0.6402. For Case II, 34 of the 725 cases
yielded mismatched signs between the actual and the corresponding predicted
values of open_perc. The RMSE for this case was 0.1611, and the mean of the
absolute values of the actual open_perc was 0.9286. Case III yielded an RMSE of
0.3855, with the mean of the absolute values of the actual open_perc again being
0.9286; 50 of the 725 cases exhibited mismatched signs between the predicted and
the corresponding actual values of open_perc. Table 13 presents the results of
the boosting regression model.
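As a worked check, the two derived rows of Table 13 follow directly from the figures quoted in the paragraph above. The short Python sketch below redoes that arithmetic; the dictionary layout and the helper name table_row are illustrative choices, not part of the original analysis.

```python
# Figures reported in the text for boosting regression:
# (RMSE, mean |actual open_perc|, sign mismatches, total cases)
reported = {
    "Case I": (0.1498, 0.6402, 6, 745),
    "Case II": (0.1611, 0.9286, 34, 725),
    "Case III": (0.3855, 0.9286, 50, 725),
}

def table_row(rmse, mean_abs, mismatches, total):
    """Return (RMSE as % of mean |actual|, % of sign-mismatched cases)."""
    return round(100 * rmse / mean_abs, 2), round(100 * mismatches / total, 2)

rows = {case: table_row(*values) for case, values in reported.items()}
for case, (ratio_pct, mismatch_pct) in rows.items():
    print(f"{case}: ratio = {ratio_pct:.2f}%, mismatched = {mismatch_pct:.2f}%")
```

The same calculation applies to the corresponding rows of the other regression tables, which shows that the ratio row is expressed as a percentage.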

Fig 23(a): Boosting regression - time-varying actual and predicted open_perc (Case II)

Fig 23(b): Boosting regression - relationship between actual and predicted open_perc (Case II)

Fig 23(c): Boosting regression – time-varying residuals (Case II)

Fig 24(a): Boosting regression - time-varying actual and predicted open_perc (Case III)

Fig 24(b): Boosting regression - relationship between actual and predicted open_perc (Case III)

Fig 24(c): Boosting regression – time-varying residuals (Case III)

Random Forest Regression: We used the randomForest function defined in the
randomForest library of the R programming language to build the random forest
regression model. For all three cases, the algorithm tried three variables at
each split of the associated decision trees, and 500 regression trees were
constructed in each case. The mean squared residuals were 0.0441, 0.0512, and
0.0441 for Case I, Case II, and Case III respectively. In Case I, the model
explained 95.13% of the variance, and none of the 745 cases exhibited a sign
mismatch between the predicted and the actual values of open_perc. The RMSE for
this case was 0.1041, while the mean of the absolute values of the actual
open_perc was 0.6402. For Case II, the model explained 97.11% of the variance,
and 19 of the 725 cases exhibited mismatched signs between the predicted and the
actual values of open_perc. The RMSE for this case was 0.1005, while the mean of
the absolute values of the actual open_perc was 0.9286. In Case III, the model
explained 95.13% of the variance, and 47 of the 725 cases exhibited mismatched
signs between the predicted and the actual open_perc values. The RMSE was
0.2973, with the mean of the absolute values of the actual open_perc being
0.9286. Table 14 presents the results of the random forest regression model.

Table 14: Random Forest regression results

                                             Case I         Case II        Case III
Metrics                                      Training 2013  Training 2014  Test 2014
Correlation Coefficient                      0.99           0.99           0.97
RMSE/Mean of Absolute Values of Actuals (%)  16.26          10.82          32.02
Percentage of Mismatched Cases               0.00           2.62           6.48

Fig 25(a): Random Forest regression - time-varying actual and predicted open_perc (Case I)

Fig 25(b): Random Forest - relationship between actual and predicted open_perc (Case I)

Fig 25(c): Random Forest regression – time-varying residuals (Case I)

Fig 26(a): Random Forest regression - time-varying actual and predicted open_perc (Case II)

Fig 26(b): Random Forest - relationship between actual and predicted open_perc (Case II)

Fig 26(c): Random Forest regression – time-varying residuals (Case II)

Fig 25(a) shows the predicted open_perc values superimposed on the
corresponding actual values for each of the 745 time slots in Case I. The linear
relationship between the predicted and the actual open_perc values is presented
in Fig 25(b). The residuals of the random forest regression model are depicted
in Fig 25(c). These three graphs, together with the metrics reported under
Case I in Table 14, indicate that random forest regression modeled Case I of the
Godrej Consumer data very effectively.

Fig 27(a): Random Forest regression - time-varying actual and predicted open_perc (Case III)

Fig 27(b): Random Forest - relationship between actual and predicted open_perc (Case III)

Fig 26(a), (b), and (c) present visual performance metrics of the random forest
regression model for Case II. It is evident from these figures that the
predicted values of open_perc very closely follow the patterns of the actual
values. Moreover, the residuals of the regression model exhibited randomness,
and no significant autocorrelations were observed among them.

It is also evident from Fig 27(a), (b), and (c) that the random forest regression
was very effective in modeling Case III. Fig 27(b) indicates some deviations
from linearity at the head and the tail of the otherwise linear relationship
between the actual and the predicted values of open_perc. This manifested itself
as a marginally higher value of the ratio of the RMSE to the mean of the
absolute values of open_perc for Case III in random forest regression.

Fig 27(c): Random Forest regression – time-varying residuals (Case III)


Table 15: ANN regression results

                                             Case I         Case II        Case III
Metrics                                      Training 2013  Training 2014  Test 2014
Correlation Coefficient                      0.99           0.98           0.98
RMSE/Mean of Absolute Values of Actuals (%)  17.16          31.39          36.83
Percentage of Mismatched Cases               1.21           9.38           10.90

ANN Regression: We used the neuralnet function defined in the neuralnet library
of the R programming language to design the ANN regression model on the Godrej
data. For Case I, 9 of the 745 records yielded mismatched signs between their
actual and predicted open_perc values. The RMSE in this case was 0.1099, while
the mean of the absolute values of the actual open_perc was 0.6402. In Case II,
68 of the 725 cases had mismatched signs between their actual and predicted
values of open_perc, and the RMSE of the model was 0.2915. In Case III, 79 of
the 725 cases had mismatched signs between their actual and predicted open_perc
values, and the RMSE was 0.3420. We also computed the product-moment correlation
coefficient between the predicted and the actual open_perc values. The results
for the ANN regression model are presented in Table 15.

Fig 28(a): ANN regression model (Case I)

Fig 28(b): ANN regression - time-varying actual and predicted values of open_perc (Case I)

Fig 28(a) depicts the ANN regression model for Case I. Only one node is used in
the hidden layer, as additional nodes in this layer would have led to an
overfitted model. The link weights are shown in black, while the bias values
associated with the hidden layer node and the output layer node are shown in
blue. Each node in the input layer corresponds to an input variable. Fig 28(b)
shows how the predicted values of open_perc followed the variational patterns of
its actual values, while Fig 28(c) exhibits the linear relationship between the
predicted and the actual values of open_perc. From both these figures, it is
evident that Case I was modeled very well by ANN regression. Fig 28(d) shows
that the residuals are random and do not exhibit any autocorrelation. The
correlation coefficient for this case was 0.99, and the percentage of cases with
mismatched signs between the predicted and the actual open_perc was only 1.21
(9 of 745 cases; see Table 15).

Fig 28(c): ANN regression - relationship between actual and predicted open_perc (Case I)

Fig 28(d): ANN regression – time-varying residuals (Case I)

Fig 29(a) depicts the ANN regression model built for modeling Case II. Fig 29(b)
and Fig 29(c) clearly show that the predicted series for the open_perc very closely
followed the patterns of its corresponding actual values. The linearity of the
relationship between the predicted and actual values of open_perc is depicted in
Fig 29(c). Fig 29(d) shows that the residuals of the regression model did not exhibit
any autocorrelation.

Fig 29(a): ANN regression model (Case II)

Fig 29(b): ANN regression - time-varying actual and predicted open_perc (Case II)

Fig 30(a), (b), (c), and (d) depict the ANN regression model for Case III, the
behavior of the predicted values of open_perc with respect to its actual values,
and the residuals of the regression model. These figures, together with the
numerical metrics (the correlation coefficient, the ratio of the RMSE to the
mean of the absolute values of the actual open_perc, and the number of cases in
which the predicted and the actual values differed in sign), showed that the
model was very accurate.

Fig 29(c): ANN regression - relationship between actual and predicted open_perc (Case II)

Fig 29(d): ANN regression – time-varying residuals (Case II)

Fig 30(a): ANN regression model (Case III)

Fig 30(b): ANN regression - time-varying actual and predicted open_perc (Case III)

Fig 30(c): ANN regression - relationship between actual and predicted open_perc (Case III)

Fig 30(d): ANN regression – time-varying residuals (Case III)

SVM Regression: For SVM regression, we used the svm function defined in the
e1071 library of the R programming language. For all three cases, the regression
type used by R was eps-regression with a radial kernel, and the values of the
parameters gamma and epsilon were both 0.1. The algorithm found 248, 265, and
246 support vectors for Case I, Case II, and Case III respectively. The RMSE
values for the three cases were 0.3450, 0.2593, and 0.7703 respectively. The
mean of the absolute values of the actual open_perc was 0.6402 for Case I and
0.9286 for Case II and Case III. We computed the ratio of the RMSE to the mean
of the absolute values of open_perc for all three cases to gauge the magnitude
of the RMSE relative to the typical magnitude of the actual open_perc values. We
also identified the cases in which the actual and the predicted values of
open_perc differed in sign. These are the cases where the regression model
failed to predict the direction of movement of the actual open_perc values.
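For reference, the two ingredients named above, the radial (RBF) kernel with gamma = 0.1 and the epsilon-insensitive loss with epsilon = 0.1, have the standard textbook forms sketched below. This is plain Python for illustration only and is not the e1071 implementation; the function names and the sample vectors are made up.

```python
import math

GAMMA = 0.1    # kernel parameter reported in the text
EPSILON = 0.1  # width of the insensitive tube reported in the text

def rbf_kernel(x, z, gamma=GAMMA):
    """Radial kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)

def epsilon_insensitive_loss(actual, predicted, epsilon=EPSILON):
    """Errors inside the epsilon tube cost nothing; beyond it the cost is linear."""
    return max(0.0, abs(actual - predicted) - epsilon)

# Hypothetical feature vectors and targets, for illustration only.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))     # identical points
print(rbf_kernel([0.0, 0.0], [1.0, 3.0]))     # squared distance of 10
print(epsilon_insensitive_loss(0.50, 0.55))   # error inside the tube
print(epsilon_insensitive_loss(0.50, 0.80))   # error outside the tube
```

Because errors smaller than epsilon are ignored, only training points on or outside the tube become support vectors, which is why the fitted models retain a subset of the training cases.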

Table 16: SVM regression results

                                             Case I         Case II        Case III
Metrics                                      Training 2013  Training 2014  Test 2014
Correlation Coefficient                      0.93           0.98           0.83
RMSE/Mean of Absolute Values of Actuals (%)  53.88          27.92          82.96
Percentage of Mismatched Cases               0.27           4.41           13.19

Fig 31(a): SVM regression - time-varying actual and predicted open_perc (Case I)

For Case I, 2 of the 745 cases exhibited a sign mismatch between the actual and
the predicted values of open_perc. In Case II, 32 of the 725 cases yielded a
sign mismatch. In Case III, the model faced more challenges in prediction, and
95 of the 725 cases mismatched in sign between their actual and predicted values
of open_perc. The product-moment correlation coefficients between the actual and
the predicted values of open_perc were also computed. The SVM regression results
are presented in Table 16. For all three cases, SVM regression yielded quite
encouraging results.

Fig 31(b): SVM regression - relationship between actual and predicted open_perc (Case I)

Fig 31(c): SVM regression – time-varying residuals (Case I)

Fig 31(a) presents the variation of the actual open_perc and its predicted
values over the 745 time slots of Case I. In most cases, the predicted series
accurately tracks the movement of the actual open_perc time series. In
Fig 31(b), we have plotted the predicted values of open_perc as a function of
its actual values. Except for some points at the tail and the head, most of the
points exhibit a strong linear relationship between the actual and the predicted
values of open_perc for Case I. The residual plot in Fig 31(c) also shows that
most of the residuals are random within a small range, with very few residuals
exhibiting large positive or negative values.

Fig 32(a): SVM regression - time-varying actual and predicted open_perc (Case II)

Fig 32(b): SVM regression - relationship between actual and predicted open_perc (Case II)

Fig 32(a), (b), and (c) depict patterns similar to those in Fig 31(a), (b), and
(c) respectively, indicating that SVM regression performed comparably in Case II
and Case I. In fact, a close look at the pattern of variation in Fig 32(a) shows
that the predicted open_perc series follows the actual open_perc series even
more closely in Case II. This can be verified from the ratio of the RMSE to the
mean of the absolute values of the actual open_perc, which was much lower in
Case II (27.92) than in Case I (53.88).

Fig 32(c): SVM regression – time-varying residuals (Case II)

Fig 33(a): SVM regression - time-varying actual and predicted open_perc (Case III)

Fig 33(b): SVM regression - relationship between actual and predicted open_perc (Case III)

Fig 33(c): SVM regression – time-varying residuals (Case III)

However, Fig 33(a), (b), and (c) clearly show that Case III proved to be much
more challenging for the SVM regression model. The correlation coefficient
between the actual and the predicted values of open_perc was much lower in this
case (0.83; Table 16), which is also visible in Fig 33(a) and Fig 33(b). While
Fig 33(a) shows that the predicted time series failed at many time instances to
follow the pattern exhibited by the actual open_perc time series, Fig 33(b)
exhibits substantial nonlinearity between the predicted and the actual open_perc
values. Fig 33(c), however, shows that the residuals were randomly scattered and
did not exhibit any significant autocorrelation.

LSTM Regression: In Chapter 5, we briefly discussed some major points on LSTM
networks in deep learning. In the following, we present in detail the
forecasting performance of the LSTM-based regression models used in the three
cases. For all three cases, we followed these steps in building the LSTM models:
(i) reading the raw data, (ii) normalizing the data, (iii) converting the
normalized data into a time series, (iv) framing the time series as a supervised
learning problem, (v) creating a deep learning model using the TensorFlow and
Keras frameworks, (vi) training and validating the model, (vii) visualizing the
training and validation performance, and (viii) evaluating the prediction
accuracy of the model on test data. For all three cases, the raw data consisted
of the following attributes: (i) year, (ii) month, (iii) day, (iv) hour (i.e.,
the time slot), (v) open, (vi) high, (vii) low, (viii) close, (ix) volume, and
(x) the NIFTY index. Using Python, we combined attributes (i) through (iv) into
a single attribute so that the resultant dataset consisted of seven attributes.
We provide the details of the three cases in the following.
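The normalization and supervised-learning steps listed above can be sketched in a few lines of plain Python. The helper names, the two-attribute toy records, and the choice of a one-step lag (the features at one time slot paired with the open value at the next) are assumptions made for illustration; the text does not specify the lag that was used.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    low, high = min(column), max(column)
    return [(value - low) / (high - low) for value in column]

def to_supervised(rows, target_index=0):
    """Pair each time step's features with the target value at the next step."""
    inputs = rows[:-1]
    targets = [row[target_index] for row in rows[1:]]
    return inputs, targets

# Hypothetical raw records: (open, volume) at four consecutive time slots.
raw = [(100.0, 10.0), (110.0, 30.0), (105.0, 20.0), (120.0, 50.0)]

# Scale each attribute (column) independently, then rebuild the rows.
scaled_columns = [min_max_scale(list(column)) for column in zip(*raw)]
scaled_rows = [list(row) for row in zip(*scaled_columns)]

X, y = to_supervised(scaled_rows)
print(X)
print(y)
```

Scaling each attribute separately keeps large-valued series such as volume from dominating the small-valued ones during training.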

Fig 34(a): LSTM regression – stock data representation (Case I)

Fig 34(b): LSTM model architecture: (Case I, Case II and Case III)

For Case I, we first plot the open, high, low, close, volume, and NIFTY time
series. In this case, there were 746 records in total. Fig 34(a) depicts the
time series for each of the attributes in Case I. All six attributes (leaving
out the time attribute) are then normalized using the MinMaxScaler class defined
in the sklearn.preprocessing module in Python. Of the 746 records, the first 500
are used for training and the remaining 246 for validation. The Sequential class
defined in Keras is used for building the LSTM, and the model is compiled using
MAE as the loss function and ADAM as the optimizer. The model architecture is
depicted in Fig 34(b). The input layer receives the six time series as six
channels, and its output is passed on to the LSTM layer, which expands the
feature set to 50. The output of the LSTM is passed on to a dense (i.e., fully
connected) layer that has 50 nodes at its input and 1 node at the final output.
The behavior of the training and the validation losses is studied for different
numbers of epochs and batch sizes. With a batch size of 72 and 100 epochs, the
training and validation losses converged to a very low value. Fig 34(c) presents
the behavior of the training and the validation losses in Case I. At the
completion of the final epoch, the RMSE was 8.812, and Pearson’s product-moment
correlation coefficient between the actual and the predicted open values was
0.983. The training and validation losses were 0.0194 and 0.0252 respectively.
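A network matching this description could be declared in Keras roughly as follows. This is a sketch under stated assumptions rather than the code used in the study: the function names are hypothetical, and the number of time steps per sample (set to 1 here) is not given in the text. The first helper uses the standard formula for the number of trainable weights in an LSTM layer and needs no deep learning library.

```python
def lstm_parameter_count(n_features, units):
    """Trainable weights in one LSTM layer: four gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (n_features + units) + units)

def build_lstm_model(timesteps=1, n_features=6, units=50):
    """Build the network described above (requires TensorFlow).

    timesteps=1 is an assumption; the text does not state it.
    """
    # Imported here so the helper above can be used without TensorFlow.
    from tensorflow.keras.layers import LSTM, Dense, Input
    from tensorflow.keras.models import Sequential

    model = Sequential()
    model.add(Input(shape=(timesteps, n_features)))  # six series as channels
    model.add(LSTM(units))                           # 50 learned features
    model.add(Dense(1))                              # single forecast value
    model.compile(loss="mae", optimizer="adam")
    return model

print(lstm_parameter_count(n_features=6, units=50))  # weights in the LSTM layer
print(50 * 1 + 1)                                    # weights in the dense layer
```

Training with the reported settings would then be a call such as model.fit(train_X, train_y, epochs=100, batch_size=72, validation_data=(val_X, val_y)), where the four arrays are the scaled training and validation sets.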

Fig 34(c): LSTM regression – training and validation error (Case I)

Case II involved stock prices for the entire year 2014 and consisted of 725
tuples. As in Case I, the six attributes are first plotted for all 725 records.
Fig 35(a) depicts the plots for the attributes open, high, low, close, volume,
and the NIFTY index. As in Case I, the raw values of these six attributes are
normalized using MinMaxScaler. The LSTM model architecture for this case is
identical to that of Case I and is represented in Fig 34(b).

Fig 35(a): LSTM regression – stock data representation (Case II)

Fig 35(b): LSTM regression – training and validation error (Case II)

The first 500 records are used for model construction and the remaining 225
records for validating the model. The validation loss converged with the
training loss at around 40 epochs but started increasing again as training
continued. The validation loss finally converged with the training loss at 100
epochs, with a batch size of 72. The RMSE of the model was 15.002, with a
correlation of 0.982 between the actual and the predicted open values. The
training and the validation losses were 0.0134 and 0.0301 respectively after the
completion of the last epoch. Fig 35(b) depicts the variation of the training
and the validation losses with the number of epochs in Case II.

Fig 36(a): LSTM regression – stock data representation (Case III)

Fig 36(b): LSTM regression – training and testing error (Case III)

In Case III, the LSTM model was built using the records of the year 2013 and
then tested on the records of the year 2014. The raw dataset in this case
consisted of 1471 records in total, of which 746 records (those belonging to
2013) were used in building the model, and the remaining 725 records (those
belonging to 2014) were used for testing it. Fig 36(a) presents the plots of the
open, high, low, close, volume, and NIFTY time series for this case with 1471
records. The LSTM model architecture in Case III remains identical to that of
Case I and Case II. The training and the test losses converged at 60 epochs with
a batch size of 72. The RMSE and the correlation for this case were 13.477 and
0.996 respectively. The training and the test losses were 0.0116 and 0.0258
respectively. Fig 36(b) depicts the patterns exhibited by the training and the
test losses over the epochs.

Fig 37: CNN regression – stock data representation

CNN Regression: In Chapter 5, we briefly discussed how we used CNN regression to
carry out multi-step forecasting of the open values of the Godrej Consumer
Products stock time series. We followed a slightly different approach in this
case. We used Godrej Consumer Products stock price data for the period December
31, 2012 (Monday) to January 9, 2015 (Friday). During this period, the stock
price movements were captured at 5-minute intervals. At each slot, the values of
open, high, low, close, and volume are available. The stock price data for the
period December 31, 2012 to December 30, 2013 was used as the training dataset,
and the data for the period December 31, 2013 to January 9, 2015 was used for
testing. The entire dataset was also arranged in the form of a weekly sequence,
Monday to Friday. Fig. 37 depicts the data at 5-minute intervals for the entire
period under consideration. As mentioned in Chapter 5, we followed four
different approaches to CNN regression for the Godrej dataset. We describe them
below:

Fig 38: CNN model architecture – Univariate multistep with one week’s data as input (N = 5)

Case I: Univariate Forecasting with One-week prior data (N=5) – With only one
week of prior data used for building a univariate forecasting model with a CNN,
we had a small amount of input data and hence a very light model. We used only
one convolution layer with 16 filters and a kernel size of 3. In other words,
the input sequence of five days is read by a convolution operation three
time-steps at a time, and this operation is performed 16 times. A max pooling
layer of size 2 then reduces the size of the feature maps before the internal
representation is flattened into one long vector. This vector is interpreted by
a fully-connected layer before the output layer (which is also fully connected)
predicts the open values for the next five days. Fig. 38 depicts the
architecture of the CNN model for Case I. For both the convolution layer and the
fully connected layer, the ReLU (Rectified Linear Unit) function was used as the
activation function. The “ADAM” implementation of the stochastic gradient
descent algorithm was used as the optimizer, with 20 epochs and a batch size of
4. The loss function was the mean squared error (MSE). For computing the
prediction error, we used the root mean squared error (RMSE) as the metric.
Because of the small batch size and the stochastic nature of the gradient
descent algorithm, the model is expected to learn a slightly different mapping
of the inputs to the outputs every time it is trained. This implies that the
performance results will vary slightly across runs.
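To make the layer sizes concrete, the sketch below traces how the sequence length shrinks through the convolution and pooling layers described above. It assumes an unpadded, stride-1 convolution and non-overlapping pooling (the Keras defaults); the text does not state the padding that was used, and the function names are illustrative.

```python
def conv1d_output_length(length, kernel_size):
    """Length after an unpadded, stride-1 convolution."""
    return length - kernel_size + 1

def max_pool_output_length(length, pool_size):
    """Length after non-overlapping max pooling (incomplete windows dropped)."""
    return length // pool_size

def flattened_size(input_days, filters=16, kernel_size=3, pool_size=2):
    """Size of the flat vector passed to the fully connected layer."""
    after_conv = conv1d_output_length(input_days, kernel_size)
    after_pool = max_pool_output_length(after_conv, pool_size)
    return after_pool * filters

print(flattened_size(5))   # one week of input (N = 5)
print(flattened_size(10))  # two weeks of input (N = 10)
```

Under these assumptions a five-day input leaves a single pooled position per filter, which illustrates why the model for N = 5 is described as very light.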

Table 17: CNN regression results (Case I: Univariate multi-step N=5)


Round_No Overall RMSE Monday Tuesday Wednesday Thursday Friday Exec_Time
1 5.368 4.6 4.7 5.4 5.6 6.2 79.795
2 5.482 3.9 4.8 5.8 5.6 6.9 80.419
3 6.074 5.3 5.3 6.5 6.1 6.9 86.294
4 5.647 4.7 5.4 6.0 6.0 6.1 85.200
5 5.226 3.8 4.7 5.3 6.0 6.0 84.000
6 5.340 3.8 4.7 5.5 6.1 6.2 81.377
7 5.357 3.7 5.2 5.8 5.7 6.0 81.940
8 6.018 3.9 5.1 6.1 7.4 6.9 81.626
9 5.121 3.7 4.7 5.4 5.6 6.0 80.998
10 7.019 6.0 7.2 6.4 6.6 8.6 82.542
11 6.657 5.9 6.6 6.5 6.8 6.9 84.902
12 6.598 4.9 7.5 7.0 6.3 6.9 82.832
13 5.364 3.9 4.9 5.4 5.9 6.4 81.955
14 5.371 3.9 4.7 5.7 5.7 6.5 83.065
15 5.769 5.2 6.0 5.8 5.7 6.2 79.991
16 5.470 4.7 5.2 5.9 5.6 5.9 98.986
17 5.392 4.9 5.0 5.3 5.6 6.0 80.584
18 5.704 4.6 5.5 6.2 6.0 6.1 79.428
19 5.169 3.8 4.7 5.4 5.7 6.0 78.198
20 5.636 4.5 5.3 5.8 5.7 6.6 79.228
Mean 5.6891 4.485 5.36 5.86 5.985 6.465 82.668
SD 0.530 0.723 0.838 0.473 0.477 0.624 4.392
Min 5.121 3.7 4.7 5.3 5.6 5.9 78.198
Max 7.019 6 7.5 7 7.4 8.6 98.986
RMSE/Mean 0.0065 0.0052 0.0062 0.0068 0.0069 0.0075

We tested the model for 20 rounds and noted its overall RMSE, the RMSE values
for the individual days of a week (i.e., Monday to Friday), the execution time
of the model, and the ratio of the RMSE to the mean value of the predicted
variable (i.e., the mean of the open values in the test dataset). Table 17
presents the performance of the CNN model for Case I. The mean value of open for
the test data is 866.5875. The training and the test data consisted of 19500 and
20250 records respectively. The execution time of the model is expressed in
seconds. The model was executed on a system with an Intel i7 CPU with a clock
speed of 2.60 GHz- 2.59 GHz and 16 GB of random access memory (RAM), running
the Windows 10 operating system.
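The summary rows of Table 17 can be reproduced from the 20 overall RMSE values with the Python standard library, as in the sketch below. The variable names are illustrative. The reported SD matches the sample standard deviation (n - 1 in the denominator), and the ratio evaluates to about 0.00656, which the table lists as 0.0065.

```python
import statistics

# Overall RMSE for the 20 rounds, copied from Table 17.
overall_rmse = [
    5.368, 5.482, 6.074, 5.647, 5.226, 5.340, 5.357, 6.018, 5.121, 7.019,
    6.657, 6.598, 5.364, 5.371, 5.769, 5.470, 5.392, 5.704, 5.169, 5.636,
]
MEAN_OPEN_TEST = 866.5875  # mean open value of the test data, from the text

mean_rmse = statistics.mean(overall_rmse)
sd_rmse = statistics.stdev(overall_rmse)  # sample standard deviation (n - 1)
ratio = mean_rmse / MEAN_OPEN_TEST

print(f"Mean = {mean_rmse:.4f}, SD = {sd_rmse:.3f}, RMSE/Mean = {ratio:.5f}")
```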

Fig 39: CNN model architecture – Univariate multistep with two week’s data as input (N = 10)

Case II: Univariate Forecasting with Two-week prior data (N=10) – The
architecture of the model in this case is identical to that in Case I. However,
the model is fed with two weeks of prior data (i.e., the ten immediately
preceding open values) for forecasting the open values of the subsequent week.
Fig. 39 depicts the architecture of the CNN model for Case II. Table 18 presents
the performance of the model over the 20 rounds of Case II. The execution time
of the model is expressed in seconds. The system hardware and operating system
on which the model was tested are those described under Case I.
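Framing the series so that ten past open values predict the next five can be sketched as a sliding window, as below. The function name, the toy series, and the choice to advance the window one step at a time are assumptions for illustration; the text does not state how the training windows were generated.

```python
def make_windows(values, n_input=10, n_output=5):
    """Slide over the series one step at a time, pairing n_input past values
    with the n_output values that immediately follow them."""
    inputs, outputs = [], []
    last_start = len(values) - n_input - n_output
    for start in range(last_start + 1):
        split = start + n_input
        inputs.append(values[start:split])
        outputs.append(values[split:split + n_output])
    return inputs, outputs

# Hypothetical series of 20 open values (four trading weeks of five days).
series = list(range(1, 21))
X, y = make_windows(series)
print(len(X), X[0], y[0])
```

Setting n_input=5 gives the one-week framing of Case I with the same function.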
Table 18: CNN regression results (Case II: Univariate multi-step N=10)
Round_No Overall RMSE Monday Tuesday Wednesday Thursday Friday Exec_Time
1 5.307 3.9 4.6 5.6 5.7 6.4 85.386
2 6.042 3.9 6.2 6.4 6.8 6.6 85.845
3 5.378 4.0 5.0 5.6 5.6 6.3 84.247
4 5.278 3.5 4.6 5.5 5.9 6.3 108.214
5 5.592 4.4 5.0 5.7 6.0 6.7 88.994
6 8.852 6.6 8.7 9.6 10.4 8.5 83.484
7 5.294 3.8 5.3 5.5 5.7 6.0 85.899
8 6.061 4.7 4.9 5.9 6.8 7.5 88.373
9 5.229 3.9 4.9 5.4 5.8 5.8 88.297
10 5.857 5.6 5.3 5.3 5.9 7.1 84.636
11 5.227 4.8 4.7 5.6 5.4 5.7 101.964
12 8.797 7.8 8.4 9.0 9.5 9.1 87.828
13 7.190 5.3 6.6 7.9 8.5 7.2 81.649
14 5.697 4.9 5.7 5.8 5.9 6.1 84.072
15 5.314 3.6 4.5 5.6 6.3 6.1 87.396
16 5.186 3.6 4.8 5.3 5.7 6.2 81.858
17 7.110 9.3 6.0 6.5 7.1 6.2 88.401
18 5.356 3.7 4.5 5.2 6.0 6.8 84.992
19 5.210 4.1 4.6 5.3 5.6 6.2 84.526
20 6.889 6.7 6.6 7.1 6.9 7.1 82.084
Mean 6.043 4.905 5.545 6.190 6.575 6.695 87.407
SD 1.145 1.578 1.228 1.261 1.370 0.871 6.528
Min 5.186 3.500 4.500 5.200 5.400 5.700 81.649
Max 8.852 9.300 8.700 9.600 10.400 9.100 108.214
RMSE/Mean 0.0070 0.0057 0.0064 0.0071 0.0076 0.0077

Case III: Multivariate Forecasting with Two-week prior data (N=10) – In this
case, with a multi-channel CNN approach, we used each of the five time series
variables (open, high, low, close, and volume) for forecasting the next week’s
open values. We do this by providing each one-dimensional time series to the
model as a separate channel of input. The CNN then uses a separate kernel and
reads each input sequence onto a separate set of filter maps (i.e., feature
maps), essentially learning features from each input time series variable. Five
input variables are used with two weeks of prior data for training the model.
The increase in the amount of data requires a larger and more sophisticated
model that is trained for a longer time. We used two convolutional layers with
32 filter maps and a kernel size of 3, followed by a max pooling layer of size
2, then another convolutional layer with 16 filter maps and a kernel size of 3,
and a max pooling layer of size 1. The fully connected layer that interprets the
features is increased to 100 nodes, and the model is fit for 70 epochs with a
batch size of 16. ReLU was chosen as the activation function for all layers, and
the ADAM optimizer was used for training. Fig. 40 depicts the architecture of
the CNN model for the multivariate time series with two weeks of previous data
as the input. Table 19 presents the performance results for Case III. The
execution time for each round of execution of the model is expressed in seconds.

Fig 40: CNN model architecture – Multivariate multistep with two week’s data as input (N = 10)

Table 19: CNN regression results (Case III: Multivariate multi-step N=10)
Round_No Overall RMSE Monday Tuesday Wednesday Thursday Friday Exec_Time
1 8.466 7.9 6.7 7.4 9.6 10.1 134.286
2 5.510 4.2 5.2 5.6 6.1 6.3 122.020
3 6.003 4.7 5.6 6.3 6.4 6.8 111.729
4 6.718 5.6 6.4 7.0 7.2 7.3 116.582
5 5.602 4.4 5.2 5.7 6.1 6.4 131.704
6 6.056 4.3 5.9 6.2 6.7 6.8 113.107
7 5.585 4.3 5.0 5.7 5.9 6.8 137.406
8 6.220 4.6 6.6 6.5 6.4 6.8 113.899
9 5.710 4.5 5.4 5.9 6.1 6.4 113.717
10 6.101 5.1 6.0 6.2 6.3 6.7 111.018
11 6.708 6.7 6.6 6.5 6.6 7.1 131.253
12 5.398 4.1 5.0 5.5 5.9 6.2 139.208
13 7.956 8.3 6.4 6.4 7.8 10.2 114.197
14 7.061 6.5 6.3 6.4 8.4 7.5 115.814
15 5.870 4.5 5.6 6.1 6.4 6.5 116.391
16 6.070 4.7 5.7 6.2 6.5 7.0 113.899
17 5.977 5.2 5.4 6.0 6.4 6.7 116.089
18 5.760 4.8 5.0 6.5 6.1 6.2 114.035
19 5.647 4.9 5.2 5.7 6.0 6.3 112.558
20 5.547 4.4 5.1 5.6 5.9 6.5 116.758
Mean 6.198 5.185 5.715 6.17 6.64 7.03 119.784
SD 0.819 1.218 0.601 0.488 0.949 1.124 9.307
Min 5.398 4.1 5 5.5 5.9 6.2 111.018
Max 8.466 8.3 6.7 7.4 9.6 10.2 139.208
RMSE/Mean 0.0072 0.0060 0.0066 0.0071 0.0077 0.0081

Case IV: Forecasting with multivariate input data with sub-model (N=10) – In
this case, we constructed a separate sub-CNN model for each of the five input
variables; we refer to the result as a multi-headed CNN model. The configuration
of the model, including the number of layers and their hyperparameters, was
modified to optimize the overall model performance. Two convolutional layers
with 32 feature maps and a kernel size of 3 are used, followed by a max pooling
layer of size 2. Two dense layers with 200 and 100 nodes respectively follow,
and the output layer receives the 100 values at its input and produces the final
5 output values through its 5 nodes. ReLU was the chosen activation function and
ADAM was the optimizer. The number of epochs and the batch size were 25 and 16
respectively. The multi-headed model is specified using the more flexible
functional API for defining Keras models (Brownlee, 2019). The program designed
for this approach loops over each input variable and creates a sub-model that
takes a one-dimensional sequence of 10 days (two weeks) of data and outputs a
flat vector containing a summary of the features learned from the sequence.
These vectors are merged by concatenation into one very long vector, which is
then interpreted by the fully-connected layers before the forecast for the next
week is made. The model needs five arrays as input, one for each of the
sub-models. We achieved this by creating a list of three-dimensional (3D)
arrays. Fig. 41 depicts the architecture of the CNN model for Case IV.
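The multi-headed arrangement could be expressed with the Keras functional API along the following lines. This is a hedged sketch, not the program used in the study: the function names are hypothetical, placing both convolutional layers and the pooling layer inside every head is one reading of the description, and the MSE loss is carried over from Case I because the text does not state the loss for this case. The first helper shows, with plain nested lists, how one multivariate sample array is split into one array per head.

```python
def split_into_heads(samples):
    """Turn samples shaped (n_samples, n_steps, n_variables) into one
    (n_samples, n_steps, 1) array per variable, as nested lists."""
    n_variables = len(samples[0][0])
    return [
        [[[step[v]] for step in sample] for sample in samples]
        for v in range(n_variables)
    ]

def build_multi_head_cnn(n_steps=10, n_variables=5, n_outputs=5):
    """Sketch of the multi-headed CNN described above (requires TensorFlow)."""
    from tensorflow.keras.layers import (Conv1D, Dense, Flatten, Input,
                                         MaxPooling1D, concatenate)
    from tensorflow.keras.models import Model

    inputs, flat_vectors = [], []
    for _ in range(n_variables):            # one sub-model per input variable
        head_input = Input(shape=(n_steps, 1))
        x = Conv1D(32, 3, activation="relu")(head_input)
        x = Conv1D(32, 3, activation="relu")(x)
        x = MaxPooling1D(pool_size=2)(x)
        inputs.append(head_input)
        flat_vectors.append(Flatten()(x))
    merged = concatenate(flat_vectors)      # one long feature vector
    x = Dense(200, activation="relu")(merged)
    x = Dense(100, activation="relu")(x)
    outputs = Dense(n_outputs)(x)           # five forecast values
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss="mse", optimizer="adam")  # loss is an assumption
    return model

# Two hypothetical samples, each with 3 time steps of 2 variables.
samples = [[[1, 10], [2, 20], [3, 30]], [[4, 40], [5, 50], [6, 60]]]
heads = split_into_heads(samples)
print(len(heads), heads[0][0], heads[1][1])
```

With NumPy arrays the same split is usually written as a list comprehension over the last axis, and the resulting list is passed to model.fit in place of a single input array.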

Fig 41: CNN model architecture – Multivariate sub-models with two week’s data as input (N = 10)

Table 20 depicts the performance results of Case IV. The model execution time for
each of the 20 rounds has been expressed in seconds.

Table 20: CNN regression results (Case IV: Multiheaded CNN N=10)
Round_No Overall RMSE Monday Tuesday Wednesday Thursday Friday Exec_Time
1 5.765 5.3 5.5 5.8 5.7 6.4 88.420
2 9.371 8.5 7.9 8.7 11.8 11.1 101.872
3 6.480 5.8 6.4 7.1 6.6 6.5 88.159
4 6.632 4.7 5.8 7.3 7.5 7.4 89.144
5 6.458 6.8 5.5 6.2 6.4 7.3 83.641
6 5.013 3.6 5.3 5.0 5.3 5.6 87.065
7 5.271 4.3 5.0 5.0 5.4 6.4 91.820
8 5.393 2.8 4.3 5.2 5.7 7.7 96.391
9 5.112 3.7 4.9 5.3 5.5 5.9 83.661
10 4.923 3.6 4.6 5.0 5.3 5.8 80.920
11 7.133 3.8 6.1 8.0 7.7 8.9 100.523
12 5.198 3.8 4.7 5.4 5.7 6.0 85.323
13 6.544 4.8 6.8 6.4 7.2 7.3 84.169
14 6.782 4.8 5.8 7.1 7.6 8.1 84.272
15 5.502 3.9 5.3 6.0 5.8 6.1 84.145
16 5.343 5.7 4.7 5.3 5.4 5.5 84.995
17 4.853 3.3 4.3 4.7 5.6 5.9 104.519
18 6.018 6.2 4.5 5.5 5.9 7.6 94.429
19 6.114 4.1 7.2 6.6 6.1 6.1 83.603
20 6.205 3.4 5.7 5.3 7.4 8.1 99.166
Mean 6.006 4.645 5.515 6.045 6.48 6.985 89.812
SD 1.050 1.397 0.984 1.109 1.506 1.370 7.170
Min 4.853 2.800 4.300 4.700 5.300 5.500 80.920
Max 9.371 8.500 7.900 8.700 11.800 11.100 104.519
RMSE/Mean 0.0069 0.0054 0.0064 0.0070 0.0075 0.0081

From Tables 17, 18, 19, and 20, it is easy to observe that Case I, the
univariate multi-step walk-forward forecasting method with one week of prior
data as the input, is the most accurate model, yielding a ratio of the RMSE to
the mean of the actual values of 0.0065. This model is also the fastest in
execution, with a mean execution time of 82.668 seconds and a small standard
deviation of 4.392 seconds over the 20 rounds. Case III, the multivariate
multi-step walk-forward forecasting with two weeks of prior data as the input,
performed the worst among all the CNN models, yielding a value of 0.0072 for the
ratio of the RMSE to the mean of the actual values. The execution time for Case
III was also the longest, with a mean of 119.784 seconds and a standard
deviation of 9.307 seconds over the 20 rounds. In all four cases, the ratio of
the RMSE to the mean value of the response variable is the lowest for Monday and
the highest for Friday, increasing consistently from Monday through Friday.

We now present a summary of the performance of all the machine learning-based
classification and regression models.

Overall Performance: Finally, we summarize the performance of different
predictive models that we have built, validated, and tested on the stock price data
of Godrej Consumer Products for the period of January 2013 till December 2014.
Tables 21 – 23 present the performance of the classification models under Case I,
Case II, and Case III respectively. For each case and for each metric, the model that
exhibited the best performance has been marked with a bold font.

Table 21: Summary of the performance of the classification models in Case I

Metric LR KNN DT BAG BOOST RF ANN SVM
Sensitivity 94.79 89.57 95.09 95.09 100.00 94.48 95.40 94.46
Specificity 97.61 96.42 98.09 98.09 100.00 97.61 97.61 95.58
PPV 96.87 95.11 97.48 97.48 100.00 96.86 96.88 94.17
NPV 96.01 92.24 96.25 96.25 100.00 98.08 96.46 98.09
CA 96.38 93.42 96.78 96.78 100.00 96.24 96.64 96.38
F1 Score 95.82 92.26 96.27 96.07 100.00 95.66 96.13 94.31

Table 22: Summary of the performance of the classification models in Case II

Metric LR KNN DT BAG BOOST RF ANN SVM
Sensitivity 94.83 86.93 92.40 95.44 100.00 93.01 93.62 94.67
Specificity 95.96 92.93 95.71 96.46 100.00 94.19 95.71 93.35
PPV 95.12 91.08 94.70 95.73 100.00 93.01 94.77 91.79
NPV 95.72 89.54 93.81 96.22 100.00 94.19 94.75 95.71
CA 95.45 90.21 94.21 96.00 100.00 93.66 94.76 93.93
F1 Score 94.97 88.96 93.54 95.58 100.00 93.01 94.19 93.21

We observe that for both Case I and Case II, boosting performed the best among all
the classification models on every metric. However, since Case I and Case II
exhibit only the training accuracies, the performance in Case III should be
considered the most critical, as it demonstrates the test accuracy of a model.
From Table 23, we find that ANN performed the best on sensitivity and NPV, while
boosting outperformed all other models on specificity, PPV, and classification
accuracy. Boosting also attained the highest F1 score (92.10), followed by random
forest (91.32); the F1 score is often considered to be the most important metric in
classification. In Tables 21-26, the following abbreviations are used in the column
names: LR – Logistic Regression, KNN – K-Nearest Neighbor, DT – Decision Tree,
BAG – Bagging, BOOST – Boosting, RF – Random Forest, ANN – Artificial
Neural Networks, SVM – Support Vector Machines, MV – Multivariate Regression,
MARS – Multivariate Adaptive Regression Splines, LSTM – Long Short-Term
Memory.
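For reference, the six classification metrics reported in Tables 21 – 23 follow from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the counts passed to the function are hypothetical and are not taken from the experiments. The helper f1_from_rates shows how the F1 score can be recomputed from the tabulated sensitivity and PPV.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the six metrics, in percent, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # true positive rate (recall)
    specificity = tn / (tn + fp)              # true negative rate
    ppv = tp / (tp + fp)                      # positive predictive value (precision)
    npv = tn / (tn + fn)                      # negative predictive value
    ca = (tp + tn) / (tp + fp + tn + fn)      # classification accuracy
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    names = ["Sensitivity", "Specificity", "PPV", "NPV", "CA", "F1 Score"]
    values = [sensitivity, specificity, ppv, npv, ca, f1]
    return {n: 100 * v for n, v in zip(names, values)}

def f1_from_rates(sensitivity, ppv):
    """F1 score as the harmonic mean of sensitivity and PPV (same units as inputs)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)

# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

# Recomputing the F1 score of logistic regression in Table 23
lr_f1 = f1_from_rates(92.10, 87.83)
print(f"LR F1 (Case III): {lr_f1:.2f}")
```

The recomputed value for logistic regression agrees with the F1 score of 89.91 listed in Table 23.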

Table 23: Summary of the performance of the classification models in Case III

Metric LR KNN DT BAG BOOST RF ANN SVM
Sensitivity 92.10 84.50 89.97 89.97 92.10 91.19 99.70 93.81
Specificity 89.39 48.99 92.42 92.42 93.43 92.93 34.60 90.19
PPV 87.83 57.92 90.80 90.78 92.10 91.46 55.88 87.54
NPV 93.16 79.18 91.73 91.73 93.43 92.70 99.28 95.20
CA 90.62 65.10 91.31 91.31 92.83 92.14 64.14 91.72
F1 Score 89.91 68.73 90.38 90.37 92.10 91.32 71.62 90.57

Table 24: Summary of the performance of the regression models in Case I

Metric MV MARS DT BAG BOOST RF ANN SVM LSTM
Correlation 0.99 0.99 0.97 0.96 0.99 0.99 0.99 0.93 1.00
RMSE/Mean 13.32 12.41 35.35 40.29 23.40 16.26 17.16 53.88 7.94
Mismatched Cases 18.67 1.21 13.42 4.70 0.81 0.00 1.21 0.27 0.00

Tables 24 – 26 present the performance of the regression models, including the
LSTM-based deep learning model. Since the LSTM model has matched or outperformed
the machine learning models on all metrics and for all the three cases, we have also
noted down the best performing machine learning model on each metric.

Table 25: Summary of the performance of the regression models in Case II


Metric MV MARS DT BAG BOOST RF ANN SVM LSTM
Correlation 0.99 0.99 0.97 0.98 0.99 0.99 0.98 0.98 1.00
RMSE/Mean 18.84 17.09 37.04 25.70 17.35 10.82 31.39 27.92 4.04
Mismatched Cases 5.38 4.28 17.38 5.10 4.69 2.62 9.38 4.41 0.00

In Case I, multivariate regression, MARS, boosting, random forest, and ANN all
yielded the highest correlation coefficient value of 0.99. However, the correlation
coefficient was found to be 1.00 in the case of LSTM. For the ratio of the RMSE to
the mean of the absolute values of the open_perc values, MARS yielded the lowest
value of 12.41 among the machine learning models, while the corresponding value
for LSTM was 7.94. Both random forest and LSTM yielded no sign mismatch
between the predicted and the actual values of open_perc.
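The three regression metrics discussed above can be computed as in the following sketch. It rests on two assumptions that this section does not state explicitly: that the RMSE/Mean row is expressed as a percentage of the mean of the absolute actual values, and that a case counts as mismatched when the actual and predicted values have strictly opposite signs. The input values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Correlation, RMSE relative to mean |actual| (percent), and sign mismatches (percent)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    correlation = cov / math.sqrt(var_a * var_p)          # Pearson correlation coefficient

    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    mean_abs_actual = sum(abs(a) for a in actual) / n
    mismatched = sum(1 for a, p in zip(actual, predicted) if a * p < 0)

    return {
        "correlation": correlation,
        "rmse_to_mean_pct": 100 * rmse / mean_abs_actual,
        "mismatched_pct": 100 * mismatched / n,
    }

# Hypothetical open_perc values, for illustration only
actual = [2.0, -2.0, 2.0, -2.0]
predicted = [1.0, -1.0, 1.0, 1.0]
result = regression_metrics(actual, predicted)
print(result)
```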

Table 26: Summary of the performance of the regression models in Case III

Metric MV MARS DT BAG BOOST RF ANN SVM LSTM
Correlation 0.99 0.99 0.10 0.97 0.97 0.97 0.98 0.83 0.99
RMSE/Mean 18.88 20.40 165.92 34.91 41.51 32.02 36.83 82.96 2.36
Mismatched Cases 5.24 6.34 47.72 9.24 6.90 6.48 10.90 13.19 2.40

In Case II, the highest value of the correlation coefficient was achieved by
multivariate regression, MARS, boosting, and random forest. LSTM outperformed
all the machine learning models on this metric by attaining a value of 1.00. The
RMSE to the mean ratio value of 10.82 was the least for random forest among the
machine learning models. However, the corresponding value yielded by LSTM was
4.04. Random forest produced only 2.62 percent of cases with mismatched signs
between the actual and predicted open_perc values, whereas for LSTM, all the cases
had the same sign for the actual and the predicted open_perc values.

For Case III, LSTM matched or exceeded all the machine learning models on every
metric, with multivariate regression and MARS yielding the same (the highest) value
of 0.99 for the correlation coefficient. For the ratio of the RMSE to the mean and
the percentage of the mismatched cases, multivariate regression produced the best
results among the machine learning models.

It may be noted that the CNN models worked on the stock price data of Godrej
Consumer Products for the period from December 31, 2012 to January 9, 2015,
collected at five-minute intervals on each day of the week, Monday through Friday.
Since all the other models built in this work are based on stock price data
aggregated into three slots in a day, a direct comparison of the performance of the
CNN model suite with that of the machine learning-based models and the LSTM model
is not advisable. Nevertheless, based on the ratio of the RMSE to the mean of the
actual values of the forecasted variable, all the CNN models outperform the LSTM
model by a large margin. While the lowest value of this ratio for the LSTM model
was found to be 2.36, the corresponding value for the CNN suite was 0.0065.

Chapter 7

Conclusion and Future Work

In this work, we have proposed a robust forecasting framework for stock price and
stock price movement pattern prediction with a very high level of accuracy. The
predictive model consists of eight classification and eight regression models based
on several machine learning approaches. In addition to that, the framework also
includes two deep learning models of regression using an LSTM network and a
suite of CNNs. All these models work on a short-term time horizon, and they have
the ability to forecast stock price movement and stock price on the basis of three
time slots on a given day. We constructed the models and then trained, validated, and finally
tested them using the historical stock prices of a company – Godrej Consumer
Products Ltd. The data were taken from the listed values of the stock on the National
Stock Exchange (NSE) of India over a period of two years, from January 2013 to
December 2014. The stock price data were extracted from the NSE database at
five-minute intervals using the Metastock tool. After collection, the raw data
were pre-processed, appropriate transformations (e.g., normalization,
standardization, and NA removal) were applied, and a number of derived predictor
variables were created based on the rich features of the stock data. While a number
of newly derived predictors were used in building the models, we used the
percentage change in the open values of the stock, called open_perc, as the response
variable. The granular five-minute data were also aggregated into three slots
on a given day so that the predictive models could be built to forecast the value of
open_perc in the next slot given the stock price data up to the current slot. While the
classification-based models are used to predict the movement pattern of open_perc
values, the objective of the regression models is to accurately predict the value of
the open_perc. In addition to using machine learning algorithms for
building the eight classification and eight regression models, we also used the
TensorFlow and Keras frameworks to build two powerful deep learning-based
regression models: an LSTM network and a suite of CNNs. The machine learning models
were built in the R programming language, while the LSTM-based deep learning
regression model and the suite of four CNN models were built in Python. The models
were trained, validated, and tested on the stock data, and extensive results were
produced and critically analyzed. The results yielded an interesting observation. While there
was not a single machine learning model that performed the best on all the metrics
on classification and regression, the deep learning model using an LSTM network
matched or outperformed all the other regression models on every metric that we
considered. Since the CNN models were built using stock price data collected at
five-minute intervals, while the machine learning models and the LSTM model were
based on stock price data aggregated into three slots in a day, a direct comparison
of the CNN suite with the other models is not advisable. Nevertheless, based on the
ratio of the RMSE to the mean of the actual values of the forecasted variable, the
CNN models were found to be far more accurate than the machine learning models
and the LSTM-based deep learning model of regression.

In another recently published work, we have also studied the efficacy and accuracy
of a CNN-based deep learning regression model in time series forecasting (Mehtab
& Sen, 2020). Deep learning models are widely regarded as having a much higher
capability of extracting and learning features from time series data than their
machine learning counterparts. However, exploiting the power of deep learning
models requires a very large volume of data. As future work, we would explore the
use of a variety of hybrid LSTM models, such as univariate and multivariate
encoder-decoder LSTM models, CNN-LSTM models, and convolutional LSTM models, as
well as generative adversarial networks (GANs), in forecasting stock price movement
patterns and stock price values. We believe that an integrated approach to building
deep learning models combining the power of LSTM, CNN, and GAN can be a very
interesting area of work in this direction.

References

Adebiyi, A., Adewumi, O., & Ayo, C.K. (2014). Stock Price Prediction Using the
ARIMA Model. Proceedings of the International Conference on Computer
Modelling and Simulation (UKSIM’14), March 2014, pp. 106 – 112,
Cambridge, UK. DOI: 10.1109/UKSim.2014.67.

Aussem, A. & Murtagh, F. (1997). Combining Neural Network Forecasts on
Wavelet-Transformed Time Series. Connection Science, Vol 9, Issue 1, pp.
113-122. DOI: 10.1080/095400997116766.

Basalto, N., Bellotti, R., De Carlo, F., Facchi, P., & Pascazio, S. (2005). Clustering
Stock Market Companies via Chaotic Map Synchronization. Physica A:
Statistical Mechanics and its Applications, Vol 345, Nos 1 – 2, pp. 196-206,
January 2005. DOI: 10.1016/j.physa.2004.07.034.

Basu, S. (1983). The Relationship between Earnings Yield, Market Value, and
Return for NYSE Common Stocks: Further Evidence. Journal of Financial
Economics, Vol 12, No 1, pp. 129-156, June 1983.
DOI: 10.1016/0304-405X(83)90031-4.

Bentes, S. R., Menezes, R., & Mendes, D. A. (2008). Long Memory and Volatility
Clustering: Is the Empirical Evidence Consistent across Stock Markets?
Physica A: Statistical Mechanics and its Applications, Vol 387, No 15, pp.
3826-3830, June 2008. DOI: 10.1016/j.physa.2008.01.046.

Binkowski, M., Marti, G., & Donnat, P. (2017). Autoregressive Convolutional
Neural Networks for Asynchronous Time Series. Proceedings of the ICML
2017 Time Series Workshop, Sydney, Australia.

Brownlee, J. (2019). Introduction to Time Series Forecasting with Python.

Chen, A.-S., Leung, M. T. & Daouk, H. (2003). Application of Neural Networks to
an Emerging Financial Market: Forecasting and Trading the Taiwan Stock
Index. Computers and Operations Research, Vol 30, No 6, pp. 901 – 923.
DOI: 10.1016/S0305-0548(02)00037-0.

Chen, Y., Dong, X. & Zhao, Y. (2005). Stock Index Modeling using EDA Based
Local Linear Wavelet Neural Network. Proceedings of International
Conference on Neural Networks and Brain, 13 – 15 October 2005, Beijing,
China, pp. 1646 – 1650. DOI: 10.1109/ICNNB.2005.1614946.

Chui, A. & Wei, K. C. (1998). Book-to-Market Firm Size, and the Turn-of-the Year
Effect: Evidence from Pacific-Basin Emerging Markets. Pacific-Basin
Finance Journal, Vol 6, No 3-4, pp. 275-293, August 1998. DOI:
10.1016/S0927-538X(98)00013-4.

de Faria, E. L., Albuquerque, M. P., Gonzalez, J. L., Cavalcante, J. T. P. &
Albuquerque, M. P. (2009). Predicting the Brazilian Stock Market through
Neural Networks and Adaptive Exponential Smoothing Methods. Expert
Systems with Applications, Vol 36, No 10, pp. 12506-12509. DOI:
10.1016/j.eswa.2009.04.032.

Dutta, G., Jha, P., Laha, A. K. & Mohan, N. (2006). Artificial Neural Network
Models for Forecasting Stock Price Index in the Bombay Stock Exchange.
Journal of Emerging Market Finance, Vol 5, No 3, pp. 283-295, December
2006. DOI: 10.1177/097265270600500305.

Fama, E.F. & French, K.R. (1995). Size and Book-to-Market Factors in Earning
and Returns. Journal of Finance, Vol 50, No 1, pp. 131-155, March 1995.
DOI: 10.1111/j.1540-6261.1995.tb05169.x

Fu, T-C, Chung, F-L., Luk, R., & Ng, C-M, (2008). Representing Financial Time
Series Based on Data Point Importance. Engineering Applications of
Artificial Intelligence, Vol 2, No 2, pp. 277-300, March 2008. DOI:
10.1016/j.engappai.2007.04.009.

Geron, A. (2019). Hands-on Machine Learning with Scikit-Learn, Keras &
TensorFlow. O’Reilly Media Inc., 2nd Edition, September 2019. ISBN:
9781492032632.

Hanias, M., Curtis, P. & Thalassinos, J. (2012). Time Series Prediction with Neural
Networks for the Athens Stock Exchange Indicator. European Research
Studies Journal, Vol 15, No 2, pp. 23-32. DOI: 10.35808/ersj/351.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer Feedforward
Networks are Universal Approximators. Neural Networks, Vol 2, No 5, pp.
359-366. DOI: 10.1016/0893-6080(89)90020-8.

Hutchinson, J. M., Lo, A. W., & Poggio, T. (1994). A Nonparametric Approach to
Pricing and Hedging Derivative Securities via Learning Networks. Journal of
Finance, Vol 49, No 3, pp. 851-889, January 1994.
DOI: 10.1111/j.1540-6261.1994.tb00081.x.

Jaffe, J., Keim, D. B., & Westerfield, R. (1989). Earnings Yields, Market Values,
and Stock Returns. Journal of Finance, Vol 44, No 1, pp. 135-148, March
1989. DOI: 10.1111/j.1540-6261.1989.tb02408.x

Jarrett, J. E. & Kyper, E. (2011). ARIMA Modeling with Intervention to Forecast
and Analyze Chinese Stock Prices. International Journal of Engineering
Business Management, Vol 3, pp. 53-58, January 2011. DOI: 10.5772/50938.

Jaruszewicz, M. & Mandziuk, J. (2004). One day Prediction of Nikkei Index
Considering Information from Other Stock Markets. Proceedings of the
International Conference on Artificial Intelligence and Soft Computing,
LNCS Vol 3070, pp. 1130 – 1135, Zakopane, Poland, June 7 – 11, 2004. DOI:
10.1007/978-3-540-24844-6_177.

Kimoto, T., Asakawa, K., Yoda, M. & Takeoka, M. (1990). Stock Market
Prediction System with Modular Neural Networks. Proceedings of the IEEE
International Joint Conference on Neural Networks (IJCNN), 17-21 June
1990, San Diego, CA, USA. DOI: 10.1109/IJCNN.1990.137535.

Lahmiri, S. (2014). Wavelet Low- and High- Frequency Components as Features
for Predicting Stock Prices with Backpropagation Neural Networks. Journal
of King Saud University – Computer and Information Sciences, Vol 26, Issue
2, pp. 218-227. DOI: 10.1016/j.jksuci.2013.12.001.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-Based Learning
Applied to Document Recognition. Proceedings of the IEEE, Vol 86, No 11,
pp. 2278-2324, November 1998. DOI: 10.1109/5.726791

Leigh, W., Hightower, R. & Modani, N. (2005). Forecasting the New York Stock
Exchange Composite Index with Past Price and Interest Rate on Condition of
Volume Spike. Expert Systems with Applications, Vol 28, No 1, pp. 1-8,
January 2005. DOI: 10.1016/j.eswa.2004.08.001.

Liao, S-H., Ho, H-H., & Lin, H-W. (2008). Mining Stock Category Association and
Cluster on Taiwan Stock Market. Expert Systems with Applications, Vol 35,
Nos 1-2, pp. 19-29, July-August 2008. DOI: 10.1016/j.eswa.2007.06.001.

Mehtab, S. & Sen, J. (2020). Stock Price Prediction Using Convolutional Neural
Networks on a Multivariate Time Series. Proceedings of the 3rd National
Conference on Machine Learning and Artificial Intelligence (NCMLAI’20),
New Delhi, India, February 1, 2020.

Mehtab, S. & Sen, J. (2019). A Robust Predictive Model for Stock Price Prediction
Using Deep Learning and Natural Language Processing. Proceedings of the
7th International Conference on Business Analytics and Intelligence
(BAICONF’19), Bangalore, India, December 5 – 7, 2019. DOI:
10.2139/ssrn.3502624.

Metastock Website: http://www.metastock.com

Mishra, S. (2016). The Quantile Regression Approach to Analysis of Dynamic
Interaction between Exchange Rate and Stock Returns in Emerging Markets:
Case of BRIC Nations. IUP Journal of Financial Risk Management, Vol 13,
No 1, pp. 7-27, March 2016.

Mittelman, R. (2015). Time Series Modeling with Undecimated Fully
Convolutional Neural Networks. arXiv preprint arXiv: 1508.00317.

Mondal, P., Shit, L., & Goswami, S. (2014). Study of Effectiveness of Time Series
Modeling (ARMA) in Forecasting Stock Prices. International Journal of
Computer Science, Engineering and Applications, Vol 4, No 2, pp. 13-29,
April 2014. DOI: 10.5121/ijcsea.2014.4202

Moshiri, S. & Cameron, N. (2010). Neural network versus econometric models in
forecasting inflation. Journal of Forecasting, Vol 19, No 3, pp. 201-217,
April 2000.

Mostafa, M. M. (2010). Forecasting Stock Exchange Movements Using Neural
Networks: Empirical Evidence from Kuwait. Expert Systems with
Applications, Vol 37, No 9, pp. 6302-6309, September 2010. DOI:
10.1016/j.eswa.2010.02.091.

Phua, P. K. H., Ming, D., & Lin, W. (2001). Neural Network with Genetically
Evolve Algorithms for Stock Prediction. Asia-Pacific Journal of Operational
Research, Vol 18, No 1, pp. 103 – 107.

Rosenberg, B., Reid, K., & Lanstein, R. (1985). Persuasive Evidence of Market
Inefficiency. Journal of Portfolio Management, Vol 1, No 1, pp. 9 – 17. DOI:
10.3905/jpm.1985.409007.

Sen, J. & Datta Chaudhuri, T. (2018a). Understanding the Sectors of Indian
Economy for Portfolio Choice. International Journal of Business Forecasting
and Marketing Intelligence, Vol 4, No 2, pp. 178-222, February 2018. DOI:
10.1504/IJBFMI.2018.090914.

Sen, J. (2018b). Stock Composition of Mutual Funds and Fund Style: A Time Series
Decomposition Approach towards Testing for Consistency. International
Journal of Business Forecasting and Marketing Intelligence, Vol 4, No 3, pp.
235-292. DOI: 10.1504/IJBFMI.2018.092781.

Sen, J. (2018c). A Study of the Indian Metal Sector Using Time Series
Decomposition-Based Approach. Book Chapter in Selected Studies on
Economics and Finance, Editors: Basar, S., Celik, A. A., & Bayramoglu, T.,
pp. 105-152, Cambridge Scholars Publishing, UK, March 2018.

Sen, J. (2018d). Stock Price Prediction Using Machine Learning and Deep Learning
Frameworks. Proceedings of the 6th International Conference on Business
Analytics and Intelligence (ICBAI’18), Bangalore, India, December 20-22,
2018.

Sen, J. & Datta Chaudhuri, T. (2017a). A Time Series Analysis-Based Forecasting
Framework for the Indian Healthcare Sector. Journal of Insurance and
Financial Management, Vol 3, No 1, pp. 66-94.

Sen, J. & Datta Chaudhuri, T. (2017b). A Predictive Analysis of the Indian FMCG
Sector Using Time Series Decomposition-Based Approach. Journal of
Economics Library, Vol 4, No 2, pp. 206-226, June 2017. DOI:
10.1453/jel.v4i2.1282.

Sen, J. (2017c). A Time Series Analysis-Based Forecasting Approach for the Indian
Realty Sector. International Journal of Applied Economic Studies, Vol 5, No
4, pp. 8 – 27, August 2017.

Sen, J. (2017d). A Robust Analysis and Forecasting Framework for the Indian Mid
Cap Sector Using Time Series Decomposition. Journal of Insurance and
Financial Management, Vol 3, No 4, pp. 1-32, September 2017.

Sen, J. & Datta Chaudhuri, T. (2017e). A Robust Predictive Model for Stock Price
Forecasting. Proceedings of the 5th International Conference on Business
Analytics and Intelligence, Bangalore, India, December 11-13, 2017.

Sen, J. & Datta Chaudhuri, T. (2016a). Decomposition of Time Series Data of Stock
Markets and its Implications for Prediction – An Application for the Indian
Auto Sector. Proceedings of the 2nd National Conference on Advances in
Business Research and Management Practices (ABMRP’16), Kolkata, India,
January 8-9, 2016, pp. 1-28. DOI: 10.13140/RG.2.1.3232.0241.

Sen, J. & Datta Chaudhuri, T. (2016b). An Alternative Framework for Time Series
Decomposition and Forecasting and its Relevance for Portfolio Choice – A
Comparative Study of the Indian Consumer Durable and Small-Cap Sector.
Journal of Economics Library, Vol 3, No 2, pp. 303-326. DOI:
10.1453/jel.v3i2.787.

Sen, J. & Datta Chaudhuri, T. (2016c). An Investigation of the Structural
Characteristics of the Indian IT Sector and the Capital Goods Sector – An
Application of the R Programming Language in Time Series Decomposition
and Forecasting. Journal of Insurance and Financial Management, Vol 1, No
4, pp. 68 – 132, June 2016.

Sen, J. & Datta Chaudhuri, T. (2016d). Decomposition of Time Series Data to
Check Consistency between Fund Style and Actual Fund Composition of
Mutual Funds. Proceedings of the 4th International Conference on Business
Analytics and Intelligence (ICBAI’16), Bangalore, India, December 19-21,
2016. DOI: 10.13140/RG.2.2.33048.19206.

Sen, J. & Datta Chaudhuri, T. (2015). A Framework for Predictive Analysis of
Stock Market Indices – A Study of the Indian Auto Sector. Calcutta Business
School (CBS) Journal of Management Practices, Vol 2, No 2, pp. 1-20,
December 2015.

Senol, D. & Ozturan, M. (2008). Stock Price Direction Prediction Using Artificial
Neural Network Approach: The Case of Turkey. Journal of Artificial
Intelligence, 1, pp. 70-77. DOI: 10.3923/jai.2008.70.77

Shen, J., Fan, H. & Chang, S. (2007). Stock Index Prediction Based on Adaptive
Training and Pruning Algorithm. Advances in Neural Networks, Lecture
Notes in Computer Science, Springer-Verlag, Vol 4492, pp. 457–464. DOI:
10.1007/978-3-540-72393-6_55.

Siddiqui, T.A. & Abdullah, Y. (2015). Developing a Nonlinear Model to Predict
Stock Prices in India: An Artificial Neural Networks Approach. IUP Journal
of Applied Finance, Vol 21, No 3, pp. 36-39.

Thenmozhi, M. (2006). Forecasting Stock Index Numbers Using Neural Networks.
Delhi Business Review, Vol 7, No 2, pp. 59-69.

Tsai, C.-F. & Wang, S.-P. (2009). Stock Price Forecasting by Hybrid Machine
Learning Techniques. Proceedings of International MultiConference of
Engineers and Computer Scientists, 1.

Tseng, K-C., Kwon, O., & Tjung, L. C. (2012). Time Series and Neural Network
Forecast of Daily Stock Prices. Investment Management and Financial
Innovations, Vol 9, No 1, pp. 32-54.

Wang, Z, Yan, W., & Oates, T. (2016). Time Series Classification from Scratch
with Deep Neural Networks: A Strong Baseline. Proceedings of the 2017
IEEE International Joint Conference on Neural Networks (IJCNN),
Anchorage, Alaska, USA, May 14-19, 2017. DOI:
10.1109/IJCNN.2017.7966039.

Wu, Q., Chen, Y. & Liu, Z. (2008). Ensemble Model of Intelligent Paradigms for
Stock Market Forecasting. Proceedings of the IEEE 1st International
Workshop on Knowledge Discovery and Data Mining, pp. 205 – 208,
Washington, DC, USA. DOI: 10.1109/WKDD.2008.54

Zhang, D., Jiang, Q., & Li, X. (2007). Application of Neural Networks in Financial
Data Mining. International Journal of Computer, Electrical, Automation, and
Information Engineering, Vol 1, No 1, pp. 225-228. DOI:
10.5281/zenodo.1333234.

Zhu, X., Wang, H., Xu, L. & Li, H. (2008). Predicting Stock Index Increments by
Neural Networks: The Role of Trading Volume under Different Horizons.
Expert Systems with Applications, Vol 34, No 4, pp. 3043–3054, May 2008. DOI:
10.1016/j.eswa.2007.06.023.
