Final Report - Nanda
A PROJECT REPORT
Submitted by
ECE-C Batch - 59
BENGALURU 560035
May-2023
AMRITA VISHWA VIDYAPEETHAM
BONAFIDE CERTIFICATE
This is to certify that the project report entitled “Stock Market Return Prediction using Vector AutoRegression”
submitted by
in partial fulfillment for the course 19ECE499 Project Phase II is a bonafide record of the work carried out under our
guidance and supervision at the Department of Electronics and Communication Engineering, Amrita School of
Engineering, Bangalore.
ACKNOWLEDGEMENT
We offer our sincere pranams at the lotus feet of Mata Amritanandamayi Devi, fondly called “Amma”. The
satisfaction that follows the successful completion of any task would be incomplete without the mention of the
people who made it possible and whose guidance and encouragement crowned our efforts with success.
First and foremost, we are grateful to Dr. Sriram Devanathan, Principal, Dr. Navin Kumar, Chairperson, Department
of Electronics and Communication Engineering and Dr. Sreeja Kochuvilla, Vice Chairperson, Department of
Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Bangalore, for paving a path for us
to utilize the available resources to the fullest and thereby widen our perspective of education and growth through
it.
We are indebted to our guide, Ms. R Jeyanthi, Asst. Professor, Department of Electronics and Communication
Engineering, for her friendly demeanour, constant support and guidance. She has been a source of inspiration for us in
this project and it would not have materialized if not for her constant words of encouragement.
We also wish to thank our panel members Dr. Sreeja Kochuvilla, Dr. TK Ramesh, and Ms. Arya for their constructive
criticism and valuable suggestions throughout the project. We extend our heartiest thanks to all teaching and non-
teaching members of the college for their cooperation throughout the project.
We pay our respect and love to our parents, all other family members and friends for their love and encouragement
throughout our project. We thank all for the same.
R Naren Kartik
ABSTRACT
The prediction of stock market returns is a challenging task that has garnered significant attention from
researchers and investors due to its crucial role in informed investment decision-making, risk management,
and portfolio optimization. Over the years, several statistical models, including ARCH, GARCH, ARMA,
ARIMA, VAR, and DMD, have emerged for forecasting stock market returns.
This study aims to comprehensively compare these commonly used models in terms of their predictive
accuracy and suitability for stock market return forecasting. The ARCH and GARCH models, which capture
volatility clustering and time-varying conditional heteroskedasticity, are employed. Additionally, the
ARMA and ARIMA models are used to capture time series dynamics and trends, while the VAR model
analyzes interdependencies between multiple time series variables. The data-driven Dynamic Mode
Decomposition (DMD) method is also employed for predicting future stock market returns.
To evaluate the predictive performance of these models, historical stock market data for a selected set of
assets is utilized. Accuracy measures such as mean absolute error (MAE), root mean squared error (RMSE),
and forecast accuracy are employed for evaluation.
The findings of this study provide valuable insights into the strengths and limitations of each model in
predicting stock market returns. Through this comparative analysis, the study facilitates the identification of
the most suitable model for different types of stock market data and investment strategies. The practical
implications of these findings extend to investors, financial analysts, and policymakers, contributing to the
existing body of knowledge in stock market forecasting.
This project aims to compare and evaluate the performance of statistical models, including ARCH,
GARCH, ARMA, ARIMA, VAR, and DMD, for stock market return prediction. The objectives include
assessing predictive accuracy, analysing suitability for different data types, identifying strengths and
limitations, and providing practical implications for investors and policymakers. By achieving these goals,
the project aims to enhance our understanding of these models and contribute to the development of
improved prediction methodologies for the financial industry.
TABLE OF CONTENTS
TITLE Pg.
ACKNOWLEDGEMENT iii
ABSTRACT iv
LIST OF FIGURES vi
LIST OF TABLES vii
CHAPTER 1: INTRODUCTION
1.1 OVERALL SYSTEM 1
1.2 OBJECTIVE 1
1.3 METHODOLOGY 2
CHAPTER 2: STATE OF ART
2.1 INTRODUCTION 3
2.2 LITERATURE REVIEW 3
CHAPTER 3: DESIGN AND ANALYSIS
3.1 INTRODUCTION 10
3.2 DESIGN 10
3.3 ALGORITHMS/MODELS USED 10
3.3.1 ARCH MODEL
3.3.2 GARCH MODEL
3.3.3 VECTOR AUTOREGRESSION
3.3.4 DYNAMIC MODE DECOMPOSITION
3.4 METHODOLOGY
3.4.1 OVERALL SCENARIO OF THE MODEL
3.4.2 INTRODUCTION TO STATIONARITY
3.4.3 METHODS TO CHECK STATIONARITY
3.4.3.1 VISUAL TEST
3.4.3.2 STATISTICAL TEST
3.4.3.3 ADF TEST
3.4.4 INTRODUCTION TO CROSS-CORRELATION
3.4.4.1 GRANGER CAUSALITY
3.4.5 DETERMINING THE LAG LENGTH
3.4.6 MULTIPLE LINEAR REGRESSION
3.4.7 ROOT MEAN SQUARE ERROR CALCULATION
CHAPTER 4: RESULTS AND DISCUSSION 15
4.1 RESULTS
CHAPTER 5: CONCLUSION AND FUTURE SCOPE 16
5.1 CONCLUSION
REFERENCES 17
LIST OF FIGURES
INFY.NS
Fig 4.4.2 Plot of temporal image of Input Matrix (U)
Fig 4.4.3 Plot of Forecasted Value vs Actual Value
List of Tables
Table 4.1.1 Results after performing ADF test over the stock data 11
Table 4.2.1 Results of ADF test performed over stock data 12
Table 4.3.1 Granger Causation Matrix 12
Table 4.3.2 Correlation Matrix 13
Table 4.3.3 Forecasting Accuracy of the VAR model 14
Table 4.4.1 Forecasting Accuracy of the DMD model 14
CHAPTER 1
INTRODUCTION
1.1 Overall System:
The overall system for stock market prediction involves the utilization of various statistical models and
techniques to forecast the future performance of stock markets. These models, such as ARCH, GARCH,
ARMA, ARIMA, VAR, and DMD, are employed to capture different aspects of stock market dynamics. By
analyzing historical stock market data, these models aim to provide accurate predictions of stock market
returns. The system considers factors like volatility clustering, time series dynamics, interdependencies
between variables, and data-driven patterns. The objective is to develop a comprehensive framework that
can assist investors and policymakers in making informed decisions, managing risks, and optimizing
portfolio performance.
1.2 Objective:
• Assessing the accuracy of each model in predicting stock market returns: The project aims to
determine which model provides the most precise and reliable predictions of stock market
performance. Evaluation metrics such as mean absolute error (MAE), root mean squared error
(RMSE), and forecast accuracy measures will be utilized for this purpose.
• Analyzing the suitability of the models for different types of data: Different stock market datasets
exhibit various characteristics, such as volatility clustering, trends, and interdependencies. The
project seeks to investigate how well each model captures these characteristics and performs across
different types of data.
• Identifying the strengths and limitations of each model: Through the comparative analysis of the
models, the project aims to identify their specific strengths and limitations. This analysis will
provide valuable insights into each model's ability to handle different patterns and dynamics
observed in stock market returns.
• Providing practical implications for investors and policymakers: The project aims to contribute to
the existing knowledge on stock market return prediction by offering practical implications for
investors, financial analysts, and policymakers. The findings of the study can guide investment
decision-making, risk management strategies, and the formulation of financial policies.
1.3 Methodology:
The methodology employed for stock market prediction involves the application of various statistical
models and techniques. These models, including ARCH, GARCH, ARMA, ARIMA, VAR, and DMD, are
utilized to analyze historical stock market data and forecast future returns. The methodology typically
involves data preprocessing, such as cleaning and transforming the data, to ensure its suitability for analysis.
The selected models are then applied to capture different aspects of stock market behavior, such as volatility
clustering, time series dynamics, and interdependencies between variables. Evaluation metrics like mean
absolute error (MAE), root mean squared error (RMSE), and forecast accuracy measures are used to assess
the performance of the models. The goal of this methodology is to develop reliable and accurate predictions
that can inform investment decisions and risk management strategies in the stock market.
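The two error metrics named above can be illustrated with a minimal sketch. The function names and the sample numbers below are illustrative assumptions, not values from this project.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical actual and forecasted values, for illustration only.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 100.0, 102.0, 104.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging, so reporting both gives a fuller picture of forecast quality.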
CHAPTER 2
STATE OF ART
2.1 Introduction
The state of the art in stock market prediction encompasses the latest advancements and cutting-edge
techniques employed in forecasting stock market returns. Researchers and practitioners have explored
various methodologies, including machine learning algorithms, deep learning models, and hybrid
approaches. These approaches leverage large volumes of historical financial data, market indicators, news
sentiment analysis, and social media data to capture complex patterns and trends in stock market behaviour.
Additionally, advancements in computational power and data processing techniques have facilitated the
development of more sophisticated models with improved predictive capabilities. The state of the art in
stock market prediction aims to address the challenges associated with market volatility, nonlinearity, and
dynamic dependencies among financial assets. Ongoing research focuses on refining existing models,
incorporating real-time data sources, and exploring novel techniques, such as reinforcement learning and
natural language processing, to enhance the accuracy and timeliness of stock market predictions.
2.2 Literature Review
[1] Khan, S. and Alghulaiakh, H., 2020. ARIMA model for accurate time series stocks. International
Journal of Advanced Computer Science and Applications, 11(7).
• The study uses the ARIMA model to predict the future values of stock prices based on historical
data.
• The results show that the ARIMA model can provide accurate forecasts for stock prices, with a mean
absolute percentage error of 0.45% for the test period.
• The authors evaluated the performance of the ARIMA model by comparing the predicted values with
the actual stock prices for the test period.
• The ARIMA model can be a useful tool for investors and traders in predicting future stock prices, and it
can help them make informed decisions regarding buying or selling stocks.
• The authors explained the steps involved in building an ARIMA model, which include selecting the
appropriate values for the p, d, and q parameters.
[2] Musatov, D. and Petrusevich, D., 2022. Modeling of forecasts variance reduction at multiple time
series prediction averaging with ARMA (1, q) functions. In CEUR Workshop Proceedings (Vol. 3091,
pp. 1-11).
• The study uses real-world data on stock prices, exchange rates, and oil prices to demonstrate the
effectiveness of averaging multiple predictions with ARMA (1, q) functions.
• The authors evaluated the performance of the approach by comparing the predicted values with the actual values
for the test period.
• Concluded that the approach can be a useful tool for decision-makers in various fields, including
finance, economics, and energy, to make informed decisions based on more accurate forecasts.
• Explained how averaging multiple predictions from different time series models can help reduce
variance in forecasts and improve accuracy.
• The paper focuses on the variance of predictions for AR(1), ARMA(1,q) and ARMA(1,1) models.
For AR(1) models, the variance of the forecast over an infinite period can be a finite number,
depending on the value of the parameter. The paper provides an equation for the variance and
discusses its behaviour.
• The paper discusses the averaging of models and its impact on variance reductions. It explains that if
models are divided into two equal parts, with negative and positive values of the parameter,
averaging can lead to a lower variance. However, in general cases, there may be individual models
with lower variance than the averaged model.
• The authors conclude by emphasizing that averaging can be a useful tool for prediction when there is
no clear method to select the best model. The averaged model provides a "good" quality prediction
compared to the worse models. The paper also suggests that the findings regarding averaging can be
extended to other techniques such as bagging and nonlinear combinations of models.
• The paper focuses on expressing the variance of time series prediction in terms of model
coefficients and investigates the conditions under which the averaging technique can improve the
prediction.
[3] Somarajan, S., Shankar, M., Sharma, T. and Jeyanthi, R., 2019. Modelling and analysis of
volatility in time series data. In Soft Computing and Signal Processing: Proceedings of ICSCSP 2018,
Volume 2 (pp. 609-618). Springer Singapore.
• The paper focuses on the modeling and analysis of volatility in time series data, which is important
in various domains such as finance, economics, and risk management. The authors highlight the
challenges associated with volatility modelling, including its nonlinear and time-varying nature,
which require sophisticated techniques for accurate analysis.
• The paper introduces traditional statistical models, specifically the Autoregressive Conditional
Heteroscedasticity (ARCH) and Generalized Autoregressive Conditional Heteroscedasticity
(GARCH) models, which are commonly used in financial econometrics to capture volatility patterns.
In addition to traditional models, the authors explore the use of soft computing techniques, such as
artificial neural networks (ANNs) and support vector machines (SVMs), as alternative approaches
for volatility modelling.
• The authors present a case study where they compare the performance of different models (GARCH,
ANN, and SVM) in capturing and predicting volatility patterns in time series data. To evaluate the
models, various metrics and criteria are used, such as Mean Absolute Error (MAE), Root Mean
Square Error (RMSE), and R-squared. These measures assess the accuracy and predictive ability of
the models.
• The paper discusses the strengths and limitations of each model. For instance, traditional statistical
models like GARCH are known for their ability to capture conditional volatility patterns effectively,
while soft computing techniques like ANN and SVM offer more flexibility in capturing complex
nonlinear relationships.
• The findings from the case study indicate that both traditional statistical models and soft computing
techniques can be valuable for volatility modelling. However, the choice of model depends on the
specific characteristics of the time series data and the objectives of the analysis. The authors discuss
the practical implications of their findings, highlighting the potential applications of volatility
modelling in areas such as financial risk management, investment decision-making, and portfolio
optimization.
[4] Stavros Degiannakis, Evdokia Xekalaki. (2007) Assessing the performance of a prediction error
criterion model selection algorithm in the context of ARCH models. Applied Financial Economics
17:2, pages 149-171.
• The authors provide a detailed explanation of the basic ARCH(p) model, which expresses the
conditional variance as a linear combination of past squared residuals. They discuss the estimation of
ARCH parameters using maximum likelihood estimation and highlight the importance of model
diagnostics and goodness-of-fit tests.
• Furthermore, the paper discusses various extensions of ARCH models, including the Generalized
Autoregressive Conditional Heteroscedasticity (GARCH) model, which incorporates both past
squared residuals and past conditional variances. The authors also explore other variations, such as
exponential GARCH (EGARCH) and threshold GARCH (TGARCH) models, which allow for
asymmetry and leverage effects in the volatility dynamics.
• The authors review the empirical applications of ARCH models across different fields, such as
finance, economics, and risk management. They discuss how ARCH models have been used to
analyze asset returns, volatility forecasting, portfolio optimization, and risk measurement. In
summary, this paper offers a comprehensive review of ARCH models, providing readers with a deep
understanding of their theoretical foundations, estimation techniques, empirical applications, and
potential limitations. It serves as a valuable resource for researchers, practitioners, and students
interested in time series analysis and volatility modeling.
[5] H. C.J., D. K.B., A. R. and J. R., "Modeling of Multivariate Systems using Vector
Autoregression(VAR)," 2019 Innovations in Power and Advanced Computing Technologies (i-
PACT), Vellore, India, 2019, pp. 1-6, doi: 10.1109/i-PACT44901.2019.8960145.
• Stock price prediction is a challenging task due to the unpredictable nature of the market. This study
proposes a method using Dynamic Mode Decomposition (DMD) to predict stock prices by treating
the stock market as a dynamic system. DMD is a data-driven algorithm that decomposes a system
into modes, each having specific temporal behavior. These modes help determine how the system
evolves over time and can be used to predict its future state.
• The researchers collected minute-wise stock price data from companies listed in the National Stock
Exchange. They selected companies from various sectors and used the minute-wise stock prices to
predict their prices in the next few minutes. The accuracy of the predictions was evaluated by
comparing them with the actual stock prices using Mean Absolute Percentage Error (MAPE). Three
different methods were employed for prediction: (a) sampling companies from the same sector, (b)
sampling companies from all sectors, and (c) fixing the sampling window size and predicting until a
threshold error was crossed.
• It was found that predicting prices by sampling companies from all sectors yielded more accurate
results compared to sampling from a single sector. Additionally, in some cases, the prediction
window could be extended for a longer period using the third method.
• The study highlights the advantages of DMD as a computationally efficient approach that captures
the underlying dynamics of the stock market. DMD modes can be considered as coherent structures
representing the financial activity. DMD has been previously used in finance for extracting cyclic
patterns in the market and for price prediction. In this study, DMD was employed for short-term
price prediction.
• The methodology involved constructing an approximate linear evolution of the system using DMD.
The data matrix was decomposed into modes, and the underlying dynamics of the system were
captured. The obtained modes were used to reconstruct the system and make price predictions. The
results were compared with predictions made using an autoregressive integrated moving average
(ARIMA) model. The comparison showed that DMD performed well in predicting stock prices, and
in some cases, it outperformed the ARIMA model.
• The research concludes by discussing the obtained results, emphasizing the importance of
considering companies from various sectors for prediction accuracy. It also discusses the potential
for future work and the scope for further improvement in stock price prediction using DMD. Deep
learning methods of feature extraction are discussed; a CNN is used to extract features that are
relatively expressive. The models are trained based on the loss function and the weights are set to
minimize the loss function. A ResNet model is used for feature extraction in the case of deep neural
networks. A two-branched convolutional neural network is used to extract features and the model is
trained using an adaptive triplet loss, which produces efficient results.
Stock price prediction is a challenging task due to market unpredictability. The paper proposes using DMD,
a data-driven algorithm, to treat the stock market as a dynamic system and predict stock prices. The
researchers collected minute-wise stock price data from various companies listed in the National Stock
Exchange. They used this data to predict the prices for the next few minutes.
The accuracy of the predictions was evaluated using Mean Absolute Percentage Error (MAPE). Three
different prediction methods were employed: sampling companies from the same sector, sampling
companies from all sectors, and fixing the sampling window size until a threshold error was crossed. The
study found that predicting prices by sampling companies from all sectors yielded more accurate results
compared to sampling from a single sector. Additionally, in some cases, the prediction window could be
extended for a longer period using the third method.
The advantages of DMD were highlighted, including its computational efficiency and ability to capture
underlying market dynamics. DMD modes represent coherent structures that reflect financial activity. DMD
has been used in finance for extracting cyclic patterns and predicting prices. In this study, it was employed
for short-term price prediction.
The methodology involved decomposing the data matrix into modes and capturing the system's dynamics.
The obtained modes were used to reconstruct the system and make price predictions. The performance of
DMD was compared with an autoregressive integrated moving average (ARIMA) model, and DMD showed
promising results.
The paper concludes by discussing the obtained results and emphasizing the importance of considering
companies from various sectors for prediction accuracy. It also suggests potential future work and
improvements in stock price prediction using DMD.
CHAPTER 3
DESIGN AND ANALYSIS
3.1 Introduction:
The design and analysis of stock market return prediction aims to forecast the future performance of stock
markets, which is a crucial aspect for investors, traders, and financial institutions. Accurate prediction of
stock market returns can help in making informed investment decisions and minimizing financial risks.
3.2 Design:
To detect and predict the stock market returns, we must first import the stock data using YFinance and
inspect whether the data is stationary or non-stationary. Once the data is stationary, we apply the relevant
model and obtain the predictions.
3.3 Algorithms/Models Used:
3.3.1 ARCH Model:
The ARCH (Autoregressive Conditional Heteroscedasticity) model is a statistical approach used to analyse
and capture the clustering of volatility in financial time series data. Developed by Robert F. Engle, the
ARCH model is widely used in econometrics and financial econometrics to account for the changing
variance in asset returns.
The main purpose of the ARCH model is to address heteroscedasticity, which refers to the varying volatility
observed in financial markets. This means that periods of high or low volatility tend to occur together. The
ARCH model specifically focuses on modelling this volatility clustering behaviour.
The fundamental concept of the ARCH model is that the conditional variance of a time series is determined
by past squared residuals or errors. In simpler terms, the current variance depends on the squared errors
observed in previous time periods. The model assumes an autoregressive process for the conditional
variance, using past squared residuals as predictors.
Estimating the ARCH model typically involves maximum likelihood estimation, where the model
parameters are estimated by maximizing the likelihood function based on the available data. These
estimated parameters can then be used for predicting future volatility or studying the dynamics of volatility
in the data.
Extensions of the ARCH model, such as the GARCH (Generalized Autoregressive Conditional
Heteroscedasticity) model and its asymmetric variants (for example, EGARCH and TGARCH), have been
developed to capture more complex volatility patterns, including persistence, asymmetry, and leverage
effects. These models have found applications in volatility forecasting, risk
management, and option pricing in financial econometrics.
In summary, the ARCH model provides a framework for understanding and modelling the changing
volatility observed in financial data. It allows researchers and analysts to better analyze and comprehend the
characteristics of financial markets.
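The ARCH(p) model expresses the conditional variance as:
$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \dots + \alpha_p \varepsilon_{t-p}^2$
where:
• $\sigma_t^2$ is the conditional variance at time t and $\varepsilon_{t-i}$ is the error term (residual) at time t-i
• ω is a constant term or intercept that captures the overall level of volatility in the series
• α1, α2, ..., αp are parameters to be estimated that represent the weights assigned to the previous squared
error terms.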
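As a rough illustration of the ARCH idea described in this subsection, the sketch below computes the conditional variance from past squared residuals for given parameter values. The function name, the residuals, and the parameter values are hypothetical; in practice ω and the α weights would be obtained by maximum likelihood estimation rather than fixed by hand.

```python
def arch_conditional_variance(residuals, omega, alphas):
    """Conditional variance of an ARCH(p) process, with p = len(alphas).

    For each time t >= p, the variance is
        omega + alphas[0] * residuals[t-1]**2 + ... + alphas[p-1] * residuals[t-p]**2
    The returned list is aligned with residuals[p:].
    """
    p = len(alphas)
    variances = []
    for t in range(p, len(residuals)):
        weighted = sum(alphas[i] * residuals[t - 1 - i] ** 2 for i in range(p))
        variances.append(omega + weighted)
    return variances

# Hypothetical residuals and ARCH(1) parameters, for illustration only.
residuals = [0.5, -1.0, 2.0, -0.5]
print(arch_conditional_variance(residuals, omega=0.1, alphas=[0.4]))
```

A large residual at time t-1 raises the variance at time t regardless of its sign, which is how the model reproduces volatility clustering.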
3.3.2 GARCH Model:
The GARCH model offers several advantages as it captures both the autoregressive characteristics of
volatility and the impact of past squared residuals and conditional variances. This comprehensive approach
allows for a more accurate representation of the changing levels of volatility observed in financial data.
In the GARCH model, the conditional variance of a time series is determined by considering past squared
residuals, previous conditional variances, and potentially other explanatory variables. Similar to the ARCH
model, the GARCH model assumes an autoregressive process for conditional variance. By incorporating
lagged squared residuals and conditional variances, the model accounts for the persistence of volatility and
the feedback effects it has on future volatility.
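For reference, the standard textbook form of this conditional variance, written without additional explanatory variables, is given below. The symbols $\beta_j$ and the order labels p and q are notational assumptions; note that some authors swap the roles of p and q.

```latex
\sigma_t^2 = \omega
  + \sum_{i=1}^{p} \alpha_i \, \varepsilon_{t-i}^2
  + \sum_{j=1}^{q} \beta_j \, \sigma_{t-j}^2
```

Here $\varepsilon_{t-i}$ are the past residuals and $\sigma_{t-j}^2$ are the past conditional variances. Setting all $\beta_j = 0$ recovers the ARCH(p) model.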
Estimating the GARCH model involves maximizing the likelihood function using available data, which
enables the determination of optimal parameter values. These parameters include autoregressive
coefficients, as well as coefficients associated with squared residuals and conditional variances. Once the
model is estimated, it can be applied to forecast future volatility and analyze the dynamics of volatility
within the stock market.
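The likelihood that is maximized can be sketched for the GARCH(1,1) case under an assumed Gaussian error distribution. The function name, the `initial_variance` argument, and the sample residuals are illustrative assumptions; a real estimation would pass this objective to a numerical optimizer with constraints on the parameters.

```python
import math

def garch11_neg_log_likelihood(params, residuals, initial_variance):
    """Gaussian negative log-likelihood of a GARCH(1,1) model.

    params = (omega, alpha, beta); the variance recursion is
        sigma2_next = omega + alpha * eps**2 + beta * sigma2
    """
    omega, alpha, beta = params
    sigma2 = initial_variance
    nll = 0.0
    for eps in residuals:
        # Contribution of one observation under a normal density with variance sigma2.
        nll += 0.5 * (math.log(2.0 * math.pi) + math.log(sigma2) + eps ** 2 / sigma2)
        # Update the conditional variance for the next time step.
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
    return nll

# Hypothetical return residuals and two candidate parameter sets, for illustration only.
residuals = [0.010, -0.020, 0.015, -0.030, 0.005]
for candidate in [(1e-5, 0.05, 0.90), (1e-4, 0.20, 0.70)]:
    print(candidate, garch11_neg_log_likelihood(candidate, residuals, initial_variance=4e-4))
```

The parameter set with the lower value is the better fit for these residuals; an optimizer repeats this comparison systematically.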
The GARCH model finds various applications within stock market data. It is particularly useful for
volatility forecasting, allowing investors and traders to anticipate market movements. Risk management
benefits from the GARCH model as it aids in estimating Value-at-Risk (VaR). Additionally, incorporating
volatility forecasts from the GARCH model is valuable in option pricing. Traders can also develop
strategies based on GARCH models to identify periods of high volatility and adjust their positions
accordingly.
In conclusion, the GARCH model is a valuable tool for predicting and understanding volatility within stock
market data. Its comprehensive approach provides insights for risk management, option pricing, and trading
strategies by considering the dynamic nature of volatility in financial markets.
3.3.3 Vector Autoregression:
Vector Autoregression (VAR) is a statistical model that is commonly used to analyze and forecast the
relationships between multiple time series variables, particularly in the context of stock market data. This
modelling approach allows us to examine and predict the interdependencies among different stock prices or
financial indicators.
VAR models are designed to capture the dynamic interactions that exist among the variables. The basic idea
is to model each variable as a linear combination of its own lagged values as well as the lagged values of all
the other variables in the system. The order of the VAR model determines the number of lagged values
considered for each variable. By estimating the parameters of the VAR model, we can quantify the
relationships between the variables and utilize the model to make predictions.
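A minimal sketch of this idea for a first-order model, VAR(1), is shown below, assuming NumPy is available. Each variable is regressed on the lagged values of all variables by ordinary least squares. The function names and the simulated two-variable data are illustrative assumptions and do not reproduce the implementation or data used in this project.

```python
import numpy as np

def fit_var1(data):
    """Fit y_t = c + A @ y_{t-1} + error by least squares.

    data has shape (T, k): T time steps, k variables.
    Returns the intercept c (shape k) and coefficient matrix A (shape k x k).
    """
    data = np.asarray(data, dtype=float)
    targets = data[1:]                                   # y_t
    lagged = data[:-1]                                   # y_{t-1}
    design = np.hstack([np.ones((len(lagged), 1)), lagged])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[0], coef[1:].T

def forecast_var1(c, A, last_observation, steps):
    """Iterate the fitted equation forward to produce multi-step forecasts."""
    y = np.asarray(last_observation, dtype=float)
    forecasts = []
    for _ in range(steps):
        y = c + A @ y
        forecasts.append(y)
    return np.array(forecasts)

# Simulated two-variable series (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
A_sim = np.array([[0.5, 0.1], [0.2, 0.3]])
series = [np.zeros(2)]
for _ in range(300):
    series.append(A_sim @ series[-1] + rng.normal(scale=0.1, size=2))
series = np.array(series)

c_hat, A_hat = fit_var1(series)
print("Estimated coefficient matrix:\n", A_hat)
print("3-step forecast:\n", forecast_var1(c_hat, A_hat, series[-1], steps=3))
```

The off-diagonal entries of the estimated matrix describe how the lagged value of one variable feeds into the other, which is the interdependency a VAR model is meant to capture. Higher lag orders add further lagged blocks to the design matrix.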
The use of VAR models offers several benefits when analysing stock market data:
• Relationship Analysis: VAR models help us understand the lead-lag relationships between different
stocks or financial indicators. By estimating the coefficients of the model, we can determine the
direction and strength of these relationships. For example, a VAR model may reveal that changes in
one stock price tend to precede or follow changes in another stock price.
• Forecasting: VAR models can be employed to forecast future values of the variables in the system.
By extending the time series into the future, the model can provide predictions for stock prices or
other financial indicators. These forecasts are valuable for investors and analysts as they assist in
making informed decisions related to investment strategies, risk management, and portfolio
optimization.
• Impulse Response Analysis: VAR models allow us to analyze how shocks or innovations in one
variable propagate through the system. This analysis helps us understand the dynamic responses and
interdependencies among the variables. For instance, if there is a shock in one stock price, the VAR
model can illustrate how it affects the other stock prices over time.
• Variance Decomposition: VAR models enable us to decompose the forecast error variance into the
contributions from each variable. This decomposition helps us understand the relative importance of
different variables in explaining the fluctuations and volatility observed in the system. It provides
insights into the sources of uncertainty and risk within the stock market.
3.3.4 Dynamic Mode Decomposition:
Dynamic Mode Decomposition (DMD) is a computational method that can be employed to analyze and
predict the dynamics of intricate systems, including the stock market. By breaking down the system into its
dynamic modes, DMD unveils the temporal behavior of the system.
When applied to stock market analysis, DMD can be utilized in the following manner:
• Data Collection: Historical stock price data is gathered, comprising time-stamped observations of
stock prices.
• Mode Decomposition: DMD employs singular value decomposition (SVD) to decompose the
collected data into dynamic modes. These modes effectively capture the underlying patterns and
trends within the stock market dynamics.
• Mode Analysis: The dynamic modes obtained from DMD offer insights into the behavior of the
stock market. Each mode corresponds to a specific temporal pattern or oscillation, thereby revealing
recurring trends in stock prices.
• Prediction: DMD facilitates short-term prediction of stock prices by extrapolating future values
based on the dynamic modes. These predictions can assist investors and traders in their decision-
making processes.
• Model Evaluation: The accuracy of DMD predictions can be assessed by comparing them with
actual stock price data using metrics such as Mean Absolute Percentage Error (MAPE) or Root
Mean Square Error (RMSE).
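The decomposition and prediction steps listed above can be sketched with the commonly used "exact DMD" formulation, assuming NumPy is available. The function names, the rank argument, and the synthetic snapshot matrix are illustrative assumptions and are not taken from the implementation used in this project.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD of a snapshot matrix of shape (n_features, n_snapshots).

    Returns the DMD eigenvalues, the DMD modes, and the mode amplitudes
    fitted to the first snapshot.
    """
    X1, X2 = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]          # truncate to the chosen rank
    inv_s = np.diag(1.0 / s)
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ inv_s      # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ Vh.conj().T @ inv_s @ W
    first = snapshots[:, 0].astype(complex)
    amplitudes = np.linalg.lstsq(modes, first, rcond=None)[0]
    return eigvals, modes, amplitudes

def dmd_predict(eigvals, modes, amplitudes, step):
    """Reconstruct or extrapolate the snapshot at time index `step` (0 = first snapshot)."""
    return (modes @ (amplitudes * eigvals ** step)).real

# Synthetic two-series example generated by a known linear map (illustration only).
M = np.array([[0.9, 0.2], [-0.1, 0.8]])
columns = [np.array([1.0, 0.5])]
for _ in range(9):
    columns.append(M @ columns[-1])
X = np.array(columns).T                                   # shape (2, 10)

eigvals, modes, amplitudes = dmd(X, rank=2)
print("DMD eigenvalues:", eigvals)
print("Prediction for the next time step:", dmd_predict(eigvals, modes, amplitudes, 10))
```

The magnitude of each eigenvalue indicates whether the corresponding mode grows or decays, and its phase indicates the oscillation frequency. For stock data, each row of the snapshot matrix would hold one stock's prices and each column one time instant.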
DMD brings several advantages to stock market analysis. It is a data-driven approach that approximates the
observed dynamics, including those of nonlinear systems, with a best-fit linear operator, while remaining
computationally efficient. Furthermore, it
provides a concise representation of stock market dynamics, aiding in their interpretation and analysis.
To summarize, DMD is a valuable tool for comprehending the temporal behavior of stock prices,
identifying recurring patterns, and making short-term predictions. However, it is essential to supplement
DMD with other approaches and factors to gain a comprehensive understanding of stock market behavior.
Fig 3.3.4 DMD model
3.4 METHODOLOGY:
i) Augmented Dickey-Fuller (ADF) test: The null hypothesis of the ADF test is that the time series is
non-stationary (it has a unit root). If the p-value of the test is less than the significance level, the
null hypothesis is rejected and the time series is considered stationary.
ii) Granger causality test: This checks whether the past values of one variable help to predict another
variable. The variables that do not affect the model can be removed.
3. If the data is non-stationary, take a first-order difference of the entire Data and re-run the augmented
Dickey-Fuller test.
4. Train-Test Split
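5. Determination of Lag Order: The VAR model requires an appropriate lag order to be chosen before the
model is fitted. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC)
compare the fit of candidate models while penalising the number of parameters, and are used to select
the most parsimonious model. The lag order with the lowest criterion value is preferred.
$$\mathrm{AIC} = -2\ln(L) + 2k, \qquad \mathrm{BIC} = -2\ln(L) + k\ln(n)$$
Here, L is the maximised value of the likelihood function (so ln(L) is the log-likelihood), k is the
number of parameters, and n is the number of observations.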
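As a small worked example of this step, the sketch below evaluates the standard definitions AIC = -2 ln(L) + 2k and BIC = -2 ln(L) + k ln(n) for four candidate lag orders. The log-likelihood values and parameter counts are made up for illustration (they assume a four-variable VAR with a constant) and are not results from this project.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood, k, n):
    """Bayesian information criterion: -2 ln(L) + k ln(n)."""
    return -2.0 * log_likelihood + k * math.log(n)

# Illustrative values only: lag order -> (maximised log-likelihood, number of parameters).
# A 4-variable VAR(p) with a constant has 4 * (4 * p + 1) coefficients.
n_obs = 200
fits = {1: (-540.0, 20), 2: (-498.0, 36), 3: (-495.5, 52), 4: (-494.0, 68)}

for p, (log_l, k) in fits.items():
    print(p, round(aic(log_l, k), 2), round(bic(log_l, k, n_obs), 2))

best_aic = min(fits, key=lambda p: aic(*fits[p]))
best_bic = min(fits, key=lambda p: bic(*fits[p], n_obs))
print("lag chosen by AIC:", best_aic, "| lag chosen by BIC:", best_bic)
```

With these numbers the two criteria disagree: BIC penalises each extra parameter by ln(n) rather than 2, so it tends to choose the shorter lag.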
‘Stationarity’ is one of the most important concepts you will come across when working with time series
data. A stationary series is one in which the properties – mean, variance, and covariance, do not vary
with time.
In the first plot, we can see that the mean varies (increases) with time, resulting in an upward trend.
Thus, this is a non-stationary series. For a series to be classified as stationary, it should not exhibit a
trend.
Moving on to the second plot, we certainly do not see a trend in the series, but the variance of the
series is a function of time. As mentioned previously, a stationary series must have a constant
variance.
If you look at the third plot, the spread becomes closer as the time increases, which implies that the
covariance is a function of time.
The next step is to determine whether a given series is stationary or not and deal with it accordingly.
Consider the plots we used in the previous section. We were able to identify the series in which mean and
variance were changing with time, simply by looking at each plot. Similarly, we can plot the data and
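determine if the properties of the series are changing with time or not.
Consider a first-order autoregressive series:
$$Y_t = a Y_{t-1} + \varepsilon_t$$
where $Y_t$ is the value at the time instant t and $\varepsilon_t$ is the error term. To calculate $Y_t$ we
need the value of $Y_{t-1}$, which is:
$$Y_{t-1} = a Y_{t-2} + \varepsilon_{t-1}$$
If we do that for all observations, the value of $Y_t$ will come out to be:
$$Y_t = a^n Y_{t-n} + \sum_{i=0}^{n-1} a^i \varepsilon_{t-i}$$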
If the value of a is 1 (unit) in the above equation, then $Y_t$ equals $Y_{t-n}$ plus the sum of all errors
from t-n+1 to t, which means that the variance increases with time. This is known as a unit root in a time
series. For a stationary time series, the variance must not be a function of time. Unit root tests check
for the presence of a unit root in the series by testing whether a = 1.
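The Dickey-Fuller test is one of the most popular statistical tests. It can be used to determine the presence
of a unit root in the series, and hence helps us understand whether the series is stationary or not. The null
and alternate hypotheses of this test are:
• Null hypothesis: the series has a unit root (a = 1), i.e. it is non-stationary.
• Alternate hypothesis: the series has no unit root, i.e. it is stationary.
The test statistic is:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
where $\bar{x}$ is the sample mean, $\mu$ is the hypothesised value, s is the sample standard deviation and n
is the sample size.
• To find the p-value, we need to use the t-distribution table with n-1 degrees of freedom.
• If the p-value is less than 0.05, we can reject the null hypothesis.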
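A simplified version of the test can be sketched as a regression of the differenced series on its lagged level. This is an illustration only: it omits the lagged difference terms of the augmented test, and the function name and simulated series are assumptions. Note that under the unit-root null the regression statistic is conventionally compared with Dickey-Fuller critical values (about -2.86 at the 5% level when a constant is included) rather than with the ordinary t table.

```python
import numpy as np

CRITICAL_5PCT = -2.86   # asymptotic Dickey-Fuller critical value (constant, no trend)

def dickey_fuller_stat(y):
    """t-statistic of rho in  dY_t = c + rho * Y_{t-1} + e_t  (rho = a - 1)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)               # covariance of the estimates
    return beta[1] / np.sqrt(cov[1, 1])

# Simulated examples (assumption): a random walk (a = 1) and a stationary AR(1) with a = 0.5.
rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
random_walk = np.cumsum(noise)
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + noise[t]

stat_walk = dickey_fuller_stat(random_walk)
stat_stationary = dickey_fuller_stat(stationary)
for name, stat in (("random walk", stat_walk), ("stationary AR(1)", stat_stationary)):
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: statistic = {stat:.2f} -> {verdict}")
```

A strongly negative statistic is evidence against the unit root; a statistic above the critical value means the series should be differenced before modelling.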
Figure 3.2.1: p value Table
In differencing, we compute the difference of consecutive terms in the series. Differencing is typically
performed to get rid of the varying mean. Mathematically, differencing can be written as:
$$Y'_t = Y_t - Y_{t-1}$$
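A brief example of first-order differencing, using made-up prices with an upward trend (the values are illustrative, not project data):

```python
import numpy as np

prices = np.array([100.0, 102.0, 105.0, 109.0, 114.0])   # illustrative series with a trend

first_diff = np.diff(prices)          # Y_t - Y_{t-1}
second_diff = np.diff(prices, n=2)    # difference of the differences

# Differencing is invertible given the first observation.
restored = np.concatenate(([prices[0]], prices[0] + np.cumsum(first_diff)))

print(first_diff)    # [2. 3. 4. 5.]
print(second_diff)   # [1. 1. 1.]
```

The differenced series has one observation fewer than the original, which matters when aligning it with other series for the VAR model.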
Cross-correlation is a measurement that tracks the movements of two or more sets of time series data
relative to one another. It is used to compare multiple time series and objectively determine how well they
match up with each other and, in particular, at what point the best match occurs.
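The idea of locating the best match can be illustrated with a lagged correlation scan. This is a sketch on simulated data; the function name and the sign convention for the lag are assumptions made for the example.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for each lag in [-max_lag, max_lag].

    Assumes x and y have the same length. A positive best lag means y follows x.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        result[lag] = np.corrcoef(a, b)[0, 1]
    return result

# Simulated example: y repeats x three steps later, plus a little noise.
rng = np.random.default_rng(1)
x = rng.standard_normal(300)
y = np.concatenate((np.zeros(3), x[:-3])) + 0.1 * rng.standard_normal(300)

correlations = cross_correlation(x, y, max_lag=10)
best_lag = max(correlations, key=correlations.get)
print("best match at lag", best_lag)
```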
Granger causality is an econometric test used to verify the usefulness of one variable in forecasting another.
A prerequisite for performing the Granger causality test is that the series are stationary.
A variable is said to:
• Granger-cause another variable if it is helpful for forecasting the other variable.
• Fail to Granger-cause another variable if it is not helpful for forecasting the other variable.
In the context of vector autoregressive models, a variable fails to Granger-cause another variable if:
• Its lags are not statistically significant in the equation for the other variable.
• Its past values are not significant in predicting the future values of the other variable.
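The significance of the lags can be tested by comparing a regression with and without the lags of the candidate variable. The sketch below computes the corresponding F-statistic on simulated data; it is an illustration under assumed settings (two lags, a constant term), the function names are hypothetical, and a p-value would additionally require the F distribution.

```python
import numpy as np

def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags, ..., n-1."""
    n = len(series)
    return np.column_stack([series[lags - k: n - k] for k in range(1, lags + 1)])

def granger_f_stat(y, x, lags):
    """F-statistic for the null hypothesis 'x does not Granger-cause y'."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    target = y[lags:]
    restricted = np.hstack([np.ones((len(target), 1)), lagged(y, lags)])   # own lags only
    unrestricted = np.hstack([restricted, lagged(x, lags)])                # plus lags of x

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Simulated example: x drives y with a one-step delay, but y does not drive x.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.3 * rng.standard_normal(499)

f_x_to_y = granger_f_stat(y, x, lags=2)
f_y_to_x = granger_f_stat(x, y, lags=2)
print(f"x -> y: F = {f_x_to_y:.1f},  y -> x: F = {f_y_to_x:.2f}")
```

A large F-statistic in one direction and a small one in the other is the pattern expected when only x helps to forecast y.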
There are two methods that can be used to arrive at the optimal lag length:
• Cross-equation restrictions: these assume that the error terms from each equation are normally distributed.
• Information criteria: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
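The information-criteria route can be sketched by fitting a VAR(p) for several values of p with ordinary least squares and comparing the criteria. The example assumes the common multivariate forms ln|Σ| + 2k/T for AIC and ln|Σ| + k ln(T)/T for BIC, where Σ is the residual covariance matrix, k the total number of estimated coefficients and T the number of usable observations; the function name and the simulated two-variable series are hypothetical.

```python
import numpy as np

def var_information_criteria(data, p, max_lag):
    """Fit a VAR(p) with a constant by OLS; return (AIC, BIC).

    `data` has shape (time, g). The first max_lag rows are always dropped so that
    every candidate lag order is compared on the same sample.
    """
    n, g = data.shape
    target = data[max_lag:]
    T = len(target)
    lag_blocks = [data[max_lag - k: n - k] for k in range(1, p + 1)]
    design = np.hstack([np.ones((T, 1))] + lag_blocks)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    sigma = resid.T @ resid / T                     # residual covariance matrix
    log_det = np.linalg.slogdet(sigma)[1]
    k = coef.size                                   # g * (g * p + 1) coefficients
    return log_det + 2 * k / T, log_det + k * np.log(T) / T

# Simulated example: a stable two-variable VAR(1).
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2], [0.1, 0.4]])
series = np.zeros((600, 2))
for t in range(1, 600):
    series[t] = A @ series[t - 1] + rng.standard_normal(2)

scores = {p: var_information_criteria(series, p, max_lag=4) for p in range(1, 5)}
best_aic = min(scores, key=lambda p: scores[p][0])
best_bic = min(scores, key=lambda p: scores[p][1])
print("lag chosen by AIC:", best_aic, "| lag chosen by BIC:", best_bic)
```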
g - No. of equations
Multiple linear regression is used to estimate the relationship between two or more independent variables
and one dependent variable. You can use multiple linear regression when you want to know:
• How strong the relationship is between two or more independent variables and one dependent variable
• The value of the dependent variable at a certain value of the independent variables
The formula for multiple linear regression is:
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n + \varepsilon$$
where
• $\beta_0$ is the intercept
• $\beta_1$ is the regression coefficient of the first independent variable $X_1$
• $\beta_n$ is the regression coefficient of the last independent variable $X_n$
• $\varepsilon$ is the model error
To find the best-fit line for each independent variable, multiple linear regression calculates three things:
• The regression coefficients that lead to the smallest overall model error.
• The t statistic of the overall model.
• The associated p-value.
It then calculates the t-statistic and p-value for each regression coefficient in the model.
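The coefficient and t-statistic calculations can be sketched with a least-squares fit. The data below are simulated with known coefficients so that the estimates can be checked; the function name fit_ols is an assumption, and p-values are omitted because they would require the t distribution.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*X1 + ... + bn*Xn; returns (coefficients, t-statistics)."""
    design = np.column_stack([np.ones(len(y)), X])          # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = len(y) - design.shape[1]
    sigma2 = resid @ resid / dof                            # residual variance
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return beta, beta / std_err

# Simulated example with true coefficients b0 = 1.5, b1 = 2.0, b2 = -0.5.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(200)

beta, t_stats = fit_ols(X, y)
print("coefficients:", np.round(beta, 3))
print("t-statistics:", np.round(t_stats, 1))
```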
• Root mean square error (RMSE), or root mean square deviation, is one of the most commonly used measures
for evaluating the quality of predictions. To compute RMSE, calculate the residual (the difference between
prediction and truth) for each data point, square each residual, compute the mean of the squared residuals,
and take the square root of that mean.
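As a short worked example, RMSE can be computed directly from its definition, alongside the Mean Absolute Percentage Error (MAPE) that is also used for evaluation in this report. The input values are illustrative only.

```python
import math

def rmse(predicted, actual):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def mape(predicted, actual):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    ratios = [abs((a - p) / a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(ratios) / len(ratios)

# Residuals are 0, 0 and 2, so RMSE = sqrt((0 + 0 + 4) / 3).
print(round(rmse([1.0, 2.0, 5.0], [1.0, 2.0, 3.0]), 4))   # 1.1547
# Each prediction is off by 10% of the actual value.
print(mape([110.0, 90.0], [100.0, 100.0]))                # 10.0
```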
CHAPTER 4
Table 4.1.1 : Results after performing ADF test over the stock data
Fig 4.1.2 : Plot of PACF test to predict the lag order for the ARCH model and the model result
Fig 4.2.2 : Plot of PACF and ACF to determine the lag order of GARCH model
Fig 4.2.3 : GARCH model results after substituting GARCH (2,2) model
Fig 4.2.4 : Plot of Standardised Residuals and Conditional Volatility
Fig 4.2.6 : Volatility Prediction of ADANIENT.NS
Fig 4.3.1 : Plot of the stock data which will be used for VAR
Stock          TCS.NS_x   INFY.NS_x   WIPRO.NS_x   HCLTECH.NS_x
TCS.NS_y       1.0000     0.0523      0.2772       0.0003
INFY.NS_y      0.1373     1.0000      0.1265       0.1111
WIPRO.NS_y     0.0060     0.1981      1.0000       0.0136
HCLTECH.NS_y   0.1613     0.1613      0.4133       1.0000
Table 4.3.2 : Correlation Matrix
Table 4.3.3 : Forecasting Accuracy of the VAR model
Fig 4.4.1 : Plot of Training data and Testing Data split of INFY.NS
Fig 4.4.2 : Plot of the temporal image of Input Matrix (U)
CHAPTER 5
5.1 Conclusion:
It is important to note that predicting stock market values is a challenging task, and no single model or
approach can provide accurate predictions all the time. Therefore, it is essential to use multiple models and
evaluate their performance according to the stock that is to be predicted. There is no single best model for
predicting the stock market as it is a highly complex and unpredictable system influenced by a variety of
factors including global economic conditions, political events, natural disasters, and more. Different models
may perform better or worse depending on the specific stock market data, market conditions, and time frame
being analyzed.
The research paper aimed to explore the effectiveness of time series models for predicting stock prices using
machine learning. It includes various time series models including Vector Auto-Regression (VAR), ARMA,
ARCH and GARCH. VAR models are used when the goal is to understand the relationships between
multiple variables that may influence each other. The model can be used to estimate the impact of one
variable on another and to predict the future values of these variables. VAR is particularly useful when the
relationships between the variables are not well understood, and it is unclear which variables are causing
changes in others. ARMA models are used when the time series data is stationary, meaning that its mean,
variance, and autocorrelation structure do not change over time; series that display trends or seasonal
variations must first be made stationary, for example by differencing. The ARMA model is a combination of
two simpler models: the autoregressive (AR) model and the moving average (MA) model. ARCH models are used
when the variance of a time series variable is not constant over time, meaning that there is
heteroskedasticity in the data. The model assumes that the current variance of the variable depends on its
past squared errors, typically with the degree of dependence decreasing as the time lag increases. ARCH
models can help to identify and model volatility clustering, where periods of high volatility tend to be
followed by periods of high volatility, and periods of low volatility by periods of low volatility.
Overall, the study demonstrated the potential of time series models for stock price prediction using machine
learning.
REFERENCES
[1] Harivigneshwar CJ, Dharma Venkatesan KB, Ajith R, Jeyanthi R. Modelling of multivariate systems
using vector autoregression (VAR). In 2019 Innovations in Power and Advanced Computing Technologies
(i-PACT) 2019 Mar 22 (Vol. 1, pp. 1-6). IEEE
[2] Y. Yu, Y. Zhang, S. Qian, S. Wang, Y. Hu and B. Yin, "A Low Rank
Dynamic Mode Decomposition Model for Short-Term Traffic Flow Prediction," in IEEE Transactions on
Intelligent Transportation Systems, vol. 22, no. 10, pp. 6547-6560, Oct. 2021, doi:
10.1109/TITS.2020.2994910
[3] D. P. Kuttichira, E. A. Gopalakrishnan, V. K. Menon and K. P. Soman, "Stock price prediction using
dynamic mode decomposition," 2017 International Conference on Advances in Computing,
Communications and Informatics (ICACCI), 2017, pp. 55-60, doi: 10.1109/ICACCI.2017.8125816.
[4] Aasi, B., Imtiaz, S.A., Qadeer, H.A., Singarajah, M. and Kashef, R., 2021, April. Stock Price Prediction
Using a Multivariate Multistep LSTM: A Sentiment and Public Engagement Analysis Model. In 2021 IEEE
International IOT, Electronics and Mechatronics Conference (IEMTRONICS) (pp. 1-8). IEEE.
[5] Yu Y, Zhang Y, Qian S, Wang S, Hu Y, Yin B. A low rank dynamic mode decomposition model for
short-term traffic flow prediction. IEEE Transactions on Intelligent Transportation Systems. 2020 May
27;22(10):6547-60.
[6] B. Aasi, S. A. Imtiaz, H. A. Qadeer, M. Singarajah and R. Kashef, "Stock Price Prediction Using
a Multivariate Multistep LSTM: A Sentiment and Public Engagement Analysis Model," 2021 IEEE
International IOT, Electronics and Mechatronics Conference (IEMTRONICS), 2021, pp. 1-8, DOI:
10.1109/IEMTRONICS52119.2021.9422526
[7] Ji X, Wang J, Yan Z. A stock price prediction method based on deep learning technology. International
Journal of Crowd Science. 2021 Mar 5.
[8] D. P. Kuttichira, E. A. Gopalakrishnan, V. K. Menon and K. P. Soman, "Stock price prediction using
dynamic mode decomposition," 2017 International Conference on Advances in Computing,
Communications and Informatics (ICACCI), 2017, pp. 55-60, doi: 10.1109/ICACCI.2017.8125816.
[9] Siddarth Somarajan, Monica Shankar, Tanmay Sharma and R Jeyanthi, ”Modelling and Analysis of
Volatility in Time series Data,” Soft Computing and Signal Processing (AISC), Proceeding of ICSCSP
2018, Vol: 2 , Springer Nature Singapore, 2019
[10] Khan, S. and Alghulaiakh, H., 2020. ARIMA model for accurate time series stocks. International
Journal of Advanced Computer Science and Applications, 11(7).
[11] Musatov, D. and Petrusevich, D., 2022. Modeling of forecasts variance reduction at multiple time series
prediction averaging with ARMA (1, q) functions. In CEUR Workshop Proceedings (Vol. 3091, pp. 1-11)
[12] Stavros Degiannakis, Evdokia Xekalaki. (2007) Assessing the performance of a prediction error
criterion model selection algorithm in the context of ARCH models. Applied Financial Economics 17:2,
pages 149-