MBA Analytics For Finance 12
Overview
This unit begins with an introduction to the autoregressive model. It then discusses the
autoregressive integrated moving average (ARIMA) model and the seasonal autoregressive integrated
moving average (SARIMA) model.
https://blog.paperspace.com/time-series-forecasting-autoregressive-models-smoothing-methods/
https://corporatefinanceinstitute.com/resources/knowledge/other/autoregressive-integrated-moving-average-arima/
12.1 INTRODUCTION
Time has been an important aspect of data collection for as long as we can remember. Time is a
significant variable of the data in time series analysis. The study of time series analysis allows us to
better understand our world and how we move within it.
Time series analysis is a method for studying a collection of data points over a period of time. Instead of
capturing data points intermittently or arbitrarily, time series analysers record data points at constant
intervals over a predetermined length of time. The ability to depict how variables change over time
distinguishes time series data from other types of data. In other words, time is an important variable
since it reveals how the data changes through time as well as the outcomes. It provides an additional
source of data as well as a predetermined order of data dependencies.
In the previous chapter, we studied what time series forecasting is and the techniques and methods used
for time series analysis. In this chapter, let us study the models used for time series forecasting and
analysis and the types of models that facilitate forecasting with time series data.
The autoregressive model treats the current value of the time series as the dependent variable and N
prior values of the time series as the independent variables. The N stands for ‘order of autoregression.’
UNIT 12: Analysing and Forecasting Time Series Data JGI JAIN
DEEMED-TO-BE UNI VE RSI TY
Autoregressive models are useful for evaluating nature, economics and other time-varying systems
because they operate on the assumption that previous values have an effect on current values.
Multiple regression methods employ a linear combination of predictors to forecast a variable, whereas
autoregressive models use a combination of historical values to forecast the variable.
This forecasting technique makes use of the relationship between the current value ($Y_t$) and previous
values ($Y_{t-1}$, $Y_{t-2}$, …). It is a regression technique in which the independent variables are
time-lagged versions of the dependent variable.
An autoregressive model containing independent variables for three time periods looks like:
$Y_t = b_0 + b_1 Y_{t-1} + b_2 Y_{t-2} + b_3 Y_{t-3}$
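To make the three-lag equation concrete, the following minimal Python sketch computes a one-step forecast from it. The function name `ar_forecast` and all numbers are hypothetical and chosen only for easy arithmetic; in practice the coefficients would be estimated from data.

```python
def ar_forecast(history, intercept, coeffs):
    """One-step AR forecast: b0 + b1*Y(t-1) + b2*Y(t-2) + ..."""
    if len(history) < len(coeffs):
        raise ValueError("need at least as many past values as coefficients")
    # Pair b1 with the most recent value, b2 with the one before it, and so on.
    lagged = reversed(history[-len(coeffs):])
    return intercept + sum(b * y for b, y in zip(coeffs, lagged))


# Hypothetical values: the last three entries are Y(t-3)=12, Y(t-2)=11, Y(t-1)=13.
past_values = [10.0, 12.0, 11.0, 13.0]
forecast = ar_forecast(past_values, intercept=1.0, coeffs=[0.5, 0.3, 0.1])
print(forecast)  # 1 + 0.5*13 + 0.3*11 + 0.1*12, i.e. about 12
```

Only the three most recent observations enter the forecast, which is what an order of autoregression of 3 means.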
The ARIMA model combines the differenced autoregressive model with the moving average model. ARIMA
(autoregressive integrated moving average) is a statistical analysis technique that uses time series
data to better understand a data set or predict future trends. A statistical model is called
autoregressive if it predicts future values based on past values. For example, an ARIMA model
might try to anticipate a company’s earnings based on prior periods or predict a stock’s future price
based on historical performance.
Each of the ARIMA model’s essential components has a descriptive acronym that defines what they mean:
The “AR” in ARIMA stands for autoregression, which means the model is based on a dependent
relationship between current data and previous values. In other words, it demonstrates that the data
has been regressed against its previous values.
The letter “I” stands for “integrated,” which denotes that the data has been differenced to make it
stationary. Differencing replaces each observation with its difference from the previous observation.
A series whose statistical properties, such as its mean and variance, do not change over time is
referred to as “stationary”.
Financial Analytics
The “MA” stands for moving average model, which means that the model’s forecast is linearly
dependent on previous forecast errors rather than on the previous values themselves. It’s important
to note that moving average models are not the same as the moving averages used to smooth a series.
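The differencing step behind the “I” component can be sketched in a few lines of Python. The helper names `difference` and `undo_first_difference` and the sample series are hypothetical illustrations, not part of any library.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass replaces y[t] with y[t] - y[t-1]."""
    result = list(series)
    for _ in range(d):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


def undo_first_difference(first_value, diffs):
    """Rebuild a series from its first value and its first differences."""
    rebuilt = [first_value]
    for step in diffs:
        rebuilt.append(rebuilt[-1] + step)
    return rebuilt


trending = [3, 5, 8, 12]              # hypothetical series with an upward trend
once = difference(trending, d=1)      # [2, 3, 4]
twice = difference(trending, d=2)     # [1, 1]
restored = undo_first_difference(trending[0], once)  # [3, 5, 8, 12]
```

Each pass shortens the series by one observation. Summing the differences back up, which is where the word “integrated” comes from, recovers the original values.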
Each of the AR, I and MA components is represented as a parameter in the model. Specific integer values
are assigned to the parameters, indicating the type of ARIMA model.
The ARIMA model is commonly denoted by the parameters (p, d, q), which can be given multiple values
to change and apply the model in various ways. The parameters are integers that must be defined in
order for the model to work. They can also be set to 0, indicating that they would be ignored in the model.
The ARIMA model can then be transformed into:
ARMA model (no differencing, d = 0)
AR model (no moving average terms or differencing, just an autoregression on past values, d = 0, q = 0)
MA model (a moving average model with no autoregression or differencing, p = 0, d = 0)
Hence, the ARIMA models can be defined as:
ARIMA (1, 0, 0) – known as the first-order autoregressive model
ARIMA (0, 1, 0) – known as the random walk model
ARIMA (1, 1, 0) – known as the differenced first-order autoregressive model and so on.
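The named special cases above can be illustrated with a small Python sketch. It assumes the differenced first-order autoregressive model is written as an AR(1) model on the first differences, with a hypothetical coefficient `phi`. Setting `phi` to zero (and the constant to zero) gives the random walk forecast. The price series is made up for illustration.

```python
def forecast_arima_110(series, phi, constant=0.0):
    """One-step forecast for ARIMA(1, 1, 0): an AR(1) model on the first differences."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    last_change = series[-1] - series[-2]
    predicted_change = constant + phi * last_change
    # Add the predicted change back to the last level to undo the differencing.
    return series[-1] + predicted_change


prices = [100.0, 102.0, 105.0]                       # hypothetical price series
random_walk = forecast_arima_110(prices, phi=0.0)    # ARIMA(0, 1, 0): last value, 105.0
diff_ar1 = forecast_arima_110(prices, phi=0.5)       # 105 + 0.5 * 3 = 106.5
```

With no autoregression on the changes, the best guess for the next value is simply the current one. With a positive `phi`, part of the most recent change is expected to carry forward.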
Once the parameters (p, d, q) have been set, the ARIMA model estimates its coefficients from prior data
points and then uses those coefficients to forecast future values.
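As a rough illustration of what estimating the coefficients means, the sketch below fits the simplest case, ARIMA (1, 0, 0), by ordinary least squares in plain Python. This is an assumption made for clarity: statistical packages typically estimate ARIMA coefficients by maximum likelihood, and the function name `fit_ar1` and the data are hypothetical.

```python
def fit_ar1(series):
    """Estimate c and phi in y[t] = c + phi * y[t-1] by ordinary least squares."""
    x = series[:-1]          # lagged values y[t-1]
    y = series[1:]           # current values y[t]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx          # slope of current value on lagged value
    c = mean_y - phi * mean_x
    return c, phi


# Hypothetical noise-free series generated with c = 2 and phi = 0.5.
observed = [1.0]
for _ in range(5):
    observed.append(2.0 + 0.5 * observed[-1])

c_hat, phi_hat = fit_ar1(observed)
print(c_hat, phi_hat)  # close to 2.0 and 0.5
```

Because the sample series contains no noise, the fit recovers the generating coefficients almost exactly. With real data the estimates would only approximate them.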
A seasonal autoregressive integrated moving average (SARIMA) model extends the ARIMA model to
account for seasonal patterns. Seasonal effects are common in many time series data
sets. Take, for instance, the average temperature in a four-season location. On an annual basis, there will
be a seasonal influence, and the temperature in a given season will almost certainly be highly correlated
with the temperature observed in the same season of the previous year.
Additional seasonal parameters are added to the ARIMA model to create a seasonal ARIMA model
[...] The seasonal component of the model consists of terms that are quite similar to the non-seasonal
components, but they involve seasonal backshifts, that is, lags of whole seasonal periods.
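A seasonal backshift compares each observation with the one a full season earlier. The minimal Python sketch below shows seasonal differencing for a hypothetical quarterly temperature series. The function name `seasonal_difference` and the data are assumptions for illustration.

```python
def seasonal_difference(series, m):
    """Subtract from each observation the value observed m periods earlier."""
    if m <= 0 or m >= len(series):
        raise ValueError("m must be positive and shorter than the series")
    return [series[t] - series[t - m] for t in range(m, len(series))]


# Hypothetical quarterly average temperatures for two consecutive years.
quarterly = [10, 20, 30, 20, 11, 21, 31, 21]
yearly_change = seasonal_difference(quarterly, m=4)  # [1, 1, 1, 1]
```

The strong within-year swing disappears after seasonal differencing, leaving only the year-on-year change for each quarter.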
ARIMA and SARIMA are both forecasting techniques. SARIMA takes into account prior values as well as
seasonal patterns. Because it includes seasonality as a parameter, SARIMA is generally better suited
than ARIMA to forecasting data with seasonal cycles.
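The ARIMA model’s Autoregressive (AR), Integrated (I) and Moving Average (MA) elements stay the
same. The added seasonal terms let the SARIMA model capture patterns that repeat every season. The
SARIMA model is represented as:
ARIMA $(p, d, q)(P, D, Q)_m$
where m is the number of observations in one seasonal cycle (for example, m = 12 for monthly data with
a yearly pattern).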
For the seasonal components of the model, the uppercase notation is used, and for the non-seasonal
elements of the model, we use lowercase notation. The P, D and Q values for seasonal components of the
model can be inferred from the ACF and PACF plots of the data, similarly to ARIMA.
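The ACF mentioned above is built from sample autocorrelations, which can be computed directly. The following plain-Python sketch uses a hypothetical helper `autocorrelation` and a made-up series that repeats every four periods. It shows the spike at the seasonal lag that an ACF plot would reveal. The PACF needs an additional step and is not shown.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    dev = [value - mean for value in series]
    denominator = sum(d * d for d in dev)
    numerator = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return numerator / denominator


# Hypothetical series with a pattern that repeats every 4 periods.
seasonal = [10, 20, 30, 20] * 6
acf_values = {lag: autocorrelation(seasonal, lag) for lag in range(9)}

for lag, value in acf_values.items():
    print(lag, round(value, 3))
```

In this example the autocorrelation is strongly positive at lags 4 and 8 and negative at lag 2, which points to a seasonal period of m = 4.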
Time series analysis is a method for studying a collection of data points over a period of time.
Time is an important variable since it reveals how the data changes through time as well as
the outcomes. It provides an additional source of data as well as a predetermined order of data
dependencies.
The term ‘Autoregression’ refers to a special branch of regression analysis aimed at the analysis
of time series. Regression analysis provides a ‘best-fit’ mathematical equation for the relationship
between the dependent variable and the independent variables.
In statistics, econometrics, and signal processing, an autoregressive (AR) model is a representation
of a type of random process used to describe certain time-varying processes in nature, economics
and signal processing.
If a statistical model predicts future values based on past values, it is called autoregressive.
Autoregressive models are predicated on the assumption that the future will be similar to the past.
Multiple regression methods employ a linear combination of predictors to forecast a variable,
whereas autoregressive models use a combination of historical values to forecast the variable.
Autoregression is useful in locating seasonal or cyclical effects in time series data.
The Autoregressive Integrated Moving Average (ARIMA) model interprets and forecasts data using
time-series data and statistical analysis.
ARIMA (autoregressive integrated moving average) is a statistical analysis model that uses time
series data to better understand a data set or predict future trends. The ARIMA model is
commonly denoted by the parameters (p, d, q), which can be given different values to apply the
model in various ways.
Once the parameters (p, d, q) have been set, the ARIMA model estimates its coefficients from prior
data points and then uses those coefficients to forecast future values.
SARIMA (Seasonal ARIMA) is a modification of ARIMA that explicitly allows univariate time series
data with a seasonal component.
Additional seasonal parameters are added to the ARIMA model to create a seasonal ARIMA model.
ARIMA and SARIMA are both forecasting techniques. SARIMA takes into account prior values as well
as seasonal patterns.
The ARIMA model’s Autoregressive (AR), Integrated (I) and Moving Average (MA) elements stay the
same. Seasonality improves the robustness of the SARIMA model.
12.6 GLOSSARY
The primary goal of this research is to use the ARIMA model to estimate COVID-19’s short-term
pervasiveness based on countrywide and selected region-wide data for the time after September 16,
2020 in India, where the virus is spreading quickly and wreaking havoc. In addition, the current research
aims to pinpoint the COVID-19 cases’ hypothetic inflection point and final size.
For a country like India, forecasting future COVID-19 confirmed cases using mathematical and statistical
models is important for interrupting the transmission chain. Because India is the world’s second most
populous country, there is a high danger of transmission. The ongoing fatal COVID-19 pandemic has
wreaked havoc in India, affecting nearly every sector.
The data in this study are the numbers of confirmed COVID-19 cases from 30 January to 16 September
2020. The data set covers the national trend and the regional trends for four Indian states:
Maharashtra (MH), Tamil Nadu (TN), Andhra Pradesh (AP), and Karnataka (KR), which are located in
western and southern India. These states were chosen for the study because they are central to the
pandemic in India, with the largest numbers of COVID-19 confirmed cases as of September 16, 2020.
The time series analysis in this study relies on an autoregressive integrated moving average model. The
model examines the provided data to determine its pattern or trend in order to forecast approximate
future values, allowing for better future decision-making. For good forecasting, the ARIMA model should
be fitted to at least 50 or, preferably, 100 data points. The majority of the studies on COVID-19 trends
and projections (using ARIMA modelling) in the prior literature used fewer than 100 data points.
Although the dates on which the states entered the contagious pool of the epidemic varied, from 9
March 2020 (for Maharashtra), 7 March 2020 (for Tamil Nadu), 2 March 2020 (for Delhi), and 9 March
2020 (for Karnataka), the ARIMA model can be considered a suitable prediction model for the underlying
time series data because, in every case, more than 100 days have passed since the outbreak of the
COVID-19 epidemic in India (30 January 2020).
The underlying components are autoregression (p), which considers the series’ own lagged values; trend
differencing (d), the number of times the series is differenced to achieve stationarity; and moving
average (q), which uses lagged values of the forecast error. The ARIMA (p,d,q) model’s regression
form (1) is as follows:
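$Y_t = \alpha + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$
Here $\alpha$ is a constant, $\phi_1, \dots, \phi_p$ are the autoregressive coefficients, $\theta_1, \dots, \theta_q$ are the moving average coefficients and $\varepsilon_t$ is the forecast error at time t.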
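A one-step forecast of this kind can be sketched in Python as follows. The sketch assumes the common ARMA form in which the next value equals a constant, plus weighted past values, plus weighted past forecast errors, with the MA terms entering with plus signs and the future error set to its expected value of zero. The function name `arma_one_step` and all numbers are hypothetical.

```python
def arma_one_step(values, errors, alpha, phis, thetas):
    """Expected next value: alpha + sum(phi_i * Y(t-i)) + sum(theta_j * e(t-j))."""
    if len(values) < len(phis) or len(errors) < len(thetas):
        raise ValueError("not enough history for the requested orders")
    # values[-1] is the most recent observation, errors[-1] the most recent error.
    ar_part = sum(phi * values[-(i + 1)] for i, phi in enumerate(phis))
    ma_part = sum(theta * errors[-(j + 1)] for j, theta in enumerate(thetas))
    return alpha + ar_part + ma_part


# Hypothetical (already differenced) values and past forecast errors.
recent_values = [4.0, 5.0]
recent_errors = [0.2, -0.4]
prediction = arma_one_step(recent_values, recent_errors,
                           alpha=1.0, phis=[0.5, 0.25], thetas=[0.5])
print(prediction)  # 1 + 0.5*5 + 0.25*4 + 0.5*(-0.4), i.e. about 4.3
```

Here p = 2 and q = 1, so two past values and one past error enter the forecast. For an ARIMA model with d greater than zero, the inputs would be the differenced series and the result would be added back to the last observed level.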
The best ARIMA model parameters are chosen using three criteria: (a) the Akaike information criterion
(AIC); (b) the autocorrelation function (ACF), which is examined to determine the q parameter,
i.e., the number of moving average (MA) coefficients; and (c) the residuals of the fitted series, which
are examined to determine the p parameter, i.e., the number of autoregression coefficients in an ARIMA model.
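Selection by AIC can be illustrated with a short Python sketch. It assumes Gaussian errors, in which case the AIC of a least-squares fit is, up to an additive constant, n ln(RSS/n) + 2k, where RSS is the residual sum of squares and k the number of estimated parameters. The candidate orders, RSS values and parameter counts below are hypothetical and are not taken from the case study.

```python
import math


def aic_least_squares(n_obs, rss, n_params):
    """AIC for a Gaussian least-squares fit, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


n_obs = 100
# Hypothetical fits: (p, d, q) -> (residual sum of squares, number of parameters).
candidates = {
    (1, 1, 0): (210.0, 2),
    (1, 1, 1): (200.0, 3),
    (2, 1, 2): (199.0, 5),
}

scores = {
    order: aic_least_squares(n_obs, rss, k)
    for order, (rss, k) in candidates.items()
}
best_order = min(scores, key=scores.get)
print(best_order)  # (1, 1, 1)
```

The largest model has the smallest RSS, but its extra parameters are penalised, so the model with the lowest AIC is the middle candidate. This trade-off between fit and complexity is what the AIC is designed to capture.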
According to the empirical findings, the number of confirmed COVID-19 cases will reach 25,669,294 in the
next 230 days at the national level. According to the model, the final epidemic size at the national
level will be between 5,020,359 and 25,669,294 cases. Based on the exponential increase in the series,
the hypothetic inflection point of the cumulative number of COVID-19 confirmed cases is predicted to be
reached at the national level not before 23 April 2021.
Because of its ease of application and interpretation, ARIMA modelling is used in this study
to track the spread of COVID-19 at both the national and regional levels. The approach has
far-reaching implications for forecasting in a variety of areas, including health care.
Questions
1. Which is the statistical model that is used in the above case study?
(Hint: The ARIMA model, used to estimate COVID-19’s short-term pervasiveness based on countrywide
and selected region-wide data for the time after September 16, 2020, in India).
2. What are the parameters that are used in the ARIMA model?
(Hint: p, d and q. Autoregression (p) considers the series’ own lagged values, trend differencing (d) is
the number of times the series is differenced to achieve stationarity, and moving average (q) uses
lagged values of the forecast error).
https://medium.com/@kfoofw/seasonal-lags-sarima-model-fa671a858729
https://neptune.ai/blog/arima-sarima-real-world-time-series-forecasting-guide
Discuss with your friends and explore the practical applications of the autoregressive models –
ARIMA and SARIMA. Try to understand the use of these models through Python.