Group 9 Time Series Data Analysis (ARIMA)

This document discusses time series data analysis using ARIMA models. It introduces time series components like trend, seasonality, cyclicity and irregularity. It explains how to check for stationarity using Dickey-Fuller test and how to make a non-stationary series stationary using differencing. The document also explains AR, MA, ARMA and ARIMA models and their properties.


NATIONAL INSTITUTE OF TECHNOLOGY
TIRUCHIRAPALLI

DEPARTMENT OF PRODUCTION ENGINEERING

DATA ANALYTICS

TOPIC : TIME SERIES DATA ANALYSIS (ARIMA)

Under the guidance of
Dr. Vimal K E K

SANKAR M (214223025)
SATYAM (214223026)
SHIVAM SINGH (214223027)

M.Tech
Industrial Engineering & Management
OVERVIEW

 Introduction to time series
 Components of time series
   1. Trend 2. Seasonal 3. Cyclic 4. Irregular
 Forecasting models
 ARIMA model
 Implementation
   R programming
   Python : 1. pandas 2. matplotlib
INTRODUCTION
 A time series is a sequence of data points recorded at
specific time intervals.
 These data points are analyzed to forecast future values.
 It is time dependent.

Categories and terminologies

 Univariate and multivariate
 Linear or non-linear
 Discrete or continuous
 Stationary and non-stationary
COMPONENTS

 Trend : It defines whether, over a period, the time series increases
or decreases. That is, it has an upward (increasing) trend or a
downward (decreasing) trend.
 Eg : Population growth over the years can be seen as an
upward trend.
COMPONENTS

 Seasonality : It defines a pattern that repeats over a fixed period.
A pattern that repeats periodically in this way is called seasonality.
 It is a short-term variation usually caused by climate, traditional
habits, etc.
 Eg : Sales of ice cream increase during the summer season.
COMPONENTS
 Cyclicity : Cyclicity is also a pattern in the time series data,
but it repeats aperiodically, meaning it does not repeat after
fixed intervals.
 It is a medium-term variation caused by circumstances that
repeat in cycles.
 Eg : 5 years of economic growth followed by 2 years of
economic recession, followed by 7 years of economic growth
followed by 1 year of economic recession.
COMPONENTS

 Irregularity : Irregular or random variations in a time
series are caused by unpredictable influences, which are
not regular. These variations are caused by incidences such
as war, earthquake, strike, flood, etc. There is no defined
statistical technique for measuring random fluctuations in a
time series.
Combination of the four components
Considering the effects of these components, two
different types of models are generally used for a time series.
Additive model
Y(t) = T(t) + S(t) + C(t) + I(t)
Assumption : Components are independent of each
other.
Multiplicative model
Y(t) = T(t) * S(t) * C(t) * I(t)
Assumption : Components are not necessarily
independent, and they can affect each other.
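As a sketch of how such a decomposition looks in code: the snippet below uses seasonal_decompose from statsmodels, which supports both the additive and the multiplicative form (it folds the cyclic component into the trend estimate). The file name and column name are hypothetical placeholders.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical monthly data with a DatetimeIndex.
series = pd.read_csv("sales.csv", index_col=0, parse_dates=True)["sales"]

# Additive form: Y(t) = T(t) + S(t) + I(t); cyclicity is absorbed into the trend.
additive = seasonal_decompose(series, model="additive")

# Multiplicative form: Y(t) = T(t) * S(t) * I(t).
multiplicative = seasonal_decompose(series, model="multiplicative")

additive.plot()   # panels: observed, trend, seasonal, residual
plt.show()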
STATIONARY AND NON STATIONARY

 Stationary Time Series
 A stationary time series is one where the mean and the
variance are both constant over time. In other words, it is one
whose properties do not depend on the time at which the
series is observed. Thus, the time series is a flat series without
trend, with constant variance over time, a constant mean, a
constant autocorrelation and no seasonality. This makes a
stationary time series easy to predict.
STATIONARY AND NON STATIONARY
 Non-Stationary Time Series
 A non-stationary time series is one where either the mean
or the variance, or both, are not constant over time.
 Most statistical forecasting methods are based on the
assumption that the time series can be rendered
approximately stationary after mathematical
transformations.
STATIONARITY

Statistical properties stay more or less the same over time.

Properties:
1) Constant mean
2) Constant variance
3) No seasonality
Seasonality : repeating trends/patterns over time

How to check stationarity
 Visual inspection
 Dickey-Fuller test
How to check stationarity
What is the Dickey-Fuller test?
Imagine a series where a fraction of the
current value depends on the previous
value of the series.
The DF test builds a regression between the
change in the current value, Δyt, and the
previous value yt-1.
The usual t-statistic is not valid, so Dickey and
Fuller developed appropriate critical values. If the
p-value of the DF test is < 5%, then the series is
stationary.
TESTING STATIONARITY
The p-value has to be less than 0.05 (5%).
If the p-value is greater than 0.05, you fail
to reject the null hypothesis and conclude
that the time series has a unit root.
In that case, you should first difference
the series before proceeding with analysis.
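A minimal sketch of this check using the augmented Dickey-Fuller implementation in statsmodels; `series` is assumed to be a pandas Series holding the time series.

from statsmodels.tsa.stattools import adfuller

# adfuller returns (test statistic, p-value, lags used, n obs, critical values, ...)
adf_stat, p_value = adfuller(series)[:2]
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")

if p_value < 0.05:
    print("Reject the null hypothesis: the series is stationary.")
else:
    print("Fail to reject the null: unit root present; difference the series first.")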
Dickey-Fuller Test for stationarity

 Assume an AR(1) model. The model is non-stationary, i.e. a unit root
is present, if ρ = 1:

yt = ρ yt-1 + et
yt − yt-1 = ρ yt-1 − yt-1 + et
Δyt = (ρ − 1) yt-1 + et
Δyt = γ yt-1 + et,   where γ = ρ − 1

 We can estimate the above model and test for the significance of the
coefficient γ.
 If the null hypothesis γ = 0 is not rejected, then yt is not stationary.
Difference the variable and repeat the Dickey-Fuller test to see if the
differenced variable is stationary.
 If the null hypothesis is rejected, γ < 0, then yt is stationary. Use the
variable.

CONVERT NON-STATIONARY TO STATIONARY

 Differencing : transformation of the series into a new time series
where the values are the differences between consecutive values.
 The procedure may be applied consecutively more than once, giving
rise to the "first differences", "second differences", etc.
 Regular differencing (RD)

(1st order) ∇Xt = Xt − Xt-1
(2nd order) ∇²Xt = ∇(Xt − Xt-1) = Xt − 2Xt-1 + Xt-2

 It is unlikely that more than two regular differencings would ever
be needed.
 Sometimes regular differencing by itself is not sufficient and a
prior transformation is also needed.
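A minimal sketch of regular differencing with pandas, reusing the `series` object assumed in the earlier sketches:

# First and second regular differences, plus a seasonal difference.
first_diff = series.diff().dropna()           # Xt - Xt-1         (1st order)
second_diff = series.diff().diff().dropna()   # Xt - 2Xt-1 + Xt-2 (2nd order)
seasonal_diff = series.diff(12).dropna()      # lag-12 difference for monthly seasonality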
Example 1 (figure) : US net electricity generation (billion kWh); the other
panels show the same data after transforming and differencing.

Example 2 (figure) : Logs and seasonal differences of the A10 (antidiabetic)
sales data. The logarithms stabilise the variance, while the seasonal
differences remove the seasonality and trend.
ARIMA
ARIMA is an acronym that stands for Auto Regressive Integrated
Moving Average.

White Noise Processes

White noise is a series that is not predictable, as it is a sequence
of random numbers with a constant mean µ, a constant variance σ²
and zero autocorrelation. If you build a model and its residuals
(the differences between predicted and actual values) look like
white noise, then the model is a good fit. Conversely, if there are
visible patterns in the residuals, a better model exists for your
dataset and you should try another model.
• If µ = 0 then the process is known as Zero-Mean White Noise.
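A minimal sketch generating zero-mean white noise with NumPy, to illustrate what well-behaved residuals should look like:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
white_noise = rng.normal(loc=0.0, scale=1.0, size=200)  # mu = 0: zero-mean white noise

plt.plot(white_noise)
plt.title("Zero-mean white noise: no trend, no seasonality, no autocorrelation")
plt.show()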

Moving Average (MA) Process

 Moving average (MA) models account for the possibility of a
relationship between a variable and the residuals from previous periods.
The qth-order moving average model, denoted MA(q), is:

yt = µ + ut + θ1 ut-1 + θ2 ut-2 + … + θq ut-q

• A moving average model is simply a linear combination of white noise
processes, so that yt depends on the current and previous values of a white
noise disturbance term.
• This equation can also be written with the help of the lag operator, where
Lyt = yt-1 denotes that yt is lagged once.
• To denote the ith lag of yt, the notation is Li yt = yt-i. Using the lag
operator, the above equation can be written as:

yt = µ + θ(L) ut ,   where θ(L) = 1 + θ1L + θ2L² + … + θqLq
Autocorrelation Function
 The ACF is the ratio of the autocovariance of yt and yt-k to the
variance of the dependent variable yt.
 The autocorrelation function ACF(k) gives the gross correlation
between yt and yt-k.
 An important property of MA(q) models in general is that the
autocorrelations are non-zero for the first q lags, and ρk = 0 for all
lags k > q.
 In other words, the ACF provides a considerable amount of
information about the order of dependence q for an MA(q) process.
 Identification of an MA model is often best done with the ACF.
MA Examples with ACF
MA(1) : Yt = ut + 0.8 ut-1
MA(2) : Yt = ut + 0.5 ut-1 + 0.3 ut-2
The generalized ACF for an MA(q) process is:

ρk = ( θk + θ1θk+1 + θ2θk+2 + … + θq-kθq ) / ( 1 + θ1² + θ2² + … + θq² )   for k = 1, …, q
ρk = 0   for k > q
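A minimal sketch simulating these two MA examples with statsmodels' ArmaProcess and plotting their ACFs; the ACF should cut off after lag q (1 and 2 respectively).

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf

np.random.seed(0)
# Coefficient arrays include the zero-lag term (1) by convention.
ma1 = ArmaProcess(ar=[1], ma=[1, 0.8]).generate_sample(nsample=500)       # Yt = ut + 0.8 ut-1
ma2 = ArmaProcess(ar=[1], ma=[1, 0.5, 0.3]).generate_sample(nsample=500)  # Yt = ut + 0.5 ut-1 + 0.3 ut-2

fig, axes = plt.subplots(2, 1)
plot_acf(ma1, lags=20, ax=axes[0], title="ACF of MA(1): cuts off after lag 1")
plot_acf(ma2, lags=20, ax=axes[1], title="ACF of MA(2): cuts off after lag 2")
fig.tight_layout()
plt.show()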
Auto Regression (AR)
 An autoregressive model is one where the current value of a
variable yt depends only upon the values that the variable took in
previous periods plus an error term.
 The generalized equation of an AR(p) model of order p is:

yt = µ + φ1 yt-1 + φ2 yt-2 + … + φp yt-p + ut

 where ut is a white noise disturbance term.
 The above equation can be rewritten using the lag operator as:

φ(L) yt = µ + ut ,   where φ(L) = 1 − φ1L − φ2L² − … − φpLp

 An AR(1) process can be converted into an infinite-order MA process, MA(∞).

Partial Auto Correlation Function (PACF)
 Indicates the dependence of Xt on Xt-k when the dependence on all
the intermediate variables Xt-1, Xt-2, …, Xt-k+1 is removed or not
considered.

φ1 is the PAC of order 1
φ2 is the PAC of order 2

 Partial autocorrelations are calculated using the Yule-Walker equations.

 AR(1) Process : Xt = 0.9 Xt-1 + ut
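A minimal sketch simulating this AR(1) process and plotting its PACF, which should cut off after lag p = 1; the "ywm" option uses a Yule-Walker estimate, matching the slide above.

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_pacf

np.random.seed(0)
# AR polynomial is 1 - 0.9L, so the coefficient array is [1, -0.9].
ar1 = ArmaProcess(ar=[1, -0.9], ma=[1]).generate_sample(nsample=500)  # Xt = 0.9 Xt-1 + ut

plot_pacf(ar1, lags=20, method="ywm", title="PACF of AR(1): cuts off after lag 1")
plt.show()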
ARMA Processes
 By combining the AR(p) and MA(q) models, an ARMA(p, q) model
is obtained. Such a model states that the current value of some series
y depends linearly on its own previous values plus a combination of
current and previous values of a white noise error term.
 The general equation of ARMA(p, q) is:

yt = µ + φ1 yt-1 + … + φp yt-p + ut + θ1 ut-1 + … + θq ut-q

 The characteristics of an ARMA process will be a combination of
those from the autoregressive (AR) and moving average (MA)
processes.
 The autocorrelation function will display combinations of
behaviour derived from the AR and MA parts, but for lags beyond
q the ACF will simply be identical to that of the individual AR(p)
model, so the AR part will dominate in the long term.
ARMA TO ARIMA
 In AR, MA and ARMA models there is one assumption or necessity:
the time series has to be stationary.
 If the time series is non-stationary, then the series has to be
transformed into a stationary series.
 In an ARMA model the I (integrated) term is zero, but when
differencing is done to make the series stationary, the I term
becomes non-zero.
 After differencing, the ARMA model becomes an ARIMA model,
which is represented by (p, d, q).
 The general equation of ARIMA is given below:

y't = µ + φ1 y't-1 + … + φp y't-p + ut + θ1 ut-1 + … + θq ut-q

 It is similar to the ARMA equation; it is just that instead of the given
data (y), the differenced data (y') is used in the equation, where y' is
the series after d rounds of differencing.
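A minimal sketch of specifying an ARIMA(p, d, q) in statsmodels; the order (1, 1, 1) is an arbitrary illustration, not a recommendation, and `series` is the assumed pandas Series from the earlier sketches.

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(series, order=(1, 1, 1))   # p = 1 AR terms, d = 1 difference, q = 1 MA terms
fitted = model.fit()
print(fitted.summary())

forecast = fitted.forecast(steps=12)     # forecast the next 12 periods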
Procedure of Box-Jenkins Approach
 To build an ARIMA model of order (p, d, q) there are four major
steps to be followed:
1. Ensuring Stationarity
2. Identification and Selection of Model Structure
3. Parameter Estimation
4. Model Testing/Validation
Box-Jenkins Approach
1. Identification of Model Structure

Identify whether the series is stationary or not
using the ADF test.
Remove non-stationarity, if any.
Obtain the order of the AR component with the
help of the PACF.
Obtain the order of the MA component with the
help of the ACF.
2. Parameter Estimation

Algorithms are available for parameter
estimation.
One such example is Marquardt's algorithm.
Statistical toolboxes can be used for estimation;
the 'armax' function in MATLAB is one such example.
3. Model Selection

 Model selection is important in time series
analysis as there can be many possible models.
 In general, AR orders up to 6 and MA orders up
to 2 or 3 serve the purpose.
 A model may be selected from several candidate
models using the following two criteria:
1. Maximum Likelihood rule (ML)
2. Mean Square Error (MSE)
1. Maximum Likelihood Rule (ML)

The maximum likelihood value for each of the
candidate models is evaluated.
The model with the highest likelihood value is
chosen.
This method is mostly used for applications where
long-term forecasting is needed.
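A minimal sketch of the ML rule using the log-likelihood (llf) that statsmodels reports for each fitted candidate; the candidate orders are illustrative. In practice, penalized criteria such as AIC or BIC (results.aic, results.bic) are often preferred, since the raw likelihood never penalizes extra parameters.

from statsmodels.tsa.arima.model import ARIMA

candidates = [(1, 1, 0), (0, 1, 1), (1, 1, 1), (2, 1, 1)]   # illustrative orders
fits = {order: ARIMA(series, order=order).fit() for order in candidates}

# Pick the candidate with the highest log-likelihood.
best_order = max(fits, key=lambda order: fits[order].llf)
print("Highest log-likelihood:", best_order, fits[best_order].llf)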
2. Mean Square Error (MSE)

Using a portion of the available data (N/2),
estimate the parameters of the different models.
Forecast the series for the remaining N/2 data
points using the candidate models.
Estimate the MSE corresponding to each model.
The model with the least MSE is selected for
prediction.
This method is mostly used when the model is
required for short-term forecasting.
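A minimal sketch of this N/2 holdout procedure with statsmodels; the candidate orders are illustrative.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

n = len(series)
train, test = series[: n // 2], series[n // 2 :]   # fit on the first half

mse = {}
for order in [(1, 1, 0), (0, 1, 1), (1, 1, 1)]:
    fit = ARIMA(train, order=order).fit()
    pred = fit.forecast(steps=len(test))           # forecast the held-out half
    mse[order] = np.mean((np.asarray(test) - np.asarray(pred)) ** 2)

best = min(mse, key=mse.get)                       # least MSE wins
print("Selected order:", best, "MSE:", mse[best])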
4. Model Testing / Validation

• The first 'T' values are used to build the model (say 50% of
the available data) and the rest of the data is used to validate
the model.
• All the tests are carried out on the residual series.
• The tests are performed to examine whether the following
assumptions used in building the model hold, as checked in
the sketch below:
The residual series has zero mean – significance
test of the residual mean.
The residual series is uncorrelated – Whittle's
white noise test.
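A minimal sketch of these two residual checks. statsmodels does not ship Whittle's test, so the Ljung-Box test is used here as a common substitute for checking that the residuals are uncorrelated; `fitted` is the assumed ARIMA result from the earlier sketch.

from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox

residuals = fitted.resid.dropna()

# 1) Significance of the residual mean (H0: mean = 0).
t_stat, p_mean = stats.ttest_1samp(residuals, popmean=0.0)
print(f"zero-mean test p-value: {p_mean:.3f}")

# 2) Residuals uncorrelated (H0: no autocorrelation up to lag 10).
print(acorr_ljungbox(residuals, lags=[10]))  # small p-values flag leftover correlation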
