Intro To Time Series
Introduction
A time series is a set of observations, each recorded at a specific time (e.g.,
annual GDP of a country, sales figures, etc.)
A discrete time series is one in which the set of time points at which observations are
made is a discrete set (e.g., all of the above, including irregularly spaced data)
A continuous time series is obtained when observations are made continuously over
some time interval (e.g., an ECG trace)
Forecasting is very difficult, since it is about the future! (e.g., forecasts of daily cases
of COVID-19)
Time Series Data
A time series is a sequence of observations over time. What distinguishes it from other
statistical analyses is the explicit recognition of the importance of the order in which the
observations are made. Also, unlike many other problems where observations are independent,
time series observations are most often dependent.
What is the effect over time of a hike in the excise duty on the consumption of
electronic goods?
Rates of inflation and unemployment in the country can be observed only over
time!
A Forecasting Problem: India / U.S. Foreign
Exchange Rate (EXINUS)
Source: FRED ECONOMICS DATA (Shaded areas indicate US recessions)
Units: Indian Rupees to One U.S. Dollar, Not Seasonally Adjusted
Frequency: Monthly (Averages of daily figures)
Forecasting: Assumptions
Time series Forecasting: Data collected at regular intervals of time (e.g.,
Weather and Electricity Forecasting).
Assumptions: (a) Historical information is available;
(b) Past patterns will continue in the future.
Time Series Components
Trend (𝑇𝑡): a pattern exists when there is a long-term increase or decrease in the data.
Seasonal (𝑆𝑡): a pattern exists when a series is influenced by seasonal factors (e.g., the
quarter of the year, the month, or day of the week).
Cyclic (𝐶𝑡): a pattern exists when data exhibit rises and falls that are not of fixed period
(duration usually of at least 2 years).
Decomposition: 𝑌𝑡 = 𝑓(𝑇𝑡, 𝑆𝑡, 𝐶𝑡, 𝐼𝑡), where 𝑌𝑡 is the data at period t and 𝐼𝑡 is the irregular
component at period t.
Additive decomposition: 𝑌𝑡 = 𝑇𝑡 + 𝑆𝑡 + 𝐶𝑡 + 𝐼𝑡
Multiplicative decomposition: 𝑌𝑡 = 𝑇𝑡 × 𝑆𝑡 × 𝐶𝑡 × 𝐼𝑡
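As a sketch of how an additive decomposition can be recovered in R, the snippet below applies base R's decompose() to a synthetic quarterly series (trend plus seasonality plus noise; the series is my own illustrative construction, not data from these notes). Note that decompose() estimates a trend-cycle, seasonal, and irregular term, folding the cyclic component into the trend.

```r
# Synthetic quarterly series: linear trend + fixed seasonal pattern + noise.
set.seed(1)
y <- ts(0.5 * (1:40) + rep(c(5, -2, -4, 1), 10) + rnorm(40, sd = 0.5),
        frequency = 4)

# Classical additive decomposition: Yt = Tt + St + It.
dec <- decompose(y, type = "additive")

# Sanity check: trend + seasonal + random reproduces the data exactly,
# except at the ends where the moving-average trend estimate is NA.
recon <- dec$trend + dec$seasonal + dec$random
ok <- !is.na(recon)
max(abs(recon[ok] - y[ok]))  # numerically zero
```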
Time series data are data collected on the same observational unit at multiple
time periods
How to estimate?
A time series 𝑌𝑡 is stationary if its probability distribution does not change over
time, that is, if the joint distribution of (𝑌𝑖+1, 𝑌𝑖+2, …, 𝑌𝑖+𝑇) does not depend on i.
𝜌𝑗 = 𝐶𝑜𝑣(𝑌𝑡, 𝑌𝑡−𝑗) / (𝜎𝑌𝑡 𝜎𝑌𝑡−𝑗)
where 𝐶𝑜𝑣(𝑌𝑡, 𝑌𝑡−𝑗) is the j-th autocovariance and 𝜌𝑗 is the j-th autocorrelation coefficient.
For the given data, say ρ1 = 0.84.
This implies that the Dollars-per-Pound series is highly serially correlated.
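The lag-1 autocorrelation in the formula above can be computed directly from its definition. The sketch below uses a simulated AR(1) series (illustrative, not the exchange-rate data) and checks the hand computation against base R's acf():

```r
# Simulate an AR(1) series with strong positive serial correlation.
set.seed(42)
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = 500))
n <- length(y)

# Sample lag-1 autocorrelation: lag-1 autocovariance over the variance.
rho1 <- sum((y[-1] - mean(y)) * (y[-n] - mean(y))) /
        sum((y - mean(y))^2)

# acf() computes the same quantity; element 1 of $acf is lag 0.
rho1_acf <- acf(y, plot = FALSE)$acf[2]
```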
AR: autoregressive (lagged observations as inputs)
I: integrated (differencing to make the series stationary)
MA: moving average (lagged errors as inputs)
The model is expressed as ARIMA(p, d, q), where p, d and q are integer parameters that decide the
structure of the model.
More precisely, p and q are the orders of the AR model and the MA model respectively, and the parameter
d is the level of differencing applied to the data:
𝑦𝑡 = 𝑐 + 𝜙1𝑦𝑡−1 + ⋯ + 𝜙𝑝𝑦𝑡−𝑝 + 𝜀𝑡 + 𝜃1𝜀𝑡−1 + ⋯ + 𝜃𝑞𝜀𝑡−𝑞
where 𝑦𝑡 is the actual value, 𝜀𝑡 is the random error at time t, and 𝜙𝑖 and 𝜃𝑗 are the coefficients of the model.
It is assumed that 𝜀𝑡−1 = 𝑦𝑡−1 − ŷ𝑡−1 has zero mean with constant variance, and satisfies the i.i.d.
condition.
Three basic Steps: Model identification, Parameter Estimation, and Diagnostic Checking.
Differencing in ARIMA Model
ARIMA model
ACF / PACF Plots
ACF / PACF Plots : Example
Forecast Evaluation
Performance metrics such as mean absolute error (MAE), root mean square error
(RMSE), and mean absolute percent error (MAPE) are used to evaluate the
performances of different forecasting models for the unemployment rate data sets:
RMSE = sqrt( (1/n) Σᵢ₌₁ⁿ (𝑦𝑖 − ŷ𝑖)² );
MAE = (1/n) Σᵢ₌₁ⁿ |𝑦𝑖 − ŷ𝑖| ;
MAPE = (1/n) Σᵢ₌₁ⁿ |(𝑦𝑖 − ŷ𝑖) / 𝑦𝑖| ,
where 𝑦𝑖 is the actual output, ŷ𝑖 is the predicted output, and n denotes the number
of data points.
By definition, the lower the value of these performance metrics, the better is the
performance of the concerned forecasting model.
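The three metrics are one-liners in R; the helper names below are my own, and the small example vectors are illustrative:

```r
# Error metrics as defined above (the predicted value ŷ is written yhat).
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))
mae  <- function(y, yhat) mean(abs(y - yhat))
mape <- function(y, yhat) mean(abs((y - yhat) / y))  # often reported as a percentage

y    <- c(100, 200, 300)   # actual values
yhat <- c(110, 190, 330)   # predicted values

rmse(y, yhat)  # sqrt(1100/3), about 19.15
mae(y, yhat)   # 50/3, about 16.67
mape(y, yhat)  # 0.25/3, about 0.0833
```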
Time Series
Analysis using R
Time Series Plot:
The graphical representation of time series data, taking time on the x-axis and the
observed values on the y-axis
A plot of the data over time
Example
The demand for a commodity E15 for the last 20 months, from April 2012 to October 2013,
is given in the E15demand.csv file. Draw the time series plot.
Month Demand Month Demand
1 139 11 193
2 137 12 207
3 174 13 218
4 142 14 229
5 141 15 225
6 162 16 204
7 180 17 227
8 164 18 223
9 171 19 242
10 206 20 239
Reading data to R
mydata <- read.csv("E15demand.csv")
# Monthly series starting April 2012; with 20 observations the end date is
# inferred automatically (specifying both start and end with fewer periods
# than observations would drop part of the data).
E15 <- ts(mydata$Demand, start = c(2012, 4), frequency = 12)
E15
plot(E15, type = "b")
Year GDP
1993 94.43
1994 100.00
1995 107.25
1996 115.13
1997 124.16
1998 130.11
1999 138.57
2000 146.97
2001 153.40
2002 162.28
2003 168.73
Seasonal Pattern:
The time series data exhibit rises and falls influenced by seasonal factors
Example: The data on monthly sales of a branded jacket
Example: The data on daily shipments is given in shipment.csv. Check whether the
data is stationary.
R code
mydata <- read.csv("shipment.csv")
shipments <- ts(mydata$Shipments)
plot(shipments, type = "b")
Test for checking whether the series is stationary: unit root test in R
ADF test
R Code
install.packages("tseries")
library("tseries")
adf.test(shipments)
Statistic: Dickey-Fuller = -3.2471, p value = 0.09901
Since the p value = 0.099 < 0.1, we reject the null hypothesis of a unit root: the data
is stationary at the 10% significance level.
Test for checking whether the series is stationary: unit root test in R
KPSS test
R Code
kpss.test(shipments)
Statistic: KPSS Level = 0.24322, p value > 0.1
Since the p value > 0.1, we fail to reject the KPSS null hypothesis of stationarity: the
data is stationary at the 10% level of significance.
Differencing: A method for making series stationary
A differenced series is the series of differences between each observation 𝑌𝑡 and the
previous observation 𝑌𝑡−1:
𝑌′𝑡 = 𝑌𝑡 − 𝑌𝑡−1
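In R, diff() computes exactly this differenced series. The tiny example below (my own) shows that first differencing removes a linear trend:

```r
# A series with a pure linear trend (slope 2) is non-stationary in mean.
y <- ts(3 + 2 * (1:10))

# First difference: Y't = Yt - Yt-1. The trend disappears, leaving a
# constant series, and one observation is lost.
dy <- diff(y)
dy  # a constant series of 2s
```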
R Code
mydata <- ts(GDP$GDP)
plot(mydata, type = "b")
Conclusion
Series has a linear trend
KPSS test (p value < 0.05) shows data is not stationary
Differencing: A method for making data stationary
R Code
install.packages("forecast")
library(forecast)
ndiffs(GDP)
Differencing required is 1, i.e., one application of 𝑌′𝑡 = 𝑌𝑡 − 𝑌𝑡−1
R code
Reading and plotting the data
mydata <- read.csv("Amount.csv")
amount <- ts(mydata$Amount)
plot(amount, type = "b")
Example: The data on ad revenue from an advertising agency for the last 12 months is
given below. Forecast the ad revenue for the next month using the single
exponential smoothing method with the best value of the smoothing constant α.
R code
Checking whether series is stationary
library(tseries)   # adf.test() and kpss.test() live in tseries, not forecast
adf.test(amount)
kpss.test(amount)
R code
Fitting the model
mymodel <- HoltWinters(amount, beta = FALSE, gamma = FALSE)  # single exponential smoothing
mymodel
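With beta = FALSE and gamma = FALSE, HoltWinters() fits simple exponential smoothing and chooses α by minimising the squared one-step prediction errors. As a sketch of the underlying recursion (the function name and the fixed α values are illustrative, not from these notes):

```r
# Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_{t-1};
# the final smoothed value s_n is the forecast for the next period.
ses_forecast <- function(y, alpha) {
  s <- y[1]                       # initialise with the first observation
  for (t in 2:length(y)) {
    s <- alpha * y[t] + (1 - alpha) * s
  }
  s
}

ses_forecast(c(10, 12, 15), alpha = 1)    # alpha = 1 is the naive forecast: 15
ses_forecast(c(10, 12, 15), alpha = 0.5)  # 13
```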
Model diagnostics – R Code
res <- residuals(mymodel)   # one-step-ahead in-sample errors
abs_res <- abs(res)         # absolute errors (for MAE)
res_sq <- res^2             # squared errors (for RMSE)
pae <- abs_res / amount     # percentage absolute errors (for MAPE)
Model diagnostics
Criteria
Prediction Interval z
90% 1.645
95% 1.960
99% 2.576
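The z values in the table give approximate prediction intervals of the form forecast ± z·s, where s is the standard deviation of the residuals. The helper below is an illustrative sketch with made-up numbers:

```r
# Approximate prediction interval: point forecast +/- z * residual sd.
pred_interval <- function(fc, s, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # 1.645, 1.960, 2.576 for 90/95/99%
  c(lower = fc - z * s, upper = fc + z * s)
}

pred_interval(100, s = 5, level = 0.95)  # roughly 90.2 to 109.8
```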
R Code
mydata <- read.csv("Trens_GDP.csv")
GDP <- ts(mydata$GDP, start = 1993, end = 2003)
acf(GDP, 3)   # ACF up to lag 3
acf(GDP)      # ACF up to the default maximum lag
Auto Regressive Integrated Moving Average Models (ARIMA (p,d,q))
General Form
𝑦𝑡 = 𝑐 + 𝜙1𝑦𝑡−1 + 𝜙2𝑦𝑡−2 + ⋯ + 𝜙𝑝𝑦𝑡−𝑝 + 𝑒𝑡 + 𝜃1𝑒𝑡−1 + 𝜃2𝑒𝑡−2 + ⋯ + 𝜃𝑞𝑒𝑡−𝑞
Where
c: constant
𝜙1, 𝜙2, …, 𝜃1, 𝜃2, … are model parameters
𝑒𝑡−1 = 𝑦𝑡−1 − 𝑠𝑡−1; the 𝑒𝑡 are called errors or residuals
𝑠𝑡−1: predicted value for the (t−1)-th observation (𝑦𝑡−1)
Step 1:
Draw the time series plot and check for trend, seasonality, etc.
Step 2:
Draw the Auto-Correlation Function (ACF) and Partial Auto-Correlation Function
(PACF) graphs to identify the autocorrelation structure of the series
Step 3:
Check whether the series is stationary using a unit root test (ADF test, KPSS test)
If the series is non-stationary, difference or transform the series
Step 4:
Identify the model using the ACF and PACF, or automatically
The best model is the one that minimizes AIC or BIC or both
Step 5:
Estimate the model parameters using maximum likelihood estimation (MLE)
Step 6:
Do model diagnostic checks
The errors or residuals should be white noise and should not be autocorrelated
Apply a portmanteau test such as the Ljung-Box test. If the p value > 0.05, then there
is no autocorrelation in the residuals and the residuals are purely white noise.
The model is then a good fit
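Steps 4 to 6 can be sketched with base R alone: arima() fits by maximum likelihood, AIC is read off the fit, and Box.test() runs the Ljung-Box check. The simulated AR(1) series below stands in for real data:

```r
# Simulate an AR(1) series, fit ARIMA(1,0,0) by MLE, and check residuals.
set.seed(123)
y <- arima.sim(model = list(ar = 0.6), n = 200)

fit <- arima(y, order = c(1, 0, 0))  # MLE fit; fit$aic gives the AIC

# Ljung-Box portmanteau test on the residuals; a large p value means no
# evidence of residual autocorrelation (residuals look like white noise).
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box")
lb$p.value
```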
Summary statistics of the data:
Statistic Value
Minimum 45
Quartile 1 144.5
Median 271.5
Mean 243.6
Quartile 3 313
Maximum 417
Potential Models
ARMA(1,0), since the PACF spikes at lag 1 (crosses the 95% confidence bound) and cuts off afterwards
ARMA(0,1), since the ACF spikes at lag 1 and cuts off afterwards
ARMA(1,1), since both the ACF and PACF are significant at lag 1
Conclusion:
The best model, which minimizes both AIC and BIC, is p = 1, q = 0, i.e., ARIMA(1,0,0)
(identified automatically)
Ljung-Box test: statistic = 0.96445, p value = 0.4004
Since the p value = 0.4004 > 0.05, the residuals are not autocorrelated
The residuals are white noise
Reference: https://arxiv.org/abs/2010.05079