
Introduction to time series

analysis and forecasting

Hugues GARNIER

[email protected]

Course outline

Introduction to time series analysis and forecasting

I. Main characteristics of time series data

II. Time series decomposition

III. Basic time series modelling and forecasting methods

IV. Stochastic time series modelling and forecasting: the Box-Jenkins method for ARIMA models

Classification of time series forecasting methods

Forecasting methods:
– Decomposition methods (regression methods)
– Smoothing methods (moving average, exponential smoothing)
– The Box-Jenkins method (ARIMA and SARIMA models)
– Deep learning methods (LSTM models, GRU models, etc.)

• The ARIMA methodology of forecasting is different from most methods because it does not assume any particular pattern in the historical data of the time series to be forecast

The Box-Jenkins method for ARIMA models

• It uses an iterative approach to identify a possible model from a general class of models, named ARIMA:

(1 - Σ_{i=1}^{p} φ_i L^i) (1 - L)^d y_t = (1 + Σ_{i=1}^{q} θ_i L^i) ε_t

• The chosen model is then checked against the historical data to see if it
accurately describes the series

• The Box-Jenkins method


– has been remarkably successful
– has excellent performance on small data sets
– remains quite close to the performance of recent
cutting edge methods

ARIMA models
• AutoRegressive Integrated Moving Average (ARIMA) models were
popularized by George Box and Gwilym Jenkins in the early 1970s

• ARIMA models rely heavily on autocorrelation patterns in the data

• ARIMA models do not involve independent variables in their construction


– They make use of the information in the series itself to generate forecasts

Family of ARIMA models
• ARIMA models are a class of black-box models that is capable of
representing stationary as well as non-stationary time series

Time series
– Stationary: AR models, MA models, ARMA models
– Non-stationary:
  – Non-seasonal: ARIMA models
  – Seasonal: SARIMA models

Major assumption: stationarity of the time series
• The properties of one section of the data are much like those of the other sections. The future is "similar" to the past (in a probabilistic sense)
• A stationary time series has

- no trend / no seasonality

- no systematic change in variation

- no periodic fluctuations

• One of the first steps in the Box-Jenkins method is to transform a non-stationary time series into a stationary one (by using a detrending or differencing method)

Key statistics for time series analysis:
Autocorrelation and partial autocorrelation functions

• Autocorrelation and partial autocorrelation plots are heavily used in time series analysis and forecasting

• These plots graphically summarize the strength of the relationship between an observation in a time series and observations at prior time instants

• The difference between autocorrelation and partial autocorrelation plots can be difficult and confusing for beginners in time series forecasting

• Plots of the autocorrelation and partial autocorrelation functions of a time series tell very different stories and are very useful for selecting the orders of an ARIMA model

Autocorrelation function (ACF)

• Statistical correlation summarizes the strength of the relationship between two different variables

• We can calculate the correlation between time series observations and observations at previous time instants, called lags. This is called an autocorrelation

• A plot of the autocorrelation of a time series as a function of the lag is called the AutoCorrelation Function, or by its acronym ACF

• The sample ACF at lag h, denoted ρ̂_y(h), measures the linear correlation between y_t and y_{t-h}

ACF: stationarity case

• Autocovariance function of a stationary time series y_t:

γ_y(h) = Cov(y_{t+h}, y_t) = E[(y_{t+h} - μ)(y_t - μ)],  |h| < N

with the following 3 properties:
1. γ_y(0) ≥ 0 (γ_y(0) = σ_y^2)
2. |γ_y(h)| ≤ γ_y(0)
3. γ_y(h) = γ_y(-h) ⇒ even function (the ACF is usually plotted for positive lags only)

• Autocorrelation function of a stationary time series y_t:

ρ_y(h) = γ_y(h) / γ_y(0),  0 ≤ h < N

with all the properties of the autocovariance function, except that ρ_y(0) = 1
• It measures the linear correlation between y_t and y_{t+h}

Autocorrelation function (ACF)

• The ACF measures the speed of variation of a temporal evolution
– we compare the time series with itself shifted by a lag h
– it lets us see how the time series at a given time is influenced (through linear autocorrelation) by what happened at previous times

Autocorrelation function (ACF)

Sample statistics
• Given y_1, ..., y_N observations of a stationary time series y_t, estimate the sample mean, variance, autocovariance and ACF
– Sample mean:
  μ̂_y = ȳ = (1/N) Σ_{t=1}^{N} y_t
– Sample variance:
  σ̂_y^2 = (1/(N-1)) Σ_{t=1}^{N} (y_t - μ̂_y)^2
– Sample autocovariance function:
  γ̂_y(h) = (1/N) Σ_{t=1}^{N-h} (y_{t+h} - ȳ)(y_t - ȳ),  0 ≤ h < N
  with γ̂_y(h) = γ̂_y(-h),  -N < h ≤ 0
– Sample autocorrelation function (ACF):
  ρ̂_y(h) = γ̂_y(h) / γ̂_y(0),  |h| < N

Sample ACF - Example
y = [0 1 1 1 0],  N = 5

ȳ = (1/5) Σ_{t=1}^{5} y_t = 0.6

γ̂_y(h) = (1/5) Σ_{t=1}^{5-h} (y_{t+h} - ȳ)(y_t - ȳ),  h = 0, 1, 2, 3, 4

γ̂_y(0) = (1/5) Σ_{t=1}^{5} (y_t - ȳ)^2 = 0.24
γ̂_y(1) = (1/5) Σ_{t=1}^{4} (y_{t+1} - ȳ)(y_t - ȳ) = -0.032
γ̂_y(2) = (1/5) Σ_{t=1}^{3} (y_{t+2} - ȳ)(y_t - ȳ) = -0.064
γ̂_y(3) = (1/5) Σ_{t=1}^{2} (y_{t+3} - ȳ)(y_t - ȳ) = -0.096
γ̂_y(4) = (1/5) Σ_{t=1}^{1} (y_{t+4} - ȳ)(y_t - ȳ) = 0.072

ρ̂_y(h) = γ̂_y(h) / γ̂_y(0),  h = 0, 1, 2, 3, 4

ρ̂_y = [1  -0.13  -0.26  -0.4  0.3]

In Matlab:

y = [0 1 1 1 0];
[rho_hat_y,Lag] = xcov(y,'coeff');
stem(Lag,rho_hat_y)

or

autocorr(y)   % Matlab Econometrics Toolbox
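As a quick cross-check, here is a minimal sketch that recomputes these sample values by hand, without any toolbox (the variable names are illustrative):

% Sample ACF of y = [0 1 1 1 0] computed from the definitions above
y = [0 1 1 1 0];
N = numel(y);
ybar = mean(y);                       % = 0.6
gamma_hat = zeros(1,N);               % gamma_hat(h+1) holds lag h
for h = 0:N-1
    gamma_hat(h+1) = sum((y(1+h:N)-ybar).*(y(1:N-h)-ybar))/N;
end
rho_hat = gamma_hat/gamma_hat(1)      % [1 -0.1333 -0.2667 -0.4 0.3]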

Partial autocorrelation function (PACF)
• The autocorrelation between an observation y_t and an observation at a prior time instant y_{t-h} comprises both the direct correlation and indirect correlations between y_t and y_{t-1}, y_{t-2}, ..., y_{t-h+1}

• These indirect correlations are a linear function of the correlation of the observation with observations at intermediate time instants

• It is these indirect correlations that the partial autocorrelation function tries to remove

• A plot of the partial autocorrelation of a time series as a function of the lag is called the Partial Autocorrelation Function, or by its acronym PACF

• The sample PACF at lag h, denoted α̂_y(h), measures the linear correlation between y_t and y_{t-h} after statistically removing the effect of y_{t-1}, y_{t-2}, ..., y_{t-h+1}

Plots of the ACF and PACF for a time series
tell a very different story - Example

The white noise process
The most fundamental example of stationary process

• A white noise process is a sequence of independent and identically distributed (i.i.d.) random variables
– The samples are uncorrelated, have zero mean and constant variance
– A Gaussian white noise is a sequence of i.i.d. observations from N(0, σ^2)
– Because independence implies that its values are uncorrelated at different times, its ACF looks like a Kronecker impulse

Sampling distribution of sample ACF
• The sampling distribution of the ACF of a white noise is asymptotically Gaussian N(0, 1/N)
– 95% of all ACF coefficients of a white noise must lie within ±1.96/√N
– It is common to plot horizontal limit lines at ±1.96/√N when plotting the ACF

• If N = 125, the critical values are at ±1.96/√125 = ±0.175
– All ACF coefficients lie within these limits, confirming that the data are white noise (more precisely, the data cannot be distinguished from white noise)
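A small sketch of this rule in Matlab, assuming the Econometrics Toolbox for autocorr (the seed and sample size are illustrative):

% ACF of simulated Gaussian white noise with the +/-1.96/sqrt(N) limits
N = 125;
rng(0); e = randn(N,1);        % Gaussian white noise
autocorr(e)                    % sample ACF with approximate confidence bounds
1.96/sqrt(N)                   % = 0.175, the 95% limit quoted above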

Properties of white noise process
• Best forecast of a white noise
– If a time series is white noise, it is unpredictable and so there is nothing to
forecast. Or more precisely, the best forecast is its mean value which is zero

• Whitening test of the residuals


– At the validation stage of the Box-Jenkins methodology, we check whether the forecast errors (= the residuals) are white noise by plotting their sample ACF

(Example figure: the sample ACF shows significant autocorrelations at lags 1, 2, 3 and 4, so the residuals are not white here)

– If the residual ACF does not resemble the ACF of a white noise, it suggests that improvements could be made to the predictive model
– If the residual ACF resembles the ACF of a white noise, the modelling procedure is finished: there is nothing else to capture in the residuals, and the estimated ARIMA model can be used for forecasting

Models for stationary random signals or time series

y_t = (Θ(L) / Φ(L)) ε_t

General linear parametric model
of stationary time series
• Box and Jenkins in 1970 (following Yule and Slutsky 1927)
– Many time series (or their differences) can be considered as a special class of stochastic processes: (weakly) stationary stochastic processes
• First two moments are finite and constant over time
• Defined completely by the mean, variance and autocorrelation function

• General parametric model of stationary stochastic processes (Wold 1938)


– All (weakly) stationary stochastic processes can be written as
y_t = c + Σ_{i=1}^{+∞} ψ_i ε_{t-i} + ε_t

where c is a constant and ε_t is a Gaussian white noise

– ε_t is often called the innovation process because it captures all new information in the series at time t

Lag or backward shift L operator
• The lag (or backward shift) operator L is defined as

L ε_t = ε_{t-1}
L^j ε_t = ε_{t-j}

• The general linear model of a stationary stochastic process can then be written as

y_t = c + Σ_{i=1}^{+∞} ψ_i ε_{t-i} + ε_t
y_t = c + Ψ(L) ε_t
Ψ(L) = 1 + Σ_{i=1}^{+∞} ψ_i L^i

• This model has an infinite-degree polynomial Ψ(L) with infinitely many coefficients, which cannot be estimated from a finite amount of data in the time series 😩

Towards AR, MA and ARMA models
for stationary time series
• If Ψ(L) is a rational polynomial, we can write it (at least approximately) as the quotient of two finite-degree polynomials:

Ψ(L) = Θ(L) / Φ(L)

Θ(L) = 1 + θ_1 L + ⋯ + θ_q L^q     (Matlab Econometrics Toolbox notations)
Φ(L) = 1 - φ_1 L - ⋯ - φ_p L^p

• Wold's theorem: every stationary stochastic process can then be written as

y_t = c + (Θ(L) / Φ(L)) ε_t

– which has a finite number (p + q) of coefficients

• This leads to the use of parsimonious models: AR, MA and ARMA models
– They are most useful for practical applications, since these models can quite easily be estimated from a finite amount of data in the time series

Family of ARMA models for stationary time series

• ARMA models: a way to "see" stationary time series as filtered white noise
– The filter takes different forms according to the time series properties

(Block diagrams: the white noise ε_t is filtered by 1/Φ(L), Θ(L) or Θ(L)/Φ(L), plus the constant c)

AR models:    y_t = c + (1/Φ(L)) ε_t
MA models:    y_t = c + Θ(L) ε_t
ARMA models:  y_t = c + (Θ(L)/Φ(L)) ε_t

where c is a constant (the mean of the time series) and ε_t ∼ N(0, σ^2)

AutoRegressive models: AR(p) models

• An autoregressive model of order p, AR(p), is defined by (Yule 1927)

y_t = c + Σ_{i=1}^{p} φ_i y_{t-i} + ε_t

– where p ≥ 1, c is a constant and ε_t ∼ N(0, σ^2)
• It can also be written in lag-operator polynomial form:

Φ(L)(y_t - c) = ε_t
Φ(L) = 1 - φ_1 L - ⋯ - φ_p L^p     (Matlab Econometrics Toolbox notations)

• Stationarity conditions
– An AR(p) process is stationary if all roots of Φ(L) are outside the unit circle
• Special case
– If one or more roots lie on the unit circle (i.e., have absolute value one), the model is called a unit root process, which is non-stationary
• When p = 1, c = 0 and φ_1 = 1, y_t = y_{t-1} + ε_t is a non-stationary random walk, which is a unit root process

Different forms of an AR(2) model
Example
• An autoregressive model of order 2, AR(2), is given:

y_t = -0.5 y_{t-1} - 0.9 y_{t-2} + ε_t

• It can also be written in lag-operator polynomial form:

y_t + 0.5 y_{t-1} + 0.9 y_{t-2} = ε_t
(1 + 0.5 L + 0.9 L^2) y_t = ε_t
Φ(L) y_t = ε_t
Φ(L) = 1 + 0.5 L + 0.9 L^2

• Stationarity condition
– This AR(2) process is stationary since the two roots of Φ(L) are outside the unit circle (L_{1,2} = -0.28 ± 1.02i)
– Note that the polynomial Φ(L) is written in a different form than usually used in Control Engineering or Signal Processing, where the backward operator q^{-1} is used, so that the polynomial would be Φ(q^{-1}) = 1 + 0.5 q^{-1} + 0.9 q^{-2}. With this negative-power notation, the filter is stable if the roots of Φ(q^{-1}) are inside the unit circle. Do not be confused by the lag-operator polynomial form used here, and apply the appropriate rule to test the stationarity of the AR process!
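This stationarity check can be reproduced with a small Matlab sketch (roots expects the polynomial coefficients in descending powers of L):

% Roots of Phi(L) = 0.9 L^2 + 0.5 L + 1
L_roots = roots([0.9 0.5 1])   % -0.2778 +/- 1.0168i
abs(L_roots)                   % both moduli approx. 1.05 > 1 => stationary AR(2)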

Properties of AR(p) process
• Autocorrelation function

lim_{h→+∞} ρ_y(h) = 0

– The sample ACF decreases exponentially to 0 when h → +∞

• Partial autocorrelation function

α_y(h) = φ_p for h = p
α_y(h) = 0 for h > p

– The sample PACF of an AR(p) process cuts off after p lags

• Order selection of an AR process
– The PACF is the plot to be used to select the order p of an AR process
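A minimal sketch of this order-selection rule, assuming the Econometrics Toolbox (arima, simulate, parcorr); the seed and sample size are illustrative:

% Simulate the AR(2) process y_t = -0.5 y_{t-1} - 0.9 y_{t-2} + e_t and plot its PACF
Mdl = arima('AR',{-0.5,-0.9},'Constant',0,'Variance',1);
rng(1); y = simulate(Mdl,500);
parcorr(y)                     % sample PACF cuts off after lag 2 => AR(2)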

AR(1) process example:
y_t = 0.8 y_{t-1} + ε_t

PACF cuts off after 1 lag ⇒ AR(1) process
PACF(1) = φ_1 = 0.8

AR(1) process example:
y_t = -0.8 y_{t-1} + ε_t

PACF cuts off after 1 lag ⇒ AR(1) process
PACF(1) = φ_1 = -0.8

AR(2) process example:
y_t = -0.9 y_{t-2} + ε_t

PACF cuts off after 2 lags ⇒ AR(2) process
PACF(2) = φ_2 = -0.9

AR(2) process example:
y_t = -0.5 y_{t-1} - 0.9 y_{t-2} + ε_t

PACF cuts off after 2 lags ⇒ AR(2) process
PACF(2) = φ_2 = -0.9

Moving Average models: MA(q) models
• A moving average model of order q, MA(q), is defined by (Slutsky 1927)

y_t = c + Σ_{i=1}^{q} θ_i ε_{t-i} + ε_t

– where q ≥ 1, c is a constant and ε_t ∼ N(0, σ^2)

• It can also be written in lag polynomial form:

y_t - c = Θ(L) ε_t
Θ(L) = 1 + θ_1 L + ⋯ + θ_q L^q     (Matlab Econometrics Toolbox notations)

• Stationarity and invertibility conditions
– An MA(q) process is always stationary (to the second order)
– An MA(q) process is invertible if all roots of Θ(L) are outside the unit circle (required to be able to compute forecasts)

• Moving Average models and related methods should not be confused with Moving Average smoothing methods!

Different forms of an MA(2) model
Example
• A moving average model of order 2, MA(2), is given:

y_t = ε_t - 0.8 ε_{t-1} + 0.5 ε_{t-2}

• It can also be written in lag polynomial form:

y_t = (1 - 0.8 L + 0.5 L^2) ε_t
y_t = Θ(L) ε_t
Θ(L) = 1 - 0.8 L + 0.5 L^2

• Stationarity and invertibility conditions
– This MA(2) process is stationary (always) and invertible since the two roots of Θ(L) are outside the unit circle (L_{1,2} = 0.8 ± 1.16i)

Properties of MA(q) process
• Autocorrelation function

ρ_y(h) = 0 for h > q

– The sample ACF of an MA(q) process cuts off after q lags

• Partial autocorrelation function

lim_{h→+∞} α_y(h) = 0

– The absolute value of the sample PACF decreases exponentially to 0 when h → +∞

• Order selection of an MA(q) process
– The ACF is the plot to be used to select the order q of an MA process
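A matching sketch for the MA case, under the same assumptions (Econometrics Toolbox; illustrative seed and sample size):

% Simulate the MA(2) process y_t = e_t - 0.8 e_{t-1} + 0.5 e_{t-2} and plot its ACF
Mdl = arima('MA',{-0.8,0.5},'Constant',0,'Variance',1);
rng(1); y = simulate(Mdl,500);
autocorr(y)                    % sample ACF cuts off after lag 2 => MA(2)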

MA(1) process example:
y_t = ε_t - 0.8 ε_{t-1}
ACF cuts off after 1 lag ⇒ MA(1) process

MA(1) process example:
y_t = ε_t + 0.8 ε_{t-1}
ACF cuts off after 1 lag ⇒ MA(1) process

MA(2) process example:
y_t = ε_t - 0.5 ε_{t-1} + 0.4 ε_{t-2}

ACF cuts off after 2 lags ⇒ MA(2) process

AutoRegressive Moving Average models:
ARMA(p,q) models
• An ARMA(p,q) model of orders p and q is defined by

y_t = θ_0 + Σ_{i=1}^{p} φ_i y_{t-i} + Σ_{i=1}^{q} θ_i ε_{t-i} + ε_t

– where p ≥ 1, q ≥ 1, θ_0 = c Φ(1) and ε_t ∼ N(0, σ^2)

• It can also be written in lag-operator polynomial form:

Φ(L) y_t = θ_0 + Θ(L) ε_t
or Φ(L)(y_t - c) = Θ(L) ε_t     (Matlab Econometrics Toolbox notations)

Θ(L) = 1 + θ_1 L + ⋯ + θ_q L^q
Φ(L) = 1 - φ_1 L - ⋯ - φ_p L^p

• Stationarity and invertibility conditions
– An ARMA(p,q) process is stationary if all roots of Φ(L) are outside the unit circle
– An ARMA(p,q) process is invertible if all roots of Θ(L) are outside the unit circle

Different forms of an ARMA(1,1) model
Example

• An ARMA(1,1) model is given:

y_t = 0.8 y_{t-1} + ε_t - 0.5 ε_{t-1}

• It can also be written in lag-operator polynomial form:

y_t - 0.8 y_{t-1} = ε_t - 0.5 ε_{t-1}
(1 - 0.8 L) y_t = (1 - 0.5 L) ε_t
Φ(L) y_t = Θ(L) ε_t
Θ(L) = 1 - 0.5 L
Φ(L) = 1 - 0.8 L

Properties of ARMA process

• Autocorrelation function
– The ACF of an ARMA(p,q) process decays exponentially to 0 as h → +∞, starting from lag q+1

• Partial autocorrelation function


– No special properties

• Order selection of an ARMA(p,q) process


– There are no such simple rules for selecting the p and q orders from
the ACF and PACF plots

ARMA(1,1) process example:
y_t = 0.8 y_{t-1} + ε_t - 0.5 ε_{t-1}

No simple rules for selecting the p and


q orders from the ACF and PACF plots
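A small sketch reproducing this behavior (Econometrics Toolbox assumed; seed and sample size illustrative):

% Simulate the ARMA(1,1) process y_t = 0.8 y_{t-1} + e_t - 0.5 e_{t-1}
Mdl = arima('AR',{0.8},'MA',{-0.5},'Constant',0,'Variance',1);
rng(1); y = simulate(Mdl,500);
autocorr(y), figure, parcorr(y)   % both tail off gradually: no clean cut-off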

AR(p), MA(q) and ARMA(p,q) processes
Summary of ACF and PACF properties

– An AR(p) process has PACF α_y(h) = 0 for h > p and α_y(p) = φ_p

– An MA(q) process has ACF ρ_y(h) = 0 for h > q

– For ARMA(p,q) processes, there are no such simple rules for selecting the orders from the ACF or PACF

Families of ARIMA models

Time series
– Stationary: AR models, MA models, ARMA models
– Non-stationary:
  – Non-seasonal: ARIMA models
  – Seasonal: multiplicative SARIMA models

Identifying stationary/non-stationary time series

• Stationary time series:


– is roughly horizontal
– has constant variance
– has no trend or seasonality
– has no patterns predictable in the long term
– its ACF drops to zero relatively quickly

• Non-stationary time series


– has trend and seasonality
– its ACF decreases slowly
– the ACF value at lag 1 is often large and
positive

Non-stationary time series:
standard decomposition model
• Recall the standard decomposition model of a non-stationary process y_t:

y_t = T_t + S_t + x_t

• T_t is a trend-cycle component
• S_t is a seasonality component
• x_t is a stationary random component

• Since the Box-Jenkins methodology is for stationary models only, it is first required to detrend and deseasonalize the non-stationary series by using one of the two methods below:
– Estimate (by linear regression) and then remove a deterministic trend and seasonality
– Difference the time series

Note that Box and Jenkins seemed to prefer the differencing method, while several others prefer the deterministic trend removal method

Operator of lag-T

• Let Δ_T be the operator of lag-T:

Δ_T y_t = (1 - L^T)^1 y_t = y_t - y_{t-T}

A lag-T differencing of order 1 is applied to the time series

• Applying Δ_T d times in succession to a time series:

Δ_T^d y_t = (1 - L^T)^d y_t

A lag-T differencing of order d is applied to the time series

Lag-1 differencing to remove polynomial trend
and achieve stationarity
• Let y_t be a time series with a polynomial trend of order k:

y_t = Σ_{j=0}^{k} β_j t^j + x_t

• Applying the lag-1 operator Δ_1 to the time series:

Δ_1 y_t = y_t - y_{t-1}

– The lag-1 differenced time series then has a polynomial trend of order k-1
– A lag-1 difference reduces the degree of a polynomial trend by 1

⇒ Applying successive lag-1 differencing removes the trend

How to choose the order d of lag-1 differencing?

• To remove deterministic trends
– Apply lag-1 differencing of order d to the time series:

Δ_1^d y_t = (1 - L)^d y_t

– When d = 1, we have the simple first difference of the time series:
  Δ_1 y_t = (1 - L) y_t = y_t - y_{t-1}
– When d = 2, we have the double difference of the time series:
  Δ_1^2 y_t = (1 - L)^2 y_t = y_t - 2 y_{t-1} + y_{t-2}

• How to choose the order d of lag-1 differencing?
– In practice d = 0, 1 or 2
  • d = 0: no differencing (no trend)
  • d = 1: difference once (to remove a linear trend)
  • d = 2: double-difference (to remove a quadratic trend)

Lag-s differencing to remove seasonality trend
and achieve stationarity
• Let y_t be a time series with a trend T_t and a seasonal pattern S_t of period s (S_{t+s} = S_t):

y_t = T_t + S_t + x_t

• Applying the operator Δ_s to the time series:

Δ_s y_t = y_t - y_{t-s} = (T_t - T_{t-s}) + (S_t - S_{t-s}) + (x_t - x_{t-s})
Δ_s y_t = (T_t - T_{t-s}) + (x_t - x_{t-s})

– The lag-s differenced time series then no longer presents any seasonal pattern

⇒ Applying lag-s differencing removes a seasonal pattern of period s

How to choose the order D of lag-s differencing?

• To remove deterministic seasonality
– Apply lag-s differencing of order D to the time series:

Δ_s^D y_t = (1 - L^s)^D y_t

• How to choose the order D of lag-s differencing?
– In practice D = 0 or 1
  • D = 0: no differencing (no seasonality)
  • D = 1: difference once (to remove the seasonality)

• Example
– If s = 12, we have the lag-12 differencing of the series (for monthly time series data with an annual seasonality, for example):

Δ_12 y_t = (1 - L^12) y_t = y_t - y_{t-12}
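In Matlab, both differencing operations reduce to a few lines; a sketch, assuming y is a column vector of monthly observations:

dy   = diff(y);                    % (1 - L) y_t     : removes a linear trend
d12y = y(13:end) - y(1:end-12);    % (1 - L^12) y_t  : removes an annual seasonal pattern
z    = diff(d12y);                 % (1 - L)(1 - L^12) y_t : both combined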

Differencing in practice

• Advantages:
– easy to understand
– allows forecasting, since we can forecast Δ_s y_t and then go back to y_t

• In practice:
– Start by removing the seasonal pattern by applying Δ_s
– Plot the deseasonalized time series and check whether it seems stationary
– If it does not visually seem stationary, then apply Δ_1
– Plot the deseasonalized and differenced time series and check whether it now seems stationary
– If not, apply Δ_1 again, but try to keep the number of differencing operations small

Beware of over-differencing

Converting nonstationary to stationary time series
by differencing - Example

The data look nonstationary, with a linear trend and seasonal periodicity. The ACF does not die out quickly and shows a cyclical pattern of period 12. This also points to nonstationarity in the time series.

Converting nonstationary to stationary time series
by differencing - Example
A seasonal difference of length 12 has been applied. The linear trend has been removed by first-differencing the data.

The differenced series now appears much more stationary. Although the sample ACF and PACF of the differenced series still show significant autocorrelation at certain lags, they seem to correspond to a stationary process. The remaining autocorrelation could be captured by an ARMA model.

ARIMA models for non-seasonal time series data
• The general non-seasonal model is known as ARIMA(p,d,q):

Φ(L) (1 - L)^d (y_t - c) = Θ(L) ε_t

AR part of order p · lag-1 differencing of order d · MA part of order q

• y_t is an ARIMA(p,d,q) model if z_t = (1 - L)^d (y_t - c) is an ARMA(p,q) model:

z_t = Σ_{i=1}^{p} φ_i z_{t-i} + Σ_{i=1}^{q} θ_i ε_{t-i} + ε_t

Understanding ARIMA(p,d,q) model orders
Example

Consider the following ARIMA(2,1,1) model:

(1 - φ_1 L - φ_2 L^2) (1 - L) (y_t - c) = (1 + θ_1 L) ε_t

AR part of order p = 2 · lag-1 differencing of order d = 1 · MA part of order q = 1

The model includes all consecutive AR and MA lags from 1 through their respective orders p and q
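In the Matlab Econometrics Toolbox, this structure can be specified directly; a minimal sketch (the unset coefficients are left as NaN, to be estimated later):

Mdl = arima(2,1,1)                             % p = 2 AR lags, d = 1, q = 1 MA lag
% equivalently: Mdl = arima('ARLags',1:2,'D',1,'MALags',1)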

Understanding ARIMA models
• The general non-seasonal model is known as ARIMA(p,d,q):

Φ(L) (1 - L)^d (y_t - c) = Θ(L) ε_t

• The intercept c of the model and the differencing order d have an important effect on the long-term forecasts:
– c=0 and d=0 ⇒ long-term forecasts will go to 0
– c=0 and d=1 ⇒ long-term forecasts will go to constant ≠ 0
– c=0 and d=2 ⇒ long-term forecasts will follow a straight line
– c ≠ 0 and d=0 ⇒ long-term forecasts will go to the mean of the data
– c ≠ 0 and d=1 ⇒ long-term forecasts will follow a straight line
– c ≠ 0 and d=2 ⇒ long-term forecasts will follow a quadratic trend

Special ARIMA models

- ARIMA(0,1,0) = random walk

- ARIMA(0,1,1) without constant = simple exponential smoothing

- ARIMA(0,2,1) without constant = linear exponential smoothing

- ARIMA(1,1,2) with constant = damped-trend linear exponential smoothing

How to choose ARIMA orders (p, d, q) in practice?

• Two situations can occur, depending on your goal:


– obtain an understanding of the model
– obtain a very good forecast

• General tips
– Start by differencing the series if needed, in order to obtain
something visually stationary
– Look at the ACF and PACF plots and identify possible model orders
– Estimate several models and select the best one by using model
selection criteria such as AIC or BIC

SARIMA models for seasonal time series data
• The multiplicative seasonal model is known as SARIMA(p,d,q)×(P,D,Q)_s:

(1 - φ_1 L - ⋯ - φ_p L^p)(1 - Φ_1 L^s - ⋯ - Φ_P L^{sP}) (1 - L)^d (1 - L^s)^D (y_t - c) =
(1 + θ_1 L + ⋯ + θ_q L^q)(1 + Θ_1 L^s + ⋯ + Θ_Q L^{sQ}) ε_t

– p is the number of non-seasonal AR terms
– d is the order of non-seasonal (lag-1) differencing
– q is the number of non-seasonal MA terms
– s is the number of time periods per season
– P is the number of seasonal AR (SAR) terms
– D is the order of seasonal (lag-s) differencing
– Q is the number of seasonal MA (SMA) terms

Understanding SARIMA(p,d,q) ×(P,D,Q)s model orders
Example

Consider the following SARIMA(2,1,1)×(2,1,1)_12 model:

(1 - φ_1 L - φ_2 L^2)(1 - Φ_12 L^12 - Φ_24 L^24)(1 - L)(1 - L^12)(y_t - c) = (1 + θ_1 L)(1 + Θ_12 L^12) ε_t

AR part of order p = 2 · lag-1 differencing of order d = 1 · lag-12 differencing of order D = 1 · SAR part of order P = 2 · MA part of order q = 1 · SMA part of order Q = 1

– The period of the season is s = 12
– The model includes all consecutive AR and MA lags from 1 through their respective orders p and q
– The lags of the SAR and SMA polynomials are consecutive multiples of the period s = 12, from 12 up to 12·P and 12·Q respectively

How to choose SARIMA orders (P,D,Q)s
in practice
• The seasonal part of an AR or MA model can be seen in the seasonal
lags of the ACF and PACF

• Examples
– an SARIMA(0,0,0)(1,0,0)12 will show
§ a spike at lag 12 in the PACF, and no other significant spikes
§ an exponential decay in the seasonal lags of the ACF

– an SARIMA(0,0,0)(0,0,1)12 will show


§ a spike at lag 12 in the ACF, and no other significant spikes
§ an exponential decay in the seasonal lags of the PACF

How to choose all the orders (p,d,q)×(P,D,Q)
of a SARIMA model in practice
• Use visual inspection to observe the trend and seasonality
– Look at the ACF and PACF plots and identify possible orders

• General tips
– Use differencing to remove the trend and seasonality
– Keep the orders simple
• According to Box and Jenkins
– "the maximum value of each SARIMA model order p, d, q, P, D, Q is 2"
• According to Robert Nau, Duke University
– "In most cases, either p or q is zero, and p + q ≤ 3"
• d = 0, 1 or 2, with p + q ≤ 3
• D = 0 or 1, P = 0 or 1, Q = 0 or 1
• Standard value for the period: s = 12 for an annual seasonality with monthly time series data

The Box-Jenkins methodology

• The Box-Jenkins methodology refers to a set of stages for identifying, fitting, and checking ARIMA models for time series data

• The basis of the Box-Jenkins approach to modelling time series consists of three main stages:
1. Identification
2. Estimation
3. Diagnostics

• Forecasts follow directly from the form of the fitted model

The Box-Jenkins methodology:
Estimation and model selection
• Once the orders (p, d, q) are selected, Maximum Likelihood Estimation
(MLE) through optimization algorithms can be used to estimate the
model parameters

• MLE cannot be used to choose the orders (p, d, q)


– the larger (p, d, q) ⇒ the larger the number of parameters ⇒ the more
flexible the model ⇒ the larger the likelihood

– MLE should be penalized by the complexity of the model (≃ the number of parameters to be estimated)

• Some model selection criteria can be used. The idea is to test a range
of possible model candidates and to compute the criteria for each
model structure tested

The Box-Jenkins methodology:
Model selection
• Let L(θ) denote the value of the maximized likelihood objective function for a model with a total of n_p parameters fitted to N data points

• Information criteria are likelihood-based measures of model fit that include a penalty for complexity, specifically for the number of parameters n_p

• Different information criteria are distinguished by the form of the penalty and can favor different models
– Akaike information criterion (AIC): -2 log L(θ) + 2 n_p
– Bayesian information criterion (BIC): -2 log L(θ) + n_p log(N)

• When you compare values for a set of model candidates, smaller values of
the criterion (AIC or BIC) indicate a better, more parsimonious model
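A sketch of such a model-order search in Matlab (Econometrics Toolbox assumed; y is a stationary series, and the p + q ≤ 3 bound follows the rule of thumb quoted on the earlier slide):

% Grid search over ARMA(p,q) candidates, ranked by AIC
bestAIC = Inf;
for p = 0:3
    for q = 0:3-p                              % keep p + q <= 3
        [~,~,logL] = estimate(arima(p,0,q),y,'Display','off');
        np  = p + q + 2;                       % AR + MA coefficients + constant + variance
        aic = -2*logL + 2*np;
        if aic < bestAIC, bestAIC = aic; best = [p q]; end
    end
end
best                                           % orders of the lowest-AIC model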

Occam's Razor Rule
a metaphor for "shaving off", as with a razor, the superfluous assumptions of a theory

The Box-Jenkins methodology:
Forecasting

• It is impossible to forecast without error

• The good engineer should


– forecast what can be forecast
AND
– provide uncertainty intervals

Confidence intervals
• Assuming that the residuals are normally distributed, we can usefully assess the accuracy of a forecast by using the MSE as an estimate of the error variance:

MSE = (1/N) Σ_{t=1}^{N} (y_t - ŷ_t)^2

• An approximate prediction interval for the next observation is

ŷ_{t+1} ± z √MSE

where z is a quantile of the normal distribution (e.g., z = 1.96 for a 95% interval, z = 2.58 for a 99% interval)

• This enables, for example, 95% or 99% confidence intervals to be set up for any forecast
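A sketch in Matlab, assuming EstMdl is a model already fitted with estimate and y holds its estimation data:

[yF,yMSE] = forecast(EstMdl,60,'Y0',y);   % 60-step-ahead forecasts and forecast MSE
lower = yF - 1.96*sqrt(yMSE);             % approximate 95% prediction interval
upper = yF + 1.96*sqrt(yMSE);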

The Box-Jenkins methodology in details
I. Identification
1. Data preparation
a. Transform data to stabilize variance (apply logarithm, etc)
b. Differencing data to obtain stationary series
2. Model selection
a. Examine data to identify potential models
b. Examine ACF, PACF
c. Use automatic search methods
II. Estimation
1. Estimate parameters in potential models
2. Select best model using suitable information criteria (AIC, BIC,…)
III. Diagnostics
1. Check ACF/PACF of residuals
2. Are the residuals white noise?
3. Do more statistical tests of residuals

Flowchart of the Box-Jenkins approach: home.ubalt.edu/ntsbarsh/stat-data/BJApproach.gif

Two case studies
Let us apply the Box-Jenkins methodology to

- ARIMA model estimation and forecast - Australian Consumer Price Index

- SARIMA model estimation and forecast - International airline passenger data

For each case study


1. Plot the time series and check whether the variance needs to be stabilized
2. Check whether it is stationary. Does it show trends and seasonality?
3. Apply the differencing method to remove possible trend and seasonal pattern
4. Specify the period of the seasonal pattern (if any) and the degree of the polynomial trend
5. Check whether the differenced series seems stationary. Does it look like white noise?
6. If not, determine the best ARMA model structure for the time series and estimate the
full model form
7. Use your best model to forecast the time series over the next 5 years

First case study – Australian Consumer Price Index (CPI)
Step 1 – Identification

– The data available are the logarithm of the Australian CPI. The variance of the log CPI remains constant over time, so there is no need for further transformation

– From the time plot, the time series is nonstationary, with a clear upward trend, also noticeable from
the slow decrease of the ACF
– We need to remove the linear trend by first differencing the data

Step I.2 – Model selection
Differencing data to obtain stationary data. Observe its ACF and PACF

The differenced series now appears more stationary, although not zero-mean.

The sample ACF of the differenced series decays more quickly. The sample PACF cuts off after lag 2. This behavior is consistent with a second-order autoregressive AR(2) model.

Step 2 – Estimation
Estimate parameters of the chosen model structure

• The following ARIMA(2,1,0) model has been selected as a potential model:

(1 - φ_1 L - φ_2 L^2)(1 - L)(y_t - c) = ε_t

Non-seasonal AR(2) part · first difference · no non-seasonal MA part

• The constant c and the two AR parameters (φ_1 and φ_2) have been estimated by Maximum Likelihood through optimization algorithms (see estimate in Matlab)
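A sketch of this estimation step in Matlab (y is assumed to hold the log CPI series):

Mdl    = arima(2,1,0);      % ARIMA(2,1,0) structure selected above
EstMdl = estimate(Mdl,y)    % ML estimates of c, phi_1, phi_2 and the noise variance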

Step 3 – Diagnostics
Analysis of the residuals

All ACF and PACF coefficients lie within the limits, indicating that the residuals are white
(more precisely, the residuals cannot be distinguished from white noise)

Step 4 – Forecasting
Use of the estimated model to forecast the next 5 years

Second case study – Airline passengers
Step 1 – Identification
Transform data to stabilize variance by applying logarithm

Step 1 – Identification
Differencing data to obtain stationary series
– Because this is monthly data, we use seasonal differences of length 12
– We also remove a linear trend by first differencing the data

Step I.2 – Model selection
Examine data, ACF and PACF

The differenced series now appears stationary.

The sample ACF and PACF of the differenced series still show significant autocorrelation at lags that are multiples of 12, and potentially significant autocorrelation at smaller lags. It has been shown that this autocorrelation is best captured by a SARIMA(0,1,1)×(0,1,1)_12 model (from AIC and BIC model selection tests, not shown here).

Step 2 – Estimation
Estimate parameters of the chosen model structure

• The following SARIMA(0,1,1)×(0,1,1)_12 model has been selected as a potential model:

(1 - L)(1 - L^12) y_t = (1 + θ_1 L)(1 + Θ_12 L^12) ε_t

• The MA (θ_1) and SMA (Θ_12) model parameters have been estimated by Maximum Likelihood through optimization algorithms (see estimate in Matlab)
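A sketch of the corresponding Matlab specification (y is assumed to hold the log airline series):

Mdl = arima('D',1,'Seasonality',12,'MALags',1,'SMALags',12,'Constant',0);
EstMdl = estimate(Mdl,y)    % ML estimates of theta_1, Theta_12 and the noise variance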

Step 3 – Diagnostics
Check residuals

All ACF and PACF coefficients lie within the limits, indicating that the residuals are white
(more precisely, the residuals cannot be distinguished from white noise)

Step 4 – Forecasting
Use of the estimated model to forecast the next 5 years

Takeaway messages
• We have just introduced the basics and core ideas behind time series analysis and forecasting

• Obviously, each problem has its own subtleties and demands special steps: proper data preparation, handling of missing values, or defining an evaluation metric that satisfies some domain conditions

• It is impossible to come up with a general approach that can handle all situations
– The Box-Jenkins method has been remarkably successful
– More complex models and methods exist, such as
§ Interrupted models to include the influence of critical effects
§ GARCH models for generalized autoregressive conditional heteroscedasticity
§ State-space models and methods
§ Recursive methods
§ Deep learning-based methods…

