TSAF - Box-Jenkins Method
Hugues GARNIER
1 H. Garnier
Course outline
Classification of time series forecasting methods
• The ARIMA methodology of forecasting differs from most methods because it does not
assume any particular pattern in the historical data of the time series to be forecast
The Box-Jenkins method for ARIMA models
• The chosen model is then checked against the historical data to see if it
accurately describes the series
ARIMA models
• AutoRegressive Integrated Moving Average (ARIMA) models were
popularized by George Box and Gwilym Jenkins in the early 1970s
Family of ARIMA models
• ARIMA models are a class of black-box models that are capable of
representing stationary as well as non-stationary time series
Time series classification:
– Stationary: AR models, MA models, ARMA models
– Non-stationary, non-seasonal: ARIMA models
– Non-stationary, seasonal: SARIMA models
Major assumption: stationarity of the time series
• The properties of one section of the data are much like the properties of the other
sections. The future is “similar” to the past (in a probabilistic sense)
• A stationary time series has
- no trend / no seasonality
- no periodic fluctuations
Key statistics for time series analysis:
Autocorrelation and partial autocorrelation functions
Autocorrelation function (ACF)
ACF: stationarity case
Autocorrelation function (ACF)
Autocorrelation function (ACF)
Sample statistics
• Given observations y_1, …, y_N of a stationary time series y_t,
estimate the sample mean, variance, autocovariance and ACF
– Sample mean
μ̂_y = ȳ = (1/N) Σ_{t=1}^{N} y_t
– Sample variance
σ̂_y^2 = (1/(N−1)) Σ_{t=1}^{N} (y_t − μ̂_y)^2
– Sample autocovariance function
γ̂_y(h) = (1/N) Σ_{t=1}^{N−h} (y_{t+h} − ȳ)(y_t − ȳ), 0 ≤ h < N
with γ̂_y(h) = γ̂_y(−h), −N < h ≤ 0
– Sample autocorrelation function (ACF)
ρ̂_y(h) = γ̂_y(h) / γ̂_y(0), |h| < N
Sample ACF - Example
y = [0 1 1 1 0], N = 5
γ̂_y(0) = (1/5) Σ_{t=1}^{5} (y_t − ȳ)^2 = 0.24

In Matlab:
y = [0 1 1 1 0];
[rho_hat_y,Lag] = xcov(y,'norm');
stem(Lag,rho_hat_y)
Or: autocorr(y) (Matlab Econometrics toolbox)
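For readers without Matlab, the sample ACF formulas above can be reproduced with a short self-contained Python sketch (pure standard library; the helper name `sample_acf` is illustrative, not from the course):

```python
def sample_acf(y, max_lag=None):
    """Sample ACF: rho_hat(h) = gamma_hat(h) / gamma_hat(0), where
    gamma_hat(h) = (1/N) * sum_{t=1}^{N-h} (y[t+h] - ybar) * (y[t] - ybar)."""
    N = len(y)
    if max_lag is None:
        max_lag = N - 1
    ybar = sum(y) / N
    d = [v - ybar for v in y]  # centred series
    gamma = [sum(d[t + h] * d[t] for t in range(N - h)) / N
             for h in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]  # assumes gamma_hat(0) != 0

rho = sample_acf([0, 1, 1, 1, 0])
print(rho[0])  # 1.0 by construction (gamma_hat(0) = 0.24 for this series)
```

This matches the normalization that `xcov(y,'norm')` performs, i.e. dividing every autocovariance by γ̂_y(0).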
Partial autocorrelation function (PACF)
• The autocorrelation between an observation y_t and an observation at a prior
time instant y_{t−k} comprises both the direct correlation and the indirect
correlations through the intermediate observations y_{t−1}, y_{t−2}, …, y_{t−k+1}
• The partial autocorrelation at lag k measures only the direct correlation between
y_t and y_{t−k}, once the effects of the intermediate observations have been removed
Plots of the ACF and PACF for a time series
tell a very different story - Example
The white noise process
The most fundamental example of a stationary process
Sampling distribution of sample ACF
• The sampling distribution of the sample ACF of a white noise is asymptotically
Gaussian 𝒩(0, 1/N)
– 95% of the ACF coefficients of a white noise must lie within ±1.96/√N
– It is common to plot horizontal limit lines at ±1.96/√N when plotting the ACF
• If N = 125, the critical values are ±1.96/√125 = ±0.175
– All ACF coefficients lie within these limits, confirming that the data are
white noise (more precisely, the data cannot be distinguished from white noise)
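The ±1.96/√N limits are easy to compute directly; a minimal sketch (the function name is illustrative):

```python
import math

def acf_limits(N, z=1.96):
    """Approximate 95% limits for the sample ACF of a white noise: +/- z/sqrt(N)."""
    return z / math.sqrt(N)

print(round(acf_limits(125), 3))  # 0.175, matching the N = 125 example above
```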
Properties of white noise process
• Best forecast of a white noise
– If a time series is white noise, it is unpredictable and there is nothing to
forecast. More precisely, the best forecast is its mean value, which is zero
– If the residual ACF does not resemble the ACF of a white noise, it suggests that
improvements could be made to the predictive model
– If the residual ACF resembles the ACF of a white noise, the modelling procedure is
finished. There is nothing else to capture in the residuals and the estimated ARIMA model
can be used for forecasting
Models for stationary random signals or time series
y_t = (Θ(L) / Φ(L)) ε_t
General linear parametric model
of stationary time series
• Box and Jenkins in 1970 (following Yule and Slutsky 1927)
– Many time series (or their derivatives) can be considered as a special class of
stochastic processes: (weakly) stationary stochastic processes
• First two moments are finite and constant over time
• Defined completely by the mean, variance and autocorrelation function
– ε_t is often called the innovation process because it captures all the new
information in the series at time t
Lag or backward-shift operator L
• The lag or backward-shift operator, L, is defined as
L ε_t = ε_{t−1}
L^k ε_t = ε_{t−k}
• The general linear model then writes
y_t = c + Σ_{j=1}^{∞} ψ_j ε_{t−j} + ε_t
y_t = c + Ψ(L) ε_t, with Ψ(L) = 1 + Σ_{j=1}^{∞} ψ_j L^j
Towards AR, MA and ARMA models
for stationary time series
• If Ψ 𝐿 is a rational polynomial, we can write it (at least approximately) as the
quotient of two finite-degree polynomials
Ψ(L) = Θ(L) / Φ(L)
• This leads to the use of parsimonious models: AR, MA and ARMA models
– They are most useful for practical applications since these models can be quite easily
estimated from a finite amount of data in the time series
Family of ARMA models for stationary time series
• ARMA models: a way to “see” stationary time series as filtered white noise
– The filter takes different forms according to the time series properties
[Block diagrams: white noise ε_t passed through a linear filter, plus a constant c, to produce y_t]
AR: y_t = c + (1/Φ(L)) ε_t
MA: y_t = c + Θ(L) ε_t
ARMA: y_t = c + (Θ(L)/Φ(L)) ε_t
AutoRegressive models: AR(p) models
• An autoregressive model of order p, AR(p), is defined by
y_t = c + Σ_{i=1}^{p} φ_i y_{t−i} + ε_t
– where p ≥ 1, c is a constant and ε_t ∼ 𝒩(0, σ^2)
• It can also be written in lag-operator polynomial form:
Φ(L)(y_t − c) = ε_t
Φ(L) = 1 − φ_1 L − … − φ_p L^p (Matlab Econometrics toolbox notation)
• Stationarity conditions
– An AR(p) process is stationary if all roots of Φ(L) lie outside the unit circle
• Special case
– If one or more roots lie on the unit circle (i.e., have absolute value one), the
model is called a unit root process, which is non-stationary
• When p = 1, c = 0 and φ_1 = 1, y_t = y_{t−1} + ε_t is a non-stationary random walk,
which is a unit root process
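The root condition can be checked numerically. A sketch for the AR(2) case (the helper name is illustrative; for a general AR(p), a polynomial root finder such as numpy.roots would do the same job):

```python
import cmath

def ar2_is_stationary(phi1, phi2):
    """AR(2) model y_t = c + phi1*y_{t-1} + phi2*y_{t-2} + e_t is stationary
    iff both roots of Phi(L) = 1 - phi1*L - phi2*L^2 lie outside the unit circle."""
    # Solve -phi2*L^2 - phi1*L + 1 = 0 (assumes phi2 != 0)
    a, b, c = -phi2, -phi1, 1.0
    disc = cmath.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    return all(abs(r) > 1 for r in roots)

print(ar2_is_stationary(-0.5, -0.9))  # True: both roots have modulus ~ 1.054 > 1
print(ar2_is_stationary(0.5, 0.5))    # False: L = 1 is a unit root
```

The first call checks the AR(2) example used later in the slides, y_t = −0.5 y_{t−1} − 0.9 y_{t−2} + ε_t; the second is a unit root process since φ_1 + φ_2 = 1.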
Different forms of an AR(2) model
Example
• An autoregressive model of order 2, AR(2), is given by
y_t = −0.5 y_{t−1} − 0.9 y_{t−2} + ε_t
Properties of AR(p) process
• Autocorrelation function
– The ACF of a stationary AR(p) process decays towards zero: lim_{h→+∞} γ_y(h) = 0
AR(1) process example:
y_t = 0.8 y_{t−1} + ε_t
AR(1) process example:
y_t = −0.8 y_{t−1} + ε_t
AR(2) process example:
y_t = −0.9 y_{t−2} + ε_t
AR(2) process example:
y_t = −0.5 y_{t−1} − 0.9 y_{t−2} + ε_t
Moving Average models: MA(q) models
• A moving average model of order q, MA(q), is defined by (Slutsky 1927)
y_t = c + Σ_{i=1}^{q} θ_i ε_{t−i} + ε_t
– where q ≥ 1, c is a constant and ε_t ∼ 𝒩(0, σ^2)
Different forms of an MA(2) model
Example
• A moving average model of order 2, MA(2), is given by
y_t = ε_t − 0.8 ε_{t−1} + 0.5 ε_{t−2}
Properties of MA(q) process
• Autocorrelation function
– The ACF of an MA(q) process cuts off after lag q: γ_y(h) = 0 for h > q
MA(1) process example:
y_t = ε_t − 0.8 ε_{t−1}
ACF cuts off after lag 1 ⇒ MA(1) process
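The MA(1) cutoff can be verified against the closed-form ACF ρ(1) = θ_1/(1 + θ_1^2), ρ(h) = 0 for h > 1 (a standard result, not spelled out on the slide):

```python
def ma1_acf(theta1, h):
    """Theoretical ACF of the MA(1) process y_t = e_t + theta1 * e_{t-1}."""
    if h == 0:
        return 1.0
    if abs(h) == 1:
        return theta1 / (1.0 + theta1 ** 2)
    return 0.0  # cuts off after lag 1

print(round(ma1_acf(-0.8, 1), 3))  # -0.488 for y_t = e_t - 0.8 e_{t-1}
print(ma1_acf(-0.8, 2))            # 0.0
```

Note that ρ(1) has the same sign as θ_1, which is why the two MA(1) examples on these slides have mirror-image ACFs at lag 1.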
MA(1) process example:
y_t = ε_t + 0.8 ε_{t−1}
ACF cuts off after lag 1 ⇒ MA(1) process
MA(2) process example:
y_t = ε_t − 0.5 ε_{t−1} + 0.4 ε_{t−2}
AutoRegressive Moving Average models:
ARMA(p,q) models
• An ARMA(p,q) model of orders p and q is defined by
Φ(L) y_t = θ_0 + Θ(L) ε_t
or Φ(L)(y_t − c) = Θ(L) ε_t (Matlab Econometrics toolbox notation)
Θ(L) = 1 + θ_1 L + … + θ_q L^q
Φ(L) = 1 − φ_1 L − … − φ_p L^p
Different forms of an ARMA(1,1) model
Example
Properties of ARMA process
• Autocorrelation function
– The ACF of an ARMA(p,q) process decreases exponentially to 0 as h → +∞,
from lag q+1 onwards
ARMA(1,1) process example:
y_t = 0.8 y_{t−1} + ε_t − 0.5 ε_{t−1}
AR(p), MA(q) and ARMA(p,q) processes
Summary of ACF and PACF properties
– For ARMA(p,q) processes, there are no such simple rules for selecting the
orders p and q from the ACF or PACF alone
Families of ARIMA models
Time series classification:
– Stationary: AR models, MA models, ARMA models
– Non-stationary, non-seasonal: ARIMA models
– Non-stationary, seasonal: multiplicative SARIMA models
Identifying stationary/non-stationary time series
Non-stationary time series:
standard decomposition model
• Recall the standard decomposition model of a non-stationary process y_t
y_t = T_t + S_t + x_t
• Since the Box-Jenkins methodology applies to stationary series only, the nonstationary
series must first be detrended and deseasonalized, using one of the two methods below
- Estimate (by linear regression) and then remove a deterministic trend and seasonality
- Difference the time series
Lag-T differencing operator
• The lag-T differencing operator Δ_T is defined as
Δ_T y_t = (1 − L^T) y_t = y_t − y_{t−T}
Δ_T^d y_t = (1 − L^T)^d y_t
Lag-1 differencing to remove polynomial trend
and achieve stationarity
• Let y_t be a time series with a polynomial trend of order k:
y_t = Σ_{j=0}^{k} β_j t^j + x_t
• Applying the lag-1 operator Δ_1 to the time series:
Δ_1 y_t = y_t − y_{t−1}
– Then, the lag-1 differenced time series has a polynomial trend of order k−1
– Each lag-1 difference reduces the degree of a polynomial trend by 1
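The degree-reduction property can be checked on a toy series (the `diff` helper is illustrative; Matlab's diff or pandas' .diff() do the same job):

```python
def diff(y, lag=1):
    """Lag-`lag` differencing: Delta_lag y_t = y_t - y_{t-lag}."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# A linear (order-1) trend y_t = 3 + 2t becomes a constant (order-0) series
y = [3 + 2 * t for t in range(6)]
print(diff(y))        # [2, 2, 2, 2, 2]
print(diff(diff(y)))  # [0, 0, 0, 0]: a second difference removes the trend entirely
```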
How to choose the order d of lag-1 differencing?
Lag-s differencing to remove seasonality
and achieve stationarity
• Let y_t be a time series with a trend T_t and a seasonal pattern S_t of period s
(S_{t+s} = S_t):
y_t = T_t + S_t + x_t
• Applying the operator Δ_s to the time series:
Δ_s y_t = y_t − y_{t−s} = (T_t − T_{t−s}) + (S_t − S_{t−s}) + (x_t − x_{t−s})
Δ_s y_t = (T_t − T_{t−s}) + (x_t − x_{t−s})
– Then, the lag-s differenced time series no longer exhibits any seasonal
pattern
How to choose the order D of lag-s differencing?
• Example
– For monthly time series data with annual seasonality (s = 12), the lag-12
differenced series is
Δ_12 y_t = (1 − L^12) y_t = y_t − y_{t−12}
Differencing in practice
• Advantages:
– easy to understand
– allows forecasting, since we can forecast Δ_s y_t and then integrate back to y_t
• In practice:
– Start by removing the seasonal pattern by applying Δ_s
– Plot the deseasonalized time series and check whether it looks stationary
– If it does not visually look stationary, apply Δ_1
– Plot the deseasonalized and differenced time series and check whether it
now looks stationary
– If not, apply Δ_1 again, but keep the total number of differencing
operations small
• Beware of over-differencing!
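The in-practice recipe above (seasonal difference first, then lag-1 differences only if needed) can be sketched end-to-end on a toy monthly series (all names and data here are illustrative):

```python
def diff(y, lag=1):
    """Delta_lag y_t = y_t - y_{t-lag}."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Toy monthly series: linear trend 0.5*t plus a period-12 seasonal pattern
season = [5, 3, 0, -2, -4, -5, -5, -4, -2, 0, 3, 5]
y = [0.5 * t + season[t % 12] for t in range(48)]

z = diff(y, lag=12)  # step 1: seasonal differencing Delta_12
# The seasonal pattern cancels; only the constant trend step 0.5 * 12 = 6 remains,
# so the deseasonalized series is already stationary and Delta_1 is not needed
print(all(abs(v - 6.0) < 1e-9 for v in z))  # True
```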
Converting nonstationary to stationary time series
by differencing - Example
The data look nonstationary, with a linear The ACF does not die out quickly and
trend and seasonal periodicity shows a cyclical pattern of period 12.
This also points to nonstationarity in the
time series
Converting nonstationary to stationary time series
by differencing - Example
A seasonal difference of lag 12 has been applied. The linear trend has then been removed by first-differencing the data.
The differenced series now appears much more stationary. Although the sample ACF and PACF of
the differenced series still show significant autocorrelation at certain lags, they seem
to correspond to a stationary process.
The remaining autocorrelation could be captured by an ARMA model
ARIMA models for non seasonal time series data
• The general non-seasonal model is known as ARIMA(p,d,q):
Φ(L)(1 − L)^d (y_t − c) = Θ(L) ε_t
Understanding ARIMA(p,d,q) model orders
Example
The model includes all consecutive AR and MA lags from 1 through their respective
orders p and q
Understanding ARIMA models
• The general non-seasonal model is known as ARIMA(p,d,q):
Special ARIMA models
How to choose ARIMA orders (p, d, q)
in practice?
• General tips
– Start by differencing the series if needed, in order to obtain
something visually stationary
– Look at the ACF and PACF plots and identify possible model orders
– Estimate several models and select the best one by using model
selection criteria such as AIC or BIC
SARIMA models for seasonal time series data
• The multiplicative seasonal model is known as SARIMA(p,d,q)×(P,D,Q)_s:
(1 − φ_1 L − … − φ_p L^p)(1 − Φ_1 L^s − … − Φ_P L^{sP})(1 − L)^d (1 − L^s)^D (y_t − c)
= (1 + θ_1 L + … + θ_q L^q)(1 + Θ_1 L^s + … + Θ_Q L^{sQ}) ε_t
Understanding SARIMA(p,d,q) ×(P,D,Q)s model orders
Example
(1 − φ_1 L − φ_2 L^2)(1 − Φ_12 L^12 − Φ_24 L^24)(1 − L)(1 − L^12)(y_t − c) = (1 + θ_1 L)(1 + Θ_12 L^12) ε_t
– a SARIMA(2,1,1)×(2,1,1)_12 model
How to choose SARIMA orders (P,D,Q)s
in practice
• The seasonal part of an AR or MA model can be seen in the seasonal
lags of the ACF and PACF
• Examples
– a SARIMA(0,0,0)×(1,0,0)_12 will show
§ a spike at lag 12 in the PACF, and no other significant spikes
§ an exponential decay in the seasonal lags of the ACF
How to choose all the orders (p,d,q)×(P,D,Q)
of a SARIMA model in practice
• Use visual inspection to observe the trend and seasonality
– Look at the ACF and PACF plots and identify possible orders
• General tips
– Use differencing to remove the trend and seasonality
– Keep the orders simple
• According to Box-Jenkins
– “the maximum value of each SARIMA model orders p, d, q, P, D, Q is 2”
• According to Robert Nau, Duke University
– “In most cases, either p or q is zero and p+q ≤ 3”
• d = 0, 1 or 2; p + q ≤ 3
• D = 0 or 1; P = 0 or 1; Q = 0 or 1
• Standard value for the period: s = 12 for an annual seasonality with monthly time
series data
The Box-Jenkins methodology
The Box-Jenkins methodology:
Estimation and model selection
• Once the orders (p, d, q) are selected, Maximum Likelihood Estimation
(MLE) through optimization algorithms can be used to estimate the
model parameters
• Some model selection criteria can be used. The idea is to test a range
of possible model candidates and to compute the criteria for each
model structure tested
The Box-Jenkins methodology:
Model selection
• Let L(θ) denote the value of the maximized likelihood objective function for
a model with a total of np parameters fitted to N data points
• When you compare values for a set of model candidates, smaller values of
the criterion (AIC or BIC) indicate a better, more parsimonious model
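Both criteria trade goodness of fit against model complexity. With the standard definitions AIC = −2 ln L(θ̂) + 2 n_p and BIC = −2 ln L(θ̂) + n_p ln N (these closed forms are standard, though not spelled out above), they are one-liners:

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2*lnL + 2*n_params."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2*lnL + n_params*ln(n_obs)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# BIC penalizes extra parameters more heavily than AIC as soon as ln(N) > 2, i.e. N >= 8
print(aic(-100.0, 3))                 # 206.0
print(round(bic(-100.0, 3, 125), 1))  # 214.5
```

Because of the heavier penalty, BIC tends to select the more parsimonious model, in the spirit of Occam's razor.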
Occam's Razor Rule
Occam's razor designates, by metaphor, the opportunity to "cut off", as with a razor, the superfluous assumptions of a theory: among candidate models, prefer the simplest one that adequately describes the data
The Box-Jenkins methodology:
Forecasting
Confidence intervals
• Assuming that the residuals are normally distributed, we can assess the
accuracy of a forecast by using the MSE as an estimate of the forecast error
variance
ŷ_{t+h} ± z √MSE, where MSE = (1/N) Σ_{t=1}^{N} (y_t − ŷ_t)^2
– z = 1.96 for a 95% confidence interval
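A minimal sketch of the interval above, assuming Gaussian residuals (the function name and data are illustrative):

```python
import math

def forecast_interval(y_hat, residuals, z=1.96):
    """Interval y_hat +/- z*sqrt(MSE), with MSE the mean squared residual."""
    mse = sum(e * e for e in residuals) / len(residuals)
    half = z * math.sqrt(mse)
    return (y_hat - half, y_hat + half)

lo, hi = forecast_interval(10.0, [0.5, -0.5, 0.5, -0.5])
print((round(lo, 2), round(hi, 2)))  # (9.02, 10.98): MSE = 0.25, half-width 1.96*0.5
```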
The Box-Jenkins methodology in details
I. Identification
1. Data preparation
a. Transform data to stabilize variance (apply logarithm, etc)
b. Difference data to obtain a stationary series
2. Model selection
a. Examine data to identify potential models
b. Examine ACF, PACF
c. Use automatic search methods
II. Estimation
1. Estimate parameters in potential models
2. Select best model using suitable information criteria (AIC, BIC,…)
III. Diagnostics
1. Check ACF/PACF of residuals
2. Are the residuals white noise?
3. Do more statistical tests of residuals
[Flow chart of the Box-Jenkins approach: home.ubalt.edu/ntsbarsh/stat-data/BJApproach.gif]
Two case studies
Let us apply the Box-Jenkins methodology to two real time series: the Australian Consumer Price Index and the airline passengers data
First case study – Australian Consumer Price Index (CPI)
Step 1 – Identification
– The available data are the logarithm of the Australian CPI. The variance of the log CPI remains constant over
time, so there is no need for further transformation
– From the time plot, the time series is nonstationary, with a clear upward trend, also noticeable from
the slow decrease of the ACF
– We need to remove the linear trend by first-differencing the data
Step I.2 – Model selection
Difference the data to obtain a stationary series, then observe its ACF and PACF
Step 2 – Estimation
Estimate parameters of the chosen model structure
(1 − φ_1 L − φ_2 L^2)(1 − L)(y_t − c) = ε_t
• The constant c and the two AR model parameters (φ_1 and φ_2) have been estimated
by Maximum Likelihood through optimization algorithms (see estimate in Matlab)
Step 3 – Diagnostics
Analysis of the residuals
All ACF and PACF coefficients lie within the limits, indicating that the residuals are white
(more precisely, the residuals cannot be distinguished from white noise)
Step 4 – Forecasting
Use of the estimated model to forecast the next 5 years
Second case study – Airline passengers
Step 1 – Identification
Transform data to stabilize variance by applying logarithm
Step 1 – Identification
Differencing data to obtain stationary series
– Because this is monthly data, we use a seasonal difference of lag 12
– We also remove the linear trend by first-differencing the data
Step I.2 – Model selection
Examine data, ACF and PACF
Step 2 – Estimation
Estimate parameters of the chosen model structure
• The MA (θ_1) and seasonal MA (Θ_1) model parameters have been estimated by
Maximum Likelihood through optimization algorithms (see estimate in Matlab)
Step 3 – Diagnostics
Check residuals
All ACF and PACF coefficients lie within the limits, indicating that the residuals are white
(more precisely, the residuals cannot be distinguished from white noise)
Step 4 – Forecasting
Use of the estimated model to forecast the next 5 years
Takeaway messages
• We have introduced the basics and core ideas behind time series
analysis and forecasting
• Obviously, each problem has its own subtleties and demands specific
steps: proper data preparation, a way of handling missing values, or an
evaluation metric satisfying domain-specific conditions