Chapter 7
Explanatory Models 3: ARIMA (Box-Jenkins) Forecasting Models
Box-Jenkins or ARIMA
• Box-Jenkins and ARIMA are two names for the same class of models.
Review: Regression Trend and Multiple Regression
• A time series is a sequence of numerical observations naturally ordered in time. The
order of the data is an important part of the data.
[Diagram: explanatory variables → black box (approximated by linear regression) → observed time series]
• In the Box-Jenkins methodology, on the other hand, we do not start with any explanatory variables, but rather
with the observed time series itself; what we attempt to discern is the "correct" black box that could have
produced such a series from white noise:
[Diagram: white noise → black box (ARIMA model) → observed time series]
The name “moving average model” is not very descriptive of this type of model.
We would do better to call it a weighted average model.
Consider Table 7-2 (next slide).
The first column is white noise.
The second column, labeled "MA1," was constructed as a weighted average of the
current and previous white-noise values.
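The construction above can be sketched in a few lines. This is an illustration, not the exact Table 7-2 data: the 0.7 weight and the random seed are assumed values, chosen only to show the form of the weighted average.

```python
import random

random.seed(42)  # reproducible illustration

# Assumed MA(1) weight for illustration only (Table 7-2 uses its own values).
theta = 0.7
noise = [random.gauss(0, 1) for _ in range(100)]  # the white-noise column

# MA(1): y_t = e_t + theta * e_{t-1}; the first value has no predecessor.
ma1 = [noise[0]] + [noise[t] + theta * noise[t - 1]
                    for t in range(1, len(noise))]
```

Each MA1 value is a weighted average of the current white-noise value and the one before it, which is why "weighted average model" is the more descriptive name.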
But What Are The “Tracks in the Soil?”
Correlation!
The partial autocorrelation coefficient is the second tool we will use to help identify
the relationship between the current values and past values of the original time
series.
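The first of these two tools, the plain sample autocorrelation at lag k, is just the correlation between the series and itself shifted k periods. A minimal sketch (the repeating test series is an illustrative case, not data from the text):

```python
def autocorrelation(y, k):
    """Sample autocorrelation of series y at lag k."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

# A series that repeats every 3 periods shows a strong "track" at lag 3.
y = [1.0, 2.0, 3.0] * 10
print(round(autocorrelation(y, 3), 4))
```

The pattern of these coefficients across lags, together with the partial autocorrelations, is what identifies the model, just as tracks identify the animal.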
Figure 7.1 Examples of theoretical autocorrelation and partial autocorrelation plots for MA(1) and MA(2) models.
Figure 7.2 Correlogram for an MA(1) Model as shown in ForecastX
Autoregressive Model
Similar to the moving average model, except that the dependent variable Yt depends on
its own previous values rather than the white noise series or residuals.
Yt = A1Yt-1 + A2Yt-2 + … + ApYt-p + et

Where:
Yt = time series generated
A1, A2, …, Ap = coefficients
Yt-1, Yt-2, …, Yt-p = lagged values of the time series
et = white noise
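The AR(1) special case of the model above can be simulated directly. A1 = 0.6 and the seed are assumed values for illustration, not coefficients from the text:

```python
import random

random.seed(7)  # reproducible illustration

# AR(1): Yt = A1 * Yt-1 + et, with an assumed coefficient A1 = 0.6.
a1 = 0.6
e = [random.gauss(0, 1) for _ in range(200)]  # white noise

y = [e[0]]
for t in range(1, len(e)):
    y.append(a1 * y[t - 1] + e[t])  # depends on its own previous value
```

Unlike the MA(1) sketch, each value here is built from the previous value of the series itself, with the white noise entering only as the current shock.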
Autoregressive Model Example
Figure 7.3 Examples of theoretical autocorrelation and partial autocorrelation plots of AR(1) and AR(2) models.
Figure 7.4 Correlogram for an AR(1) Model as shown in ForecastX
Mixed Autoregressive and Moving-Average Models
Figure 7.5 Examples of theoretical autocorrelation and partial autocorrelation plots of ARMA(1, 1) models.
Many real-world processes, once they have been adjusted for seasonality, can be adequately modeled with
low-order ARMA models.
Stationarity
ARIMA models incorporate elements from both the autoregressive and moving-average models
All data used in ARIMA analysis are assumed to be "stationary"
A stationary time series is one in which two consecutive values in the series depend only on the
time interval between them and not on time itself
If data is not stationary, it should be adjusted to correct for the nonstationarity
Differencing is usually used to make this correction
The resulting model is said to be an "integrated" (differenced) model
This is the source of the "I" in an ARIMA model
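A toy illustration of that "I" step, with an assumed deterministic linear trend (slope 2): first differencing turns the trending series into a constant, i.e. a stationary one.

```python
# A series with a linear trend is not stationary; its mean changes over time.
series = [2 * t + 5 for t in range(10)]

# First differencing: subtract each observation from the next one.
diffed = [series[t] - series[t - 1] for t in range(1, len(series))]
print(diffed)  # every first difference equals the slope, 2
```

Real series also carry noise, so their differences are not perfectly constant, but the same operation removes the trend component in the mean.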
Figure 7.6 Autocorrelation and partial autocorrelation plots for the ARIMA(1, 1, 1) series in Table 7-2
Same Series, After First Differencing
Figure 7.7 The Box-Jenkins methodology.
The Diagnostic Step (Visually first)
The Diagnostic Step a Second Way
The Ljung-Box statistic
Second Diagnostic Method (Use the Ljung-Box statistic)
The second test for correctness of the model (but again, not a definitive test) is
the Ljung-Box Q statistic (a refinement of the earlier Box-Pierce statistic).

Q = n(n + 2) * sum(k = 1 to m) [ rk^2 / (n - k) ]

which is compared against a Chi-Square distribution with m - p - q degrees of freedom.

Where:
n = the number of observations in the time series
k = the particular time lag to be checked
m = the number of time lags to be tested
rk = sample autocorrelation of the residuals at lag k
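The formula can be computed directly. In this sketch the list of residual autocorrelations and the sample size n = 60 are invented for illustration:

```python
def ljung_box_q(r, n):
    """Ljung-Box Q = n(n + 2) * sum over k = 1..m of r_k^2 / (n - k)."""
    return n * (n + 2) * sum(rk ** 2 / (n - k)
                             for k, rk in enumerate(r, start=1))

# Hypothetical residual autocorrelations for the first m = 12 lags of a
# series with n = 60 observations (illustrative values, not from the text).
r = [0.05, -0.03, 0.10, 0.02, -0.07, 0.01,
     0.04, -0.02, 0.06, 0.03, -0.01, 0.05]
q = ljung_box_q(r, n=60)
```

The resulting Q is then compared against a Chi-Square critical value with m - p - q degrees of freedom, as described below.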
The Ljung-Box, or Q, statistic tests whether the residual autocorrelations as a set are
significantly different from zero. If the residual autocorrelations ARE significantly
different from zero, the model specification should be reformulated (i.e., the model has
failed the test).
If the calculated Ljung-Box statistic is less than the table value, the autocorrelations are
not significantly different from zero (that’s good!).
Note:
ForecastX is set to check automatically for a lag length of 12 if a nonseasonal model has
been selected; if a seasonal model has been selected the lag length is equal to four times
the seasonal length.
ARIMA (p d q)
(AR, I, MA)
p is the number of AR terms,
d is the number of differences, and
q is the number of MA terms.
Table 7-2: force an MA1 model on the MA1 column data.
Use a Chi-Square table to check the Ljung-Box statistic.
For example:
If the calculated Ljung-Box reported by ForecastX is 7.33 for the first 12
autocorrelations (as in a nonseasonal model), the resulting degrees of freedom are 11.
(m-p-q degrees of freedom)
Check the textbook Chi-Square table for 11 degrees of freedom at the 0.10 column to
find 17.275. This is the critical value.
In this case, the model passes the Q-test because 7.33 is less than 17.275.
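The comparison in this example can be written out directly, using the 7.33 and 17.275 values quoted above:

```python
# Worked version of the example: calculated Q versus the critical
# Chi-Square value for m - p - q = 11 degrees of freedom at the 0.10 level.
q_calculated = 7.33
chi2_critical_11df_010 = 17.275  # from the textbook Chi-Square table

passes_q_test = q_calculated < chi2_critical_11df_010
print("passes the Q-test" if passes_q_test else "fails the Q-test")
```

Because 7.33 < 17.275, the residual autocorrelations are not significantly different from zero and the model passes.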
Note that it is standard practice with the Ljung-Box statistic to check "four times the number of seasons" in terms of lags
examined. For instance, if the data are quarterly, check (4 × 4) = 16 lags at a minimum.
FORECASTING SEASONAL TIME SERIES
• In many actual business situations the time series to be forecast are quite seasonal.
• This seasonality can cause some problems in the ARIMA process, since a model fitted to such a series would
likely have a very high order. If monthly data were used and the seasonality occurred in every 12th month,
the order of the model might be 12 or more.
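The usual way around that high order is seasonal (lag-12) differencing: subtracting the value from 12 months earlier cancels a repeating monthly pattern. A sketch with an invented monthly pattern plus trend:

```python
# Illustrative monthly seasonal pattern (one year) plus a linear trend.
seasonal = [10, 12, 15, 11, 9, 8, 10, 13, 16, 14, 11, 10]
series = [seasonal[t % 12] + t * 0.5 for t in range(36)]

# Seasonal differencing at lag 12: subtract the value 12 months earlier.
sdiff = [series[t] - series[t - 12] for t in range(12, len(series))]
print(sdiff[:3])  # each lag-12 difference equals 12 * 0.5 = 6.0
```

The seasonal component cancels exactly, leaving only the (constant) trend increment, which is why seasonal ARIMA models difference at the seasonal lag rather than fitting a 12th-order model.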
Seasonal ARIMA Model
ARIMA (p d q) (P D Q)
(AR, I, MA) (SAR, SI, SMA)
Autocorrelation and partial autocorrelation plots for the total houses sold series.
Figure 7.18 contains the diagnostic statistics for the estimation for an ARIMA (0, 1, 0) (2, 0, 2) model. The second set of P, D,
Q values (usually shown as uppercase letters) represents two seasonal AR and two seasonal MA terms. The Ljung-Box
statistic for the first 48 lags is 45.43 and confirms the acceptability of the model.
Estimation and order selection
In general, a stationary time series will have no predictable patterns in the long term. Time plots will show the series to
be roughly horizontal (although some cyclic behaviour is possible), with constant variance.
DIFFERENCING
In the previous figure, the Google stock price was non-stationary in panel (a), but the daily changes were stationary in
panel (b). This shows one way to make a non-stationary time series stationary: compute the differences between
consecutive observations. This is known as differencing.
Transformations such as logarithms can help to stabilise the variance of a time series. Differencing can help
stabilise the mean of a time series by removing changes in the level of a time series, and therefore eliminating
(or reducing) trend and seasonality.
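The two adjustments in this paragraph can be combined in a short sketch. The 5% multiplicative growth series is an invented example: taking logs stabilises the variance, and first differences of the logs then remove the trend.

```python
import math

# A series with multiplicative (5% per period) growth: variance and level
# both grow over time, so neither is stationary.
series = [100 * 1.05 ** t for t in range(12)]

logged = [math.log(v) for v in series]               # stabilise the variance
log_diff = [logged[t] - logged[t - 1]                # stabilise the mean
            for t in range(1, len(logged))]
print(round(log_diff[0], 4))
```

For this exact-growth example every log difference equals log(1.05), a constant; with real data the log differences fluctuate around a stable level instead, which is the stationarity the ARIMA machinery assumes.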