Time Series Practice P5
1 R Tutorial
In this tutorial, we use the Australian hotel data (motel.dat) to see how to fit seasonal
ARIMA (SARIMA) models. The time series consists of the total room nights occupied at hotels,
motels, and guest houses in Victoria, Australia, from January 1980 to June 1995. Use the
following commands to read the data from the file and store the number of room nights
occupied in the variable nights. We will use only the first 100 data points (why?).
motel<-read.table("motel.dat")
nights<-motel$V1[1:100]
1. Exploratory data analysis. Plot the data and comment on the seasonal and overall
trend.
(a) Let’s take a look at the differenced data. Differencing should remove much of the
overall trend but not the seasonal pattern.
plot.ts(diff(nights))
(b) Apply a logarithmic transform to the data.
lnights<-log(nights)
(c) Next, we should look at both the differenced data and the seasonally differenced
data (after the log transform). We can use the following commands to create the
plots.
par(mfrow=c(1,2))
plot.ts(diff(lnights))
plot.ts(diff(lnights,12))
(d) We want to apply both non-seasonal and seasonal differencing, and examine the
time series plot, ACF, and PACF of the data.
plot.ts(diff(diff(lnights),12))
acf(diff(diff(lnights),12),36, xlim=c(1,36))
pacf(diff(diff(lnights),12),36)
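The same steps can be tried on a data set that ships with R, in case motel.dat is not at
hand. This is only a sketch: it assumes the built-in AirPassengers series (monthly, 144
observations) as a stand-in, and the names x, dd, a and seasonal_acf are illustrative, not
part of the tutorial.

```r
# Stand-in data: AirPassengers is built into R (monthly, 144 observations).
# as.numeric() drops the ts attributes so that ACF lags are counted in months.
x <- log(as.numeric(AirPassengers))

# Regular difference (lag 1) followed by a seasonal difference (lag 12).
dd <- diff(diff(x), lag = 12)

# Each difference shortens the series: 144 - 1 - 12 = 131 points remain.
length(dd)

# The two differencing operators commute, so the order does not matter.
all.equal(dd, diff(diff(x, lag = 12)))

# Sample ACF without plotting; element 1 is lag 0, so lag k is element k + 1.
a <- acf(dd, lag.max = 36, plot = FALSE)
seasonal_acf <- a$acf[c(12, 24, 36) + 1]
seasonal_acf
```

Reading the seasonal lags off numerically can help when the spikes in the plot are hard to
judge by eye.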
2. Model fitting. Next, we want to fit appropriate models of the form
ARIMA(p, d, q) × (P, D, Q)_s with s = 12.
(a) Since we difference once for the overall trend and once for the seasonal
trend, d = 1 and D = 1.
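With d = 1, D = 1 and s = 12, the model class can be written out explicitly. The block
below is a sketch in standard backshift notation, where B x_t = x_{t-1}. The polynomial
symbols (of degrees p, q, P and Q) are notation assumed here rather than defined in this
handout, and x_t stands for the log series lnights.

```latex
\Phi_P(B^{12})\,\phi_p(B)\,(1-B)(1-B^{12})\,x_t
  = \Theta_Q(B^{12})\,\theta_q(B)\,w_t,
\qquad
(1-B)(1-B^{12})\,x_t = x_t - x_{t-1} - x_{t-12} + x_{t-13}.
```

The second identity is what diff(diff(lnights),12) computes.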
(b) To determine p, q, P, Q, we can check ACF and PACF plots from part 1d.
• The ACF at the seasonal lags 12, 24, 36, ... suggests a seasonal moving average
component of order Q = 0; the PACF at lags 12, 24, 36, ... suggests a seasonal
autoregressive component of order P = 1. Alternatively, both the ACF and the
PACF may be tailing off at the seasonal lags, so perhaps both components,
P = 1 and Q = 1, are needed.
• To determine the values of p and q, we check the ACF and PACF plots at the
within-season lags 1, 2, ..., 11. From the ACF plot, we can consider an MA(1),
and hence q = 1; from the PACF plot, we can consider an AR(1), and hence
p = 1; or we can consider an ARMA(1,1) with p = q = 1.
(c) We have identified a few possible models for our data.
i. ARIMA(1,1,0)×(1,1,0)_12
ii. ARIMA(1,1,0)×(1,1,1)_12
iii. ARIMA(0,1,1)×(1,1,0)_12
iv. ARIMA(0,1,1)×(1,1,1)_12
v. ARIMA(1,1,1)×(1,1,0)_12
vi. ARIMA(1,1,1)×(1,1,1)_12
(d) Recall we use the sarima() function to fit seasonal ARIMA models. Make sure
you have “sarima.R” saved in your working directory, and type source("sarima.R")
to load the function into your workspace. For example, to fit model 2(c)i, we type
fit1<-sarima(lnights,1,1,0,1,1,0,12)
##arguments are the vector, p, d, q, P, D, Q, s
(e) From the diagnostic plots, it appears that model 2(c)i is not a good model. So
we drop this model from consideration.
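If sarima.R is not available, a similar fit and a numerical residual check can be sketched
with base R alone. The example below uses stats::arima() and Box.test() on the built-in
AirPassengers series as a stand-in for motel.dat; it is not assumed to reproduce the
output of sarima(), and the names fit and lb are illustrative.

```r
x <- log(as.numeric(AirPassengers))   # stand-in series, not motel.dat

# ARIMA(1,1,0) x (1,1,0)_12 fitted with base R.
fit <- arima(x, order = c(1, 1, 0),
             seasonal = list(order = c(1, 1, 0), period = 12))

# Ljung-Box test on the residuals; fitdf is the number of estimated
# ARMA coefficients (here ar1 and sar1).
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 2)

names(coef(fit))   # "ar1" "sar1"
lb$p.value         # a small p-value points to leftover autocorrelation
```

A small Ljung-Box p-value is one numerical counterpart to diagnostic plots that look
inadequate.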
3. Model selection. We fit the other models using sarima() and only consider those that
have adequate diagnostic plots.
(a) Going through the models listed in 2(c) one by one, we find that models 2(c)iii to
2(c)vi have adequate diagnostic plots.
(b) Finally, we choose model 2(c)vi, ARIMA(1,1,1)×(1,1,1)_12, which passes the
diagnostic checks and has the smallest AIC, AICc, and BIC.
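Comparing information criteria across several candidates is easier in a loop. The sketch
below uses base R's arima() on the built-in AirPassengers series as a stand-in for
motel.dat. It covers AIC and BIC only, and the values need not be on the same scale as
those printed by sarima(). The names cand, fit_one and fits are illustrative.

```r
x <- log(as.numeric(AirPassengers))   # stand-in series, not motel.dat

# Candidate (p, q, Q) triples; d = D = 1, P = 1 and s = 12 are held fixed.
cand <- expand.grid(p = 0:1, q = 0:1, Q = 0:1)
cand <- cand[cand$p + cand$q > 0, ]   # drop the rows with p = q = 0

# Fit one seasonal ARIMA model by maximum likelihood.
fit_one <- function(p, q, Q) {
  arima(x, order = c(p, 1, q),
        seasonal = list(order = c(1, 1, Q), period = 12), method = "ML")
}
fits <- Map(fit_one, cand$p, cand$q, cand$Q)

# Collect the criteria and list the candidates from smallest to largest AIC.
cand$AIC <- sapply(fits, AIC)
cand$BIC <- sapply(fits, BIC)
cand[order(cand$AIC), ]
```

Information criteria rank only the models that have already passed the diagnostic checks;
they do not replace those checks.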
(c) For completeness, check that the estimated coefficients of model 2(c)vi are
significant.
Coefficients:
         ar1      ma1    sar1     sma1
      0.3927  -0.9999  0.3518  -0.9997
s.e.  0.1037   0.2434  0.1453   0.3326
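One common way to check significance is to divide each estimate by its standard error and
compare the ratio with a standard normal distribution. The sketch below does this for the
numbers reported above. It assumes the usual large-sample normal approximation, which may
be unreliable for ma1 and sma1 because both estimates sit very close to -1.

```r
# Estimates and standard errors copied from the output above.
est <- c(ar1 = 0.3927, ma1 = -0.9999, sar1 = 0.3518, sma1 = -0.9997)
se  <- c(ar1 = 0.1037, ma1 = 0.2434, sar1 = 0.1453, sma1 = 0.3326)

# Approximate z statistics and two-sided p-values.
z <- est / se
p_value <- 2 * pnorm(-abs(z))

round(cbind(est, se, z, p_value), 4)
```

Each |z| exceeds 1.96, so under this approximation all four coefficients are significant
at the 5% level.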
4. Prediction. We would like to use model 2(c)vi to forecast 12 months into the future.
This can be done with the command
sarima.for(lnights,12,1,1,1,1,1,1,12)
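To see how such forecasts and their limits can be computed by hand, here is a sketch with
base R's arima() and predict(). It uses the built-in AirPassengers series as a stand-in
for motel.dat and approximate 95% limits of plus or minus 1.96 standard errors; the bands
drawn by sarima.for() are not assumed to be the same. All variable names are illustrative.

```r
x <- log(as.numeric(AirPassengers))   # stand-in series, not motel.dat

# ARIMA(1,1,1) x (1,1,1)_12 fitted by maximum likelihood.
fit <- arima(x, order = c(1, 1, 1),
             seasonal = list(order = c(1, 1, 1), period = 12), method = "ML")

# Point forecasts and standard errors for the next 12 months (log scale).
fc   <- predict(fit, n.ahead = 12)
pred <- as.numeric(fc$pred)
se   <- as.numeric(fc$se)

# Approximate 95% prediction limits on the log scale.
lower <- pred - 1.96 * se
upper <- pred + 1.96 * se

# Back-transform to the original scale for reporting.
data.frame(time = 145:156, lower = exp(lower),
           forecast = exp(pred), upper = exp(upper))
```

Because the model is fitted to logged data, the limits are symmetric on the log scale and
become asymmetric after exponentiating.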
(a) Recall that for this data set we analyzed only the first 100 observations. We now
have forecasts for observations 101 to 112, and we can compare them with the actual
observations. The following commands add the actual observations to the forecast plot:
all_nights<-motel$V1
lines(101:112, log(all_nights[101:112]), type="b")
Is it surprising for this data that some of the observations lie outside the prediction
intervals? Do you think we were justified in truncating the data at 100? (It turns
out that the bicentenary of Australia took place in 1988, after time point 100.
Take-home message: when data exhibit sudden changes, it does not make sense
to forecast post-change values using the pre-change model, because the two
segments of data are governed by different dynamics.)
(b) To better assess the quality of our forecasts, we can refit the model using only
the first 88 data points and use the resulting model to predict the 12
observations from 89 to 100:
lnights3<-lnights[1:88]
sarima.for(lnights3,12,1,1,1,1,1,1,12)
lines(89:100, lnights[89:100], type="b")
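The visual comparison can be backed up with numbers. The sketch below holds out the last
12 points of the built-in AirPassengers series (a stand-in for motel.dat), refits the
model on the rest, and reports how many held-out points fall inside approximate 95%
limits along with the root mean squared error on the log scale. All names are
illustrative.

```r
x <- log(as.numeric(AirPassengers))   # stand-in series, not motel.dat
n <- length(x)

# Hold out the last 12 observations for evaluation.
train <- x[1:(n - 12)]
test  <- x[(n - 11):n]

fit <- arima(train, order = c(1, 1, 1),
             seasonal = list(order = c(1, 1, 1), period = 12), method = "ML")
fc   <- predict(fit, n.ahead = 12)
pred <- as.numeric(fc$pred)
se   <- as.numeric(fc$se)

# Which held-out points fall inside the approximate 95% limits?
inside   <- test >= pred - 1.96 * se & test <= pred + 1.96 * se
coverage <- mean(inside)

# Root mean squared forecast error on the log scale.
rmse <- sqrt(mean((test - pred)^2))

c(coverage = coverage, rmse = rmse)
```

With only 12 held-out points the coverage figure is coarse, so it is best read together
with the plot.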
2 Assignment
1. (no R needed) Consider the following stationary seasonal model
$x_t = \Phi x_{t-4} + w_t - \theta w_{t-1}$.
2. The data set (labour.dat) that we are going to analyze is the number of persons in the
civilian labor force in Australia each month from February 1978 to August 1995. You only
need to fit a model to the first 12 years (the first 144 observations). (There was a rather
intense recession in Australia in 1990-1991.) In addition to fitting an appropriate SARIMA
model, use your model to forecast 12 months into the future (i.e., forecast
times 145, 146, ..., 156). Comment on whether the true observations lie in
the prediction intervals. If any observations do not lie in the intervals, give a plausible
explanation. Make sure to outline the steps you used in analyzing the data. If there
are two (or more) competing models, discuss why you chose your model
over the others.