Models For Non-Stationary Time Series
Non-Stationary

Y_t = μ_t + X_t

where μ_t is a non-constant mean function and X_t is a zero-mean, stationary series.
As an example, consider the Dow Jones index on 251 trading days ending 26 August 1994. The time plot of the series displays considerable variation, and a stationary model does not seem reasonable.
ARIMA Models
Consider then an ARIMA(p,1,q) process. With W_t = Y_t − Y_{t−1}, we have

W_t = φ_1 W_{t−1} + ⋯ + φ_p W_{t−p} + e_t − θ_1 e_{t−1} − ⋯ − θ_q e_{t−q}

so that W_t follows a stationary ARMA(p,q) model. If the process contains no autoregressive terms, we call it an integrated moving average and abbreviate the name to IMA(d,q).
In particular, the simple IMA(1,1) model is

Y_t = Y_{t−1} + e_t − θ e_{t−1}

where |θ| < 1.

The following figure shows the time series plot of the first difference of the simulated IMA(2,2) series. This series is also nonstationary, as it is governed by an IMA(1,2) model.

Finally, the second differences of the simulated IMA(2,2) series values are plotted in the following figure. These values arise from a stationary MA(2) model with θ_1 = 1 and θ_2 = −0.6 (so that ∇²Y_t = e_t − e_{t−1} + 0.6 e_{t−2}). Theoretical autocorrelations for this model are ρ_1 = −0.678 and ρ_2 = 0.254. These correlation values seem to be reflected in the appearance of the time series plot.

Nonstationary ARIMA series can be simulated by first simulating the corresponding stationary ARMA series and then integrating it (really, partially summing it).
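As a concrete illustration of this simulation recipe, here is a minimal Python sketch (assuming numpy; the function name simulate_ima, the seed, and the sample size are our own choices) that generates a stationary MA(q) series and then partially sums it d times, using the θ_1 = 1 and θ_2 = −0.6 values from the example above:

import numpy as np

rng = np.random.default_rng(42)

def simulate_ima(n, d, theta, rng):
    """Simulate an IMA(d, q) series: generate the stationary MA(q) series
    W_t = e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q},
    then integrate it d times by partial (cumulative) summation."""
    q = len(theta)
    e = rng.standard_normal(n + q)
    # MA(q): W_t = e_t - sum over j of theta_j * e_{t-j}
    w = e[q:] - sum(th * e[q - j - 1: n + q - j - 1]
                    for j, th in enumerate(theta))
    y = w
    for _ in range(d):              # integrate (partially sum) d times
        y = np.cumsum(y)
    return y

# IMA(2,2) with theta1 = 1 and theta2 = -0.6, as in the simulated example
y = simulate_ima(200, d=2, theta=[1.0, -0.6], rng=rng)
w = np.diff(y, n=2)                 # second differences recover the MA(2) series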
Model Specification

We first specify a tentative model, and then estimate the φ's, θ's, and σ_e² for that model in the most efficient way. Then we check the adequacy of the model. If the model appears inadequate, we consider the nature of the inadequacy to help us select another model. We proceed to estimate that new model and check it for adequacy.
General Behavior of the ACF and PACF for ARMA Models

Simulated AR(1) series with φ = 0.9

AR(1)
ACF
Exponential decay: on the positive side if φ > 0 and alternating in sign, starting on the negative side, if φ < 0.
PACF
Spike at lag 1, then cuts off to zero: spike positive if φ > 0, negative if φ < 0.
AR(2)
ACF
Exponential decay or damped sine wave. The exact pattern depends on the signs and sizes of φ_1 and φ_2.
PACF
Spikes at lags 1 to 2, then cuts off to zero.
AR(p)

ACF
Exponential decay or damped sine wave. The exact pattern depends on the signs and sizes of φ_1, …, φ_p.
PACF
Spikes at lags 1 to p, then cuts off to zero.
Simulated MA(1) series with θ = −0.9

MA(1)
ACF
Spike at lag 1, then cuts off to zero: spike positive if θ_1 < 0, negative if θ_1 > 0.
PACF
Exponential decay: on the negative side if θ_1 > 0 and alternating in sign, starting on the positive side, if θ_1 < 0.
MA(2)
ACF
Spikes at lags 1 to 2, then cuts off to zero.
PACF
Exponential decay or damped sine wave. The exact pattern depends on the signs and sizes of θ_1 and θ_2.
MA(q)

ACF
Spikes at lags 1 to q, then cuts off to zero.
PACF
Exponential decay or damped sine wave. The exact pattern depends on the signs and sizes of θ_1, …, θ_q.
Simulated ARMA(1,1) series with φ = 0.8 and θ = 0.4

ARMA(1,1)
ACF
Exponential decay.
PACF
Exponential decay.
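These theoretical patterns can be compared against sample statistics. Below is a minimal sketch, assuming the statsmodels package, that computes the sample ACF and PACF of a simulated AR(1) series with φ = 0.9; the series length and seed are arbitrary:

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(0)

# Simulated AR(1) with phi = 0.9: the ACF should decay exponentially,
# while the PACF should show a single spike at lag 1.
n, phi = 500, 0.9
y = np.zeros(n)
e = rng.standard_normal(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]

r = acf(y, nlags=10)        # sample autocorrelations (lag 0 through 10)
p = pacf(y, nlags=10)       # sample partial autocorrelations
print(np.round(r[1:4], 2))  # should be near 0.9, 0.81, 0.73
print(np.round(p[1:4], 2))  # should be near 0.9, then near 0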
If the first difference of a series and its sample ACF do not appear to support a stationary ARMA model, then we take another difference and again compute the sample ACF and PACF to look for characteristics of a stationary ARMA process.

The difference of any stationary time series is also stationary. Overdifferencing introduces unnecessary correlations into a series and will complicate the modeling process.
For example, suppose our observed series, {Y_t}, is a random walk, so that one difference would lead to the very simple white noise model

∇Y_t = e_t

If we difference once more (that is, overdifference), we have

∇²Y_t = e_t − e_{t−1}

which is an MA(1) model but with θ = 1. If we take two differences in this situation, we unnecessarily have to estimate the unknown value of θ. Specifying an IMA(2,1) model would not be appropriate here. The random walk model, which can be thought of as IMA(1,1) with θ = 0, is the correct model. (The random walk model can also be thought of as an ARI(1,1) with φ = 0 or as a nonstationary AR(1) with φ = 1.)

Overdifferencing also creates a noninvertible model, and noninvertible models create serious problems when we attempt to estimate their parameters.
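A small sketch of this overdifferencing effect, assuming numpy and statsmodels (seed and series length arbitrary): one difference of a simulated random walk behaves like white noise, while the second difference shows the ρ_1 = −0.5 autocorrelation of the noninvertible MA(1) with θ = 1:

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(1000))  # random walk: Y_t = Y_{t-1} + e_t

d1 = np.diff(y, n=1)  # white noise: all sample autocorrelations near 0
d2 = np.diff(y, n=2)  # overdifferenced: e_t - e_{t-1}, a noninvertible MA(1)

print(round(acf(d1, nlags=1)[1], 2))  # near 0
print(round(acf(d2, nlags=1)[1], 2))  # near -0.5 (theoretical rho_1 = -0.5)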
Dickey and Fuller have shown that, under the null hypothesis that a = 0 (in the regression of ∇Y_t on Y_{t−1}), the estimated t value of the coefficient of Y_{t−1} follows the τ (tau) distribution. If the computed absolute value of the tau statistic exceeds the absolute Dickey-Fuller critical value, we reject the hypothesis that a = 0, in which case the time series is stationary. For the case where the e_t are correlated, Dickey and Fuller have developed another test, known as the augmented Dickey-Fuller (ADF) test.

Other Specification Methods

A number of other approaches to model specification have been proposed. One of the most studied is Akaike's (1973) Information Criterion (AIC). This criterion says to select the model that minimizes

AIC = −2 log(maximum likelihood) + 2k

where k = p + q + 1 if the model contains an intercept or constant term and k = p + q otherwise.
The corrected AIC, denoted AICc, can also be used; it is defined by the formula

AICc = AIC + 2(k + 1)(k + 2)/(n − k − 2)

Another approach to determining the ARMA orders is to select a model that minimizes the Schwarz Bayesian Information Criterion (BIC), defined as

BIC = −2 log(maximum likelihood) + k log(n)
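As an illustration of these specification tools, here is a hedged sketch, assuming statsmodels, that applies the ADF test to a simulated series and compares AIC and BIC across a few candidate ARMA orders; the simulated MA(2) series and the order grid are our own choices:

import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
e = rng.standard_normal(300)
y = np.convolve(e, [1.0, -1.0, 0.6], mode="valid")  # an MA(2) series

# Augmented Dickey-Fuller test: a small p-value rejects a unit root,
# i.e., supports stationarity.
stat, pvalue = adfuller(y)[:2]
print(f"ADF tau = {stat:.2f}, p-value = {pvalue:.3f}")

# Compare AIC/BIC over a small grid of candidate ARMA orders.
for p, q in [(0, 1), (0, 2), (1, 0), (1, 1)]:
    fit = ARIMA(y, order=(p, 0, q)).fit()
    print(f"ARMA({p},{q}): AIC = {fit.aic:.1f}, BIC = {fit.bic:.1f}")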
We may wish to consider several models with AIC (BIC) values close to the minimum. A difference in AIC (BIC) values of 2 or less is not regarded as substantial, and we may wish to choose the simpler model for the sake of simplicity.

Specification of the Actual Time Series (Dividend)

Consider the U.S. dividend series for the quarterly periods of 1970–1991, for a total of 88 quarterly observations. The raw data are given in the following table.
The dividend time series shows that over the period of study dividends have been increasing, that is, showing an upward trend, suggesting perhaps that the mean of the dividend series has been changing. This suggests that the dividend series is not stationary.
From a time series of 100 observations, we calculate r_1 = −0.49, r_2 = 0.31, r_3 = −0.21, r_4 = 0.11, and |r_k| < 0.09 for k > 4. On this basis alone, what ARIMA model would we tentatively specify for the series?

A stationary time series of length 121 produced sample partial autocorrelations of … . Based on this information alone, what model would we tentatively specify for the series?
The method of moments is frequently one of the easiest methods for obtaining parameter estimates, but it is not the most efficient. The method consists of equating sample moments to corresponding theoretical moments and solving the resulting equations to obtain estimates of any unknown parameters. The simplest example of the method is to estimate a stationary process mean by a sample mean.

Consider first the AR(1) case,

Y_t = φ Y_{t−1} + e_t

For this process we have the simple relationship ρ_1 = φ. In the method of moments, ρ_1 is equated to r_1, the lag 1 sample autocorrelation. Thus we can estimate φ by

φ̂ = r_1
For the AR(2) case, we know that (from the Yule-Walker equations)

ρ_1 = φ_1 + ρ_1 φ_2  and  ρ_2 = ρ_1 φ_1 + φ_2

Replacing ρ_1 and ρ_2 by r_1 and r_2 and solving these equations, we obtain

φ̂_1 = r_1(1 − r_2)/(1 − r_1²)  and  φ̂_2 = (r_2 − r_1²)/(1 − r_1²)
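A short sketch of these method-of-moments formulas for the AR(2) case, in plain Python; the function name and the example values r_1 = 0.8, r_2 = 0.5 are hypothetical:

def ar2_method_of_moments(r1, r2):
    """Solve the Yule-Walker equations
         rho_1 = phi_1 + rho_1 * phi_2
         rho_2 = rho_1 * phi_1 + phi_2
    with rho_1, rho_2 replaced by the sample autocorrelations r1, r2."""
    phi1 = r1 * (1 - r2) / (1 - r1 ** 2)
    phi2 = (r2 - r1 ** 2) / (1 - r1 ** 2)
    return phi1, phi2

# Example with hypothetical sample autocorrelations r1 = 0.8, r2 = 0.5
print(ar2_method_of_moments(0.8, 0.5))  # phi1-hat ~ 1.11, phi2-hat ~ -0.39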
Because the method of moments is unsatisfactory for many models, we must consider other methods of estimation. We begin with least squares. At this point, we introduce a possibly nonzero mean, μ, into our stationary models and treat it as another parameter to be estimated by least squares.

Consider the first-order case, where

Y_t − μ = φ(Y_{t−1} − μ) + e_t

We can view this as a regression model with predictor variable Y_{t−1} and response variable Y_t. Least squares estimation then proceeds by minimizing the sum of squares of the differences (Y_t − μ) − φ(Y_{t−1} − μ). Since only Y_1, Y_2, …, Y_n are observed, we can only sum from t = 2 to t = n. The conditional sum-of-squares function can be written as

S_c(φ, μ) = Σ_{t=2}^{n} [(Y_t − μ) − φ(Y_{t−1} − μ)]²

According to the principle of least squares, we estimate φ and μ by the respective values that minimize S_c(φ, μ) given the observed values of Y_1, Y_2, …, Y_n. Setting ∂S_c/∂μ = 0 and then simplifying and solving for μ̂, we find that, for large n and regardless of the value of φ,

μ̂ ≈ Ȳ

Consider now the minimization of S_c(φ, μ) with respect to φ. Setting ∂S_c/∂φ = 0, we have

φ̂ = [Σ_{t=2}^{n} (Y_t − Ȳ)(Y_{t−1} − Ȳ)] / [Σ_{t=2}^{n} (Y_{t−1} − Ȳ)²]

Except for one term missing in the denominator, this is the same as r_1; namely, φ̂ ≈ r_1.
Entirely analogous results follow for the general stationary AR(p) process.

For moving average models, consider the MA(1) case, Y_t = e_t − θ e_{t−1}, so that e_t = Y_t + θ e_{t−1}. If |θ| < 1, we may continue this substitution into the past and obtain the expression

e_t = Y_t + θ Y_{t−1} + θ² Y_{t−2} + ⋯

This is an autoregressive model of infinite order. Least squares can be carried out by choosing a value of θ that minimizes

S_c(θ) = Σ e_t² = Σ (Y_t + θ Y_{t−1} + θ² Y_{t−2} + ⋯)²

It is clear that the least squares problem is nonlinear in the parameters. We will not be able to minimize S_c by taking a derivative with respect to θ, setting it to zero, and solving. Thus, even for the simple MA(1) model, we must resort to techniques of numerical optimization.
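For instance, here is a minimal numerical-optimization sketch, assuming numpy and scipy, that computes the e_t recursively (conditioning on e_0 = 0) and minimizes S_c(θ) for a simulated MA(1) series; the true θ = 0.6 and the seed are our own choices:

import numpy as np
from scipy.optimize import minimize_scalar

def conditional_ss(theta, y):
    """Conditional sum of squares S_c(theta) for the MA(1) model
    Y_t = e_t - theta * e_{t-1}, with e_t = Y_t + theta * e_{t-1}
    computed recursively from the starting condition e_0 = 0."""
    e = np.empty(len(y))
    e[0] = y[0]
    for t in range(1, len(y)):
        e[t] = y[t] + theta * e[t - 1]
    return np.sum(e ** 2)

rng = np.random.default_rng(3)
noise = rng.standard_normal(301)
y = noise[1:] - 0.6 * noise[:-1]        # MA(1) series with theta = 0.6

res = minimize_scalar(conditional_ss, bounds=(-0.99, 0.99),
                      args=(y,), method="bounded")
print(round(res.x, 2))                  # theta-hat, near 0.6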
For maximum likelihood estimation, we need the joint probability density of Y_1, Y_2, …, Y_n. For the AR(1) model this factors into conditional densities, and we obtain

f(y_1, y_2, …, y_n) = f(y_1) ∏_{t=2}^{n} f(y_t | y_{t−1})

where each f(y_t | y_{t−1}) is a normal density with mean φ y_{t−1} and variance σ_e². Now we will find f(y_1). Express the AR(1) process in terms of the e_t's. For simplicity, assuming μ = 0, we obtain

Y_1 = e_1 + φ e_0 + φ² e_{−1} + ⋯

Thus Y_1 is normal with mean zero and variance σ_e²/(1 − φ²). Combining the factors, we obtain the likelihood function

L(φ, σ_e²) = (2π σ_e²)^{−n/2} (1 − φ²)^{1/2} exp[−S(φ)/(2 σ_e²)]

where

S(φ) = Σ_{t=2}^{n} (Y_t − φ Y_{t−1})² + (1 − φ²) Y_1²

is the unconditional sum-of-squares function. Now, for a given φ, the value of σ_e² that maximizes the likelihood is S(φ)/n; maximizing over φ itself must be done numerically.
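A sketch of this likelihood in code, assuming numpy and scipy (the function name neg_log_likelihood and the simulated series are our own): it evaluates the exact zero-mean AR(1) negative log-likelihood above and minimizes it numerically:

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, y):
    """Exact negative log-likelihood for a zero-mean AR(1) model,
    using the unconditional sum of squares S(phi)."""
    phi, sigma2 = params
    n = len(y)
    s = np.sum((y[1:] - phi * y[:-1]) ** 2) + (1 - phi ** 2) * y[0] ** 2
    return 0.5 * (n * np.log(2 * np.pi * sigma2)
                  - np.log(1 - phi ** 2) + s / sigma2)

rng = np.random.default_rng(4)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

res = minimize(neg_log_likelihood, x0=[0.5, 1.0], args=(y,),
               bounds=[(-0.99, 0.99), (1e-6, None)])
print(np.round(res.x, 2))   # (phi-hat, sigma2-hat), near (0.7, 1.0)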
Exercise 7.1

From a series of length 100, we have computed r_1 = 0.8, r_2 = 0.5, r_3 = 0.4, Ȳ = 2, and a sample variance of 5. If we assume that an AR(2) model with a constant term is appropriate, how can we get (simple) estimates of φ_1, φ_2, and σ_e²?

Forecasting
One of the primary objectives of building a model for a time series is to be able to forecast the values for that series at future times and to assess the precision of those forecasts.

ARIMA Forecasting

Suppose the series is available up to time t, namely Y_1, Y_2, …, Y_{t−1}, Y_t. We would like to forecast the value of Y_{t+ℓ} that will occur ℓ time units into the future. We call time t the forecast origin and ℓ the lead time for the forecast, and denote the forecast as Ŷ_t(ℓ). This forecast is developed based on minimizing the mean square forecasting error; it turns out that the minimum mean square error forecast is the conditional expectation E(Y_{t+ℓ} | Y_1, Y_2, …, Y_t).

AR(1)

Consider the AR(1) process with a nonzero mean,

Y_t − μ = φ(Y_{t−1} − μ) + e_t
Consider the problem of forecasting one time unit into the future. Replacing t by t + 1, we have

Y_{t+1} − μ = φ(Y_t − μ) + e_{t+1}

Given Y_1, Y_2, …, Y_{t−1}, Y_t, we take the conditional expectations of both sides. Since e_{t+1} is independent of Y_1, Y_2, …, Y_{t−1}, Y_t, we obtain E(e_{t+1} | Y_1, …, Y_t) = 0, and thus the one-step-ahead forecast is

Ŷ_t(1) = μ + φ(Y_t − μ)

More generally, for ℓ ≥ 1, e_{t+ℓ} is independent of Y_1, Y_2, …, Y_{t−1}, Y_t. Iterating backward on ℓ in the difference equation form, we have

Ŷ_t(ℓ) = μ + φ^ℓ (Y_t − μ)

Since |φ| < 1, we have simply Ŷ_t(ℓ) ≈ μ for large ℓ.

Suppose the last observed value is 67. We would then forecast one time period ahead as Ŷ_t(1) = μ + φ(67 − μ), and at lead 5 we have Ŷ_t(5) = μ + φ⁵(67 − μ), which is already close to μ.
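The AR(1) forecast formula is easy to compute directly. A minimal sketch in plain Python, where the fitted values μ = 74.3 and φ = 0.57 are hypothetical stand-ins for estimates not shown here:

def ar1_forecast(last_y, mu, phi, lead):
    """Minimum mean square error forecast for a stationary AR(1):
    Y-hat_t(lead) = mu + phi**lead * (Y_t - mu)."""
    return mu + phi ** lead * (last_y - mu)

# Hypothetical fitted values mu = 74.3, phi = 0.57; last observation 67.
for lead in (1, 5, 10):
    print(lead, round(ar1_forecast(67, 74.3, 0.57, lead), 2))
# As the lead grows, the forecasts decay toward the process mean mu.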
MA(1)

For the MA(1) model, Y_t = μ + e_t − θ e_{t−1}, the one-step-ahead forecast is Ŷ_t(1) = μ − θ e_t, where e_t is the most recent noise term (estimated in practice by the last residual); for ℓ > 1, Ŷ_t(ℓ) = μ.
Thus the general nature of the forecast for long lead times will be determined by the autoregressive parameters φ_1, φ_2, …, φ_p.

More generally, the variance of the forecast error is

Var[e_t(ℓ)] = σ_e² Σ_{j=0}^{ℓ−1} ψ_j²

where the ψ_j are the general linear process weights of the model.

The following series was fitted by working with the square root and then fitting an AR(3) model. Notice that the forecasts mimic the approximate cycle in the actual series even when we forecast with a lead time out to 25 years.

Exercise 9.1

(a) What sort of ARIMA model is this (i.e., what are p, d, and q)?
(b) The last five values of the series are given below: …
As we see in the displays, carbon dioxide levels are higher during the winter months and much lower in the summer.

Consider the time series generated according to

Y_t = Φ Y_{t−12} + e_t

Multiplying the equation by Y_{t−k}, taking expectations, and dividing by γ_0 yields the autocorrelation function: the autocorrelations are nonzero only at lags that are multiples of 12, with ρ_{12k} = Φ^k.

With this example in mind, we define a seasonal AR(P) model of order P and seasonal period s by

Y_t = Φ_1 Y_{t−s} + Φ_2 Y_{t−2s} + ⋯ + Φ_P Y_{t−Ps} + e_t

where s = number of periods per season.
The ARIMA notation can be extended readily to handle seasonal aspects, and the general shorthand notation is

ARIMA(p, d, q) × (P, D, Q)_s

where s = number of periods per season. The algebra is simple but can get lengthy, so for illustrative purposes consider the following general ARIMA(1,1,1) × (1,1,1)_4 model:

(1 − φB)(1 − ΦB⁴)(1 − B)(1 − B⁴) Y_t = (1 − θB)(1 − ΘB⁴) e_t

Clearly, such models represent a broad, flexible class from which to select an appropriate model for a particular time series.
Model specification, fitting, and diagnostic checking for seasonal models follow the same general techniques. Here we shall simply highlight the application of these ideas specifically to seasonal models and pay special attention to the seasonal lags.

As always, a careful inspection of the time series plot is the first step. The following figure displays monthly carbon dioxide levels in northern Canada. The upward trend alone would lead us to specify a nonstationary model.
The following figure shows the sample autocorrelation function for that series. The seasonal autocorrelation relationships are shown quite prominently in this display. Notice the strong correlation at lags 12, 24, 36, and so on. In addition, there is substantial other correlation that needs to be modeled.

The following figure shows the time series plot of the CO2 levels after we take a first difference.
Model Fitting
The following table gives the maximum likelihood estimates and their standard errors for the ARIMA(0,1,1) × (0,1,1)_12 model for the CO2 levels. The coefficient estimates are all highly significant, and we proceed to check further on this model.
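Such a model can be fit, for example, with the SARIMAX class in statsmodels. The sketch below fabricates a stand-in monthly series with trend and seasonality, since the actual CO2 data are not reproduced here; the dates, sample size, and seed are arbitrary:

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Stand-in for the CO2 data: a monthly series with trend plus seasonality.
rng = np.random.default_rng(5)
t = np.arange(240)
co2 = pd.Series(330 + 0.1 * t + 5 * np.sin(2 * np.pi * t / 12)
                + rng.standard_normal(240),
                index=pd.date_range("1994-01", periods=240, freq="MS"))

model = SARIMAX(co2, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
fit = model.fit(disp=False)
print(fit.summary().tables[1])   # ML estimates and their standard errors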
This could easily happen by chance alone. Next we investigate the question of normality of the error terms via the residuals. The quantile-quantile plot shows no serious departures from normality.
As one further check on the model, we consider overfitting with an ARIMA(0,1,2) × (0,1,1)_12 model, with the results shown below.

Forecasting Seasonal Models

Computing forecasts with seasonal ARIMA models is, as expected, most easily carried out recursively using the difference equation form for the model.
For the ARIMA(0,1,1) × (0,1,1)_12 model, the forecasts satisfy

Ŷ_t(1) = Y_t + Y_{t−11} − Y_{t−12} − θ e_t − Θ e_{t−11} + θΘ e_{t−12}
Ŷ_t(2) = Ŷ_t(1) + Y_{t−10} − Y_{t−11} − Θ e_{t−10} + θΘ e_{t−11}

and so forth. The noise terms e_{t−12}, e_{t−11}, …, e_t (as residuals) will enter into the forecasts for lead times ℓ = 1, 2, …, 13, but for ℓ > 13 the autoregressive part of the model takes over and we have

Ŷ_t(ℓ) = Ŷ_t(ℓ−1) + Ŷ_t(ℓ−12) − Ŷ_t(ℓ−13) for ℓ > 13
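This recursion is straightforward to implement. A plain-Python sketch, where the function name and arguments are our own; it assumes at least 13 prior observations and residuals are supplied, and it replaces all future noise terms with their conditional expectation of zero:

def seasonal_forecasts(history, resid, theta, Theta, leads):
    """Recursive forecasts for an ARIMA(0,1,1)x(0,1,1)_12 model.
    history: observed Y_1..Y_t; resid: residuals e_1..e_t standing in
    for the noise terms. Future noise terms are replaced by zero, so
    for leads > 13 the recursion reduces to the purely autoregressive
    form Y-hat_t(l) = Y-hat_t(l-1) + Y-hat_t(l-12) - Y-hat_t(l-13)."""
    y = list(history)
    e = list(resid)
    t = len(y)
    for lead in range(1, leads + 1):
        e.append(0.0)                 # E(e_{t+lead} | data) = 0
        y.append(y[-1] + y[-12] - y[-13]
                 - theta * e[-2] - Theta * e[-13] + theta * Theta * e[-14])
    return y[t:]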
Prediction Limits
The following figure displays the last year of observed data and forecasts out four years. At this lead time, it is easy to see that the forecast limits are getting wider, as there is more uncertainty in the forecasts.

Equivalences with Exponential Smoothing Models

It is possible to show that the forecasts obtained from some exponential smoothing models are identical to forecasts from particular ARIMA models.
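For example, simple exponential smoothing with smoothing constant α produces the same one-step forecasts as an ARIMA(0,1,1) model with θ = 1 − α. A plain-Python sketch of this equivalence, where the series values and α are arbitrary illustrations:

def ses_forecast(y, alpha):
    """Simple exponential smoothing: forecast of the next value,
    starting the smoother at the first observation."""
    f = y[0]
    for obs in y[1:]:
        f = alpha * obs + (1 - alpha) * f
    return f

def ima11_forecast(y, theta):
    """One-step forecast from ARIMA(0,1,1): Y-hat = Y_t - theta * e_t,
    with innovations e_t = Y_t - Y_{t-1} + theta * e_{t-1} and e_1 = 0."""
    e = 0.0
    for prev, obs in zip(y, y[1:]):
        e = obs - prev + theta * e
    return y[-1] - theta * e

y = [50.2, 51.0, 49.7, 50.8, 51.5, 50.9]
alpha = 0.3
print(ses_forecast(y, alpha))              # identical forecasts
print(ima11_forecast(y, theta=1 - alpha))  # (up to rounding), theta = 1 - alpha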