MSF 566 Topic 03 Stationary Time Series
Sample autocorrelations of stationary series
Properties of forecasts
In practice, the theoretical mean, variance and
autocorrelations of a series are unknown to the
researcher
◦ Unknown population parameters
Let there be T observations labeled y_1 through y_T. The sample mean, variance, and autocorrelations are

$$\bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$$

$$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\left(y_t - \bar{y}\right)^2$$

$$r_s = \frac{\sum_{t=s+1}^{T}\left(y_t - \bar{y}\right)\left(y_{t-s} - \bar{y}\right)}{\sum_{t=1}^{T}\left(y_t - \bar{y}\right)^2}$$
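A minimal NumPy sketch of these estimators (the function name and interface are mine; `y` is any observed series):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample mean, variance, and autocorrelations r_1..r_max_lag, per the formulas above."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    ybar = y.mean()                              # sample mean
    denom = np.sum((y - ybar) ** 2)              # T * sample variance
    sigma2_hat = denom / T
    r = np.array([np.sum((y[s:] - ybar) * (y[:-s] - ybar)) / denom
                  for s in range(1, max_lag + 1)])
    return ybar, sigma2_hat, r
```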
The sample ACF and PACF can be compared to various
theoretical functions to help identify the actual nature of
the data-generating process
For large samples (large T), r_s will be approximately normally distributed with a mean equal to zero; under the null that the process is MA(q), the variance of r_s for s > q is

$$\text{Var}(r_s) = \frac{1}{T}\Big(1 + 2\sum_{j=1}^{q} r_j^2\Big)$$
We can use sample values to form the sample ACF and
PACF and test for significance
◦ The null hypothesis is that the true data-generating process is an MA(s-1), so that the true value of r_s = 0
◦ For example
If we use a 95% confidence interval (i.e., two standard deviations) and the calculated value of r_1 exceeds 2T^{-1/2}, we can reject the null hypothesis that the first-order autocorrelation is zero

$$t_{0.025,\infty} = 1.96 \approx 2$$

$$t_1 = \frac{r_1 - 0}{\sqrt{\text{Var}(r_1)}} = \frac{2T^{-1/2}}{T^{-1/2}} = 2$$

Rejecting this hypothesis means rejecting an MA(s-1) = MA(0) process and accepting the alternative q > 0
The next step
◦ Try s=2
◦ For example
Var(r_2) = (1 + 2r_1^2)/T; if r_1 = 0.5 and T = 100,
Var(r_2) = 0.015 (the standard deviation is 0.123)
If r_2 > 2 × 0.123, it is possible to reject the null hypothesis r_2 = 0
◦ Note:
The maximum number of sample autocorrelations and partial
autocorrelations to use is typically set equal to T/4
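A sketch of this sequential significance test, reusing `sample_acf` from the earlier snippet (the Bartlett variance formula and the two-standard-deviation cutoff are from the slides; the function name is mine):

```python
import numpy as np

def bartlett_test(r, T):
    """Test each r_s against the null that the process is MA(s-1)."""
    for s in range(1, len(r) + 1):
        # Var(r_s) = (1 + 2 * sum_{j<s} r_j^2) / T under the MA(s-1) null
        se = np.sqrt((1.0 + 2.0 * np.sum(r[:s - 1] ** 2)) / T)
        verdict = "reject rho_s = 0" if abs(r[s - 1]) > 2 * se else ""
        print(f"s={s:2d}  r_s={r[s - 1]:+.3f}  2*se={2 * se:.3f}  {verdict}")
```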
Under the null hypothesis of an AR(p) model
◦ i.e., under the null that all $\phi_{p+i,p+i}$ are zero
◦ The variance of $\hat{\phi}_{p+i,p+i}$ is approximately 1/T
Within any large group of autocorrelations, some
will exceed two standard deviations as a result of
pure chance even though the true values in the data-
generating process are zero
Box and Pierce (1970) used the sample autocorrelations to
form the statistic
$$Q = T\sum_{k=1}^{s} r_k^2$$
The problem with the Box-Pierce Q-statistic
◦ It works poorly even in moderately large samples
◦ Ljung and Box (1978) proposed the modified statistic $Q = T(T+2)\sum_{k=1}^{s} r_k^2/(T-k)$, which performs better in small samples; under the null that all s autocorrelations are zero, Q is asymptotically $\chi^2$ with s degrees of freedom
The Box-Pierce and Ljung-Box Q-statistics can be used to
check whether the residuals from an estimated
ARMA(p,q) model behave as a white-noise process
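A minimal sketch of both statistics (SciPy is assumed for the chi-square tail probability; when the r_k come from ARMA(p,q) residuals, the degrees of freedom are reduced by p+q):

```python
import numpy as np
from scipy.stats import chi2

def q_statistics(r, T, s, df_adjust=0):
    """Box-Pierce and Ljung-Box Q over lags 1..s, with chi-square p-values."""
    k = np.arange(1, s + 1)
    bp = T * np.sum(r[:s] ** 2)                       # Box-Pierce
    lb = T * (T + 2) * np.sum(r[:s] ** 2 / (T - k))   # Ljung-Box
    df = s - df_adjust                                # subtract p+q for ARMA residuals
    return bp, lb, chi2.sf(bp, df), chi2.sf(lb, df)
```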
By analogy with what we have learned in statistics courses
◦ Adding additional lags for p and/or q will necessarily reduce the sum of squares of the estimated residuals (a lower SSE raises the regression sum of squares and hence the R-square)
There is a trade-off between
◦ A reduction in the sum of squares of the residuals (SSE)
◦ A more parsimonious model
Two commonly used model selection criteria are
◦ Akaike Information Criterion (AIC)
◦ Schwarz Bayesian Criterion (SBC)
When you estimate a model using lagged variables, some
observations are lost
◦ To adequately compare the alternative models, T should be kept
fixed
Example
◦ With 100 data points, estimate an AR(1) and AR(2) using only the
last 98 observations in each estimation
Compare the two models using T=98
Ideally, the AIC and SBC will be as small as possible
◦ Note that as the fit of the model improves, the AIC and SBC
will approach -∞ (can be negative)
◦ Model A is said to fit better than model B if the AIC (or SBC)
for A is smaller than for model B
Estimate two models over the same time period to make them
comparable
$$\text{AIC} = T\ln(\text{sum of squared residuals}) + 2n$$

$$\text{SBC} = T\ln(\text{sum of squared residuals}) + n\ln(T)$$

where n is the number of parameters estimated
◦ When ln(T) > 2 (i.e., T ≥ 8)
The marginal cost of adding regressors is greater with the SBC than with the AIC
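These formulas translate directly into code; a minimal sketch (the SSE values are hypothetical, and both models use the same T, as the comparison requires):

```python
import numpy as np

def aic(sse, T, n):
    return T * np.log(sse) + 2 * n          # AIC = T ln(SSE) + 2n

def sbc(sse, T, n):
    return T * np.log(sse) + n * np.log(T)  # SBC = T ln(SSE) + n ln(T)

# Hypothetical comparison: AR(1) (n=1) vs AR(2) (n=2) on the same T=98 sample
for n, sse in [(1, 110.0), (2, 108.5)]:
    print(f"n={n}: AIC={aic(sse, 98, n):.2f}  SBC={sbc(sse, 98, n):.2f}")
```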
SBC has superior large-sample properties
◦ Let the true order of the data-generating process be
(p*,q*), and suppose that we use the AIC and SBC to
estimate all ARMA models of order (p,q)
◦ Both AIC and SBC will select models of orders greater than
or equal to (p*,q*) as the sample size approaches infinity
The AIC works better than the SBC in small samples
In our textbook (page 41) and EViews

$$\text{AIC}^* = -2\ln(L)/T + 2n/T$$

$$\text{SBC}^* = -2\ln(L)/T + n\ln(T)/T$$

For a normal distribution,

$$-2\ln(L) = T\ln(2\pi) + T\ln(\sigma^2) + \frac{1}{\sigma^2}(\text{sum of squared residuals})$$
$$L(\beta_0, \beta_1, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\Big[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2\Big]$$

$$\ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2$$

$$-2\ln L = n\ln 2\pi + n\ln \sigma^2 + \frac{1}{\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2$$
We can prove that AIC and AIC* will select the same model

$$\text{AIC}^* = \ln(\hat{\sigma}^2) + 2n/T, \qquad \hat{\sigma}^2 = \text{SSE}/(T-1)$$

$$\text{AIC} = T\ln(\text{SSE}) + 2n$$

$$\begin{aligned}
\text{AIC} &= T\left[\ln(\text{SSE}) + 2n/T - \ln(T-1)\right] + T\ln(T-1)\\
&= T\left[\ln\!\left(\frac{\text{SSE}}{T-1}\right) + 2n/T\right] + T\ln(T-1)\\
&= T\left[\ln(\hat{\sigma}^2) + 2n/T\right] + T\ln(T-1)\\
&= T\cdot\text{AIC}^* + T\ln(T-1)
\end{aligned}$$

It is obvious that if T is fixed, AIC and AIC* are equivalent and will select the same model
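A quick numerical check of the last line (T, n, and the SSE value are arbitrary; sigma2_hat = SSE/(T-1) as above):

```python
import numpy as np

T, n, sse = 98, 2, 110.0
sigma2_hat = sse / (T - 1)
aic = T * np.log(sse) + 2 * n
aic_star = np.log(sigma2_hat) + 2 * n / T
# AIC = T * AIC* + T ln(T-1), so both rank models identically for fixed T
print(np.isclose(aic, T * aic_star + T * np.log(T - 1)))  # True
```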
A specific example
◦ A computer program was used to draw 100 normally distributed random errors (ε_t) with a theoretical variance equal to unity (i.e., 1)
◦ Beginning with t = 1, values of y_t were generated using the formula $y_t = 0.7y_{t-1} + \varepsilon_t$ and the initial condition y_0 = 0
Note that the problem of nonstationarity is avoided since the initial condition is consistent with long-run equilibrium
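A minimal NumPy replication of this experiment (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)       # arbitrary seed
T = 100
eps = rng.standard_normal(T + 1)      # unit-variance errors eps_0..eps_T
y = np.zeros(T + 1)                   # initial condition y_0 = 0
for t in range(1, T + 1):
    y[t] = 0.7 * y[t - 1] + eps[t]    # y_t = 0.7 y_{t-1} + eps_t
y = y[1:]                             # keep y_1..y_T
```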
In practice, we never know the true data-generating
process
Under the null hypothesis of an MA(0) process
◦ The standard deviation of r_1 is T^{-1/2} = 0.1
◦ The sample value of r_1 = 0.74 is more than seven standard deviations from zero
We can reject the null hypothesis that r_1 equals 0
All partial autocorrelations (except for lag 12) are less than 2T^{-1/2}
The decay of the ACF and the single spike of the PACF
give the strong impression of a first-order
autoregressive model
Let's compare the following two models

$$\text{Model 1: } y_t = a_1 y_{t-1} + \varepsilon_t$$

$$\text{Model 2: } y_t = a_1 y_{t-1} + \varepsilon_t + \beta_{12}\varepsilon_{t-12}$$
The estimated coefficients are $\hat{a}_1 = 0.795$ and $\hat{\beta}_{12} = -0.035$
The coefficient of Model 1 satisfies the stability condition |a_1| < 1 and has a low standard error (the t-statistic exceeds 12)
Diagnostic test
◦ We plot the correlogram of the residuals of the fitted model
The Ljung-Box Q-statistics of these residuals indicate
◦ that, as a group, the autocorrelations at lags 1 through 8, 1 through 16, and 1 through 24 are not significantly different from zero
For model 2
◦ Estimates for the first-order autoregressive coefficient and
the associated standard error yield similar results
Comparing the AIC and SBC values of both models
◦ Any benefit of a reduced SSE is overwhelmed by the detrimental effects of estimating an additional parameter
Conclusion
◦ We should select the AR(1) model
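A sketch of this comparison with statsmodels (assuming `y` is the simulated AR(1) series from the earlier snippet; statsmodels reports likelihood-based AIC/BIC, with BIC playing the role of the SBC):

```python
from statsmodels.tsa.arima.model import ARIMA

m1 = ARIMA(y, order=(1, 0, 0), trend="n").fit()     # Model 1: AR(1)
m2 = ARIMA(y, order=(1, 0, [12]), trend="n").fit()  # Model 2: adds MA lag 12
print(f"Model 1: AIC={m1.aic:.2f}  SBC={m1.bic:.2f}")
print(f"Model 2: AIC={m2.aic:.2f}  SBC={m2.bic:.2f}")
```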
A specific example
◦ A computer program was used to draw 100 normally distributed random errors (ε_t) with a theoretical variance equal to unity (1)
◦ Beginning with t = 1, values of y_t were generated using the formula $y_t = 0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$ and the initial conditions y_0 = 0 and ε_0 = 0
If the true data-generating process is unknown, we
can consider some other models that will generate
similar ACF and PACF
$$\text{Model 1: } y_t = a_1 y_{t-1} + \varepsilon_t$$

$$\text{Model 2: } y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1}$$

$$\text{Model 3: } y_t = a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t$$
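A sketch that fits all three candidates and tabulates the criteria (statsmodels assumed; `y` is the simulated ARMA(1,1) series, generated analogously to the earlier AR(1) snippet, and the same sample is used for every fit):

```python
from statsmodels.tsa.arima.model import ARIMA

candidates = {"AR(1)": (1, 0, 0), "ARMA(1,1)": (1, 0, 1), "AR(2)": (2, 0, 0)}
for name, order in candidates.items():
    res = ARIMA(y, order=order, trend="n").fit()  # same y, same T throughout
    print(f"{name:10s} AIC={res.aic:8.2f}  SBC={res.bic:8.2f}")
```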
All of the estimated values of a_1 are highly significant
◦ At least eight standard deviations from zero
◦ An autoregressive term at lag 1 is clearly appropriate
The coefficient estimates point to the presence of the AR(1) term
A specific example
◦ A computer program was used to draw 100 normally distributed random errors (ε_t) with a theoretical variance equal to unity (1)
◦ Beginning with t = 1, values of y_t were generated using the formula $y_t = 0.7y_{t-1} - 0.49y_{t-2} + \varepsilon_t$ and the initial conditions y_0 = 0 and y_1 = 0
[Figure: sample ACF and PACF for Y3, lags 0-20, with ±1.96/T^0.5 confidence bands]
Overall, the model appears to be adequate
◦ However, the AR(2) coefficients are unable to capture the correlations at very long lags
◦ For example, the calculated Ljung-Box statistic for 16 lags is significant at the 10% level (p < 0.10), which motivates the augmented model

$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t + \beta_{16}\varepsilon_{t-16}$$
Both the AIC and SBC indicate that the model including the MA term at lag 16 has a better fit
A useful check is to split the sample into two parts
◦ If a coefficient is present in the data-generating process, its
influence should be seen in both sub-samples
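A minimal sketch of this check for the β_16 term (splitting at the midpoint is my choice; `y` is the simulated AR(2) series, and the small sub-samples may trigger convergence warnings):

```python
from statsmodels.tsa.arima.model import ARIMA

half = len(y) // 2
for label, sub in [("first half", y[:half]), ("second half", y[half:])]:
    # Refit the AR(2) model with an MA term at lag 16 on each sub-sample
    res = ARIMA(sub, order=(2, 0, [16]), trend="n").fit()
    print(label, res.pvalues.round(3))  # is beta_16 significant in both halves?
```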
In both sub-samples
◦ The significance level of Q(16) cannot be maintained
◦ In other words, the correlation at lag 16 is not meaningful
Note
◦ Most sophisticated practitioners warn against trying to fit any
model to the very long lags
◦ In small samples, a few “unusual” observations can create the
appearance of significant autocorrelations at long lags
◦ The more general point is
We always need to be wary of our estimated models
We have developed some general ideas about model selection through the AR(1), ARMA(1,1), and AR(2) examples
The researcher visually examines the time plot of the series, the ACF, and the PACF
◦ Plotting the time path of the {y_t} sequence provides useful information concerning outliers, missing values, and structural breaks in the data
Each of the potential models is examined
◦ The various α_i and β_i coefficients are estimated and examined
Various diagnostic tests can be performed to ensure that the residuals from the estimated model mimic a white-noise process
A fundamental idea in the Box-Jenkins approach is the
principle of parsimony
◦ Parsimony should come as second nature to economists
In selecting an appropriate model, we need to be aware that several different models may have similar properties
◦ As an extreme example, the AR(1) model $y_t = 0.5y_{t-1} + \varepsilon_t$
◦ Has the equivalent infinite-order moving average representation

$$y_t = \varepsilon_t + 0.5\varepsilon_{t-1} + 0.25\varepsilon_{t-2} + 0.125\varepsilon_{t-3} + 0.0625\varepsilon_{t-4} + \cdots$$

◦ In most samples, approximating this MA(∞) with an MA(2) or MA(3) model will give a very good fit
However, the AR(1) model is the more parsimonious model and is preferred
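The MA(∞) weights can be generated mechanically; a sketch using statsmodels' arma2ma (lag-polynomial coefficients are passed lowest power first):

```python
from statsmodels.tsa.arima_process import arma2ma

# AR polynomial (1 - 0.5L), MA polynomial 1
print(arma2ma([1, -0.5], [1], lags=5))  # [1. 0.5 0.25 0.125 0.0625]
```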
Suppose we want to fit the model

$$(1 - a_1 L - a_2 L^2)\,y_t = (1 + \beta_1 L + \beta_2 L^2 + \beta_3 L^3)\,\varepsilon_t$$
A numerical example

$$(1 - 0.25L^2)\,y_t = (1 + 0.5L)\,\varepsilon_t$$

$$(1 - 0.5L)(1 + 0.5L)\,y_t = (1 + 0.5L)\,\varepsilon_t$$

$$y_t = 0.5\,y_{t-1} + \varepsilon_t$$
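The factorization can be verified by multiplying the lag polynomials (coefficients listed lowest power first, so convolution gives the product):

```python
import numpy as np

# (1 - 0.5L)(1 + 0.5L) should recover 1 - 0.25 L^2
print(np.convolve([1, -0.5], [1, 0.5]))  # [ 1.    0.   -0.25]
```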
In order to ensure the model is parsimonious, the various α_i and β_i coefficients should all have t-statistics of 2.0 or greater (p < 0.05)