
MSF 566
Topic 3: Stationary Time Series

Topic 3 1
• Sample autocorrelations of stationary series
• Model selection criteria
• Properties of forecasts

Topic 3 2
• In practice, the theoretical mean, variance, and autocorrelations of a series are unknown to the researcher
◦ They are unknown population parameters
• Given that a series is stationary, we can use the sample mean, variance, and autocorrelations to estimate the parameters of the actual data-generating process

Topic 3 3
• Let there be T observations labeled y_1 through y_T
• The sample mean, variance, and autocorrelations are

\bar{y} = (1/T) \sum_{t=1}^{T} y_t

\hat{\sigma}^2 = (1/T) \sum_{t=1}^{T} (y_t - \bar{y})^2

and, for each value of s = 1, 2, ...,

r_s = \frac{\sum_{t=s+1}^{T} (y_t - \bar{y})(y_{t-s} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}
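These formulas are straightforward to compute directly. Below is a minimal Python sketch (not part of the original slides; the function name sample_acf is my own):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample mean, variance (1/T convention), and autocorrelations r_1..r_s."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    y_bar = y.mean()                          # sample mean
    dev = y - y_bar
    sigma2 = np.sum(dev ** 2) / T             # sample variance
    r = np.array([np.sum(dev[s:] * dev[:-s]) / np.sum(dev ** 2)
                  for s in range(1, max_lag + 1)])
    return y_bar, sigma2, r
```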

Topic 3 4
• The sample ACF and PACF can be compared to various theoretical functions to help identify the actual nature of the data-generating process

• The sampling variance of r_s, denoted Var(r_s), was derived by Box and Jenkins (1976):

Var(r_s) = T^{-1}  for s = 1
Var(r_s) = T^{-1} (1 + 2 \sum_{j=1}^{s-1} r_j^2)  for s > 1

• In large samples (large T), r_s will be normally distributed with a mean equal to zero

Topic 3 5
• We can use sample values to form the sample ACF and PACF and test for significance
◦ If the true value of r_s = 0, the true data-generating process is an MA(s-1) [note: this is the null hypothesis]
◦ For example
 If we use a 95% confidence interval (i.e., two standard deviations) and the calculated value of r_1 exceeds 2T^{-1/2}, we can reject the null hypothesis that the first-order autocorrelation equals zero:

t_{0.025, \infty} \approx 1.96

t_1 = (r_1 - 0) / \sqrt{Var(r_1)} = 2T^{-1/2} / T^{-1/2} = 2

 Rejecting this hypothesis means rejecting an MA(s-1) = MA(0) process and accepting the alternative q > 0
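As an illustration of this test (a sketch, not from the slides; bartlett_se is a hypothetical helper name), the Bartlett variances and the two-standard-deviation rule can be coded as:

```python
import numpy as np

def bartlett_se(r, T):
    """Standard errors of r_1..r_s under the null of an MA(s-1) process."""
    r = np.asarray(r, dtype=float)
    var = np.empty_like(r)
    var[0] = 1.0 / T                              # Var(r_1) = 1/T
    for s in range(1, len(r)):                    # Var(r_s) for s > 1
        var[s] = (1.0 + 2.0 * np.sum(r[:s] ** 2)) / T
    return np.sqrt(var)

# Next slide's example: r_1 = 0.5, T = 100 -> Var(r_2) = (1 + 2(0.5)^2)/100 = 0.015
se = bartlett_se(np.array([0.5, 0.3]), T=100)     # r_2 = 0.3 is made up
reject = np.abs([0.5, 0.3]) > 2 * se              # rough 95% rule
```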

Topic 3 6
• The next step
◦ Try s = 2
◦ For example
 Var(r_2) is (1 + 2r_1^2)/T. If r_1 = 0.5 and T = 100,
 Var(r_2) = 0.015 (the standard deviation is 0.123)
 If r_2 > 2 × 0.123, it is possible to reject the null hypothesis r_2 = 0
◦ Note:
 The maximum number of sample autocorrelations and partial autocorrelations to use is typically set equal to T/4

Topic 3 7
• Under the null hypothesis of an AR(p) model
◦ i.e., under the null that all \phi_{p+i,p+i} are zero
◦ The variance of each \hat{\phi}_{p+i,p+i} is approximately 1/T

Topic 3 8
• Within any large group of autocorrelations, some will exceed two standard deviations as a result of pure chance, even though the true values in the data-generating process are zero
• The Q-statistic can be used to test whether a group of autocorrelations is significantly different from zero

Topic 3 9
• Box and Pierce (1970) used the sample autocorrelations to form the statistic

Q = T \sum_{k=1}^{s} r_k^2

• Under the null hypothesis that all values of r_k = 0, Q is asymptotically χ² distributed with s degrees of freedom
◦ The intuition is that high sample autocorrelations lead to large values of Q
◦ A white-noise process would have a Q value close to zero
◦ If the calculated value of Q exceeds the appropriate value in a χ² table, we reject the null hypothesis
 Note that rejecting the null means accepting the alternative that at least one autocorrelation is not zero

Topic 3 10
• The problem with the Box-Pierce Q-statistic
◦ It works poorly even in moderately large samples
• Ljung and Box (1978) reported superior small-sample performance for the modified Q-statistic calculated as

Q = T(T+2) \sum_{k=1}^{s} r_k^2 / (T-k)

◦ If the sample value of Q exceeds the critical value of χ² with s degrees of freedom, then at least one value of r_k is statistically different from zero at the specified significance level
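A direct implementation of the modified statistic might look like the following sketch (statsmodels' acorr_ljungbox offers a ready-made version; the function name ljung_box is my own):

```python
import numpy as np
from scipy import stats

def ljung_box(r, T):
    """Ljung-Box Q for sample autocorrelations r_1..r_s and its chi-square p-value."""
    r = np.asarray(r, dtype=float)
    k = np.arange(1, len(r) + 1)
    Q = T * (T + 2) * np.sum(r ** 2 / (T - k))
    p_value = stats.chi2.sf(Q, df=len(r))     # survival function = 1 - CDF
    return Q, p_value
```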

Topic 3 11
• The Box-Pierce and Ljung-Box Q-statistics can be used to check whether the residuals from an estimated ARMA(p,q) model behave as a white-noise process
◦ Note that, when the s correlations from an estimated ARMA(p,q) model are formed, the degrees of freedom are reduced by the number of estimated coefficients
◦ Using the residuals of an ARMA(p,q) model, Q has a χ² distribution with (s-p-q) degrees of freedom
 If a constant is included, the degrees of freedom are (s-p-q-1)
• A common practice is to simply report the residual Q-statistics using s degrees of freedom

Topic 3 12
• By analogy with what we have learned in statistics courses
◦ Adding additional lags for p and/or q will necessarily reduce the sum of squares of the estimated residuals (reduces the SSE, increases the SSR, and consequently increases R-square)
◦ Adding such lags entails the estimation of additional coefficients and an associated loss of degrees of freedom
 It can also reduce the forecasting performance of the fitted model
• Trade-off
◦ A reduction in the sum of squares of the residuals (SSE)
◦ A more parsimonious model

Topic 3 13
• Two commonly used model selection criteria are
◦ the Akaike Information Criterion (AIC)

AIC = T \ln(\text{sum of squared residuals}) + 2n

◦ the Schwarz Bayesian Criterion (SBC)

SBC = T \ln(\text{sum of squared residuals}) + n \ln(T)

◦ Note
 n is the number of parameters estimated (p + q + a possible constant term)
 T is the number of usable observations
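For concreteness, a minimal sketch computing both criteria from a residual vector (the function name aic_sbc is my own):

```python
import numpy as np

def aic_sbc(residuals, n_params):
    """AIC and SBC in the slides' T*ln(SSE) form."""
    e = np.asarray(residuals, dtype=float)
    T = len(e)
    sse = np.sum(e ** 2)                      # sum of squared residuals
    aic = T * np.log(sse) + 2 * n_params
    sbc = T * np.log(sse) + n_params * np.log(T)
    return aic, sbc
```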

Topic 3 14
• When you estimate a model using lagged variables, some observations are lost
◦ To adequately compare the alternative models, T should be kept fixed
◦ Otherwise, you will be comparing the performance over different sample periods
◦ Decreasing T also decreases the AIC and SBC
 The goal is not to select a model because it has the smallest number of usable observations
• Example
◦ With 100 data points, estimate an AR(1) and an AR(2) using only the last 98 observations in each estimation
 Compare the two models using T = 98

Topic 3 15
• Ideally, the AIC and SBC will be as small as possible
◦ Note that as the fit of the model improves, the AIC and SBC approach -∞ (they can be negative)
◦ Model A is said to fit better than Model B if the AIC (or SBC) for A is smaller than that for Model B
 Estimate the two models over the same time period to make them comparable

Topic 3 16
AIC  T ln( sum of squared residuals)  2n

SBC  T ln( sum of squared residuals)  n ln(T)


 Increase the number of regressors will reduce SSE
◦ If a regressor has no explanatory power, adding it to the
model will cause both the AIC and SBC increase
 Note that the SSE will not be reduced

◦ When ln(T)>2
 The marginal cost of adding regressors is greater with the SBC
than with the AIC

Topic 3 17
• The SBC has superior large-sample properties
◦ Let the true order of the data-generating process be (p*, q*), and suppose that we use the AIC and SBC to estimate all ARMA models of order (p, q)
◦ Both the AIC and SBC will select models of orders greater than or equal to (p*, q*) as the sample size approaches infinity
◦ However, the SBC is asymptotically consistent, while the AIC is biased toward selecting an overparameterized model

Topic 3 18
• The AIC works better than the SBC in small samples
• We can be quite confident if both the AIC and SBC select the same model
• If the AIC and SBC select different models, we need to proceed cautiously
◦ The SBC selects the more parsimonious model
 Check whether its residuals behave like white noise
◦ The AIC selects the overparameterized model
 The t-statistics of all its coefficients should be significant at conventional levels

Topic 3 19
• In our textbook (page 41) and in EViews,

AIC* = -2\ln(L)/T + 2n/T
SBC* = -2\ln(L)/T + n\ln(T)/T

• For a normal distribution,

-2\ln(L) = T\ln(2\pi) + T\ln(\sigma^2) + (1/\sigma^2)(\text{sum of squared residuals})

• We thus have the reduced forms of the AIC and SBC:

AIC* = \ln(\hat{\sigma}^2) + 2n/T    (compare: AIC = T \ln(\text{sum of squared residuals}) + 2n)
SBC* = \ln(\hat{\sigma}^2) + n\ln(T)/T    (compare: SBC = T \ln(\text{sum of squared residuals}) + n \ln(T))

Topic 3 20
L(\beta_0, \beta_1, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2\right]

\ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2

-2\ln L = n\ln 2\pi + n\ln\sigma^2 + \frac{1}{\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2

Topic 3 21
• We can prove that the AIC and AIC* will select the same model:

AIC* = \ln(\hat{\sigma}^2) + 2n/T
AIC = T \ln(SSE) + 2n

• It is obvious that if T is fixed, the AIC and AIC* are equivalent and will select the same model, since

AIC = T[\ln(SSE) + 2n/T - \ln(T-1)] + T\ln(T-1)
    = T[\ln(SSE/(T-1)) + 2n/T] + T\ln(T-1)
    = T[\ln(\hat{\sigma}^2) + 2n/T] + T\ln(T-1)
    = T \cdot AIC* + T\ln(T-1)

Topic 3 22
• A specific example
◦ A computer program was used to draw 100 normally distributed random errors (ε_t) with a theoretical variance equal to unity (i.e., 1)
◦ Beginning with t = 1, values of y_t were generated using the formula y_t = 0.7 y_{t-1} + ε_t and the initial condition y_0 = 0
 Note that the problem of nonstationarity is avoided since the initial condition is consistent with long-run equilibrium
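The experiment is easy to reproduce. A minimal sketch (the helper name simulate_arma is my own, and it is reused in the later examples; the seed choice is arbitrary):

```python
import numpy as np

def simulate_arma(ar, ma, T=100, seed=0):
    """Simulate y_t = sum_i ar[i]*y_{t-1-i} + eps_t + sum_j ma[j]*eps_{t-1-j}
    with zero initial conditions, as in the slides' examples."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)              # N(0,1) errors, variance = 1
    y = np.zeros(T)
    for t in range(T):
        ar_part = sum(a * y[t - 1 - i] for i, a in enumerate(ar) if t - 1 - i >= 0)
        ma_part = sum(b * eps[t - 1 - j] for j, b in enumerate(ma) if t - 1 - j >= 0)
        y[t] = ar_part + eps[t] + ma_part
    return y

y1 = simulate_arma(ar=[0.7], ma=[])           # this slide's AR(1) process
```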

Topic 3 23
Topic 3 24
• In practice, we never know the true data-generating process
• We can compare the sample ACF and PACF to those of the various theoretical models
◦ The decaying pattern of the ACF and the single large spike at lag 1 in the sample PACF suggest an AR(1) model
◦ The first three autocorrelations are r_1 = 0.74, r_2 = 0.58, and r_3 = 0.47
 Greater than the theoretical values of 0.7, 0.7² (= 0.49), and 0.7³ (= 0.343)
◦ In the PACF, there is a sizable spike of 0.74 at lag 1, and all other partial autocorrelations are very small

Topic 3 25
• Under the null hypothesis of an MA(0) process
◦ The standard deviation of r_1 is T^{-1/2} = 0.1
◦ The sample value r_1 = 0.74 is more than seven standard deviations from zero
 We can reject the null hypothesis that r_1 equals 0
◦ The variance of r_2 is Var(r_2) = [1 + 2(0.74)²]/100 ≈ 0.021
◦ The sample value r_2 = 0.58 is more than three standard deviations [note that (0.021)^{1/2} ≈ 0.1449] from zero
 We can reject the null hypothesis that r_2 equals 0
◦ Similarly, we can check the significance of the other autocorrelations

Topic 3 26
• All partial autocorrelations (except for lag 12) are less than 2T^{-1/2}
• The decay of the ACF and the single spike of the PACF give the strong impression of a first-order autoregressive model
• However, if we did not know the true underlying model and happened to be using monthly data, we might be concerned with the significant partial autocorrelation at lag 12
◦ We would suspect some direct relationship between y_t and y_{t-12}

Topic 3 27
• Let's compare the following two models

Model 1: y_t = a_1 y_{t-1} + ε_t
Model 2: y_t = a_1 y_{t-1} + ε_t + β_{12} ε_{t-12}

[Estimation results table: a_1 ≈ 0.795; β_{12} ≈ -0.035; AIC/SBC values 2.706, 2.732, 2.272, 2.778]

Topic 3 28
• The coefficient of Model 1 satisfies the stability condition |a_1| < 1 and has a low standard error (t > 12)
• Diagnostic test
◦ We plot the correlogram of the residuals of the fitted model

Topic 3 29
• The Ljung-Box Q-statistics of these residuals indicate that,
◦ as a group, the autocorrelations at lags 1 through 8, 1 through 16, and 1 through 24 are not significantly different from zero
• This is strong evidence that the AR(1) model "fits" the data well

Topic 3 30
• For Model 2
◦ The estimates of the first-order autoregressive coefficient and the associated standard error yield similar results
◦ The estimate of β_{12} is of poor quality
 It is insignificant and should be dropped from the model

Topic 3 31
• Comparing the AIC and SBC values of both models
◦ Any benefit of a reduced SSE is overwhelmed by the detrimental effects of estimating an additional parameter
• Conclusion
◦ We should select the AR(1)

Topic 3 32
• A specific example
◦ A computer program was used to draw 100 normally distributed random errors (ε_t) with a theoretical variance equal to unity (1)
◦ Beginning with t = 1, values of y_t were generated using the formula y_t = 0.7 y_{t-1} + ε_t - 0.7 ε_{t-1} and the initial conditions y_0 = 0 and ε_0 = 0
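Using the simulate_arma sketch from the AR(1) example, this series can be generated as:

```python
y2 = simulate_arma(ar=[0.7], ma=[-0.7])   # y_t = 0.7 y_{t-1} + eps_t - 0.7 eps_{t-1}
```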

Topic 3 33
Topic 3 34
• If the true data-generating process is unknown, we can consider some other models that will generate similar ACF and PACF patterns

Model 1: y_t = a_1 y_{t-1} + ε_t
Model 2: y_t = a_1 y_{t-1} + ε_t + β_1 ε_{t-1}
Model 3: y_t = a_1 y_{t-1} + a_2 y_{t-2} + ε_t
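One way to estimate and compare the three candidates is statsmodels' ARIMA estimator (a sketch, assuming y2 holds the simulated series; statsmodels reports the SBC as bic):

```python
from statsmodels.tsa.arima.model import ARIMA

for order in [(1, 0, 0), (1, 0, 1), (2, 0, 0)]:      # Models 1, 2, 3
    fit = ARIMA(y2, order=order, trend="n").fit()    # "n": no constant term
    print(order, round(fit.aic, 3), round(fit.bic, 3))
```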

Topic 3 35
Topic 3 36
• All of the estimated values of a_1 are highly significant
◦ At least eight standard deviations from zero
◦ An AR(1) component is appropriate
• The Q-statistics for Model 1 indicate that there is significant autocorrelation in the residuals
◦ The estimated ARMA(1,1) model does not suffer from this problem
• Both the AIC and SBC select Model 2 over Model 1

Topic 3 37
• The coefficient estimates are consistent with an AR(1) component
• The Q-statistics at 24 lags indicate that these two models do not suffer from correlated residuals
• The Q-statistics at shorter lags for Model 3, however, indicate serial correlation in its residuals
◦ The AR(2) model does not capture the short-term dynamics as well as the ARMA(1,1)
◦ Both the AIC and SBC select Model 2

Topic 3 38
• A specific example
◦ A computer program was used to draw 100 normally distributed random errors (ε_t) with a theoretical variance equal to unity (1)
◦ Beginning with t = 1, values of y_t were generated using the formula y_t = 0.7 y_{t-1} - 0.49 y_{t-2} + ε_t and the initial conditions y_0 = 0 and y_1 = 0
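With the simulate_arma sketch from the AR(1) example, this series is:

```python
y3 = simulate_arma(ar=[0.7, -0.49], ma=[])   # y_t = 0.7 y_{t-1} - 0.49 y_{t-2} + eps_t
```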

Topic 3 39
[Figure: correlograms of the simulated series Y3. Top panel: ACF for Y3; bottom panel: PACF for Y3; lags 0-20, with ±1.96/T^0.5 confidence bands]

Topic 3 40
Topic 3 41
Topic 3 42
• Overall, the model appears to be adequate
◦ However, the AR(2) coefficients are unable to capture the correlations at very long lags
◦ For example, the calculated Ljung-Box statistic for 16 lags is significant at the 10% level (p < 0.10)
• We might need to consider the following model:

y_t = a_1 y_{t-1} + a_2 y_{t-2} + ε_t + β_{16} ε_{t-16}

Topic 3 43
Topic 3 44
Topic 3 45
• Both the AIC and SBC indicate that the model including an MA(16) term has a better fit
• The diagnostic tests of the residuals also indicate a better fit for the model with an MA(16) term
• If the researcher does not know the true data-generating process, Model 2 will be selected
◦ The conclusion would be that the data-generating process includes a moving average term at lag 16

Topic 3 46
• A useful check is to split the sample into two parts
◦ If a coefficient is present in the data-generating process, its influence should be seen in both sub-samples
◦ We can split the 100 sample points into two sub-samples (see the sketch below)
 The 1st sub-sample: observations 1-50
 The 2nd sub-sample: observations 51-100
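A sketch of the split-sample check, re-estimating the lag-16 specification from above on each half:

```python
halves = [y3[:50], y3[50:]]                   # observations 1-50 and 51-100
for sub in halves:
    fit = SARIMAX(sub, order=([1, 2], 0, [16])).fit(disp=False)
    print(fit.pvalues)                        # is the lag-16 term still significant?
```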

Topic 3 47
Topic 3 48
Topic 3 49
Topic 3 50
Topic 3 51
• In both sub-samples
◦ The significance levels of Q(16) cannot be maintained
◦ In other words, the correlation at lag 16 is not meaningful
• Conclusion: the AR(2) is the better fit
• Note
◦ Most sophisticated practitioners warn against trying to fit any model to the very long lags
◦ In small samples, a few "unusual" observations can create the appearance of significant autocorrelations at long lags
◦ The more general point is
 We always need to be wary of our estimated models

Topic 3 52
• We now have some general ideas about selecting a model, based on the AR(1), ARMA(1,1), and AR(2) examples
• Box and Jenkins popularized a three-stage method aimed at selecting an appropriate model for the purpose of estimating and forecasting a univariate time series

Topic 3 53
• Stage 1: the researcher visually examines the time plot of the series, the ACF, and the PACF
◦ Plotting the time path of the {y_t} sequence provides useful information concerning outliers, missing values, and structural breaks in the data
◦ Nonstationary variables may have a pronounced trend or appear to meander without a constant long-run mean or variance
◦ Missing values and outliers can be corrected at this point
◦ A comparison of the sample ACF and PACF to those of various theoretical ARMA processes may suggest several plausible models

Topic 3 54
• Stage 2: each of the potential models is estimated
◦ The various α_i and β_i coefficients are estimated and examined
◦ The goal is to select a stationary and parsimonious model that has a good fit

Topic 3 55
• Stage 3: various diagnostic tests are performed to ensure that the residuals from the estimated model mimic a white-noise process

Topic 3 56
• A fundamental idea in the Box-Jenkins approach is the principle of parsimony
◦ Parsimony should come as second nature to economists
◦ Incorporating additional coefficients will necessarily increase the fit (reducing the SSE and increasing R-square)
 But at the cost of reducing degrees of freedom
◦ Box and Jenkins argue that parsimonious models produce better forecasts than overparameterized models
• A parsimonious model fits the data well without incorporating any needless coefficients
◦ The aim is to approximate the true data-generating process

Topic 3 57
• In selecting an appropriate model, we need to be aware that several different models may have similar properties
◦ As an extreme example, the AR(1) model y_t = 0.5 y_{t-1} + ε_t has the equivalent infinite-order moving average representation

y_t = ε_t + 0.5 ε_{t-1} + 0.25 ε_{t-2} + 0.125 ε_{t-3} + 0.0625 ε_{t-4} + ...

◦ In most samples, approximating this MA(∞) with an MA(2) or MA(3) model will give a very good fit
 However, the AR(1) model is the more parsimonious one and is preferred

Topic 3 58
• Suppose we want to fit the model

(1 - a_1 L - a_2 L^2) y_t = (1 + β_1 L + β_2 L^2 + β_3 L^3) ε_t

• Also suppose that

(1 - a_1 L - a_2 L^2) = (1 + cL)(1 + aL)
(1 + β_1 L + β_2 L^2 + β_3 L^3) = (1 + cL)(1 + b_1 L + b_2 L^2)

• Note that (1 + cL) is the common factor

Topic 3 59
• A numerical example:

(1 - 0.25 L^2) y_t = (1 + 0.5 L) ε_t  ⇒
(1 + 0.5 L)(1 - 0.5 L) y_t = (1 + 0.5 L) ε_t  ⇒
y_t = 0.5 y_{t-1} + ε_t
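One quick numerical check for a (near-)common factor is to compare the roots of the two lag polynomials (a sketch using numpy.roots, with coefficients ordered from the highest power down):

```python
import numpy as np

print(np.roots([-0.25, 0.0, 1.0]))   # roots of 1 - 0.25 L^2 -> L = 2, -2
print(np.roots([0.5, 1.0]))          # roots of 1 + 0.5 L    -> L = -2
# the shared root L = -2 corresponds to the common factor (1 + 0.5L)
```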

• In practice, the polynomials will not factor exactly
◦ However, if the factors are similar, you should try a more parsimonious form

Topic 3 60
• To help ensure the model is parsimonious, the various α_i and β_i coefficients should all have t-statistics of 2.0 or greater (p < 0.05)
• The coefficients should not be strongly correlated with each other
◦ A multicollinearity problem will lead to unstable estimates
◦ Usually one or more such coefficients can be eliminated from the model without reducing forecast performance

Topic 3 61
