Lecture 2

Lag selection, forecasting and ARDL models

Chikumbe, SE
21st June, 2023

Econometrics II
Department of Economics, Kwame Nkrumah University

Recap of last week

• OLS assumptions
  • Zero conditional mean
  • No serial correlation
• Static models → y_t = β_0 + β_1 x_t + u_t
• Distributed Lag (DL) models → y_t = β_0 + β_1 x_t + β_2 x_{t−1} + u_t
• Autoregressive (AR) models → y_t = β_0 + β_1 y_{t−1} + u_t
Roadmap for this week

• AR and DL model lag selection

• Autoregressive Distributed Lag (ARDL) model

• One period ahead forecasting and forecast errors

Lag selection

Putting the p in AR(p)

• We’ve looked at AR(1), AR(2), AR(3) etc.

• How do we know the correct order of an AR model?

• Why is it important to choose the correct order?

• The order, p, directly influences the zero conditional mean assumption

AR order and the zero conditional mean assumption

• We implement an AR(2) model: y_t = β_0 + β_1 y_{t−1} + β_2 y_{t−2} + u_t

• We find β_1 and β_2 to be highly statistically significant

• Zero conditional mean: E(u_t | y_{t−1}, y_{t−2}) = 0

• Now we implement an AR(3) model: y_t = β_0 + β_1 y_{t−1} + β_2 y_{t−2} + β_3 y_{t−3} + u_t

• We find β_3 to also be highly statistically significant → the omitted third lag was sitting in the AR(2)'s error term, so E(u_t | y_{t−1}, y_{t−2}) ≠ 0

• Estimating the AR(3) however: E(u_t | y_{t−1}, y_{t−2}, y_{t−3}) = 0
Choosing the p in AR(p)

• How do we know the correct order of an AR model?

• Three approaches
1. F-statistic approach

2. Bayes Information Criterion (BIC)

3. Akaike Information Criterion (AIC)

F-statistic approach

• Start with a model with as many lags as possible

• Test the significance of the last lag

• Do this until the last lag is statistically significant at the 5% significance level

• Advantage: intuitive

• Drawback: it will suggest too many lags some of the time


• Assume the true AR order is 5, so that the sixth coefficient is zero

• A test using the t-statistic will incorrectly reject the null 5% of the time

• → when the true value of p is five, this method will estimate p to be six 5% of the time.
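This general-to-specific procedure can be sketched with plain numpy OLS on simulated data (a hypothetical AR(1) series, not the ZAR/USD data; 1.96 is the 5% two-sided critical value):

```python
import numpy as np

def last_lag_tstat(y, p):
    """Fit an AR(p) by OLS and return the t-statistic on the p-th (last) lag."""
    n = len(y) - p
    X = np.column_stack([np.ones(n)] + [y[p - j: len(y) - j] for j in range(1, p + 1)])
    Y = y[p:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])            # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
    return beta[-1] / se

# hypothetical data: a true AR(1) with coefficient 0.3
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.3 * y[t - 1] + rng.standard_normal()

p = 6                                       # start with as many lags as plausible
while p > 1 and abs(last_lag_tstat(y, p)) < 1.96:
    p -= 1                                  # last lag insignificant: drop it, re-test
```

Because each t-test falsely rejects 5% of the time, the loop occasionally stops above the true order — exactly the overestimation drawback noted above.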

Autoregressive Model of %∆ ZAR/USD

Dependent Variable: %∆ ZAR/USD

                     AR(1)       AR(2)       AR(3)
%∆ ZAR/USD_{t−1}     0.307***    0.343***    0.344***
                     (0.051)     (0.054)     (0.054)
%∆ ZAR/USD_{t−2}                 −0.109**    −0.116**
                                 (0.054)     (0.057)
%∆ ZAR/USD_{t−3}                             0.023
                                             (0.055)
Intercept            0.004**     0.005**     0.004**
                     (0.002)     (0.002)     (0.002)
R²                   0.094       0.106       0.105
Bayes Information Criterion (BIC)

BIC(p) = ln(SSR(p)/T) + (p + 1) · ln(T)/T

• SSR(p) → the sum of squared residuals of the AR(p) model

• SSR = Σᵢ (ûᵢ)² → how much variation is not explained by our regression

• (p + 1) → the number of lags, plus 1 for the intercept

• ln(T)/T → the log of the number of observations, divided by the number of observations

Bayes Information Criterion (BIC)

 
BIC(p) = ln(SSR(p)/T) + (p + 1) · ln(T)/T

• SSR decreases as we add more AR terms

• (p + 1) · ln(T)/T increases as we add more AR terms

• BIC trades off these two forces

• We are interested in finding p̂, the value of p which minimizes BIC(p) among the possible choices of p

Bayes Information Criterion (BIC)

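The BIC trade-off can be reproduced numerically. A minimal numpy sketch on simulated data (a hypothetical AR(2) process, not the ZAR/USD series), fitting every candidate AR(p) on a common sample so the SSRs are comparable:

```python
import numpy as np

def info_criteria(y, p, max_p):
    """BIC and AIC of an AR(p) fit, estimated on the same T - max_p
    observations for every candidate p so that SSR(p) values are comparable."""
    T = len(y) - max_p
    Y = y[max_p:]
    X = np.column_stack([np.ones(T)] + [y[max_p - j: len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    ssr = np.sum((Y - X @ beta) ** 2)
    bic = np.log(ssr / T) + (p + 1) * np.log(T) / T
    aic = np.log(ssr / T) + (p + 1) * 2 / T
    return bic, aic

# hypothetical data: a true AR(2)
rng = np.random.default_rng(1)
y = np.zeros(1000)
for t in range(2, 1000):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()

max_p = 6
bic = {p: info_criteria(y, p, max_p)[0] for p in range(1, max_p + 1)}
p_hat = min(bic, key=bic.get)            # p̂ minimizes BIC(p)
```

Fitting every model on the same estimation sample is what makes the SSR, and hence the criteria, comparable across p.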
Akaike Information Criterion (AIC)

AIC(p) = ln(SSR(p)/T) + (p + 1) · 2/T

• ln(T) is replaced by 2 in the second term → the penalty term is smaller in the AIC than in the BIC

• In a dataset with 1000 observations


• ln(1000) ≈ 6.9, which is more than 3 times larger than 2

• → a smaller decrease in the SSR is needed to justify including another AR term

• In large samples, AIC will overestimate p̂ with nonzero probability

• Still useful, if you are concerned there are too few lags suggested by the BIC
Comparing the 3 approaches

Dependent Variable: %∆ ZAR/USD

                     AR(1)       AR(2)       AR(3)
%∆ ZAR/USD_{t−1}     0.307***    0.343***    0.344***
                     (0.051)     (0.054)     (0.054)
%∆ ZAR/USD_{t−2}                 −0.109**    −0.116**
                                 (0.054)     (0.057)
%∆ ZAR/USD_{t−3}                             0.023
                                             (0.055)
Intercept            0.004**     0.005**     0.004**
                     (0.002)     (0.002)     (0.002)
R²                   0.094       0.106       0.105
AIC                  −6.774      −6.782      −6.744
BIC                  −6.752      −6.749      −6.730

• F-statistic approach → AR(2)
• AIC → AR(2)
• BIC → AR(1)
Autoregressive Distributed Lag (ARDL)
models

Types of time series models

• Static models
  • y_t = β_0 + β_1 x_t + u_t

• Distributed lag (DL) models
  • y_t = β_0 + β_1 x_t + β_2 x_{t−1} + u_t

• Autoregressive (AR) models
  • y_t = β_0 + β_1 y_{t−1} + u_t

• Autoregressive distributed lag (ARDL) models
  • y_t = β_0 + β_1 y_{t−1} + β_2 x_{t−1} + u_t
ARDL models

y_t = β_0 + β_1 y_{t−1} + β_2 y_{t−2} + σ_1 x_{t−1} + σ_2 x_{t−2} + u_t

• ARDL(p, q), where p = order of the AR terms and q = order of the DL terms

• The example above is therefore an ARDL(2,2) model

• Combines the benefits of AR and DL models

• Zero conditional mean assumption
  • → at time t, the error term is independent of every explanatory variable (y and x), in every period

ARDL models

y_t = β_0 + β_1 y_{t−1} + β_2 y_{t−2} + σ_1 x_{t−1} + σ_2 x_{t−2} + u_t

• A note on ARDL convention
  • When using ARDL models, the convention is typically to exclude contemporaneous DL terms (no x_t)
  • → does not allow for contemporaneous effects
  • Why? In time series the focus is often on forecasting, and time-t values are not yet known when the forecast is made
  • However, including x_t is perfectly fine
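Estimating an ARDL(p, q) without the contemporaneous x_t term is ordinary OLS on a matrix of lags. A minimal numpy sketch, checked on simulated data with known coefficients (a hypothetical series, not the ZAR/USD example):

```python
import numpy as np

def fit_ardl(y, x, p, q):
    """OLS fit of y_t = b0 + b1 y_{t-1} + ... + bp y_{t-p}
                       + s1 x_{t-1} + ... + sq x_{t-q} + u_t.
    No contemporaneous x_t, per the convention above.
    Returns the coefficient vector [b0, b1..bp, s1..sq]."""
    m = max(p, q)
    T = len(y) - m
    cols = [np.ones(T)]
    cols += [y[m - j: len(y) - j] for j in range(1, p + 1)]   # lags of y
    cols += [x[m - j: len(x) - j] for j in range(1, q + 1)]   # lags of x
    X = np.column_stack(cols)
    return np.linalg.lstsq(X, y[m:], rcond=None)[0]

# hypothetical data: a true ARDL(1,1) with known coefficients
rng = np.random.default_rng(4)
x = rng.standard_normal(2001)
y = np.zeros(2001)
for t in range(1, 2001):
    y[t] = 0.01 + 0.3 * y[t - 1] - 0.2 * x[t - 1] + rng.standard_normal()

beta = fit_ardl(y, x, p=1, q=1)   # estimates should land near (0.01, 0.3, -0.2)
```

Adding a `x[m:]` column to `cols` would allow contemporaneous effects, which, as noted above, is perfectly fine when forecasting is not the goal.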

ARDL example: ARDL Model of the %∆ ZAR/USD

Dependent Variable: %∆ ZAR/USD

                     ARDL(1,1)    ARDL(2,2)
%∆ ZAR/USD_{t−1}     0.211***     0.221***
                     (0.074)      (0.081)
%∆ ZAR/USD_{t−2}                  −0.132*
                                  (0.079)
%∆ ALSI_{t−1}        −0.253***    −0.261***
                     (0.064)      (0.065)
%∆ ALSI_{t−2}                     −0.072
                                  (0.070)
Intercept            0.007**      0.008***
                     (0.003)      (0.003)
R²                   0.155        0.175
ARDL models and lag selection

BIC(k) = ln(SSR(k)/T) + (k + 1) · ln(T)/T
AIC(k) = ln(SSR(k)/T) + (k + 1) · 2/T

• k = p + q

• This can result in many different models needing to be tested
  • k = 4, with p = 1 and q = 3
  • k = 4, with p = 2 and q = 2
  • k = 4, with p = 3 and q = 1, etc.

• Convention (but not a requirement!): set p = q
  • If you do this, however, think about the implications for zero conditional mean
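Under the p = q convention the model search collapses to one dimension. A sketch on simulated data (a hypothetical ARDL(1,1) process; all numbers are illustrative), again fitting every candidate on a common sample so SSRs are comparable:

```python
import numpy as np

def ardl_bic(y, x, p, q, m):
    """BIC of an ARDL(p, q) with no contemporaneous x_t, fitted by OLS
    on a common sample of T - m observations (m >= max lag considered)."""
    T = len(y) - m
    cols = [np.ones(T)]
    cols += [y[m - j: len(y) - j] for j in range(1, p + 1)]
    cols += [x[m - j: len(x) - j] for j in range(1, q + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    ssr = np.sum((y[m:] - X @ beta) ** 2)
    return np.log(ssr / T) + (p + q + 1) * np.log(T) / T   # k = p + q

# hypothetical data: a true ARDL(1,1)
rng = np.random.default_rng(2)
x = rng.standard_normal(1500)
y = np.zeros(1500)
for t in range(1, 1500):
    y[t] = 0.3 * y[t - 1] - 0.25 * x[t - 1] + 0.1 * rng.standard_normal()

m = 4                                                     # largest lag considered
bic = {p: ardl_bic(y, x, p, p, m) for p in range(1, m + 1)}  # convention: p = q
p_hat = min(bic, key=bic.get)
```

Dropping the p = q restriction simply means looping over all (p, q) pairs and minimizing over the full grid.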

ARDL lag selection example: ARDL Model of the %∆ ZAR/USD

                     ARDL(1,1)    ARDL(2,2)
%∆ ZAR/USD_{t−1}     0.211***     0.221***
                     (0.074)      (0.081)
%∆ ZAR/USD_{t−2}                  −0.132*
                                  (0.079)
%∆ ALSI_{t−1}        −0.253***    −0.261***
                     (0.064)      (0.065)
%∆ ALSI_{t−2}                     −0.072
                                  (0.070)
Intercept            0.007**      0.008***
                     (0.003)      (0.003)
R²                   0.155        0.175
AIC                  −6.681       −6.676
BIC                  −6.624       −6.582
Forecasting

Forecasting is a core part of time series econometrics

• “What is your best forecast of next month’s unemployment rate?”

• Critical for policy making, planning, investment decisions etc.

• Given dependence in time series data, AR models are the workhorse model for
forecasting

Forecasting is a core part of time series econometrics

Figure: National Treasury forecasts of active vs passive debt stabilization policies

Forecasting is a core part of time series econometrics

• Contrast this with the cross sectional domain


• Example: predicting test scores in a cross section vs predicting them in a time series

• Time series models can be used to forecast, even if none of the coefficients have a
causal interpretation

Forecast Error

• We want to forecast y_{t+1} based on an AR(1) model
  • y_{t+1} = β_0 + β_1 y_t + u_{t+1}

• The true values of β_0 and β_1 are unknown
  • → we use the OLS estimators β̂_0 and β̂_1 from historical data as a proxy

• We then estimate y_{t+1} conditional on information at time t
  • ŷ_{t+1|t} = β̂_0 + β̂_1 y_t

• Forecast error: y_{t+1} − ŷ_{t+1|t}
  • The difference between the realized value of y_{t+1} and the forecasted value

A forecast is not a predicted value

• From lecture 2: the predicted value of y is ŷ

• Formally, ŷᵢ = β̂_0 + β̂_1 xᵢ

• Predicted values are calculated for observations in the sample used to estimate the regression

• Forecasts are made for a date that lies outside of the sample

The forecast error is not an OLS residual

• From lecture 2: the residual of a regression is the difference between the actual value of y and the predicted value ŷ, for observations in the sample

• Formally, ûᵢ = yᵢ − ŷᵢ = yᵢ − β̂_0 − β̂_1 xᵢ

• The forecast error is the difference between the future value of y, which is not contained in the sample, and the forecast of that future value

• "Out-of-sample" versus "in-sample"

Forecasting with an AR(1) model

%∆ER_t = 0.004 + 0.311 %∆ER_{t−1}

• Our forecast equation becomes

%∆ER_{t+1|t} = 0.004 + 0.311 %∆ER_t

• %∆ER_t = −0.023

• → %∆ER_{t+1|t} = 0.004 + 0.311 × (−0.023) ≈ −0.003

Forecasting with an AR(1) model

→ %∆ER_{t+1|t} = 0.004 + 0.311 × (−0.023) ≈ −0.003

• You are now told that %∆ER_{t+1} = −0.056

• Forecast error: %∆ER_{t+1} − %∆ÊR_{t+1|t} = −0.056 − (−0.003) = −0.053

• Our forecast overpredicts the realized value by 0.053
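The arithmetic of this worked example, as a quick check (values taken from the slides):

```python
b0_hat, b1_hat = 0.004, 0.311          # estimated AR(1) coefficients
er_t = -0.023                          # %∆ER observed at time t
forecast = b0_hat + b1_hat * er_t      # one-period-ahead forecast %∆ER_{t+1|t}
realized = -0.056                      # %∆ER_{t+1}, revealed later
forecast_error = realized - forecast   # realized minus forecast
print(round(forecast, 3), round(forecast_error, 3))  # → -0.003 -0.053
```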

Forecast uncertainty

• Forecast error can be divided into two parts


1. Uncertainty about the regression coefficients

2. Uncertainty about the future value of ut

• If there are few coefficients, (2) > (1)

• We’ll introduce the Root Mean Squared Forecast Error (RMSFE) that incorporates
both (1) and (2)

Root Mean Squared Forecast Error (RMSFE)

• Start with an ARDL(1,1): y_t = β_0 + β_1 y_{t−1} + β_2 x_{t−1} + u_t

• Forecast: ŷ_{t+1|t} = β̂_0 + β̂_1 y_t + β̂_2 x_t

• Forecast error: y_{t+1} − ŷ_{t+1|t} = u_{t+1} − [(β̂_0 − β_0) + (β̂_1 − β_1) y_t + (β̂_2 − β_2) x_t]

MSFE = E[(y_{t+1} − ŷ_{t+1|t})²] = σ_u² + var[(β̂_0 − β_0) + (β̂_1 − β_1) y_t + (β̂_2 − β_2) x_t]

• The RMSFE is then simply √MSFE

Root Mean Squared Forecast Error (RMSFE)

• The forecast error can be divided into two parts
  1. Uncertainty about the regression coefficients
  2. Uncertainty about the future value of u_t

• If there are few coefficients, (2) > (1)

• The Root Mean Squared Forecast Error (RMSFE) incorporates both (1) and (2)

• If (2) ≫ (1), then RMSFE ≈ √var(u_t)
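The approximation RMSFE ≈ √var(u_t) can be checked by Monte Carlo: estimate an AR(1) on a training window, forecast one step ahead, and compare the RMSFE of the out-of-sample errors with the true σ_u (simulated data; all numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma_u = 0.5
errors = []
for rep in range(500):
    # simulate an AR(1), fit on the first 300 obs, forecast observation 301
    y = np.zeros(302)
    for t in range(1, 302):
        y[t] = 0.4 * y[t - 1] + sigma_u * rng.standard_normal()
    X = np.column_stack([np.ones(300), y[:300]])
    b0, b1 = np.linalg.lstsq(X, y[1:301], rcond=None)[0]
    y_hat = b0 + b1 * y[300]                 # one-period-ahead forecast
    errors.append(y[301] - y_hat)            # out-of-sample forecast error

rmsfe = np.sqrt(np.mean(np.square(errors)))
# with only two coefficients, uncertainty (1) is small, so rmsfe ≈ sigma_u
```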

Forecast interval

• Similar in spirit to a confidence interval

• One major difference¹
  • Confidence interval: coefficient ± 1.96 × standard error
  • The confidence interval is justified by the CLT and therefore holds for a wide range of distributions of u_t

• The forecast error contains the future error u_{t+1}, so a forecast interval (e.g. ŷ_{t+1|t} ± 1.96 × RMSFE) additionally requires an assumption about the distribution of u_t, typically normality

¹ If this sounds a bit tricky, revisit your Chapter 4 lectures with Safia to refresh your grasp of confidence intervals, hypothesis tests and the central limit theorem
Forecasting interest rates


Figure: Fan chart of interest rate forecast from the SARB

