
VAR models

Fabio Canova
EUI and CEPR
December 2012
Outline

• Preliminaries.

• Wold Theorem and VAR Specification.

• Coefficients and covariance matrix estimation.

• Computing impulse responses, variance and historical decompositions.

• Structural VARs.

• Application to the transmission of monetary policy shocks.

• Specification and interpretation problems.


References

Hamilton, J. (1994), Time Series Analysis, Princeton University Press, Princeton, NJ, ch. 10-11.

Canova, F. (1995), "VAR Models: Specification, Estimation, Inference and Forecasting", in H. Pesaran and M. Wickens (eds.), Handbook of Applied Econometrics, Ch. 2, Blackwell, Oxford, UK.

Canova, F. (1995), "The Economics of VAR Models", in K. Hoover (ed.), Macroeconometrics: Tensions and Prospects, Kluwer Press, NY.

Blanchard, O. and Quah, D. (1989), "The Dynamic Effect of Aggregate Demand and Supply Disturbances", American Economic Review, 79, 655-673.

Canova, F. and Pina, J. (2005), "Monetary Policy Misspecification in VAR Models", in C. Diebolt and C. Kyrtsou (eds.), New Trends in Macroeconomics, Springer Verlag.

Canova, F. and De Nicolo, G. (2002), "Money Matters for Business Cycle Fluctuations in the G7", Journal of Monetary Economics, 49, 1131-1159.

Canova, F. and Ciccarelli, M. (2009), "Estimating Multi-country VARs", International Economic Review, 50, 929-961.

Canova, F. and Paustian, M. (2011), "Business Cycle Measurement with Some Theory", forthcoming, Journal of Monetary Economics.

Carlstrom, C., Fuerst, T. and Paustian, M. (2009), "Monetary Policy Shocks, Choleski Identification and DNK Models", Journal of Monetary Economics, 56, 1014-1021.

Cooley, T. and Dwyer, M. (1998), "Business Cycle Analysis without Much Theory: A Look at Structural VARs", Journal of Econometrics, 83, 57-88.

Favara, G. and Giordani, P. (2009), "Reconsidering the Role of Money for Output, Prices and Interest Rates", Journal of Monetary Economics, 56, 419-430.

Faust, J. (1998), "On the Robustness of Identified VAR Conclusions about Money", Carnegie-Rochester Conference Series on Public Policy, 49, 207-244.

Faust, J. and Leeper, E. (1997), "Do Long Run Restrictions Really Identify Anything?", Journal of Business and Economic Statistics, 15, 345-353.

Fry, R. and Pagan, A. (2009), "Sign Restrictions in Structural Vector Autoregressions: A Critical Review", forthcoming, Journal of Economic Literature.

Hansen, L. and Sargent, T. (1991), "Two Difficulties in Interpreting Vector Autoregressions", in Hansen, L. and Sargent, T. (eds.), Rational Expectations Econometrics, Westview Press, Boulder and London.

Kilian, L. (1998), "Small Sample Confidence Intervals for Impulse Response Functions", Review of Economics and Statistics, 218-230.

Kilian, L. and Murphy, D. (2011), "Why Agnostic Sign Restrictions Are Not Enough: Understanding the Dynamics of Oil Market VAR Models", forthcoming, Journal of the European Economic Association.

Leeper, E. M. and Roush, J. E. (2003), "Putting 'M' back in Monetary Policy", Journal of Money, Credit and Banking, 35, 1257-1264.

Lippi, M. and Reichlin, L. (1993), "The Dynamic Effect of Aggregate Demand and Supply Disturbances: A Comment", American Economic Review, 83, 644-652.

Lippi, M. and Reichlin, L. (1994), "VAR Analysis, Non-Fundamental Representations, Blaschke Matrices", Journal of Econometrics, 63, 307-325.

Lanne, M. and Lutkepohl, H. (2008), "Identifying Monetary Policy Shocks via Changes in Volatility", Journal of Money, Credit and Banking, 40, 1131-1149.

Marcet, A. (1991), "Time Aggregation of Econometric Time Series", in Hansen, L. and Sargent, T. (eds.), Rational Expectations Econometrics, Westview Press, Boulder and London.

Rigobon, R. (2003), "Identification through Heteroskedasticity", Review of Economics and Statistics, 85, 777-792.

Rubio, J., Waggoner, D. and Zha, T. (2010), "Structural Vector Autoregressions: Theory of Identification and Algorithms for Inference", Review of Economic Studies, 77, 665-696.

Sims, C. and Zha, T. (1999), "Error Bands for Impulse Responses", Econometrica, 67, 1113-1155.

Chari, V., Kehoe, P. and McGrattan, E. (2008), "Are Structural VARs with Long Run Restrictions Useful in Developing Business Cycle Theory?", Journal of Monetary Economics, 55, 1337-1352.

Fernandez-Villaverde, J., Rubio-Ramirez, J., Sargent, T. and Watson, M. (2007), "ABCs (and Ds) of Understanding VARs", American Economic Review, 97, 1021-1026.

Uhlig, H. (2005), "What Are the Effects of Monetary Policy? Results from an Agnostic Identification Procedure", Journal of Monetary Economics, 52, 381-419.

Erceg, C., Guerrieri, L. and Gust, C. (2005), "Can Long Run Restrictions Identify Technology Shocks?", Journal of the European Economic Association, 3, 1237-1278.

Dedola, L. and Neri, S. (2007), "What Does a Technology Shock Do? A VAR Analysis with Model-Based Sign Restrictions", Journal of Monetary Economics, 54, 512-549.
1 Preliminaries

• Lag operator: ℓyt = yt−1; ℓiyt = yt−i, ℓ−iyt = yt+i, yt m × 1 vector.

• Matrix lag operator (normalization a0 = I):

a(ℓ)yt ≡ a0yt + a1ℓyt + a2ℓ2yt + . . . + aqℓqyt
       = yt + a1yt−1 + a2yt−2 + . . . + aqyt−q   (1)

Example 1 yt = et + d1et−1 + d2et−2. Using the lag operator, yt = et + d1ℓet + d2ℓ2et, or yt = (1 + d1ℓ + d2ℓ2)et ≡ d(ℓ)et.

Example 2 yt = a1yt−1 + et. Using the lag operator, yt = a1ℓyt + et, or (1 − a1ℓ)yt ≡ a(ℓ)yt = et.
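As a quick numerical check of Example 2 (a sketch, not part of the slides; the value a1 = 0.7 is illustrative), applying the lag polynomial a(ℓ) = 1 − a1ℓ to a simulated AR(1) recovers the news:

```python
import numpy as np

# Simulate y_t = a1*y_{t-1} + e_t, then verify (1 - a1*l)y_t = e_t
rng = np.random.default_rng(0)
a1, T = 0.7, 200
e = rng.standard_normal(T)

y = np.zeros(T)
for t in range(1, T):
    y[t] = a1 * y[t - 1] + e[t]

# Apply the lag polynomial a(l) = 1 - a1*l: a(l)y_t = y_t - a1*y_{t-1}
recovered = y[1:] - a1 * y[:-1]
assert np.allclose(recovered, e[1:])
```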
2 What are VARs?

- They are multivariate autoregressive linear time series models of the form

yt = A1yt−1 + A2yt−2 + . . . + Aq yt−q + et et ∼ (0, Σe) (2)


where

- yt is an m × 1 vector

- Aj are full rank m × m matrices, j = 1, . . . , q

- Σe is a full rank m × m matrix.


Advantages of VARs over simple regression models:

- Every variable is endogenous (no incredible exogeneity assumptions).

- Every variable depends on the others (no incredible exclusion restrictions).

- Simple to estimate and use.

General disadvantages:

- It is a reduced form model: no economic interpretation of the dynamics is possible.

- It is potentially difficult to relate VAR dynamics to DSGE dynamics (which have an ARMA structure). It is not always possible to approximate an n-dimensional ARMA(p1, p2) with an m-dimensional VAR(q) when n ≥ m.
- If the reference models are DSGEs, why aren't we using VARMA models?

1) Inference in VARs is easier.

2) VARMA models have identification and numerical problems (the likelihood function may not be globally concave).
Example 3

yt = εt − φεt−1,  εt ∼ N(0, σ2)   (3)
yt = εt − (1/φ)εt−1,  εt ∼ N(0, σ̃2)   (4)

have the same likelihood function (with σ̃2 = φ2σ2). Identification problem!

Example 4

yt = αyt−1 + ǫt − θǫt−1   (5)

The likelihood is flat for α ≈ θ. Lack of identifiability creates numerical problems, especially in multivariate setups.
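Example 3 can be checked numerically. The sketch below (illustrative values; it assumes σ̃2 = φ2σ2) compares the autocovariances of the two MA(1) parameterizations; identical autocovariances imply identical Gaussian likelihoods:

```python
import numpy as np

# Two MA(1) parameterizations that are observationally equivalent:
#   y_t = eps_t - phi*eps_{t-1},     eps ~ (0, sigma2)
#   y_t = eps_t - (1/phi)*eps_{t-1}, eps ~ (0, phi^2*sigma2)
def ma1_autocov(theta, sigma2):
    """Autocovariances (gamma0, gamma1) of y_t = eps_t - theta*eps_{t-1}."""
    return (1 + theta**2) * sigma2, -theta * sigma2

phi, sigma2 = 0.5, 1.0
g_a = ma1_autocov(phi, sigma2)
g_b = ma1_autocov(1 / phi, phi**2 * sigma2)
assert np.allclose(g_a, g_b)   # identical second moments: identification problem
```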
3 Wold theorem and the news

Wold Theorem: Under linearity and stationarity, any vector of time series y†t can be represented as y†t = ay−∞ + ∑j≥0 Dj et−j, where y−∞ contains the information about y†t known in the infinite past (i.e. constants) and a is an m × k matrix of coefficients; et−j are the news at t − j; the Dj are m × m matrices for each j.

- Let yt ≡ yt† − ay−∞. Wold theorem tells us that, apart from initial
conditions, time series are the discounted accumulation of news.

- News et = 1 at time t has effect D0 on yt, D1 on yt+1, D2 on yt+2, etc. Hence yt is a moving average (MA) of the news, i.e. yt = D(ℓ)et.

- If linearity is not assumed, the representation is yt = f1(y−∞) + ∑j f2(et−j), where f1 and f2 are non-linear functions.

- If stationarity is not assumed, a and the Dj will depend on t, i.e. yt has an MA representation with time-varying coefficients.

If Ft−1 is the information available at t − 1, the news are defined by:

et ≡ yt − E[yt|Ft−1] (6)
where the notation E[yt|Ft−1] indicates the mathematical conditional ex-
pectation of yt.
Two issues

a) News are unpredictable given the past (E(et|Ft−1) = 0) but contemporaneously correlated (Σe is not diagonal).

To give a name to the news in each equation, we need to find a matrix P̃ such that P̃P̃′ = Σe. Then:

yt = D(ℓ)P̃P̃−1et = D̃(ℓ)ẽt,  ẽt ∼ (0, P̃−1Σe(P̃−1)′ = I)   (7)

Examples of P̃: the Choleski (lower triangular) factor; P̃ = PΛ0.5, where P is the eigenvector matrix and Λ the eigenvalue matrix; etc.
Example 5 If Σe = [1 4; 4 25], its Choleski factor is P̃ = [1 0; 4 3], so that P̃−1et ∼ (0, I).
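A minimal numerical version of Example 5 using numpy (`np.linalg.cholesky` returns the lower triangular factor):

```python
import numpy as np

# Choleski factor of Sigma_e and covariance of the orthogonalized news
Sigma_e = np.array([[1.0, 4.0],
                    [4.0, 25.0]])
P = np.linalg.cholesky(Sigma_e)        # lower triangular, P @ P.T = Sigma_e
assert np.allclose(P, [[1.0, 0.0], [4.0, 3.0]])

# Orthogonalized news: P^{-1} Sigma_e (P^{-1})' = I
Pinv = np.linalg.inv(P)
assert np.allclose(Pinv @ Sigma_e @ Pinv.T, np.eye(2))
```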

b) The news are not uniquely defined.

For any H such that HH′ = I

yt = D(ℓ)et = D(ℓ)HH′et = D̃(ℓ)ẽt (8)


and E(ete′t) = E(ẽtẽ′t) = Σe.
• Standard packages choose the "fundamental" representation, i.e. the one for which D0 is the largest among all the Dj coefficients. Formally: det[D0E(ete′t)D′0] > det[Dj E(et−je′t−j)D′j], ∀j.

Why are fundamental representations chosen?

- In fundamental representations, the information present in the yt's equals the information present in the et's.

- In fundamental representation, the effect of current news is the largest.

• If Dj decays with j, the importance of past news is smaller than the importance of current news. Makes sense. Some economic models (e.g. models where news is anticipated; see later on) do not have this property and imply non-fundamental representations.
VARs

• If the Dj coefficients decay to zero "fast enough", D(ℓ) is invertible and

yt = D(ℓ)et
D(ℓ)−1yt = et
yt = A(ℓ)yt−1 + et   (9)

where I − A(ℓ)ℓ ≡ D(ℓ)−1 and A(ℓ) is of infinite length.

• A VAR(∞) can represent any vector of time series yt under linearity, stationarity and invertibility.

• A VAR(q), q fixed, approximates yt well if the Dj are close to zero for large j.
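A scalar sketch of the inversion (an assumed MA(1) with d = 0.4, not from the slides): the VAR(∞) coefficients are Aj = −(−d)^j, and truncating at q lags recovers the news up to an error of order d^(q+1):

```python
import numpy as np

# MA(1): y_t = e_t + d*e_{t-1}; its VAR(infinity) form is
# y_t = sum_{j>=1} A_j y_{t-j} + e_t with A_j = -(-d)**j (valid for |d| < 1)
rng = np.random.default_rng(1)
d, T, q = 0.4, 500, 25
e = rng.standard_normal(T)
y = e.copy()
y[1:] += d * e[:-1]

A = np.array([-(-d) ** j for j in range(1, q + 1)])  # truncated VAR(infinity)
# Residual of the truncated VAR(q) at each date t >= q
e_hat = np.array([y[t] - A @ y[t - q:t][::-1] for t in range(q, T)])
assert np.max(np.abs(e_hat - e[q:])) < 1e-6          # news recovered
```

The truncation error shrinks geometrically with q, which is why a finite-order VAR approximates well when the MA coefficients die out quickly.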
Summary

- We can represent any yt with a linear VAR(∞) under certain assumptions.

- With a finite sample of data we need to carefully select the lag length of the VAR (recall: news can't be predictable).

- If we want a constant coefficient VAR, we need stationarity of yt. If non-stationarities are present, a VAR representation exists, but with time-varying coefficients.
4 Classical Specification

Selecting the lag length of a VAR

A) Likelihood ratio (LR) test

LR = 2[ln L(αun, Σun) − ln L(αre, Σre)]   (10)
   = T(ln |Σre| − ln |Σun|) →D χ2(ν)   (11)

where L is the likelihood function, "un" ("re") denotes the unrestricted (restricted) estimator of (α, Σe), and ν = number of restrictions of the form R(α) = 0.

• The LR test is biased in small samples. If T is small, it is better to use

LRc = (T − qm)(ln |Σre| − ln |Σun|)

where q = number of lags, m = number of variables.
• Sequential testing approach ("general-to-specific"):

1) Choose an upper bound q̄.

2) Test a VAR(q̄ − 1) against a VAR(q̄); if the null is not rejected,

3) test a VAR(q̄ − 2) against a VAR(q̄ − 1).

4) Continue until rejection.

The LR test is an in-sample criterion. What if we want a VAR for out-of-sample purposes?

Let Σy(1) = ((T + mq)/T)Σe (q = number of lags, m = size of yt, T = sample size).

B) AIC criterion: minq AIC(q) = ln |Σy(1)|(q) + 2qm2/T

• AIC is inconsistent. It overestimates the true order q with positive probability.

C) HQC criterion: minq HQC(q) = ln |Σy(1)|(q) + 2qm2 ln(ln T)/T

• HQC is consistent (in probability).

D) SWC criterion: minq SWC(q) = ln |Σy(1)|(q) + qm2 ln T/T

• SWC is strongly consistent (a.s.).


• Criteria B)-D) trade off the fit of the model (the size of Σe) against the number of parameters of the model (m2 · q), for a given sample size T. Hence criteria B)-D) prefer smaller-scale to larger-scale models.
Criterion   T=40               T=80               T=120              T=200
            q=2   q=4   q=6    q=2   q=4   q=6    q=2   q=4   q=6    q=2   q=4   q=6
AIC         1.6   3.2   4.8    0.8   1.6   2.4    0.53  1.06  1.6    0.32  0.64  0.96
HQC         2.09  4.17  6.26   1.18  2.36  3.54   0.83  1.67  2.50   0.53  1.06  1.6
SWC         2.95  5.9   8.85   1.75  3.5   5.25   1.27  2.55  3.83   0.84  1.69  2.52

Table 1: Penalties of AIC, HQC, SWC, m=4

- Penalties increase with q and fall with T. The penalty of SWC is the harshest.

- Ivanov and Kilian (2006): the quality of B)-D) depends on the frequency of the data and on the DGP. Overall, HQC is the best.
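The penalty terms of Table 1 can be reproduced directly from the formulas above (a sketch for m = 4; it also shows why the SWC penalty is harshest at moderate T):

```python
import numpy as np

# Penalty terms of the information criteria for a VAR with m variables:
#   AIC: 2*q*m^2/T,  HQC: 2*q*m^2*ln(ln T)/T,  SWC: q*m^2*ln(T)/T
def penalties(q, T, m=4):
    return (2 * q * m**2 / T,
            2 * q * m**2 * np.log(np.log(T)) / T,
            q * m**2 * np.log(T) / T)

aic, hqc, swc = penalties(q=2, T=40)
assert round(aic, 2) == 1.6
assert round(hqc, 2) == 2.09
assert round(swc, 2) == 2.95
```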
• Criteria A)-D) must be applied to the system not to single equations.

Example 6 VAR for the Euro area, 1980:1-1999:4; use output, prices, interest rates and M3; set q̄ = 7.

Hypothesis    LR            LRc            q   AIC         HQC         SWC
q=6 vs. q=7   2.9314e-5(∗)  0.1447         7   -7.556      -6.335      -4.482
q=5 vs. q=6   3.6400e-4     0.1171         6   -7.413      -6.394      -4.851
q=4 vs. q=5   0.0509        0.5833         5   -7.494      -6.675      -5.437
q=3 vs. q=4   0.0182        0.4374         4   -7.522      -6.905      -5.972
q=2 vs. q=3   0.0919        0.6770         3   -7.635(∗)   -7.219(∗)   -6.591
q=1 vs. q=2   3.0242e-7     6.8182e-3(∗)   2   -7.226      -7.012      -6.689(∗)

Table 2: Tests for the lag length of a VAR

• Different criteria choose different lag lengths!


Checking Stationarity

All variables stationary / all unit roots → easy.

Some cointegration → transform the VAR into a VECM:

• Impose cointegration restrictions.

• Disregard cointegration restrictions.

Data are stationary but we can't see it because of small samples.

If Bayesian: the stationarity/nonstationarity issue does not matter for inference.
Checking for Breaks

Wald test: yt = (A1(ℓ)I1)yt−1 + (A2(ℓ)I2)yt−1 + et

I1 = 0 for t ≤ t1; I1 = 1 for t > t1; and I2 = 1 − I1.

Use S(t1, T) = T(ln |Σre| − ln |Σun|) →D χ2(ν), ν = dim(A1(ℓ)) (Andrews and Ploberger (1994)).

If t1 is unknown but belongs to [tl, tu], compute S(t1, T) for all t1 in the interval and check for breaks using maxt1 S(t1, T).
Other specification issues

- Include constant? Constant and trend? Dummies?

- Small scale vs. large scale VAR? Core VAR plus rotating variables?

- Variables in ratios vs. variables in levels (e.g. use y/n, or y and n separately)?

- Maximize the sample size or look for stable periods?


5 Alternative Representation of VAR(q)

Consider
yt = A(ℓ)yt−1 + et (12)
yt, et m × 1 vectors; et ∼ (0, Σe).

Different representations are useful for different purposes.

- Companion form useful for computing moments, ML estimators.

- Simultaneous equation setup useful for evaluating the likelihood and com-
puting restricted estimates.
5.1 Companion form

• Transform an m-variable VAR(q) into an mq-variable VAR(1).

Example 7 Consider a VAR(3). Let Yt = [yt, yt−1, yt−2]′ and Et = [et, 0, 0]′;

A = [ A1  A2  A3 ]        ΣE = [ Σe  0  0 ]
    [ Im  0   0  ]             [ 0   0  0 ]
    [ 0   Im  0  ]             [ 0   0  0 ]

Then the VAR(3) can be rewritten as

Yt = AYt−1 + Et,  Et ∼ N(0, ΣE)   (13)

where Yt, Et are 3m × 1 vectors and A is 3m × 3m.
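A sketch of the companion-form construction (the diagonal A1, A2, A3 below are illustrative, m = 2), including the usual stationarity check on the companion eigenvalues:

```python
import numpy as np

# Stack a bivariate VAR(3) into its mq-dimensional companion (VAR(1)) form
m, q = 2, 3
A1, A2, A3 = 0.5 * np.eye(m), 0.2 * np.eye(m), 0.1 * np.eye(m)

top = np.hstack([A1, A2, A3])                         # [A1 A2 A3]
bottom = np.hstack([np.eye(m * (q - 1)),              # identity blocks
                    np.zeros((m * (q - 1), m))])
A = np.vstack([top, bottom])                          # 6 x 6 companion matrix

assert A.shape == (m * q, m * q)
# The VAR is stationary iff all companion eigenvalues lie inside the unit circle
assert np.max(np.abs(np.linalg.eigvals(A))) < 1.0
```

The companion matrix is what makes moments, forecasts and impulse responses one-line matrix computations later on.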
5.2 Simultaneous equations setup (SES)

There are two alternative representations:

1) Let xt = [yt−1, yt−2, . . . , yt−q]; X = [x1, . . . , xT]′ (a T × mq matrix); Y = [y1, . . . , yT]′ (a T × m matrix); and A = [A′1, . . . , A′q]′ (an mq × m matrix). Then

Y = XA + E   (14)

2) Let i indicate the subscript of the i-th column vector. The equation for variable i is yi = xαi + ei. Stacking the columns yi, ei into mT × 1 vectors we have

y = (Im ⊗ x)α + e ≡ Xα + e   (15)


6 Parameters and covariance matrix estimation

6.1 Unrestricted VAR(q)

Assume that y−q+1, . . . , y0 are known and et ∼ N(0, Σe). Then

yt|(yt−1, . . . , y−q+1) ∼ N(A(ℓ)yt−1, Σe)   (16)
                        ∼ N(A′1Yt−1, Σe)   (17)

where A′1 (m × mq) is the first block-row of the companion matrix A. Let α = vec(A1).
Since f(yT , . . . , y1|y0, . . . , y−q+1) = ∏j f(yj|yj−1, . . . , y−q+1),

ln L(α, Σe) = ∑j ln f(yj|yj−1, . . . , y−q+1)
            = −(Tm/2) ln(2π) + (T/2) ln |Σ−1e| − (1/2) ∑t (yt − A′1Yt−1)′Σ−1e(yt − A′1Yt−1)   (18)

Setting ∂ ln L(α, Σe)/∂α = 0 we have

A1,ML = [∑t Yt−1Y′t−1]−1[∑t Yt−1y′t] = A1,OLS   (19)

and its j-th column (an mq × 1 vector) is

A1j,ML = [∑t Yt−1Y′t−1]−1[∑t Yt−1yjt] = A1j,OLS
Why is OLS equivalent to maximum likelihood?

- If the initial conditions are known, maximizing the log-likelihood is equivalent to minimizing the sum of squared errors!

Why is it that single equation OLS is the same as full information maximum
likelihood?

- There are the same regressors in every equation! A VAR is a SUR system.
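The equivalence can be seen in a small simulation (a sketch with assumed coefficients): the system OLS formula and single-equation OLS deliver identical rows, because the regressor matrix X is common to all equations:

```python
import numpy as np

# Simulate a bivariate VAR(1) and estimate it by OLS
rng = np.random.default_rng(3)
A_true = np.array([[0.5, 0.1],
                   [0.0, 0.3]])
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = y[t - 1] @ A_true.T + rng.standard_normal(2)

X, Y = y[:-1], y[1:]
# System OLS: one matrix formula estimates all equations at once
A_hat = np.linalg.solve(X.T @ X, X.T @ Y).T
# Equation-by-equation OLS gives the identical rows
row0 = np.linalg.solve(X.T @ X, X.T @ Y[:, 0])
assert np.allclose(A_hat[0], row0)
```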
Plugging A1,ML into ln L(α, Σe), we obtain the concentrated likelihood

ln L(Σe) = −(T/2)(m ln(2π) + ln |Σe|) − (1/2) ∑t e′t,ML Σ−1e et,ML   (20)

where et,ML = yt − A′1,MLYt−1. Using ∂(b′Qb)/∂Q = bb′ and ∂ ln |Q|/∂Q = (Q′)−1, setting ∂ ln L(Σe)/∂Σ−1e = (T/2)Σ′e − (1/2) ∑t et,MLe′t,ML = 0 gives

ΣML = (1/T) ∑t et,ML e′t,ML   (21)

and σi,i′ = (1/T) ∑t eit,ML ei′t,ML.

ΣML ≠ ΣOLS = (1/(T − 1)) ∑t et,ML e′t,ML, but the two are equivalent for large T.
6.2 VAR(q) with restrictions

Assume the restrictions are of the form α = Rθ + r, where R is an mk × k1 matrix of rank k1, r is an mk × 1 vector, and θ is a k1 × 1 vector.

Example 8 i) Lag restrictions: Aq = 0. Here k1 = m2(q − 1), r = 0, and R = [Ik1, 0]′.

ii) Block exogeneity of y2t in a bivariate VAR(2). Here R = blockdiag[R1, R2], where each Ri, i = 1, 2, is upper triangular.

iii) Cointegration restrictions.

Plugging the restrictions into (15) we have

y = (Im ⊗ x)α + e = (Im ⊗ x)(Rθ + r) + e

Let y† ≡ y − (Im ⊗ x)r = (Im ⊗ x)Rθ + e. Since ∂ ln L/∂θ = R′ ∂ ln L/∂α:

θML = [R′(Σ−1e ⊗ x′x)R]−1R′(Σ−1e ⊗ x′)y†   (22)
αML = RθML + r   (23)
Σe,ML = (1/T) ∑t et,ML e′t,ML   (24)
T t
Alternative method

Assume the restrictions are of the form Rα = r, where R is an mk × mk matrix and r an mk × 1 vector.

Write the VAR as Yt = A0 + A1Yt−1 + Et, Et ∼ N(0, ΣE), where Yt, Et are mk × 1 vectors and A1 is mk × mk.

If we minimize ∑t E′tΣ−1E Et subject to the restrictions, we get a restricted VAR estimator.

Let W = [1, Y−1], α = vec([A′0, A′1]′), S = (Σ−1E ⊗ W′W), and α̂ = vec((W′W)−1W′Y).

The objective function is

min (α − α̂)′S(α − α̂) + λ′(Rα − r)   (25)

FOC: S(α − α̂) = R′λ.

Since Rα = r = Rα̂ + RS−1R′λ, the (restricted) estimator is

α = α̂ + S−1R′(RS−1R′)−1(r − Rα̂)   (26)
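A numerical sketch of the restricted estimator (the R, r and weighting matrix S below are assumptions, purely illustrative); by construction the result satisfies the restrictions exactly:

```python
import numpy as np

# Restricted estimator: alpha = alpha_hat + S^{-1}R'(R S^{-1} R')^{-1}(r - R alpha_hat)
rng = np.random.default_rng(4)
k = 4
alpha_hat = rng.standard_normal(k)     # unrestricted estimate (illustrative)
M = rng.standard_normal((k, k))
S = M @ M.T + k * np.eye(k)            # any positive definite weight
R = np.array([[1.0, 0.0, 0.0, 0.0],    # restriction: alpha_1 = 0
              [0.0, 1.0, -1.0, 0.0]])  # restriction: alpha_2 = alpha_3
r = np.zeros(2)

Sinv = np.linalg.inv(S)
alpha = alpha_hat + Sinv @ R.T @ np.linalg.solve(R @ Sinv @ R.T,
                                                 r - R @ alpha_hat)
assert np.allclose(R @ alpha, r)       # restrictions hold exactly
```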


Summary

• For a VAR(q) without restrictions:

- ML and OLS estimators of A1 coincide.

- OLS estimation of A1, equation by equation, is consistent and efficient (if the assumptions are correct).

- OLS and ML estimators of Σe asymptotically coincide for large T .


• For a VAR(q) with restrictions:

- ML estimator of A1 is different from the OLS estimator.

- ML is consistent and efficient if the restrictions are true; it is inconsistent if the restrictions are false.

In general:

- OLS is consistent even if the stationarity assumption is wrong (though t-tests are incorrect).

- OLS is inconsistent if the lag length is wrong (regressors are correlated with the error term).
7 Bayesian Specification, estimation, inference

- No lag-length specification, no unit root issues. Choose a generous lag length to make sure the residuals are white noise. Use the prior to reduce the overparameterization of the model (e.g. Litterman prior).

- Specify a prior distribution for the coefficients and the covariance matrix of the shocks. The prior can be subjective or objective (based on a training sample).

- Combine prior and likelihood to construct a posterior distribution for the parameters (which are random variables, not unknown quantities).
- The likelihood of a VAR is the product of a normal distribution for the regression parameters, conditional on the OLS estimator, the covariance matrix and the data, and an inverted Wishart for the elements of the covariance matrix of the shocks, i.e. g(β, Σ) = N(β|βOLS, Σ, y)IW(Σ|y).

- If the prior has a similar format, i.e. g(β, Σ) = N(β|Σ)IW(Σ) (conjugate prior), the posterior preserves the Normal-IW format.

- If the priors are conjugate, posterior moments are simple:

β̃ = Ω̃(Ω̄−1β̄ + X′X β̂)   (27)
Ω̃ = (Ω̄−1 + X′X)−1   (28)

• The conditional posterior mean (variance) of the coefficients is a linear combination of the prior mean (variance) and the sample mean (variance), with weights given by the precision of the two sources of information.
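A scalar sketch of (27)-(28) with assumed prior and sample quantities (Σe normalized to 1; all numbers are illustrative):

```python
import numpy as np

# Conjugate posterior for one coefficient: precision-weighted average
XtX = np.array([[50.0]])               # sample precision X'X
beta_ols = np.array([0.8])             # OLS estimate
Omega_bar = np.array([[0.25]])         # prior variance
beta_bar = np.array([0.0])             # prior mean

Omega_tilde = np.linalg.inv(np.linalg.inv(Omega_bar) + XtX)          # (28)
beta_tilde = Omega_tilde @ (np.linalg.inv(Omega_bar) @ beta_bar
                            + XtX @ beta_ols)                        # (27)

# Posterior mean lies between prior mean and OLS estimate, closer to OLS
# because the sample precision (50) dominates the prior precision (4)
assert beta_bar[0] < beta_tilde[0] < beta_ols[0]
```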

- The marginal posterior mean of the covariance matrix is Σ̃ = B̂′X′XB̂ + B̄′Ω̄−1B̄ + Σ̄ + (y − XB̂)′(y − XB̂) − B̃′(Ω̄−1 + X′X)B̃, where β = vec(B).

• The marginal of β is t with parameters (Ω̃−1, Σ̃e, B̃, T + ν̄).

• If the prior g(β, Σ) is diffuse (proportional to a constant), the posterior is equal to the likelihood: Bayesian and classical estimators coincide!
[Figure: prior and posterior densities of a coefficient; left panel: tight prior, right panel: loose prior.]
- Inference: report the posterior mean and posterior standard error.

- Report credible sets (different from confidence intervals).

- To test hypotheses (e.g. β1 = 0), check whether the posterior distribution of β1 contains the zero value.

- To test a model against another, use Bayes factors, i.e. the ratio of the marginal likelihoods of the two models.

- Marginal likelihood of model M1: ∫ L(β, Σ|y, M1)p(β, Σ|M1) dβ dΣ.

• If sample is large posterior and likelihood inference are the same.


8 Summarizing the results

It is unusual to report estimates of VAR coefficients, standard errors and R2:

- Most VAR coefficients are insignificant.

- R2 always exceeds 0.99.

How do we summarize the results in an informative way?


8.1 Impulse responses (IR)

• What is the effect of a surprise cut in interest rates on inflation? In a two-variable VAR with inflation as the first variable, the contemporaneous effect is D0,12, the effect at lag 1 is D1,12, and the effect at lag q is Dq,12.

• An impulse response traces out the MA representation of yt (i.e. the Dj coefficients).

Three ways to calculate impulse responses:

- Recursive approach.

- Non-recursive (companion form) approach.

- Forecast revisions (see e.g. Canova- Ciccarelli (2009, IER)).


• Recursive method.

Assume we have an estimate of the coefficients Aj. Then Dτ = [Di,i′τ] = ∑j=1,...,min[τ,q] Dτ−j Aj, where τ refers to the horizon, D0 = I and Aj = 0 ∀j > q.

Example 9 Suppose yt = ȳ + A1yt−1 + A2yt−2 + et, et ∼ N(0, 1). Applying the formula we have D0 = I, D1 = D0A1, D2 = D1A1 + D0A2, . . . , Dk = Dk−1A1 + Dk−2A2 + . . . + Dk−qAq.

Note: if the news are not orthogonal, let P̃eP̃′e = Σe. Then D̃0 = D0P̃e.
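The recursion can be coded directly. A sketch for a bivariate VAR(2) with illustrative A1, A2, checked against the first steps of Example 9:

```python
import numpy as np

# Recursive IRFs: D_0 = I, D_tau = sum_{j=1..min(tau,q)} D_{tau-j} A_j
A = [np.array([[0.5, 0.1],
               [0.2, 0.3]]),
     np.array([[0.1, 0.0],
               [0.0, 0.1]])]                      # A_1, A_2 (illustrative)
m, q, horizon = 2, 2, 10

D = [np.eye(m)]                                   # D_0 = I
for tau in range(1, horizon + 1):
    D_tau = sum(D[tau - j] @ A[j - 1] for j in range(1, min(tau, q) + 1))
    D.append(D_tau)

# First steps match Example 9: D_1 = D_0 A_1, D_2 = D_1 A_1 + D_0 A_2
assert np.allclose(D[1], A[0])
assert np.allclose(D[2], A[0] @ A[0] + A[1])
```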
• Companion form method:

Yt = AYt−1 + Et = AtY0 + ∑j=0,...,t−1 Aj Et−j   (29)
               = AtY0 + ∑j=0,...,t−1 Ãj Ẽt−j   (30)

where Ãj = Aj PE, Ẽt = P−1E Et, PEP′E = ΣE; (29) is valid for non-orthogonal news and (30) for orthogonal ones.
• Forecast revision method: revisions of the forecast of Yt+τ.

EtYt+τ = Aτ Yt; Et−1Yt+τ = Aτ+1Yt−1. Hence

Revt(τ) = EtYt+τ − Et−1Yt+τ = Aτ [Yt − AYt−1] = Aτ Et

Suppose ei′t = 1; ei′τ = 0, τ > t; eit = 0 ∀i ≠ i′. Then Revt,i′(1) = Ai′, Revt,i′(2) = A2i′, . . . , Revt,i′(τ) = Aτi′ (the i′-th column of the relevant power of A).

Hence the response of Yi,t+τ to a shock in ei′t is the (i, i′) element of the τ-step ahead revision in the forecasts.
Sometimes it is useful to calculate multipliers of the news.

• Long run multiplier: D(1) = (A0 + A1 + . . . + Aq)−1.

• Partial multipliers, up to horizon τ: (∑j=0,...,τ Aj)−1.


8.2 Variance decomposition: τ -steps ahead forecast error

• How much of the variance of output is due to supply shocks?

It uses:

yt+τ − yt(τ) = ∑j=0,...,τ−1 D̃j ẽt+τ−j,  D0 = I   (31)

where yt(τ) is the τ-step ahead prediction of yt based on the VAR. One computes the share of the variance of yi,t+τ − yi,t(τ) due to each ẽi′,t+τ−j, i, i′ = 1, 2, . . . , m.
8.3 Historical decomposition

• What is the contribution of supply shocks to the productivity revival of


the late 1990s?

Let ŷi,t(τ) = yi,t+τ − yi,t(τ) be the τ-step ahead forecast error in the i-th variable of the VAR. Then:

ŷi,t(τ) = ∑i′=1,...,m D̃ii′(ℓ)ẽi′,t+τ   (32)

- This computes the path of ŷi,t(τ) due to each ẽi′.

• The same ingredients are needed to compute impulse responses, the variance decomposition and the historical decomposition. Different packaging!!
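A sketch of the variance decomposition computation from orthogonalized MA matrices (the D̃j below are illustrative; by construction the shares sum to one across shocks):

```python
import numpy as np

# FEVD: share of the tau-step forecast-error variance of each variable
# attributable to each orthogonalized shock, from the MA matrices D_tilde
def fevd(D_tilde, tau):
    """D_tilde: list of m x m orthogonalized MA matrices (D_tilde_0, ...)."""
    # Contribution of shock i' to var(y_{i,t+tau} - y_{i,t}(tau)):
    # sum over horizons of squared MA coefficients (unit-variance shocks)
    contrib = sum(Dj**2 for Dj in D_tilde[:tau])     # elementwise squares
    return contrib / contrib.sum(axis=1, keepdims=True)

D_tilde = [np.array([[1.0, 0.0], [0.5, 1.0]]),       # impact (lower triangular)
           np.array([[0.5, 0.2], [0.1, 0.4]])]       # horizon 1
shares = fevd(D_tilde, tau=2)
assert np.allclose(shares.sum(axis=1), 1.0)          # shares sum to one
```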
Example 10 US data for (Y, π, R, M1), 1973:1-1993:12. Orthogonalize using a Choleski decomposition. What is the effect of a money shock?

[Figure: responses of gnp, prices, interest and money to a shock in money, horizons 0-20.]
What is the contribution of the various shocks to var(Y) and var(π)?

                     Y                                  π
Horizon   Shock1  Shock2  Shock3  Shock4    Shock1  Shock2  Shock3  Shock4
4         0.99    0.001   0.003   0.001     0.07    0.86    0.01    0.03
12        0.93    0.01    0.039   0.02      0.24    0.60    0.08    0.07
24        0.79    0.01    0.15    0.04      0.52    0.36    0.07    0.04

Table 3: Variance decomposition, percentages
[Figure: historical decomposition of gnp, 1975-1978; four panels (shocks in gnp, interest, prices, money), each plotting the variable, the baseline, and the contribution of the shocks.]
9 Identification: Obtaining SVARs

9.1 Why Structural VARs

VARs are reduced form models. Therefore:

• Their shocks are linear combinations of the meaningful economic disturbances.

• It is difficult to relate responses computed from VARs to the responses obtained from theoretical models.

• They can't be used for certain policy analyses (Lucas critique).

What is a SVAR? It is a linear dynamic model of the form:

A0yt = A1yt−1 + . . . + Aq yt−q + εt εt ∼ (0, Σε) (33)


Its reduced form is:

yt = A1yt−1 + . . . + Aq yt−q + et et ∼ (0, Σe) (34)


where Aj = A−10Aj and et = A−10εt (the Aj, εt on the right-hand sides are the structural objects of (33)).

We want to go from (34) to (33): (34) is easy to estimate (just use OLS equation by equation), but this requires A0. To estimate A0 we need restrictions, since Aj, Σe have fewer free parameters than A0, Aj, Σε.

Restrictions should be derived from economic theory !!!

Distinguish: Stationary vs. nonstationary VARs.


9.2 Stationary VARs

VAR : yt = A(ℓ)yt−1 + et et ∼ (0, Σe) (35)

SVAR : A0yt = A(ℓ)yt−1 + ǫt ǫt ∼ (0, Σǫ = diag{σi}) (36)


Note: log linearized DSGE models are stationary SVARs! Solution is:

y2t = A22y2t−1 + A21y3t (37)


y1t = A11y2t−1 + A12y3t (38)
where y2t are the states, y1t the controls, and y3t the shocks. So

A0 = [ A21  0  ]−1       A(ℓ) = [ A21  0  ]−1 [ A22  0 ]
     [ 0   A12 ]                [ 0   A12 ]   [ A11  0 ]
(35) and (36) imply

A−10ǫt = et   (39)

so that

A−10Σǫ(A−10)′ = Σe   (40)

To recover the unknown parameters in A0 from (40) we need at least as many equations as unknowns.

• Order condition: if the VAR has m variables, we need m(m − 1)/2 restrictions. This is because there are m2 free parameters on the left-hand side of (40) and only m(m + 1)/2 parameters in Σe (m2 = m(m + 1)/2 + m(m − 1)/2).

• Rank condition: (see Hamilton, 1994, p.332-335).


- Exactly identified vs. overidentified (number of restrictions larger than m(m − 1)/2).

- The rank and order conditions are valid only for "local identification" (they need to be checked at one specific point).

Example 11 i) The Choleski decomposition of Σe imposes exactly m(m − 1)/2 zero restrictions. Note that A−10 is lower triangular: variable i does not affect variable i − 1 simultaneously, but it affects variable i + 1.

ii) yt = [GDPt, Pt, it, Mt]. Then we need at least 6 restrictions for local identification, e.g.

A0 = [ 1    0    0    0   ]
     [ α01  1    0    α02 ]
     [ 0    0    1    α03 ]
     [ α04  α05  α06  1   ]
Rubio, Waggoner and Zha (2010) provide sufficient conditions for global identification.

- They are valid for over- or just-identified models.

- They are valid for SVAR restrictions which are linear or nonlinear in the
parameters.

- When the system is exactly identified, a necessary and sufficient condition for global identification is that the rank of Qj (call it qj) equals m − j, j = 1, 2, . . . , m, where Qj is a matrix of zeros and ones indicating whether the elements of a column of [A0 A1]′ are restricted or not, and A1 is the first row of the companion form of the system.
 
Example 12 Let

A0 = [ 0    a12  a13 ]
     [ 0    a22  a23 ]
     [ a31  0    a33 ]

To check global identification we need to rewrite the three zero restrictions as 0-1 elements of the matrices Qj, j = 1, 2, 3, where Qj is a 3 × 3 matrix of rank qj.

In the first column there are two restrictions, in positions 1 and 2. Therefore Q1 = [1 0 0; 0 1 0; 0 0 0].

In the second column there is one restriction, in position 3. Therefore Q2 = [0 0 1; 0 0 0; 0 0 0].

In the third column there are no restrictions. Therefore Q3 = [0 0 0; 0 0 0; 0 0 0].

Since q1 = 2, q2 = 1, q3 = 0, the SVAR is globally identified.


 
Example 13 (local vs. global identification) Let

A0 = [ a11  0    a13 ]
     [ a21  a22  0   ]
     [ 0    a32  a33 ]

Here m(m − 1)/2 = 3: the order condition is satisfied; the rank of the Hamilton matrix (at a11 = a22 = a33 = 1, a13 = a21 = a32 = 2) is 6: the rank condition is satisfied. The SVAR is locally identified. However, since q1 = q2 = q3 = 1, the SVAR is not globally identified. We can also show this in another way. Let

P = [ 2/3   1/3   −1/3 ]
    [ −1/3  2/3   2/3  ]
    [ 2/3   −1/3  2/3  ]

Note that PP′ = I. If we set a11 = a22 = a33 = 1, a13 = a21 = a32 = 2, then A∗0 = A0P is observationally equivalent to A0. Since we have found an observationally equivalent A∗0, the SVAR is not globally identified.

The SVAR is not globally identified because the restrictions are not "appropriately" placed.
How do you estimate a SVAR? Use a two-step approach:

- Get (unrestricted) estimates of A(ℓ) and Σe.

- Use the restrictions on A0 to estimate Σǫ and the free parameters of A0.

- Use A(ℓ) = A−10A(ℓ) to trace out the structural dynamics.

Unless the system is in Choleski format, we need ML to estimate A0 even in just-identified systems (see appendix). For over-identified systems, ML is always needed to estimate A0.

There is also an alternative IV interpretation of the two-step approach.
Example 14 (Blanchard and Perotti, 2002) VAR with (Tt, gt, yt). Assume the SVAR is A0yt = A(ℓ)yt−1 + Bǫt, ǫt ∼ (0, Σǫ = diag{σi}). Here A0et = Bǫt, where

A0 = [ 1    0    a01 ]        B = [ 1   b1  0 ]
     [ 0    1    a02 ]            [ b2  1   0 ]
     [ a03  a04  1   ]            [ 0   0   1 ]

Impose that there is no discretionary response of Tt and gt to yt shocks within the quarter (the last column of B has zeros in the first two positions). Information delays.

We have a total of 6 + 3 (variance) parameters to estimate, but at most 6 parameters in Σe. We need additional restrictions: get information about a01, a02 from external sources, and further impose either b1 = 0 or b2 = 0.

With a01, a02 fixed, the two-stage approach has an IV interpretation: e1t, e2t are predetermined and used as instruments for the structural shocks in the third equation.
9.3 Nonstationary VARs

Let the VAR and SVAR be:

∆yt = D(ℓ)et = D(1)et + D∗(ℓ)∆et   (41)
∆yt = D(ℓ)A0ǫt = D(1)A0ǫt + D∗(ℓ)A0∆ǫt   (42)

where D(ℓ) = (I − A(ℓ)ℓ)−1 and D∗(ℓ) ≡ (D(ℓ) − D(1))/(1 − ℓ). Matching coefficients: D(ℓ)A0ǫt = D(ℓ)et.

Separating the permanent and transitory components, and using only contemporaneous restrictions for the latter, we have

D(1)A0ǫt = D(1)et   (43)
A0∆ǫt = ∆et   (44)

If yt is stationary, D(1) = 0 and (43) is vacuous. Two types of restrictions can be used to estimate A0: short run and long run.

Example 15 In a bivariate VAR, imposing (43) requires one restriction. Suppose D(1)12 = 0 (ǫ2t has no long-run effect on y1t). If Σǫ = I, the three elements of D(1)A0ΣǫA′0D(1)′ can be obtained from the Choleski factor of D(1)ΣeD(1)′.

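Example 15 can be checked numerically. The reduced-form numbers below are made up for a bivariate VAR(1) in growth rates; the point is only that the Choleski factor of D(1)ΣeD(1)′ delivers an impact matrix consistent with both Σe and the long run restriction:

```python
import numpy as np

# Illustrative reduced-form estimates (assumed, not from data)
A1 = np.array([[0.4, 0.2], [0.1, 0.3]])          # VAR(1) coefficients
Sigma_e = np.array([[1.0, 0.3], [0.3, 0.8]])     # innovation covariance

D1 = np.linalg.inv(np.eye(2) - A1)               # long run multiplier D(1)

# Restriction: shock 2 has no long run effect on y1, i.e. the long run
# impact matrix D(1)A0 is lower triangular -> take the Choleski factor
LR = np.linalg.cholesky(D1 @ Sigma_e @ D1.T)     # = D(1) A0
A0 = np.linalg.solve(D1, LR)                     # impact matrix (Sigma_eps = I)
```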
• Blanchard-Quah: decomposition into permanent-transitory components (use (43)-(44)). If yt = [∆y1t, y2t]′ is (m × 1), y1t is I(1), y2t are I(0), and yt = ȳ + D(ℓ)ǫt with ǫt ∼ iid(0, Σǫ), then

| ∆y1t |   | ȳ1 |   | D1(1) |      | (1 − ℓ)D1†(ℓ) |
|      | = |    | + |       | ǫt + |               | ǫt   (45)
|  y2t |   | 0  |   |   0   |      | (1 − ℓ)D2†(ℓ) |

and D1(1) = [1, 0].

- y2t is any set of stationary variables which is influenced by both shocks.
Identification through heteroskedasticity

Idea: var(et) = Σ1 for t = 1, . . . , T1 and var(et) = Σ2 for t = T1 + 1, . . . , T .

- Lutkepohl (1996, Chapter 6.1.2): there exist a W and a diagonal Ω with typical element ωi > 0, i = 1, 2, . . . , m, such that Σ1 = W W ′ and Σ2 = W ΩW ′.
• W is a full matrix. It is unique up to sign changes if the ωi are distinct.

• Ω incorporates the volatility changes (if some ωi ≠ 1 there is a change in volatility): shock variances are normalized to 1 in the first sample and equal ωi in the second.

• Restrict the impact effect W to be unchanged across regimes.

• If A0−1 = W, all the shocks of the system are identified.

• Under the above restrictions, and if the variance of the shocks changes once at a known date, shocks can be identified without economic restrictions (!!). If economic restrictions exist, they become overidentifying and can be tested.
• With more than two regimes, variance changes provide overidentifying restrictions (see Rigobon (2003)).

• Lanne-Lutkepohl (2008): Markov switching structure in the variance of the shocks. The same idea applies.
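The pair (W, Ω) can be computed by simultaneous diagonalization of the two covariance matrices. A sketch, starting from an arbitrary true W and Ω (both made up) and recovering a valid pair, up to column ordering and signs:

```python
import numpy as np

# Construct two covariance regimes from a known W and a diagonal Omega
W_true = np.array([[1.0, 0.5], [0.2, 1.5]])
Omega_true = np.diag([2.0, 0.5])                 # distinct omega_i
Sigma1 = W_true @ W_true.T
Sigma2 = W_true @ Omega_true @ W_true.T

# Solve the symmetric eigenproblem of L^{-1} Sigma2 L^{-T}, L = chol(Sigma1):
# its eigenvalues are the omega_i and W = L V, V the eigenvectors
L = np.linalg.cholesky(Sigma1)
M = np.linalg.solve(L, np.linalg.solve(L, Sigma2).T).T
w, V = np.linalg.eigh(M)
W = L @ V                                        # satisfies both factorizations
Omega = np.diag(w)
```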
Example 16

pt = βyt + ǫ1t   (46)
yt = αpt + ǫ2t   (47)

where E(ǫ1tǫ2t) = 0. The covariance matrix of [pt, yt]′ is

V ≡ | v11  v12 | = (1/(1 − αβ)²) | β²σ2² + σ1²   βσ2² + ασ1² |
    | v12  v22 |                 | βσ2² + ασ1²   σ2² + α²σ1² |

Thus there are three free elements in V and 4 structural parameters (α, β, σ1², σ2²): the system is underidentified.

Suppose σ1², σ2² depend on the state s = 1, 2. Then

V1 = (1/(1 − αβ)²) | β²σ21² + σ11²   βσ21² + ασ11² |
                   | βσ21² + ασ11²   σ21² + α²σ11² |

V2 = (1/(1 − αβ)²) | β²σ22² + σ12²   βσ22² + ασ12² |
                   | βσ22² + ασ12²   σ22² + α²σ12² |

There are six free elements in (V1, V2) and six structural parameters (α, β, σ11², σ12², σ21², σ22²): the system is just identified by the order condition!!
With three variance regimes, we have 3 × 3 = 9 reduced form parameters and 8 structural parameters (3 × 2 structural variances, α, β): the system is over-identified.
Important: to identify all the parameters we need:

1) α and β to be unchanged across regimes.

2) The variance of both shocks to change (if there are two regimes).
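The covariance matrix formula in Example 16 can be verified directly (the parameter values are arbitrary):

```python
import numpy as np

alpha, beta = 0.3, 0.6
s1, s2 = 1.0, 2.0                       # sigma_1^2, sigma_2^2

# Reduced form: A [p_t, y_t]' = [eps_1t, eps_2t]', A = [[1, -beta], [-alpha, 1]]
A = np.array([[1.0, -beta], [-alpha, 1.0]])
Ainv = np.linalg.inv(A)
V = Ainv @ np.diag([s1, s2]) @ Ainv.T   # implied covariance of [p_t, y_t]'

# The formula in the text, element by element
k = 1.0 / (1.0 - alpha * beta) ** 2
V_formula = k * np.array([
    [beta**2 * s2 + s1,      beta * s2 + alpha * s1],
    [beta * s2 + alpha * s1, s2 + alpha**2 * s1],
])
```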
Problems with standard identification approaches

- Two equally reasonable Choleski systems: which one to choose?

Example 17

pt = a11est   (49)
yt = a21est + a22edt   (50)

Price is set prior to knowing demand shocks. Choleski ordering with p first.
Equivalent to estimating p on lagged p and lagged y (this gives est), and then estimating y on lagged y and on current and lagged p (this gives edt).

yt = a11est   (51)
pt = a21est + a22edt   (52)

Quantity is set prior to knowing demand shocks. Choleski ordering with y first.

Equivalent to estimating y on lagged y and lagged p (this gives est), and then estimating p on lagged p and on current and lagged y (this gives edt).
Without a structural model in mind it is difficult to choose between Choleski systems.

- Cooley-LeRoy (1985): unless some strong restrictions are imposed, dynamic models do not have a Choleski structure.
- Long run restrictions 1: Faust-Leeper (1997).

[Figure: long run restrictions. Two panels of impulse responses over 20 periods; left panel: restriction not satisfied, right panel: restriction satisfied.]
- Long run restrictions 2: Cooley-Dwyer (1998): take an RBC model driven by a unit root technology shock. Simulate data, run a VAR with (yt, nt) and identify two shocks (permanent/transitory). It is possible to do this, but transitory shocks explain a large portion of the variance of yt.

- Long run restrictions 3: Erceg, et al. (2005): long run restrictions perform poorly in small samples. Chari, et al. (2006): potentially important truncation bias due to the estimation of a VAR(q), q finite.
- Short run restrictions: Canova-Pina (2005), Fuerst, Carlstrom, Paustian (2009).

The DGP is a three-equation New Keynesian model.

[Figure: true responses vs. inertial responses.]
Summary

- Problematic to relate SVARs identified with Choleski, short run or long run restrictions to theories.

- Need to link SVARs to theory better: use restrictions which are more common in DSGE models (and are robust).
9.4 Alternative identification schemes

Canova-De Nicolo’ (2002), Faust (1998), Uhlig (2005): use sign (and shape) restrictions.

Example 18 i) Aggregate supply shocks: Y ↑, Inf ↓; aggregate demand shocks: Y ↑, Inf ↑ → demand and supply shocks impose different sign restrictions on cov(Yt, INFs). These restrictions are shared by a large class of models with different foundations. Use them for identification.

ii) Monetary shocks: the response of Y is hump-shaped and dies out in 3-4 quarters → shape restrictions on cov(Yt, is). Use these for identification.
- Exploits the non-uniqueness of the news (MA) representation.

- Practical implementation of sign restrictions (Canova-De Nicolo’ (2002)):

• Orthogonalize Σe = P̃P̃′ (e.g. Choleski or eigenvalue-eigenvector decomposition). Check if the shocks produce the required pattern for yit, i = 1, 2, . . .. If not:

• For any H with HH′ = I, Σe = P̃HH′P̃′ = P̂P̂′. Check if any shock under the new decomposition produces the required pattern for yit. If not, choose another H and repeat.

• Stop when you find an ǫjt with the right characteristics, or compute the mean/median (and s.e.) of the statistics of interest over all the accepted draws ǫljt, where l indexes the rotations satisfying the restrictions.
• The number of H matrices is infinite. Write H = H(ω), ω ∈ (0, 2π). The H(ω) are called rotation (Givens) matrices.

Example 19 Let m = 2. Then

H(ω) = | cos(ω)  −sin(ω) |    or    H(ω) = | cos(ω)   sin(ω) |
       | sin(ω)   cos(ω) |                 | sin(ω)  −cos(ω) |

Varying ω, we trace out all possible structural MA representations that could have generated the data.
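A quick numerical check that a Givens rotation leaves the covariance decomposition intact (ω and Σe below are arbitrary):

```python
import numpy as np

omega = 0.7                              # any angle in (0, 2*pi)
H = np.array([[np.cos(omega), -np.sin(omega)],
              [np.sin(omega),  np.cos(omega)]])

# Any factor P_tilde with P_tilde P_tilde' = Sigma_e yields another
# admissible factor P_hat = P_tilde H, since H H' = I
Sigma_e = np.array([[1.0, 0.4], [0.4, 2.0]])
P_tilde = np.linalg.cholesky(Sigma_e)
P_hat = P_tilde @ H
```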
9.5 Sign restrictions in large systems

• The use of rotation matrices is complicated in large scale systems, since there are many rotations one needs to consider.

- In a 4-variable VAR there are 6 bivariate H matrices which rotate pairs of shocks. So for each orthogonal P̃ and each ω there are 6 different shocks to try. But ω could differ across the different H.

- High dimensional problem: not feasible to explore the space of identifications this way.
Alternative: use a QR decomposition.

Algorithm 9.1

1. Start from some orthogonal representation yt = D(ℓ)ǫt.

2. Draw an m × m matrix G with N(0,1) entries. Compute the decomposition G = QR.

3. Compute responses as D′(ℓ) = D(ℓ)Q. Check if the restrictions are satisfied.

4. Repeat 2.-3. until L draws are found.

Fast even in large dimensional systems.
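A sketch of Algorithm 9.1 for a small system; the VAR(1) coefficients, the covariance matrix and the particular sign restriction checked below are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Orthogonalized starting representation: VAR(1) with a Choleski factor
A1 = np.array([[0.5, 0.1], [0.2, 0.4]])
P = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 0.9]]))

def irf(Q, horizons=8):
    # candidate responses D_j Q = A1^j P Q at each horizon
    return np.array([np.linalg.matrix_power(A1, j) @ P @ Q
                     for j in range(horizons)])

accepted = []
while len(accepted) < 100:
    G = rng.standard_normal((2, 2))          # step 2: draw G and take G = QR
    Q, R = np.linalg.qr(G)
    Q = Q @ np.diag(np.sign(np.diag(R)))     # sign normalization of Q
    r = irf(Q)
    # step 3: keep the draw if shock 1 moves both variables up on impact
    # (an illustrative sign restriction)
    if r[0, 0, 0] > 0 and r[0, 1, 0] > 0:
        accepted.append(r)
```

Statistics of interest (impulse responses, variance decompositions) are then computed over the accepted draws.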
Example 20 Comparing responses to US monetary shocks, 1964-2001.

[Figure: responses of output, prices and money at monthly horizons; left column: sign restrictions, right column: Choleski restrictions.]
Example 21 Studying the effects of fiscal shocks in US states: 1950-2005.

Table 4: Identification restrictions (signs of corr(G,Y), corr(T,Y), corr(G,DEF), corr(T,DEF), corr(G,T))

G shocks: >0, >0, >0
BB shocks: <0, =0, =1
Tax shocks: <0, <0, =0
Discussion

- Sign restrictions are weak: they identify a region of the parameter space, not a point. Large uncertainty in the results compared with standard SVARs.

- May want to use additional information to reduce the range of acceptable shocks. Kilian and Murphy (2011): in the oil market, sign restrictions alone allow unreasonable supply and demand elasticities. Use sign plus quantity restrictions on the impact responses to shocks.
- Sign restrictions may perform poorly if the shock signal is too weak (Canova and Paustian (2011)).

- Performance of sign restrictions improves if more variables are restricted and if more shocks are jointly identified (Canova and Paustian (2011)).

- Reporting the median response at different horizons is problematic; see Kilian and Inoue (2011) for an alternative.
- Use robust DSGE restrictions (see Dedola and Neri (2007), Pappa (2009), Peersman and Straub (2009), Lippi and Nobili (2011)).

- No need to set up the likelihood or maximize it. Use simulation methods.

- Rank and order conditions do not apply here.

- Can run the algorithm conditioning on α = αML, Σ = ΣML. Then uncertainty in the bands only reflects identification uncertainty.

- The number of rejected draws tells us how easy (likely) it is to find the restrictions in the data. Can be used as a first test of the restrictions.

- Recent review of this literature: Fry and Pagan (2009).
10 The transmission of monetary policy shocks

- Large body of evidence for the US, the UK and Europe. Crucial for understanding what we expect to see after a change in a policy-controlled interest rate.

- The evidence is consistent, but not entirely uniform across countries.

- Many interesting puzzles are often generated.

Literature: Sims (1992, EER), Christiano et al. (1999, Handbook of Macroeconomics), Leeper and Roush (2003, JMCB), Favara and Giordani (2009, JME), among others.
What does (standard New Keynesian) theory tell us about the transmission of MP shocks?

[Figure: responses in a basic NK model.]

- Money has no role in this model. Equilibrium output, inflation and the interest rate are determined without any reference to money.

Important predictions:

1) After an interest rate increase, output and inflation should fall.

2) The largest output and inflation effects are instantaneous.

3) Shocks to money have no effects on output, inflation and the nominal rate (weak version: no contemporaneous effect; strong version: no effects at any horizon).

Do these implications hold in the data? Use US data:

[Figures: Leeper-Roush (2003); Favara-Giordani (2009).]
11 Interpretation problems with VARs

• Time aggregation (Sargent-Hansen (1991), Marcet (1991)).

- Agents take decisions at a frequency different from the frequency of the data available to the econometrician. What are the consequences?

- The MA representation faced by the econometrician is a complex combination of the MA representation generated by agents’ actions.

Example 22 A hump-shaped monthly response can be transformed into a smoothly declining quarterly response.
[Figure: size of responses over a 14-month horizon; a hump-shaped monthly response vs. the corresponding smoothly declining quarterly response.]
How to detect aggregation problems? Run a VAR with data at different frequencies, if you can, and check if differences exist.
• Non-linearities

Example 23 (Markov switching model). Suppose P(st = 1|st−1 = 1) = p, P(st = 0|st−1 = 0) = q. This process has a linear AR representation

st = (1 − q) + (p + q − 1)st−1 + et

and as long as either p or q or both are less than one an MA representation exists. Good!!!

But: the errors are non-normal (binomial). Conditional on st−1 = 1,

et = 1 − p with probability p   (53)
   = −p with probability 1 − p   (54)

Conditional on st−1 = 0,

et = −(1 − q) with probability q   (55)
   = q with probability 1 − q   (56)
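The linear AR representation of the chain can be verified by simulation; the values of p and q below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 0.9, 0.8                      # P(s_t=1|s_{t-1}=1), P(s_t=0|s_{t-1}=0)
T = 200_000

s = np.zeros(T, dtype=int)
for t in range(1, T):
    u = rng.random()
    s[t] = (u < p) if s[t - 1] == 1 else (u < 1 - q)

# OLS of s_t on a constant and s_{t-1} should recover (1-q) and (p+q-1)
X = np.column_stack([np.ones(T - 1), s[:-1]])
b = np.linalg.lstsq(X, s[1:], rcond=None)[0]
resid = s[1:] - X @ b                # the binomial (non-normal) errors e_t
```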
• How do you check for normality/nonlinearities?

- If et is normal:

T^0.5 | S3          | ∼ N( 0 , | 6 ∗ Im    0      | )
      | S4 − 3 ∗ Im |          | 0        24 ∗ Im |

where Sj is the j-th estimated moment of et.

- Regress êt on yt−1², log yt−1, etc. Check significance.
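The moment-based check can be coded in a few lines; applied to Gaussian draws, as below, the standardized statistics should be small:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 10_000
e = rng.standard_normal(T)          # illustrative (Gaussian) residuals

# standardize, then compute the third and fourth sample moments
z = (e - e.mean()) / e.std()
S3 = np.mean(z**3)
S4 = np.mean(z**4)

# under normality T^0.5 [S3, S4 - 3]' ~ N(0, diag(6, 24)); standardize each
stat3 = np.sqrt(T) * S3 / np.sqrt(6.0)
stat4 = np.sqrt(T) * (S4 - 3.0) / np.sqrt(24.0)
jb = stat3**2 + stat4**2            # joint (Jarque-Bera type) chi^2(2) statistic
```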
• Stationarity is violated

Example 24 Great Moderation.

- Changes in the variance of the process are continuous. Can’t really use subsample analysis.

- There exists a version of the Wold theorem without covariance stationarity:

yt = aty−∞ + Σj≥0 Djtet−j

where var(et) = Σt.

• Use time varying coefficient VARs with e.g. stochastic volatility.
• Small scale VARs. People use them because:

a) Estimates are more precise.

b) It is easier to identify shocks. But they generate:

- Omitted variables (Braun-Mittnik (1993)).

- Misaggregation of shocks (Cooley-Dwyer (1998), Canova-Pina (2005)).

What is the consequence of omitting variables?

In a bivariate VAR(q)

| A11(ℓ)  A12(ℓ) | | y1t |   | e1t |
| A21(ℓ)  A22(ℓ) | | y2t | = | e2t |

the univariate representation for y1t is

[A11(ℓ) − A12(ℓ)A22(ℓ)−1A21(ℓ)]y1t = e1t − A12(ℓ)A22(ℓ)−1e2t ≡ υt   (57)
Example 25 Suppose m = 4 and we estimate a bivariate VAR; there are three possible models. The system with variables 1 and 3 has errors

| υ1t |   | e1t |            | e2t |
|     | ≡ |     | − Ψ(ℓ)Φ(ℓ) |     |
| υ2t |   | e3t |            | e4t |

where

Ψ(ℓ) = | A12(ℓ)  A14(ℓ) |    Φ(ℓ) = | A22(ℓ)  A24(ℓ) |−1
       | A32(ℓ)  A34(ℓ) |           | A42(ℓ)  A44(ℓ) |

It is easy to verify that:
• A true m-variable VAR(1) is transformed into a VAR(∞) with disturbance υt if only m1 < m variables are used.

• If the et’s are contemporaneously and serially uncorrelated, the υt’s are contemporaneously and serially correlated unless Ψ(ℓ) is block diagonal.

• Two small scale VAR systems, both with m1 < m variables, have different innovations.

• υt is a linear combination of current and past et’s. The timing of innovations is preserved if and only if the m1 included variables are Granger causally prior to the m − m1 omitted ones.
What is the problem of omitting shocks?

Aggregation theorem (Faust and Leeper (1997)): the structural MA for a partition with m1 < m variables of the true VAR is

yt = D(ℓ)ǫt   (58)

where ǫt is m × 1 and Di is m1 × m ∀i. Suppose a researcher specifies a VAR with m1 < m variables and obtains the MA:

yt = D̃(ℓ)et   (59)

where et is m1 × 1 and D̃i is m1 × m1 ∀i. Matching (58) and (59): D̃(ℓ)et = D(ℓ)ǫt or, letting D‡(ℓ) be an m1 × m matrix,

D‡(ℓ)ǫt = et   (60)

If there are ma shocks of one type and mb shocks of another type, with ma + mb = m and m1 = 2, then:

• eit, i = 1, 2 recovers a linear combination of shocks of type i′ = a, b only if D‡(ℓ) is block diagonal.

• eit, i = 1, 2 recovers a linear combination of current shocks of type i′ = a, b only if D‡(ℓ) = D‡, ∀ℓ, and block diagonal.
Example 26 Suppose m = 4, ma = 2, mb = 2. Then

                                         | ǫ1t |
| D‡11(ℓ)  D‡12(ℓ)  D‡13(ℓ)  D‡14(ℓ) |   | ǫ2t |   | e1t |
| D‡21(ℓ)  D‡22(ℓ)  D‡23(ℓ)  D‡24(ℓ) |   | ǫ3t | = | e2t |
                                         | ǫ4t |

- e1t recovers type 1 shocks if D‡13(ℓ) = D‡14(ℓ) = 0, and e2t recovers type 2 shocks if D‡21(ℓ) = D‡22(ℓ) = 0.

- e1t recovers current type 1 shocks if, in addition, D‡ii′(ℓ) = D‡ii′, ∀ℓ, i, i′ = 1, 2.
• Non-Wold decompositions (Lippi-Reichlin (1994), Leeper (1991), Hansen-Sargent (1991), Leeper, Walker, Yang (2009)). Certain economic models do not have a fundamental MA representation, e.g. diffusion models, or models where agents anticipate tax changes.

Example 27 Hall consumption/saving problem.

Assume yt = et, a white noise. Assume β = R−1 < 1 and quadratic preferences. The solution for consumption is ct = ct−1 + (1 − R−1)et. The growth rate of consumption has a fundamental representation.
If, instead of ct, we observe savings st = yt − ct, the solution is

st − st−1 = R−1et − et−1   (61)

(61) is non-fundamental: the coefficient on et is smaller (in absolute value) than the coefficient on et−1.

Problem! Standard packages estimate

st − st−1 = ut − R−1ut−1   (62)

Different shape for the response to shocks!!

Same autocovariance generating function. Same likelihood function.
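The equality of the autocovariance functions of (61) and (62) is easy to verify; the value of R−1 is arbitrary, and var(ut) = var(et) here:

```python
import numpy as np

R_inv = 0.8                          # R^{-1} < 1 makes (61) non-fundamental
sig2 = 1.0                           # common innovation variance

def ma1_acov(a, b, s2):
    # autocovariances (gamma_0, gamma_1) of x_t = a w_t + b w_{t-1}
    return (a**2 + b**2) * s2, a * b * s2

g_nonfund = ma1_acov(R_inv, -1.0, sig2)   # (61): R^{-1} e_t - e_{t-1}
g_fund = ma1_acov(1.0, -R_inv, sig2)      # (62): u_t - R^{-1} u_{t-1}
# the impulse responses (a, b) differ, yet the autocovariances coincide
```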
• Relationship between DSGE models and VARs

The log-linearized solution of a DSGE model is of the form:

y2t = A22(θ)y2t−1 + A21(θ)y3t   (63)
y1t = A11(θ)y2t−1 + A12(θ)y3t   (64)

where y2t collects the states and the driving forces, y1t the controls, and y3t the shocks. If both y2t and y1t are observable, the DSGE model is a restricted VAR(1).

- If y2t is omitted, what is the representation of y1t?

• Three alternative results for reduced systems with only y1t:

A true VAR(p) model is transformed into either a VAR(∞), a VARMA(p-1,p-1) or a VARMA(p,p), depending on the assumptions made.

- Generally need a large number of lags to capture the dynamics of a DSGE model. Problem: the data is short.
Example 28 Suppose

yt = kt + et   (65)
kt = a1kt−1 + a0et   (66)

where a1 measures persistence and a0 the contemporaneous effect. If we observe both yt and kt: a restricted VAR(1), no problem.

If only yt is observable, (1 − a1ℓ)/(1 + a0 − a1ℓ) yt = et, or

yt = (a0/(1 + a0)) Σj (a1/(1 + a0))^j yt−j + et   (67)

If a0 is small and a1 large, (a1/(1 + a0))^j will be large even for large j. Need a very long lag length to whiten the residuals. If long run restrictions are used, potentially important truncation bias.
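The slow decay of the AR coefficients in (67) is easy to quantify; the values of a0 and a1 are illustrative:

```python
import numpy as np

a0, a1 = 0.1, 0.95                   # small impact effect, high persistence
rho = a1 / (1 + a0)                  # decay rate of the AR coefficients in (67)

coefs = (a0 / (1 + a0)) * rho ** np.arange(1, 200)

# number of lags before the coefficients become negligible
n_needed = int(np.argmax(coefs < 1e-3)) + 1
```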
Summary

- A system with a reduced number of variables needs a very generous lag length to approximate the dynamics of the true model.

- If the sample size is short, problems may occur.

- Omitting a ”state” is much more important than omitting a ”control”.

- Omission does not matter very much if the true model has dynamics which die out quickly.

See: Chari, Kehoe, McGrattan (2006), Christiano, Eichenbaum and Vigfusson (2006), Fernandez et al. (2007), Ravenna (2007).
11.1 Forecasting with VAR (SVAR) models

yt = A(ℓ)yt−1 + et   (68)
A0yt = A(ℓ)yt−1 + ǫt   (69)

- Assume that we have point estimates of A(ℓ), A0, A(ℓ) and a distribution (asymptotic, small sample or posterior) for these point estimates.

- Transform (68) and (69) into companion form:

Yt = AYt−1 + Et   (70)
A0Yt = AYt−1 + Υt   (71)

• Unconditional forecast: set Et+τ = 0 (Υt+τ = 0), ∀τ > 0. Then Yt+τ = Âτ Yt (Â0Yt+τ = ÂYt+τ−1).

• For fan charts (measuring uncertainty around point forecasts):

1. Draw Al (Al0, Al) from the available distribution and compute Ylt+τ, l = 1, 2, . . . , L, for each horizon τ.

2. Order the Ylt+τ over l for each τ and extract the required intervals (25-75, 16-84 or 2.5-97.5 percentiles).
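The fan chart steps can be sketched as follows for a bivariate VAR(1); the point estimate, the assumed sampling distribution of the coefficients and the horizon are all made up:

```python
import numpy as np

rng = np.random.default_rng(4)

A_hat = np.array([[0.5, 0.1], [0.2, 0.4]])   # point estimate of the companion matrix
se = 0.03                                    # assumed coefficient standard error
y_T = np.array([1.0, -0.5])                  # last observation
L, H = 500, 12

paths = np.empty((L, H, 2))
for l in range(L):
    A_l = A_hat + se * rng.standard_normal((2, 2))   # step 1: draw A^l
    y = y_T.copy()
    for h in range(H):
        y = A_l @ y                          # unconditional forecast: shocks set to 0
        paths[l, h] = y

# step 2: order the draws at each horizon and extract percentile bands
bands = np.percentile(paths, [16, 50, 84], axis=0)   # shape (3, H, 2)
```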
• Conditional forecast 1: manipulating shocks.

- This is the same as computing impulse responses (the impulse may simply last longer than one period). Need to orthogonalize the disturbances if you have structural scenarios in mind.

- Choose Ejt+τ = Ējt+τ (Υjt+τ = Ῡjt+τ), τ = 0, 1, 2, . . ., for some j. Given A (A0, A), find Yt+τ = AYt+τ−1 + Ejt+τ (A0Yt+τ = AYt+τ−1 + Υt+τ) and let the system run as in unconditional forecasts after the impulse has been exhausted.

- To calculate uncertainty around the forecasted path, use the same algorithm employed for unconditional forecasts (i.e. draw A, or A0 and A, from their available distributions).
• Conditional forecast 2: manipulating endogenous variables.

- Partition yt = [yAt, yBt]′ and set yAt+τ = ȳAt+τ, τ = 0, 1, 2, . . .. Back out the path of Et+τ (Υt+τ) needed to produce ȳAt+τ. With this path, compute the path for yBt+τ using (70) ((71)).

- Potential identification problems: many sources of shocks could produce the required path for yAt+τ.

Example 29 Suppose that interest rates are (discretionarily) kept 50 basis points higher than the endogenous Taylor rule would imply. What is the effect on inflation? No identification problem: only a monetary shock enters the rule.

Example 30 Suppose that oil prices are expected to be 10 percent higher in the next two years. Here there is a problem: what has generated this increase? Is it demand? Is it supply? Is it a combination of the two?
12 Exercises

1) Take quarterly data for output growth and inflation for your favorite country. Identify supply and demand shocks by finding all the rotations which satisfy the following restrictions: supply ∆y ↑, Inf ↓; demand ∆y ↑, Inf ↑. What do the impulse responses produced by the rotations that jointly satisfy the restrictions look like? How do they compare with those obtained using the restriction that only supply shocks affect output in the long run, while both demand and supply shocks can affect inflation in the long run?
2) Consider the following New Keynesian model

xt = Etxt+1 − (1/ϕ)(it − Etπt+1) + v1t   (72)
πt = βEtπt+1 + κxt + v2t   (73)
it = φrit−1 + (1 − φr)(φππt + φxxt) + v3t   (74)

where xt is the output gap, πt the inflation rate and it the nominal interest rate.
i) Plot impulse responses to the three shocks.
ii) Simulate 11000 data points from this model after you have set ϕ = 1.5, β = 0.99, κ = 0.8, φr = 0.6, φπ = 1.2, φx = 0.2, ρ1 = 0.9, ρ2 = 0.9, σ1 = σ2 = σ3 = 0.1, and discard the first 1000. With the remaining data estimate a three variable VAR. In particular, (i) select the lag length optimally, (ii) check if the model you have selected has well specified residuals and (iii) check whether you detect breaks in the specification or not.
iii) With the estimated model apply a Choleski decomposition in the order (y, π, R) and check how the impulse responses compare with the true ones in i). Is there any noticeable difference? Why?
iv) Now try the ordering (R, y, π). Do you notice any difference with iii)? Why?
3) Obtain data for output and hours (employment) for your favorite country - each group should use a different country. Construct a measure of labor productivity and run a VAR on labor productivity and hours as in Gali (1999, AER), after you have appropriately selected the statistical nature of the model. Identify technology shocks as the only source of labor productivity movements in the long run. How much of the fluctuations in hours and labor productivity do they explain at the 4 year horizon? Repeat the exercise using the restriction that, in response to technology shocks, output and labor productivity must increase contemporaneously. Are technology shocks a major source of cyclical fluctuations?
Appendix: Inference in SVARs

- The theory applies to Choleski, non-recursive and long run identifications.

- It applies to classical or Bayesian inference (flat prior).

• Find the ML estimators of Aj and A0 (this is enough to obtain the mode of Aj).

• Find the posterior distribution of Aj and A0 (to get the posterior of Aj).

If the prior on Aj, A0 is non-informative and data are abundant, the shape of the likelihood is the same as the shape of the posterior. In the other cases, Bayesian analysis differs from classical analysis.
Assume Σǫ = I. The likelihood of the SVAR is

L(Aj, A0|y) ∝ |A0−1A0−1′|−0.5T exp{−0.5 Σt (yt − A(ℓ)yt−1)′(A0−1A0−1′)−1(yt − A(ℓ)yt−1)}   (75)
            = |A0|T exp{−0.5 Σt (yt − A(ℓ)yt−1)′(A0−1A0−1′)−1(yt − A(ℓ)yt−1)}   (76)

If there are no restrictions on Aj, A(ℓ)ML = A(ℓ)OLS and var(A(ℓ)ML) = A0−1A0−1′ ⊗ (Y′t−1Yt−1)−1, with Yt−1 = [yt−1, . . . , yt−p]. Nice, because easy to compute.

Substituting the estimator A(ℓ)ML into the likelihood we have:

L(A(ℓ) = A(ℓ)ML, A0|y) ∝ |A0|T exp{−0.5 tr(SMLA0′A0)}   (77)

where SML = Σt (yt − A(ℓ)MLyt−1)(yt − A(ℓ)MLyt−1)′/(T − k), k is the number of regressors in each equation, and tr denotes the trace of the matrix.

Conclusion (two-step approach):

a) Find A(ℓ)ML.

b) Maximize (77) to find A0.

c) Use Aj = AjA0 to trace out structural dynamics.

Typically difficult to maximize analytically; numerical routines are needed (both for likelihood and posterior computations).
Note that if, instead of conditioning on A(ℓ)ML, we integrate it out, we have:

L(A0|y) ∝ |A0|T−k exp{−0.5 tr(SMLA0′A0)}

If g(A0) ∝ |A0|k, then g(A0|y) ∝ L(A0|y, A(ℓ) = A(ℓ)ML).

• Bayesian analysis with flat priors is equivalent to classical analysis, conditional on A(ℓ)ML.
Summary

• Choleski identification, no restrictions on the VAR: maximization of (77) implies that A0A0′ = (SML/T)−1. Hence Â0 can be obtained from the Choleski factor of (SML/T)−1. Nice shortcut.

• Non-recursive identification, no restrictions on the VAR: need to maximize (77) (no shortcuts possible).
• Long run restrictions. Note that A(1)−1 = A(1)−1A0−1. If A(1)−1 is lower triangular, A0 can be found using A0A0′ = (SML/T)−1, where A(1)ML−1A0−1 is lower triangular. The solution is

Â0 = [chol(A(1)ML−1 (SML/T) A(1)ML−1′)]−1 A(1)ML−1

• If the long run system is not recursive, the solution is more complicated.

• If the system is over-identified, can’t use a two-step approach. Need to jointly maximize the likelihood function with respect to Aj, A0.
• With sign restrictions no maximization is needed. Find the region of the parameter space which satisfies the restrictions. This can be done numerically using a version of an acceptance sampling algorithm.
Monte Carlo standard errors for impulse responses

If the prior on Aj, A0 is non-informative, the posterior is proportional to the likelihood. The likelihood of the VAR is the product of a normal for A(ℓ), conditional on A(ℓ)ML and Σ−1, and a Wishart for Σ−1. The algorithm then works as follows (Choleski system):
Algorithm 12.1

1. Draw Σ−1 from a Wishart, conditional on the data.

2. Set Al0 = chol((Σ−1)l).

3. Draw A(ℓ)l from a N(A(ℓ)ML, (Al0)−1(Al0)−1′ ⊗ (Y′t−1Yt−1)−1).

4. Set A(ℓ)l = A(ℓ)lAl0. Compute (A(ℓ)l)−1 (the MA of the model).

5. Repeat steps 1.-4. L times. Order the draws and compute percentiles.
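A sketch of Algorithm 12.1 for a bivariate VAR(1); the reduced-form quantities, the stand-in for (Y′t−1Yt−1)−1 and the vectorization used in step 3 are all illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(5)

A_ml = np.array([[0.5, 0.1], [0.2, 0.4]])    # A(l)_ML
S_ml = np.array([[1.0, 0.3], [0.3, 0.9]])    # innovation covariance estimate
T, L, H = 200, 300, 10
XtXinv = np.eye(2) / T                       # stand-in for (Y'Y)^{-1}

irfs = np.empty((L, H, 2, 2))
for l in range(L):
    # 1. draw Sigma^{-1} from a Wishart(T, (T S_ml)^{-1}) via its definition
    Z = rng.multivariate_normal(np.zeros(2), np.linalg.inv(T * S_ml), size=T)
    Sigma_l = np.linalg.inv(Z.T @ Z)
    # 2. A0^{-1} from the Choleski factor of the draw
    P_l = np.linalg.cholesky(Sigma_l)
    # 3. draw the VAR coefficients around A(l)_ML
    V = np.kron(Sigma_l, XtXinv)
    A_l = A_ml + rng.multivariate_normal(np.zeros(4), V).reshape(2, 2)
    # 4. structural responses A_l^h P_l at each horizon
    for h in range(H):
        irfs[l, h] = np.linalg.matrix_power(A_l, h) @ P_l

# 5. order the draws and compute percentile bands
bands = np.percentile(irfs, [16, 84], axis=0)
```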
If the restrictions are not in Choleski format, substitute step 2. with the maximization of the likelihood with respect to A0. If the system is overidentified, another approach is needed (see chapter 10). For long run restrictions use:
Algorithm 12.2

1. Draw Σ−1 from a Wishart, conditional on the data.

2. Set Al0 = [chol(A(1)ML−1 Σl A(1)ML−1′)]−1 A(1)ML−1.

3. Draw A(ℓ)l from a N(A(ℓ)ML, (Al0)−1(Al0)−1′ ⊗ (Y′t−1Yt−1)−1).

4. Repeat steps 1.-3. L times. Order the draws and compute percentiles.
For a system where sign restrictions are imposed the approach is easy: we just need draws for Σ and A0. The algorithm is:

Algorithm 12.3

1. Choose an H such that HH′ = I.

2. Draw Σ−1 from a Wishart, conditional on the data.

3. Set Al0 = H sqrt(Σl).

4. Draw A(ℓ)l from a N(A(ℓ)ML, (Al0)−1HH′(Al0)−1′ ⊗ (Y′t−1Yt−1)−1).

5. Set A(ℓ)l = A(ℓ)lAl0. Compute (A(ℓ)l)−1 (the MA of the model). If column i of (A(ℓ)l)−1 satisfies the sign restrictions, keep the draw; otherwise discard it.

6. Repeat steps 1.-5. until L draws are obtained. Order the draws and compute the median, mode, mean, percentiles, etc.

• Could also randomize on H: there are many H such that HH′ = I, and one could have a prior on the Hs. Since H does not enter the likelihood, the posterior of H equals the prior of H.