Var Korea2012
Fabio Canova
EUI and CEPR
December 2012
Outline
• Preliminaries.
• Structural VARs.
Hamilton, J. (1994), Time Series Analysis, Princeton University Press, Princeton, NJ, ch. 10-11.
Canova, F. (1995), "VAR Models: Specification, Estimation, Inference and Forecasting", in H. Pesaran and M. Wickens, eds., Handbook of Applied Econometrics, Ch. 2, Blackwell, Oxford, UK.
Canova, F. (1995), "The Economics of VAR Models", in K. Hoover, ed., Macroeconometrics: Tensions and Prospects, Kluwer Press, NY, NY.
Blanchard, O. and Quah, D. (1989), "The Dynamic Effect of Aggregate Demand and Supply Disturbances", American Economic Review, 79, 655-673.
Canova, F. and Paustian, M. (2011), "Business cycle measurement with some theory", forthcoming, Journal of Monetary Economics.
Cooley, T. and Dwyer, M. (1998), "Business Cycle Analysis without much Theory: A Look at Structural VARs", Journal of Econometrics, 83, 57-88.
Favara, G. and Giordani, P. (2009), "Reconsidering the role of money for output, prices and interest rates", Journal of Monetary Economics, 56, 419-430.
Faust, J. (1998), "On the Robustness of Identified VAR Conclusions about Money", Carnegie-Rochester Conference Series on Public Policy, 49, 207-244.
Faust, J. and Leeper, E. (1997), "Do Long Run Restrictions Really Identify Anything?", Journal of Business and Economic Statistics, 15, 345-353.
Hansen, L. and Sargent, T. (1991), "Two Difficulties in Interpreting Vector Autoregressions", in Hansen, L. and Sargent, T. (eds.), Rational Expectations Econometrics, Westview Press: Boulder, London.
Kilian, L. (1998), "Small Sample Confidence Intervals for Impulse Response Functions", Review of Economics and Statistics, 218-230.
Kilian, L. and Murphy, D. (2011), "Why agnostic sign restrictions are not enough? Understanding the dynamics of oil market VAR models", forthcoming, Journal of the European Economic Association.
Leeper, E. M. and Roush, J. E. (2003), "Putting 'M' back in monetary policy", Journal of Money, Credit, and Banking, 35, 1257-1264.
Lippi, M. and Reichlin, L. (1993), "The Dynamic Effect of Aggregate Demand and Supply Disturbances: A Comment", American Economic Review, 83, 644-652.
Lanne, M. and Lütkepohl, H. (2008), "Identifying monetary policy shocks via changes in volatility", Journal of Money, Credit and Banking, 40, 1131-1149.
- They are multivariate autoregressive linear time series models of the form
- yt is a m × 1 vector
General disadvantages:
Example 4
yt = αyt−1 + εt − θεt−1 (5)
The likelihood is flat for α ≈ θ. Lack of identifiability creates numerical
problems, especially in multivariate setups.
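A minimal Python sketch of why the likelihood cannot separate α from θ when they coincide. It uses the MA(∞) weights implied by (5), ψ0 = 1 and ψj = α^(j−1)(α − θ) for j ≥ 1; the function name `arma11_ma_weights` and the parameter values are illustrative assumptions, not part of the slides.

```python
import numpy as np

def arma11_ma_weights(alpha, theta, n=10):
    """MA(infinity) weights of y_t = alpha*y_{t-1} + eps_t - theta*eps_{t-1}."""
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        # psi_j = alpha**(j-1) * (alpha - theta) for j >= 1
        psi[j] = alpha ** (j - 1) * (alpha - theta)
    return psi

# Two very different parameter pairs with alpha == theta ...
w1 = arma11_ma_weights(0.2, 0.2)
w2 = arma11_ma_weights(0.9, 0.9)
# ... imply the same white-noise process, so the data cannot tell them apart.
print(np.allclose(w1, w2))
```

Any pair on the line α = θ gives the same weights, which is the flat region of the likelihood mentioned above.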
3 Wold theorem and the news
Wold Theorem: Under linearity and stationarity, any vector of time series
$y_t^{\dagger}$ can be represented as $y_t^{\dagger} = a y_{-\infty} + \sum_{j=0}^{\infty} D_j e_{t-j}$, where $y_{-\infty}$ contains
information about $y_t^{\dagger}$ known in the infinite past (i.e. constants) and a is
an m × k matrix of coefficients; $e_{t-j}$ are the news at t − j, and $D_j$ are m × m
matrices for each j.
- Let yt ≡ yt† − ay−∞. Wold theorem tells us that, apart from initial
conditions, time series are the discounted accumulation of news.
et ≡ yt − E[yt|Ft−1] (6)
where the notation E[yt|Ft−1] indicates the mathematical conditional ex-
pectation of yt.
Two issues
a) News are unpredictable given the past (E(et|Ft−1) = 0) but are
contemporaneously correlated (Σe is not diagonal).
To give a name to the news in each equation, we need to find a matrix P̃ such
that P̃P̃′ = Σe. Then:
$y_t = D(\ell)\tilde{P}\tilde{P}^{-1}e_t = \tilde{D}(\ell)\tilde{e}_t, \qquad \tilde{e}_t \sim (0, \tilde{P}^{-1}\Sigma_e\tilde{P}^{-1\prime} = I)$ (7)
Examples of P̃: Choleski (lower triangular) factor; P̃ = PΛ0.5; where P is
the eigenvector matrix, Λ the eigenvalue matrix, etc.
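Example 5 If $\Sigma_e = \begin{bmatrix} 1 & 4 \\ 4 & 25 \end{bmatrix}$, its (lower triangular) Choleski factor is $\tilde{P} = \begin{bmatrix} 1 & 0 \\ 4 & 3 \end{bmatrix}$, so that
$\tilde{P}\tilde{P}' = \Sigma_e$ and $\tilde{P}^{-1}e_t \sim (0, I)$.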
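The factorization in Example 5 can be checked numerically. The sketch below uses NumPy's `np.linalg.cholesky`, which returns the lower triangular factor P with P P′ = Σe; its transpose is the upper triangular matrix with rows (1, 4) and (0, 3).

```python
import numpy as np

sigma_e = np.array([[1.0, 4.0],
                    [4.0, 25.0]])
P = np.linalg.cholesky(sigma_e)      # lower triangular, P @ P.T equals sigma_e
P_inv = np.linalg.inv(P)
# Covariance of the orthogonalised news P^{-1} e_t
cov_orth = P_inv @ sigma_e @ P_inv.T
print(P)
print(cov_orth)
```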
yt = D(ℓ)et
D(ℓ)−1yt = et
yt = A(ℓ)yt−1 + et (9)
where I − A(ℓ)ℓ ≡ D(ℓ)−1 and A(ℓ) is of infinite length.
- With a finite sample of data need to carefully select the lag length of the
VAR (recall: news can’t be predictable).
1) Choose an upper q̄
B) AIC criterion: $\min_q AIC(q) = \ln|\Sigma_y(1)|(q) + \frac{2qm^2}{T}$
C) HQC criterion: $\min_q HQC(q) = \ln|\Sigma_y(1)|(q) + (2qm^2)\frac{\ln \ln T}{T}$
- Penalties increase with q and fall with T . Penalty of SWC is the harshest.
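A short sketch of how the three criteria could be evaluated for a given lag length. The AIC and HQC penalties follow the formulas above; the SWC penalty is not shown in this section, so the usual Schwarz form qm² ln T / T is assumed here. The function name `lag_criteria` and the inputs are hypothetical.

```python
import math

def lag_criteria(logdet_sigma, q, m, T):
    """Return (AIC, HQC, SWC) for a VAR(q) with m variables and T observations.

    logdet_sigma is ln|Sigma_y(1)|(q), the log determinant of the
    one-step ahead forecast error covariance of the VAR(q).
    """
    n_par = q * m ** 2
    aic = logdet_sigma + 2 * n_par / T
    hqc = logdet_sigma + 2 * n_par * math.log(math.log(T)) / T
    swc = logdet_sigma + n_par * math.log(T) / T   # assumed Schwarz penalty
    return aic, hqc, swc

# Penalties only (log determinant set to zero), m = 4 variables, T = 80 quarters
aic, hqc, swc = lag_criteria(logdet_sigma=0.0, q=2, m=4, T=80)
print(aic, hqc, swc)
```

In practice each criterion is computed for q = 1, ..., q̄ on the same sample and the minimizing q is selected.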
Example 6 VAR for the Euro area, 1980:1-1999:4; use output, prices,
interest rates and M3, set q̄ = 7.
Hypothesis     LR             LRc            q   AIC         HQC         SWC
q=6 vs. q=7    2.9314e-5(∗)   0.1447         7   -7.556      -6.335      -4.482
q=5 vs. q=6    3.6400e-4      0.1171         6   -7.413      -6.394      -4.851
q=4 vs. q=5    0.0509         0.5833         5   -7.494      -6.675      -5.437
q=3 vs. q=4    0.0182         0.4374         4   -7.522      -6.905      -5.972
q=2 vs. q=3    0.0919         0.6770         3   -7.635(∗)   -7.219(∗)   -6.591
q=1 vs. q=2    3.0242e-7      6.8182e-3(∗)   2   -7.226      -7.012      -6.689(∗)
Table 2: Tests for the Lag length of a VAR
Use $S(t_1, T) = T(\ln|\Sigma_e^{re}| - \ln|\Sigma_e^{un}|) \xrightarrow{D} \chi^2(\nu)$; $\nu = \dim(A_1(\ell))$ (Andrews
and Ploberger (1994)).
If t1 is unknown but belongs to [tl, tu], compute S(t1, T) for all the t1 in the
interval. Check for breaks using $\max_{t_1} S(t_1, T)$.
Other specification issues
- Small scale vs. large scale VAR? Core VAR plus rotating variables?
- Variables in ratios vs. variables in levels (i.e. use y/n and per-capita n)?
Consider
yt = A(ℓ)yt−1 + et (12)
yt, et m × 1 vectors; et ∼ (0, Σe).
- Simultaneous equation setup useful for evaluating the likelihood and com-
puting restricted estimates.
5.1 Companion form
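$A = \begin{bmatrix} A_1 & A_2 & A_3 \\ I_m & 0 & 0 \\ 0 & I_m & 0 \end{bmatrix}, \qquad \Sigma_E = \begin{bmatrix} \Sigma_e & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$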
Then the VAR(3) can be rewritten as
Y = XA + E (14)
2) Let i indicate the subscript for the i-th column vector. The equation
for variable i is yi = xαi + ei. Stacking the columns yi, ei into
mT × 1 vectors we have
Why is it that single equation OLS is the same as full information maximum
likelihood?
- There are the same regressors in every equation! A VAR is a SUR system.
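Plugging A1,ML into ln L(α, Σe), we obtain the concentrated likelihood
$\ln L(\Sigma_e) = -\frac{T}{2}\left(m\ln(2\pi) + \ln|\Sigma_e|\right) - \frac{1}{2}\sum_{t=1}^{T} e_{t,ML}'\Sigma_e^{-1}e_{t,ML}$ (20)
where $e_{t,ML} = (y_t - A_{1,ML}Y_{t-1})$. Using $\frac{\partial(b'Qb)}{\partial Q} = bb'$ and $\frac{\partial \ln|Q|}{\partial Q} = (Q')^{-1}$
with $Q = \Sigma_e^{-1}$ (so that $\ln|\Sigma_e| = -\ln|Q|$), we have $\frac{\partial \ln L(\Sigma_e)}{\partial \Sigma_e^{-1}} = \frac{T}{2}\Sigma_e' - \frac{1}{2}\sum_{t=1}^{T} e_{t,ML}e_{t,ML}' = 0$, or
$\Sigma_{ML}' = \frac{1}{T}\sum_{t=1}^{T} e_{t,ML}e_{t,ML}'$ (21)
and $\sigma_{i,i'} = \frac{1}{T}\sum_{t=1}^{T} e_{it,ML}e_{i't,ML}$.
$\Sigma_{ML}' \neq \Sigma_{OLS}' = \frac{1}{T-1}\sum_{t=1}^{T} e_{t,ML}e_{t,ML}'$, but the two are equivalent for large T.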
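The sketch below illustrates, on simulated data, that estimating all equations jointly gives the same coefficients as OLS equation by equation (same regressors in every equation), and how the two covariance estimates differ only in the divisor. The VAR(1) coefficients, sample size and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, m = 200, 2
A1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])          # illustrative VAR(1) coefficients
y = np.zeros((T + 1, m))
for t in range(1, T + 1):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(m)

Y, X = y[1:], y[:-1]                  # Y = X A + E, with A = A1'
# All equations at once
A_sys = np.linalg.lstsq(X, Y, rcond=None)[0]
# One equation at a time
A_eq = np.column_stack([np.linalg.lstsq(X, Y[:, i], rcond=None)[0]
                        for i in range(m)])
E = Y - X @ A_sys
sigma_ml = E.T @ E / T                # divisor T
sigma_ols = E.T @ E / (T - 1)         # divisor T - 1
print(np.allclose(A_sys, A_eq))
```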
6.2 VAR(q) with restrictions
$\theta_{ML} = [R'(\Sigma_e^{-1} \otimes x'x)R]^{-1}R'[\Sigma_e^{-1} \otimes x']y^{\dagger}$ (22)
$\alpha_{ML} = R\,\theta_{ML} + r$ (23)
$\Sigma_e' = \frac{1}{T}\sum_t e_{ML}e_{ML}'$ (24)
Alternative method
In general:
- Specify a prior distribution for the coefficients and the covariance matrix
of the shocks. Prior can be subjective or objective (based on a training
sample).
- If the prior has the same form as the likelihood, i.e. g(β, Σ) = N(β|Σ)IW(Σ) (conjugate
prior), the posterior will preserve the Normal-IW format.
- If the priors are conjugate, posterior moments are simple.
β̃ = Ω̃(Ω̄−1β̄ + X ′X β̂) (27)
Ω̃ = (Ω̄−1 + X ′X)−1 (28)
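A minimal sketch of (27)-(28) for a single equation, assuming the error variance is normalised to one (an assumption made here so that the formulas apply as written). The function name `posterior_mean` and the simulated data are illustrative.

```python
import numpy as np

def posterior_mean(X, y, beta_prior, omega_prior):
    """Posterior mean and covariance from (27)-(28), unit error variance assumed."""
    xtx = X.T @ X
    beta_ols = np.linalg.solve(xtx, X.T @ y)
    prior_prec = np.linalg.inv(omega_prior)
    omega_post = np.linalg.inv(prior_prec + xtx)
    beta_post = omega_post @ (prior_prec @ beta_prior + xtx @ beta_ols)
    return beta_post, omega_post, beta_ols

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.standard_normal(50)
b0 = np.zeros(3)
# A very loose prior returns (almost) OLS, a very tight prior returns the prior mean
loose, _, b_ols = posterior_mean(X, y, b0, 1e8 * np.eye(3))
tight, _, _ = posterior_mean(X, y, b0, 1e-8 * np.eye(3))
print(loose, tight)
```

The posterior mean is a precision-weighted average of the prior mean and the OLS estimate, which is what makes the conjugate case simple.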
- Inference: report the posterior mean and posterior standard error.
- To test a model against another use Bayes factors, i.e. the ratio of the
marginal likelihood of the two models.
- Recursive approach.
Assume we have an estimate of the coefficients Aj. Then $D_\tau = [D_\tau^{i,i'}] = \sum_{j=1}^{\min[\tau,q]} A_j D_{\tau-j}$,
where τ refers to the horizon, D0 = I and Aj = 0 ∀ j > q.
Note: if news are not orthogonal let P̃eP̃e′ = Σe. Then D̃0 = D0P̃e.
• Companion form method:
$Y_t = AY_{t-1} + E_t$
$\quad = A^tY_0 + \sum_{j=0}^{t-1}A^jE_{t-j}$ (29)
$\quad = A^tY_0 + \sum_{j=0}^{t-1}\tilde{A}^j\tilde{E}_{t-j}$ (30)
(29) is valid for non-orthogonal news and (30) for orthogonal ones.
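The two ways of computing (non-orthogonalised) impulse responses can be compared in a few lines. The sketch assumes the standard recursion D0 = I, Dτ = Σ_{j=1}^{min(τ,q)} Aj D_{τ−j}, and reads the responses from the upper-left m × m block of powers of the companion matrix; function names and the VAR(2) coefficients are illustrative.

```python
import numpy as np

def ma_recursive(A_list, horizon):
    """D_0 = I, D_tau = sum_{j=1}^{min(tau,q)} A_j D_{tau-j}."""
    m, q = A_list[0].shape[0], len(A_list)
    D = [np.eye(m)]
    for tau in range(1, horizon + 1):
        D.append(sum(A_list[j - 1] @ D[tau - j]
                     for j in range(1, min(tau, q) + 1)))
    return D

def ma_companion(A_list, horizon):
    """Upper-left m x m block of powers of the companion matrix."""
    m, q = A_list[0].shape[0], len(A_list)
    comp = np.zeros((m * q, m * q))
    comp[:m, :] = np.hstack(A_list)          # first block row: A_1 ... A_q
    comp[m:, :-m] = np.eye(m * (q - 1))      # identity blocks below
    return [np.linalg.matrix_power(comp, tau)[:m, :m]
            for tau in range(horizon + 1)]

A_list = [np.array([[0.5, 0.1], [0.0, 0.3]]),
          np.array([[0.2, 0.0], [0.1, 0.1]])]   # illustrative VAR(2)
D_rec = ma_recursive(A_list, 8)
D_comp = ma_companion(A_list, 8)
print(all(np.allclose(a, b) for a, b in zip(D_rec, D_comp)))
```

Orthogonalised responses follow by post-multiplying each Dτ by a factor P̃ of Σe.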
• 1-step ahead revision of the forecast Yt+τ .
It uses:
$y_{t+\tau} - y_t(\tau) = \sum_{j=0}^{\tau-1}\tilde{D}_j\tilde{e}_{t+\tau-j}, \qquad D_0 = I$ (31)
yt(τ ) is the τ -steps ahead prediction of yt based on the VAR.
Let ŷi,t(τ ) = yi,t+τ − yi,t(τ ) be the τ -steps ahead forecast error in the i-th
variable of the VAR. Then:
$\hat{y}_{i,t}(\tau) = \sum_{i'=1}^{m}\tilde{D}^{ii'}(\ell)\tilde{e}_{i't+\tau}$ (32)
What is the contribution of various shocks to var(y) and var(π)?
          Y                                  π
Horizon   Shock1  Shock2  Shock3  Shock4    Shock1  Shock2  Shock3  Shock4
4         0.99    0.001   0.003   0.001     0.07    0.86    0.01    0.03
12        0.93    0.01    0.039   0.02      0.24    0.60    0.08    0.07
24        0.79    0.01    0.15    0.04      0.52    0.36    0.07    0.04
Table 3: Variance decomposition, percentages
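A table of this kind can be produced from orthogonalised MA coefficients. The sketch below is a generic illustration, not the computation behind Table 3: the share of the τ-step forecast error variance of variable i due to shock i′ is the sum over j = 0, ..., τ−1 of the squared (i, i′) element of D̃j, divided by the sum over all shocks. The VAR(1), the covariance matrix and the function name `fevd` are assumptions.

```python
import numpy as np

def fevd(D_tilde, horizon):
    """Share of the horizon-step forecast error variance of variable i (rows)
    attributed to orthogonal shock i' (columns)."""
    contrib = sum(D_tilde[j] ** 2 for j in range(horizon))
    return contrib / contrib.sum(axis=1, keepdims=True)

A1 = np.array([[0.5, 0.1], [0.4, 0.3]])        # illustrative VAR(1)
P = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 1.0]]))
# Orthogonalised MA coefficients: D_j = A1^j, D_tilde_j = D_j P
D_tilde = [np.linalg.matrix_power(A1, j) @ P for j in range(24)]
shares = fevd(D_tilde, 24)
print(shares)
```

Each row of `shares` sums to one, so the entries can be read as fractions of the forecast error variance.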
[Figure: Historical decomposition of gnp. Recovered panel titles: "Shocks in gnp", "Shocks in interest". Each panel plots the variable, the baseline and the shocks over 1975-1978.]
9 Identification: Obtaining SVARs
We want to go from (34) to (33), since (34) is easy to estimate (just use
OLS equation by equation). To do this, we need A0. To estimate A0, we
need restrictions, since Aj, Σe have fewer free parameters than A0, Aj, Σε.
• Order condition: If the VAR has m variables, need m(m − 1)/2 restric-
tions. This is because there are m2 free parameters on the left hand side
of (40) and only m(m + 1)/2 parameters in Σe (m2 = m(m + 1)/2 +
m(m − 1)/2).
- Rank and order conditions are valid only for "local identification" (need
to be checked at one specific point).
- They are valid for SVAR restrictions which are linear or nonlinear in the
parameters.
The SVAR is not globally identified because the restrictions are not
"appropriately" placed.
How do you estimate a SVAR? Use a two-step approach:
With a01, a02 fixed, the two-stage approach has an IV interpretation: e1t, e2t are
predetermined and are used as instruments for the structural shocks in the third equation.
9.3 Nonstationary VARs
Separating permanent and transitory components and using for the latter
only contemporaneous restrictions we have
D(1)A0εt = D(1)et (43)
A0∆εt = ∆et (44)
If yt is stationary, D(1) = D(1) = 0 and (43) is vacuous.
Two types of restrictions to estimate A0: short and long run.
• Under the above restrictions, and if the variance of the shocks changes
once at a known date, shocks can be identified without economic restrictions
(!!). If economic restrictions exist, they become overidentifying and can
be tested.
1) α and β to be unchanged.
2) The variance of both shocks to be changing (if there are two regimes).
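A sketch of how a single variance change can pin down the impact matrix without economic restrictions. It assumes (this setup is not spelled out in the slides) that the reduced-form covariance is Σ1 = BB′ in the first regime and Σ2 = BΛB′ in the second, with Λ diagonal and distinct diagonal entries. B is then recovered up to column sign and ordering from a Choleski factor and an eigendecomposition. All names and values are hypothetical.

```python
import numpy as np

def identify_by_heteroskedasticity(sigma1, sigma2):
    """Find B and lam with sigma1 = B B' and sigma2 = B diag(lam) B'."""
    C = np.linalg.cholesky(sigma1)
    C_inv = np.linalg.inv(C)
    # Symmetric matrix whose eigenvalues are the relative variances lam
    lam, Q = np.linalg.eigh(C_inv @ sigma2 @ C_inv.T)
    return C @ Q, lam

B_true = np.array([[1.0, 0.5], [-0.4, 1.0]])     # illustrative impact matrix
lam_true = np.array([0.5, 3.0])                  # both variances change
sigma1 = B_true @ B_true.T
sigma2 = B_true @ np.diag(lam_true) @ B_true.T
B_hat, lam_hat = identify_by_heteroskedasticity(sigma1, sigma2)
print(B_hat, lam_hat)
```

If two diagonal entries of Λ were equal, the corresponding columns of B would no longer be separately identified, which is why both variances need to change.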
Problems with standard identification approaches
Example 17
$p_t = a_{11}e_t^s$ (49)
$y_t = a_{21}e_t^s + a_{22}e_t^d$ (50)
Price is set prior to knowing demand shocks. Choleski ordering with p first.
$y_t = a_{11}e_t^s$ (51)
$p_t = a_{21}e_t^s + a_{22}e_t^d$ (52)
Quantity is set prior to knowing demand shocks. Choleski ordering with y
first.
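The sketch below shows why the ordering matters. Data are assumed to come from the first model (price set before demand shocks are known), with unit-variance shocks and illustrative coefficient values. A Choleski factorization with p first recovers the true impact matrix; with y first it fits the same covariance matrix but implies different shocks.

```python
import numpy as np

a11, a21, a22 = 1.0, 0.5, 0.8           # illustrative values
B = np.array([[a11, 0.0],
              [a21, a22]])              # rows (p, y), columns (supply, demand)
sigma = B @ B.T                         # covariance of (p, y)

# Ordering (p, y): the Choleski factor equals the true impact matrix
chol_p_first = np.linalg.cholesky(sigma)

# Ordering (y, p): factor the permuted covariance, then put rows back to (p, y)
perm = np.array([[0.0, 1.0], [1.0, 0.0]])
chol_y_first = perm @ np.linalg.cholesky(perm @ sigma @ perm.T)

print(chol_p_first)
print(chol_y_first)
```

Both factors reproduce Σ, so the data alone cannot choose between them; the choice rests on the economic assumption about what is predetermined.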
[Figure: two panels of responses over horizons 5-20, titled "Restriction not satisfied" and "Restriction satisfied".]
- Long run restrictions 2: Cooley-Dwyer (1998): take an RBC model driven by a
unit root technology shock. Simulate data. Run a VAR with (yt, nt) and
identify two shocks (permanent/transitory). It is possible to do this. Transitory
shocks explain a large portion of the variance of yt.
- Long run restrictions 3: Erceg et al. (2005): long run restrictions are poor
in small samples. Chari et al. (2006): potentially important truncation
bias due to the estimation of a VAR(q), q finite.
- Short run restrictions: Canova-Pina (2005), Fuerst, Carlstrom, Paustian
(2009).
- Need to link SVAR to theory better: use restrictions which are more
common in DSGE models (and are robust).
9.4 Alternative identification Scheme
Canova-De Nicoló (2002), Faust (1998), Uhlig (2005): use sign (and
shape) restrictions.
• Stop when you find an $\varepsilon_{jt}$ with the right characteristics or compute the
mean/median (and s.e.) of the statistics of interest for all $\varepsilon_{jt}^{l}$ satisfying
the restrictions, where l is the number of shocks found.
• Number of H infinite. Write H = H(ω), ω ∈ (0, 2π). H(ω) are called
rotation (Givens) matrices.
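Example 19 Let m = 2. Then $H(\omega) = \begin{bmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{bmatrix}$ or $H(\omega) = \begin{bmatrix} \cos(\omega) & \sin(\omega) \\ \sin(\omega) & -\cos(\omega) \end{bmatrix}$.
Varying ω, we trace out all possible structural MA
representations that could have generated the data.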
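A minimal sketch of the rotation idea for m = 2: start from a Choleski factor P of Σe, rotate it with H(ω) on a grid of angles, and keep the rotations whose impact responses match a sign pattern. The covariance matrix, the sign pattern and the grid size are hypothetical choices for illustration, and restrictions are imposed on impact only.

```python
import numpy as np

def givens(omega):
    """2 x 2 rotation matrix H(omega), with H @ H.T = I."""
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[c, -s],
                     [s, c]])

sigma_e = np.array([[1.0, 4.0], [4.0, 25.0]])
P = np.linalg.cholesky(sigma_e)
# Hypothetical sign pattern on impact: shock 1 moves both variables up,
# shock 2 moves variable 1 down and variable 2 up.
signs = np.array([[1, -1],
                  [1, 1]])

accepted = []
for omega in np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False):
    impact = P @ givens(omega)          # candidate impact matrix
    if np.all(impact * signs > 0):      # every entry has the required sign
        accepted.append((omega, impact))

omegas = np.array([w for w, _ in accepted])
print(len(accepted), omegas.min(), omegas.max())
```

The accepted angles form an interval rather than a point, which is the sense in which sign restrictions identify a region of the parameter space.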
9.5 Sign restrictions in large systems
[Figure: responses of Output, Prices and Money; horizontal axes: Horizon (Months).]
Example 21 Studying the effects of fiscal shocks in US states: 1950-2005.
corr(G,Y)  corr(T,Y)  corr(G,DEF)  corr(T,DEF)  corr(G,T)
G shocks >0 >0 >0
BB shocks <0 =0 =1
Tax shocks <0 <0 =0
Table 4: Identification restrictions
Discussion
- Sign restrictions are weak: they identify a portion of the parameter space
not a point. Large uncertainty in the results compared with standard
SVARs.
- Sign restrictions may be poor if shock signal is too weak (Canova and
Paustian (2011)).
- Use robust DSGE restrictions (see Dedola and Neri (2007), Pappa (2009),
Peersman and Straub (2009), Lippi and Nobili (2011)).
- The number of rejected draws tells us how easy (likely) it is to find the
restrictions in the data. It can be used as a first test of the restrictions used.
Literature: Sims (1992, EER), Christiano et al. (1999, Handbook of Monetary
Economics), Leeper and Roush (2003, JMCB), Favara and Giordani
(2009, JME) among others.
What does (standard New Keynesian) theory tell us about the transmission
of MP shocks?
[Figure: responses at monthly and quarterly frequency; vertical axis: Size of responses, horizontal axis: Horizon (Months).]
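- If et is normal:
$T^{0.5}\begin{bmatrix} S_3 \\ S_4 - 3 \ast I_m \end{bmatrix} \sim N\left(0, \begin{bmatrix} 6 \ast I_m & 0 \\ 0 & 24 \ast I_m \end{bmatrix}\right)$
- Regress $\hat{e}_t$ on $y_{t-1}^2$, $\log y_{t-1}$, etc. Check significance.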
• Stationarity is violated
- Changes in the variance of the process are continuous. Can’t really use
subsample analysis.
• Two small scale VAR systems, both with m1 < m variables, have differ-
ent innovations.
yt = D(ℓ)εt (58)
εt is m × 1 and Di is m1 × m ∀i. Suppose a researcher specifies a VAR
with m1 < m variables and obtains the MA:
yt = D̃(ℓ)et (59)
et is m1 × 1, and D̃i is m1 × m1 ∀i. Matching (58) and (59): D̃(ℓ)et =
D(ℓ)εt or letting D‡(ℓ) be an m1 × m matrix
D‡(ℓ)εt = et (60)
If there are ma shocks of one type and mb shocks of another type, ma +
mb = m and m1 = 2. Then
- e1t recovers type 1 shocks if $D_{13}^{\ddagger}(\ell) = D_{14}^{\ddagger}(\ell) = 0$ and e2t recovers type
2 shocks if $D_{21}^{\ddagger}(\ell) = D_{22}^{\ddagger}(\ell) = 0$.
- e1t recovers current type 1 shocks if $D_{ii'}^{\ddagger}(\ell) = D_{ii'}^{\ddagger}$, ∀ℓ, i, i′ = 1, 2.
• Non-Wold decompositions (Lippi-Reichlin (1994), Leeper (1991), Hansen-
Sargent (1991), Leeper, Walker, Yang (2009)). Certain economic models
do not have a fundamental MAR representation, e.g., diffusion models;
models where agents anticipate tax changes.
yt = kt + et (65)
kt = a1kt−1 + a0εt (66)
a1 is the persistence, a0 the contemporaneous effect. If we observe both yt and kt:
restricted VAR(1). No problem.
If only yt is observable, $\frac{1 - a_1\ell}{1 + a_0 - a_1\ell}y_t = e_t$ or
$y_t = \frac{a_0}{1 + a_0}\sum_j\left(\frac{a_1}{1 + a_0}\right)^j y_{t-j} + e_t$ (67)
If a0 is small and a1 large, $\left(\frac{a_1}{1 + a_0}\right)^j$ will be large even for large j. A
very long lag length is needed to whiten the residuals. If long run restrictions are used,
there is a potentially important truncation bias.
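A quick numerical illustration of how slowly the weights (a1/(1 + a0))^j in (67) can decay. The two parameter pairs are illustrative choices, and `ar_weight` is a hypothetical helper name.

```python
def ar_weight(a0, a1, j):
    """Geometric factor (a1 / (1 + a0))**j in the AR representation (67)."""
    return (a1 / (1.0 + a0)) ** j

lags = (1, 4, 12, 24)
slow = [ar_weight(0.1, 0.95, j) for j in lags]   # small a0, large a1
fast = [ar_weight(1.0, 0.5, j) for j in lags]    # large a0, small a1
for j, s, f in zip(lags, slow, fast):
    print(j, round(s, 4), round(f, 8))
```

With a0 = 0.1 and a1 = 0.95 the weight at lag 12 is still above 0.1, so a VAR with only a few lags leaves out non-negligible terms.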
Summary
- Omission does not matter very much if the true model has dynamics
which die out quickly.
yt = A(ℓ)yt−1 + et (68)
A0yt = A(ℓ)yt−1 + εt (69)
- Assume that we have point estimates of A(ℓ), A0, A(ℓ) and a distribution
(asymptotic, small sample or posterior) for these point estimates.
Yt = AYt−1 + Et (70)
A0Yt = AYt−1 + Υt (71)
• Unconditional forecast: Set Et+τ = 0(Υt+τ = 0), ∀τ > 0.
1, 2, . . . , L, each horizon τ .
2. Order $Y_{t+\tau}^{l}$ over l, for each τ, and extract the required intervals (25-75, 16-84
or 2.5-97.5 percentiles).
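The ordering step can be sketched with `np.percentile`. The draws below are placeholders generated from a random walk purely for illustration; in practice each row would be one simulated forecast path for l = 1, ..., L. The function name `forecast_bands` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
L, horizons = 1000, 8
# Placeholder draws Y^l_{t+tau}: one row per draw l, one column per horizon tau
draws = rng.standard_normal((L, horizons)).cumsum(axis=1)

def forecast_bands(draws, lower=16, upper=84):
    """Pointwise percentile band and median across draws, horizon by horizon."""
    lo, med, hi = np.percentile(draws, [lower, 50, upper], axis=0)
    return lo, med, hi

lo, med, hi = forecast_bands(draws)          # 16-84 band
print(lo, med, hi)
```

Calling `forecast_bands(draws, 2.5, 97.5)` or `forecast_bands(draws, 25, 75)` gives the other intervals mentioned above.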
• Conditional forecast 1: Manipulating shocks.
1) Take quarterly data for output growth and inflation for your favorite country. Identify
supply and demand shocks by finding all the rotations which satisfy the following
restrictions: supply ∆y ↑, Inf ↓, demand ∆y ↑, Inf ↑. What do the impulse responses produced
by the rotations that jointly satisfy the restrictions look like? How do they compare with
those obtained using the restriction that only supply shocks affect output in the long run,
but both demand and supply shocks can affect inflation in the long run?
2) Consider the following New Keynesian model
$x_t = E_t x_{t+1} + \frac{1}{\varphi}(i_t - E_t\pi_{t+1}) + v_{1t}$ (72)
$\pi_t = \beta\pi_{t+1} + \kappa x_t + v_{2t}$ (73)
$i_t = \phi_r i_{t-1} + (1 - \phi_r)(\phi_\pi\pi_t + \phi_x x_t) + v_{3t}$ (74)
where xt is the output gap, πt the inflation rate and it the nominal interest rate (denoted R below).
i) Plot impulse responses to the three shocks.
ii) Simulate 11000 data points from this model after you have set ϕ = 1.5, β = 0.99,
κ = 0.8, φr = 0.6, φπ = 1.2, φx = 0.2, ρ1 = 0.9, ρ2 = 0.9, σ1 = σ2 = σ3 = 0.1, and
discard the first 1000. With the remaining data estimate a three variable VAR. In particular
(i) select the lag length optimally, (ii) check whether the model you have selected has well
specified residuals and (iii) check whether you detect breaks in the specification or not.
iii) With the estimated model apply a Choleski decomposition in the order (y, π, R) and
check how the impulse responses compare with the true ones in i). Is there any noticeable
difference? Why?
iv) Now try the ordering (R, y, π). Do you notice any difference with iii)? Why?
3) Obtain data for output and hours (employment) for your favorite country - each group
should use a different country. Construct a measure of labor productivity and run a VAR
on labor productivity and hours as in Gali (1999, AER) after you have appropriately se-
lected the statistical nature of the model. Identify technology shocks as the only source
of labor productivity in the long run. How much of the fluctuations in hours and labor
productivity do they explain at the 4-year horizon? Repeat the exercise using the
restriction that in response to technology shocks output and labor productivity must increase
contemporaneously. Are technology shocks a major source of cyclical fluctuations?
Appendix: Inference in SVARs
where SML = (yt − A(ℓ)ML yt−1)′(yt − A(ℓ)ML yt−1)/(T − k), k is the number of regressors in
each equation, and tr is the trace of the matrix.
a) Find A(ℓ)ML.
Typically difficult to maximize analytically, need numerical routines (both for likelihood
and posterior computations).
Note if instead of conditioning on A(ℓ)ML we integrate it out we have:
L(A0|y) ∝ |A0|T −k exp{0.5tr(SMLA′0A0)}
• Bayesian analysis with flat priors equivalent to classical analysis, conditional on A(ℓ)ML .
Summary
• If the system is over-identified, can’t use a two step approach. Need to jointly maximize
the likelihood function with respect to Aj , A0.
• With sign restrictions no maximization needed. Find the region of the parameter space
which satisfies the restrictions. Can do this numerically using a version of an acceptance
sampling algorithm .
Monte Carlo standard errors for impulse responses
Algorithm 12.1
1. Draw Σ−1 from a Wishart, conditional on the data.
3. Draw $A(\ell)^l$ from a $N(A(\ell)_{ML}, (A_0^l)^{-1}(A_0^l)^{-1\prime} \otimes (Y_{t-1}'Y_{t-1})^{-1})$.
4. Set $A(\ell)^l = A(\ell)^l A_0^l$. Compute $(A(\ell)^l)^{-1}$ (the MA of the model).
Algorithm 12.2
2. Set $A_0^l = \mathrm{chol}(A(1)_{ML}^{-1}\Sigma^l A(1)_{ML}^{-1\prime})^{-1}A(1)_{ML}^{-1}$.
3. Draw $A(\ell)^l$ from a $N(A(\ell)_{ML}, (A_0^l)^{-1}(A_0^l)^{-1\prime} \otimes (Y_{t-1}'Y_{t-1})^{-1})$.
Algorithm 12.3
4. Draw $A(\ell)^l$ from a $N(A(\ell)_{ML}, (A_0^l)^{-1}HH'(A_0^l)^{-1\prime} \otimes (Y_{t-1}'Y_{t-1})^{-1})$.
5. Set $A(\ell)^l = A(\ell)^l A_0^l$. Compute $(A(\ell)^l)^{-1}$ (the MA of the model). If column i of
$(A(\ell)^l)^{-1}$ satisfies the sign restriction, keep the draw; otherwise discard it.
6. Repeat steps 1.-5. until L draws are obtained. Order draws and compute median, mode,
mean, percentiles, etc.
• Could also randomize on H. Many H such that HH ′ = I. Could have a prior on Hs.
Since H does not enter the likelihood, posterior of H = prior of H.
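One common way to randomize over H, sketched here as an assumption rather than as the procedure used in these notes, is to take the QR decomposition of a matrix of independent standard normal draws. The resulting H satisfies HH′ = I, so rotating a Choleski factor by H leaves the implied covariance matrix unchanged. The covariance matrix and the function name are illustrative.

```python
import numpy as np

def random_orthonormal(m, rng):
    """Draw H with H @ H.T = I from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((m, m)))
    # Flip column signs so that the diagonal of R is positive (a common normalisation)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(0)
H = random_orthonormal(3, rng)
sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])
P = np.linalg.cholesky(sigma)
impact = P @ H          # rotated impact matrix, still consistent with sigma
print(np.allclose(impact @ impact.T, sigma))
```

Because the likelihood depends on the impact matrix only through Σ, every such draw fits the data equally well, which is why the posterior of H coincides with its prior.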