Econ 212: Using Stata To Estimate VAR and Structural VAR Models
Hosein Joshaghani
November 14, 2010
1 Introduction to VARs
A VAR is a model in which $K$ variables are specified as linear functions of $p$ of their own lags, $p$ lags of the other $K-1$ variables, and possibly additional exogenous variables. Algebraically, a $p$th-order VAR model, written VAR($p$), with exogenous variables $y_t$ is given by

\[
x_t = A_1 x_{t-1} + A_2 x_{t-2} + \cdots + A_p x_{t-p} + B_1 y_{t-1} + B_2 y_{t-2} + \cdots + B_p y_{t-p} + \epsilon_t \tag{1}
\]
where $x_t = (x_{1t}, \ldots, x_{Kt})'$ is a $K \times 1$ random vector, $y_t = (y_{1t}, \ldots, y_{Mt})'$ is an $M \times 1$ vector of exogenous variables, the $A_i$ are fixed $K \times K$ matrices of parameters, the $B_i$ are fixed $K \times M$ matrices of parameters, and $\epsilon_t$ is assumed to be a $K \times 1$ white noise random vector; that is, $E(\epsilon_t) = 0$, $E(\epsilon_t \epsilon_t') = \Sigma$, and $E(\epsilon_t \epsilon_s') = 0$ for $t \neq s$. For our class we assume there are no exogenous variables (i.e., $B_i = 0$ for all $i$),
or simply

\[
x_t = A_1 x_{t-1} + A_2 x_{t-2} + \cdots + A_p x_{t-p} + \epsilon_t
\]
\[
\left(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p\right) x_t = \epsilon_t
\]
\[
A(L)\, x_t = \epsilon_t \tag{2}
\]

$^*$This is a draft version of a guide that needs more examples. But since Professor Bondarenko has uploaded the VAR pset, this draft version could still be helpful. Please send your comments to Hosein Joshaghani ([email protected]).
The cross-equation error variance–covariance matrix $\Sigma$ contains all the information about contemporaneous correlations in a VAR and may be both the VAR's greatest strength and its greatest weakness. Because no questionable a priori assumptions are imposed, fitting a VAR allows the dataset to speak for itself. However, without imposing some restrictions on the structure of $\Sigma$, we cannot make a causal interpretation of the results.

In the absence of contemporaneous exogenous variables, the disturbance variance–covariance matrix contains all the information about contemporaneous correlations among the variables. VARs are sometimes classified into three types by how they account for this contemporaneous correlation. (See Stock and Watson [2001] for one derivation of this taxonomy.)
1. A reduced-form VAR, aside from estimating the variance–covariance matrix of the disturbance, does not try to account for contemporaneous correlations.

2. In a recursive VAR, the K variables are assumed to form a recursive dynamic structural equation model in which the first variable is a function of lagged variables, the second is a function of contemporaneous values of the first variable and lagged values, and so on.

3. In a structural VAR, the theory you are working with places restrictions on the contemporaneous correlations that are not necessarily recursive. In practice it is very common to treat a recursive VAR as a special case of a structural VAR.
1.1 Stata Commands
Stata has two commands for fitting reduced-form VARs: var and varbasic. var allows constraints to be imposed on the coefficients. varbasic allows you to fit a simple VAR quickly without constraints and to graph the IRFs.

Because fitting a VAR of the correct order can be important, varsoc offers several methods for choosing the lag order p of the VAR to fit. After fitting a VAR, and before proceeding with inference, interpretation, or forecasting, it is important to check that the VAR fits the data. varlmar can be used to check for autocorrelation in the disturbances. varwle performs Wald tests to determine whether certain lags can be excluded. varnorm tests the null hypothesis that the disturbances are normally distributed. varstable checks the eigenvalue condition for stability, which is needed to interpret the IRFs and FEVDs.
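To see how these commands fit together, here is a minimal sketch of the workflow on the Lutkepohl data used in the examples below (the variable list, sample restriction, and lag length are illustrative choices, not recommendations):

. use http://www.stata-press.com/data/r11/lutkepohl2
. varsoc dln_inv dln_inc dln_consump if qtr<=tq(1978q4)
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2)
. varlmar
. varwle
. varnorm
. varstable

varsoc reports the LR test and the FPE, AIC, HQIC, and SBIC statistics for each lag length, so the lag order is chosen before fitting; the four diagnostic commands are run after var on the fitted model.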
1.2 Examples
To illustrate the basic usage of var, we replicate the example in Lutkepohl (2005, 77–78). The data consist of three variables: the first difference of the natural log of investment, dln_inv; the first difference of the natural log of income, dln_inc; and the first difference of the natural log of consumption, dln_consump. The dataset contains data through the fourth quarter of 1982, though Lutkepohl uses only the observations through the fourth quarter of 1978.
. use http://www.stata-press.com/data/r11/lutkepohl2
. tsset
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lutstats lags(1/4)
. varstable, graph
The output has two parts: a header and the standard Stata output table for the coefficients, standard errors, and confidence intervals. The header contains summary statistics for each equation in the VAR and statistics used in selecting the lag order of the VAR. Although there are standard formulas for all the lag-order statistics, Lutkepohl (2005) gives different versions of the three information criteria that drop the constant term from the likelihood. To obtain the Lutkepohl (2005) versions, we specified the lutstats option. The formulas for the standard and Lutkepohl versions of these statistics are given in Methods and formulas of [TS] varsoc.
The lags() option takes a numlist of lags. To specify a model that includes the first and second lags, type

. var x1 x2 x3, lags(1/2)

not

. var x1 x2 x3, lags(2)
because the latter specification would fit a model that included only the second lag.

varbasic simplifies fitting simple VARs and graphing the IRFs, the OIRFs, or the FEVDs. Below we use varbasic to fit a VAR(2) model on the data from the second quarter of 1961 through the fourth quarter of 1978. By default, varbasic produces graphs of the OIRFs.¹ It illustrates how varbasic serves as an entry point to further analysis.

¹We will talk about orthogonalized IRFs (OIRFs), which are the result of the Cholesky decomposition, in section 3.
. use http://www.stata-press.com/data/r11/lutkepohl2
. varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4)
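A couple of variations may be useful. By default, varbasic fits two lags and graphs the OIRFs; below is a sketch using its lags(), step(), fevd, and nograph options (the particular choices shown are arbitrary):

. varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4), step(10) fevd
. varbasic dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/4) nograph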
2 Introduction to SVARs
A problem with VAR analysis is that, because $\Sigma$ is not restricted to be a diagonal matrix, an increase in an innovation to one variable provides information about the innovations to other variables. This implies that no causal interpretation of the simple IRFs is possible: there is no way to determine whether the shock to the first variable caused the shock in the second variable or vice versa.
However, suppose that we had a matrix $P = Q^{-1}$ such that $\Sigma = PP'$. If we had such a $P$, then $P^{-1}\Sigma\left(P^{-1}\right)' = Q\Sigma Q' = I_K$, and

\[
E\left[Q\epsilon_t \left(Q\epsilon_t\right)'\right] = Q\Sigma Q' = I_K \tag{3}
\]

We can thus use $Q$ to orthogonalize the $\epsilon_t$ and rewrite (7) as

\[
x_t = \sum_{i=0}^{\infty} B_i P P^{-1} \epsilon_{t-i} = \sum_{i=0}^{\infty} B_i Q^{-1} Q\, \epsilon_{t-i} = \sum_{i=0}^{\infty} C_i w_{t-i} \tag{4}
\]

where $C_i = B_i P$ and $w_t = P^{-1}\epsilon_t$. If we had such a $P$, the $w_t$ would be mutually orthogonal, and the $C_i$ would allow the causal interpretation that we seek.
SVAR models provide a framework for estimation of and inference about a broad class of $P$ matrices. As described in class, the estimated $P$ matrices can then be used to estimate structural IRFs and structural FEVDs. There are two types of SVAR models. Short-run SVAR models identify a $P$ matrix by placing restrictions on the contemporaneous correlations between the variables. Long-run SVAR models, on the other hand, do so by placing restrictions on the long-term accumulated effects of the innovations.
2.1 Short-run SVAR models
A short-run SVAR model can be written as

\[
A\left(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p\right) x_t = A\epsilon_t = Be_t \tag{5}
\]
where $A$ and $B$ are $K \times K$ nonsingular matrices of parameters to be estimated, $e_t$ is a $K \times 1$ vector of disturbances with $e_t \sim N(0, I_K)$, and $E[e_t e_s'] = 0_K$ for all $s \neq t$. Note that these $A$ and $B$ are different from the $A_i$ and $B_i$ matrices in (1) and (7). Since they have $2K^2$ unknown parameters and $\Sigma$ has only $K(K+1)/2$ independent known parameters, sufficient constraints must be placed on $A$ and $B$ so that $P$ is identified. This notation is the most general short-run SVAR that Stata uses. In our class notation we imposed $A = I_K$, so we write the short-run SVAR model as

\[
\left(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p\right) x_t = \epsilon_t = Be_t \tag{6}
\]
One way to see the connection is to draw out the implications of the latter equality in (5). From (5) it can be shown that

\[
\Sigma = A^{-1}B\left(A^{-1}B\right)'
\]

Using our notation $A = I$, (5) simplifies to

\[
\Sigma = BB'
\]

or equivalently

\[
B = Q^{-1} = P
\]
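To make the factorization $\Sigma = BB'$ concrete, here is a small sketch using Stata's cholesky() matrix function on an invented $2 \times 2$ covariance matrix (the numbers are purely illustrative):

. * an invented covariance matrix Sigma
. matrix Sigma = (1, .5 \ .5, 1)
. * lower-triangular P satisfying P*P' = Sigma
. matrix P = cholesky(Sigma)
. matrix list P

With $K = 2$, forcing the upper-right element of $P$ to zero (lower triangularity) supplies exactly the one restriction needed to pin down $P$.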
2.2 Long-run restrictions
Constraining $A$ to be an identity matrix allows us to rewrite the short-run SVAR equation as

\[
x_t = A(L)^{-1}Be_t = B(L)\,Be_t
\]

which implies that $\Sigma = BB'$. Thus $C = A(1)^{-1}B$ is the matrix of long-run responses to the orthogonalized shocks, and

\[
x_t = Ce_t
\]

In long-run models, the constraints are placed on the elements of $C$, and the free parameters are estimated. These constraints are often exclusion restrictions. For instance, constraining $C[1,2]$ to be zero can be interpreted as setting the long-run response of variable 1 to the structural shocks driving variable 2 to be zero.
2.3 Stata Command
Stata's svar command estimates the parameters of structural VARs. See [TS] var svar for more information and examples.
2.4 Examples
2.4.1 Example 1: Short-run just-identied SVAR model
Following Sims (1980), the Cholesky decomposition is one method of identifying the impulse–response functions in a VAR; thus, this method corresponds to an SVAR. There are several sets of constraints on A and B that are easily manipulated back to the Cholesky decomposition, and the following example illustrates this point. One way is to impose that A be the identity matrix and B be lower triangular; equivalently, and as in [TS] var svar, one can constrain A to be lower triangular with ones on the diagonal and B to be diagonal, which is what we do below.
Then the $P = Q^{-1}$ matrix for this model is $P_{sr} = A^{-1}B$; its estimate, $\widehat{P}_{sr} = \widehat{A}^{-1}\widehat{B}$, obtained by plugging in the estimates of $A$ and $B$, should equal the Cholesky decomposition of $\widehat{\Sigma}$.
To illustrate, we use the German macroeconomic data discussed in Lutkepohl (2005) and used in [TS] var. In this example, $x_t = (\text{dln\_inv}, \text{dln\_inc}, \text{dln\_consump})'$, where dln_inv is the first difference of the log of investment, dln_inc is the first difference of the log of income, and dln_consump is the first difference of the log of consumption. Because the first difference of the natural log of a variable can be treated as an approximation of the percent change in that variable, we will refer to these variables as percent changes in inv, inc, and consump, respectively.
We will impose the Cholesky restrictions on this system by applying equality constraints with the constraint matrices

\[
A = \begin{pmatrix} 1 & 0 & 0 \\ \cdot & 1 & 0 \\ \cdot & \cdot & 1 \end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix} \cdot & 0 & 0 \\ 0 & \cdot & 0 \\ 0 & 0 & \cdot \end{pmatrix}
\]

where each "$\cdot$" marks a free parameter to be estimated.
With these structural restrictions, we assume that the percent change in inv is not contemporaneously affected by the percent changes in either inc or consump. We also assume that the percent change in inc is affected by contemporaneous changes in inv but not consump. Finally, we assume that percent changes in consump are affected by contemporaneous changes in both inv and inc.
. use http://www.stata-press.com/data/r11/lutkepohl2
. matrix A = (1,0,0\.,1,0\.,.,1)
. matrix B = (.,0,0\0,.,0\0,0,.)
. svar dln_inv dln_inc dln_consump if qtr<=tq(1978q4), aeq(A) beq(B)
. varstable, graph
The SVAR output has four parts: an iteration log, a display of the constraints imposed, a header with sample and SVAR log-likelihood information, and a table displaying the estimates of the parameters from the A and B matrices. From the output above, we can see that the equality constraint matrices supplied to svar imposed the intended constraints and that the SVAR header informs us that the model we fit is just identified. The estimates of a_2_1, a_3_1, and a_3_2 are all negative. Because the off-diagonal elements of the A matrix contain the negative of the actual contemporaneous effects, the estimated effects are positive, as expected.
The estimates $\widehat{A}$ and $\widehat{B}$ are stored in e(A) and e(B), respectively, allowing us to compute the estimated Cholesky decomposition.
. matrix Aest = e(A)
. matrix Best = e(B)
. matrix chol_est = inv(Aest)*Best
. matrix list chol_est
svar saves the estimated $\widehat{\Sigma}$ from the underlying var in e(Sigma). The output below illustrates the computation of the Cholesky decomposition of e(Sigma). It is the same as the output computed from the SVAR estimates.
. matrix sig_var = e(Sigma)
. matrix chol_var = cholesky(sig_var)
. matrix list chol_var
2.4.2 Example 2: Long-run SVAR model
As discussed before, a long-run SVAR has the form

\[
x_t = Ce_t
\]

In long-run models, the constraints are placed on the elements of $C$, and the free parameters are estimated. These constraints are often exclusion restrictions. For instance, constraining $C[1,2]$ to be zero can be interpreted as setting the long-run response of variable 1 to the structural shocks driving variable 2 to be zero.
Similar to the short-run model, the $P_{lr}$ matrix such that $P_{lr}P_{lr}' = \Sigma$ identifies the structural impulse–response functions. $P_{lr} = C$ is identified by the restrictions placed on the parameters in $C$. There are $K^2$ parameters in $C$, and the order condition for identification requires that there be at least $K^2 - K(K+1)/2$ restrictions placed on those parameters. As in the short-run model, this order condition is necessary but not sufficient, so the Amisano and Giannini (1997) check for local identification is performed by default.
Suppose that we have a theory in which unexpected changes to the money supply have no long-run effects on changes in output and, similarly, that unexpected changes in output have no long-run effects on changes in the money supply. The $C$ matrix implied by this theory is

\[
C = \begin{pmatrix} \cdot & 0 \\ 0 & \cdot \end{pmatrix}
\]
. use http://www.stata-press.com/data/r11/m1gdp
. matrix lr = (.,0\0,.)
. svar d.ln_m1 d.ln_gdp, lreq(lr)
We have assumed that the underlying VAR has two lags (Stata's default).
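To make the lag choice explicit and inspect the resulting structural IRFs, one possible continuation is the sketch below (the result name lrmodel, the file name lrirf, and the step length are arbitrary labels; as noted in section 3, asymptotic standard errors are not computed for the structural IRFs of long-run SVAR models):

. svar d.ln_m1 d.ln_gdp, lreq(lr) lags(1/2)
. irf create lrmodel, step(10) set(lrirf)
. irf graph sirf, impulse(D.ln_m1) response(D.ln_gdp)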
3 IRF results for VARs
3.1 An introduction to impulse–response functions for VARs
A pth-order vector autoregressive model (VAR) without exogenous variables is given by

\[
x_t = A_1 x_{t-1} + A_2 x_{t-2} + \cdots + A_p x_{t-p} + \epsilon_t
\]
\[
\left(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p\right) x_t = \epsilon_t
\]
\[
A(L)\, x_t = \epsilon_t \tag{7}
\]
where $x_t = (x_{1t}, \ldots, x_{Kt})'$ is a $K \times 1$ random vector, the $A_i$ are fixed $K \times K$ matrices of parameters, and $\epsilon_t$ is assumed to be a $K \times 1$ white noise random vector; that is, $E(\epsilon_t) = 0$, $E(\epsilon_t \epsilon_t') = \Sigma$, and $E(\epsilon_t \epsilon_s') = 0$ for $t \neq s$.
As discussed in class, a VAR can be rewritten in moving-average form only if it is stable. Below we discuss conditions under which the IRFs and forecast-error variance decompositions have a causal interpretation. IRFs describe how the innovations to one variable affect another variable after a given number of periods. For an example of how IRFs are interpreted, see Stock and Watson (2001). They use IRFs to investigate the effect of surprise shocks to the federal funds rate on inflation and unemployment. In another example, Christiano, Eichenbaum, and Evans (1999) use IRFs to investigate how shocks to monetary policy affect other macroeconomic variables.
The VAR represents the variables in $x_t$ as functions of their own lags and serially uncorrelated innovations $\epsilon_t$. All the information about contemporaneous correlations among the $K$ variables in $x_t$ is contained in $\Sigma$. In fact, as discussed in class, a VAR can be viewed as the reduced form of a dynamic simultaneous-equation model, with moving-average representation

\[
x_t = B(L)\, \epsilon_t \tag{8}
\]
To see how the innovations affect the variables in $x_t$ after, say, $i$ periods, rewrite the model in its moving-average form

\[
x_t = B_0 \epsilon_t + B_1 \epsilon_{t-1} + B_2 \epsilon_{t-2} + \cdots = \sum_{i=0}^{\infty} B_i \epsilon_{t-i} \tag{9}
\]
Note that since $A(0) = I_K$, we have $B_0 = I_K$. We can rewrite a VAR in the moving-average form only if it is stable. Essentially, a VAR is stable if the variables are covariance stationary and none of the autocorrelations are too high.
The $B_i$ are the simple IRFs. The $(j,k)$ element of $B_i$ gives the effect of a one-time unit increase in the $k$th element of $\epsilon_t$ on the $j$th element of $x_t$ after $i$ periods, holding everything else constant. Unfortunately, these effects have no causal interpretation, which would require us to be able to answer the question, "How does an innovation to variable $k$, holding everything else constant, affect variable $j$ after $i$ periods?" Because the $\epsilon_t$ are contemporaneously correlated, we cannot assume that everything else is held constant. Contemporaneous correlation among the $\epsilon_t$ implies that a shock to one variable is likely to be accompanied by shocks to some of the other variables, so it does not make sense to shock one variable and hold everything else constant. For this reason, (9) cannot provide a causal interpretation.
This shortcoming may be overcome by rewriting (9) in terms of mutually uncorrelated innovations. Suppose that we had a matrix $P = Q^{-1}$ such that $\Sigma = PP'$. If we had such a $P$, then $P^{-1}\Sigma\left(P^{-1}\right)' = Q\Sigma Q' = I_K$, and

\[
E\left[Q\epsilon_t \left(Q\epsilon_t\right)'\right] = Q\Sigma Q' = I_K
\]

We can thus use $Q$ to orthogonalize the $\epsilon_t$ and rewrite (9) as
\[
x_t = \sum_{i=0}^{\infty} B_i P P^{-1} \epsilon_{t-i} = \sum_{i=0}^{\infty} B_i Q^{-1} Q\, \epsilon_{t-i} = \sum_{i=0}^{\infty} C_i w_{t-i} \tag{10}
\]

where $C_i = B_i P$ and $w_t = P^{-1}\epsilon_t$. If we had such a $P$, the $w_t$ would be mutually orthogonal, and no information would be lost in the holding-everything-else-constant assumption, implying that the $C_i$ would have the causal interpretation that we seek.
Note that $P$ is not a unique matrix. While $\Sigma$ is a symmetric $K \times K$ variance–covariance matrix with only $K(K+1)/2$ known parameters, $P$ ($K \times K$) has $K^2$ unknown parameters to be estimated. Hence we need at least $K^2 - K(K+1)/2$ independent restrictions on the elements of $P$. (E.g., if $K = 2$ we need only one restriction, and if $K = 3$ we need three restrictions.)
So, where do we get such a $P$? Sims (1980) popularized the method of choosing $P$ to be the Cholesky decomposition of $\widehat{\Sigma}$. The IRFs based on this choice of $P$ are known as the orthogonalized IRFs (OIRFs). Choosing $P$ to be the Cholesky decomposition of $\widehat{\Sigma}$ is equivalent to imposing a recursive structure for the corresponding dynamic structural equation model. The ordering of the recursive structure is the same as the ordering imposed in the Cholesky decomposition. Because this choice is arbitrary, some researchers will look at the OIRFs with different orderings assumed in the Cholesky decomposition. The order() option available with irf create facilitates this type of analysis.
The SVAR approach integrates the need to identify the causal IRFs into the model specification and estimation process. Sufficient identification restrictions can be obtained by placing either short-run or long-run restrictions on the model. The VAR in (7) can be rewritten as

\[
\left(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p\right) x_t = \epsilon_t
\]
Similarly, a short-run SVAR model can be written as

\[
A\left(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p\right) x_t = A\epsilon_t = Be_t \tag{11}
\]
where $A$ and $B$ are $K \times K$ nonsingular matrices of parameters to be estimated, $e_t$ is a $K \times 1$ vector of disturbances with $e_t \sim N(0, I_K)$, and $E[e_t e_s'] = 0_K$ for all $s \neq t$. Sufficient constraints must be placed on $A$ and $B$ so that $P$ is identified. This notation is the most general short-run SVAR that Stata assumes. In our class notation we imposed $A = I_K$, so the short-run SVAR model can be written as
\[
\left(I_K - A_1 L - A_2 L^2 - \cdots - A_p L^p\right) x_t = \epsilon_t = Be_t \tag{12}
\]
One way to see the connection is to draw out the implications of the latter equality in (11). From (11) it can be shown that

\[
\Sigma = A^{-1}B\left(A^{-1}B\right)'
\]

Using our notation $A = I$, this simplifies to

\[
\Sigma = BB'
\]
As discussed in class, the estimates $\widehat{A}$ and $\widehat{B}$ are obtained by maximizing the concentrated log-likelihood function on the basis of the $\widehat{\Sigma}$ obtained from the underlying VAR. The short-run SVAR approach chooses $\widehat{P} = \widehat{A}^{-1}\widehat{B} = \widehat{B}$ to identify the causal IRFs.
The long-run SVAR approach works similarly, with $\widehat{P} = \widehat{C} = \widehat{A}(1)^{-1}\widehat{B}$, where $\widehat{A}(1)^{-1}$ is the matrix of estimated long-run or accumulated effects of the reduced-form VAR shocks.
There is one important difference between long-run and short-run SVAR models. As discussed by Amisano and Giannini (1997, chap. 6), in the short-run model the constraints are applied directly to the parameters in $A$ and $B$. Then $A$ and $B$ interact with the estimated parameters of the underlying VAR. In contrast, in a long-run model, the constraints are placed on functions of the estimated VAR parameters. Although estimation and inference of the parameters in $C$ is straightforward, obtaining the asymptotic standard errors of the structural IRFs requires untenable assumptions. For this reason, irf create does not estimate the asymptotic standard errors of the structural IRFs generated by long-run SVAR models. However, bootstrap standard errors are still available.
3.2 Stata Commands
irf create estimates IRFs, Cholesky orthogonalized IRFs, dynamic-multiplier functions, and
structural IRFs and their standard errors. It also estimates Cholesky and structural FEVDs. The
irf graph, irf cgraph, irf ograph, irf table, and irf ctable commands graph and tabulate
these estimates. Stata also has several other commands to manage IRF and FEVD results. See
[TS] irf for a description of these commands.
fcast compute computes dynamic forecasts and their standard errors from VARs. fcast graph
graphs the forecasts that are generated using fcast compute.
VARs allow researchers to investigate whether one variable is useful in predicting another vari-
able. A variable y is said to Granger-cause a variable x if, given the past values of x, past values of
y are useful for predicting x. The Stata command vargranger performs Wald tests to investigate
Granger causality between the variables in a VAR.
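As a sketch of these post-estimation tools on the VAR fitted earlier (the f_ prefix and the eight-period horizon are arbitrary choices):

. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2)
. vargranger
. fcast compute f_, step(8)
. fcast graph f_dln_inc f_dln_consump

vargranger performs Wald tests, equation by equation, of whether the lags of each other variable can be excluded; fcast compute creates new variables f_dln_inv, f_dln_inc, and f_dln_consump holding the dynamic forecasts.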
3.3 Examples
3.3.1 Example 1: Impulse Response Function of a VAR model
To analyze IRFs and FEVDs in Stata, you first fit a model, then use irf create to estimate the IRFs and FEVDs and store them in a file, and finally use irf graph or any of the other irf analysis commands to examine the results:
. use http://www.stata-press.com/data/r11/lutkepohl2
. var dln_inv dln_inc dln_consump if qtr<=tq(1978q4), lags(1/2) dfk
. irf create order1, step(10) set(myirf1)
. irf graph oirf, impulse(dln_inc) response(dln_consump)
[Figure: orthogonalized IRF of dln_consump to a dln_inc impulse (order1), with 95% CI, over steps 0–10. Graphs by irfname, impulse variable, and response variable.]
Multiple sets of IRFs and FEVDs can be placed in the same file, with each set of results in a file bearing a distinct name. The irf create command above created the file myirf1.irf and put one set of results in it, named order1. The order1 results include estimates of the simple IRFs, orthogonalized IRFs, cumulative IRFs, cumulative orthogonalized IRFs, and Cholesky FEVDs. Below we use the same estimated var but use a different Cholesky ordering to create a second set of IRF results, which we will store as order2 in the same file, and then we will graph both results:
. irf create order2, step(10) order(dln_inc dln_inv dln_consump)
. irf graph oirf, irf(order1 order2) impulse(dln_inc) response(dln_consump)
[Figure: orthogonalized IRFs of dln_consump to a dln_inc impulse under order1 and order2, with 95% CIs, over steps 0–10.]
We have compared results for one model under two different identification schemes. We could just as well have compared results of two different models. We now use irf table to display the results in tabular form:

. irf table oirf, irf(order1 order2) impulse(dln_inc) response(dln_consump)

Both the table and the graph show that the two orthogonalized IRFs are essentially the same. In both functions, an increase in the orthogonalized shock to dln_inc causes a short series of increases in dln_consump that dies out after four or five periods.
3.3.2 Example 2: IRF of an SVAR model
Suppose that we have results generated from two different SVAR models. We want to know whether the shapes of the structural IRFs are similar in the two models. We are also interested in knowing whether the structural IRFs differ significantly from their Cholesky counterparts.

Filling in the background, we have previously issued the commands:
. use http://www.stata-press.com/data/r11/lutkepohl2
. mat a = I(3)
. mat b = (., 0, 0\0, ., 0\., ., .)
. svar dln_inv dln_inc dln_consump, aeq(a) beq(b)
. irf create modela, set(results3) step(8)
. svar dln_inc dln_inv dln_consump, aeq(a) beq(b)
. irf create modelb, step(8)
To see whether the shapes of the orthogonalized and structural IRFs are similar in the two models, we type
. irf graph oirf sirf, impulse(dln_inc) response(dln_consump)
[Figure: orthogonalized (oirf) and structural (sirf) IRFs of dln_consump to a dln_inc impulse for modela and modelb, with 95% CIs, over steps 0–8.]
The graph reveals that the oirf and the sirf estimates are essentially the same for both models
and that the shapes of the functions are very similar for the two models.
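The same comparison can also be displayed in tabular form, as in the earlier example; a sketch:

. irf table oirf sirf, irf(modela modelb) impulse(dln_inc) response(dln_consump)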
3.3.3 Example 3
Let's focus on the results from modela. Suppose that we were interested in examining how dln_consump responded to impulses in its own structural innovations, structural innovations to dln_inc, and structural innovations to dln_inv. We type
. irf graph sirf, irf(modela) response(dln_consump)
[Figure: structural IRFs of dln_consump to impulses in dln_consump, dln_inc, and dln_inv (modela), with 95% CIs, over steps 0–8.]
The upper-left graph shows the structural IRF of an innovation in dln_consump on dln_consump. It indicates that the identification restrictions used in modela imply that a positive shock to dln_consump causes an increase in dln_consump, followed by a decrease, followed by an increase, and so on, until the effect dies out after roughly five periods.

The upper-right graph shows the structural IRF of an innovation in dln_inc on dln_consump, indicating that a positive shock to dln_inc causes an increase in dln_consump, which dies out after four or five periods.