Vector Autoregressions: How To Choose The Order of A VAR

A vector autoregression (VAR) is a statistical model used to capture the linear interdependencies among multiple time series. The VAR represents an extension of the univariate autoregressive model to dynamic multivariate time series. It consists of a set of regression equations where each variable is explained by its own lags and the lags of other variables. The document discusses the mathematical formulation of VAR models, methods for estimating the model parameters and selecting the optimal lag order, diagnostic testing, and applications such as forecasting, Granger causality analysis, impulse response functions, and variance decompositions.


Vector Autoregressions

A vector autoregression (VAR) is simply an autoregressive process for a vector of variables. Let us define

$$W_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix}, \qquad \epsilon_t = \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix},$$

and a $2 \times 2$ matrix $A$. Then a VAR(1) may be written as

$$W_t = A W_{t-1} + \epsilon_t$$

or

$$x_t = a_{11} x_{t-1} + a_{12} y_{t-1} + \epsilon_{1t}$$
$$y_t = a_{21} x_{t-1} + a_{22} y_{t-1} + \epsilon_{2t}$$

where $E(\epsilon_t) = 0$, $E(\epsilon_t \epsilon_s') = \Omega$ for $t = s$ (with $\Omega = \Omega'$ and $c'\Omega c > 0$ for all $c \neq 0$), and $E(\epsilon_t \epsilon_s') = 0$ otherwise.

A VAR(p) is

$$W_t = A_1 W_{t-1} + A_2 W_{t-2} + \dots + A_p W_{t-p} + \epsilon_t$$

or

$$(I - A_1 L - A_2 L^2 - \dots - A_p L^p) W_t = \epsilon_t.$$

The VAR is covariance stationary if all the values of $L$ satisfying

$$|I - A_1 L - A_2 L^2 - \dots - A_p L^p| = 0$$

lie outside the unit circle.^1

^1 For the VAR(1) example with 2 variables this is equivalent to saying that both roots in $L$ of

$$\det \left( \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} L \right) = 0,$$

i.e. of $1 - (a_{11} + a_{22})L + (a_{11}a_{22} - a_{12}a_{21})L^2 = 0$, have modulus greater than 1.
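As a quick numerical check of the stationarity condition, one can compute the roots of the determinantal polynomial, or equivalently the eigenvalues of the coefficient matrix, which must lie strictly inside the unit circle. A minimal sketch in Python, assuming an illustrative bivariate VAR(1) coefficient matrix `A`:

```python
import numpy as np

# Illustrative VAR(1) coefficient matrix (hypothetical values)
A = np.array([[0.5, 0.1],
              [0.4, 0.5]])

# For a VAR(1), the roots of |I - A L| = 0 lie outside the unit circle
# exactly when the eigenvalues of A lie strictly inside it.
eigenvalues = np.linalg.eigvals(A)
print("eigenvalues of A:", eigenvalues)
print("covariance stationary:", np.all(np.abs(eigenvalues) < 1))

# Equivalent check via 1 - (a11 + a22) L + (a11 a22 - a12 a21) L^2 = 0;
# np.roots takes coefficients from the highest power down.
roots_in_L = np.roots([A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0],
                       -(A[0, 0] + A[1, 1]),
                       1.0])
print("roots in L:", roots_in_L,
      "all outside unit circle:", np.all(np.abs(roots_in_L) > 1))
```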

How to choose the order of a VAR

The results of any test that we carry out using a VAR depend crucially on identifying the order of that VAR correctly. An easy way to attempt to identify the order of a VAR is to perform likelihood ratio tests. This turns out to be computationally very simple, since the test can be constructed from OLS results. Consider the likelihood function at its maximum value for a VAR with $p_0$ lags, denoted

$$L_0(\Pi, \Omega) = -(Tn/2)\log(2\pi) + (T/2)\log|\hat{\Omega}_0^{-1}| - (1/2)\sum_{t=1}^{T} \hat{\epsilon}_t' \hat{\Omega}_0^{-1} \hat{\epsilon}_t,$$

which can be written as

$$L_0(\Pi, \Omega) = -(Tn/2)\log(2\pi) + (T/2)\log|\hat{\Omega}_0^{-1}| - nT/2.$$

If we want to test the hypothesis that the VAR has $p_0$ lags against $p_1$ lags, we calculate the likelihood for the VAR with $p_1$ lags ($p_1 > p_0$),

$$L_1(\Pi, \Omega) = -(Tn/2)\log(2\pi) + (T/2)\log|\hat{\Omega}_1^{-1}| - nT/2,$$

and compute the likelihood ratio

$$2\left(L_1(\Pi, \Omega) - L_0(\Pi, \Omega)\right) = T\left(\log|\hat{\Omega}_0| - \log|\hat{\Omega}_1|\right),$$

which is distributed under the null as a $\chi^2$ with degrees of freedom equal to the number of restrictions imposed under $H_0$, namely $n^2(p_1 - p_0)$. Sims (1980) proposed a modification of the likelihood ratio test to take into account small-sample bias,

$$(T - k)\left(\log|\hat{\Omega}_0| - \log|\hat{\Omega}_1|\right),$$

where $k = np_1$ is the number of parameters estimated per equation.
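As an illustration, the statistic can be computed directly from the residual covariance matrices of two OLS-estimated VARs. A minimal sketch using numpy/scipy and statsmodels, on placeholder data (a real application would use actual series), with the samples aligned so that both VARs are estimated over the same observations:

```python
import numpy as np
from scipy import stats
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2))  # placeholder T x n data

p0, p1 = 1, 4
n = data.shape[1]

# Drop the extra initial observations for the small model so that both
# VARs are fitted over the same effective sample.
res0 = VAR(data[p1 - p0:]).fit(p0)
res1 = VAR(data).fit(p1)
T = res1.nobs

# LR statistic: T (log|Omega0| - log|Omega1|) ~ chi2 with n^2 (p1 - p0) df
lr = T * (np.log(np.linalg.det(res0.sigma_u_mle)) -
          np.log(np.linalg.det(res1.sigma_u_mle)))
df = n ** 2 * (p1 - p0)
print("LR =", lr, " p-value =", stats.chi2.sf(lr, df))

# Sims' small-sample correction with k = n p1 parameters per equation
lr_sims = (T - n * p1) * (np.log(np.linalg.det(res0.sigma_u_mle)) -
                          np.log(np.linalg.det(res1.sigma_u_mle)))
print("Sims-corrected LR =", lr_sims)
```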

Goodness of Fit Criteria

The goodness of fit criteria are measures of how good a model is relative to others. They reflect a balance between the model's goodness of fit and its complexity. Typically, we want to minimize a scalar measure such as

$$C(p) = -2\max(\log L) + \varphi(T) \cdot (\text{number of freely estimated parameters}).$$

For Gaussian models, the maximized log-likelihood is proportional to $(T/2)\log|\hat{\Omega}^{-1}| = -(T/2)\log|\hat{\Omega}|$ (since $|\hat{\Omega}^{-1}| = 1/|\hat{\Omega}|$). Hence, we choose $p$ to minimize

$$C(p) = T\log|\hat{\Omega}| + \varphi(T)\,(n^2 p),$$

where

$$\varphi(T) = 2 \quad \text{(Akaike information criterion, AIC)}$$
$$\varphi(T) = \log(T) \quad \text{(Schwarz Bayesian criterion, SBC)}$$
$$\varphi(T) = 2\log(\log(T)) \quad \text{(Hannan-Quinn criterion, HQ)}$$

Alternatively, Akaike's final prediction error (FPE) criterion chooses $p$ so as to minimize the expected one-step-ahead squared forecast error:

$$FPE = \left[\frac{T + np + 1}{T - np - 1}\right]^n |\hat{\Omega}|.$$

AIC and FPE are not consistent, so that asymptotically they overestimate $p$ with positive probability. SBC and HQ are consistent in the sense that $\hat{p} \to p$ in probability.
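In practice these criteria are computed for a range of candidate orders and the minimizing $p$ is reported. A sketch using statsmodels, which tabulates AIC, BIC (= SBC), HQ, and FPE in one call (the data array is again a placeholder):

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)
data = rng.standard_normal((200, 2))  # placeholder T x n data

sel = VAR(data).select_order(maxlags=8)
print(sel.summary())          # AIC, BIC, FPE, HQIC for p = 0, ..., 8
print(sel.selected_orders)    # order chosen by each criterion

# Fit using the consistent SBC/BIC choice directly:
results = VAR(data).fit(maxlags=8, ic='bic')
print("chosen order:", results.k_ar)
```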

Asymptotic Distribution of $\hat{\Pi}$

The maximum likelihood estimates of $\Pi$ and $\Omega$ give consistent estimates of the population parameters. Standard errors for $\hat{\Pi}$ can be based on the usual OLS formulas. Let $\hat{\pi}_T = \mathrm{vec}(\hat{\Pi}_T)$ denote the $nk \times 1$ vector of coefficients resulting from OLS regressions of each of the elements of $W_t$ on $x_t$ (the vector containing the lagged variables). Then

$$\sqrt{T}(\hat{\pi}_T - \pi) \xrightarrow{L} N(0, \Omega \otimes Q^{-1}),$$

where $Q = E(x_t x_t')$. For the coefficients of a single equation $i$,

$$\sqrt{T}(\hat{\pi}_{iT} - \pi_i) \xrightarrow{L} N(0, \sigma_i^2 Q^{-1}),$$

where $\sigma_i^2 = E(\epsilon_{it}^2)$. This establishes that the standard OLS $t$ and $F$ statistics applied to the coefficients of any single equation in the VAR are asymptotically valid.
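Because the asymptotic covariance matrix is the one delivered by equation-by-equation OLS, the usual standard errors and $t$ statistics can be read off a fitted VAR directly; a brief sketch (placeholder data again):

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(2)
data = rng.standard_normal((200, 2))  # placeholder T x n data

results = VAR(data).fit(2)
print(results.stderr)   # OLS standard errors, equation by equation
print(results.tvalues)  # asymptotically valid t statistics
```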

Main uses of Vector Autoregressions


i) Forecasting
ii) Testing hypotheses
iii) Granger causality
iv) Use of impulse response functions
v) Use of the variance decomposition

ii) Testing the Rational Expectations Hypothesis

The VAR methodology is very useful for testing linear rational expectations hypotheses. These models usually impose non-linear cross-equation restrictions on the parameters of the model, which are tested using a likelihood ratio test that is distributed under the null (that the model is correct) as a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed by the model. Consider a first-order bivariate VAR

$$x_t = a_{11} x_{t-1} + a_{12} y_{t-1} + \epsilon_{1t}$$
$$y_t = a_{21} x_{t-1} + a_{22} y_{t-1} + \epsilon_{2t}$$

Assume that $x_t$ is the interest rate differential, say $(i_{1t} - i_t^*)$, and that $y_t$ is the first difference of the logs of the spot exchange rate $(e_t - e_{t-1})$. Then uncovered interest parity can be written as

$$x_t = E_t y_{t+1}.$$

Then if we condition both sides of the previous equation on information available at $t-1$, we get the following set of non-linear restrictions:

$$\begin{pmatrix} 1 & 0 \end{pmatrix} A = \begin{pmatrix} 0 & 1 \end{pmatrix} A^2$$

The above equation can be easily solved and yields the following non-linear restrictions on the parameters:

$$a_{11} = \frac{a_{22}\,a_{21}}{1 - a_{21}}, \qquad a_{12} = \frac{a_{22}^2}{1 - a_{21}}.$$

Then we simply estimate the unrestricted model and the restricted one (a function of only two parameters, plus the variance-covariance matrix), and perform a likelihood ratio test, where $2(L_u - L_r)$ is asymptotically $\chi^2$(number of restrictions = 2).
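To see that these restrictions indeed solve the matrix equation, one can verify them numerically: pick arbitrary values for the free parameters $a_{21}$ and $a_{22}$, construct $a_{11}$ and $a_{12}$ from the formulas, and check that $[1\ 0]A = [0\ 1]A^2$. A small sketch (the parameter values are arbitrary):

```python
import numpy as np

a21, a22 = 0.3, 0.5           # free parameters (arbitrary values)
a11 = a22 * a21 / (1 - a21)   # restricted values implied by UIP
a12 = a22 ** 2 / (1 - a21)

A = np.array([[a11, a12],
              [a21, a22]])

# Uncovered interest parity imposes [1 0] A = [0 1] A^2
lhs = np.array([1.0, 0.0]) @ A
rhs = np.array([0.0, 1.0]) @ (A @ A)
print(lhs, rhs, np.allclose(lhs, rhs))  # last value should be True
```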

iii) Granger Causality

One of the key questions that can be addressed with vector autoregressions is how useful some variables are for forecasting others.

Definition. The question investigated is whether a scalar $y$ can help forecast another scalar $x$. If it cannot, then we say that $y$ does not Granger-cause $x$. Formally, $y$ fails to Granger-cause $x$ if for all $s > 0$ the mean squared error of a forecast of $x_{t+s}$ based on $(x_t, x_{t-1}, \dots)$ is the same as the MSE of a forecast of $x_{t+s}$ based on both $(x_t, x_{t-1}, \dots)$ and $(y_t, y_{t-1}, \dots)$. For linear functions,

$$MSE[\hat{E}(x_{t+s}|x_t, x_{t-1}, \dots)] = MSE[\hat{E}(x_{t+s}|x_t, x_{t-1}, \dots, y_t, y_{t-1}, \dots)].$$

Granger's reason for proposing this definition was that if an event $Y$ is the cause of another event $X$, then the event $Y$ should precede the event $X$.

Testing for Granger causality. $H_0$: $y$ does not cause $x$, i.e. $a_{12} = 0$. We simply estimate both the general model

i) $x_t = a_{11} x_{t-1} + a_{12} y_{t-1} + \epsilon_{1t}$

and the restricted model

ii) $x_t = a_{11} x_{t-1} + \epsilon_{1t}^0$

and compare the residual sums of squares:

$$T \cdot \frac{RSS_0 - RSS_1}{RSS_1} \sim \chi^2(1) \quad \text{(asymptotically)}.$$
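With a fitted VAR, the restriction $a_{12} = 0$ (and its higher-order analogues) can be tested with a standard Wald/F test. A sketch using statsmodels' built-in causality test on simulated data where, by construction, $y$ does help forecast $x$:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(3)
T = 300
x, y = np.zeros(T), np.zeros(T)
for t in range(1, T):  # y Granger-causes x by construction
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()
    x[t] = 0.4 * x[t - 1] + 0.3 * y[t - 1] + rng.standard_normal()

results = VAR(pd.DataFrame({"x": x, "y": y})).fit(1)

# H0: y does not Granger-cause x (a12 = 0)
test = results.test_causality(caused="x", causing=["y"], kind="f")
print(test.summary())
```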

Granger-Causality Tests and Forward-Looking Behaviour

Let us assume risk-neutral agents, such that stock prices may be written as

$$P_t = \sum_{i=1}^{\infty} \left(\frac{1}{1+r}\right)^i E(D_{t+i}|I_t).$$

Suppose $D_t = d + u_t + u_{t-1} + v_t$, where $u_t$ and $v_t$ are independent white noise processes. Then

$$E_t D_{t+i} = \begin{cases} d + u_t & \text{for } i = 1 \\ d & \text{for } i = 2, 3, \dots \end{cases}$$

The stock price will be given by

$$P_t = d/r + u_t/(1+r).$$

Thus, for this example, the stock price is white noise and could not be forecast on the basis of lagged prices or dividends: no series should Granger-cause stock prices. Nevertheless, notice that using the stock price equation and rearranging terms, we can express

$$u_{t-1} = (1+r)P_{t-1} - (1+r)d/r.$$

Then, substituting back into the dividend process, we get the following expression for dividends:

$$D_t = d + u_t + (1+r)P_{t-1} - (1+r)d/r + v_t.$$

Thus stock prices Granger-cause dividends. The bivariate VAR takes the form

$$\begin{pmatrix} P_t \\ D_t \end{pmatrix} = \begin{pmatrix} d/r \\ -d/r \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 1+r & 0 \end{pmatrix} \begin{pmatrix} P_{t-1} \\ D_{t-1} \end{pmatrix} + \begin{pmatrix} u_t/(1+r) \\ u_t + v_t \end{pmatrix}.$$

Hence in this model, Granger causation runs in the opposite direction from the true causation. Dividends fail to Granger-cause prices even though investors' perceptions of dividends are the sole determinant of stock prices. On the other hand, prices do Granger-cause dividends, even though the market's evaluation of the stock in reality has no effect on the dividend process.
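The reversed causality is easy to reproduce by simulation: generate $u_t$ and $v_t$, build $D_t$ and $P_t$ from the formulas above, and test both directions. A sketch (the values of $d$ and $r$ are arbitrary):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(4)
T, d, r = 500, 1.0, 0.05
u = rng.standard_normal(T)
v = rng.standard_normal(T)

D = d + u + np.concatenate(([0.0], u[:-1])) + v  # D_t = d + u_t + u_{t-1} + v_t
P = d / r + u / (1 + r)                          # P_t = d/r + u_t/(1+r)

# grangercausalitytests tests whether the SECOND column Granger-causes the first
res_dp = grangercausalitytests(np.column_stack([P, D]), maxlag=1)
res_pd = grangercausalitytests(np.column_stack([D, P]), maxlag=1)
print("p-value, D -> P (expect large):", res_dp[1][0]["ssr_ftest"][1])
print("p-value, P -> D (expect tiny): ", res_pd[1][0]["ssr_ftest"][1])
```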

iv) The Impulse-Response Function



If a VAR is stationary it can always be written as an infinite vector moving average. Consider the following infinite moving average representation of $W_t$:

$$W_t = \sum_{z=0}^{\infty} \Psi_z \epsilon_{t-z}, \qquad \Psi_0 = I.$$

Analogously, if we lead the above expression $s$ periods we get

$$W_{t+s} = \sum_{z=0}^{\infty} \Psi_z \epsilon_{t+s-z}.$$

Therefore we can easily see from the above expression (evaluated at $z = s$) that the matrix $\Psi_s$ has the interpretation of a dynamic multiplier:

$$\Psi_s = \frac{\partial W_{t+s}}{\partial \epsilon_t'}$$

(dynamic multiplier or impulse response), where $(\Psi_s)_{ij}$ = effect of a one-unit increase in the $j$-th variable's innovation at time $t$ ($\epsilon_{jt}$) on the value of the $i$-th variable at time $t+s$ ($W_{i,t+s}$), holding all other innovations at all dates constant.^2

A simple way of finding these multipliers numerically is by simulation, as illustrated in the code sketch below. To implement the simulation, set $W_{t-1} = \dots = W_{t-p} = 0$, then set $\epsilon_{jt} = 1$ and all the other elements of $\epsilon_t$ to zero, and simulate the system

$$W_t = A_1 W_{t-1} + A_2 W_{t-2} + \dots + A_p W_{t-p} + \epsilon_t$$

for $t, t+1, \dots, t+s$, with $\epsilon_{t+1}, \epsilon_{t+2}, \dots = 0$. This simulation yields the $j$-th column of the matrix $\Psi_s$. By doing this for the other values of $j$ we get the whole matrix. A plot of $(\Psi_s)_{ij}$, that is, row $i$, column $j$ of $\Psi_s$, as a function of $s$ is called the impulse response function. It describes the response of $W_{i,t+s}$ to a one-time impulse in $W_{jt}$ with all other variables dated $t$ or earlier held constant.
^2 Consider a first-order two-variable VAR. The relevant term of the infinite moving average is

$$\Psi_s \epsilon_t = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \end{pmatrix}.$$

Now assume a unit increase in $\epsilon_1$ (i.e. $\epsilon_1 = 1$ and $\epsilon_2 = 0$); then the effect on $W_{t+s}$ is $\begin{pmatrix} a \\ c \end{pmatrix}$ (that is, $a$ on the first variable and $c$ on the second). Varying $s$ we can get a plot of the effects of a shock to the innovation of variable 1 on both variables. A similar argument can be constructed for a shock to $\epsilon_2$; therefore the columns, first or second, of $\Psi_s$ represent the effect on each variable of $W$ at time $t+s$ of a unit shock to the first or second variable's innovation, keeping the other constant.
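The simulation recipe above translates directly into code. A minimal numpy sketch for a VAR(1) with illustrative coefficients, recovering $\Psi_s$ column by column:

```python
import numpy as np

A1 = np.array([[0.5, 0.1],   # illustrative VAR(1) coefficients
               [0.4, 0.5]])
n, horizon = 2, 10

def impulse_responses(j):
    """Response of W_{t+s} to a unit shock in eps_{jt}, s = 0, ..., horizon."""
    W = np.zeros(n)
    W[j] = 1.0                # eps_{jt} = 1, all other elements zero
    path = [W.copy()]
    for _ in range(horizon):  # no further shocks: eps_{t+1} = eps_{t+2} = ... = 0
        W = A1 @ W
        path.append(W.copy())
    return np.array(path)     # row s, column i: response of variable i at t+s

# Stacking the j-th simulated response as the j-th column gives Psi_s;
# for a VAR(1) this is just the matrix power A1^s.
Psi = [np.column_stack([impulse_responses(j)[s] for j in range(n)])
       for s in range(horizon + 1)]
assert np.allclose(Psi[3], np.linalg.matrix_power(A1, 3))
```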

We can also define the interim multipliers, which are given by the accumulated responses over $m$ periods,

$$\sum_{j=1}^{m} \Psi_j,$$

and the long-run multipliers, which give the total accumulated effects for all future time periods:

$$\sum_{j=1}^{\infty} \Psi_j.$$
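For a VAR(1) the $\Psi_j$ are matrix powers of $A_1$, and when the VAR is stationary the infinite sum converges, with $\sum_{j=1}^{\infty} A_1^j = (I - A_1)^{-1} - I$. A small sketch (same illustrative coefficients as above):

```python
import numpy as np

A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])
I = np.eye(2)
m = 8

# Interim multiplier: accumulated responses over m periods
interim = sum(np.linalg.matrix_power(A1, j) for j in range(1, m + 1))

# Long-run multiplier: geometric series of matrix powers from j = 1
long_run = np.linalg.inv(I - A1) - I
print(interim)
print(long_run)
```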

The assumption that a shock to one innovation does not affect the others is problematic, since $E(\epsilon_t \epsilon_t') = \Omega$ is not in general a diagonal matrix. This means that a shock to one variable is likely to be accompanied by a shock to another variable in the same period. Since $\Omega$ is symmetric and positive definite, it can be expressed as $\Omega = ADA'$, where $A$ is a lower triangular matrix and $D$ is a diagonal matrix. Let $u_t = A^{-1}\epsilon_t$; then

$$W_t = \sum_{j=0}^{\infty} \Psi_j \epsilon_{t-j} = \sum_{j=0}^{\infty} \Psi_j A A^{-1} \epsilon_{t-j} = \sum_{j=0}^{\infty} \Theta_j u_{t-j},$$

where $\Theta_j = \Psi_j A$ and

$$E(u_t u_t') = E(A^{-1} \epsilon_t \epsilon_t' (A^{-1})') = A^{-1}\Omega(A^{-1})' = A^{-1}ADA'(A^{-1})' = D.$$

The matrix $D$ gives the variances of the $u_{jt}$. A plot of $\Theta_s$ as a function of $s$ is known as an orthogonalized impulse response function. The matrix

$$\frac{\partial W_{t+s}}{\partial u_t'} = \Theta_s$$

gives the consequences of a unit impulse in $u_{jt}$ for $W_{t+s}$. In the new MA representation it is reasonable to assume that a change in one component of $u_t$ has no effect on the other components, because the components are orthogonal. Notice that $\Theta_0 = \Psi_0 A = IA = A$ is lower triangular. This implies that the ordering of the variables is important. The ordering has to be such that $W_{1t}$ is the only variable with a potential immediate impact on all other variables; $W_{2t}$ may have an immediate impact on the last $n-2$ components but not on $W_{1t}$, and so on. The ordering cannot be determined with statistical methods.
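Numerically, $A$ and $D$ can be obtained from the Cholesky factor of $\Omega$: if $\Omega = PP'$ with $P$ lower triangular, then $A = P\,\mathrm{diag}(P)^{-1}$ has ones on the diagonal and $D = \mathrm{diag}(P)^2$. A sketch with illustrative numbers:

```python
import numpy as np

A1 = np.array([[0.5, 0.1],     # illustrative VAR(1) coefficients
               [0.4, 0.5]])
Omega = np.array([[1.0, 0.5],  # illustrative innovation covariance
                  [0.5, 2.0]])

# Cholesky: Omega = P P'. Writing P = A D^{1/2} with A unit lower
# triangular recovers the decomposition Omega = A D A' used in the text.
P = np.linalg.cholesky(Omega)
A = P / np.diag(P)             # divide each column by its diagonal element
D = np.diag(np.diag(P) ** 2)
assert np.allclose(A @ D @ A.T, Omega)

# Orthogonalized impulse responses Theta_s = Psi_s A (Psi_s = A1^s here)
horizon = 10
Theta = [np.linalg.matrix_power(A1, s) @ A for s in range(horizon + 1)]
print(Theta[0])                # equals A: lower triangular, so ordering matters
```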

v) Variance Decomposition

Consider the error in forecasting a VAR $s$ periods ahead,

$$W_{t+s} - \hat{W}_{t+s|t} = \sum_{j=0}^{s-1} \Psi_j \epsilon_{t+s-j}, \qquad \Psi_0 = I.$$

The mean squared error of this $s$-period-ahead forecast is thus

$$MSE(\hat{W}_{t+s|t}) = \Omega + \Psi_1 \Omega \Psi_1' + \Psi_2 \Omega \Psi_2' + \dots + \Psi_{s-1} \Omega \Psi_{s-1}'.$$

Let us now consider how each of the orthogonalized disturbances $(u_{1t}, \dots, u_{nt})$ contributes to this MSE. Write $\epsilon_t = A u_t = a_1 u_{1t} + \dots + a_n u_{nt}$, where $a_j$ denotes the $j$-th column of the matrix $A$. Recalling that the $u$'s are uncorrelated, we get

$$\Omega = a_1 a_1' \mathrm{Var}(u_{1t}) + a_2 a_2' \mathrm{Var}(u_{2t}) + \dots + a_n a_n' \mathrm{Var}(u_{nt}).$$

Substituting this into the MSE of the $s$-period-ahead forecast we get

$$MSE(\hat{W}_{t+s|t}) = \sum_{j=1}^{n} \mathrm{Var}(u_{jt}) \left( a_j a_j' + \Psi_1 a_j a_j' \Psi_1' + \Psi_2 a_j a_j' \Psi_2' + \dots + \Psi_{s-1} a_j a_j' \Psi_{s-1}' \right).$$

With this expression we can calculate the contribution of the $j$-th orthogonalized innovation to the MSE of the $s$-period-ahead forecast:

$$\mathrm{Var}(u_{jt}) \left( a_j a_j' + \Psi_1 a_j a_j' \Psi_1' + \Psi_2 a_j a_j' \Psi_2' + \dots + \Psi_{s-1} a_j a_j' \Psi_{s-1}' \right).$$

Again, the magnitude in general depends on the ordering of the variables.
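The expression can be implemented directly: for each $j$, accumulate $\Psi_k a_j a_j' \Psi_k' \mathrm{Var}(u_{jt})$ up to the horizon and compare with the total MSE. A numpy sketch for the same illustrative VAR(1) (the shares printed at the end form the usual variance-decomposition table):

```python
import numpy as np

A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])
Omega = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
n, s = 2, 5

P = np.linalg.cholesky(Omega)
A = P / np.diag(P)                 # Omega = A D A'
var_u = np.diag(P) ** 2            # Var(u_jt): the diagonal of D

Psi = [np.linalg.matrix_power(A1, k) for k in range(s)]

# Total MSE of the s-step-ahead forecast: sum_k Psi_k Omega Psi_k'
mse = sum(Pk @ Omega @ Pk.T for Pk in Psi)

# Contribution of the j-th orthogonalized innovation
contrib = [var_u[j] * sum(Pk @ np.outer(A[:, j], A[:, j]) @ Pk.T for Pk in Psi)
           for j in range(n)]
assert np.allclose(sum(contrib), mse)  # contributions add up to the total

# Share of each innovation in the forecast-error variance of each variable
shares = np.column_stack([np.diag(c) / np.diag(mse) for c in contrib])
print(shares)                      # rows: variables, columns: innovations
```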
