Bond (2007 Lecture) Dynamic PD First Difference GMM


Dynamic linear model

$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + (\eta_i + v_{it}), \qquad |\alpha| < 1$$

for $i = 1, \dots, N$ and $t = 2, \dots, T$.

NB. The first observation is $y_{i1}$, so that the first available equation is

$$y_{i2} = \alpha y_{i1} + \beta x_{i2} + \eta_i + v_{i2}$$

and we have $T-1$ equations in levels.

Some authors assume $y_{i0}$ is observed, and thus have $T$ equations in levels.

Two important properties of the lagged dependent variable.

$$y_{i,t-1} = \alpha y_{i,t-2} + \beta x_{i,t-1} + (\eta_i + v_{i,t-1})$$

$E[y_{i,t-1}\eta_i] > 0$ since $\eta_i$ is part of the process that generates $y_{i,t-1}$ according to our specification.

$E[y_{i,t-1}v_{i,t-1}] > 0$ for the same reason.

Thus $y_{i,t-1}$ is correlated with the individual effects, and is not strictly exogenous.

Most of the estimation issues are present in the simpler dynamic model

$$y_{it} = \alpha y_{i,t-1} + (\eta_i + v_{it}), \qquad |\alpha| < 1$$

that we focus on initially.

Useful to establish the properties of pooled OLS and Within Groups estimators in this setting.

Assuming $E[y_{i,t-1}v_{it}] = 0$, then $\operatorname{plim} \hat\alpha_{OLS} > \alpha$ as a result of the positive correlation between $y_{i,t-1}$ and $\eta_i$.


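$$y_{it} = \alpha y_{i,t-1} + (\eta_i + v_{it}), \qquad |\alpha| < 1 \quad \text{for } t = 2, \dots, T$$

Consider

$$\tilde y_{i,t-1} = y_{i,t-1} - \frac{1}{T-1}(y_{i1} + \dots + y_{i,T-1})$$

and

$$\tilde v_{it} = v_{it} - \frac{1}{T-1}(v_{i2} + \dots + v_{iT})$$

Notice that all correlations of order $\frac{1}{T-1}$ are negative. E.g. $\operatorname{corr}\left(y_{i,t-1}, -\frac{1}{T-1} v_{i,t-1}\right)$ and $\operatorname{corr}\left(v_{it}, -\frac{1}{T-1} y_{it}\right)$.

This suggests that $E[\tilde y_{i,t-1}\tilde v_{it}] < 0$ and is of order $\frac{1}{T-1}$ (i.e. $E[\tilde y_{i,t-1}\tilde v_{it}] \to 0$ as $(T-1) \to \infty$).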

These properties can be shown more formally (e.g. Nickell, Econometrica 1981).

Thus $\operatorname{plim}_{N \to \infty} \hat\alpha_{WG} < \alpha$ for fixed $T$. The Within estimator is inconsistent as the cross-section dimension of the panel (only) becomes large.

Also $\operatorname{plim}_{T \to \infty} \hat\alpha_{WG} = \alpha$. The Within estimator is consistent as the time dimension of the panel becomes large.


The inconsistency of the Within Groups estimator is of order $\frac{1}{T-1}$.

However calculations of this inconsistency, and Monte Carlo experiments, suggest the bias of the Within estimator remains non-negligible in cases with $T = 10$ or even $T = 15$.

Also note that the inconsistency does not disappear as $\alpha \to 0$. So, unless $T$ is large, the Within estimator does not provide reliable evidence on whether a lagged dependent variable should be included in the model or not.

However, in practice it is useful to know that OLS levels is likely to be biased upwards, and (in short panels) Within Groups is likely to be biased downwards.

Supposedly consistent estimators that give $\hat\alpha \gg \hat\alpha_{OLS}$ or $\hat\alpha \ll \hat\alpha_{WG}$ should be viewed with suspicion.
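
The bracketing property above can be illustrated with a small simulation. This is only a sketch under assumed settings that are not in the lecture: standard normal $\eta_i$ and $v_{it}$, $\alpha = 0.5$, six observed periods, and a burn-in so the series start near their stationary distribution. The function names are hypothetical.

```python
import numpy as np

def simulate_ar1_panel(n, t, alpha, rng, burn_in=50):
    """Simulate y_it = alpha*y_{i,t-1} + eta_i + v_it; returns an (n, t) array."""
    eta = rng.standard_normal(n)          # individual effects (assumed N(0,1))
    y = eta / (1.0 - alpha)               # start each series at its long-run mean
    out = np.empty((n, t))
    for s in range(burn_in + t):
        y = alpha * y + eta + rng.standard_normal(n)
        if s >= burn_in:                  # keep only the last t periods
            out[:, s - burn_in] = y
    return out

def pooled_ols(y):
    """Pooled OLS slope (with intercept) from regressing y_it on y_{i,t-1}."""
    x, z = y[:, :-1].ravel(), y[:, 1:].ravel()
    x, z = x - x.mean(), z - z.mean()
    return (x @ z) / (x @ x)

def within_groups(y):
    """Within Groups slope: demean both variables by individual, then OLS."""
    x = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True)
    z = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)
    return (x * z).sum() / (x * x).sum()

rng = np.random.default_rng(0)
alpha_true = 0.5
panel = simulate_ar1_panel(n=2000, t=6, alpha=alpha_true, rng=rng)
alpha_ols = pooled_ols(panel)
alpha_wg = within_groups(panel)
print(f"OLS: {alpha_ols:.3f}  true: {alpha_true}  Within: {alpha_wg:.3f}")
```

In a design like this the OLS estimate should lie above the true value and the Within estimate below it, which is the range a credible consistent estimate would be expected to fall in.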


Instrumental variables

A popular class of estimators that are consistent as $N \to \infty$ with $T$ fixed first transform the model to eliminate the individual effects, and then apply instrumental variables.

The Within transformation is not useful in this context, since it introduces the shocks from all time periods into the transformed error term.

The first-differencing transformation is more promising.

$$\Delta y_{it} = \alpha \Delta y_{i,t-1} + \Delta v_{it}$$

for $i = 1, \dots, N$ and $t = 3, \dots, T$.


$$\Delta y_{it} = \alpha \Delta y_{i,t-1} + \Delta v_{it}$$

First-differenced OLS is not consistent (as $N \to \infty$ or $T \to \infty$, or both).

Since $\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}$ and $\Delta v_{it} = v_{it} - v_{i,t-1}$, we have

$$E[\Delta y_{i,t-1}\Delta v_{it}] < 0$$

However if we are willing to assume that $E[y_{i,t-1}v_{it}] = 0$, then $\Delta y_{i,t-2}$ or $y_{i,t-2}$ are valid instrumental variables for $\Delta y_{i,t-1}$ in the first-differenced equations.

Two-stage least squares (2SLS) estimators of this type were suggested by Anderson and Hsiao (Journal of the American Statistical Association, 1981).

E.g.

$$\hat\alpha_{AH} = \left(\Delta y_{-1}' Z (Z'Z)^{-1} Z' \Delta y_{-1}\right)^{-1} \Delta y_{-1}' Z (Z'Z)^{-1} Z' \Delta y$$

where $\Delta y$ is the stacked $N(T-2) \times 1$ vector of observations on $\Delta y_{it}$, $\Delta y_{-1}$ is the stacked $N(T-2) \times 1$ vector of observations on $\Delta y_{i,t-1}$, and $Z$ is the stacked $N(T-2) \times 1$ vector of observations on $y_{i,t-2}$.

One further time series observation is lost if $\Delta y_{i,t-2}$ rather than $y_{i,t-2}$ is used as the instrument.
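
With a single instrument and a single regressor, the 2SLS formula above collapses to the simple IV ratio $Z'\Delta y / Z'\Delta y_{-1}$. The following sketch applies it to simulated data. The data-generating settings (standard normal $\eta_i$ and $v_{it}$, $\alpha = 0.5$, $N = 5000$, $T = 6$) and the function names are assumptions made for illustration only.

```python
import numpy as np

def simulate_ar1_panel(n, t, alpha, rng, burn_in=50):
    """Simulate y_it = alpha*y_{i,t-1} + eta_i + v_it; returns an (n, t) array."""
    eta = rng.standard_normal(n)
    y = eta / (1.0 - alpha)
    out = np.empty((n, t))
    for s in range(burn_in + t):
        y = alpha * y + eta + rng.standard_normal(n)
        if s >= burn_in:
            out[:, s - burn_in] = y
    return out

def anderson_hsiao(y):
    """Anderson-Hsiao estimator using the level y_{i,t-2} as the instrument.

    y is an (N, T) array whose columns are periods 1, ..., T.
    """
    dy = np.diff(y, axis=1)        # columns: dy_i2, ..., dy_iT
    dep = dy[:, 1:].ravel()        # dy_it for t = 3, ..., T
    lag = dy[:, :-1].ravel()       # dy_{i,t-1}
    z = y[:, :-2].ravel()          # y_{i,t-2}
    # One instrument, one regressor: 2SLS reduces to the IV ratio.
    return (z @ dep) / (z @ lag)

rng = np.random.default_rng(1)
panel = simulate_ar1_panel(n=5000, t=6, alpha=0.5, rng=rng)
alpha_ah = anderson_hsiao(panel)
print(f"Anderson-Hsiao estimate: {alpha_ah:.3f}")
```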


The assumption that $y_{i,t-1}$ is predetermined follows naturally from assuming that the $v_{it}$ are serially uncorrelated shocks, provided the initial conditions ($y_{i1}$) are also uncorrelated with subsequent $v_{it}$ shocks.

Note that a minimum of $T = 3$ time series observations are required to identify $\alpha$ using this approach.

With $T = 3$, we have one instrument $y_{i1}$ for $\Delta y_{i2}$ in the equation

$$\Delta y_{i3} = \alpha \Delta y_{i2} + \Delta v_{i3} \quad \text{for } i = 1, \dots, N$$
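
As a worked illustration of this just-identified case (a sketch that follows from the single moment condition $E(y_{i1}\Delta v_{i3}) = 0$, not a formula quoted from the lecture), setting the sample moment to zero and solving for $\alpha$ gives the simple IV estimator:

```latex
\frac{1}{N}\sum_{i=1}^{N} y_{i1}\left(\Delta y_{i3} - \hat\alpha\,\Delta y_{i2}\right) = 0
\quad\Longrightarrow\quad
\hat\alpha = \frac{\sum_{i=1}^{N} y_{i1}\,\Delta y_{i3}}{\sum_{i=1}^{N} y_{i1}\,\Delta y_{i2}}
```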


The Anderson-Hsiao 2SLS estimators are consistent as $N \to \infty$ for fixed $T$. But they are not efficient, except in the special case with $T = 3$.

With $T > 3$, further valid instruments become available for the first-differenced equations in the later time periods. Efficiency can be improved by exploiting these additional instruments.

The transformed error term $\Delta v_{it}$ also has a known moving average form of serial correlation, under the maintained assumption that $v_{it}$ is serially uncorrelated. More generally, $v_{it}$ may be heteroskedastic. These features can be exploited to improve efficiency when $T > 3$ (i.e. when the model is overidentified).

Generalised method of moments (GMM)

Holtz-Eakin, Newey and Rosen (Econometrica, 1988) and Arellano and

Bond (Review of Economic Studies, 1991) applied the generalised method

of moments approach developed by Hansen (Econometrica, 1982) to exploit

this additional information in the dynamic panel data problem.

We first briefly review some properties of GMM estimators.


GMM formulates a set of orthogonality restrictions (moment conditions) related to an econometric model, and finds parameter estimates that come as close as possible to achieving these orthogonality properties in the sample.

Example

$$y_i - f(x_i, \beta) = u_i, \qquad i = 1, \dots, N, \qquad \beta \text{ is } K \times 1$$

$$g(x_i) = z_i \ \text{ s.t. } \ E(z_i' u_i) = 0, \qquad z_i \text{ is } 1 \times L$$

The model specifies $L$ (population) orthogonality restrictions: $E(z_i' u_i) = 0$

Sample analogue:

$$b_N(\beta) = \frac{1}{N}\sum_{i=1}^{N} z_i' u_i(\beta) = \frac{1}{N}\sum_{i=1}^{N} z_i'\left[y_i - f(x_i, \beta)\right]$$

GMM estimators choose $\hat\beta_{GMM}$ to minimise the distance of $b_N(\beta)$ from zero.

$L = K$ (just identified): $\hat\beta_{GMM}$ is the unique solution to $b_N(\beta) = 0$.

$L > K$ (overidentified): $\hat\beta_{GMM}$ minimises a weighted quadratic distance, i.e.

$$\hat\beta_{GMM} = \arg\min_{\beta} J_N(\beta) = b_N(\beta)' W_N b_N(\beta)$$

for some positive definite weight matrix $W_N$.

Note that this gives a family of GMM estimators, based on the same moment conditions $E(z_i' u_i) = 0$, for different choices of the weight matrix $W_N$.


Properties

Under conditions given in Hansen (1982)

i) Strong consistency: $\hat\beta_{GMM} \xrightarrow{a.s.} \beta$ as $N \to \infty$.

ii) Asymptotic normality: $\sqrt{N}(\hat\beta_{GMM} - \beta) \xrightarrow{D} N(0, \operatorname{avar}(\hat\beta_{GMM}))$ where

$$\operatorname{avar}(\hat\beta_{GMM}) = (D'WD)^{-1} D'W S_W W D (D'WD)^{-1}$$

and

$$D = \operatorname*{plim}_{N \to \infty} \frac{\partial b_N(\beta)}{\partial \beta'}, \qquad W = \operatorname*{plim}_{N \to \infty} W_N, \qquad S_W = \frac{1}{N}\sum_{i=1}^{N} E(z_i' u_i u_i' z_i)$$

$S_W$ is the average covariance matrix for the moment conditions.

The optimal GMM estimator, which minimises $\operatorname{avar}(\hat\beta_{GMM})$ for a given set of moment conditions, chooses $W_N$ s.t. $W = S_W^{-1}$, giving

$$\operatorname{avar}(\hat\beta_{GMM}) = (D'WD)^{-1}$$

This is typically implemented in two steps, choosing $W_N = \hat S_W^{-1}$, where $\hat S_W$ is a consistent estimate of $S_W$, obtained using the residuals $\hat u_i = y_i - f(x_i, \hat\beta)$ from some consistent initial estimate of $\beta$ (e.g. a 'one step' GMM estimator, computed using some known weight matrix $W_N$).


Note:

OLS is a (just-identified) GMM estimator for $f(x_i, \beta) = x_i\beta$ linear and $z_i = x_i$.

2SLS is a (possibly overidentified) GMM estimator for $f(x_i, \beta) = x_i\beta$ and $W_N = \left(\frac{1}{N}\sum_{i=1}^{N} z_i' z_i\right)^{-1}$. This weight matrix is optimal if $u_i \sim iid(0, \sigma_u^2)$, but not more generally.
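
These two special cases can be checked numerically. The sketch below uses the closed form of the linear GMM estimator, $\hat\beta = (X'ZW_NZ'X)^{-1}X'ZW_NZ'y$, on simulated data; the data, the dimensions ($K = 1$, $L = 3$) and the helper name `linear_gmm` are all assumptions introduced for illustration.

```python
import numpy as np

def linear_gmm(y, x, z, w):
    """Linear GMM: minimise b'Wb with b = Z'(y - X beta)/N (closed form)."""
    n = len(y)
    zx = z.T @ x / n                       # (L, K) sample cross-moments
    zy = z.T @ y / n                       # (L,)
    return np.linalg.solve(zx.T @ w @ zx, zx.T @ w @ zy)

rng = np.random.default_rng(2)
n = 500
z = rng.standard_normal((n, 3))                                   # L = 3 instruments
x = z @ np.array([[1.0], [0.5], [-0.5]]) + rng.standard_normal((n, 1))  # K = 1
y = x @ np.array([2.0]) + rng.standard_normal(n)

# OLS as just-identified GMM: instruments equal regressors, any W works.
beta_ols = np.linalg.lstsq(x, y, rcond=None)[0]
beta_gmm_ols = linear_gmm(y, x, x, np.eye(x.shape[1]))

# 2SLS as overidentified GMM with W_N = (Z'Z/N)^{-1}.
x_hat = z @ np.linalg.solve(z.T @ z, z.T @ x)      # first-stage fitted values
beta_2sls = np.linalg.solve(x_hat.T @ x, x_hat.T @ y)
beta_gmm_2sls = linear_gmm(y, x, z, np.linalg.inv(z.T @ z / n))

print(beta_ols, beta_gmm_ols, beta_2sls, beta_gmm_2sls)
```

The scaling of $W_N$ by any positive constant leaves the estimator unchanged, so $(Z'Z)^{-1}$ and $(Z'Z/N)^{-1}$ give the same estimate.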


The AR(1) panel data model

$$y_{it} = \alpha y_{i,t-1} + (\eta_i + v_{it}), \qquad |\alpha| < 1$$

for $i = 1, \dots, N$ and $t = 2, \dots, T$.

Assumption (error components)

$$E(\eta_i) = E(v_{it}) = E(\eta_i v_{it}) = 0$$

Assumption (serially uncorrelated shocks)

$$E(v_{is}v_{it}) = 0 \quad \text{for } s \neq t$$

Assumption (predetermined initial conditions)

$$E(y_{i1}v_{it}) = 0 \quad \text{for } t = 2, \dots, T$$


These assumptions specify a finite number of linear moment conditions, which can be exploited using a linear GMM estimator.

$$\begin{array}{ll}
\text{First-differenced equations} & \text{Valid instruments} \\
(y_{i3} - y_{i2}) = \alpha(y_{i2} - y_{i1}) + (v_{i3} - v_{i2}) & y_{i1} \\
(y_{i4} - y_{i3}) = \alpha(y_{i3} - y_{i2}) + (v_{i4} - v_{i3}) & y_{i1}, y_{i2} \\
\qquad\vdots & \qquad\vdots \\
(y_{iT} - y_{i,T-1}) = \alpha(y_{i,T-1} - y_{i,T-2}) + (v_{iT} - v_{i,T-1}) & y_{i1}, y_{i2}, \dots, y_{i,T-2}
\end{array}$$

Clearly $E(y_{i1}\Delta v_{i3}) = 0$ follows from assuming predetermined initial conditions.

$E(y_{i1}\Delta v_{i4}) = 0$ follows similarly.

$$y_{i2} = \alpha y_{i1} + \eta_i + v_{i2}$$

$E(\eta_i \Delta v_{i4}) = 0$ follows from the error components assumption.

$E(v_{i2}\Delta v_{i4}) = 0$ follows from serially uncorrelated shocks.

Hence we obtain $E(y_{i2}\Delta v_{i4}) = 0$.

Similar arguments establish the $m = (T-2)(T-1)/2$ moment conditions

$$E(y_{i,t-s}\Delta v_{it}) = 0 \quad \text{for } t = 3, \dots, T \text{ and } s \geq 2$$

This gives the set of valid instruments listed in the table above.


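These can also be written as $E(Z_i'\Delta v_i) = 0$ where

$$Z_i = \begin{pmatrix}
y_{i1} & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & y_{i1} & y_{i2} & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & y_{i1} & y_{i2} & \cdots & y_{i,T-2}
\end{pmatrix}
\quad \text{and} \quad
\Delta v_i = \begin{pmatrix} \Delta v_{i3} \\ \Delta v_{i4} \\ \vdots \\ \Delta v_{iT} \end{pmatrix}$$

$Z_i$ is $(T-2) \times m$ and $\Delta v_i$ is $(T-2) \times 1$.

Sample analogue

$$b_N(\alpha) = \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta v_i(\alpha)$$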
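
The block-diagonal layout of $Z_i$ is easy to get wrong by one column, so a small construction sketch may help. The function name `build_instruments` is hypothetical; the input is assumed to be the vector $(y_{i1}, \dots, y_{iT})$ for one individual.

```python
import numpy as np

def build_instruments(y_i):
    """Return the (T-2) x m matrix Z_i for one individual.

    The row for period t (t = 3, ..., T) holds y_i1, ..., y_{i,t-2} in its
    own block of columns and zeros elsewhere, so m = (T-2)(T-1)/2.
    """
    y_i = np.asarray(y_i, dtype=float)
    t_obs = len(y_i)
    m = (t_obs - 2) * (t_obs - 1) // 2
    z = np.zeros((t_obs - 2, m))
    col = 0
    for row in range(t_obs - 2):          # row 0 is the equation for t = 3
        n_lags = row + 1                  # number of available lagged levels
        z[row, col:col + n_lags] = y_i[:n_lags]
        col += n_lags
    return z

z_example = build_instruments([1, 2, 3, 4, 5])   # T = 5, so Z_i is 3 x 6
print(z_example)
```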
For $T = 3$, we have 1 moment condition $E(y_{i1}\Delta v_{i3}) = 0$ and 1 parameter.

$\alpha$ is just identified, the choice of the weight matrix is irrelevant, and the optimal GMM estimator coincides with the Anderson-Hsiao 2SLS estimator (using the level $y_{i,t-2}$ as the instrument).


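For $T > 3$, we have $m > 1$ moment conditions. $\alpha$ is overidentified.

GMM estimators minimise a weighted quadratic distance

$$\hat\alpha_{GMM} = \arg\min_{\alpha} J_N(\alpha) = b_N(\alpha)' W_N b_N(\alpha)$$

$$= \arg\min_{\alpha} \left(\frac{1}{N}\sum_{i=1}^{N} \Delta v_i' Z_i\right) W_N \left(\frac{1}{N}\sum_{i=1}^{N} Z_i' \Delta v_i\right)$$

$$= \left(\Delta y_{-1}' Z W_N Z' \Delta y_{-1}\right)^{-1} \Delta y_{-1}' Z W_N Z' \Delta y$$

where $\Delta y$ and $\Delta y_{-1}$ are the stacked $N(T-2) \times 1$ vectors of observations on $\Delta y_{it}$ and $\Delta y_{i,t-1}$ as before, and $Z = (Z_1', \dots, Z_N')'$ is the stacked $N(T-2) \times m$ matrix of observations on the instruments.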


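Compare

$$\hat\alpha_{GMM} = \left(\Delta y_{-1}' Z W_N Z' \Delta y_{-1}\right)^{-1} \Delta y_{-1}' Z W_N Z' \Delta y$$

and

$$\hat\alpha_{AH} = \left(\Delta y_{-1}' Z (Z'Z)^{-1} Z' \Delta y_{-1}\right)^{-1} \Delta y_{-1}' Z (Z'Z)^{-1} Z' \Delta y$$

For $T > 3$, there are two sources of greater (asymptotic) efficiency:

$\hat\alpha_{GMM}$ exploits more moment conditions ($m > 1$). The Anderson-Hsiao instrument matrix is a linear combination of the columns of the GMM instrument matrix.

The 2SLS weight matrix $W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_i' Z_i\right)^{-1}$ (equal to $(Z'Z)^{-1}$ up to a scalar that does not affect the estimator) is not optimal for the first-differenced specification.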


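General results for GMM estimators indicate that $\hat\alpha_{GMM}$ is strongly consistent (as $N \to \infty$ for fixed $T$) and asymptotically normal.

For an arbitrary $W_N$

$$\operatorname{avar}(\hat\alpha_{GMM}) = N \left(\Delta y_{-1}' Z W_N Z' \Delta y_{-1}\right)^{-1} \Delta y_{-1}' Z W_N \hat V_N W_N Z' \Delta y_{-1} \left(\Delta y_{-1}' Z W_N Z' \Delta y_{-1}\right)^{-1}$$

where

$$\hat V_N = \frac{1}{N}\sum_{i=1}^{N} Z_i' \widehat{\Delta v}_i \widehat{\Delta v}_i' Z_i$$

and $\widehat{\Delta v}_{it} = \Delta y_{it} - \hat\alpha \Delta y_{i,t-1}$ are consistent estimates of the first-differenced residuals, based on some consistent initial estimator $\hat\alpha$.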


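The optimal (two step) GMM estimator thus sets $W_N = \hat V_N^{-1}$, or

$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_i' \widehat{\Delta v}_i \widehat{\Delta v}_i' Z_i\right)^{-1}$$

giving

$$\operatorname{avar}(\hat\alpha_{GMM}) = N \left(\Delta y_{-1}' Z W_N Z' \Delta y_{-1}\right)^{-1}$$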
One step weight matrix

For the special case in which $v_{it} \sim iid(0, \sigma_v^2)$, we can obtain a one step GMM estimator that is asymptotically equivalent to two step GMM.

NB. homoskedasticity is not required to derive the moment conditions we are using.

Its role here is merely to suggest a good choice for the one step weight matrix.

For the first-differenced equations, this choice is not 2SLS, due to the serial correlation in $\Delta v_{it}$ introduced by the first-differencing transformation.


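$$\Delta v_{it} = v_{it} - v_{i,t-1}, \qquad \Delta v_{i,t-1} = v_{i,t-1} - v_{i,t-2}$$

$$E(\Delta v_{it}^2) = 2\sigma_v^2$$

$$E(\Delta v_{it}\Delta v_{i,t-1}) = -\sigma_v^2$$

$$E(\Delta v_{it}\Delta v_{i,t-s}) = 0 \quad \text{for } s \geq 2$$

$$E(\Delta v_i \Delta v_i') = \sigma_v^2 \begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2
\end{pmatrix} = \sigma_v^2 H$$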
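
The first-order autocovariance is the only entry that is not immediate. As a short worked step (using only the serially uncorrelated, homoskedastic shocks assumed for this special case), expanding the product gives:

```latex
E(\Delta v_{it}\,\Delta v_{i,t-1})
  = E\big[(v_{it} - v_{i,t-1})(v_{i,t-1} - v_{i,t-2})\big]
  = E(v_{it}v_{i,t-1}) - E(v_{it}v_{i,t-2}) - E(v_{i,t-1}^2) + E(v_{i,t-1}v_{i,t-2})
  = 0 - 0 - \sigma_v^2 + 0 = -\sigma_v^2
```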
In the iid case, the weight matrix

$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_i' H Z_i\right)^{-1}$$

gives a one step GMM estimator for the first-differenced equations that is asymptotically equivalent to the (generally) optimal two step GMM estimator.

While asymptotic results for the two step estimator only require an initial estimator that is consistent, small sample properties tend to be better when the estimate of the optimal weight matrix

$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_i' \widehat{\Delta v}_i \widehat{\Delta v}_i' Z_i\right)^{-1}$$

uses residuals $\widehat{\Delta v}_i$ based on an initial estimator that is also as efficient as possible.
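
The one step and two step estimators can be put together in a short sketch. It is written for the pure AR(1) model only, and the simulated design (standard normal $\eta_i$ and $v_{it}$, $\alpha = 0.5$, $N = 3000$, $T = 6$) and all function names are assumptions for illustration, not a reproduction of any particular software implementation.

```python
import numpy as np

def simulate_panel(n, t, alpha, rng, burn_in=50):
    """Draw y_it = alpha*y_{i,t-1} + eta_i + v_it with N(0,1) eta_i and v_it."""
    eta = rng.standard_normal(n)
    y = eta / (1.0 - alpha)
    out = np.empty((n, t))
    for s in range(burn_in + t):
        y = alpha * y + eta + rng.standard_normal(n)
        if s >= burn_in:
            out[:, s - burn_in] = y
    return out

def instrument_matrix(y_i):
    """Block-diagonal Z_i: the row for period t holds y_i1, ..., y_{i,t-2}."""
    t_obs = len(y_i)
    z = np.zeros((t_obs - 2, (t_obs - 2) * (t_obs - 1) // 2))
    col = 0
    for row in range(t_obs - 2):            # row 0 is the equation for t = 3
        z[row, col:col + row + 1] = y_i[:row + 1]
        col += row + 1
    return z

def first_difference_gmm(y):
    """Return (one step, two step) first-differenced GMM estimates of alpha."""
    n, t_obs = y.shape
    dy = np.diff(y, axis=1)
    dep, lag = dy[:, 1:], dy[:, :-1]         # dy_it and dy_{i,t-1}, t = 3..T
    zs = [instrument_matrix(y_i) for y_i in y]
    zx = sum(z.T @ x for z, x in zip(zs, lag)) / n   # Z' dy_{-1} / N
    zy = sum(z.T @ d for z, d in zip(zs, dep)) / n   # Z' dy / N

    def estimate(w):
        return (zx @ w @ zy) / (zx @ w @ zx)

    # One step: W_N = (N^{-1} sum Z_i' H Z_i)^{-1}, H tridiagonal (2, -1).
    k = t_obs - 2
    h = 2 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)
    w_one = np.linalg.inv(sum(z.T @ h @ z for z in zs) / n)
    alpha_one = estimate(w_one)

    # Two step: W_N = (N^{-1} sum Z_i' dv_i dv_i' Z_i)^{-1}, one step residuals.
    resid = dep - alpha_one * lag
    moments = np.array([z.T @ r for z, r in zip(zs, resid)])   # (N, m)
    w_two = np.linalg.inv(moments.T @ moments / n)
    alpha_two = estimate(w_two)
    return alpha_one, alpha_two

rng = np.random.default_rng(3)
panel = simulate_panel(n=3000, t=6, alpha=0.5, rng=rng)
alpha_one, alpha_two = first_difference_gmm(panel)
print(f"one step: {alpha_one:.3f}  two step: {alpha_two:.3f}")
```

Using the one step estimate as the initial estimator follows the point made above: the residuals entering the two step weight matrix come from an estimator that is efficient in the iid case.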

Two step inference

Making explicit the dependence of the estimated optimal weight matrix on the initial consistent estimator

$$W_N(\hat\alpha) = \left(\frac{1}{N}\sum_{i=1}^{N} Z_i' \widehat{\Delta v}_i(\hat\alpha) \widehat{\Delta v}_i(\hat\alpha)' Z_i\right)^{-1}$$

indicates a small sample problem with the usual estimate of the asymptotic variance for the two step GMM estimator

$$\operatorname{avar}(\hat\alpha_{GMM}) = N \left(\Delta y_{-1}' Z W_N(\hat\alpha) Z' \Delta y_{-1}\right)^{-1}$$

This neglects variation introduced by using an estimate $\hat\alpha$ to construct the optimal weight matrix.


In very large samples, this variation is negligible, and the usual expression for the asymptotic variance is correct.

But in (reasonably large) finite samples, this additional variation makes inference based on $\operatorname{avar}(\hat\alpha_{GMM})$ unreliable.

In fact, $\operatorname{avar}(\hat\alpha_{GMM})$ provides a good estimate of the variance of an infeasible GMM estimator, that uses the true value $\alpha$ rather than the initial estimate $\hat\alpha$ to construct the optimal weight matrix.

Windmeijer (Journal of Econometrics, 2005) proposes a finite sample correction that provides more accurate estimates of the variance of (linear) two step GMM estimators.


This finite sample correction is now implemented in programs such as Stata and PC-Give/Ox.

t-tests based on these corrected standard errors are found to be as reliable as those based on the one step GMM estimator (where no parameters are estimated in the construction of the weight matrix).

(As usual with t and Wald tests) these are quite accurate in cases where the estimated parameter(s) are not subject to serious finite sample bias problems (to be discussed). Bond and Windmeijer (Econometric Reviews, 2005) provide Monte Carlo evidence, and consider alternative tests of linear restrictions in the GMM context.


Linearity

The simple linear (generalised instrumental variables) expressions for these GMM estimators follow from the linearity of the moment conditions in $\alpha$

$$E(y_{i,t-s}\Delta v_{it}) = E\left(y_{i,t-s}(\Delta y_{it} - \alpha \Delta y_{i,t-1})\right)$$

$$= E(y_{i,t-s}\Delta y_{it}) - \alpha E(y_{i,t-s}\Delta y_{i,t-1}) = 0$$


Quadratic moment conditions

The 'standard assumptions' stated earlier imply a further $T-3$ moment conditions that are quadratic in $\alpha$.

These can be written as

$$E(\Delta v_{i,t-1}u_{it}) = 0 \quad \text{for } t = 4, \dots, T$$

where $u_{it} = \eta_i + v_{it}$ is the error term for the untransformed equations in levels.

Exploiting these additional moment conditions requires numerical optimisation procedures, and is less common in practice.

The resulting optimal non-linear GMM estimator is efficient for this model under the 'standard assumptions' stated previously, whereas the linear 'Arellano-Bond' GMM estimator is not.

See Ahn and Schmidt (Journal of Econometrics, 1995) for further discussion.

Homoskedasticity over time

Under the additional homoskedasticity assumption

$$E(v_{it}^2) = \sigma_i^2$$

there are a further $T-3$ linear moment conditions

$$E(y_{i,t-2}\Delta v_{i,t-1} - y_{i,t-1}\Delta v_{it}) = 0 \quad \text{for } t = 4, \dots, T$$

suggested by Ahn and Schmidt (1995).

These are simple to implement. They will improve efficiency if the additional homoskedasticity assumption is valid, but may introduce inconsistency if not.

Alternative transformations

We introduced these GMM estimators using the first-differencing transformation to eliminate the time-invariant individual effects from $u_{it} = \eta_i + v_{it}$.

The two key properties of first-differencing are:

- eliminates time-invariant variables

- does not introduce lagged shocks earlier than $v_{i,t-1}$ into the transformed error term

Any transformation that shares these properties would do equally well.


Arellano and Bover (Journal of Econometrics, 1995) show that the optimal GMM estimator is invariant (numerically!) to the particular transformation used within this class.

An alternative transformation of some interest is the forward Helmert's or 'orthogonal deviations' transformation

$$v_{it}^{O} = \left(\frac{T-t+1}{T-t+2}\right)^{1/2}\left[v_{i,t-1} - \frac{1}{T-t+1}(v_{it} + v_{i,t+1} + \dots + v_{iT})\right]$$

This estimates the mean for individual $i$ using future observations on the series only, and takes (weighted) deviations from this mean.

One property is that if the $v_{it}$ are iid, then so are the $v_{it}^{O}$.

Hence the asymptotically efficient one step estimator for the iid special case is simply 2SLS.
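
The iid-preserving property can be verified by writing the transformation as a matrix $A$ and checking that $AA' = I$. The sketch below builds $A$ for a generic series of `n_periods` observations (a hypothetical function name, with rows indexed from zero rather than by the lecture's $t$); each row takes one observation minus the mean of all later observations, with the weight from the formula above.

```python
import numpy as np

def forward_orthogonal_deviations(n_periods):
    """Return the (n_periods - 1) x n_periods forward orthogonal deviations matrix."""
    a = np.zeros((n_periods - 1, n_periods))
    for r in range(n_periods - 1):
        remaining = n_periods - r - 1               # number of future observations
        scale = np.sqrt(remaining / (remaining + 1))
        a[r, r] = scale                             # current observation
        a[r, r + 1:] = -scale / remaining           # minus mean of future ones
    return a

a_mat = forward_orthogonal_deviations(5)

# Rows are orthonormal, so iid errors stay iid after the transformation.
print(np.allclose(a_mat @ a_mat.T, np.eye(4)))
# Each row sums to zero, so time-invariant effects are eliminated.
print(np.allclose(a_mat.sum(axis=1), 0.0))
# A'A equals the within-groups (demeaning) projection.
print(np.allclose(a_mat.T @ a_mat, np.eye(5) - np.ones((5, 5)) / 5))
```

The last check is the algebraic reason why least squares on data transformed in this way reproduces the Within Groups estimator.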

Transformed model

$$y_{it}^{O} = \alpha y_{i,t-1}^{O} + v_{it}^{O} \quad \text{for } t = 3, \dots, T$$

OLS on this transformed model coincides with Within Groups.

2SLS using all available linear moment conditions coincides with the one step first-differenced GMM estimator discussed above.

These results are useful for thinking about finite sample bias issues.

Overfitting

One source of finite sample bias is the use of 'too many' instruments relative to the sample size ($N$).

If we use all the available lagged instruments, the number of instruments grows rapidly with the time dimension of the panel.

Hence if $T$ is moderately large relative to $N$, there is a danger of 'overfitting'.

For 2SLS estimators, overfitting results in a finite sample bias in the direction of the corresponding OLS estimator.

By analogy with 2SLS on the orthogonal deviations model, this gives a (downward) finite sample bias, in the direction of the Within estimator.

We can investigate this by comparing GMM and Within Groups estimates.

If they are too close, we can reduce the number of lagged instruments used, to reduce this source of finite sample bias.

More generally, it is good practice to check the sensitivity of empirical results to using more or fewer lagged observations in the set of instruments for each equation.
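
To see how quickly the instrument count grows, and how much a cap on the lag depth reduces it, a small counting sketch is enough. The function `n_instruments` and its `max_lags` argument are hypothetical names introduced here; the uncapped count is the $m = (T-2)(T-1)/2$ given earlier.

```python
def n_instruments(t_obs, max_lags=None):
    """Count lagged-level instruments over the equations for t = 3, ..., T.

    The equation for period t has t - 2 available lagged levels; max_lags
    (if given) keeps only the most recent max_lags of them.
    """
    total = 0
    for t in range(3, t_obs + 1):
        available = t - 2
        total += available if max_lags is None else min(available, max_lags)
    return total

for t_obs in (4, 6, 10, 15):
    print(t_obs, n_instruments(t_obs), n_instruments(t_obs, max_lags=2))
```

With all lags the count grows quadratically in $T$, whereas with a fixed cap it grows only linearly.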

This also indicates the properties of the first-differenced GMM estimator in the case where $T \to \infty$.

Overfitting leads first-differenced GMM to converge on Within Groups.

But recall that the Within estimator is consistent as $T \to \infty$.

So first-differenced GMM is also consistent (although not very useful) in the case where $T \to \infty$.

This is shown formally by Alvarez and Arellano (Econometrica, 2003).

The result is also much less useful in the context of models with endogenous explanatory variables, where the Within estimator is not consistent as $T \to \infty$, and hence not such a benign estimator to be converging on.

Weak instruments

Instrumental variables (and GMM) estimators have poor small sample properties in cases where the instruments, although valid, are only weakly correlated with the endogenous explanatory variables.

This is relevant for the first-differenced GMM estimator in the AR(1) model in the case where $\alpha \to 1$.

By analogy with random walks (innovations uncorrelated with past levels), the correlation between $\Delta y_{i,t-1}$ and the lagged levels $y_{i,t-s}$ for $s \geq 2$ becomes weaker as $\alpha \to 1$.
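
In the model we have focused on

$$y_{it} = \alpha y_{i,t-1} + (\eta_i + v_{it})$$

$\alpha$ remains formally identified as $\alpha \to 1$, and the first-differenced GMM estimator remains consistent as $N \to \infty$, provided $E(\eta_i^2) \neq 0$.

At $\alpha = 1$ we have

$$\Delta y_{i,t-1} = \eta_i + v_{i,t-1} \quad \text{and} \quad \Delta y_{i,t-2} = \eta_i + v_{i,t-2}$$

so that

$$E(\Delta y_{i,t-1}\Delta y_{i,t-2}) = E(\eta_i^2) \neq 0$$

$\Delta y_{i,t-2}$ is not a completely uninformative instrument for $\Delta y_{i,t-1}$.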


But Monte Carlo evidence suggests that first-differenced GMM estimators become very imprecise, and subject to serious finite sample biases, for values of $\alpha$ around 0.8 and above, unless the available samples are huge.

The finite sample bias is again found to be downward, in the direction of the Within estimator, consistent with findings for 2SLS estimators in simple cases where the weak instruments problem has been studied analytically.

Blundell and Bond (Journal of Econometrics, 1998) provide Monte Carlo evidence, and develop an extended GMM estimator that is more useful for estimating panel data models using very persistent series.

We will return to this after briefly considering how the basic GMM estimator studied so far can be adapted for models with additional explanatory variables.
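
Note also that if we combine

$$y_{it} = \eta_i + \varepsilon_{it}$$

$$\varepsilon_{it} = \alpha \varepsilon_{i,t-1} + v_{it}$$

we obtain the alternative specification

$$y_{it} - \alpha y_{i,t-1} = \eta_i - \alpha \eta_i + \varepsilon_{it} - \alpha \varepsilon_{i,t-1}$$

or

$$y_{it} = \alpha y_{i,t-1} + (1-\alpha)\eta_i + v_{it}$$

In this specification, the process for $y_{it}$ approaches a pure random walk as $\alpha \to 1$ (rather than a random walk with individual-specific drifts).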


$$y_{it} = \alpha y_{i,t-1} + (1-\alpha)\eta_i + v_{it}$$

Now at $\alpha = 1$

$$\Delta y_{i,t-1} = v_{i,t-1}$$

Consequently lagged levels are completely uninformative instruments for $\Delta y_{i,t-1}$ in the case where $\alpha = 1$, and $\alpha$ is not identified using only the moment conditions

$$E(y_{i,t-s}\Delta v_{it}) = 0 \quad \text{for } t = 3, \dots, T \text{ and } s \geq 2$$

for equations in first-differences.
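
How fast the instrument weakens before reaching $\alpha = 1$ can be illustrated with a closed-form correlation. The sketch below is derived under extra assumptions not stated in the lecture: $|\alpha| < 1$, $\varepsilon_{it}$ covariance stationary with variance $\sigma_v^2/(1-\alpha^2)$, and $\eta_i$ uncorrelated with $\varepsilon_{it}$. Under these, $\operatorname{cov}(\Delta y_{i,t-1}, y_{i,t-2}) = (\alpha-1)\operatorname{var}(\varepsilon_{it})$ and $\operatorname{var}(\Delta y_{i,t-1}) = 2(1-\alpha)\operatorname{var}(\varepsilon_{it})$. The function name and default variances are hypothetical.

```python
import math

def corr_dy_with_lagged_level(alpha, sigma_v2=1.0, sigma_eta2=1.0):
    """Correlation between dy_{i,t-1} and y_{i,t-2} when y_it = eta_i + eps_it.

    Assumes |alpha| < 1 and a covariance stationary AR(1) process for eps_it.
    """
    var_eps = sigma_v2 / (1.0 - alpha ** 2)
    cov = (alpha - 1.0) * var_eps               # cov(dy_{i,t-1}, y_{i,t-2})
    var_dy = 2.0 * (1.0 - alpha) * var_eps      # var(dy_{i,t-1})
    var_y = sigma_eta2 + var_eps                # var(y_{i,t-2})
    return cov / math.sqrt(var_dy * var_y)

for a in (0.2, 0.5, 0.8, 0.9, 0.99):
    print(a, round(corr_dy_with_lagged_level(a), 3))
```

The correlation is negative throughout and shrinks towards zero as $\alpha$ approaches one, in line with the lagged level becoming uninformative in the limit.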


$$y_{it} = \alpha y_{i,t-1} + (1-\alpha)\eta_i + v_{it}$$

For this model, however, note that the OLS levels estimator is consistent when $\alpha = 1$.

In this case a consistent test of the null hypothesis that $\alpha = 1$ can be obtained using a simple t-test based on the pooled OLS estimate of $\alpha$.
