Appendix B
The LSDV estimator is consistent for the static model whether the effects are fixed or random. By contrast, the LSDV estimator is inconsistent for a dynamic panel data model with individual effects, whether the effects are fixed or random.
Definition 1 (Nickell's bias (1981)) The bias of the LSDV estimator in a dynamic model is generally known as dynamic panel bias or Nickell's bias (1981).
Consider the dynamic panel data model
$$y_{it} = \rho\,y_{i,t-1} + \alpha_i + \varepsilon_{it}$$
where the individual effect is written as $\alpha_i = \alpha + \alpha_i^*$ to avoid imposing the restriction that $\sum_{i=1}^{n}\alpha_i = 0$ or $E(\alpha_i) = 0$ in the case of random individual effects.
Assumptions
1. The autoregressive parameter satisfies $|\rho| < 1$.
2. The initial conditions $y_{i0}$ are observable.
3. The error term is $\varepsilon_{it} \sim \text{i.i.d.}\,(0,\sigma_\varepsilon^2)$; i.e., $E(\varepsilon_{it}) = 0$, $E(\varepsilon_{it}\,\varepsilon_{js}) = \sigma_\varepsilon^2$ if $j = i$ and $t = s$, and $E(\varepsilon_{it}\,\varepsilon_{js}) = 0$ otherwise.
The LSDV estimator of $\rho$ and the implied estimates of the individual effects are
$$\widehat{\alpha}_i = \bar{y}_i - \widehat{\rho}_{LSDV}\,\bar{y}_{i,-1}$$
$$\widehat{\rho}_{LSDV} = \left(\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar{y}_{i,-1}\right)^{2}\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar{y}_{i,-1}\right)\left(y_{it}-\bar{y}_{i}\right)\right)$$
where
$$\bar{y}_i = \frac1T\sum_{t=1}^{T}y_{it}, \qquad \bar{y}_{i,-1} = \frac1T\sum_{t=1}^{T}y_{i,t-1}$$
Given that $y_{it}-\bar y_i = \rho\left(y_{i,t-1}-\bar y_{i,-1}\right) + \left(\varepsilon_{it}-\bar\varepsilon_i\right)$, the LSDV estimator can be written as
$$\widehat{\rho}_{LSDV} = \left(\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)^{2}\right)^{-1}\left\{\rho\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)^{2} + \sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)\left(\varepsilon_{it}-\bar\varepsilon_i\right)\right\}$$
$$= \rho + \left(\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)^{2}\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)\left(\varepsilon_{it}-\bar\varepsilon_i\right)\right)$$
Definition 3 (bias)
$$\widehat{\rho}_{LSDV} - \rho = \left(\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)^{2}\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)\left(\varepsilon_{it}-\bar\varepsilon_i\right)\right) \tag{1}$$
Let us consider the numerator of equation (1). Because the $\varepsilon_{it}$ are (1) uncorrelated with $\alpha_i$ and (2) independently and identically distributed, we have
$$\operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)\left(\varepsilon_{it}-\bar\varepsilon_i\right) = \underbrace{\operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}y_{i,t-1}\,\varepsilon_{it}}_{N_1} - \underbrace{\operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}y_{i,t-1}\,\bar\varepsilon_{i}}_{N_2}$$
$$\qquad - \underbrace{\operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\bar y_{i,-1}\,\varepsilon_{it}}_{N_3} + \underbrace{\operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\bar y_{i,-1}\,\bar\varepsilon_{i}}_{N_4} \tag{2}$$
Theorem 4 (Weak law of large numbers, Khinchine) If $\{X_i\}$, $i = 1, 2, \dots, m$, is a sequence of i.i.d. random variables with $E(X_i) = \mu < \infty$, then the sample mean converges in probability to $\mu$:
$$\frac1m\sum_{i=1}^{m}X_i \xrightarrow{\;p\;} E(X_i) = \mu \qquad\text{or}\qquad \operatorname*{plim}_{m\to\infty}\frac1m\sum_{i=1}^{m}X_i = E(X_i) = \mu$$
Since (1) $y_{i,t-1}$ only depends on $\varepsilon_{i,t-1}, \varepsilon_{i,t-2}, \dots$ and (2) the $\varepsilon_{it}$ are uncorrelated, we have
$$E\left(y_{i,t-1}\,\varepsilon_{it}\right) = 0$$
and finally
$$N_1 = \operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}y_{i,t-1}\,\varepsilon_{it} = 0$$
For the second term $N_2$, we have
$$N_2 = \operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}y_{i,t-1}\,\bar\varepsilon_i = \operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\bar\varepsilon_i\sum_{t=1}^{T}y_{i,t-1} = \operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\bar\varepsilon_i\,T\,\bar y_{i,-1} = \operatorname*{plim}_{n\to\infty}\frac1n\sum_{i=1}^{n}\bar y_{i,-1}\,\bar\varepsilon_i$$
If this plim is not null, then the LSDV estimator $\widehat\rho_{LSDV}$ is biased when n tends to infinity and T is fixed.
Let us examine this plim,
$$\operatorname*{plim}_{n\to\infty}\frac1n\sum_{i=1}^{n}\bar y_{i,-1}\,\bar\varepsilon_i$$
We know that
$$y_{it} = \rho\,y_{i,t-1} + \alpha_i + \varepsilon_{it}$$
$$= \rho^{2}\,y_{i,t-2} + (1+\rho)\,\alpha_i + \varepsilon_{it} + \rho\,\varepsilon_{i,t-1}$$
$$= \rho^{3}\,y_{i,t-3} + \left(1+\rho+\rho^{2}\right)\alpha_i + \varepsilon_{it} + \rho\,\varepsilon_{i,t-1} + \rho^{2}\,\varepsilon_{i,t-2}$$
$$\vdots$$
$$= \rho^{t}\,y_{i0} + \frac{1-\rho^{t}}{1-\rho}\,\alpha_i + \varepsilon_{it} + \rho\,\varepsilon_{i,t-1} + \rho^{2}\,\varepsilon_{i,t-2} + \dots + \rho^{t-1}\,\varepsilon_{i1}$$
Summing $y_{i,t-1}$ over t, we get
$$\sum_{t=1}^{T}y_{i,t-1} = \sum_{t=1}^{T}\sum_{l=0}^{t-2}\rho^{l}\,\varepsilon_{i,t-1-l} + \frac{(T-1)-T\rho+\rho^{T}}{(1-\rho)^{2}}\,\alpha_i + \frac{1-\rho^{T}}{1-\rho}\,y_{i0}$$
so that
$$\bar y_{i,-1} = \frac1T\sum_{t=1}^{T}y_{i,t-1} = \frac1T\left\{\sum_{t=1}^{T}\sum_{l=0}^{t-2}\rho^{l}\,\varepsilon_{i,t-1-l} + \frac{(T-1)-T\rho+\rho^{T}}{(1-\rho)^{2}}\,\alpha_i + \frac{1-\rho^{T}}{1-\rho}\,y_{i0}\right\}$$
Moreover, $E\left(y_{i0}\,\varepsilon_{is}\right) = 0$ for $s = 1, 2, \dots, T$. Therefore,
$$\operatorname*{plim}_{n\to\infty}\frac1n\sum_{i=1}^{n}\bar y_{i,-1}\,\bar\varepsilon_i = \frac{\sigma_\varepsilon^{2}}{T^{2}}\sum_{t=1}^{T}\sum_{l=0}^{t-2}\rho^{l} = \frac{\sigma_\varepsilon^{2}}{T^{2}}\sum_{t=1}^{T}\frac{1-\rho^{t-1}}{1-\rho}$$
$$= \frac{\sigma_\varepsilon^{2}}{T^{2}}\left\{\frac{T-\dfrac{1-\rho^{T}}{1-\rho}}{1-\rho}\right\} = \frac{\sigma_\varepsilon^{2}}{T^{2}}\,\frac{(T-1)-T\rho+\rho^{T}}{(1-\rho)^{2}}$$
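This plim can be checked by simulation. The sketch below is not part of the original derivation; the sample size, burn-in length, and function names are arbitrary choices. It compares a Monte Carlo estimate of $E(\bar y_{i,-1}\,\bar\varepsilon_i)$ with the closed-form expression above.

```python
import numpy as np

def mc_cov_ybar_ebar(rho=0.5, sigma=1.0, T=5, n=50_000, burn=100, seed=0):
    """Monte Carlo estimate of E(ybar_{i,-1} * ebar_i) for a stationary AR(1)
    panel y_it = rho * y_{i,t-1} + alpha_i + eps_it. The individual effect
    alpha_i is uncorrelated with eps and drops out, so it is set to zero."""
    rng = np.random.default_rng(seed)
    M = burn + T + 1                       # burn-in plus periods 0..T
    eps = rng.normal(0.0, sigma, size=(n, M))
    y = np.zeros((n, M))
    for t in range(1, M):                  # burn-in makes y_{i0} ~ stationary
        y[:, t] = rho * y[:, t - 1] + eps[:, t]
    y_bar_lag = y[:, burn:burn + T].mean(axis=1)        # mean of y_{i,0..T-1}
    e_bar = eps[:, burn + 1:burn + T + 1].mean(axis=1)  # mean of eps_{i,1..T}
    return float(np.mean(y_bar_lag * e_bar))

def closed_form(rho, sigma, T):
    """sigma^2 * ((T-1) - T*rho + rho^T) / (T^2 * (1-rho)^2)"""
    return sigma**2 * ((T - 1) - T * rho + rho**T) / (T**2 * (1 - rho) ** 2)
```

With $\rho = 0.5$ and $T = 5$ the two quantities agree to roughly two decimal places, which is within Monte Carlo error for this sample size.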
By the same argument, $N_3$ and $N_4$ also reduce to $\operatorname*{plim}_{n\to\infty}\frac1n\sum_{i}\bar y_{i,-1}\,\bar\varepsilon_i$, so the numerator in (2) equals $N_1 - N_2 - N_3 + N_4 = -\operatorname*{plim}_{n\to\infty}\frac1n\sum_{i}\bar y_{i,-1}\,\bar\varepsilon_i$. This proves the following result.

Theorem 5 If the error (idiosyncratic) terms $\varepsilon_{it}$ are i.i.d. $(0,\sigma_\varepsilon^2)$, we have
$$\operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)\left(\varepsilon_{it}-\bar\varepsilon_i\right) = -\operatorname*{plim}_{n\to\infty}\frac1n\sum_{i=1}^{n}\bar y_{i,-1}\,\bar\varepsilon_i = -\frac{\sigma_\varepsilon^{2}}{T^{2}}\,\frac{(T-1)-T\rho+\rho^{T}}{(1-\rho)^{2}}$$
For the denominator of equation (1), we have
$$\operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)^{2} = \operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}y_{i,t-1}^{2} - \operatorname*{plim}_{n\to\infty}\frac1n\sum_{i=1}^{n}\bar y_{i,-1}^{\,2} = \frac1T\sum_{t=1}^{T}E\left(y_{i,t-1}^{2}\right) - E\left(\bar y_{i,-1}^{\,2}\right) \tag{5}$$
However, $y_{i,0} = \rho\,y_{i,-1} + \alpha_i + \varepsilon_{i0}$, and assuming the process started in the infinite past, solving backwards gives
$$y_{i,0} = \frac{\alpha_i}{1-\rho} + \frac{\varepsilon_{i0}}{1-\rho L} = \frac{\alpha_i}{1-\rho} + \sum_{j=0}^{\infty}\rho^{j}\,\varepsilon_{i,-j}$$
$$\Rightarrow\quad E\left(y_{i,0}\right) = \frac{\alpha_i}{1-\rho}$$
$$\Rightarrow\quad \operatorname{var}\left(y_{i,0}\right) = E\left[\left(y_{i,0}-E(y_{i,0})\right)^{2}\right] = E\left(\sum_{j=0}^{\infty}\rho^{j}\,\varepsilon_{i,-j}\right)^{2} = \frac{\sigma_\varepsilon^{2}}{1-\rho^{2}}$$
$$\Rightarrow\quad E\left(y_{i,0}^{2}\right) = \frac{\sigma_\varepsilon^{2}}{1-\rho^{2}} + \left[E\left(y_{i,0}\right)\right]^{2} = \frac{\sigma_\varepsilon^{2}}{1-\rho^{2}} + \frac{\alpha_i^{2}}{(1-\rho)^{2}} \tag{7}$$
$$\Rightarrow\quad E\left(\alpha_i\,y_{i,0}\right) = E\left\{\alpha_i\left(\frac{\alpha_i}{1-\rho} + \sum_{j=0}^{\infty}\rho^{j}\,\varepsilon_{i,-j}\right)\right\} = \frac{\alpha_i^{2}}{1-\rho} \tag{8}$$
Squaring the expression for $y_{i,t-1}$ and taking expectations,
$$E\left(y_{i,t-1}^{2}\right) = E\left(\sum_{j=0}^{t-2}\rho^{j}\,\varepsilon_{i,t-1-j}\right)^{2} + \left(\frac{1-\rho^{t-1}}{1-\rho}\right)^{2}\alpha_i^{2} + \rho^{2(t-1)}\,E\left(y_{i0}^{2}\right) + 2\,\rho^{t-1}\,\frac{1-\rho^{t-1}}{1-\rho}\,E\left(\alpha_i\,y_{i0}\right)$$
as the expectations of the other cross-product terms are zero. Using the results in (7) and (8), we have
$$E\left(y_{i,t-1}^{2}\right) = \frac{\sigma_\varepsilon^{2}\left(1-\rho^{2(t-1)}\right)}{1-\rho^{2}} + \frac{\left(1-\rho^{t-1}\right)^{2}\alpha_i^{2}}{(1-\rho)^{2}} + \rho^{2(t-1)}\left\{\frac{\sigma_\varepsilon^{2}}{1-\rho^{2}}+\frac{\alpha_i^{2}}{(1-\rho)^{2}}\right\} + \frac{2\,\rho^{t-1}\left(1-\rho^{t-1}\right)\alpha_i^{2}}{(1-\rho)^{2}}$$
$$= \frac{\sigma_\varepsilon^{2}}{1-\rho^{2}} + \frac{\alpha_i^{2}}{(1-\rho)^{2}}\left\{\left(1-\rho^{t-1}\right)^{2} + \rho^{2(t-1)} + 2\,\rho^{t-1}\left(1-\rho^{t-1}\right)\right\} = \frac{\sigma_\varepsilon^{2}}{1-\rho^{2}} + \frac{\alpha_i^{2}}{(1-\rho)^{2}} \tag{9}$$
Since this expression does not depend on t,
$$\frac1T\sum_{t=1}^{T}E\left(y_{i,t-1}^{2}\right) = \frac{\sigma_\varepsilon^{2}}{1-\rho^{2}} + \frac{\alpha_i^{2}}{(1-\rho)^{2}} \tag{10}$$
However, working from $y_{i1} = \rho\,y_{i0} + \alpha_i + \varepsilon_{i1}$ and its successors, the sum of the lagged values can also be written as
$$\sum_{t=1}^{T}y_{i,t-1} = \frac{1-\rho^{T}}{1-\rho}\,y_{i0} + \sum_{t=1}^{T-1}\frac{1-\rho^{t}}{1-\rho}\,\alpha_i + \sum_{t=1}^{T-1}\frac{1-\rho^{T-t}}{1-\rho}\,\varepsilon_{it}$$
Hence
$$\bar y_{i,-1} = \frac1T\sum_{t=1}^{T}y_{i,t-1} = \frac1T\left\{\frac{1-\rho^{T}}{1-\rho}\,y_{i0} + \sum_{t=1}^{T-1}\frac{1-\rho^{t}}{1-\rho}\,\alpha_i + \sum_{t=1}^{T-1}\frac{1-\rho^{T-t}}{1-\rho}\,\varepsilon_{it}\right\}$$
$$= \frac1T\left\{\frac{1-\rho^{T}}{1-\rho}\,y_{i0} + \frac{\alpha_i}{1-\rho}\left((T-1)-\frac{\rho\left(1-\rho^{T-1}\right)}{1-\rho}\right) + \sum_{t=1}^{T-1}\frac{1-\rho^{T-t}}{1-\rho}\,\varepsilon_{it}\right\}$$
and
$$\bar y_{i,-1}^{\,2} = \frac{1}{T^{2}}\left\{\left(\frac{1-\rho^{T}}{1-\rho}\right)^{2}y_{i0}^{2} + \frac{\alpha_i^{2}}{(1-\rho)^{2}}\left((T-1)-\frac{\rho\left(1-\rho^{T-1}\right)}{1-\rho}\right)^{2} + \left(\sum_{t=1}^{T-1}\frac{1-\rho^{T-t}}{1-\rho}\,\varepsilon_{it}\right)^{2}\right.$$
$$\left. + \;2\,\frac{1-\rho^{T}}{1-\rho}\left((T-1)-\frac{\rho\left(1-\rho^{T-1}\right)}{1-\rho}\right)\frac{\alpha_i\,y_{i0}}{1-\rho} + \text{other cross-product terms}\right\}$$
Given the fact that the other cross-product terms have zero expectation, and using the results in (7) and (8), we obtain
$$E\left(\bar y_{i,-1}^{\,2}\right) = \frac{1}{T^{2}}\left\{\left(\frac{1-\rho^{T}}{1-\rho}\right)^{2}\left(\frac{\sigma_\varepsilon^{2}}{1-\rho^{2}}+\frac{\alpha_i^{2}}{(1-\rho)^{2}}\right) + \frac{\alpha_i^{2}}{(1-\rho)^{2}}\left((T-1)-\frac{\rho\left(1-\rho^{T-1}\right)}{1-\rho}\right)^{2}\right.$$
$$\left. + \frac{\sigma_\varepsilon^{2}}{(1-\rho)^{2}}\sum_{t=1}^{T-1}\left(1-\rho^{T-t}\right)^{2} + 2\,\frac{1-\rho^{T}}{1-\rho}\left((T-1)-\frac{\rho\left(1-\rho^{T-1}\right)}{1-\rho}\right)\frac{\alpha_i^{2}}{(1-\rho)^{2}}\right\} \tag{11}$$
The terms in $\sigma_\varepsilon^{2}$ on the right-hand side of (11) simplify to
$$\frac{\sigma_\varepsilon^{2}}{T^{2}}\left\{\frac{\left(1-\rho^{T}\right)^{2}}{(1-\rho)^{2}\left(1-\rho^{2}\right)} + \frac{1}{(1-\rho)^{2}}\left((T-1) - \frac{2\rho\left(1-\rho^{T-1}\right)}{1-\rho} + \frac{\rho^{2}\left(1-\rho^{2(T-1)}\right)}{1-\rho^{2}}\right)\right\}$$
$$= \frac{\sigma_\varepsilon^{2}}{\left(1-\rho^{2}\right)T} + \frac{2\,\sigma_\varepsilon^{2}\,\rho}{T^{2}}\,\frac{(T-1)-T\rho+\rho^{T}}{(1-\rho)^{2}\left(1-\rho^{2}\right)} \tag{12}$$
On the other hand, the terms in $\alpha_i^{2}$ on the right-hand side of (11) simplify to
$$\frac{\alpha_i^{2}}{T^{2}(1-\rho)^{2}}\left\{\left(\frac{1-\rho^{T}}{1-\rho}\right)^{2} + \left((T-1)-\frac{\rho\left(1-\rho^{T-1}\right)}{1-\rho}\right)^{2} + 2\,\frac{1-\rho^{T}}{1-\rho}\left((T-1)-\frac{\rho\left(1-\rho^{T-1}\right)}{1-\rho}\right)\right\}$$
$$= \frac{\alpha_i^{2}}{T^{2}(1-\rho)^{2}}\left\{\frac{1-\rho^{T}}{1-\rho} + (T-1) - \frac{\rho\left(1-\rho^{T-1}\right)}{1-\rho}\right\}^{2}$$
Since
$$\frac{1-\rho^{T}}{1-\rho} - \frac{\rho\left(1-\rho^{T-1}\right)}{1-\rho} = \frac{1-\rho^{T}-\rho+\rho^{T}}{1-\rho} = 1,$$
this reduces to
$$\frac{\alpha_i^{2}}{T^{2}(1-\rho)^{2}}\left\{(T-1)+1\right\}^{2} = \frac{\alpha_i^{2}}{T^{2}(1-\rho)^{2}}\,T^{2} = \frac{\alpha_i^{2}}{(1-\rho)^{2}} \tag{13}$$
From the results in (12) and (13), the expression in (11) becomes
$$E\left(\bar y_{i,-1}^{\,2}\right) = \frac{\sigma_\varepsilon^{2}}{\left(1-\rho^{2}\right)T} + \frac{2\,\sigma_\varepsilon^{2}\,\rho}{T^{2}}\,\frac{(T-1)-T\rho+\rho^{T}}{(1-\rho)^{2}\left(1-\rho^{2}\right)} + \frac{\alpha_i^{2}}{(1-\rho)^{2}} \tag{14}$$
The results in (10) and (14) imply that the plim in (5) is given by
$$\operatorname*{plim}_{n\to\infty}\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{i,t-1}-\bar y_{i,-1}\right)^{2} = \frac1T\sum_{t=1}^{T}E\left(y_{i,t-1}^{2}\right) - E\left(\bar y_{i,-1}^{\,2}\right)$$
$$= \frac{\sigma_\varepsilon^{2}}{1-\rho^{2}} + \frac{\alpha_i^{2}}{(1-\rho)^{2}} - \frac{\sigma_\varepsilon^{2}}{\left(1-\rho^{2}\right)T} - \frac{2\,\sigma_\varepsilon^{2}\,\rho}{T^{2}}\,\frac{(T-1)-T\rho+\rho^{T}}{(1-\rho)^{2}\left(1-\rho^{2}\right)} - \frac{\alpha_i^{2}}{(1-\rho)^{2}}$$
$$= \frac{\sigma_\varepsilon^{2}}{1-\rho^{2}}\left\{1 - \frac1T - \frac{2\rho}{(1-\rho)^{2}}\,\frac{(T-1)-T\rho+\rho^{T}}{T^{2}}\right\} \tag{15}$$
Combining the numerator (Theorem 5) with the denominator (15) yields
$$\operatorname*{plim}_{n\to\infty}\left(\widehat\rho_{LSDV}-\rho\right) = \frac{-\dfrac{\sigma_\varepsilon^{2}}{T^{2}}\,\dfrac{(T-1)-T\rho+\rho^{T}}{(1-\rho)^{2}}}{\dfrac{\sigma_\varepsilon^{2}}{1-\rho^{2}}\left\{1-\dfrac1T-\dfrac{2\rho}{(1-\rho)^{2}}\,\dfrac{(T-1)-T\rho+\rho^{T}}{T^{2}}\right\}}$$
The above expression implies that the semi-asymptotic bias can be rewritten as
$$\operatorname*{plim}_{n\to\infty}\left(\widehat\rho_{LSDV}-\rho\right) = \frac{-(1+\rho)\left[(T-1)-T\rho+\rho^{T}\right]}{(1-\rho)\left[T^{2}-T-\dfrac{2\rho}{(1-\rho)^{2}}\left((T-1)-T\rho+\rho^{T}\right)\right]}$$
Fact 1 If T also tends to infinity, the numerator of the first ratio converges to zero and its denominator converges to the nonzero constant $\sigma_\varepsilon^{2}/\left(1-\rho^{2}\right)$; hence the LSDV estimators of $\rho$ and the $\alpha_i$ are consistent.
Fact 2 If T is fixed, the denominator is a nonzero constant, and $\widehat\rho_{LSDV}$ and $\widehat\alpha_i$ are inconsistent estimators even when n is large.
Theorem 6 (Dynamic panel bias) In a dynamic panel AR(1) model with individual effects, the semi-asymptotic bias (as $n\to\infty$ with T fixed) of the LSDV estimator of the autoregressive parameter is equal to
$$\operatorname*{plim}_{n\to\infty}\left(\widehat\rho_{LSDV}-\rho\right) = \frac{-(1+\rho)\left[(T-1)-T\rho+\rho^{T}\right]}{(1-\rho)\left[T^{2}-T-\dfrac{2\rho}{(1-\rho)^{2}}\left((T-1)-T\rho+\rho^{T}\right)\right]}$$
Theorem 7 For an AR(1) model, the dynamic panel bias can be rewritten as
$$\operatorname*{plim}_{n\to\infty}\left(\widehat\rho_{LSDV}-\rho\right) = -\frac{1+\rho}{T-1}\left(1-\frac1T\,\frac{1-\rho^{T}}{1-\rho}\right)\left[1-\frac{2\rho}{(1-\rho)(T-1)}\left(1-\frac{1-\rho^{T}}{T(1-\rho)}\right)\right]^{-1}$$
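The closed form in Theorem 7 is straightforward to evaluate numerically. The sketch below is an illustration, not part of the original notes (the function name is an arbitrary choice); it simply codes the formula.

```python
def nickell_bias(rho: float, T: int) -> float:
    """Semi-asymptotic bias plim(rho_hat_LSDV - rho) as n -> infinity with T
    fixed, from the formula in Theorem 7 (Nickell, 1981)."""
    k = (1.0 - rho**T) / (1.0 - rho)       # (1 - rho^T) / (1 - rho)
    numerator = -(1.0 + rho) / (T - 1) * (1.0 - k / T)
    denominator = 1.0 - (2.0 * rho / ((1.0 - rho) * (T - 1))) * (1.0 - k / T)
    return numerator / denominator
```

For $\rho = 0$ the expression reduces exactly to $-1/T$, and for fixed $\rho$ the bias vanishes as T grows, consistent with Facts 1 and 2.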
Fact 3 The dynamic panel bias of $\widehat\rho_{LSDV}$ is caused by having to eliminate the individual effects $\alpha_i$ from each observation, which creates a correlation of order $1/T$ between the explanatory variable and the residual in the transformed model
$$y_{it}-\bar y_i = \rho\underbrace{\left(y_{i,t-1}-\bar y_{i,-1}\right)}_{\text{depends on past values of }\varepsilon_{it}} + \underbrace{\left(\varepsilon_{it}-\bar\varepsilon_i\right)}_{\text{depends on past values of }\varepsilon_{it}}$$
Therefore,
$$\operatorname*{plim}_{n\to\infty}\frac1n\sum_{i=1}^{n}\bar y_{i,-1}\,\bar\varepsilon_i = \operatorname{cov}\left(\bar y_{i,-1},\bar\varepsilon_i\right) = \frac{\sigma_\varepsilon^{2}}{T^{2}}\,\frac{(T-1)-T\rho+\rho^{T}}{(1-\rho)^{2}} \neq 0$$
Estimation of an AR(1) Panel Data Model
To solve the inconsistency problem of the FE estimator of an AR(1) panel data model, we use a first-difference transformation to eliminate the individual effects $\alpha_i$. This gives
$$y_{it}-y_{i,t-1} = \rho\left(y_{i,t-1}-y_{i,t-2}\right) + \left(\varepsilon_{it}-\varepsilon_{i,t-1}\right), \qquad t = 2,\dots,T \tag{1}$$
If we estimate (1) by OLS, we do not get a consistent estimator for $\rho$, because $y_{i,t-1}$ and $\varepsilon_{i,t-1}$ are, by definition, correlated, even if $T\to\infty$. However, the transformed model suggests an instrumental variables approach. For example, $y_{i,t-2}$ is correlated with $\left(y_{i,t-1}-y_{i,t-2}\right)$ but not with $\left(\varepsilon_{it}-\varepsilon_{i,t-1}\right)$, for either T, or n, or both going to infinity. This gives the estimator
$$\widehat\rho_{IV}^{(1)} = \frac{\sum_{i=1}^{n}\sum_{t=2}^{T}y_{i,t-2}\left(y_{it}-y_{i,t-1}\right)}{\sum_{i=1}^{n}\sum_{t=2}^{T}y_{i,t-2}\left(y_{i,t-1}-y_{i,t-2}\right)} \tag{2}$$
The estimator in (2) is one of the estimators proposed by Anderson and Hsiao (1981). They also proposed an alternative, where $y_{i,t-2}-y_{i,t-3}$ is used as an instrument. This gives
$$\widehat\rho_{IV}^{(2)} = \frac{\sum_{i=1}^{n}\sum_{t=3}^{T}\left(y_{i,t-2}-y_{i,t-3}\right)\left(y_{it}-y_{i,t-1}\right)}{\sum_{i=1}^{n}\sum_{t=3}^{T}\left(y_{i,t-2}-y_{i,t-3}\right)\left(y_{i,t-1}-y_{i,t-2}\right)} \tag{4}$$
Consistency of both of these estimators is guaranteed by the assumption that $\varepsilon_{it}$ has no autocorrelation. Note that the second instrumental variables estimator requires an additional lag to construct the instrument, so the effective number of observations used in estimation is reduced (one sample period is 'lost').
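Both Anderson-Hsiao estimators can be sketched in a few lines of code. The helper below is an illustration, not from the original notes; it assumes a balanced panel stored as an (n, T+1) array of levels $y_{i0},\dots,y_{iT}$.

```python
import numpy as np

def anderson_hsiao_iv(y, use_level_instrument=True):
    """Anderson-Hsiao IV estimators for Dy_it = rho * Dy_{i,t-1} + De_it.

    y : (n, T+1) array holding y_{i0}, ..., y_{iT}.
    use_level_instrument=True  -> instrument y_{i,t-2}             (eq. (2))
    use_level_instrument=False -> instrument y_{i,t-2} - y_{i,t-3} (eq. (4))
    """
    dy = np.diff(y, axis=1)          # Dy_{i1}, ..., Dy_{iT}
    if use_level_instrument:
        # t = 2..T: regress Dy_it on Dy_{i,t-1}, instrument z = y_{i,t-2}
        z = y[:, :-2]                # y_{i0}, ..., y_{i,T-2}
        num = np.sum(z * dy[:, 1:])
        den = np.sum(z * dy[:, :-1])
    else:
        # t = 3..T: instrument z = Dy_{i,t-2}; one more period is lost
        z = dy[:, :-2]               # Dy_{i1}, ..., Dy_{i,T-2}
        num = np.sum(z * dy[:, 2:])
        den = np.sum(z * dy[:, 1:-1])
    return num / den
```

On simulated panels with a moderate $\rho$, both versions recover the true autoregressive parameter as n grows, with the difference-instrument version somewhat noisier.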
A method of moments approach can unify the estimators and eliminate the disadvantage of reduced sample sizes. Both IV estimators impose one moment condition in estimation. It is well known that imposing more moment conditions increases the efficiency of the estimators (provided the additional conditions are valid, of course).
Arellano and Bond (1991) suggest that the list of instruments can be extended by exploiting additional moment conditions and letting their number vary with t. To do this, they keep T fixed. For example, when T = 4, we have
$$E\left\{\left(\varepsilon_{i2}-\varepsilon_{i1}\right)y_{i0}\right\} = 0$$
as a moment condition for period t = 2. For period t = 3, we also have
$$E\left\{\left(\varepsilon_{i3}-\varepsilon_{i2}\right)y_{i0}\right\} = 0, \qquad E\left\{\left(\varepsilon_{i3}-\varepsilon_{i2}\right)y_{i1}\right\} = 0$$
For period t = 4, we have three moment conditions and three valid instruments:
$$E\left\{\left(\varepsilon_{i4}-\varepsilon_{i3}\right)y_{i0}\right\} = 0, \qquad E\left\{\left(\varepsilon_{i4}-\varepsilon_{i3}\right)y_{i1}\right\} = 0, \qquad E\left\{\left(\varepsilon_{i4}-\varepsilon_{i3}\right)y_{i2}\right\} = 0$$
All these moment conditions can be exploited in a GMM framework. To introduce the GMM estimator, define for general sample size T
$$\Delta\varepsilon_i = \begin{pmatrix}\varepsilon_{i2}-\varepsilon_{i1}\\ \vdots\\ \varepsilon_{iT}-\varepsilon_{i,T-1}\end{pmatrix} \tag{6}$$
and
$$Z_i = \begin{pmatrix}\left[y_{i0}\right] & 0 & \cdots & 0\\ 0 & \left[y_{i0},\,y_{i1}\right] & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \left[y_{i0},\dots,y_{i,T-2}\right]\end{pmatrix} \tag{7}$$
as the matrix of instruments. Each row of the matrix $Z_i$ contains the instruments that are valid for a given period. Consequently, the set of all moment conditions can be written concisely as
$$E\left\{Z_i'\,\Delta\varepsilon_i\right\} = 0 \tag{8}$$
Note that these are $1+2+3+\cdots+(T-1) = \frac12 T(T-1)$ moment conditions. To derive the GMM estimator, write this as
$$E\left\{Z_i'\left(\Delta y_i - \rho\,\Delta y_{i,-1}\right)\right\} = 0 \tag{9}$$
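The block structure of $Z_i$ and the moment-condition count are easy to verify in code. Below is a small helper sketch (the function name and array layout are assumptions, not from the notes) that builds the instrument matrix for one individual from the levels $y_{i0},\dots,y_{iT}$.

```python
import numpy as np

def instrument_matrix(y_i):
    """Arellano-Bond instrument matrix Z_i for one individual.

    y_i : length T+1 vector holding y_{i0}, ..., y_{iT}.
    Row t-2 (for t = 2..T) carries the instruments y_{i0}, ..., y_{i,t-2}
    valid for the differenced equation of period t; zeros elsewhere.
    """
    T = len(y_i) - 1
    m = T * (T - 1) // 2            # 1 + 2 + ... + (T-1) moment conditions
    Z = np.zeros((T - 1, m))
    col = 0
    for t in range(2, T + 1):
        k = t - 1                   # t-1 instruments for period t
        Z[t - 2, col:col + k] = y_i[:k]
        col += k
    return Z
```

For T = 4 this gives a 3 x 6 matrix, matching the six moment conditions listed above for the T = 4 example.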
Because the number of moment conditions will typically exceed the number of unknown coefficients, we estimate $\rho$ by minimizing a quadratic expression in the corresponding sample moments, that is,
$$\min_{\rho}\;\left[\frac1n\sum_{i=1}^{n}Z_i'\left(\Delta y_i-\rho\,\Delta y_{i,-1}\right)\right]' W_n \left[\frac1n\sum_{i=1}^{n}Z_i'\left(\Delta y_i-\rho\,\Delta y_{i,-1}\right)\right] \tag{10}$$
where $W_n$ is a symmetric positive definite weighting matrix. Differentiating this with respect to $\rho$ and solving for $\rho$ gives
$$\widehat\rho_{GMM} = \left[\left(\sum_{i=1}^{n}\Delta y_{i,-1}'\,Z_i\right)W_n\left(\sum_{i=1}^{n}Z_i'\,\Delta y_{i,-1}\right)\right]^{-1}\left(\sum_{i=1}^{n}\Delta y_{i,-1}'\,Z_i\right)W_n\left(\sum_{i=1}^{n}Z_i'\,\Delta y_i\right) \tag{11}$$
The properties of this estimator depend upon the choice of $W_n$, although it is consistent as long as $W_n$ is positive definite, for example $W_n = I$, the identity matrix.
The optimal weighting matrix is the one that gives the most efficient estimator, i.e. that gives the smallest asymptotic covariance matrix for $\widehat\rho_{GMM}$. From the general theory of GMM, we know that the optimal weighting matrix is (asymptotically) proportional to the inverse of the covariance matrix of the sample moments. In this case, this means that the optimal weighting matrix should satisfy
$$\operatorname*{plim}_{n\to\infty} W_n = \left[\operatorname{var}\left(Z_i'\,\Delta\varepsilon_i\right)\right]^{-1} = \left[E\left(Z_i'\,\Delta\varepsilon_i\,\Delta\varepsilon_i'\,Z_i\right)\right]^{-1} \tag{12}$$
In the standard case where no restrictions are imposed upon the covariance matrix of $\Delta\varepsilon_i$, this can be estimated using a first-step consistent estimator of $\rho$ and replacing the expectation operator by a sample average. This gives
$$\widehat W_n^{\,opt} = \left(\frac1n\sum_{i=1}^{n}Z_i'\,\Delta\widehat\varepsilon_i\,\Delta\widehat\varepsilon_i'\,Z_i\right)^{-1} \tag{13}$$
where $\Delta\widehat\varepsilon_i$ is the residual vector from a first-step consistent estimator, for example one using $W_n = I$.
The general GMM approach does not impose that $\varepsilon_{it}$ is i.i.d. over individuals and time, and the optimal weighting matrix is then estimated without imposing these restrictions. Note, however, that the absence of autocorrelation is still needed to guarantee the validity of the moment conditions.
Instead of estimating the unrestricted optimal weighting matrix, it is possible (and potentially advisable in small samples) to impose the absence of autocorrelation in $\varepsilon_{it}$, combined with a homoskedasticity assumption. Note that under these restrictions
$$E\left(\Delta\varepsilon_i\,\Delta\varepsilon_i'\right) = \sigma_\varepsilon^{2}\,G = \sigma_\varepsilon^{2}\begin{pmatrix}2 & -1 & 0 & \cdots & 0\\ -1 & 2 & -1 & \ddots & \vdots\\ 0 & -1 & 2 & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & -1\\ 0 & \cdots & 0 & -1 & 2\end{pmatrix} \tag{14}$$
where $G$ is of dimension $(T-1)\times(T-1)$.
The optimal weighting matrix can then be determined as
$$W_n^{\,opt} = \left(\frac1n\sum_{i=1}^{n}Z_i'\,G\,Z_i\right)^{-1} \tag{15}$$
Note that this matrix does not involve unknown parameters, so that the optimal GMM estimator can be computed in one step if the original errors $\varepsilon_{it}$ are assumed to be homoskedastic and exhibit no autocorrelation.
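Putting (11), (14), and (15) together, a one-step estimate can be sketched as follows. This is an illustration under stated assumptions, not a production implementation: it assumes a balanced panel of levels $y_{i0},\dots,y_{iT}$ stored as an (n, T+1) array, and homoskedastic, serially uncorrelated errors so that the one-step weight is optimal.

```python
import numpy as np

def one_step_gmm(y):
    """One-step Arellano-Bond GMM estimate of rho: eq. (11) with the
    weighting matrix W_n of eq. (15), built from the tridiagonal G of (14)."""
    n, cols = y.shape
    T = cols - 1
    dy = np.diff(y, axis=1)                     # Dy_{i1}, ..., Dy_{iT}
    dy_dep = dy[:, 1:]                          # Dy_{i2}, ..., Dy_{iT}
    dy_lag = dy[:, :-1]                         # Dy_{i1}, ..., Dy_{i,T-1}
    G = 2 * np.eye(T - 1) - np.eye(T - 1, k=1) - np.eye(T - 1, k=-1)
    m = T * (T - 1) // 2                        # number of moment conditions
    ZGZ = np.zeros((m, m))
    Z_dy = np.zeros(m)
    Z_dylag = np.zeros(m)
    for i in range(n):
        # block-diagonal instrument matrix: row t-2 holds y_{i0}..y_{i,t-2}
        Z = np.zeros((T - 1, m))
        col = 0
        for t in range(2, T + 1):
            Z[t - 2, col:col + t - 1] = y[i, :t - 1]
            col += t - 1
        ZGZ += Z.T @ G @ Z
        Z_dy += Z.T @ dy_dep[i]
        Z_dylag += Z.T @ dy_lag[i]
    W = np.linalg.pinv(ZGZ / n)                 # W_n^opt of eq. (15)
    return (Z_dylag @ W @ Z_dy) / (Z_dylag @ W @ Z_dylag)
```

Because $\rho$ is scalar here, the matrix formula in (11) collapses to the ratio of two quadratic forms in the stacked sample moments.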
Under weak regularity conditions, the GMM estimator for $\rho$ is asymptotically normal for $n\to\infty$ and fixed T, with its covariance matrix given by
$$\operatorname*{plim}_{n\to\infty}\left[\left(\frac1n\sum_{i=1}^{n}\Delta y_{i,-1}'\,Z_i\right)\left(\frac1n\sum_{i=1}^{n}Z_i'\,\Delta\varepsilon_i\,\Delta\varepsilon_i'\,Z_i\right)^{-1}\left(\frac1n\sum_{i=1}^{n}Z_i'\,\Delta y_{i,-1}\right)\right]^{-1} \tag{16}$$
Alvarez and Arellano (2003) showed that, in general, the GMM estimator is also consistent when both n and T tend to infinity, despite the fact that the number of moment conditions tends to infinity with the sample size. For large T, however, the GMM estimator will be close to the fixed effects estimator, which then provides a more attractive alternative.
The model can be extended with exogenous regressors,
$$y_{it}-y_{i,t-1} = \rho\left(y_{i,t-1}-y_{i,t-2}\right) + \beta'\left(x_{it}-x_{i,t-1}\right) + \left(\varepsilon_{it}-\varepsilon_{i,t-1}\right),$$
which can also be estimated by the generalized instrumental variables or GMM approach. Depending upon the assumptions made about $x_{it}$, different sets of additional instruments can be constructed. If the $x_{it}$ are strictly exogenous, in the sense that they are uncorrelated with any of the $\varepsilon_{is}$ error terms, we also have that
$$E\left\{x_{is}\,\Delta\varepsilon_{it}\right\} = 0 \quad\text{for each } s \text{ and } t,$$
so that $x_{i1},\dots,x_{iT}$ can be added to the instrument list for the first-differenced equation in each period. This would make the number of rows in $Z_i'$ quite large. Instead, almost the same level of information may be retained when the first-differenced $x_{it}$s are used as their own instruments. In this case, we impose the moment conditions
$$E\left\{\Delta x_{it}\,\Delta\varepsilon_{it}\right\} = 0 \quad\text{for each } t \tag{3}$$
and the instrument matrix can be written as
$$Z_i = \begin{pmatrix}\left[y_{i0},\,\Delta x_{i2}'\right] & 0 & \cdots & 0\\ 0 & \left[y_{i0},\,y_{i1},\,\Delta x_{i3}'\right] & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \left[y_{i0},\dots,y_{i,T-2},\,\Delta x_{iT}'\right]\end{pmatrix}$$
If the $x_{it}$ variables are not strictly exogenous but predetermined, in which case current and lagged $x_{it}$s are uncorrelated with current error terms, we only have that $E\left\{x_{it}\,\varepsilon_{is}\right\} = 0$ for $s \geq t$. In this case, only $x_{i,t-1},\dots,x_{i1}$ are valid instruments for the first-differenced equation in period t. Thus, the moment conditions that can be imposed are
$$E\left\{x_{i,t-j}\,\Delta\varepsilon_{it}\right\} = 0 \quad\text{for } j = 1,\dots,t-1.$$
In practice, a combination of strictly exogenous and predetermined x-variables may occur rather than one of these two extreme cases. The matrix $Z_i$ should then be adjusted accordingly.
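The distinction between the two cases can be summarized in a small helper. This is a hypothetical sketch (the function name and the (T, K) array layout are assumptions, not from the notes): strictly exogenous regressors contribute their full history as instruments in every differenced equation, while predetermined regressors contribute only their lags.

```python
import numpy as np

def x_instruments(x_i, t, predetermined=False):
    """x-instruments for the first-differenced equation of period t (2 <= t <= T).

    x_i : (T, K) array stacking x_{i1}, ..., x_{iT} (row j holds x_{i,j+1}).
    Strictly exogenous x's: the full history x_{i1}, ..., x_{iT} is valid.
    Predetermined x's: only x_{i1}, ..., x_{i,t-1} are valid.
    """
    if predetermined:
        return x_i[:t - 1]
    return x_i
```

A mixed specification would simply concatenate the full history of the strictly exogenous columns with the lagged history of the predetermined ones when building each row of $Z_i$.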