
METHOD OF MOMENTS

Herman J. Bierens
Pennsylvania State University
September 19, 2005

1. Linear method of moments

1.1. The model

Consider a system of k linear equations,

    y_{i,t} = x_{i,t}^T \theta_i + u_{i,t},  t = 1,...,n,  i = 1,...,k,  \theta_i \in \mathbb{R}^{p_i},    (1)

where the x_{i,t} vectors possibly contain some of the dependent variables y_{j,t}, and the errors u_{i,t} have zero expectation but are contemporaneously dependent. Consequently, the usual regression assumption E[u_{i,t} | x_{i,t}] = 0 may not apply. However, suppose we have q_i-vectors z_{i,t} of instrumental variables such that E[u_{i,t} z_{i,t}] = 0, hence

    E[z_{i,t} x_{i,t}^T] \theta_i = E[z_{i,t} y_{i,t}]    (2)

is a system of q_i linear equations in the p_i unknown elements of \theta_i. If q_i \geq p_i, and the rank of the matrix E[z_{i,t} x_{i,t}^T] is p_i, the parameter vector \theta_i is identified by the moment conditions (2). Note that this case is only one of the many cases for which the method of moments estimation approach is applicable. For example, least squares and two-stage least squares estimation are special cases of method of moments estimation techniques. Denoting

    p = \sum_{i=1}^k p_i,  q = \sum_{i=1}^k q_i,    (3)

we can write this model in vector form as

    y_t = X_t^T \theta_0 + u_t,  E[Z_t u_t] = 0,    (4)

where

    y_t = (y_{1,t}, ..., y_{k,t})^T,  \theta_0 = (\theta_1^T, ..., \theta_k^T)^T,  u_t = (u_{1,t}, ..., u_{k,t})^T,

    X_t = [ x_{1,t}    0     ...    0
              0     x_{2,t}  ...    0
             ...             ...   ...
              0        0     ...  x_{k,t} ]  (p x k),    (5)

and

    Z_t = [ z_{1,t}    0     ...    0
              0     z_{2,t}  ...    0
             ...             ...   ...
              0        0     ...  z_{k,t} ]  (q x k).    (6)

The moment conditions (2) now take the form

    E[Z_t X_t^T] \theta_0 = E[Z_t y_t].    (7)

The implicit assumption that the parameter vectors \theta_i in model (1) are different is not essential. If some or all of the components of the parameter vectors \theta_i are common, we may augment the x_{i,t} vectors with zeros corresponding to the parameters that are not part of the equation involved, and write model (1) as

    y_{i,t} = x_{i,t}^T \theta_0 + u_{i,t},  t = 1,...,n,  i = 1,...,k,  \theta_0 \in \mathbb{R}^p.    (8)

The only difference is then that the matrix X_t in (5) becomes

    X_t = (x_{1,t}, ..., x_{k,t})  (p x k).    (9)

The moment conditions (7) suggest estimating \theta_0 by minimizing the quadratic form

    Q_n(\theta) = M_n(\theta)^T W_n M_n(\theta),    (10)

where

    M_n(\theta) = (1/n) \sum_{t=1}^n Z_t (y_t - X_t^T \theta) = (1/n) \sum_{t=1}^n Z_t y_t - ( (1/n) \sum_{t=1}^n Z_t X_t^T ) \theta,    (11)

and W_n is a positive definite q x q matrix, to be determined later. Thus, the method of moments estimator involved is

    \hat\theta = argmin_\theta Q_n(\theta).    (12)

If q = p, the solution is the same as the solution of M_n(\theta) = 0, namely

    \hat\theta = ( (1/n) \sum_{t=1}^n Z_t X_t^T )^{-1} (1/n) \sum_{t=1}^n Z_t y_t,    (13)

provided that the inverted matrix is nonsingular. This is known as the just-identified case, but the overidentified case q > p is more interesting and challenging. The first-order condition for a minimum of Q_n(\theta) is

    \partial Q_n(\theta)/\partial \theta^T = 2 (\partial M_n(\theta)^T/\partial \theta^T) W_n M_n(\theta)
      = -2 ( (1/n) \sum_{t=1}^n X_t Z_t^T ) W_n ( (1/n) \sum_{t=1}^n Z_t y_t - ( (1/n) \sum_{t=1}^n Z_t X_t^T ) \theta ) = 0,    (14)

hence the solution is

    \hat\theta = ( ( (1/n) \sum_{t=1}^n X_t Z_t^T ) W_n ( (1/n) \sum_{t=1}^n Z_t X_t^T ) )^{-1} ( (1/n) \sum_{t=1}^n X_t Z_t^T ) W_n (1/n) \sum_{t=1}^n Z_t y_t
      = \theta_0 + ( ( (1/n) \sum_{t=1}^n X_t Z_t^T ) W_n ( (1/n) \sum_{t=1}^n Z_t X_t^T ) )^{-1} ( (1/n) \sum_{t=1}^n X_t Z_t^T ) W_n (1/n) \sum_{t=1}^n Z_t u_t.    (15)
Now assume that

Assumption 1: (1/\sqrt{n}) \sum_{t=1}^n Z_t u_t \to N_q(0, A) in distr., where A = plim_{n\to\infty} (1/n) \sum_{t=1}^n Z_t u_t u_t^T Z_t^T.

This condition is satisfied if, for example, the Z_t u_t are i.i.d. and the variance matrix A of Z_t u_t is finite. Moreover, assume that

Assumption 2: B = plim_{n\to\infty} (1/n) \sum_{t=1}^n X_t Z_t^T exists and is finite.

Assumption 3: W = plim_{n\to\infty} W_n is finite and positive definite.

Assumption 4: B W B^T is nonsingular.

Then under Assumptions 1-4,

    \sqrt{n} (\hat\theta - \theta_0) \to N_p[0, (BWB^T)^{-1} (BWAWB^T) (BWB^T)^{-1}] in distr.    (16)

Because the variance matrix involved depends on W, the question now arises:

1.2. What is the best choice for W_n?

In order to answer this question, consider the linear regression model

    y = A^{-1/2} B^T \beta + e,  e ~ N(0, I).    (17)

It follows from the Gauss-Markov theorem that the best linear unbiased estimator of \beta is the least squares estimator:

    \hat\beta = (B A^{-1} B^T)^{-1} B A^{-1/2} y = \beta + (B A^{-1} B^T)^{-1} B A^{-1/2} e ~ N[\beta, (B A^{-1} B^T)^{-1}].    (18)

Next, consider the alternative unbiased estimator

    \tilde\beta = [(BWB^T)^{-1} B W A^{1/2}] y = \beta + [(BWB^T)^{-1} B W A^{1/2}] e ~ N[\beta, (BWB^T)^{-1} (BWAWB^T) (BWB^T)^{-1}].    (19)

By the Gauss-Markov theorem,

    D = (BWB^T)^{-1} (BWAWB^T) (BWB^T)^{-1} - (B A^{-1} B^T)^{-1}    (20)

is a positive semi-definite matrix. The direct proof follows from the fact that we can write

    D = (BWB^T)^{-1} B W A^{1/2} [ I - A^{-1/2} B^T (B A^{-1} B^T)^{-1} B A^{-1/2} ] A^{1/2} W B^T (BWB^T)^{-1}    (21)

and that the matrix I - A^{-1/2} B^T (B A^{-1} B^T)^{-1} B A^{-1/2} is idempotent, hence positive semi-definite. Since D = O if W = A^{-1}, the best choice for W_n is therefore such that

    plim_{n\to\infty} W_n = A^{-1}.    (22)
The efficient method of moments estimation procedure is now as follows. First, choose an initial matrix W_n, for example W_n = I_q. Then compute the first-stage method of moments estimator \hat\theta, and denote

    \hat A = (1/n) \sum_{t=1}^n Z_t \hat u_t \hat u_t^T Z_t^T,  where \hat u_t = y_t - X_t^T \hat\theta.    (23)

Next, choose

    W_n = \hat A^{-1},    (24)

which under Assumptions 1-4 is a consistent estimator of A^{-1}. Using this matrix W_n in the second stage now yields the efficient method of moments estimator:

    \hat\theta_{EMM} = argmin_\theta M_n(\theta)^T \hat A^{-1} M_n(\theta),    (25)

with limiting normal distribution

    \sqrt{n} (\hat\theta_{EMM} - \theta_0) \to N_p[0, (B A^{-1} B^T)^{-1}].    (26)

Moreover, denoting

    \hat B = (1/n) \sum_{t=1}^n X_t Z_t^T,    (27)

it follows from Assumptions 1, 2, and 4 that the asymptotic variance matrix in (26) can be estimated consistently by (\hat B \hat A^{-1} \hat B^T)^{-1}.

1.3. Testing the adequacy of the instruments

It follows from (15) with (24) and (27) that

    \sqrt{n} (\hat\theta_{EMM} - \theta_0) = (\hat B \hat A^{-1} \hat B^T)^{-1} \hat B \hat A^{-1} (1/\sqrt{n}) \sum_{t=1}^n Z_t u_t,    (28)

hence it follows from (11) and Assumption 1 that

    \sqrt{n} \hat A^{-1/2} M_n(\hat\theta_{EMM})
      = \hat A^{-1/2} (1/\sqrt{n}) \sum_{t=1}^n Z_t u_t - \hat A^{-1/2} \hat B^T \sqrt{n} (\hat\theta_{EMM} - \theta_0)
      = [ I_q - \hat A^{-1/2} \hat B^T (\hat B \hat A^{-1} \hat B^T)^{-1} \hat B \hat A^{-1/2} ] \hat A^{-1/2} (1/\sqrt{n}) \sum_{t=1}^n Z_t u_t
      \to N_q[0, M] in distribution,    (29)

where

    M = I_q - A^{-1/2} B^T (B A^{-1} B^T)^{-1} B A^{-1/2}.    (30)

Since M is idempotent (Exercise: Why?), it follows now that under Assumptions 1-4,

    n M_n(\hat\theta_{EMM})^T \hat A^{-1} M_n(\hat\theta_{EMM}) \to \chi^2_{q-p} in distribution.    (31)

(Exercise: Why?) On the other hand, if

    plim_{n\to\infty} (1/n) \sum_{t=1}^n Z_t u_t = \mu \neq 0,    (32)

and q > p, then under Assumptions 2-4,

    plim_{n\to\infty} \hat A^{-1/2} M_n(\hat\theta_{EMM}) = [ A^{-1/2} - A^{-1/2} B^T (B A^{-1} B^T)^{-1} B A^{-1} ] \mu,    (33)

hence

    plim_{n\to\infty} n M_n(\hat\theta_{EMM})^T \hat A^{-1} M_n(\hat\theta_{EMM}) = \infty.    (34)

Therefore, we can use n M_n(\hat\theta_{EMM})^T \hat A^{-1} M_n(\hat\theta_{EMM}) as a test for the adequacy of the instruments.
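A numerical sketch, not from the original notes, of the two-step procedure (23)-(25) together with the overidentification statistic (31), for a hypothetical overidentified single-equation design with q = 2 instruments and p = 1 parameter:

```python
import numpy as np

# Hypothetical overidentified example: two valid instruments, one parameter.
rng = np.random.default_rng(1)
n, theta0 = 20_000, 1.5
v = rng.normal(size=n)
z = rng.normal(size=(n, 2))                  # two valid instruments
x = z @ np.array([1.0, 0.5]) + v + rng.normal(size=n)
u = v + rng.normal(size=n)
y = x * theta0 + u

X = x[:, None]                               # n x p, with p = 1
Szx = z.T @ X / n                            # (1/n) sum z_t x_t^T  (q x p)
Szy = z.T @ y / n                            # (1/n) sum z_t y_t    (q,)

def mm_estimate(W):
    # Closed-form solution (15): theta = (B W Szx)^{-1} B W Szy, B = Szx^T
    G = Szx.T @ W
    return np.linalg.solve(G @ Szx, G @ Szy)

theta1 = mm_estimate(np.eye(2))              # first step, W_n = I_q
uhat = y - X @ theta1
A = (z * uhat[:, None]).T @ (z * uhat[:, None]) / n   # A_hat as in (23)
theta2 = mm_estimate(np.linalg.inv(A))       # second step, W_n = A_hat^{-1}

Mn = Szy - Szx @ theta2                      # M_n of (11) at theta_EMM
J = n * Mn @ np.linalg.solve(A, Mn)          # statistic (31)
print(theta2, J)
```

With valid instruments the statistic J is asymptotically chi-square with q - p = 1 degree of freedom, so values far in the upper tail would signal inadequate instruments.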

1.4. Application to static panel data models

A static panel data model takes the form¹

    y_{i,t} = x_{i,t}^T \theta_0 + \alpha_i + \varepsilon_{i,t},  t = 1,...,T,  i = 1,...,N,  \theta_0 \in \mathbb{R}^p,    (35)

where \alpha_i is a fixed or random effect which is constant over time t but varies with the cross-section index i, the x_{i,t} are p x 1 vectors of exogenous variables, none of which are constant over time, and the \varepsilon_{i,t} are i.i.d. (0, \sigma^2) errors that are independent of the exogenous variables. It will be assumed that the cross-section dimension N is much larger than the time dimension T, so that T may be considered fixed, whereas all the asymptotic properties follow from letting N \to \infty. In order to get rid of \alpha_i, we take first differences:

    y_{i,t} - y_{i,t-1} = (x_{i,t} - x_{i,t-1})^T \theta_0 + \varepsilon_{i,t} - \varepsilon_{i,t-1},  t = 2,...,T,  i = 1,...,N.    (36)

Since \varepsilon_{i,t} - \varepsilon_{i,t-1} is independent of x_{i,t} and x_{i,t-1}, we can choose either x_{i,t} - x_{i,t-1} or the pair x_{i,t-1}, x_{i,t} as instruments. Choosing the latter, we can now write the model in vector form as

    y_i^* = X_i^{*T} \theta_0 + u_i^*,  E[Z_i^* u_i^*] = 0,  i = 1,...,N,    (37)

where

    y_i^* = (y_{i,2} - y_{i,1}, ..., y_{i,T} - y_{i,T-1})^T,
    X_i^* = (x_{i,2} - x_{i,1}, ..., x_{i,T} - x_{i,T-1}),
    u_i^* = (\varepsilon_{i,2} - \varepsilon_{i,1}, ..., \varepsilon_{i,T} - \varepsilon_{i,T-1})^T,    (38)

and

    Z_i^* = [ (x_{i,1}^T, x_{i,2}^T)^T          0           ...          0
                    0            (x_{i,2}^T, x_{i,3}^T)^T   ...          0
                   ...                                      ...         ...
                    0                           0           ...  (x_{i,T-1}^T, x_{i,T}^T)^T ],    (39)

a block-diagonal q x k matrix with k = T - 1 columns and 2p stacked instruments per equation. Note that

    Var(u_i^*) = \sigma^2 [  2  -1   0  ...   0   0
                            -1   2  -1  ...   0   0
                             0  -1   2  ...   0   0
                            ...          ...
                             0   0   0  ...   2  -1
                             0   0   0  ...  -1   2 ] = \sigma^2 \Sigma, say.    (40)

Hence

    A = plim_{N\to\infty} (1/N) \sum_{i=1}^N Z_i^* u_i^* u_i^{*T} Z_i^{*T} = \sigma^2 plim_{N\to\infty} (1/N) \sum_{i=1}^N Z_i^* \Sigma Z_i^{*T}.    (41)

Therefore, if we choose

    W_N = ( (1/N) \sum_{i=1}^N Z_i^* \Sigma Z_i^{*T} )^{-1}    (42)

as the weight matrix, we obtain the efficient method of moments estimator \hat\theta_{EMM} in one step. The only difference with the general case is that N M_N(\hat\theta_{EMM})^T W_N M_N(\hat\theta_{EMM}) needs to be divided by a consistent estimate \hat\sigma^2 of the variance \sigma^2 of the errors \varepsilon_{i,t} in order to be used as a chi-square test of model correctness. Of course, we also need \hat\sigma^2 to estimate the asymptotic variance matrix of \sqrt{N}(\hat\theta_{EMM} - \theta_0). For example, given the residual vectors \hat u_i^* = y_i^* - X_i^{*T} \hat\theta_{EMM}, the variance \sigma^2 can be estimated consistently by

    \hat\sigma^2 = (1 / (2N(T-1))) \sum_{i=1}^N \hat u_i^{*T} \hat u_i^*.    (43)

(Exercise: Why?)

¹ Note that T now denotes the length of the time series, whereas the superscript T still denotes the transpose.
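The first-difference idea in (36) and the variance estimator (43) can be sketched as follows. This hypothetical simulation, not part of the original text, uses \Delta x_{i,t} as its own instrument (valid because the x's are exogenous), so the MM estimator reduces to least squares on the differenced data rather than the stacked-instrument estimator with weight (42):

```python
import numpy as np

# Hypothetical static panel: y_it = x_it*theta0 + alpha_i + eps_it.
# First differencing removes alpha_i as in (36).  Note that
# E[u_i*' u_i*] = 2(T-1) sigma^2, which motivates the divisor in (43).
rng = np.random.default_rng(2)
N, T, theta0, sigma = 5_000, 5, 0.8, 1.0
alpha = rng.normal(size=(N, 1)) * 2.0           # fixed effects
x = rng.normal(size=(N, T)) + alpha             # x correlated with alpha
eps = rng.normal(scale=sigma, size=(N, T))
y = x * theta0 + alpha + eps

dy, dx = np.diff(y, axis=1), np.diff(x, axis=1)   # N x (T-1) differences
theta = (dx * dy).sum() / (dx * dx).sum()         # pooled LS on differences
du = dy - dx * theta
sigma2 = (du * du).sum() / (2 * N * (T - 1))      # estimator (43)
print(theta, sigma2)
```

Pooled least squares on the differences is consistent here (though not efficient, since it ignores the tridiagonal variance structure (40)), and sigma2 recovers the error variance because each squared differenced residual has expectation 2 sigma^2.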

1.5. Application to dynamic panel data models

A dynamic panel data model takes the form

    y_{i,t} = \beta_0 y_{i,t-1} + x_{i,t}^T \theta_0 + \alpha_i + \varepsilon_{i,t},  t = 2,...,T,  i = 1,...,N,  \theta_0 \in \mathbb{R}^p,  |\beta_0| < 1,    (44)

where again \alpha_i is a fixed or random effect which is constant over time t but varies with the cross-section index i, the x_{i,t} are p x 1 vectors of exogenous variables which are not constant over time, and the \varepsilon_{i,t} are i.i.d. (0, \sigma^2) errors that are independent of the exogenous variables. Also now it will be assumed that the cross-section dimension N is much larger than the time dimension T, so that T may be considered fixed, whereas all the asymptotic properties follow from letting N \to \infty. Taking first differences yields, for t = 3,...,T, i = 1,...,N,

    y_{i,t} - y_{i,t-1} = \beta_0 (y_{i,t-1} - y_{i,t-2}) + (x_{i,t} - x_{i,t-1})^T \theta_0 + \varepsilon_{i,t} - \varepsilon_{i,t-1}.    (45)

Due to the dynamic structure of the model, we now have a much richer choice of instruments, because \beta_0 (y_{i,t-1} - y_{i,t-2}) + (x_{i,t} - x_{i,t-1})^T \theta_0 depends on the lagged levels y_{i,t-2-j}, j = 0, 1, 2, ..., as well as on the current and lagged x_{i,t-j}. Denoting

    X_i^* = ( (y_{i,2} - y_{i,1}, (x_{i,3} - x_{i,2})^T)^T, ..., (y_{i,T-1} - y_{i,T-2}, (x_{i,T} - x_{i,T-1})^T)^T ),    (46)

    \theta_0^* = (\beta_0, \theta_0^T)^T,
    y_i^* = (y_{i,3} - y_{i,2}, ..., y_{i,T} - y_{i,T-1})^T,
    u_i^* = (\varepsilon_{i,3} - \varepsilon_{i,2}, ..., \varepsilon_{i,T} - \varepsilon_{i,T-1})^T,    (47)

and letting Z_i^* be the q x k matrix, with k = T - 2 and q = pT + T - 2, whose column corresponding to equation t contains the stacked instruments x_{i,1}, ..., x_{i,T} and y_{i,1}, ..., y_{i,t-2}, padded with zeros,    (48)

we can write the model again as (37). Therefore, the same results as in the previous section hold, except that we have to modify (43) to

    \hat\sigma^2 = (1 / (2N(T-2))) \sum_{i=1}^N \hat u_i^{*T} \hat u_i^*.    (49)

(Exercise: Why?)
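A minimal sketch of the dynamic case, added for illustration, that uses only the lagged level y_{i,t-2} as instrument for \Delta y_{i,t-1} (the Anderson-Hsiao special case of the richer instrument set described in (48)). The design, without exogenous regressors, is hypothetical:

```python
import numpy as np

# Hypothetical dynamic panel y_it = beta0*y_{i,t-1} + alpha_i + eps_it (no x's
# for brevity).  After differencing as in (45), dy_{i,t-1} is correlated with
# the differenced error, but the lagged level y_{i,t-2} is a valid instrument.
rng = np.random.default_rng(3)
N, T, beta0 = 20_000, 6, 0.5
alpha = rng.normal(size=N)
y = np.zeros((N, T))
y[:, 0] = alpha / (1 - beta0) + rng.normal(size=N)   # near-stationary start
for t in range(1, T):
    y[:, t] = beta0 * y[:, t - 1] + alpha + rng.normal(size=N)

dy = np.diff(y, axis=1)                 # dy columns: t = 2,...,T
lhs = dy[:, 1:].ravel()                 # dy_{i,t}   for t = 3,...,T
rhs = dy[:, :-1].ravel()                # dy_{i,t-1} for t = 3,...,T
ziv = y[:, :-2].ravel()                 # y_{i,t-2}: the instrument

beta_ols = (rhs @ lhs) / (rhs @ rhs)    # inconsistent: severe downward bias
beta_iv = (ziv @ lhs) / (ziv @ rhs)     # just-identified MM estimator
print(beta_ols, beta_iv)
```

Least squares on the differenced equation is heavily biased (in this design its probability limit is about -0.25 rather than 0.5), while the moment condition E[y_{i,t-2} (ε_{i,t} - ε_{i,t-1})] = 0 restores consistency.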

2. Nonlinear method of moments

2.1. The model

Consider now the case where a model for a random vector X_t \in \mathbb{R}^k is implicitly defined by a set of moment conditions:

    m_t(\theta) = (\psi_1(X_t, \theta), ..., \psi_q(X_t, \theta))^T,  \theta \in \Theta \subset \mathbb{R}^p,  \exists \theta_0 \in \Theta: E[m_t(\theta_0)] = 0,    (50)

where q \geq p, \Theta is the parameter space, and \theta_0 is the parameter vector of interest. The random vectors X_t are observable for t = 1,...,n. For convenience of the exposition we will assume that

Assumption A: the X_t's are i.i.d.,

but under some mild additional conditions the results below also hold if the X_t's are realizations of a stationary vector time series process, or are panel data observations. The following assumptions allow us to apply the central limit theorem and the uniform law of large numbers:

Assumption B: The functions \psi_i(x, \theta) are twice continuously differentiable in \theta and, for each \theta \in \Theta, Borel measurable in x \in \mathbb{R}^k. The parameter space \Theta is compact and convex, and \theta_0 is an interior point of \Theta.

Assumption C: For i = 1,...,q,

    E[ sup_{\theta \in \Theta} \psi_i(X_t, \theta)^2 ] < \infty,
    E[ sup_{\theta \in \Theta} ||\partial \psi_i(X_t, \theta)/\partial \theta^T|| ] < \infty,
    E[ sup_{\theta \in \Theta} ||\partial^2 \psi_i(X_t, \theta)/\partial \theta \partial \theta^T|| ] < \infty.

In the latter case one should interpret the matrix norm ||.|| as the maximum of the absolute values of the elements of the matrix involved. Moreover, in order for the parameter vector \theta_0 to be identified, we need to ensure that

Assumption D: E[m_t(\theta)] = 0 if and only if \theta = \theta_0.

2.2. Strong consistency

Denote

    M_n(\theta) = (1/n) \sum_{t=1}^n m_t(\theta),  M(\theta) = E[m_t(\theta)].    (51)

Under Assumptions A-C we have

    sup_{\theta \in \Theta} ||M_n(\theta) - M(\theta)|| \to 0 a.s.,    (52)

hence, denoting

    Q_n(\theta) = M_n(\theta)^T W_n M_n(\theta),  Q(\theta) = M(\theta)^T W M(\theta),    (53)

where

    W_n \to W a.s., with W a positive definite symmetric matrix,    (54)

it follows that

    sup_{\theta \in \Theta} |Q_n(\theta) - Q(\theta)| \to 0 a.s.    (55)

This result, together with Assumption D, implies that

    \hat\theta = argmin_{\theta \in \Theta} Q_n(\theta) \to \theta_0 a.s.    (56)

(Exercise: Why?)

2.3. Asymptotic normality

Assumptions A-C are also sufficient conditions for the application of the central limit theorem:

    (1/\sqrt{n}) \sum_{t=1}^n m_t(\theta_0) \to N_q[0, A] in distr., where A = E[m_t(\theta_0) m_t(\theta_0)^T].    (57)

The limiting normal distribution of \sqrt{n} (\hat\theta - \theta_0) can be derived as follows. The first-order conditions for a minimum of Q_n(\theta) are:

    \partial Q_n(\theta)/\partial \theta_i = 2 (\partial M_n(\theta)^T/\partial \theta_i) W_n M_n(\theta)
      = 2 ( (1/n) \sum_{t=1}^n \partial m_t(\theta)^T/\partial \theta_i ) W_n M_n(\theta) = 0    (58)

for i = 1,...,p. If \hat\theta is on the boundary of the parameter space \Theta, these first-order conditions may not hold for \hat\theta. However, since by Assumption B \theta_0 is an interior point of \Theta, and \hat\theta \to \theta_0 a.s., we have that

    P[ \sqrt{n} (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta_i) W_n M_n(\theta) |_{\theta = \hat\theta} = 0 ] \to 1.    (59)

Next observe that, by the mean value theorem and the convexity of the parameter space \Theta, there exists a mean value \tilde\theta_i \in \Theta, with ||\tilde\theta_i - \theta_0|| \leq ||\hat\theta - \theta_0||, such that for i = 1,...,p,

    \sqrt{n} (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta_i) W_n M_n(\theta) |_{\theta = \hat\theta}
      = \sqrt{n} (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta_i) W_n M_n(\theta) |_{\theta = \theta_0}
      + (\partial/\partial \theta^T) [ (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta_i) W_n M_n(\theta) ] |_{\theta = \tilde\theta_i} \sqrt{n} (\hat\theta - \theta_0).    (60)

It follows from (59) that the left-hand side of (60) converges in probability to zero. Moreover, since

    (\partial/\partial \theta_j) (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta_i) W_n M_n(\theta)
      = ( (1/n) \sum_{t=1}^n \partial^2 m_t(\theta)^T/\partial \theta_i \partial \theta_j ) W_n M_n(\theta)
      + ( (1/n) \sum_{t=1}^n \partial m_t(\theta)^T/\partial \theta_i ) W_n ( (1/n) \sum_{t=1}^n \partial m_t(\theta)/\partial \theta_j )
      \to E[\partial^2 m_t(\theta)^T/\partial \theta_i \partial \theta_j] W M(\theta) + E[\partial m_t(\theta)^T/\partial \theta_i] W E[\partial m_t(\theta)/\partial \theta_j]
      a.s., uniformly on \Theta,    (61)

and \tilde\theta_i \to \theta_0 a.s., it follows that

    \hat C = [ (\partial/\partial \theta^T) { (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta_1) W_n M_n(\theta) } |_{\theta = \tilde\theta_1}
               ...
               (\partial/\partial \theta^T) { (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta_p) W_n M_n(\theta) } |_{\theta = \tilde\theta_p} ]
         \to B W B^T a.s.,    (62)

(Exercise: Why?) where

    B = E[ \partial m_t(\theta)^T/\partial \theta ] |_{\theta = \theta_0}.    (63)

Furthermore, it follows from the strong law of large numbers that

    (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta) |_{\theta = \theta_0} \to B a.s.,    (64)

hence it follows from (54) and (57) that

    \sqrt{n} (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta) W_n M_n(\theta) |_{\theta = \theta_0} \to N_p(0, B W A W B^T) in distr.    (65)

(Exercise: Why?) Combining the results (59), (60), (62), and (65) now yields (Exercise: Why?)

    \sqrt{n} (\hat\theta - \theta_0) \to N_p[0, (BWB^T)^{-1} (BWAWB^T) (BWB^T)^{-1}] in distr.,    (66)

provided that B W B^T is nonsingular. Again, the variance matrix involved is smallest for W = A^{-1}, provided of course that

Assumption E: The matrix B A^{-1} B^T is nonsingular.

Thus, W_n = \hat A^{-1} is an optimal choice. Finally, observe that under Assumptions A-D,

    \hat A = (1/n) \sum_{t=1}^n m_t(\hat\theta) m_t(\hat\theta)^T \to A a.s.,
    \hat B = (1/n) \sum_{t=1}^n (\partial m_t(\theta)^T/\partial \theta) |_{\theta = \hat\theta} \to B a.s.    (67)

(Exercise: Why?) Thus, under Assumptions A-E the efficient method of moments estimator

    \hat\theta_{EMM} = argmin_\theta M_n(\theta)^T \hat A^{-1} M_n(\theta)    (68)

is strongly consistent, \hat\theta_{EMM} \to \theta_0 a.s., and has limiting normal distribution

    \sqrt{n} (\hat\theta_{EMM} - \theta_0) \to N_p[0, (B A^{-1} B^T)^{-1}].    (69)

Moreover,

    (\hat B \hat A^{-1} \hat B^T)^{-1} \to (B A^{-1} B^T)^{-1} a.s.    (70)

Finally, observe that, similarly to the linear case, under Assumptions A-E and the null hypothesis

    H_0: E[m_t(\theta_0)] = 0 for some \theta_0 \in \Theta,

we have

    n M_n(\hat\theta_{EMM})^T \hat A^{-1} M_n(\hat\theta_{EMM}) \to \chi^2_{q-p} in distr.,    (71)

whereas under the alternative hypothesis that the null hypothesis is incorrect, and the maintained Assumptions A, B, C, and E,

    n M_n(\hat\theta_{EMM})^T \hat A^{-1} M_n(\hat\theta_{EMM}) \to \infty a.s.    (72)

Thus, the left-hand side of (71) is the test statistic of the Wald test that the moment conditions involved are correct.
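As a closing illustration not contained in the original notes, here is a hedged sketch of the nonlinear two-step estimator (68) and the statistic (71). The hypothetical moment conditions identify the mean and variance of an i.i.d. normal sample, with the zero third central moment as one overidentifying restriction (q = 3, p = 2):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical nonlinear MM example: estimate (mu, s2) of an i.i.d. normal
# sample from q = 3 moment conditions, following the two-step recipe (68)
# with the weight matrix A_hat^{-1} from (67).
rng = np.random.default_rng(4)
X = rng.normal(loc=1.0, scale=2.0, size=50_000)
n = X.size

def m(theta):                          # n x q matrix of m_t(theta)
    mu, s2 = theta
    d = X - mu
    # E[d] = 0, E[d^2 - s2] = 0, E[d^3] = 0 (symmetry: overidentifying)
    return np.column_stack([d, d**2 - s2, d**3])

def Qn(theta, W):                      # objective (53): M_n' W M_n
    Mn = m(theta).mean(axis=0)
    return Mn @ W @ Mn

step1 = minimize(Qn, x0=[0.0, 1.0], args=(np.eye(3),), method="Nelder-Mead")
mt = m(step1.x)
A = mt.T @ mt / n                      # A_hat of (67) at first-step estimate
step2 = minimize(Qn, x0=step1.x, args=(np.linalg.inv(A),), method="Nelder-Mead")

mu_hat, s2_hat = step2.x
J = n * Qn(step2.x, np.linalg.inv(A))  # statistic (71), asympt. chi-square(1)
print(mu_hat, s2_hat, J)
```

Since the normal distribution indeed has zero third central moment, the overidentifying restriction holds here and J should fall in the bulk of a chi-square distribution with q - p = 1 degree of freedom.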
