Advanced Econometrics
Winter 2013/2014
y = Xβ + u
u ∼ N(0, σ²I)
Gauss-Markov theorem
yt = xt (β) + ut
ut ∼ IID(0, σ 2 )
with respect to β
Usually, the minimization must be done numerically
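A minimal sketch of such a numerical NLS minimization, using a hypothetical regression function x_t(β) (the model and all numbers are illustrative assumptions, not from the text):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
t = np.arange(1, 51, dtype=float)
beta_true = np.array([2.0, 0.3])

def x_t(beta):
    # hypothetical nonlinear regression function x_t(beta)
    return beta[0] * (1.0 - np.exp(-beta[1] * t))

y = x_t(beta_true) + 0.05 * rng.standard_normal(t.size)

def ssr(beta):
    # NLS objective: sum of squared residuals
    return np.sum((y - x_t(beta)) ** 2)

# minimize numerically with respect to beta
res = minimize(ssr, x0=np.array([1.0, 1.0]), method="Nelder-Mead")
beta_hat = res.x
```

The minimizer recovers β close to the true (2.0, 0.3) used in the simulation.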
µ1 = g1(θ1, . . . , θr)
⋮
µr = gr(θ1, . . . , θr)

θ1 = h1(µ1, . . . , µr)
⋮
θr = hr(µ1, . . . , µr)
Is λ̂ unbiased?
Alternative: Var(X) = 1/λ², then λ̂ = 1/√(S²)
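A quick Monte Carlo check of the unbiasedness question for λ̂ = 1/X̄ (for the exponential distribution, E[1/X̄] = nλ/(n−1), so the estimator is biased upward in finite samples; the sample size and replication count below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 10, 200_000

# numpy parameterizes the exponential by scale = 1/lambda
x = rng.exponential(scale=1.0 / lam, size=(reps, n))
lam_hat = 1.0 / x.mean(axis=1)

mc_mean = lam_hat.mean()
theory = n * lam / (n - 1)   # E[1/X-bar] = n*lambda/(n-1) > lambda
```

The simulated mean of λ̂ matches the theoretical value nλ/(n−1), which exceeds λ: the estimator is biased.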
∂ ln L/∂θ1 = 0
⋮
∂ ln L/∂θr = 0
Log-likelihood
ln L(λ; x1, . . . , xn) = n ln λ − λ ∑_{i=1}^n xi
hence
λ̂ = n / ∑_{i=1}^n xi = 1/x̄
The ML estimator for λ is
λ̂ = 1/X̄
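A numerical check that maximizing the log-likelihood ln L(λ) = n ln λ − λ ∑ xi reproduces the closed-form estimator 1/X̄ (sample size and true λ are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0 / 3.0, size=500)   # sample from Exp(lambda = 3)

def neg_loglik(lam):
    # minus the exponential log-likelihood n*ln(lambda) - lambda*sum(x_i)
    return -(x.size * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method="bounded")
lam_numeric = res.x
lam_closed = 1.0 / x.mean()   # closed-form ML estimator
```

The numerical maximizer and the closed-form estimator coincide up to optimizer tolerance.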
For all θ
∫ e^{ln L(θ)} dx = ∫ L(θ; x1, . . . , xn) dx = 1
Gij(θ, Xi) = ∂ ln fX(Xi; θ) / ∂θj
Equivariance:
Let θ̂ be the ML estimator of θ
Let ψ = h(θ) be a one-to-one function of θ with inverse h⁻¹(ψ) = θ
Then the ML estimator of ψ satisfies ψ̂ = h(θ̂)
Consistency
The parameter θ is identified if for all θ0 ≠ θ and data x1, . . . , xn
ln L(θ0 | x1, . . . , xn) ≠ ln L(θ | x1, . . . , xn)
Asymptotic normality
By definition, the ML estimator satisfies
g (θ̂) = 0
See numnormal.R
Point estimates
(µ̂, σ̂²) = (3.64025, 6.90869)
Estimated covariance matrix derived from theory
Ĉov(µ̂, σ̂²) = diag(0.13817, 1.90920)
[Figure: plot of the likelihood function against θ]
Maximum is at θ̂ = maxi xi
The estimator is consistent but not asymptotically normal
Illustration in R
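The slide points to an R illustration; a Python sketch of the same idea for X ∼ U(0, θ), where θ̂ = maxᵢ xᵢ has E[θ̂] = θ·n/(n+1) (consistent, bias vanishing at rate 1/n, but the limit distribution is not normal):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 5.0

def mc_mean_max(n, reps):
    # Monte Carlo mean of theta_hat = max_i x_i for X ~ U(0, theta)
    x = rng.uniform(0.0, theta, size=(reps, n))
    return x.max(axis=1).mean()

m_small = mc_mean_max(10, 100_000)    # biased downward: approx theta*10/11
m_large = mc_mean_max(1000, 5_000)    # almost unbiased: approx theta*1000/1001
```

The downward bias θ/(n+1) shrinks quickly as n grows, illustrating consistency.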
The joint density of (X1, . . . , XT) can be factorized as
fX1(x1) · ∏_{t=2}^T fXt|X1=x1,...,Xt−1=xt−1(xt)
Loglikelihood
ln L = ln fX1(x1) + ∑_{t=2}^T ln fXt|X1=x1,...,Xt−1=xt−1(xt)
For a Markov process,
fXt|X1=x1,...,Xt−1=xt−1(xt) = fXt|Xt−1=xt−1(xt)
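A sketch of this factorization for a Gaussian AR(1), a standard Markov example (the AR(1) model and parameter values are illustrative assumptions): dropping the single term ln f(x1) and maximizing the sum of conditional terms ∑ ln f(xt | xt−1) over ρ reduces to OLS of xt on xt−1.

```python
import numpy as np

rng = np.random.default_rng(4)
T, rho = 2000, 0.6

# simulate x_t = rho * x_{t-1} + eps_t (a Markov process)
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + rng.standard_normal()

# conditional MLE of rho (first observation's term ln f(x_1) dropped):
# maximizing sum_t ln f(x_t | x_{t-1}) is OLS of x_t on x_{t-1}
rho_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
```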
Example:
Let X1 , . . . , Xn be a random sample from X ∼ Exp(λ)
Test H0 : λ = 4 against H1 : λ ≠ 4
Different notation:
H0 : r (λ) = 0
where r (λ) = λ − 4
See threetests.R
Wald test
Hypotheses
H0 : r (θ) = 0
H1 : r(θ) ≠ 0
with
Cov(r(θ̂ML)) = (∂r(θ̂ML)/∂θ′) · Cov(θ̂ML) · (∂r(θ̂ML)/∂θ′)′
Remember: If X ∼ N(µ, Σ), then (X − µ)′Σ⁻¹(X − µ) ∼ χ²m
Wald test statistic
W = r(θ̂ML)′ [Cov(r(θ̂ML))]⁻¹ r(θ̂ML), asymptotically ∼ χ²m
Remarks:
Reject H0 if W is larger than the (1 − α)-quantile of the χ²m-distribution
Usually, Cov(r(θ̂ML)) must be replaced by the estimate Ĉov(r(θ̂ML))
The Wald test is not invariant with respect to re-parametrizations
The Wald test only requires the unrestricted ML estimator
Ideal, if θ̂ML is much easier to calculate than θ̂R
Asymptotic distribution: LR ∼ χ²m
Remarks:
Reject H0 if LR is larger than the (1 − α)-quantile of the
χ2m -distribution
To compute LR, one requires both the unrestricted estimator θ̂ML and
the restricted estimator θ̂R
Ideal, if both θ̂ML and θ̂R are easy to calculate
The LR test is often used to compare different models to each other
with
I(θ̂R) = −E[∂² ln L(θ̂R) / ∂θ∂θ′]
Remarks:
Reject H0 if LM is larger than the (1 − α)-quantile of the
χ2m -distribution
The LM test only requires the restricted estimator
Ideal, if θ̂R is much easier to calculate than θ̂ML
The LM test is often used to test misspecifications
(heteroskedasticity, autocorrelation, omitted variables etc.)
Asymptotically, the three tests are equivalent
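A sketch of all three tests for the running exponential example (H0: λ = 4), using ln L(λ) = n ln λ − λ ∑ xi and I(λ) = n/λ². The data are simulated with true λ = 6 (an illustrative choice), so all three tests should reject:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
lam0, n = 4.0, 400
x = rng.exponential(scale=1.0 / 6.0, size=n)   # true lambda = 6, H0 false

lam_hat = 1.0 / x.mean()                       # unrestricted ML estimator

def loglik(lam):
    return n * np.log(lam) - lam * x.sum()

# Wald: needs only the unrestricted estimator
wald = (lam_hat - lam0) ** 2 * n / lam_hat**2
# LR: needs both the restricted and the unrestricted estimator
lr = 2.0 * (loglik(lam_hat) - loglik(lam0))
# LM: needs only the restricted estimator (here simply lambda0)
score = n / lam0 - x.sum()                     # d ln L / d lambda at lambda0
lm = score**2 * lam0**2 / n                    # score^2 / I(lambda0)

crit = chi2.ppf(0.95, df=1)                    # all three are asy. chi^2(1)
```

With this sample all three statistics far exceed the 5% critical value, consistent with their asymptotic equivalence.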
Multivariate case
Example: Production function
Yi = Xi1^{a1} · Xi2^{a2} + ui
E (ut |Ωt ) = 0
Errors in variables
Consider the model
yt = α + β1 x1t + ut
where ut = β2 x2t + εt
If x2t and x1t are correlated then so are ut and x1t
Endogeneity
Standard example: supply and demand curves determine both price
and quantity
qt = γd pt + Xt^d βd + ut^d
qt = γs pt + Xt^s βs + ut^s
Since qt and pt depend on both ut^d and ut^s, single-equation OLS estimation of either equation is inconsistent
The right hand side variable pt is correlated with the error term
The condition E (ut |Ωt ) = 0 is violated if pt is in Ωt
[Figure: scatter plot of y against the regressor with the true regression line]
E(W′u) = E(W′(y − Xβ)) = 0
Instruments must be
1 exogenous, i.e. plim (1/n) W′u = 0
2 valid, i.e. plim (1/n) W′X = SWX non-singular
Natural experiments (weather, earthquakes, . . . )
Angrist and Pischke (2009):
Good instruments come from a combination of institutional knowledge and
ideas about the processes determining the variable of interest.
Examples
Natural experiments
1 Brückner and Ciccone: Rain and the democratic window of opportunity, Econometrica 79 (2011) 923-947
2 Angrist and Evans: Children and their parents' labor supply: Evidence from exogenous variation in family size, American Economic Review 88 (1998) 450-477.
Examples
Institutional arrangements
1 Angrist and Krueger: Does Compulsory School Attendance Affect
Schooling and Earnings?, Quarterly Journal of Economics 106 (1991)
979-1014.
2 Levitt: The Effect of Prison Population Size on Crime Rates: Evidence
from Prison Overcrowding Litigation, Quarterly Journal of Economics
111 (1996) 319-351.
yt = α + β xt∗ + ut
xt∗ = ρ x∗_{t−1} + εt
xt = xt∗ + vt.
yt = α + β1 x1t + β2 x2t + ut
x1t = ρ11 x1,t−1 + ρ12 x2,t−1 + ε1t
x2t = ρ21 x1,t−1 + ρ22 x2,t−1 + ε2t
yt = α + β1 xt + β2 yt−1 + ut
xt = γ + δ1 yt + δ2 xt−1 + vt
with PW = W (W 0 W )−1 W 0
Consistency and asymptotic normality still hold
Hence, WJ is similar to X β̂
The optimal instruments are obtained if we regress the
endogenous regressors on the instruments (1st stage), and
then use the fitted values as regressors (2nd stage)
with
PW = W(W′W)⁻¹W′
σ̂² = (1/n) (y − X β̂IV)′ (y − X β̂IV)
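A minimal 2SLS sketch implementing β̂IV = (X′PW X)⁻¹X′PW y. The data-generating process (one endogenous regressor x, one instrument w, true β = 1.5) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000

w = rng.standard_normal(n)
e = rng.standard_normal(n)
u = 0.8 * e + 0.3 * rng.standard_normal(n)   # u correlated with e ...
x = 1.0 + 0.7 * w + e                        # ... hence x correlated with u
y = 2.0 + 1.5 * x + u                        # structural equation, beta = 1.5

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w])

# 1st stage: fitted values X_hat = P_W X; 2nd stage: regress y on X_hat
X_hat = W @ np.linalg.solve(W.T @ W, W.T @ X)
beta_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)   # (X'P_W X)^{-1} X'P_W y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)          # inconsistent here
```

The IV estimate of the slope is close to 1.5, while OLS is biased upward by cov(x, u)/Var(x).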
Asymptotic t-test
H0 : βi = βi0
H1 : βi ≠ βi0
t = (β̂i − βi0) / √(V̂ar(β̂i))
is asymptotically N(0, 1)
H0 : β2 = β20 , H1 : β2 ≠ β20
E(ut | Wt) = 0 or E(W′u) = 0
nR² ∼ χ²m
Durbin-Wu-Hausman test
H0 : E(X′u) = 0
H1 : E(X′u) ≠ 0 (while the instruments remain valid, E(W′u) = 0)
β̂IV − β̂OLS = (X′PW X)⁻¹ X′PW y − (X′X)⁻¹ X′y
= (X′PW X)⁻¹ [X′PW y − X′PW X (X′X)⁻¹ X′y]
= (X′PW X)⁻¹ X′PW [I − X(X′X)⁻¹X′] y
= (X′PW X)⁻¹ X′PW MX y
y = X β + PW X̃ δ + u
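A sketch of this auxiliary regression: augment the equation with the first-stage fitted values PW x of the suspect regressor and test whether their coefficient δ is zero. The simulated setup (endogenous x, instrument w) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

w = rng.standard_normal(n)
e = rng.standard_normal(n)
u = 0.8 * e + 0.3 * rng.standard_normal(n)
x = 1.0 + 0.7 * w + e                  # genuinely endogenous regressor
y = 2.0 + 1.5 * x + u

W = np.column_stack([np.ones(n), w])
x_fit = W @ np.linalg.solve(W.T @ W, W.T @ x)   # P_W x (first-stage fit)

# auxiliary regression y = alpha + beta*x + delta*(P_W x) + error; H0: delta = 0
Z = np.column_stack([np.ones(n), x, x_fit])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ coef
s2 = resid @ resid / (n - Z.shape[1])
cov = s2 * np.linalg.inv(Z.T @ Z)
t_delta = coef[2] / np.sqrt(cov[2, 2])          # t statistic for delta = 0
```

With an endogenous x the t statistic on δ is far from zero, so H0 (OLS consistent) is rejected.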
Eθ (ft (θ, yt )) = 0
y = Xβ + u
u ∼ N(0, σ 2 I ), independent of X
Parameter vector θ =?
Observations yt =?
Elementary zero functions ft (θ, yt ) =?
X ∼ LN(µ, σ 2 )
Parameter vector θ =?
Observations yt =?
Elementary zero functions ft (θ, yt ) =?
Covariance matrix
E(f(θ, y) f(θ, y)′) = Ω
For the linear regression model, E(f(θ, y) f(θ, y)′) = E(uu′) = σ²I
In general, E(uu′) = Ω = ?
E(Zt′ ft(θ, yt)) = 0
Example (contd)
. . . and the estimating equations are

(1/n) Z′f(θ, y) = (1/n) [ 1 0 1 0 . . . 1 0 ; 0 1 0 1 . . . 0 1 ] (f11, f12, . . . , fn1, fn2)′

= [ (1/n) ∑_{t=1}^n (Xt − exp(µ + σ²/2)) ; (1/n) ∑_{t=1}^n (Xt² − exp(2µ + 2σ²)) ] = [0 ; 0]
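Since the two moment conditions exactly identify (µ, σ²), the estimating equations can be solved in closed form from the sample moments: ln m1 = µ + σ²/2 and ln m2 = 2µ + 2σ² give σ̂² = ln m2 − 2 ln m1 and µ̂ = 2 ln m1 − (1/2) ln m2. A sketch with simulated lognormal data (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sigma2 = 0.5, 0.25
x = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=200_000)

# sample analogues of E[X] = exp(mu + sigma2/2), E[X^2] = exp(2mu + 2sigma2)
m1, m2 = x.mean(), (x**2).mean()

# solve the two moment equations for (mu, sigma2)
sigma2_hat = np.log(m2) - 2.0 * np.log(m1)
mu_hat = 2.0 * np.log(m1) - 0.5 * np.log(m2)
```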
Consistency
Assume that a law of large numbers applies to (1/n) Z′f(θ, y)
Define the limiting estimation functions
α(θ) = plim (1/n) Z′f(θ, y)
and the limiting estimation equations α(θ) = 0
The GMM estimator θ̂ is consistent if the asymptotic identification condition holds, α(θ) ≠ α(θ0) for all θ ≠ θ0 [P]
Asymptotic normality
Simplified notation: ft (θ) = ft (θ, yt ), f (θ) = f (θ, y )
Additional assumption: ft (θ) is continuously differentiable at θ0
First order Taylor series expansion of
(1/n) Z′f(θ) = 0
in θ̂ around θ0 [P]
The asymptotic distribution of √n(θ̂ − θ0) is normal with mean 0 and covariance matrix
(plim (1/n) Z′F(θ0))⁻¹ · (plim (1/n) Z′ΩZ) · (plim (1/n) F(θ0)′Z)⁻¹
Z = F (θ0 )
(1/n) J′W′f(θ) = (1/n) F(θ)′W Σ̂⁻¹ W′f(θ) = 0
Attention
Many textbooks use a different notation
(and so does the gmm package in R)
The two approaches are equivalent
The moment conditions are notated as E(g(xt, θ)) = 0 with sample average ḡn(θ, y) = (1/n) ∑_{t=1}^n g(xt, θ)
The GMM estimator based on ḡn is consistent, θ̂ →p θ
Asymptotic normality: Define the L × K matrix
G(θ) = ∂ḡn(θ, y)/∂θ′ = (1/n) ∑_{t=1}^n ∂g(xt, θ)/∂θ′
Assume that √n ḡn(θ, y) →d N(0, V), then [P]
√n(θ̂ − θ0) →d N(0, (G′AG)⁻¹ G′AVA′G (G′A′G)⁻¹)
G′ A g = 0 (dimensions: K×L, L×L, L×1 → K×1)
J′ W′ f = 0 (dimensions: K×L, L×n, n×1 → K×1)
E(f(θ, y) f(θ, y)′) = Ω
is often unknown
There may be heteroskedasticity and autocorrelation in Ω
Although Ω cannot be estimated consistently, the term (1/n) W′ΩW can be estimated consistently
Write
Σ = plim_{n→∞} (1/n) W′ΩW
Assume that a suitable law of large numbers holds,
Σ = lim_{n→∞} (1/n) ∑_{t=1}^n ∑_{s=1}^n E(ft fs Wt′Ws)
where ft = ft(θ, yt)
Then
Σ = lim_{n→∞} ∑_{j=−n+1}^{n−1} Γ(j) = lim_{n→∞} [ Γ(0) + ∑_{j=1}^{n−1} (Γ(j) + Γ′(j)) ]
Newey-West estimator of Σ
Σ̂ = Γ̂(0) + ∑_{j=1}^{p} (1 − j/(p+1)) (Γ̂(j) + Γ̂′(j))
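A sketch of the Newey-West formula above, checked on a scalar AR(1) series whose long-run variance is known to be σ²/(1−ρ)² = 4 (series length and truncation lag p are illustrative choices):

```python
import numpy as np
from scipy.signal import lfilter

def newey_west(g, p):
    """Newey-West estimate of the long-run covariance of the rows of g
    (an n x L matrix of moment contributions), with lag truncation p."""
    n = g.shape[0]
    sigma = g.T @ g / n                      # Gamma_hat(0)
    for j in range(1, p + 1):
        gamma_j = g[j:].T @ g[:-j] / n       # Gamma_hat(j)
        w = 1.0 - j / (p + 1.0)              # Bartlett weight 1 - j/(p+1)
        sigma = sigma + w * (gamma_j + gamma_j.T)
    return sigma

rng = np.random.default_rng(9)
eps = rng.standard_normal(200_000)
f = lfilter([1.0], [1.0, -0.5], eps)         # f_t = 0.5 f_{t-1} + eps_t
sigma_hat = newey_west(f[:, None], p=50)[0, 0]
```

The estimate is close to the true long-run variance 1/(1 − 0.5)² = 4, slightly shrunk by the Bartlett weights.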
Economic model
yt = G (yt−1 , xt , ut ; β) , t = 1, . . . , T
yt = εt − βεt−1
θ = θ(F)
Approximate the distribution of θ̂ − θ by the distribution of θ̂∗ − θ̂
How is F estimated?
parametric −→ parametric bootstrap
nonparametric −→ nonparametric bootstrap
smoothed −→ smooth bootstrap
model based
Applications
bias and standard errors
confidence intervals
hypothesis tests
SE(X̄) = √[ (1/(B−1)) ∑_{i=1}^B (X̄i∗ − X̄∗)² ]
Bias estimate: (1/B) ∑_b (λ̂b∗ − λ̂)
sample X1, . . . , Xn −→ edf F̂ = Fn (or F̂ = Fθ̂) −→
1. resample: X1∗, . . . , Xn∗ → θ̂1∗
2. resample: X1∗, . . . , Xn∗ → θ̂2∗
⋮
B. resample: X1∗, . . . , Xn∗ → θ̂B∗
−→ SE(θ̂) = √[ (1/(B−1)) ∑_{b=1}^B (θ̂b∗ − θ̂∗)² ]
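The resampling scheme above can be sketched as follows; the statistic (the sample median), the original sample, and B are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.exponential(scale=2.0, size=100)          # original sample
theta_hat = np.median(x)                           # statistic of interest

B = 2000
idx = rng.integers(0, x.size, size=(B, x.size))    # B resamples with replacement
theta_star = np.median(x[idx], axis=1)             # theta*_1, ..., theta*_B

# SE(theta_hat) = sqrt( (1/(B-1)) * sum (theta*_b - theta*-bar)^2 )
se_boot = theta_star.std(ddof=1)
```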
Let c1 and c2 be quantiles of the distribution of θ̂ − θ
Then the confidence interval is
[θ̂ − c2, θ̂ − c1]
In practice, the quantiles are approximated from the bootstrap distribution of θ̂∗ − θ̂
H 0 : θ = θ0
H1 : θ ≠ θ0
at significance level α
Assumption: Random sample (univariate or multivariate)
Test statistic
T = θ̂ − θ0
T ∗ = θ̂∗ − θ̂
T # = θ̂# − θ0
T# = Ĉorr(X#, Y#)
Reject H0 if T < T#(0.025B) or T > T#(0.975B)
Xi∗ = Zi + hεi
yi = α + βxi + ui
OLS estimator of α is α̂ = ȳ − β̂ x̄
Fitted values
ŷi = α̂ + β̂xi
Residuals
ûi = yi − ŷi
Estimated error term variance
σ̂² = (1/(n−2)) ∑_{i=1}^n ûi²
1 Estimate the model (β̂) from the data and calculate û1, . . . , ûn
2 Draw a resample u1∗, . . . , un∗ with replacement from û1, . . . , ûn
3 For i = 1, . . . , n generate yi∗ = α̂ + β̂xi + ui∗
Algorithm bootregr2.R
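The three steps above can be sketched in Python (the course's own implementation is in bootregr2.R; the data-generating process here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]    # step 1: OLS fit (alpha_hat, beta_hat)
resid = y - X @ coef                           #         residuals u_hat_1, ..., u_hat_n

B = 2000
beta_star = np.empty(B)
for b in range(B):
    u_star = rng.choice(resid, size=n, replace=True)   # step 2: resample residuals
    y_star = X @ coef + u_star                         # step 3: y*_i = alpha_hat + beta_hat*x_i + u*_i
    beta_star[b] = np.linalg.lstsq(X, y_star, rcond=None)[0][1]

se_beta = beta_star.std(ddof=1)                # bootstrap SE of the slope
```

Resampling residuals (rather than (x, y) pairs) keeps the design fixed, which is natural when the regressors are treated as non-stochastic.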