
Advanced Econometrics

Dr. Andrea Beccarini

Center for Quantitative Economics

Winter 2013/2014

Andrea Beccarini (CQE) Econometrics Winter 2013/2014 1 / 156


General information
Aims and prerequisites

Objective: learn to understand and use advanced econometric estimation techniques
Applications in micro and macro econometrics and finance
Prerequisites: Statistical Foundations (random vectors, stochastic convergence, estimators)


General information
Literature

Russell Davidson and James MacKinnon, Econometric Theory and Methods, Oxford University Press, 2004.
Various textbooks


General information
Schedule

Least squares estimation and method of moments
Maximum likelihood estimation
Instrumental variables estimation
GMM
Indirect inference


Least squares
Linear regression

Multiple linear regression model

y = Xβ + u
u ∼ N(0, σ²I)

OLS estimator

β̂ = (X′X)⁻¹ X′y

Covariance matrix

Cov(β̂) = σ² (X′X)⁻¹

Gauss-Markov theorem: under these assumptions, β̂ is the best linear unbiased estimator (BLUE)
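The OLS formulas above can be computed directly. A minimal numpy sketch on simulated data (sample size and parameter values are illustrative, not from the course):

```python
# OLS: beta_hat = (X'X)^{-1} X'y and Cov(beta_hat) = sigma^2 (X'X)^{-1}
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(scale=0.5, size=n)           # u ~ N(0, 0.25 I)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])          # unbiased error variance
cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)
print(beta_hat)
```

With 500 observations the estimates land close to the true (1, 2).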


Least squares
Nonlinear regression

Notation of Davidson and MacKinnon (2004):

yt = xt(β) + ut
ut ∼ IID(0, σ²)

xt(β) is a nonlinear function of the parameter vector β
Example:

yt = β1 + β2 xt1 + (1/β2) xt2 + ut


Least squares
Nonlinear regression

Minimize the sum of squared residuals

SSR(β) = Σ_{t=1}^{T} (yt − xt(β))²

with respect to β
Usually, the minimization must be done numerically
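The numerical minimization can be sketched with scipy for the example model above, yt = β1 + β2 xt1 + (1/β2) xt2 + ut (data simulated for illustration):

```python
# Nonlinear least squares: minimize sum of squared residuals numerically
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
T = 200
x1, x2 = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 2.0 * x1 + (1.0 / 2.0) * x2 + 0.1 * rng.normal(size=T)  # true beta = (1, 2)

def residuals(beta):
    return y - (beta[0] + beta[1] * x1 + (1.0 / beta[1]) * x2)

# keep beta_2 away from zero, where 1/beta_2 is undefined
fit = least_squares(residuals, x0=np.array([0.5, 1.0]),
                    bounds=([-np.inf, 0.1], [np.inf, np.inf]))
print(fit.x)
```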


Method of moments
Definition of moments

Raw moment of order p

µp = E(X^p)

Empirical raw moment of order p

µ̂p = (1/n) Σ_{i=1}^{n} Xi^p

for a simple random sample X1, …, Xn


Method of moments
Basic idea: Step 1

Write r theoretical moments as functions of r unknown parameters

µ1 = g1(θ1, …, θr)
⋮
µr = gr(θ1, …, θr)

Of course, central moments may be used as well


Method of moments
Basic idea: Step 2

Invert the system of equations: write the r unknown parameters as functions of the r theoretical moments

θ1 = h1(µ1, …, µr)
⋮
θr = hr(µ1, …, µr)


Method of moments
Basic idea: Step 3

Replace all theoretical moments by empirical moments

θ̂1 = h1(µ̂1, …, µ̂r)
⋮
θ̂r = hr(µ̂1, …, µ̂r)

The estimators θ̂1, …, θ̂r are moment estimators


Method of moments
Properties of moment estimators

Moment estimators are consistent since

plim θ̂1 = plim h1(µ̂1, µ̂2, …)
         = h1(plim µ̂1, plim µ̂2, …)
         = h1(µ1, µ2, …)
         = θ1

where the second equality uses the continuity of h1
In general, moment estimators are neither unbiased nor efficient
Since the empirical moments are asymptotically normal (why?), moment estimators are also asymptotically normal −→ delta method [P]



Method of moments
Example

Let X ∼ Exp(λ) with unknown parameter λ and let X1, …, Xn be a random sample
Step 1: We know that E(X) = µ1 = 1/λ
Step 2 (inversion): λ = 1/µ1
Step 3: The estimator is

λ̂ = 1/µ̂1 = 1/((1/n) Σi Xi) = 1/X̄n

Is λ̂ unbiased? (No: one can show E(λ̂) = nλ/(n − 1) > λ)
Alternative: Var(X) = 1/λ², hence λ̂ = 1/√S²
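Both moment estimators for the exponential example can be sketched in a few lines of Python (sample size and true λ are illustrative):

```python
# Method of moments for Exp(lambda): lambda_hat = 1/mean, or 1/sqrt(S^2)
import numpy as np

rng = np.random.default_rng(1)
lam = 4.0
x = rng.exponential(scale=1.0 / lam, size=100_000)  # numpy uses scale = 1/lambda

lam_hat = 1.0 / x.mean()                       # from the first raw moment
lam_hat_alt = 1.0 / np.sqrt(x.var(ddof=1))     # alternative, from the variance
print(lam_hat, lam_hat_alt)
```

Both estimates converge to the true λ = 4 as n grows; in finite samples they generally differ.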


Maximum likelihood
Basic idea

The basic idea is very natural: choose the parameters such that the probability (likelihood) of the observations x1, …, xn, as a function of the unknown parameters θ1, …, θr, is maximized
Likelihood function

L(θ; x1, …, xn) = P(X1 = x1, …, Xn = xn; θ)   (discrete case)
L(θ; x1, …, xn) = f_{X1,…,Xn}(x1, …, xn; θ)   (continuous case)


Maximum likelihood
Basic idea

For simple random samples

L(θ; x1, …, xn) = Π_{i=1}^{n} fX(xi; θ)

Maximize the likelihood

L(θ̂; x1, …, xn) = max_{θ∈Θ} L(θ; x1, …, xn)

ML estimate: θ̂ = arg max L(θ; x1, …, xn)
ML estimator: θ̂ = arg max L(θ; X1, …, Xn)


Maximum likelihood
Basic idea

Because sums are easier to deal with than products, and because sums are subject to limit laws, it is common to maximize the log-likelihood

ln L(θ) = Σ_{i=1}^{n} ln fX(Xi; θ)

The ML estimator is the same as before, since

θ̂ = arg max ln L(θ; X1, …, Xn) = arg max L(θ; X1, …, Xn)


Maximum likelihood
Basic idea

Usually, we find θ̂ by solving the system of equations

∂ ln L/∂θ1 = 0
⋮
∂ ln L/∂θr = 0

The gradient vector g(θ) = ∂ ln L(θ)/∂θ is called score vector or score
If the log-likelihood is not differentiable, other maximization methods must be used


Maximum likelihood
Example

Let X ∼ Exp(λ) with density f(x; λ) = λe^{−λx} for x ≥ 0 and f(x; λ) = 0 else
Likelihood of an i.i.d. random sample

L(λ; x1, …, xn) = Π_{i=1}^{n} λe^{−λxi}

Log-likelihood

ln L(λ; x1, …, xn) = n ln λ − λ Σ_{i=1}^{n} xi


Maximum likelihood
Example

Set the derivative to zero

∂ ln L(λ)/∂λ = n/λ̂ − Σ_{i=1}^{n} xi = 0,

hence

λ̂ = n / Σ_{i=1}^{n} xi = 1/x̄

The ML estimator for λ is λ̂ = 1/X̄
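The analytic solution λ̂ = 1/x̄ can be checked against a numerical maximizer of the log-likelihood; a small Python sketch (simulated data, illustrative values):

```python
# ML for Exp(lambda): compare the closed form 1/mean with numerical maximization
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0 / 4.0, size=5_000)  # true lambda = 4

def neg_loglik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

lam_analytic = 1.0 / x.mean()
lam_numeric = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method="bounded").x
print(lam_analytic, lam_numeric)
```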


Maximum likelihood
Properties of ML estimators: Preliminaries

The log-likelihood and the score vector are

ln L(θ) = Σ_{i=1}^{n} ln fX(Xi; θ)
∂ ln L(θ)/∂θ = Σ_{i=1}^{n} ∂ ln fX(Xi; θ)/∂θ

The contributions ln fX(Xi; θ) are random variables
The contributions ∂ ln fX(Xi; θ)/∂θ are random vectors
Hence, limit laws can be applied to the (normalized) sums


Maximum likelihood
Properties of ML estimators: Preliminaries

For all θ

∫ e^{ln L(θ)} dx = ∫ L(θ; x1, …, xn) dx = 1

since L(θ) is a joint density function of X1, …, Xn


Maximum likelihood
Properties of ML estimators: Preliminaries

Define the matrix G(θ, X1, …, Xn) of gradient contributions

Gij(θ, Xi) = ∂ ln fX(Xi; θ)/∂θj

The column sums are the elements of the gradient vector,

gj(θ) = Σ_{i=1}^{n} Gij(θ, Xi)

The expected gradient vector is Eθ(g(θ)) = 0 [P]


Maximum likelihood
Properties of ML estimators: Preliminaries

The covariance matrix of the gradient vector,

Cov(g(θ)) = E(g(θ) g(θ)′),

is called information matrix (and often denoted I(θ))
Information matrix equality [P]:

Cov(g(θ)) = −E(H(θ))

i.e.

Cov(∂ ln L(θ)/∂θ) = −E(∂² ln L(θ)/∂θ∂θ′)


Maximum likelihood
Properties of ML estimators

1. Equivariance: If θ̂ is the ML estimator for θ, then h(θ̂) is the ML estimator for h(θ)
2. Consistency: plim θ̂n = θ
3. Asymptotic normality: √n (θ̂n − θ) →d U ∼ N(0, V(θ))
4. Asymptotic efficiency: V(θ) is the Cramér-Rao bound
5. Computability (analytical or numerical); the covariance matrix of the estimator is a by-product of the numerical method


Maximum likelihood
Properties of ML estimators

Equivariance:
Let θ̂ be the ML estimator of θ
Let ψ = h(θ) be a one-to-one function of θ with inverse h⁻¹(ψ) = θ
Then the ML estimator of ψ satisfies

d ln L(h⁻¹(ψ))/dψ = (d ln L(θ)/dθ) · (dh⁻¹(ψ)/dψ) = 0

which holds at ψ̂ = h(θ̂)


Maximum likelihood
Properties of ML estimators

Consistency
The parameter θ is identified if for all θ′ ≠ θ and data x1, …, xn

ln L(θ′ | x1, …, xn) ≠ ln L(θ | x1, …, xn)

The parameter θ is asymptotically identified if for all θ′ ≠ θ0

plim (1/n) ln L(θ′) ≠ plim (1/n) ln L(θ0)

where θ0 is the true value of the parameter [P]


Maximum likelihood
Properties of ML estimators

Asymptotic normality
By definition, the ML estimator satisfies

g(θ̂) = 0

A first-order Taylor series expansion of g around the true parameter vector θ0 gives [P]

g(θ̂) = g(θ0) + H(θ0)(θ̂ − θ0) + remainder


Maximum likelihood
Covariance matrix estimation

The (approximate) covariance matrix of θ̂ is

Cov(θ̂) = −[E(H(θ0))]⁻¹ = −[E(∂² ln L(θ0)/∂θ0 ∂θ0′)]⁻¹

A consistent estimator of Cov(θ̂) is

Ĉov(θ̂) = −[H(θ̂)]⁻¹ = −[∂² ln L(θ̂)/∂θ̂ ∂θ̂′]⁻¹

Often, H(θ̂) is a by-product of numerical optimization


Maximum likelihood
Covariance matrix estimation

An alternative consistent covariance matrix estimator is

Ĉov(θ̂) = [G(θ̂; X1, …, Xn)′ G(θ̂; X1, …, Xn)]⁻¹

This estimator is called the outer-product-of-the-gradient (OPG) estimator
Advantage: only the first derivatives are required
Disadvantage: less reliable in small samples


Maximum likelihood
Example

Numerical estimation of the parameters of N(µ, σ²)
Let X1, …, X50 be a random sample from X ∼ N(µ, σ²) with µ = 5 and σ² = 9
Density function

fX(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

Log-likelihood function: ln L(µ, σ²) = Σ_{i=1}^{n} ln fX(xi)


Maximum likelihood
Example

See numnormal.R
Point estimates

(µ̂, σ̂²) = (3.64025, 6.90869)

Estimated covariance matrix derived numerically from H(θ̂)

Ĉov(µ̂, σ̂²) = [  0.13817  −0.00016 ]
              [ −0.00016   1.90918 ]
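A Python sketch in the spirit of the course script numnormal.R (translated for illustration; the simulated draw, and hence the numbers, differ from the slide):

```python
# Numerical ML estimation of mu and sigma^2 for a normal sample
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=3.0, size=50)    # mu = 5, sigma^2 = 9, n = 50

def neg_loglik(theta):
    mu, sigma2 = theta
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

res = minimize(neg_loglik, x0=np.array([0.0, 1.0]),
               method="L-BFGS-B", bounds=[(None, None), (1e-6, None)])
mu_hat, sigma2_hat = res.x
# Analytic ML solutions for comparison: sample mean and (biased) sample variance
print(mu_hat, sigma2_hat, x.mean(), x.var())
```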


Maximum likelihood
Example

See numnormal.R
Point estimates

(µ̂, σ̂²) = (3.64025, 6.90869)

Estimated covariance matrix derived from theory

Ĉov(µ̂, σ̂²) = [ 0.13817  0       ]
              [ 0        1.90920 ]


Maximum likelihood
Example of violated regularity conditions

Let X be uniformly distributed on the interval [0, θ]
The density function is

fX(x) = 1/θ for 0 ≤ x ≤ θ, and 0 else

The likelihood function is

L(θ | x1, …, xn) = (1/θ)^n for θ ≥ maxi xi, and 0 else


Maximum likelihood
Example of violated regularity conditions
[Figure: the likelihood function L(θ), zero below maxi xi and decreasing in θ above it]

L(θ) is not differentiable at maxi xi
Maximum is at θ̂ = maxi xi
The estimator is consistent but not asymptotically normal
Illustration in R
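The non-normal limit behaviour can be simulated; a sketch (values illustrative): n(θ − θ̂) converges to an exponential distribution, so θ̂ is consistent at rate 1/n but not asymptotically normal.

```python
# theta_hat = max(x) for Uniform[0, theta]: consistent, non-normal limit
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 5.0, 1_000, 2_000
gaps = np.array([n * (theta - rng.uniform(0, theta, n).max()) for _ in range(reps)])
print(gaps.mean())  # the limit distribution is exponential with mean theta
```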


Maximum likelihood
Dependent observations

Maximum likelihood estimation is still possible if the observations are dependent
The joint density of the observations

f_{X1,…,XT}(x1, …, xT)

can be factorized as

f_{X1}(x1) · Π_{t=2}^{T} f_{Xt | X1=x1,…,Xt−1=xt−1}(xt)


Maximum likelihood
Dependent observations

Log-likelihood

ln L = ln f_{X1}(x1) + Σ_{t=2}^{T} ln f_{Xt | X1=x1,…,Xt−1=xt−1}(xt)

If T is large, one may ignore ln f_{X1}(x1)
Computing the log-likelihood is straightforward if

f_{Xt | X1=x1,…,Xt−1=xt−1}(xt) = f_{Xt | Xt−1=xt−1}(xt)
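For a Gaussian AR(1), the Markov property holds and the conditional (first-observation-dropped) log-likelihood has a closed-form maximizer; a sketch on simulated data (model and values are illustrative, not from the slides):

```python
# Conditional ML for x_t = phi * x_{t-1} + e_t with Gaussian errors:
# maximizing sum_t ln f(x_t | x_{t-1}) over phi reduces to least squares.
import numpy as np

rng = np.random.default_rng(5)
T, phi = 5_000, 0.6
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal()

phi_hat = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
print(phi_hat)
```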


Maximum likelihood
The three classical tests

Wald test, Lagrange multiplier test, and likelihood ratio test (W, LM, LR)
Hypotheses

H0: r(θ) = 0 vs H1: r(θ) ≠ 0

Often, r is a scalar-valued function and θ is a scalar
The function r may be non-linear!


Maximum likelihood
The three classical tests

Basic test ideas:
Wald test: If r(θ) = 0 is true, then r(θ̂ML) will be close to 0
Likelihood ratio test: If r(θ) = 0 is true, then ln L(θ̂R) will not be far below ln L(θ̂ML)
Lagrange multiplier test: If r(θ) = 0 is true, the score function g(θ̂R) = ∂ ln L(θ̂R)/∂θ will be close to 0


Maximum likelihood
The three classical tests

Example:
Let X1, …, Xn be a random sample from X ∼ Exp(λ)
Test H0: λ = 4 against H1: λ ≠ 4
Different notation: H0: r(λ) = 0, where r(λ) = λ − 4
See threetests.R
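A Python sketch in the spirit of threetests.R for this exponential example (sample size and seed are illustrative): it uses the log-likelihood ln L(λ) = n ln λ − λΣxi, Var(λ̂) = λ²/n and I(λ) = n/λ².

```python
# Wald, LR and LM statistics for H0: lambda = 4 with an Exp(lambda) sample
import numpy as np

rng = np.random.default_rng(6)
lam0 = 4.0
x = rng.exponential(scale=1.0 / lam0, size=200)  # sample generated under H0
n, s = len(x), x.sum()

def loglik(lam):
    return n * np.log(lam) - lam * s

lam_hat = n / s                                   # unrestricted MLE, 1/x-bar
W = (lam_hat - lam0) ** 2 / (lam_hat ** 2 / n)    # Wald, Var(lam_hat) = lam^2/n
LR = -2.0 * (loglik(lam0) - loglik(lam_hat))      # likelihood ratio
score0 = n / lam0 - s                             # score at the restricted value
LM = score0 ** 2 / (n / lam0 ** 2)                # LM, information I(lam0) = n/lam0^2
print(W, LR, LM)
```

All three statistics are asymptotically χ²(1) under H0 and give similar values in large samples.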


Maximum likelihood
Wald test

Wald test
Hypotheses

H0: r(θ) = 0
H1: r(θ) ≠ 0

with functions r = (r1, …, rm)
m is the number of restrictions
Wald test: If r(θ) = 0 is true, then r(θ̂ML) will be close to 0


Maximum likelihood
Wald test

Asymptotically, under H0 (by the delta method!)

r(θ̂ML) ∼ N(0, Cov(r(θ̂ML)))

with

Cov(r(θ̂ML)) = (∂r(θ̂ML)/∂θ′) · Cov(θ̂ML) · (∂r(θ̂ML)/∂θ′)′

Remember: If X ∼ N(µ, Σ), then (X − µ)′ Σ⁻¹ (X − µ) ∼ χ²m
Wald test statistic

W = r(θ̂ML)′ [Cov(r(θ̂ML))]⁻¹ r(θ̂ML) ∼asy χ²m


Maximum likelihood
Wald test

Remarks:
Reject H0 if W is larger than the (1 − α)-quantile of the χ²m-distribution
Usually, Cov(r(θ̂ML)) must be replaced by Ĉov(r(θ̂ML))
The Wald test is not invariant with respect to re-parametrizations
The Wald test only requires the unrestricted ML estimator
Ideal if θ̂ML is much easier to calculate than θ̂R


Maximum likelihood
Likelihood ratio test

Likelihood ratio test
Is ln L(θ̂ML) significantly larger than ln L(θ̂R)?
LR test statistic

LR = −2 ln(L(θ̂R)/L(θ̂ML)) = −2 (ln L(θ̂R) − ln L(θ̂ML))

Asymptotic distribution: LR ∼asy χ²m


Maximum likelihood
Likelihood ratio test

Remarks:
Reject H0 if LR is larger than the (1 − α)-quantile of the χ²m-distribution
To compute LR, one requires both the unrestricted estimator θ̂ML and the restricted estimator θ̂R
Ideal if both θ̂ML and θ̂R are easy to calculate
The LR test is often used to compare different models to each other


Maximum likelihood
Lagrange multiplier test

Lagrange multiplier test
Is g(θ̂R) significantly different from 0?
The test is based on the restricted estimator θ̂R
Lagrange approach: maxθ ln L(θ) s.t. r(θ) = 0
LM test statistic

LM = g(θ̂R)′ [I(θ̂R)]⁻¹ g(θ̂R) ∼asy χ²m

with

I(θ̂R) = −E(∂² ln L(θ̂R)/∂θ∂θ′)


Maximum likelihood
Lagrange multiplier test

Remarks:
Reject H0 if LM is larger than the (1 − α)-quantile of the χ²m-distribution
The LM test only requires the restricted estimator
Ideal if θ̂R is much easier to calculate than θ̂ML
The LM test is often used to test misspecifications (heteroskedasticity, autocorrelation, omitted variables, etc.)
Asymptotically, the three tests are equivalent


Maximum likelihood
The three classical tests

Multivariate case
Example: Production function

Yi = Xi1^{a1} · Xi2^{a2} + ui

where ui ∼ N(0, 0.05²)
Log-likelihood function ln L(a1, a2)
ML estimators â1 and â2
Hypothesis test of a1 + a2 = 1, i.e. a1 + a2 − 1 = 0
See classtest.R


Instrumental variables
Preliminaries

OLS is not consistent if E(ut | Xt) ≠ 0
Define an information set Ωt (a σ-algebra) such that

E(ut | Ωt) = 0

This moment condition can be used for estimation
Variables in Ωt are called instrumental variables (or instruments)
We denote the instrument vector by Wt


Instrumental variables
Correlation between errors and disturbances (I)

Errors in variables
Consider the model

yt = α + βxt* + εt,  εt ∼ iid(0, σε²)

The exogenous variable xt* is unobservable
We can only observe

xt = xt* + vt

where vt ∼ iid(0, σv²) is independent of everything else
OLS estimators of yt = α + βxt + ut are inconsistent [P]
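The inconsistency (attenuation toward zero) is easy to see in a simulation; a sketch with illustrative values, where signal and measurement-error variances are equal so plim β̂OLS = β · 1/(1+1) = β/2:

```python
# Errors-in-variables: OLS of y on the mismeasured regressor is attenuated
import numpy as np

rng = np.random.default_rng(7)
n, beta = 100_000, 2.0
x_star = rng.normal(size=n)                 # true regressor, unobservable
y = 1.0 + beta * x_star + 0.5 * rng.normal(size=n)
x = x_star + rng.normal(size=n)             # observed with measurement error

beta_ols = np.cov(x, y)[0, 1] / x.var(ddof=1)
print(beta_ols)
```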


Instrumental variables
Correlation between errors and disturbances (II)

Omitted variables bias
Let

yt = α + β1 x1t + β2 x2t + εt

If x2 is unobservable, one estimates

yt = α + β1 x1t + ut

where ut = β2 x2t + εt
If x2t and x1t are correlated, then so are ut and x1t


Instrumental variables
Correlation between errors and disturbances (III)

Endogeneity
Standard example: supply and demand curves determine both price and quantity

qt = γd pt + Xtd βd + utd
qt = γs pt + Xts βs + uts

Solve for qt and pt:

[ qt ]   [ 1  −γd ]⁻¹ [ Xtd βd + utd ]
[ pt ] = [ 1  −γs ]   [ Xts βs + uts ]


Instrumental variables
Correlation between errors and disturbances (III)

Since qt and pt depend on both utd and uts, single-equation OLS estimation of

qt = γd pt + Xtd βd + utd
qt = γs pt + Xts βs + uts

is inconsistent
The right-hand-side variable pt is correlated with the error term
The condition E(ut | Ωt) = 0 is violated if pt is in Ωt


Instrumental variables
Correlation between errors and disturbances

Warning! Inconsistency is not always a problem
If we simply want to forecast, we can use inconsistent estimators
Trivial example: positive correlation between u and X

[Figure: scatter plot of y against X together with the true regression line]


Instrumental variables
The simple IV estimator

Let W denote the T × K matrix of instruments
All columns of X with Xt ∈ Ωt should be included in W
Then E(ut | Wt) = 0 implies the moment condition

E(W′u) = E(W′(y − Xβ)) = 0

The IV estimator is a method of moments estimator
The solution is

β̂IV = (W′X)⁻¹ W′y
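The simple IV estimator can be sketched directly from the formula; simulated data with an endogenous regressor and one valid instrument (all values illustrative):

```python
# Simple (just-identified) IV: beta_IV = (W'X)^{-1} W'y versus biased OLS
import numpy as np

rng = np.random.default_rng(8)
n = 50_000
w = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)
x = 0.8 * w + 0.5 * u + rng.normal(size=n)   # endogenous: correlated with u
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w])
beta_iv = np.linalg.solve(W.T @ X, W.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_iv[1], beta_ols[1])  # IV near 2; OLS biased upward
```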


Instrumental variables
Properties

The simple IV estimator is consistent if

plim (1/n) W′X = SWX

is deterministic and nonsingular [P]
The simple IV estimator is asymptotically normal,

√n (β̂IV − β) → U ∼ N(0, σ² (SWX)⁻¹ SWW (SWX′)⁻¹)

where SWW = plim (1/n) W′W [P]


Instrumental variables
How to find instruments

Instruments must be
1. exogenous, i.e. plim (1/n) W′u = 0
2. valid, i.e. plim (1/n) W′X = SWX non-singular
Natural experiments (weather, earthquakes, …)
Angrist and Pischke (2009): Good instruments come from a combination of institutional knowledge and ideas about the processes determining the variable of interest.


Instrumental variables
How to find instruments

Examples
Natural experiments
1. Brückner and Ciccone: Rain and the democratic window of opportunity, Econometrica 79 (2011) 923-947.
2. Angrist and Evans: Children and their parents' labor supply: Evidence from exogenous variation in family size, American Economic Review 88 (1998) 450-477.


Instrumental variables
How to find instruments

Examples
Institutional arrangements
1. Angrist and Krueger: Does Compulsory School Attendance Affect Schooling and Earnings?, Quarterly Journal of Economics 106 (1991) 979-1014.
2. Levitt: The Effect of Prison Population Size on Crime Rates: Evidence from Prison Overcrowding Litigation, Quarterly Journal of Economics 111 (1996) 319-351.


Instrumental variables
How to find instruments

In a time series context, one can sometimes use lagged endogenous regressors as instrumental variables
Example:

yt = α + βxt + ut

with E(ut | xt) ≠ 0
If Cov(xt, xt−1) ≠ 0 but Cov(ut, xt−1) = 0, then xt−1 can be used as an instrumental variable
Attention: Cov(ut, xt−1) = 0 is not always obvious


Instrumental variables
How to find instruments

Example (Measurement error in time series)
Consider the model

yt = α + βxt* + ut
xt* = ρ x*t−1 + εt
xt = xt* + vt

Then xt−1 is a valid instrument for a regression of yt on xt, and α and β will be estimated consistently.


Instrumental variables
How to find instruments

Example (Omitted variable bias in time series)
Consider the model

yt = α + β1 x1t + β2 x2t + ut
x1t = ρ11 x1,t−1 + ρ12 x2,t−1 + ε1t
x2t = ρ21 x1,t−1 + ρ22 x2,t−1 + ε2t

Then x1,t−1 is not a valid instrument for a regression of yt on x1t, and α and β1 will not be estimated consistently.


Instrumental variables
How to find instruments

Example (Endogeneity in time series)
Consider the model

yt = α + β1 xt + β2 yt−1 + ut
xt = γ + δ1 yt + δ2 xt−1 + vt

Then xt−1 is a valid instrument for a regression of yt on xt and yt−1, and α, β1 and β2 will be estimated consistently.


Instrumental variables
Generalized IV estimation

If the number of instruments L is larger than the number of parameters K, the model is overidentified
Right-multiply the T × L matrix W by an L × K matrix J to obtain a T × K instrument matrix WJ
Its columns are linear combinations of the instruments in W
One can show that the asymptotically optimal matrix is J = (W′W)⁻¹ W′X


Instrumental variables
Generalized IV estimation

The generalized IV estimator is

β̂IV = ((WJ)′X)⁻¹ (WJ)′y
     = (X′W (W′W)⁻¹ W′X)⁻¹ X′W (W′W)⁻¹ W′y
     = (X′PW X)⁻¹ X′PW y

with PW = W (W′W)⁻¹ W′
Consistency and asymptotic normality still hold


Instrumental variables
Generalized IV estimation

The two-stage least squares (2SLS) interpretation
The matrix J is similar to β̂ in the standard OLS model,

J = (W′W)⁻¹ W′X

Hence, WJ is similar to X β̂
The optimal instruments are obtained if we regress the endogenous regressors on the instruments (1st stage), and then use the fitted values as regressors (2nd stage)
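The two stages can be sketched in numpy; an overidentified setup with two instruments and one endogenous regressor (data simulated, values illustrative):

```python
# Generalized IV / 2SLS: regress y on the first-stage fitted values P_W X,
# which equals beta = (X' P_W X)^{-1} X' P_W y since P_W is idempotent.
import numpy as np

rng = np.random.default_rng(9)
n = 50_000
w1, w2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
x = 0.6 * w1 + 0.4 * w2 + 0.5 * u + rng.normal(size=n)  # endogenous
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w1, w2])      # L = 3 > K = 2: overidentified

X_hat = W @ np.linalg.solve(W.T @ W, W.T @ X)  # 1st stage: fitted values P_W X
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)  # 2nd stage
print(beta_2sls)
```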


Instrumental variables
Finite sample properties

The finite sample properties of IV estimators are complex
In the overidentified case, the first L − K moments exist, but higher moments do not
If the expectation exists, IV estimators are in general biased
The simple IV estimator has very heavy tails; even the first moment does not exist!
The estimator can be extremely far off the true value
See ivfinite.R


Instrumental variables
Hypothesis testing

Exact hypothesis tests are usually not feasible
Asymptotic tests are based on the asymptotic normality
An estimator of the covariance matrix of β̂IV is

Ĉov(β̂IV) = σ̂² (X′PW X)⁻¹

with

PW = W (W′W)⁻¹ W′
σ̂² = (1/n) (y − X β̂IV)′ (y − X β̂IV)


Instrumental variables
Hypothesis testing

Asymptotic t-test

H0: βi = βi0
H1: βi ≠ βi0

Under the null hypothesis, the test statistic

t = (β̂i − βi0) / √(V̂ar(β̂i))

is asymptotically N(0, 1)


Instrumental variables
Hypothesis testing

Asymptotic Wald test (similar to an F-test)

H0: β2 = β20, H1: β2 ≠ β20

where β2 is a length-L subvector of β
Under the null hypothesis, the test statistic

W = (β̂2 − β20)′ [Ĉov(β̂2)]⁻¹ (β̂2 − β20)

is asymptotically χ² with L degrees of freedom


Instrumental variables
Hypothesis testing

Testing overidentifying restrictions
The identifying restrictions are

E(ut | Wt) = 0, or E(W′u) = 0

If the model is just identified, the validity of the restrictions cannot be tested
If the model is overidentified, one can test whether the overidentifying restrictions hold, i.e. whether the instruments are valid and exogenous


Instrumental variables
Hypothesis testing

Basic test idea: check if the IV residuals can be explained by the full set of instruments
Compute the IV residuals û
Regress the residuals on all instruments W
Under the null hypothesis, the test statistic

nR² ∼ χ²m

where m is the degree of overidentification
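A sketch of this nR² test on simulated data where the instruments are valid, so the statistic should stay below the χ²(1) critical value 3.84 in most samples (all numbers illustrative):

```python
# Overidentification test: regress 2SLS residuals on all instruments, form n*R^2
import numpy as np

rng = np.random.default_rng(10)
n = 50_000
w1, w2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
x = 0.6 * w1 + 0.4 * w2 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w1, w2])      # one overidentifying restriction
X_hat = W @ np.linalg.solve(W.T @ W, W.T @ X)
beta_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
u_hat = y - X @ beta_iv

fitted = W @ np.linalg.solve(W.T @ W, W.T @ u_hat)  # regress residuals on W
r2 = fitted.var() / u_hat.var()
stat = n * r2
print(stat)  # asymptotically chi^2(1) under the null
```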


Instrumental variables
Hypothesis testing

Davidson and MacKinnon (2004, p. 338):
Even if we do not know quite how to interpret a significant value of the overidentification test statistic, it is always a good idea to compute it. If it is significantly larger than it should be by chance under the null hypothesis, one should be extremely cautious in interpreting the estimates, because it is quite likely either that the model is specified incorrectly or that some of the instruments are invalid.


Instrumental variables
Hypothesis testing

Durbin-Wu-Hausman test

H0: E(X′u) = 0
H1: E(X′u) ≠ 0, but E(W′u) = 0

Test if IV estimation is really necessary or if OLS would do
Under H1, OLS is inconsistent, but IV is still consistent
Basic test idea: compare β̂OLS and β̂IV. If they are 'too different', reject H0



Instrumental variables
Hypothesis testing

The difference between the estimators is

β̂IV − β̂OLS = (X′PW X)⁻¹ X′PW y − (X′X)⁻¹ X′y
            = (X′PW X)⁻¹ (X′PW y − X′PW X (X′X)⁻¹ X′y)
            = (X′PW X)⁻¹ X′PW (I − X (X′X)⁻¹ X′) y
            = (X′PW X)⁻¹ X′PW MX y


Instrumental variables
Hypothesis testing

We need to test if X′PW MX y is significantly different from 0
This term is identically equal to zero for all variables in X that are instruments (i.e. that are also in W)
Denote by X̃ all possibly endogenous regressors
To test if X̃′PW MX y is significantly different from zero, perform a Wald test of δ = 0 in the regression

y = Xβ + PW X̃ δ + u
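The augmented regression above can be sketched in numpy; with one possibly endogenous regressor the Wald test of δ = 0 reduces to a t-test on the coefficient of PW x̃. Simulated data with genuine endogeneity, so the test should reject (all values illustrative):

```python
# Durbin-Wu-Hausman via the augmented regression y = X b + (P_W x_tilde) d + e
import numpy as np

rng = np.random.default_rng(11)
n = 20_000
w = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * w + 0.5 * u + rng.normal(size=n)     # endogenous regressor
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w])
x_fit = W @ np.linalg.solve(W.T @ W, W.T @ x)  # P_W applied to the suspect regressor

Z = np.column_stack([X, x_fit])                # augmented regression
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
e = y - Z @ coef
s2 = e @ e / (n - Z.shape[1])
cov = s2 * np.linalg.inv(Z.T @ Z)
t_delta = coef[2] / np.sqrt(cov[2, 2])
print(t_delta)  # large |t| -> reject exogeneity of x
```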


GMM
Model description

Hansen, L. (1982), Large Sample Properties of Generalized Method of Moments Estimators, Econometrica 50, 1029-1054:
In this paper we study the large sample properties of a class of generalized method of moments (GMM) estimators which subsumes many standard econometric estimators. To motivate this class, consider an econometric model whose parameter vector we wish to estimate. The model implies a family of orthogonality conditions that embed any economic theoretical restrictions that we wish to impose or test.


GMM
Model description

John Cochrane (2005), Asset Pricing, p. 196:
Most of the effort involved with GMM is simply mapping a given problem into the very general notation.


GMM
Model description

Describe the model by elementary zero functions

Eθ(ft(θ, yt)) = 0

where everything can be vector-valued
Parameter vector θ of length K
Observation vectors yt
Identification condition

Eθ0(ft(θ, yt)) ≠ 0 for all θ ≠ θ0


GMM
Model description

Example (Linear regression model)
Consider the standard model

y = Xβ + u
u ∼ N(0, σ²I), independent of X

Parameter vector θ = ?
Observations yt = ?
Elementary zero functions ft(θ, yt) = ?


GMM
Model description

Example (Lognormal distribution)
Suppose there is a random sample X1, …, Xn from

X ∼ LN(µ, σ²)

Parameter vector θ = ?
Observations yt = ?
Elementary zero functions ft(θ, yt) = ?


GMM
Model description

Example (Asset pricing)
The basic asset pricing formula is

pt = E(mt+1 xt+1 | Ωt)

with asset price p, stochastic discount factor m, payoff x, and information set Ωt.
Parameter vector θ = ?
Observations yt = ?
Elementary zero functions ft(θ, yt) = ?


GMM
Model description

Stack all elementary zero functions

f(θ, y) = (f1(θ, y1), …, fn(θ, yn))′

Covariance matrix

E(f(θ, y) f(θ, y)′) = Ω

The dimension of Ω depends on the dimension of ft(θ, yt)


GMM
Model description

Example (Linear regression model)
The covariance matrix Ω is

E(f(θ, y) f(θ, y)′) = E(u u′) = σ²I

If there are autocorrelation and heteroskedasticity,

E(u u′) = Ω


GMM
Model description

Example (Lognormal distribution)


The covariance matrix Ω is

E ( f (θ, y) f (θ, y)′ )
      ⎛ f11²     f11 f12  . . .  f11 fn1  f11 fn2 ⎞
      ⎜ f12 f11  f12²     . . .  f12 fn1  f12 fn2 ⎟
  = E ⎜   ...      ...    . . .    ...      ...   ⎟
      ⎜ fn1 f11  fn1 f12  . . .  fn1²     fn1 fn2 ⎟
      ⎝ fn2 f11  fn2 f12  . . .  fn2 fn1  fn2²    ⎠
  = ?



GMM
Model description

Example (Asset pricing)


The covariance matrix Ω is

E ( f (θ, y) f (θ, y)′ )
      ⎛ f11²     . . .  f11 fn1 ⎞
  = E ⎜   ...    . . .    ...   ⎟
      ⎝ fn1 f11  . . .  fn1²    ⎠
  = ?



GMM
Estimating equations

To estimate θ, we need K estimating equations


In general, they are weighted averages of the ft
In most cases, the estimating equations are based on L ≥ K
instrumental variables W
If L > K , we need to form linear combinations
Let W be the n × L matrix of instruments
and J be an L × K matrix of full rank
Define the n × K matrix Z = WJ



GMM
Estimating equations

Theoretical moment conditions (orthogonality conditions)

E ( Zt′ ft (θ, yt ) ) = 0

The estimating equations are the empirical counterpart

(1/n) Z′ f (θ, y) = 0

Solving this system yields the GMM estimator θ̂



GMM
Estimating equations

Example (Linear regression model)


The K moment conditions for the linear regression model are

E ( Zt′ ft (θ, yt ) ) = E ( Xt′ ( yt − Xt β ) ) = 0

and the estimating equations are

(1/n) X′ (y − X β) = 0.



GMM
Estimating equations

Example (Lognormal distribution)


The two moment conditions for the lognormal distribution are
(here Zt is the 2 × 2 identity matrix)

E ( Zt′ ft (θ, yt ) ) = E ( ft1 (θ, yt ) , ft2 (θ, yt ) )′
                    = E ( Xt − exp(µ + σ²/2) ,  Xt² − exp(2µ + 2σ²) )′
                    = ( 0 , 0 )′



GMM
Estimating equations

Example (contd)
. . . and the estimating equations are

(1/n) Z′ f (θ, y) = ( (1/n) Σ_{t=1}^{n} ( Xt − exp(µ + σ²/2) ) ,
                      (1/n) Σ_{t=1}^{n} ( Xt² − exp(2µ + 2σ²) ) )′ = ( 0 , 0 )′

where Z′ stacks the 2 × 2 identity matrices, Z′ = ( I2 . . . I2 )
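Since the system is exactly identified, these estimating equations can be solved in closed form: with m1 = (1/n) Σt Xt and m2 = (1/n) Σt Xt², the two conditions give σ² = log(m2/m1²) and µ = log(m1) − σ²/2. A minimal pure-Python sketch (function name and simulated sample are illustrative):

```python
import math
import random

def gmm_lognormal(x):
    """Solve the two lognormal estimating equations in closed form.

    m1 = exp(mu + sigma^2/2) and m2 = exp(2*mu + 2*sigma^2)
    imply sigma^2 = log(m2 / m1^2) and mu = log(m1) - sigma^2 / 2.
    """
    n = len(x)
    m1 = sum(x) / n
    m2 = sum(v * v for v in x) / n
    sigma2 = math.log(m2 / (m1 * m1))
    mu = math.log(m1) - sigma2 / 2
    return mu, sigma2

random.seed(1)
sample = [random.lognormvariate(0.5, 1.0) for _ in range(20000)]
mu_hat, sigma2_hat = gmm_lognormal(sample)  # should be close to (0.5, 1.0)
```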



GMM
Properties of GMM estimators

Consistency
Assume that a law of large numbers applies to (1/n) Z′ f (θ, y)
Define the limiting estimation functions

α (θ) = plim (1/n) Z′ f (θ, y)

and the limiting estimation equations α (θ) = 0
The GMM estimator θ̂ is consistent if the asymptotic identification
condition holds, α (θ) ≠ α (θ0 ) for all θ ≠ θ0 [P]



GMM
Properties of GMM estimators

Asymptotic normality
Simplified notation: ft (θ) = ft (θ, yt ), f (θ) = f (θ, y)
Additional assumption: ft (θ) is continuously differentiable at θ0
First order Taylor series expansion of

(1/n) Z′ f (θ) = 0

in θ̂ around θ0 [P]



GMM
Asymptotic efficiency

The asymptotic distribution of √n ( θ̂ − θ0 ) is normal with
mean 0 and covariance matrix

( plim (1/n) Z′ F (θ0 ) )⁻¹ ( plim (1/n) Z′ Ω Z ) ( plim (1/n) F (θ0 )′ Z )⁻¹

What is the optimal choice of Z in the estimating equations?
The optimal choice depends on assumptions about the matrices F (θ)
and Ω



GMM
Asymptotic efficiency

If Ω = σ²I and E ( Ft (θ0 ) ft (θ0 ) ) = 0, the optimal choice is

Z = F (θ0 )

Problem: Z depends on the unknown θ0
Solution: Solve the estimating equations

(1/n) F (θ)′ f (θ) = 0



GMM
Asymptotic efficiency

If Ω = σ²I and E ( Ft (θ0 ) ft (θ0 ) ) ≠ 0 but Wt ∈ Ωt , the optimal choice is

Z = PW F (θ0 )

Problem: Z depends on the unknown θ0
Solution: Solve the estimating equations

(1/n) F (θ)′ PW f (θ) = 0



GMM
Asymptotic efficiency

Suppose the covariance matrix Ω is unknown
Since Z = WJ, the covariance matrix of √n ( θ̂ − θ0 ) is

( plim (1/n) J′ W′ F0 )⁻¹ ( plim (1/n) J′ W′ Ω W J ) ( plim (1/n) F0′ W J )⁻¹

For the optimal J = (W′ Ω W )⁻¹ W′ F0 this becomes

( plim (1/n) F0′ W (W′ Ω W )⁻¹ W′ F0 )⁻¹



GMM
Asymptotic efficiency

Although Ω cannot be estimated consistently, the term (1/n) W′ Ω W can
be estimated consistently (we will do that later)
If Σ̂ is an estimator of (1/n) W′ Ω W, the optimal estimating equations are

(1/n) J′ W′ f (θ) = (1/n) F (θ)′ W Σ̂⁻¹ W′ f (θ) = 0

and the estimated covariance matrix of θ̂ is

Ĉov (θ̂) = n ( F̂′ W Σ̂⁻¹ W′ F̂ )⁻¹



GMM
Alternative notation

Attention
Many textbooks use a different notation
(and so does the gmm package in R)
The two approaches are equivalent
The moment conditions are notated as

E ( g (θ, yt ) ) = E ( Wt′ ft (θ, yt ) ) = 0

The number of moment conditions L can be larger than the number
of parameters K



GMM
Alternative notation

The L estimating equations cannot be solved exactly

ḡn (θ, y) = (1/n) Σ_{t=1}^{n} g (θ, yt ) = 0

The GMM estimator is defined by

θ̂ = arg min ḡn (θ, y)′ An ḡn (θ, y)

where An is a sequence of L × L weighting matrices
(which can be chosen by the user) with limit A



GMM
Alternative notation

The GMM estimator based on ḡn is consistent, θ̂ →p θ0
Asymptotic normality: Define the L × K matrix

G (θ) = ∂ḡn (θ, y) / ∂θ′ = (1/n) Σ_{t=1}^{n} ∂g (θ, yt ) / ∂θ′

Assume that √n ḡn (θ0 , y) →d N (0, V ); then [P]

√n ( θ̂ − θ0 ) →d N ( 0 , (G′AG)⁻¹ G′AVA′G (G′A′G)⁻¹ )

Asymptotically optimal weighting matrix A [P]



GMM
Equivalence

The two GMM approaches (based on ft and g ) are equivalent
The first order condition of ḡ (θ)′ A ḡ (θ) is

G′ A ḡ = 0,  with dimensions (K × L)(L × L)(L × 1) = (K × 1)

which is the same as

J′ W′ f = 0,  with dimensions (K × L)(L × n)(n × 1) = (K × 1)

List of equivalences [P]



GMM
Covariance matrix estimation

The covariance matrix of the elementary zero functions

E ( f (θ, y) f (θ, y)′ ) = Ω

is often unknown
There may be heteroskedasticity and autocorrelation in Ω
Although Ω cannot be estimated consistently, the term (1/n) W′ Ω W can
be estimated consistently



GMM
Covariance matrix estimation

Write

Σ = plim_{n→∞} (1/n) W′ Ω W

Assume that a suitable law of large numbers holds,

Σ = lim_{n→∞} (1/n) Σ_{t=1}^{n} Σ_{s=1}^{n} E ( ft fs Wt′ Ws )

where ft = ft (θ, yt )



GMM
Covariance matrix estimation

Define the autocovariance matrices

Γ(j) = (1/n) Σ_{t=j+1}^{n} E ( ft ft−j Wt′ Wt−j )      for j ≥ 0
Γ(j) = (1/n) Σ_{t=−j+1}^{n} E ( ft+j ft Wt+j′ Wt )     for j < 0

Then

Σ = lim_{n→∞} Σ_{j=−n+1}^{n−1} Γ(j) = lim_{n→∞} [ Γ(0) + Σ_{j=1}^{n−1} ( Γ(j) + Γ′(j) ) ]



GMM
Covariance matrix estimation

The autocovariance matrix Γ(j), j ≥ 0, can be estimated by

Γ̂(j) = (1/n) Σ_{t=j+1}^{n} f̂t f̂t−j Wt′ Wt−j

Newey-West estimator of Σ

Σ̂ = Γ̂(0) + Σ_{j=1}^{p} ( 1 − j/(p + 1) ) ( Γ̂(j) + Γ̂′(j) )
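For a scalar product vt = f̂t Wt the estimator above reduces to a weighted sum of empirical autocovariances. A sketch under that scalar assumption (the matrix case replaces the products vt vt−j by the corresponding outer products; the function name is illustrative):

```python
import random

def newey_west(v, p):
    """Newey-West long-run variance of a scalar series v_t.

    Gamma(j) = (1/n) * sum_{t=j+1}^{n} v_t * v_{t-j};
    Sigma = Gamma(0) + sum_{j=1}^{p} (1 - j/(p+1)) * 2 * Gamma(j).
    """
    n = len(v)

    def gamma(j):
        return sum(v[t] * v[t - j] for t in range(j, n)) / n

    sigma = gamma(0)
    for j in range(1, p + 1):
        sigma += (1 - j / (p + 1)) * 2 * gamma(j)
    return sigma

# for white noise the long-run variance equals the variance (here 1)
random.seed(2)
z = [random.gauss(0, 1) for _ in range(5000)]
sigma_hat = newey_west(z, p=4)
```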



GMM
Test of overidentifying restrictions

The GMM estimators minimize the criterion function

(1/n) f (θ)′ W Σ̂⁻¹ W′ f (θ)

Asymptotically, the minimized value (Hansen’s J statistic,
Hansen’s overidentification statistic, Hansen-Sargan statistic)
is distributed as χ²_{L−K} if the overidentifying restrictions hold
If the null hypothesis is rejected, then something went wrong,
e.g. the model is misspecified



Indirect inference
Basic idea

Anthony Smith, Jr. (New Palgrave Dictionary of Economics):


Indirect inference is a simulation-based method for estimating the
parameters of economic models. Its hallmark is the use of an auxiliary
model to capture aspects of the data upon which to base the estimation.
The parameters of the auxiliary model can be estimated using either the
observed data or data simulated from the economic model. Indirect
inference chooses the parameters of the economic model so that these two
estimates of the parameters of the auxiliary model are as close as possible.



Indirect inference
The true model

Economic model

yt = G (yt−1 , xt , ut ; β) , t = 1, . . . , T

Exogenous variables xt and endogenous variables yt


Random errors ut , i.i.d. with cdf F
Parameter vector β of dimension K
Let standard estimation methods for β be intractable
It must be possible (and easy) to simulate y1 , . . . , yT
given y0 (assumed to be known), x1 , . . . , xT and β



Indirect inference
The auxiliary model

The true model is too complicated for estimation of β


Instead estimate an auxiliary model with parameter vector θ
The dimension L of θ must be at least as large as the
dimension K of β
The auxiliary model must be
“suitable” (but is allowed to be misspecified)
easy and fast to estimate
Often, the auxiliary model is a standard time series model



Indirect inference
Estimating the auxiliary model

For given β (and y0 , x1 , . . . , xT ), the auxiliary model’s parameters θ
are estimated
1 from the observed data x1 , . . . , xT , y1 , . . . , yT ,
  resulting in estimator θ̂
2 from H simulated datasets x1 , . . . , xT , ỹ1(h) , . . . , ỹT(h) for h = 1, . . . , H,
  resulting in estimators θ̃(h) (β)
Define

θ̃(β) = (1/H) Σ_{h=1}^{H} θ̃(h) (β)



Indirect inference
Optimization

Compute the difference between the vectors θ̂ and θ̃(β)

Q(β) = ( θ̂ − θ̃(β) )′ W ( θ̂ − θ̃(β) )

where W is a positive definite weighting matrix
The indirect inference estimator of β is

β̂ = arg min Q(β)



Indirect inference
Remarks

The simulations have to be done with the same set of


random errors
Indirect inference is similar to GMM: the auxiliary parameters
are the “moments”
The asymptotic distribution of β̂ can be derived
(see Gourieroux et al., 1993)
The weighting matrix W can be chosen optimally



Indirect inference
A simple example (Gourieroux et al., 1993)

Consider the MA(1) process

yt = εt − βεt−1

with εt ∼ N(0, 1) and β = 0.5 for t = 1, . . . , 250


The maximum likelihood estimator β̂ML is not trivial
Indirect inference estimator β̂II of β ?
Auxiliary model: AR(3) with parameters θ
No weighting, the matrix W is the identity matrix



Indirect inference
A simple example (Gourieroux et al., 1993)

Compare the distribution of β̂ML and β̂II
Step 1: Simulate a time series y1 , . . . , y250
Step 2: Compute β̂ML
Step 3: Estimate θ̂ from y1 , . . . , y250
Step 4: For given β, simulate 10 paths ỹ1(h) , . . . , ỹ250(h)
Step 5: Estimate θ̃(β) from the simulated paths
Step 6: Repeat steps 4 and 5 for different β until the difference
between θ̂ and θ̃(β) is minimized
Step 7: Save β̂II and start again at step 1
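A toy version of steps 3–6 can be sketched as follows; for brevity it uses an AR(1) auxiliary model instead of the AR(3) above and a grid search instead of a numerical optimizer. The shocks for the simulated paths are drawn once and reused for every candidate β (common random numbers), as the remarks require:

```python
import random

def simulate_ma1(beta, eps):
    # y_t = eps_t - beta * eps_{t-1}
    return [eps[t] - beta * eps[t - 1] for t in range(1, len(eps))]

def ar1_coef(y):
    # OLS slope of y_t on y_{t-1}: the (one-dimensional) auxiliary parameter
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

random.seed(3)
T, H = 250, 10
y_obs = simulate_ma1(0.5, [random.gauss(0, 1) for _ in range(T + 1)])
theta_hat = ar1_coef(y_obs)  # auxiliary estimate from the "observed" data

# fixed set of shocks, reused for every candidate beta
eps_sim = [[random.gauss(0, 1) for _ in range(T + 1)] for _ in range(H)]

def criterion(beta):
    theta_tilde = sum(ar1_coef(simulate_ma1(beta, e)) for e in eps_sim) / H
    return (theta_hat - theta_tilde) ** 2

beta_hat = min([b / 100 for b in range(-99, 100)], key=criterion)
```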



Bootstrap
Basic idea

Point of departure: unknown distribution function F


(univariate or multivariate)
Unknown parameter vector

θ = θ(F )

Simple random sample X1 , . . . , Xn from F


Estimator
θ̂ = θ̂(X1 , . . . , Xn )
Why is the distribution of θ̂ of interest?



Bootstrap
Basic idea

Basic bootstrap idea: Approximate the unknown distribution of

θ̂(X1 , . . . , Xn ) for X1 , . . . , Xn i.i.d. from F

by the distribution of

θ̂(X1∗ , . . . , Xn∗ ) for X1∗ , . . . , Xn∗ i.i.d. from F̂

The distribution of θ̂ under F̂ is usually found by Monte-Carlo
simulations based on resamples (pseudo-samples)



Bootstrap
Basic idea

How is F estimated?
parametric −→ parametric bootstrap
nonparametric −→ nonparametric bootstrap
smoothed −→ smooth bootstrap
model based
Applications
bias and standard errors
confidence intervals
hypothesis tests



Bootstrap
Example 1

Nonparametric bootstrap of the standard error of

θ̂ = X̄ = (1/n) Σ_{i=1}^{n} Xi

Simple random sample X1 , . . . , X20
Estimation of the unknown cdf F by the empirical distribution
function

Fn (x) = (1/n) Σ_{i=1}^{n} 1 (Xi ≤ x)



Bootstrap
Example 1 (contd)

How is X̄ distributed under F ?


How is X̄ distributed under F̂ = Fn ?
Estimation of the distribution of X̄ under Fn
by Monte-Carlo simulation
Calculation of the standard deviation of X̄ under Fn
The distribution of X̄ under Fn is an approximation of the distribution
of X̄ under F



Bootstrap
Example 1 (still contd): The algorithm

1 Draw a random sample X1∗ , . . . , X20∗ from Fn (resampling)
2 Compute

  X̄ ∗ = (1/20) Σ_{i=1}^{20} Xi∗

3 Repeat steps 1 and 2 a large number B of times,
  save the results as X̄1∗ , . . . , X̄B∗
4 Compute the standard error bootex1.R

  SE (X̄ ) = √( (1/(B − 1)) Σ_{i=1}^{B} ( X̄i∗ − X̄·∗ )² ),
  where X̄·∗ = (1/B) Σ_{i=1}^{B} X̄i∗
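The four steps above can be sketched with the standard library only (the sample here is simulated for illustration):

```python
import random
import statistics

random.seed(4)
x = [random.gauss(10, 2) for _ in range(20)]  # the "observed" sample

B = 2000
means = []
for _ in range(B):
    resample = random.choices(x, k=len(x))   # draw from F_n (with replacement)
    means.append(statistics.fmean(resample))

se_boot = statistics.stdev(means)  # bootstrap standard error of the mean
```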



Bootstrap
Example 2

Parametric bootstrap of the bias of

θ̂ = λ̂ = 1/X̄

for the exponential distribution X ∼ Exp(λ)
Simple random sample X1 , . . . , X8
Estimation of the unknown distribution function F by

Fλ̂ (x) = 1 − exp ( −λ̂x )



Bootstrap
Example 2 (contd)

How is λ̂ distributed under F ?


How is λ̂ distributed under F̂ = Fλ̂ ?
Estimation of the distribution of λ̂ under Fλ̂
by Monte-Carlo simulation
Find the expectation of λ̂ under Fλ̂
The distribution of λ̂ under Fλ̂ approximates the distribution of λ̂
under F



Bootstrap
Example 2 (still contd): The algorithm

1 Compute λ̂ = 1/X̄ from X1 , . . . , X8
2 Draw a simple random sample X1∗ , . . . , X8∗ from Fλ̂
3 Compute λ̂∗ = 1/X̄ ∗
4 Repeat steps 2 and 3 a large number B of times,
  save the results as λ̂1∗ , . . . , λ̂B∗
5 Estimate the bias by bootex2.R

  (1/B) Σ_{b=1}^{B} λ̂b∗ − λ̂
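A sketch of this parametric bootstrap (data simulated for illustration; since E(λ̂) = nλ/(n − 1) for the exponential distribution with n = 8, the bias estimate should come out close to λ̂/7):

```python
import random

random.seed(5)
lam0 = 2.0
x = [random.expovariate(lam0) for _ in range(8)]
lam_hat = 1 / (sum(x) / len(x))  # step 1

B = 5000
boot = []
for _ in range(B):
    xs = [random.expovariate(lam_hat) for _ in range(8)]  # step 2: draw from F_lambda_hat
    boot.append(1 / (sum(xs) / len(xs)))                  # step 3

bias_hat = sum(boot) / B - lam_hat  # step 5: estimated (positive) bias
```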



Bootstrap
General approach for bootstrap standard errors

original sample X1 , . . . , Xn  −→  edf F̂ = Fn or F̂ = Fθ̂

  −→  1. resample: X1∗ , . . . , Xn∗ → θ̂1∗
       2. resample: X1∗ , . . . , Xn∗ → θ̂2∗
       ..
       B. resample: X1∗ , . . . , Xn∗ → θ̂B∗

  −→  SE (θ̂) = √( (1/(B − 1)) Σ_{b=1}^{B} ( θ̂b∗ − θ̄∗ )² ),
       where θ̄∗ is the average of the θ̂b∗



Bootstrap
Bootstrapping confidence intervals

General definition: An interval

[ θ̂low (X1 , . . . , Xn ) ; θ̂high (X1 , . . . , Xn ) ]

is called a (1 − α)-confidence interval if

P ( θ̂low ≤ θ ≤ θ̂high ) = 1 − α

If the equality holds only asymptotically, the interval is called an
asymptotic (1 − α)-confidence interval
Note: The interval limits are random variables



Bootstrap
Naive bootstrap confidence intervals

The naive confidence intervals are sometimes called the
“other” percentile method
Generate a large number (B) of resamples and compute θ̂1∗ , . . . , θ̂B∗
Let θ̂(1)∗ ≤ θ̂(2)∗ ≤ . . . ≤ θ̂(B)∗ be the order statistics
The naive (1 − α)-confidence interval is

[ θ̂((α/2)B)∗ ; θ̂((1−α/2)B)∗ ]

Why is this approach often problematic? bootnaiv.R



Bootstrap
Percentile bootstrap confidence intervals

To determine confidence intervals we look at the distribution of

θ̂ − θ

Let c1 and c2 be the α/2- and (1 − α/2)-quantiles, i.e.

P ( c1 ≤ θ̂ − θ ≤ c2 ) = 1 − α

Then

[ θ̂ − c2 ; θ̂ − c1 ]

is the (1 − α)-confidence interval



Bootstrap
Percentile bootstrap confidence intervals

Approximate the distribution of θ̂ − θ by bootstrapping

θ̂∗ − θ̂

Let c1∗ and c2∗ be the α/2- and (1 − α/2)-quantiles, i.e.

P ( c1∗ ≤ θ̂∗ − θ̂ ≤ c2∗ ) = 1 − α

We obtain c1∗ = θ̂((α/2)B)∗ − θ̂ and c2∗ = θ̂((1−α/2)B)∗ − θ̂ and

[ θ̂ − c2∗ ; θ̂ − c1∗ ] = [ 2θ̂ − θ̂((1−α/2)B)∗ ; 2θ̂ − θ̂((α/2)B)∗ ]



Bootstrap
Percentile bootstrap confidence intervals

Algorithm of the percentile method:
Compute θ̂ from the original sample X1 , . . . , Xn
Generate a large number B of resamples and compute θ̂1∗ , . . . , θ̂B∗
Let θ̂(1)∗ ≤ θ̂(2)∗ ≤ . . . ≤ θ̂(B)∗ be the order statistics
The bootstrap (1 − α)-confidence interval is

[ 2θ̂ − θ̂((1−α/2)B)∗ ; 2θ̂ − θ̂((α/2)B)∗ ]



Bootstrap
Example 3

Parametric bootstrap 0.95-confidence interval for λ of an exponential


distribution
Simple random sample X1 , . . . , X8
Estimate λ by λ̂ = 1/X̄
Estimate the unknown distribution function F by
 
Fλ̂ (x) = 1 − exp −λ̂x



Bootstrap
Example 3 (contd)

The algorithm bootex3.R
1 Compute λ̂ = 1/X̄ from X1 , . . . , X8
2 Draw a simple random sample X1∗ , . . . , X8∗ from Fλ̂
3 Compute λ̂∗ = 1/X̄ ∗
4 Repeat steps 2 and 3 a large number B of times,
  save the results as λ̂1∗ , . . . , λ̂B∗
5 The bootstrap 0.95-confidence interval is

  [ 2λ̂ − λ̂((1−α/2)B)∗ ; 2λ̂ − λ̂((α/2)B)∗ ]



Bootstrap
Hypothesis testing

Test the hypotheses

H0 : θ = θ0
H1 : θ ≠ θ0

at significance level α
Assumption: Random sample (univariate or multivariate)
Test statistic

T = θ̂ − θ0



Bootstrap
Hypothesis testing

Reject H0 if the value of the test statistic is less than the


α/2-quantile of T or greater than the (1 − α/2)-quantile of T
The p-value of the test is P(|T | > |t|)
How can we estimate the distribution of T under H0 ?
Wald approach: bootstrap distribution

T ∗ = θ̂∗ − θ̂

θ̂∗ = θ̂(X1∗ , . . . , Xn∗ ) is calculated from resamples drawn under the


alternative hypothesis



Bootstrap
Hypothesis testing

Lagrange multiplier approach: bootstrap distribution

T # = θ̂# − θ0

Attention: θ̂# = θ̂(X1# , . . . , Xn# ) is calculated from resamples drawn


under the null hypothesis!
This approach is particularly suitable for the parametric bootstrap
(but can also be used for other bootstraps)



Bootstrap
Hypothesis testing: General algorithm

1 Compute test statistic T from X1 , . . . , Xn


2 Draw a resample under the null hypothesis, X1# , . . . , Xn# , or draw a
resample under the alternative hypothesis, X1∗ , . . . , Xn∗
3 Compute the test statistic T ∗ or T # for the resample
4 Repeat steps 2 and 3 a large number B of times;
save the results as T1# , . . . , TB# or T1∗ , . . . , TB∗
5 Calculate the α/2-quantile c1# (or c1∗ ) and the
(1 − α/2)-quantile c2# (or c2∗ )
6 Reject H0 if the test statistic T is less than c1# (or c1∗ ) or greater
than c2# (or c2∗ )



Bootstrap
Example 4

Parametric bootstrap for the parameter λ of an exponential


distribution X ∼ Exp(λ)
Random sample X1 , . . . , X8
Hypotheses H0 : λ = λ0 = 2 against H1 : λ ≠ λ0
(at level α = 0.05)
Test statistic
T = λ̂ − 2
Bootstrap of the distribution of T under the alternative hypothesis
(Wald approach) bootex4a.R



Bootstrap
Example 4 (contd)

Bootstrap of the distribution of T under the null hypothesis
(LM approach) bootex4b.R
Under the null hypothesis, X# ∼ Exp(λ0 ) with λ0 = 2
Hence, the distribution of T# is found by an ordinary Monte-Carlo
simulation!
If T < T((α/2)B)# or T > T((1−α/2)B)# , reject H0



Bootstrap
Example 5

Nonparametric test for equality of two expectations
Two independent variables X and Y with expectations µX , µY
and unknown variances σX² , σY²
Hypotheses H0 : µX = µY against H1 : µX ≠ µY
Samples X1 , . . . , Xm and Y1 , . . . , Yn
Test statistic

T = ( µ̂X − µ̂Y ) / √( σ̂X² + σ̂Y² )



Bootstrap
Example 5 (contd)

Case I: resampling under the alternative hypothesis bootex5a.R
Draw X1∗ , . . . , Xm∗ with replacement from X1 , . . . , Xm
and Y1∗ , . . . , Yn∗ from Y1 , . . . , Yn
Compute the test statistic T∗
Repeat this B times; calculate the quantiles of T∗
Reject H0 at level α = 0.05 if T < T(0.025B)∗ or T > T(0.975B)∗



Bootstrap
Example 5 (still contd)

Case II: resampling under the null hypothesis bootex5b.R
Estimate the joint expectation by

µ̂ = ( m µ̂X + n µ̂Y ) / (n + m)

Translate X1 , . . . , Xm such that their mean is µ̂
Translate Y1 , . . . , Yn such that their mean is µ̂
Resample from the translated data (i.e. under the null hypothesis);
then continue as before
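Case II can be sketched as follows (data simulated for illustration); both samples are translated to the pooled mean µ̂ before resampling, so the resamples obey H0:

```python
import random
import statistics

def tstat(x, y):
    return (statistics.fmean(x) - statistics.fmean(y)) / \
        (statistics.variance(x) + statistics.variance(y)) ** 0.5

random.seed(7)
x = [random.gauss(0.0, 1.0) for _ in range(25)]
y = [random.gauss(0.0, 1.5) for _ in range(35)]
t_obs = tstat(x, y)

m, n = len(x), len(y)
mu = (m * statistics.fmean(x) + n * statistics.fmean(y)) / (n + m)
x0 = [v - statistics.fmean(x) + mu for v in x]  # mean shifted to mu
y0 = [v - statistics.fmean(y) + mu for v in y]

B = 2000
ts = sorted(tstat(random.choices(x0, k=m), random.choices(y0, k=n))
            for _ in range(B))
reject = t_obs < ts[int(0.025 * B) - 1] or t_obs > ts[int(0.975 * B) - 1]
```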



Bootstrap
Example 6

Nonparametric bootstrap test for independence
Bivariate distribution (X , Y )
Hypothesis H0 : X and Y are stochastically independent
Sample (X1 , Y1 ) , . . . , (Xn , Yn )
Test statistic: Empirical coefficient of correlation

T = Ĉorr (X , Y ) = Σi (Xi − X̄ )(Yi − Ȳ ) / √( Σi (Xi − X̄ )² · Σi (Yi − Ȳ )² )



Bootstrap
Example 6 (contd)

Resampling under the null hypothesis bootex6.R
Draw X1# , . . . , Xn# with replacement from X1 , . . . , Xn
Independently, draw Y1# , . . . , Yn# with replacement from Y1 , . . . , Yn
Bootstrap distribution of

T# = Ĉorr (X# , Y# )

Reject H0 if T < T(0.025B)# or T > T(0.975B)#



Bootstrap
Resampling methods: Parametric bootstrap

Parametric bootstrap under the alternative hypothesis


1 Estimate θ̂ from the original data X1 , . . . , Xn
2 The estimated distribution function is F̂ = Fθ̂
3 Draw X1∗ , . . . , Xn∗ from Fθ̂ and compute θ̂∗
4 Repeat step 3 a large number of times to determine the required
distribution



Bootstrap
Resampling methods: Parametric bootstrap

Parametric bootstrap under the null hypothesis


1 The estimated distribution function is F̂ = Fθ0 . If the distribution
function is not completely specified by θ0 , choose F̂ “as close as
possible” to θ̂
2 Draw X1# , . . . , Xn# from Fθ0 and compute θ̂#
3 Repeat step 2 a large number of times to determine the required
distribution



Bootstrap
Resampling methods: Nonparametric bootstrap

Nonparametric bootstrap under the alternative hypothesis


1 The estimated distribution function is F̂ = Fn
(empirical distribution function)
2 Draw X1∗ , . . . , Xn∗ with replacement from X1 , . . . , Xn
and compute θ̂∗
3 Repeat step 2 a large number of times to determine the required
distribution



Bootstrap
Resampling methods: Nonparametric bootstrap

Nonparametric bootstrap under the null hypothesis


1 The estimated distribution function F̂ is a weighted empirical
distribution function
2 Draw X1# , . . . , Xn# with replacement (but with different probabilities)
from X1 , . . . , Xn
The probabilities are chosen such that F̂ satisfies H0 . If not unique,
choose an optimality criterion, e.g. maximal entropy
3 Repeat step 2 a large number of times to determine the required
distribution



Bootstrap
Resampling methods: Smooth bootstrap

Smooth bootstrap under the alternative hypothesis
Kernel density estimation (e.g. with Gaussian kernel φ)

f̂X (x) = (1/(nh)) Σ_{i=1}^{n} φ ( (x − Xi ) / h )

Estimated distribution function F̂ (x) = ∫_{−∞}^{x} f̂X (z) dz
Draw X1∗ , . . . , Xn∗ from F̂ (x)



Bootstrap
Resampling methods: Smooth bootstrap

Drawing from F̂ (x) is equivalent to the following method:
1 Draw Z1 , . . . , Zn with replacement from X1 , . . . , Xn
2 Draw ε1 , . . . , εn from a standard normal distribution
3 For i = 1, . . . , n, compute

Xi∗ = Zi + hεi

Smooth bootstrap: nonparametric bootstrap with additional noise
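A sketch of this equivalent method (bandwidth h chosen ad hoc for illustration):

```python
import random
import statistics

def smooth_resample(data, h):
    # steps 1-3: Z_i drawn from the data, then X_i* = Z_i + h * eps_i
    return [random.choice(data) + h * random.gauss(0, 1) for _ in data]

random.seed(8)
x = [random.gauss(0, 1) for _ in range(50)]
xs = smooth_resample(x, h=0.3)  # one smooth-bootstrap resample
```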



Bootstrap
Warning

The bootstrap approximates the distribution of θ̂ (or some


transformations of θ̂) if the model is correctly specified
Bias due to misspecification cannot be found by bootstrapping!
Example: Errors-in-variables, omitted variables
The validity of the bootstrap approximation can usually be shown
only asymptotically, i.e. for B → ∞ and n → ∞
Experience shows that the bootstrap often yields good approximations
of the small-sample distribution of θ̂



Bootstrap
Regression

Simple linear regression model

yi = α + βxi + ui

for i = 1, . . . , n with i.i.d. error terms ui
Let E (ui |xi ) = 0 for all i = 1, . . . , n
The OLS estimator of β is

β̂ = Σ_{i=1}^{n} (xi − x̄ )(yi − ȳ ) / Σ_{i=1}^{n} (xi − x̄ )²



Bootstrap
Regression

The OLS estimator of α is α̂ = ȳ − β̂ x̄
Fitted values
ŷi = α̂ + β̂xi
Residuals
ûi = yi − ŷi
Estimated error term variance

σ̂² = (1/(n − 2)) Σ_{i=1}^{n} ûi²



Bootstrap
Regression

How can we construct a (1 − α)-confidence interval for β?
Usual approach: Normal approximation

[ β̂ − 1.96 · SE (β̂) ; β̂ + 1.96 · SE (β̂) ]

with standard error SE (β̂) = √( σ̂² / Σi (xi − x̄ )² )
Alternative method (1): bootstrap the residuals
Alternative method (2): bootstrap the observations (xi , yi )



Bootstrap
Regression

Bootstrap the residuals


The unknown distribution function F is the distribution function of
the error terms
The estimated distribution function F̂ is the (parametrically or
nonparametrically) estimated distribution function of the residuals
û1 , . . . , ûn
The x-values are kept constant
Only the error terms are resampled



Bootstrap
Regression

Algorithm (nonparametric) bootregr1.R

1 Estimate the model (α̂, β̂) from the data and calculate û1 , . . . , ûn
2 Draw a resample u1∗ , . . . , un∗ with replacement from û1 , . . . , ûn
3 For i = 1, . . . , n generate

yi∗ = α̂ + β̂xi + ui∗

4 Compute β̂ ∗ from (x1 , y1∗ ), . . . , (xn , yn∗ )


5 Proceed as usual
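A sketch of this residual bootstrap (data simulated with α = 1, β = 2 for illustration; the x-values stay fixed across resamples):

```python
import random
import statistics

def ols(x, y):
    # simple-regression OLS: returns (intercept, slope)
    xb, yb = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
        sum((xi - xb) ** 2 for xi in x)
    return yb - b * xb, b

random.seed(9)
n = 40
x = [i / 10 for i in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

a_hat, b_hat = ols(x, y)                                 # step 1
resid = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]

B = 1000
betas = []
for _ in range(B):
    us = random.choices(resid, k=n)                      # step 2
    ys = [a_hat + b_hat * xi + ui for xi, ui in zip(x, us)]  # step 3
    betas.append(ols(x, ys)[1])                          # step 4
se_beta = statistics.stdev(betas)  # e.g. a bootstrap standard error
```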



Bootstrap
Regression

Bootstrap of the observations


The unknown distribution function F is the joint distribution function
of (xi , yi )
The estimated distribution function F̂ is the (usually
nonparametrically) estimated multivariate distribution function of the
observations (x1 , y1 ), . . . , (xn , yn )
The x-values are different in each resample



Bootstrap
Regression

Algorithm bootregr2.R

1 Estimate β̂ from the data


2 Draw a resample (x1∗ , y1∗ ), . . . , (xn∗ , yn∗ ) with replacement from
(x1 , y1 ), . . . , (xn , yn )
3 Compute β̂ ∗ from (x1∗ , y1∗ ), . . . , (xn∗ , yn∗ )
4 Proceed as usual

