Chapter 1: Endogeneity in Single-Equation Linear Models
1.1 Introduction
So far in our models we have assumed that the set of explanatory variables, X, is exogenous. By exogeneity we mean, conceptually, that all variables in X are generated independently of the dependent variable y. For example, consider that we are trying to estimate the amount of money donated to charity by a household. Such an amount depends on the marginal tax rate that the individual is paying, since if the individual itemizes her deductions she has an incentive to give in order to pay lower taxes. But it is also clear that the marginal tax rate depends on the amount donated, because a large enough gift will bring the household to a lower tax bracket. Money donations and the marginal tax rate are thus generated in the same process, and the marginal tax rate is an endogenous variable if used as an independent variable in a model to explain money donations.
The consequence of all explanatory variables being exogenous is that the errors are independent of, and thus unrelated to, the regressors, i.e. E[ε_i | x_i] = 0. In the presence of endogenous regressors, the regressors provide information about the expected value of the errors, i.e. E[ε_i | x_i] = η_i. This implies that the errors and the regressors are correlated, i.e. E[x_i ε_i] = γ, where γ ≠ 0.
In the presence of endogenous regressors, the OLS estimator is biased and inconsistent, so the OLS estimate β̂_j no longer measures the marginal effect on y of the j-th regressor, x_j. The expected value of the OLS estimator is

E[β̂ | X] = β + (X′X)⁻¹X′η.    (1.1)
To understand this last point better, consider once again that we are trying to estimate how much money households give to charity. This amount is related to the level of social awareness of the individuals in the household. Unfortunately such a variable is not observable, so its effect is collected in the error, ε_i. The problem is that the level of social awareness is normally correlated with age, level of education, and other demographic variables that you would include as exogenous regressors in the model, but that would actually turn out to be endogenous. These regressors would be correlated with the errors, and the estimated coefficients on those regressors would measure not only the marginal effect of the regressors but also some of the effect caused by the unobservable level of social awareness, making the OLS estimates biased.
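A quick way to see this bias is through a small simulation. The following Stata sketch is purely illustrative: the data, variable names, and coefficients are hypothetical, not from the charity example.

    * u plays the role of the unobserved factor (e.g., social awareness).
    clear
    set seed 12345
    set obs 1000
    generate u = rnormal()                 // unobserved factor, absorbed into the error
    generate x = 0.8*u + rnormal()         // regressor correlated with the error
    generate y = 1 + 2*x + u + rnormal()   // the true marginal effect of x is 2
    regress y x                            // the estimated slope is biased away from 2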
The standard way to address the problem that endogeneity presents is through the use of instrumental variables (IV) estimators. Instrumental variables are a set of variables, Z, that satisfy the following two properties:

1. Exogeneity: the instruments are unrelated to the errors, i.e. E[z_i ε_i] = 0.

2. Relevance: the instruments are correlated with the endogenous regressors.
The difficulty is in finding actual valid instruments, which are sometimes very hard to come by. In the first example of an endogenous variable we considered, the marginal tax rate, a common approach is to use the “first dollar rate”, which is the marginal tax rate that the individual would face if she did not give any money to charity, and thus would not itemize her deductions. You can see how the second example presents a harder case, since it is difficult to find a variable that is correlated with social awareness, is not already included in the model, and could thus be used as an instrument for social awareness.
Before we consider the IV estimators we can use, let us set up the problem we are going to be considering. The model we want to estimate is

y = Xβ + ε.    (1.2)

Equation (1.2) is called the structural equation. Let X = (X1 X2), where X1 is the set of endogenous variables and X2 is the set of exogenous variables. We also assume that there is a matrix of exogenous variables Z = (Z1 X2) that are relevant.
1.2.1 Identification
Depending on the dimension of Z relative to X, i.e. how many instruments there are compared to regressors, we have three cases of identification.
If dim(Z) < dim(X), i.e. there are fewer instrumental variables than endogenous regressors, we have the not-identified, or underidentified, case. In this case there is no consistent IV estimator. This case often happens in practice, and there is very little you can do about it.
When dim(Z) = dim(X), i.e. the number of instruments equals the number of endogenous regressors, we have the just-identified case. In this case we use the IV estimator (next section).
Finally, when dim(Z) > dim(X), i.e. there are more instruments than endogenous regressors, we have the overidentified case. In this case we consider the two-stage least squares (2SLS) estimator.¹

¹ Cameron and Trivedi (2009, ch. 6) also covers the generalized method of moments (GMM) estimator. It does not, however, provide the estimate of the asymptotic variance with homoskedastic errors.
1.3 The IV Estimator

The IV estimator is used for the just-identified case, where dim(Z) = dim(X). Let us consider the exogeneity assumption E[z_i ε_i] = 0. From the structural equation (1.2), ε_i = y_i − x_i′β, so the exogeneity condition becomes E[z_i (y_i − x_i′β)] = 0. The sample analogue of this condition is (1/n) Z′(y − Xβ̂) = 0, and solving it for β̂ gives the IV estimator
β̂_IV = (Z′X)⁻¹Z′y.    (1.3)

Substituting the structural equation (1.2) for y in equation (1.3), the estimator can also be written as

β̂_IV = β + (Z′X)⁻¹Z′ε.    (1.4)
Equation (1.4) shows that the IV estimator is an unbiased estimator of the population's vector of coefficients. We now consider the variance of the estimator.
V[β̂_IV | Z] = E[(β̂_IV − β)(β̂_IV − β)′ | Z]
            = E[(Z′X)⁻¹Z′εε′Z(X′Z)⁻¹ | Z]
            = (Z′X)⁻¹Z′ E[εε′ | Z] Z(X′Z)⁻¹
            = σ²(Z′X)⁻¹Z′Z(X′Z)⁻¹.    (1.5)
The last equality uses the assumption of homoskedastic errors,³ that is, E[εε′ | Z] = σ²I. Letting ε̂ = y − Xβ̂_IV and σ̂² = ε̂′ε̂/n, equation (1.5) can be estimated by replacing σ² with σ̂².²

² See Greene (2012, pp. 226–227).
³ We will cover heteroskedasticity in the next chapter. Cameron and Trivedi (2009, p. 176) presents the robust estimates of the asymptotic variance for the different estimators.
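Before moving to the overidentified case, here is a minimal Stata/Mata sketch of equations (1.3) and (1.5). The variable names are taken from the assignment in Section 1.7 and are used purely for illustration; in practice you would rely on ivregress (Section 1.8) rather than this hand computation.

    use Endogeneity.dta, clear
    mata:
        y = st_data(., "ldrugexp")
        X = (st_data(., ("hi_empunion", "totchr", "age", "female", "blhisp", "linc")), J(st_nobs(), 1, 1))
        Z = (st_data(., ("ssiratio",    "totchr", "age", "female", "blhisp", "linc")), J(st_nobs(), 1, 1))
        b_iv = luinv(Z'X) * Z'y                       // equation (1.3)
        e    = y - X*b_iv                             // residuals
        s2   = (e'e) / rows(y)                        // sigma-hat squared
        V    = s2 * luinv(Z'X) * (Z'Z) * luinv(X'Z)   // equation (1.5) with sigma-hat squared
        b_iv, sqrt(diagonal(V))                       // coefficients and standard errors
    end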
1.4 2SLS

Consider now the case where we have more instruments than endogenous variables, so dim(Z) > dim(X). Let Z have L variables and X have K variables, where L > K. It is clear that β̂_IV is no longer valid because Z′X is now L × K, and since L > K, this matrix is not invertible. What can we do in this situation? One solution is to drop L − K variables from Z and use the IV estimator we covered in the previous section.
Even though there may be ways of making the elimination of those “extra” variables non-arbitrary, the 2SLS estimator is much more efficient. The estimator is

β̂_2SLS = [X′Z(Z′Z)⁻¹Z′X]⁻¹ X′Z(Z′Z)⁻¹Z′y.    (1.7)
As was the case with the IV estimator, the 2SLS is also an unbiased estimator of the
population’s vector of coefficients. To consider the variance of the 2SLS estimator, let
Q_Z = Z′Z, Q_ZX = Z′X, and Q_XZ = X′Z. The variance of the estimator can then be expressed as

V[β̂_2SLS | Z] = E[(β̂_2SLS − β)(β̂_2SLS − β)′ | Z]
             = E[(Q_XZ Q_Z⁻¹ Q_ZX)⁻¹ Q_XZ Q_Z⁻¹ Z′εε′Z Q_Z⁻¹ Q_ZX (Q_XZ Q_Z⁻¹ Q_ZX)⁻¹ | Z]
             = (Q_XZ Q_Z⁻¹ Q_ZX)⁻¹ Q_XZ Q_Z⁻¹ Z′ E[εε′ | Z] Z Q_Z⁻¹ Q_ZX (Q_XZ Q_Z⁻¹ Q_ZX)⁻¹
             = σ² (Q_XZ Q_Z⁻¹ Q_ZX)⁻¹ Q_XZ Q_Z⁻¹ Q_Z Q_Z⁻¹ Q_ZX (Q_XZ Q_Z⁻¹ Q_ZX)⁻¹
             = σ² (Q_XZ Q_Z⁻¹ Q_ZX)⁻¹.
Substituting back the values for the different Q matrices, we have that the variance of
the 2SLS estimator is
V[β̂_2SLS | Z] = σ² [X′Z(Z′Z)⁻¹Z′X]⁻¹.    (1.9)
The estimate of the errors is now ε̂ = y − Xβ̂_2SLS. Letting, once more, σ̂² = ε̂′ε̂/n, the estimate of equation (1.9) is

V̂[β̂_2SLS] = σ̂² [X′Z(Z′Z)⁻¹Z′X]⁻¹.    (1.10)
It can easily be shown that equations (1.7) and (1.9) simplify to equations (1.3) and
(1.5), respectively, when L = K, i.e. when Z and X have the same number of variables:
the just-identified case. You should check this on your own. This means that the 2SLS
estimator equals the IV estimator in the just-identified case.
The 2SLS estimator is called that way because it can be implemented as two successive OLS estimations, one in each step. The process is the following (a Stata sketch is given after the list):

1. Regress each variable in X on Z by OLS and keep the fitted values, collected in X̂ = Z(Z′Z)⁻¹Z′X.

2. Regress y on X̂ by OLS; the resulting coefficient vector is the 2SLS estimator.
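A minimal Stata sketch of the two steps, with hypothetical variable names (y: dependent variable, x1: endogenous regressor, w1 and w2: exogenous regressors, z1 and z2: instruments). Note that the standard errors of the manual second stage are not correct because they ignore that x1hat is itself estimated, so in practice you should use ivregress 2sls (see Section 1.8).

    regress x1 z1 z2 w1 w2   // stage 1: OLS of the endogenous variable on Z
    predict x1hat, xb        // fitted values of the endogenous variable
    regress y x1hat w1 w2    // stage 2: OLS of y on the fitted values and the exogenous regressors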
Let us check that the formula for the resulting estimator is the same as that in equation (1.7). As Greene (2012, p. 231) says, “if any column of X also appears in Z, then that column of X is reproduced exactly in X̂.” This means that you can think of X̂ as the prediction of a system of OLS estimations of X on Z. Let β̂_1S be the OLS estimate of such a system. Since it is an OLS regression, β̂_1S = (Z′Z)⁻¹Z′X, and X̂ = Zβ̂_1S = Z(Z′Z)⁻¹Z′X.
This means that the OLS estimate of the second stage is β̂_2S = (X̂′X̂)⁻¹X̂′y. Note that X̂′ = X′Z(Z′Z)⁻¹Z′, and X̂′X̂ = X′Z(Z′Z)⁻¹Z′X after some simplification.
Therefore, the second-stage OLS estimate is β̂_2S = [X′Z(Z′Z)⁻¹Z′X]⁻¹ X′Z(Z′Z)⁻¹Z′y, which is the same as equation (1.7).
1.5 Specification Tests

So far we have considered how OLS estimation with endogenous regressors is biased, and the estimators that can be used depending on how the model is identified. In this section we consider two specification tests. The first one tests for endogeneity of the regressors, and the second one tests the validity of the instruments in the overidentified case.
The first test, a Hausman test of endogeneity, compares the OLS and IV estimates. To understand this test, notice that in the presence of endogenous regressors the OLS coefficient estimator β̂ is inconsistent, while the IV and 2SLS estimators are consistent. However, if there are no endogenous regressors, all estimators are consistent, but the OLS estimator is much more efficient, i.e. it has lower standard errors. In this presentation I will refer to the IV estimator β̂_IV. The exposition and implementation are also valid for the 2SLS estimator, so you can think of β̂_IV as representing both the IV and the 2SLS estimators.
Under the null hypothesis that the regressors are exogenous, the test statistic is

H = (β̂ − β̂_IV)′ V̂[β̂_IV − β̂]⁻¹ (β̂ − β̂_IV).    (1.11)

The test statistic in equation (1.11) can be simplified. First, notice that

V̂[β̂_IV − β̂] = V̂[β̂_IV] + V̂[β̂] − 2Ĉ[β̂, β̂_IV],    (1.12)

where Ĉ[·] represents the estimate of the asymptotic covariance between the elements in the brackets.
Second, and paraphrasing Greene (2012, p. 235), “the covariance between an efficient estimator, β̂_E, of a parameter vector, β, and its difference from an inefficient estimator, β̂_I, of the same parameter vector, β̂_E − β̂_I, is zero.” This is saying that

Ĉ[β̂_E, β̂_E − β̂_I] = V̂[β̂_E] − Ĉ[β̂_E, β̂_I] = 0.    (1.13)
For our case under the null hypothesis, the efficient estimator is the OLS estimator, β̂_E = β̂, and the inefficient estimator is the IV estimator, β̂_I = β̂_IV. So applying equation (1.13) in equation (1.12), we have that

V̂[β̂_IV − β̂] = V̂[β̂_IV] − V̂[β̂].    (1.14)
Finally, using equation (1.14) in equation (1.11), we have the test statistic
H = (β̂ − β̂_IV)′ {V̂[β̂_IV] − V̂[β̂]}⁻¹ (β̂ − β̂_IV).    (1.15)
This test statistic is distributed χ²(J), where J is the number of endogenous variables you are testing for. To understand why, notice that in OLS we are assuming that there is no correlation between the explanatory variables and the errors. If there are J variables that are correlated with the errors, i.e. endogenous, OLS would in fact be placing restrictions on J correlations, forcing them to be zero.
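In Stata, the contrast in equation (1.15) can be computed with the hausman command. The sketch below uses the assignment's variable names purely for illustration:

    ivregress 2sls ldrugexp (hi_empunion = ssiratio) totchr age female blhisp linc
    estimates store iv
    regress ldrugexp hi_empunion totchr age female blhisp linc
    estimates store ols
    hausman iv ols, sigmamore     // chi2(J) statistic; sigmamore bases both variance
                                  // estimates on the efficient (OLS) estimator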
The test for endogeneity can be implemented as a two-step, regression-based Durbin-Wu-Hausman test. The first step is the same as in the 2SLS two-step process, i.e. run an OLS regression of each endogenous variable on the exogenous variables, Z, and estimate the errors. Group the estimated errors in a matrix Ê, and perform another OLS regression of the following augmented model:

y = Xβ + Êρ + ν.    (1.16)
Having estimated the augmented model by OLS, we test for the joint significance of ρ, i.e. whether the restrictions ρ = 0 hold. Notice that the number of elements of ρ is the number of variables whose endogeneity we are testing. Rejecting the null hypothesis means that at least one of the variables is endogenous, but it will not tell you which one. You can, of course, perform individual Wald tests of significance for the different ρ_j in ρ to see whether a particular variable is endogenous or not.
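A minimal sketch of this regression-based implementation in Stata, again with the assignment's variable names (one endogenous regressor and one instrument assumed):

    regress hi_empunion ssiratio totchr age female blhisp linc   // first step: endogenous variable on Z
    predict vhat, residuals                                      // estimated errors (a column of E-hat)
    regress ldrugexp hi_empunion totchr age female blhisp linc vhat
    test vhat                     // Wald test of rho = 0; rejection points to endogeneity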
The second test is a test of the overidentifying restrictions; its purpose is to see whether the instruments are valid. In the just-identified case it is not possible to test for the validity of the instrument, only whether there is endogeneity. In the overidentified case we can test whether the additional (L − K) instruments are valid. The idea behind the test is a comparison between the 2SLS estimator and the IV estimator of the just-identified version of the model. If all the instruments are valid, both estimates should only differ because of sampling error.
The null hypothesis of the test is that the orthogonality condition E [zi εi ] = 0 is true.
If the null cannot be rejected then the additional instruments are valid. If we can reject
the null hypothesis we can accept the alternative that at least one of the instruments is
not valid. Notice that the test does not tell us which is invalid, only that at least one is
invalid.
The test is based on the sample moment m̄ = (1/n) Σᵢ z_i ε̂_2SLS,i = (1/n) Z′ε̂_2SLS, which should be close to zero if the orthogonality condition holds. What we need, then, is an estimate of its variance. Under the assumptions of the model,

V[m̄] = (σ²/n²) Z′Z.    (1.18)

This can easily be estimated using a sample estimator of σ².
Greene (2012, p. 238) mentions that a more favored estimate of the variance is
V̂[m̄] = (1/n²) Σᵢ ε̂²_2SLS,i z_i z_i′.    (1.19)
The test for overidentification, under the assumption of homoskedastic errors, has a very easy implementation. First run the 2SLS model and keep the residuals, v̂. Then regress v̂ on Z; the test statistic is n × R², and it follows a χ² distribution with L − K degrees of freedom.
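A sketch of this nR² implementation in Stata, assuming the assignment's four instruments for one endogenous regressor (so L − K = 3 overidentifying restrictions):

    ivregress 2sls ldrugexp (hi_empunion = ssiratio lowincome multlc firmsz) totchr age female blhisp linc
    predict uhat, residuals
    regress uhat ssiratio lowincome multlc firmsz totchr age female blhisp linc
    display "nR2 = " e(N)*e(r2) ", p-value = " chi2tail(3, e(N)*e(r2))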
1.6 Weak Instruments

So far we have concentrated on testing the identification condition, i.e. whether the instruments are exogenous. However, for the instruments to be valid they must also be relevant, i.e. correlated with the endogenous variables. The relevance assumption implies that

plim (1/n) Z′X = Q_ZX, a finite, nonzero, L × K matrix with rank K.    (1.20)

Conceptually, given that the instruments are exogenous, if the condition in (1.20) is satisfied then the IV and 2SLS estimators are consistent. While this is true, consider the case of weak instruments, where (1/n)Z′X is close to zero (the null matrix). In principle the problem with weak instruments is that they produce a large asymptotic variance, so we lose precision. However, Nelson and Startz (1990b,a) and Hahn and Hausman (2003) list two implications:

1. the 2SLS estimator is badly biased toward the OLS estimator, which is known to be inconsistent; and

2. the standard first-order asymptotics will not give an accurate framework for statistical inference.

The problem of weak instruments therefore goes beyond a lack of precision. Bound et al. (1995) present some evidence that this problem extends beyond “small sample problems.”
In order to see whether we may be dealing with weak instruments, there are several things we can do. The simplest is to look at the correlations of the endogenous regressors with the instruments. This allows us to see which instruments have low correlations and may be trouble.
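For instance, with the assignment's variables this is a one-line check (illustrative):

    correlate hi_empunion ssiratio lowincome multlc firmsz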
When there is just one endogenous variable, a common check is based on the F statistic of the Wald test of joint significance of the instruments in the first-stage regression of 2SLS. The rule of thumb is that an F statistic of less than 10 indicates weak instruments.
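A sketch of this rule-of-thumb check in Stata, with the assignment's variable names (one endogenous regressor assumed):

    regress hi_empunion ssiratio lowincome multlc firmsz totchr age female blhisp linc   // first stage
    test ssiratio lowincome multlc firmsz   // first-stage F; a value below about 10 suggests weak instruments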
When we have several endogenous variables, checking each one of them separately with this method is not sufficient, because it does not account for the collinearity among the endogenous variables. Shea (1997) developed a partial R²_k statistic, where k indexes the endogenous regressor, for which Godfrey (1999) gives a simplified expression. The corresponding F statistic can be calculated as F = [R²_k/(L − 1)]/[(1 − R²_k)/(n − L)], assuming that Z has a constant.⁴
Stock and Yogo (2005) present two tests using the same statistic. The test statistic
depends on whether we have just one endogenous variable or more. If there is only
one endogenous variable, the test statistic is the F statistic from the Wald test of joint
significance of the instruments in the first stage regression we mentioned before. With
more than one endogenous variable, since there is more than one first-stage regression,
the test statistic is the minimum eigenvalue of a matrix analog of the F statistic that
is defined in Stock and Yogo (2005, p. 84) (the Cragg-Donald statistic from Cragg and
Donald (1993)).
⁴ See Greene (2012, p. 250).
The first test they propose asks whether the instruments are weak in the sense that the estimate will be highly biased. The critical values for this test are provided in Stock and Yogo (2005, Table 5.1, p. 100), and they depend on the number of endogenous variables, n, the number of exclusion restrictions (I understand this as the number of instruments used), K2, and the bias toleration you want with respect to the OLS estimator. For example, for a model with one endogenous variable (n = 1), three instruments (K2 = 3), and a 5% bias toleration over OLS, the critical value would be 13.91. We would then reject the null hypothesis that the instruments are weak because of the bias when the statistic discussed before (since n = 1, in this case it would be the F statistic) is larger than 13.91. Unfortunately the tables only provide critical values when the model has two or more overidentifying restrictions, i.e. L − K ≥ 2.
The second test they propose is related to how weak instruments can distort the size of Wald tests of significance on the parameters in finite samples. The critical values for this test are provided in Stock and Yogo (2005, Table 5.2, p. 101). This table gives critical values for K2 ≥ 1, so you can perform the test for both the just-identified and overidentified cases. The critical values are based on a level of significance of 5% for the Wald tests. We need to select how much distortion we want to tolerate on the Wald tests in order to set r. If we want a distortion of 5%, since the Wald test is assumed to be performed at the 5% level of significance, r = 5% + 5% = 10%. If we are willing to accept a distortion of 10%, then r = 5% + 10% = 15%. The other two numbers we need are K2 and n, which represent the same quantities as in the first test. So, for example, if we are willing to tolerate a distortion of 5%, and we have 1 endogenous variable and 2 instruments, the critical value would be 19.93. We would reject the null that the instruments are weak (in the sense that they can distort the size of the Wald tests of significance) if the test statistic is greater than 19.93.
1.7 Assignment
The dataset we are going to use for this assignment is the one used in Cameron and
Trivedi (2009, Ch. 6), an extract from the Medical Expenditure Panel Survey (MEPS)
of individuals 65 years and older who qualify for health care under the US Medicare
program. Medicare does not cover all medical expenses. For example, co-payments for
medical services and expenses of prescribed pharmaceutical drugs were not covered for
the time period studied here. About half of the eligible individuals therefore purchase supplementary insurance in the private market that provides coverage against various out-of-pocket expenses.
The files containing the data for this assignment are Endogeneity.dta (Stata format) and Endogeneity.txt (a comma-separated values text file with the names of the variables in the first row). Table 1.1 presents the description of the variables in the dataset.
For this assignment we want to estimate a model that explains the expenditure on prescription drugs of the individuals. Since the variable is highly skewed to the right even for non-zero values (you should check this), we are going to use the natural log of the variable for the model (ldrugexp). The explanatory variables are going to be hi_empunion, totchr, age, female, blhisp, and linc. The following are the tasks you are responsible for:
1. So you can compare to later estimates, run the OLS estimation of the model.
2. The variable hi_empunion is suspected to be endogenous. Calculate the sample correlation between hi_empunion and ssiratio. Does ssiratio seem to be a relevant instrument? Estimate an IV model using ssiratio as the instrument. Compare the values of the coefficients, standard errors, and 95% confidence intervals of the coefficients between this estimation and the OLS estimation you did in part 1. Test for endogeneity using the Durbin-Wu-Hausman implementation of the test. Does the test confirm that hi_empunion is endogenous? Perform the Stock and Yogo (2005) test for weak instruments (remember we have only one endogenous variable, so the test statistic here is the F statistic of the Wald test of joint significance of the instruments in the first-stage regression). Should we be concerned about using ssiratio as an instrument with respect to size (remember that we cannot test the bias in the just-identified case)?
3. The variables lowincome, multlc, and firmsz can also be used as instruments. Estimate the sample correlation between hi_empunion and each of these variables. Do these variables seem to be relevant instruments? Estimate a 2SLS model using all available instruments (including ssiratio and the ones mentioned here). Compare the values of the coefficients, standard errors, and 95% confidence intervals for the coefficients between this estimation and the OLS and IV estimations you did in parts 1 and 2, respectively. Test for endogeneity using the Durbin-Wu-Hausman implementation. Does hi_empunion seem to be endogenous? Test for the validity of the overidentifying instruments using the implementation of the test discussed here. Are the additional instruments valid? Finally, perform the Stock and Yogo (2005) test for weak instruments (remember we have only one endogenous variable, so the test statistic here is the F statistic of the Wald test of joint significance of the instruments in the first-stage regression). Should we be concerned about using ssiratio as an instrument with respect to bias and/or size?
As a general rule for this and all assignments, before doing any estimation requested
in the list above, you should describe the variables you are going to use in the analysis, and what type of relationship you expect a priori between the dependent variable and the independent variables. The writeup should have the following format:
a. Introduction Introduce the analysis you are going to perform. What’s the point of
the model you want the estimates for? What potential (yes, they are potential at
this stage) issues may the data have, and what are you going to do about them?
b. Data description Summary of the different variables to be used in the analysis. Analysis of the relationship between the dependent variable and each of the independent variables (in this case, of the structural model).
c. Estimations Present the estimations you have performed, and explain the results.
What coefficients are significant? What tests have you performed? Why have you
performed them? What do they show?
d. Conclusions Present the conclusions of your estimations here. Remember that even though the assignment may be helping you learn about a certain topic, the purpose of a model estimation is always the same: describing the dependent variable and making inferences about the population coefficients!
1.8 Stata Commands

The Stata command for performing linear IV estimations is ivregress. This command also allows for estimation through a generalized method of moments (GMM) estimator and a limited-information maximum likelihood estimator. For the purposes of the course, we will use the 2SLS estimator. Notice that for a just-identified case the 2SLS estimator is identical to the IV estimator, so in Stata we also use 2SLS for the just-identified case.
The tests of endogeneity and overidentification are easily performed after estimation of the model with the estat endogenous and estat overid commands, respectively. Cameron and Trivedi (2009, Section 6.3, p. 177) present a worked-out example of how to perform the estimations and tests in Stata. The example there corrects the estimates of the standard errors for heteroskedasticity, but since that is the topic of the next chapter, for now you can mimic what they do without adjusting for heteroskedasticity.
The post-estimation command estat firststage provides the F statistic for the joint significance of the instruments used in the first-stage regression and Shea (1997)'s partial R²_k statistics, as well as the minimum eigenvalue test statistic together with the critical values for both of the tests proposed by Stock and Yogo (2005).
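A worked sketch of the full sequence for the assignment's model; the exact specification is illustrative, and the file and variable names are those given in Section 1.7:

    use Endogeneity.dta, clear
    ivregress 2sls ldrugexp (hi_empunion = ssiratio lowincome multlc firmsz) totchr age female blhisp linc
    estat endogenous      // Durbin-Wu-Hausman tests of endogeneity
    estat overid          // test of the overidentifying restrictions
    estat firststage      // first-stage F, Shea partial R2, and Stock-Yogo critical values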
Bibliography
Bound, John, David A. Jaeger, and Regina M. Baker, “Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak,” Journal of the American Statistical Association, June 1995, 90 (430), 443–450.
Godfrey, Leslie G., “Instrument Relevance in Multivariate Linear Models,” The Re-
view of Economics and Statistics, August 1999, 81 (3), 550–552.
Greene, William H., Econometric Analysis, 7 ed., Upper Saddle River, NJ USA:
Prentice Hall, 2012.
Hahn, Jinyong and Jerry Hausman, “Weak Instruments: Diagnosis and Cures in
Empirical Econometrics,” The American Economic Review, May 2003, 93 (2), 118–
125.
Nelson, Charles R. and Richard Startz, “Some Further Results on the Exact Small Sample Properties of the Instrumental Variable Estimator,” Econometrica, July 1990, 58 (4), 967–976.
Stock, James H. and Motohiro Yogo, “Testing for Weak Instruments in Linear IV
Regression,” in Donald W. K. Andrews and James H. Stock, eds., Identification and In-
ference for Econometric Models: Essays in Honor of Thomas Rothenberg, Cambridge:
Cambridge University Press, 2005, pp. 80–108.