ECON7IE

Topic 3
Endogeneity
WHY IS THIS TOPIC IMPORTANT?
• We commonly need to estimate models where:
– One or more important factors cannot be measured
– Some of the data may be inaccurate
– There are multiple causal relationships, not just X → Y

• These are all examples of the presence of endogeneity

– Its effect on the reliability of regression model results is a key issue in empirical
research

• In this topic, we’ll learn what endogeneity is, how it affects the reliability of OLS results,
and what methods can be used to overcome it

Part 1

The Problem of Endogeneity

We consider the case of an endogenous explanatory variable, which arises when
one of the Classical Linear Regression Model assumptions is violated.
1.1 DEFINITION OF ENDOGENEITY
• Consider the regression model
𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + ⋯ + 𝛽𝑘 𝑋𝑘 + 𝑢
• If any 𝑋𝑗 is correlated with 𝑢 for any reason, then:
– 𝑋𝑗 is an endogenous explanatory variable
• Three key statistical / economic reasons why 𝑋𝑗 and 𝑢 may be correlated:
a) Omitted variables that are correlated with 𝑋𝑗
b) Measurement error in 𝑋𝑗
c) Simultaneity (or bi-directional causality) between 𝑋𝑗 and 𝑌

• We will:
– Try to identify sources of endogeneity in models
– Derive expressions for the consequences of endogeneity
– See how we can estimate models to overcome this issue
a) Omitted variable
• An important explanatory variable is omitted from the regression
– And it is correlated with any of the included X variables
• Why might a variable be omitted from a model?
• E.g.
𝑒𝑎𝑟𝑛𝑖𝑛𝑔𝑠 = 𝛽1 + 𝛽2 𝑦𝑟𝑠𝑐ℎ𝑜𝑜𝑙 + 𝛽3 𝑎𝑏𝑖𝑙𝑖𝑡𝑦 + ⋯ + 𝑢

b) Measurement error
• An explanatory variable is measured with error i.e. is inaccurate:
– Some variables are inherently difficult to measure e.g. income
– May need to use a proxy when true variable is unavailable
• E.g.
𝑙𝑒𝑖𝑠𝑢𝑟𝑒 𝑡𝑖𝑚𝑒 = 𝛽1 + 𝛽2 ℎℎ𝑖𝑛𝑐𝑜𝑚𝑒 + 𝛽3 𝑚𝑎𝑙𝑒 + ⋯ + 𝑢
c) Simultaneity
• One (or more) explanatory variables are jointly determined with Y
– i.e. X affects Y, and Y affects X
• Common in macro models
• Also occurs with many other complex economic processes
• E.g. effect of inflation on trade openness:
𝑜𝑝𝑒𝑛𝑛𝑒𝑠𝑠 = 𝛽1 + 𝛽2 𝑖𝑛𝑓𝑙𝑎𝑡𝑖𝑜𝑛 + 𝛽3 𝑙𝑛𝑝𝑐𝑖𝑛𝑐 + 𝛽4 𝑙𝑛𝑙𝑎𝑛𝑑 + 𝑢
𝑖𝑛𝑓𝑙𝑎𝑡𝑖𝑜𝑛 = 𝛼1 + 𝛼2 𝑜𝑝𝑒𝑛𝑛𝑒𝑠𝑠 + 𝛼3 𝑙𝑛𝑝𝑐𝑖𝑛𝑐 + 𝑣
• Possible to suspect/identify simultaneity even when only given one equation:
– If we suspect feedback from Y to X
• All demand and supply models suffer from simultaneity:
– Equilibrium price and quantity are determined simultaneously
– Through the interaction of demand and supply
Class Exercise Question 1
• Identify the source of endogeneity related to the first X variable in each
of the following models:
a) Omitted variables that are correlated with 𝑋𝑗
b) Measurement error in 𝑋𝑗
c) Simultaneity (or bi-directional causality) between 𝑋𝑗 and 𝑌
• In some cases, more than one source may apply!

1. 𝑚𝑢𝑟𝑑𝑒𝑟 𝑟𝑎𝑡𝑒 = 𝛽1 + 𝛽2 𝑝𝑜𝑙𝑖𝑐𝑒 + 𝛽3 𝑖𝑛𝑐𝑜𝑚𝑒 + 𝑢

2. 𝑒𝑚𝑝𝑙𝑜𝑦𝑒𝑑 = 𝛽1 + 𝛽2 𝑖𝑚𝑚𝑖𝑔_𝑠𝑡𝑎𝑡𝑢𝑠 + 𝛽3 𝑒𝑑𝑢𝑐𝑎𝑡𝑖𝑜𝑛 + 𝑢
3. ℎ𝑒𝑎𝑙𝑡ℎ_𝑠𝑡𝑎𝑡𝑢𝑠 = 𝛽1 + 𝛽2 𝑖𝑛𝑐𝑜𝑚𝑒 + 𝛽3 𝑎𝑔𝑒 + ⋯ + 𝑢
4. 𝑔𝑟𝑜𝑤𝑡ℎ = 𝛽1 + 𝛽2 𝑖𝑛𝑠𝑡𝑖𝑡𝑢𝑡𝑖𝑜𝑛𝑎𝑙_𝑞𝑢𝑎𝑙𝑖𝑡𝑦 + 𝛽3 𝑐𝑎𝑝𝑖𝑡𝑎𝑙 + 𝛽4 𝑙𝑎𝑏𝑜𝑢𝑟 + 𝑢
5. 𝑙𝑛ℎ𝑤𝑎𝑔𝑒 = 𝛽1 + 𝛽2 𝑦𝑟𝑠𝑐ℎ𝑜𝑜𝑙 + 𝛽3 𝑒𝑥𝑝 + ⋯ + 𝑢
6. 𝑞𝑢𝑎𝑛𝑡𝑖𝑡𝑦𝑇𝑉𝑠 = 𝛽1 + 𝛽2 𝑝𝑟𝑖𝑐𝑒𝑇𝑉𝑠 + 𝛽3 𝑖𝑛𝑐𝑜𝑚𝑒 + ⋯ + 𝑢
1.2 SUMMARY THUS FAR
• Endogeneity is present in a lot of models

• We need to be able to:

– Explain its source ✓
– Understand its effect on our ability to estimate reliable parameters
– Correct any resulting econometric problems

1.3 STANDARD ASSUMPTIONS FOR THE
CLASSICAL LINEAR REGRESSION MODEL (CLRM)
• These assumptions are required:
– For OLS estimators to be unbiased estimators of population parameters

• Assumptions relate to statistical properties of estimators:

– Somewhat abstract!
– Describe properties of estimators when random sampling is done repeatedly
– Have nothing to do with a particular sample
– i.e. not meaningful to discuss properties of estimates obtained from a single sample

• Assumption CLRM1:
– The model is linear in the parameters
• Assumption CLRM2:
– The dataset is a random sample drawn from the population
• Assumption CLRM3:
– There is no perfect multicollinearity
• Assumption CLRM4:
– The error terms must be uncorrelated with all the X variables
– i.e. there is no endogeneity

• When CLRM4 holds: we have exogenous explanatory variables

• But if any 𝑋𝑗 is correlated with 𝑢 for any reason, then 𝑋𝑗 is an endogenous explanatory
variable

Assumption CLRM4: Zero conditional mean
𝐸(𝑢 | 𝑋2, 𝑋3, …, 𝑋𝑘) = 0
or
𝑐𝑜𝑣(𝑢, 𝑋𝑗) = 0, 𝑗 = 2, …, 𝑘
• CLRM4 is more likely to hold when fewer factors are in the error term
– i.e. When the model is better specified
• BUT CLRM4 can fail due to three sources discussed previously
• We cannot know for sure whether the average value of the unobserved factors is
unrelated to the explanatory variables.
• But this is the most important assumption:

Exogeneity is the key assumption that enables a causal interpretation of the regression results.
WHY?
1.4 RESULT: CONSISTENCY OF OLS
Under assumptions CLRM1-CLRM4:
OLS estimator 𝑏𝑗 is consistent for 𝛽𝑗 for all 𝑗 = 2, … , 𝑘

• What is consistency?
– It is an asymptotic or large sample property
– Let 𝑏𝑗 be the OLS estimator of 𝛽𝑗 for some j.
– For each N, 𝑏𝑗 has a probability distribution (representing its possible values in
different random samples of size N).
– If this estimator is consistent, then the distribution of 𝑏𝑗 becomes more and more
tightly distributed around 𝛽𝑗 as the sample size grows.
– As N tends to infinity, the distribution of 𝑏𝑗 collapses to the single point 𝛽𝑗 :

𝑝𝑙𝑖𝑚 𝑏𝑗 = 𝛽𝑗
Say: 𝛽𝑗 is the probability limit of 𝑏𝑗
[Fig C3. Sampling distributions of 𝑏𝑗 for increasing sample sizes (N = 4, 16, 40).
Horizontal axis: 𝑏𝑗, with 𝛽𝑗 marked; vertical axis: 𝑓(𝑏𝑗). Source: Wooldridge (2013)]
Why does consistency matter?
• Virtually all economists agree:
– consistency is a minimal requirement for an estimator

• Nobel Prize-winning econometrician Clive W. J. Granger:

– “If you can’t get it right as N goes to infinity, you shouldn’t be in this business.”

Showing the consistency of OLS
• In general, we need matrix algebra to show this.
• But, we can illustrate it for a simple model with a single X
• The formula (estimator) for the slope coefficient is given by:
$$b_2 = \frac{\sum_{i=1}^{N} (X_{i2} - \bar{X}_2)\, Y_i}{\sum_{i=1}^{N} (X_{i2} - \bar{X}_2)^2}$$
• Substituting 𝑌𝑖 = 𝛽1 + 𝛽2 𝑋𝑖2 + 𝑢𝑖 and rearranging gives:
$$b_2 = \beta_2 + \frac{\frac{1}{N}\sum_{i=1}^{N} (X_{i2} - \bar{X}_2)\, u_i}{\frac{1}{N}\sum_{i=1}^{N} (X_{i2} - \bar{X}_2)^2}$$
• By the law of large numbers, the numerator and denominator converge in probability to
𝑐𝑜𝑣(𝑋2, 𝑢) and 𝑣𝑎𝑟(𝑋2), i.e.
$$\operatorname{plim} b_2 = \beta_2 + \frac{cov(X_2, u)}{var(X_2)} = \beta_2 \quad \text{because } cov(X_2, u) = 0 \text{ under CLRM4}$$
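The convergence described above can be illustrated with a small simulation. This is only a sketch in Python, not part of the original slides: the parameter values (𝛽1 = 1, 𝛽2 = 2), the normal distributions and the helper names are all assumptions chosen for illustration. It draws many samples in which X2 is independent of u and reports how the spread of b2 across samples shrinks as N grows.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x: sum of cross-deviations / sum of squared deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate_slope(n, beta1=1.0, beta2=2.0, seed=0):
    """Draw one sample of size n with X2 independent of u (so CLRM4 holds) and return b2."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta1 + beta2 * xi + ui for xi, ui in zip(x, u)]
    return ols_slope(x, y)

def spread_of_slope(n, reps=200):
    """Standard deviation of b2 across `reps` independent samples of size n."""
    draws = [simulate_slope(n, seed=s) for s in range(reps)]
    mean = sum(draws) / reps
    return (sum((d - mean) ** 2 for d in draws) / reps) ** 0.5

# Sample sizes 4, 16 and 40 mirror Fig C3; 1000 is added to show the tightening continue.
for n in (4, 16, 40, 1000):
    print(n, round(spread_of_slope(n), 3))
```

The printed spread should fall as N rises, which is the sense in which the sampling distribution of b2 collapses towards 𝛽2.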
1.5 CONSEQUENCE OF VIOLATING ASSUMPTION CLRM4
• Given that:
$$\operatorname{plim} b_2 = \beta_2 + \frac{cov(X_2, u)}{var(X_2)} \qquad (1.1)$$
• Then the inconsistency (or asymptotic bias) is:
$$\operatorname{plim} b_2 - \beta_2 = \frac{cov(X_2, u)}{var(X_2)}$$
– If 𝑐𝑜𝑣(𝑋2, 𝑢) = 0: OLS is consistent and unbiased
– If 𝑐𝑜𝑣(𝑋2, 𝑢) < 0: OLS is inconsistent and biased downwards
– If 𝑐𝑜𝑣(𝑋2, 𝑢) > 0: OLS is inconsistent and biased upwards

• If the covariance is small, the inconsistency might be negligible

– But we cannot estimate 𝑐𝑜𝑣 𝑋2 , 𝑢 since 𝑢 is unobserved
• We need to use our knowledge of the relationship being estimated
• We will examine each of the three potential causes of endogeneity
• i.e. of violating assumption CLRM4
1. Omitted variables
2. Measurement error
3. Bidirectional causality (simultaneity)

• We will look at:

– Why is 𝑢 correlated with 𝑋𝑗 in each case?
– What is the nature of the resulting asymptotic bias in each case?
– What is the general econometric method of solving the endogeneity issue?
• Instrumental variables

2. OMITTED VARIABLES
• Suppose the true model is:
𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + 𝑢
• but instead we estimate:
𝑌 = 𝑏1 + 𝑏2 𝑋2 + 𝑣
– E.g. 𝑌 is earnings, 𝑋2 is years of education, and 𝑋3 is ability
– Does 𝑏2 measure the true return to education, 𝛽2 ?
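• From eq.(1.1), where the error term of the estimated model is 𝑣 = 𝛽3 𝑋3 + 𝑢:
$$\begin{aligned}
\operatorname{plim} b_2 &= \beta_2 + \frac{cov(X_2, v)}{var(X_2)} \\
&= \beta_2 + \frac{cov(X_2, \beta_3 X_3 + u)}{var(X_2)} \\
&= \beta_2 + \frac{cov(X_2, \beta_3 X_3) + cov(X_2, u)}{var(X_2)} \\
&= \beta_2 + \beta_3 \frac{cov(X_2, X_3)}{var(X_2)} \qquad \text{since } cov(X_2, u) = 0
\end{aligned}$$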
• Therefore 𝑏2 is asymptotically unbiased only if either:
➢ 𝛽3 = 0 (i.e. there is no omitted variable), or
➢ 𝑋2 and 𝑋3 are uncorrelated.
• If neither of these two occurs, then b2 is biased and inconsistent,
– direction of asymptotic bias depends on sign of 𝛽3 𝑐𝑜𝑣 𝑋2 , 𝑋3 .
• In the example:
𝑒𝑎𝑟𝑛𝑖𝑛𝑔𝑠 = 𝛽1 + 𝛽2 𝑦𝑟𝑠𝑐ℎ𝑜𝑜𝑙 + 𝛽3 𝑎𝑏𝑖𝑙𝑖𝑡𝑦 + 𝑢
– what is the direction of the bias of the return to education, when ability is
unobserved?
• Determining direction of bias is more complex with multiple Xs:
– Depends on their relationships with each other and with the omitted factor
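The omitted-variable formula can be checked numerically. The following Python sketch is illustrative only: the parameter values, the data-generating process and the helper `cov` are assumptions, not taken from the slides. In particular it assumes that ability raises earnings (𝛽3 > 0) and that ability and schooling are positively correlated, and then compares the slope from the short regression with 𝛽2 + 𝛽3 cov(X2, X3) / var(X2).

```python
import random

def cov(a, b):
    """Sample covariance (dividing by n); cov(a, a) is the sample variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

rng = random.Random(1)
n = 50_000
beta1, beta2, beta3 = 1.0, 0.5, 1.0   # assumed "true" parameters

ability = [rng.gauss(0, 1) for _ in range(n)]
# Assumption: schooling rises with ability, so cov(yrschool, ability) > 0
yrschool = [12 + 1.0 * a + rng.gauss(0, 1) for a in ability]
earnings = [beta1 + beta2 * s + beta3 * a + rng.gauss(0, 1)
            for s, a in zip(yrschool, ability)]

# Short regression: earnings on yrschool only (ability omitted)
b2_short = cov(yrschool, earnings) / cov(yrschool, yrschool)

# Value implied by the omitted-variable formula, using sample moments
b2_formula = beta2 + beta3 * cov(yrschool, ability) / cov(yrschool, yrschool)

print(round(b2_short, 3), round(b2_formula, 3), beta2)
```

Under these assumed values the short-regression slope settles near the formula value rather than near 𝛽2. A different sign for 𝛽3 or for cov(X2, X3) would change the direction of the asymptotic bias.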

Now try Exercise 3, Question 2!

3. MEASUREMENT ERROR
• Suppose that the true model is given by
𝑌 = 𝛽1 + 𝛽2 𝑋2∗ + 𝑢
• But 𝑋2∗ cannot be measured accurately: we only have an imperfect measure 𝑋2
– E.g. 𝑋2∗ is actual income, but 𝑋2 is reported income
• What is the effect on our ability to estimate 𝛽2 ?
• The measurement error in the population is simply
𝑒2 = 𝑋2 − 𝑋2∗
• We make the classical errors-in-variables (CEV) assumption: the measurement error is
uncorrelated with the true (unobserved) 𝑋2∗
• Simplify eq.(1.1) for 𝑝𝑙𝑖𝑚 𝑏2 , using various properties of variance and covariance in this
context, to become:
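$$\operatorname{plim} b_2 = \beta_2 + \frac{cov(X_2,\, u - \beta_2 e_2)}{var(X_2)} = \beta_2\, \frac{var(X_2^*)}{var(X_2^*) + var(e_2)}$$
– Here 𝑢 − 𝛽2 𝑒2 is the error term obtained when 𝑋2∗ = 𝑋2 − 𝑒2 is substituted into the true model
• In the presence of measurement error:
– 𝑣𝑎𝑟(𝑋2∗) is the ‘signal’, i.e. the true information contained in 𝑋2∗
– 𝑣𝑎𝑟(𝑒2) is the ‘noise’, i.e. the measurement error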

• Therefore, the OLS estimate 𝑏2 is biased towards zero (this is called attenuation bias).
– The larger the degree of measurement error, the greater is the attenuation bias.

• Issue is more complex in models with multiple Xs:

– Generally, measurement error in a single variable causes inconsistency in all
estimators
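Attenuation bias can be illustrated in the same way. This Python sketch is not from the slides: the parameter values and the equal signal and noise variances are assumptions made for illustration, and the measurement error is generated independently of the true regressor so that the CEV assumption holds by construction.

```python
import random

def cov(a, b):
    """Sample covariance (dividing by n); cov(a, a) is the sample variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

rng = random.Random(2)
n = 50_000
beta1, beta2 = 1.0, 2.0                      # assumed "true" parameters

x_true = [rng.gauss(0, 1) for _ in range(n)]  # X2*, with var(X2*) = 1
e2 = [rng.gauss(0, 1) for _ in range(n)]      # measurement error, var(e2) = 1
x_obs = [xs + e for xs, e in zip(x_true, e2)]  # observed X2 = X2* + e2
y = [beta1 + beta2 * xs + rng.gauss(0, 1) for xs in x_true]

# OLS slope using the mismeasured regressor
b2 = cov(x_obs, y) / cov(x_obs, x_obs)

# Signal share var(X2*) / (var(X2*) + var(e2)) under the assumed variances
signal_share = 1.0 / (1.0 + 1.0)

print(round(b2, 3), beta2 * signal_share)
```

With equal signal and noise variances the estimated slope sits near half of the assumed 𝛽2, which is the shrinkage towards zero described above.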
4. SIMULTANEITY
• Simultaneity arises when some of the Xs are jointly determined with
the dependent variable in the same economic model.
– There is bidirectional causality between X and Y
• We should view the equation we are interested in estimating as part of a system of
relationships:
– multiple causal relationships.

• Some examples:
– Models of demand and supply i.e. market equilibrium
• For commodities
• For an input into production e.g. labour
– Models of the macroeconomy
Example 1: Demand and supply
• Consider a system of supply and demand for a commodity:
Demand: Q = α1 + α2P + α3Y + u1 (4.1)
Supply: Q = β1 + β2P + u2 (4.2)
• In equilibrium, equate demand and supply:
α1 + α2P + α3Y + u1 = β1 + β2P + u2
α2P − β2P = β1 + u2 − (α1 + α3Y + u1)
P(α2 − β2) = β1 − α1 − α3Y + u2 − u1
$$P = \frac{\beta_1 - \alpha_1}{\alpha_2 - \beta_2} - \frac{\alpha_3}{\alpha_2 - \beta_2}\, Y + \frac{u_2 - u_1}{\alpha_2 - \beta_2} \qquad (4.3)$$
– The first term is the intercept, −α3/(α2 − β2) is the slope on Y, and the last term is the error term
• Thus P is a function of u1: i.e. the X variable is correlated with the error term
• P is an endogenous explanatory variable
– It is simultaneously determined with Q
– Cannot meaningfully estimate (4.1) using OLS: the estimator of α2 will be inconsistent.
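A short simulation shows the correlation between P and the structural errors. This Python sketch is illustrative and not from the slides: all parameter values and distributions are assumptions. It generates the equilibrium price from (4.3), builds quantity from the supply equation, and then checks that P is correlated with both errors. For simplicity it runs OLS on the supply equation, which has a single regressor, rather than on the demand equation.

```python
import random

def cov(a, b):
    """Sample covariance (dividing by n); cov(a, a) is the sample variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

rng = random.Random(4)
n = 50_000
a1, a2, a3 = 10.0, -1.0, 0.5   # demand parameters alpha_1..alpha_3 (assumed values)
b1, b2 = 2.0, 1.0              # supply parameters beta_1, beta_2 (assumed values)

income = [rng.gauss(10, 2) for _ in range(n)]   # exogenous Y
u1 = [rng.gauss(0, 1) for _ in range(n)]        # demand error
u2 = [rng.gauss(0, 1) for _ in range(n)]        # supply error

# Equilibrium price from eq. (4.3), then quantity from the supply equation (4.2)
price = [(b1 - a1 - a3 * y + e2 - e1) / (a2 - b2)
         for y, e1, e2 in zip(income, u1, u2)]
quantity = [b1 + b2 * p + e2 for p, e2 in zip(price, u2)]

cov_p_u1 = cov(price, u1)   # non-zero: P depends on the demand error
cov_p_u2 = cov(price, u2)   # non-zero: P depends on the supply error

# OLS of Q on P alone (the supply equation) does not recover beta_2
ols_supply_slope = cov(price, quantity) / cov(price, price)

print(round(cov_p_u1, 3), round(cov_p_u2, 3), round(ols_supply_slope, 3), b2)
```

Under these assumed values the OLS slope settles well below the assumed β2 = 1, consistent with eq. (1.1) when cov(P, u2) < 0.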
4.1 SIMULTANEITY BIAS

• Demand and supply equations (4.1) and (4.2) are known as structural equations:
– They describe the structure of the economy:
• Derivable from economic theory
• Have a causal interpretation
• In the structural equations:
– Price and quantity are determined simultaneously:
• price affects quantity and quantity affects price
– P and Q are endogenous variables, while Y is exogenous
– Estimation by OLS will lead to biased and inconsistent coefficient estimates
• Explanatory variables are correlated with error term
• Determining the direction of the bias is generally complicated in models with multiple X
variables.
Avoiding simultaneity bias
• Equations such as (4.3) are known as reduced form equations:
– Endogenous variables are expressed as a function only of all exogenous variables
(and a constant)
– Can derive a similar equation for Q
• Write the reduced form equations as:
𝑃 = 𝜋11 + 𝜋21 𝑌 + 𝑣1 (4.3a)
𝑄 = 𝜋12 + 𝜋22 𝑌 + 𝑣2 (4.4)
• These reduced form equations can be estimated by OLS:
– All the RHS variables are exogenous
• But:
– We don’t care about the values of the 𝜋 parameters
– The parameters of interest are 𝛼1 , 𝛼2 and 𝛼3 , and 𝛽1 and 𝛽2 (from the structural
equations)
4.2 IDENTIFICATION OF STRUCTURAL EQUATIONS
• In OLS, we can identify the value of the parameters if:
– each explanatory variable is uncorrelated with the error term
• This condition does not hold when there is endogeneity

• We can sometimes still identify (or consistently estimate) the parameters in a structural
equation
– Similarly for cases of omitted variables or measurement error.

• Do we have enough information to retrieve the original coefficients (𝛼s and 𝛽s) from
the 𝜋s?
– Answer depends on having additional exogenous variables
– i.e. exogenous variables that are not in the equation of interest
Three possible situations
1. An equation is unidentified
– We cannot get the structural coefficients from the reduced form estimates
– E.g. the demand equation Q = α1 + α2P + α3Y + u1
– There are no additional exogenous variables in the model

2. An equation is exactly identified

– We can get unique structural form coefficient estimates
– E.g. the supply equation Q = β1 + β2P + u2

3. An equation is over-identified
– More than one set of structural coefficients could be obtained from the reduced form
– Example given later
Condition for a structural equation to be identified
• A structural equation satisfies the order condition if:
– number of exogenous variables excluded from the equation is
– at least as large as the number of right-hand side endogenous variables
• This is a necessary (but not sufficient) condition for identification
• The rank condition is a sufficient condition
– but requires matrix algebra: beyond scope of this module
• Express the order condition as:
K − k0 ≥ m0
• where K = total no. of exogenous variables in the equation system (i.e. overall model)
k0 = no. of exogenous variables in the structural equation
m0 = no. of endogenous variables on RHS of structural equation
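The counting rule can be written as a small helper. This is an illustrative Python sketch: the function name `order_condition` and the example inputs are hypothetical and not from the slides. It checks only the order condition, which is necessary but not sufficient, and says nothing about the rank condition.

```python
def order_condition(K, k0, m0):
    """Classify a structural equation using the order condition K - k0 >= m0.

    K  : exogenous variables in the whole system
    k0 : exogenous variables included in this equation
    m0 : endogenous variables on the right-hand side of this equation
    """
    excluded = K - k0  # exogenous variables excluded from this equation
    if excluded < m0:
        return "unidentified"
    if excluded == m0:
        return "exactly identified"
    return "over-identified"

# Hypothetical inputs, for illustration only
print(order_condition(K=1, k0=1, m0=1))
print(order_condition(K=3, k0=2, m0=1))
print(order_condition(K=3, k0=1, m0=1))
```

The labels "exactly identified" and "over-identified" here refer to the count only. An equation that passes the count can still fail the rank condition.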
Demand: Q = α1 + α2P + α3Y + u1 (4.1)
Supply: Q = β1 + β2P + u2 (4.2)

Is each of these structural equations identified?

For the model as a whole: K=
Demand equation: k0 = ; m0 =
K – k0 =

Supply equation: k0 = ; m0 =
K – k0 =

• Therefore we can get consistent estimates of the parameters in the supply equation
– but not in the demand equation.

Example 2: Keynesian macro model
• For a closed economy:
𝐶 = 𝛽1 + 𝛽2 𝑌 + 𝛽3 𝑟 + 𝑢1 (4.5)
𝐼 = 𝛾1 + 𝛾2 𝑟 + 𝑢2 (4.6)
𝑌 ≡ 𝐶 + 𝐼 + 𝐺 (4.7)
• Three equations in the system:
– therefore three endogenous (dependent) variables
• Assume all other variables are exogenous

• Is equation (4.5) identified?

– For the model as a whole: K=
– For equation (4.5): k0 = ; m0 =
– Therefore:

• Various issues with such a simple macro model:
1. Difficult to argue that interest rates and government spending are exogenous
2. Model would be estimated with time series data, but is static:
• We expect adjustment lags

• Can adapt the model to deal with issue 2, e.g.

𝐶𝑡 = 𝛽1 + 𝛽2 𝑌𝑡 + 𝛽3 𝑟𝑡 + 𝛽4 𝐶𝑡−1 + 𝑢1
𝐼𝑡 = 𝛾1 + 𝛾2 𝑟𝑡 + 𝛾3 𝑌𝑡−1 + 𝑢2
• Then the lagged values can be treated as exogenous:
– They are referred to as predetermined variables
– Including lags helps with identification (as well as better modelling dynamic
behaviour)

Now try Exercise 3, Question 3.1 and 3.2!

Part 2

Estimation in the Presence of Endogeneity:

The use of instrumental variables

We focus on how to address endogeneity, and on various associated statistical tests.
5. ESTIMATION: INSTRUMENTAL VARIABLE TECHNIQUE
• Recall:
– We cannot use OLS directly on the structural equations
– Because the endogenous explanatory variable/s are correlated with the errors

• One solution:
– Don’t use the endogenous Xs
– Rather, use some other variables instead
• We want these other variables to be:
– (highly) correlated with the endogenous Xs, but
– NOT correlated with the errors

• They are called INSTRUMENTS (IVs)

• Here, we express the use of instruments more formally:
• Consider the equation:
Y1 = β1 + β2X + β3Y2 + u
where X is exogenous and Y2 is endogenous (correlated with u).
• The method of instrumental variables requires that we find a variable Z which is an
instrument for Y2
• Z must be:
1) strongly correlated with Y2
Instrument relevance: corr(Z, Y2) ≠ 0
but
2) not correlated with u
Instrument exogeneity: corr (Z, u) = 0
• If the instrument is good (i.e. satisfies the two conditions above):
– we can use it to consistently estimate the parameters in the equation of interest.
5.1 WHERE DO THE INSTRUMENTS COME FROM?
• Depends on the source of endogeneity
• Simultaneity:
– Provided we have a model with multiple equations:
– Instruments are the excluded exogenous variables from other equations
• Including any predetermined variables
• Omitted variable and measurement error:
– More challenging:
• There aren’t additional equations with extra variables
– Need to make an argument for choice of instrument/s, and justify
– Similarly for cases of simultaneity with only one equation
• Panel data often provides instruments from previous time periods
– See Topics 5 and 6 for more information
Some examples of instruments: 1

• We want to estimate the causal effect of skipping class on academic performance:

𝑚𝑎𝑟𝑘 = 𝛽1 + 𝛽2 𝑎𝑏𝑠𝑒𝑛𝑡 + 𝛽3 𝑝𝑟𝑒𝑣𝑚𝑎𝑟𝑘𝑠 + 𝛽4 𝑚𝑜𝑡𝑖𝑣𝑎𝑡𝑖𝑜𝑛 + 𝑢
– But motivation is an omitted variable
– We suspect it is correlated with absenteeism

• Proposed IV:
– Use distance between living location and campus as instrument for absent
• Justification:
– Relevance: longer commute → higher probability of being absent (e.g. due to transport
problems)
– Exogeneity: distance not expected to be correlated with motivation
Some examples of instruments: 2
• We want to estimate the causal effect of education on earnings:
log(𝑤𝑎𝑔𝑒) = 𝛽1 + 𝛽2 𝑦𝑟𝑠𝑐ℎ𝑜𝑜𝑙 + 𝛽3 𝑎𝑏𝑖𝑙𝑖𝑡𝑦 + 𝑢
• Proposed IV 1: Parents’ education
– Relevance: parents’ education is correlated with child’s education in many samples
(true for SA?)
– Exogeneity: but likely to be correlated with child’s ability
• Proposed IV 2: Number of siblings
– Relevance: having more siblings is typically associated with lower education per child
(true for SA?)
– Exogeneity: likely to be uncorrelated with child’s ability
• Need to make similar arguments for measurement error cases
The statistical reliability of the results depends on having good IVs
5.2 TWO-STAGE LEAST SQUARES (2SLS)
• Two-stage least squares (2SLS) provides a method for using multiple
instrumental variables.
• 2SLS proceeds as follows:
– Stage 1:
• Regress each endogenous variable that appears on the RHS of the structural
equation on all of its instruments
– In simultaneous equations, this is the reduced form equation
• Predict the value of each endogenous variable, Ẑ
– Stage 2:
• Use the predicted value of each endogenous variable in place of the variable
itself
• Standard errors have to be corrected in Stage 2
• Interpret the resulting coefficients and perform hypothesis tests as usual.
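The two stages can be sketched by hand for the simplest case of one endogenous regressor and one instrument. This Python example is illustrative only: the data-generating process, parameter values and helper `ols` are assumptions, not from the slides. Note that the standard errors from a manual second-stage OLS would not be valid, which is why dedicated 2SLS routines are used in practice.

```python
import random

def ols(x, y):
    """Simple regression of y on a constant and x: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

rng = random.Random(3)
n = 50_000
beta1, beta2 = 1.0, 2.0                      # assumed "true" parameters

z = [rng.gauss(0, 1) for _ in range(n)]      # instrument, independent of u
u = [rng.gauss(0, 1) for _ in range(n)]      # structural error
# X depends on Z (relevance) and on u (endogeneity)
x = [0.8 * zi + 0.6 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta1 + beta2 * xi + ui for xi, ui in zip(x, u)]

_, b_ols = ols(x, y)                         # inconsistent: cov(X, u) != 0

# Stage 1: regress the endogenous X on the instrument and form fitted values
a0, a1 = ols(z, x)
x_hat = [a0 + a1 * zi for zi in z]

# Stage 2: use the fitted values in place of X
_, b_2sls = ols(x_hat, y)

print(round(b_ols, 3), round(b_2sls, 3), beta2)
```

Under these assumptions the OLS slope settles above the assumed 𝛽2 while the two-stage estimate is close to it.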
Stata example
Consider a demand and supply model for a food product:
Demand: Q = α1 + α2P + α3PS + α4INC + u1
Supply: Q = β1 + β2P + β3PF + u2
• Q is quantity; P is price; PS is price of a substitute; INC is per capita income; PF is price of
factor of production
• Endogenous: Q and P; exogenous: PS, INC and PF.
• The demand equation, estimated by OLS:
. regress q p ps inc
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 3, 26) = 8.52
Model | 305.92719 3 101.97573 Prob > F = 0.0004
Residual | 311.209627 26 11.969601 R-squared = 0.4957
-------------+------------------------------ Adj R-squared = 0.4375
Total | 617.136817 29 21.2805799 Root MSE = 3.4597
------------------------------------------------------------------------------
q | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p | .0232954 .0768423 0.30 0.764 -.1346562 .181247
ps | .7100395 .2143246 3.31 0.003 .269489 1.15059
inc | .0764442 1.190855 0.06 0.949 -2.371393 2.524282
_cons | 1.091045 3.71158 0.29 0.771 -6.538218 8.720308
------------------------------------------------------------------------------
• If price and quantity are simultaneously determined, then the coefficient on p is likely to
be biased.
. ivregress 2sls q (p = ps inc pf) ps inc, first
• The first stage creates an instrument for the potentially-endogenous variable, price:
First-stage regressions
-----------------------
Number of obs = 30
F( 3, 26) = 69.19
Prob > F = 0.0000
R-squared = 0.8887
Adj R-squared = 0.8758
Root MSE = 6.5975
------------------------------------------------------------------------------
p | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ps | 1.708147 .3508806 4.87 0.000 .9869017 2.429393
inc | 7.602491 1.724336 4.41 0.000 4.058068 11.14691
pf | 1.353906 .2985062 4.54 0.000 .7403175 1.967494
_cons | -32.51242 7.984235 -4.07 0.000 -48.92425 -16.10059
------------------------------------------------------------------------------

• Stage 2 uses the instrument in place of price in the regression:
Instrumental variables (2SLS) regression Number of obs = 30
Wald chi2(3) = 20.43
Prob > chi2 = 0.0001
R-squared = .
Root MSE = 4.5895
------------------------------------------------------------------------------
q | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p | -.3744591 .1533755 -2.44 0.015 -.6750695 -.0738486
ps | 1.296033 .3306669 3.92 0.000 .6479381 1.944128
inc | 5.013977 2.125875 2.36 0.018 .847339 9.180615
_cons | -4.279471 5.161076 -0.83 0.407 -14.39499 5.836052
------------------------------------------------------------------------------
Instrumented: p
Instruments: ps inc pf
• After dealing with the endogeneity, price has a significant negative effect on quantity
demanded.
5.3 TESTING FOR INSTRUMENT VALIDITY
• Estimates produced using IV are consistent only when the IV used is valid
• Illustrate properties of IV estimation if Z is a poor IV:
$$\operatorname{plim} b_{2,IV} = \beta_2 + \frac{corr(Z, u)}{corr(Z, X_2)} \cdot \frac{\sigma_u}{\sigma_{X_2}}$$
– Instrument exogeneity: 𝑐𝑜𝑟𝑟(𝑍, 𝑢) should be close to zero
– Instrument relevance: 𝑐𝑜𝑟𝑟(𝑍, 𝑋2) should be large
• If Z is not exogenous: estimates are inconsistent
• If relevance of Z is weak:
– Can have large asymptotic bias (and high std errors)
– Even if 𝑐𝑜𝑟𝑟(𝑍, 𝑢) is small
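Plugging numbers into the plim formula shows how a weak instrument magnifies a small exogeneity failure. This Python sketch is illustrative: the function name and all input values are hypothetical, not from the slides.

```python
def iv_asymptotic_bias(corr_zu, corr_zx, sigma_u=1.0, sigma_x=1.0):
    """plim(b_IV) - beta_2 = [corr(Z, u) / corr(Z, X2)] * (sigma_u / sigma_X2)."""
    return (corr_zu / corr_zx) * (sigma_u / sigma_x)

# Same small violation of exogeneity, two different degrees of relevance
strong = iv_asymptotic_bias(corr_zu=0.02, corr_zx=0.5)
weak = iv_asymptotic_bias(corr_zu=0.02, corr_zx=0.05)

print(strong, weak)
```

With the same corr(Z, u), cutting corr(Z, X2) by a factor of ten raises the asymptotic bias by a factor of ten.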
1) Instrument relevance:
• Straightforward to assess:
– Examine the first stage of 2SLS
• Focus on significance of the IVs, rather than all exogenous variables.
– IVs should be significantly related to the endogenous X:
• Use t-test for one IV, or F-test for multiple IVs
– Rule of thumb: for a single endogenous explanatory variable, the F-statistic in the
first stage should be greater than 10.
. ivregress 2sls q (p = ps inc pf) ps inc, first
First-stage regressions
-----------------------
Number of obs = 30
F( 3, 26) = 69.19
Prob > F = 0.0000
R-squared = 0.8887
Adj R-squared = 0.8758
Root MSE = 6.5975
------------------------------------------------------------------------------
p | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ps | 1.708147 .3508806 4.87 0.000 .9869017 2.429393
inc | 7.602491 1.724336 4.41 0.000 4.058068 11.14691
pf | 1.353906 .2985062 4.54 0.000 .7403175 1.967494
_cons | -32.51242 7.984235 -4.07 0.000 -48.92425 -16.10059
------------------------------------------------------------------------------
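In the demand equation, ps and inc are included exogenous variables, so pf is the only excluded instrument. The F( 3, 26) = 69.19 reported above tests all three first-stage regressors jointly. With a single excluded instrument, the F-statistic for that instrument equals the square of its t-statistic. The short Python sketch below (an illustration, not part of the slides) computes it from the reported pf coefficient and standard error.

```python
# First-stage coefficient and standard error for pf, copied from the output above
coef_pf = 1.353906
se_pf = 0.2985062

t_pf = coef_pf / se_pf        # t-statistic on the excluded instrument
f_excluded = t_pf ** 2        # F-statistic for a single restriction equals t squared

passes_rule_of_thumb = f_excluded > 10
print(round(t_pf, 2), round(f_excluded, 2), passes_rule_of_thumb)
```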
2) Instrument exogeneity:
• If the coefficients are exactly identified:
– There is no statistical test for this assumption.
– Researcher must use knowledge and judgement of the research question at hand.

• If equation is over-identified (i.e. extra IVs), can conduct a test

Test for over-identifying restrictions
• Suppose that we have q more instruments than we need:
– i.e. q = (K – k0) – (m0) > 0
– Recall that IVs must be excluded exogenous variables
– E.g. one endogenous X (m0 = 1), and three proposed IVs (K – k0 = 3)
• q = 3 – 1 = 2 over-identifying restrictions.
• Then we can test whether the 2SLS residuals are correlated with q linear functions of
the instruments

• Procedure for testing over-identifying restrictions:

1) Estimate structural equation by 2SLS; obtain residuals, $\hat{u}_1$.
2) Regress $\hat{u}_1$ on all exogenous variables. Obtain $R_1^2$.
3) Test statistic = $nR_1^2 \sim \chi^2$ with df = q
4) If $nR_1^2 > \chi^2_{crit}$, reject 𝐻0: all IVs are uncorrelated with the structural error 𝑢1
5) If 𝐻0 is rejected, conclude that at least some of the IVs are not exogenous.
• Recall that our model is:
Demand: Q = α1 + α2P + α3PS + α4INC + u1
Supply: Q = β1 + β2P + β3PF + u2
• q = (K – k0) – (m0) = (no. of proposed IVs) – (no. of endogenous Xs)
– Demand equation: q = (3-2) – (1) = 0
– Supply equation: q = (3-1) – (1) = 1
. ivregress 2sls q (p = ps inc pf) pf

Instrumental variables (2SLS) regression Number of obs = 30

Wald chi2(2) = 211.69
Prob > chi2 = 0.0000
R-squared = 0.9019
Root MSE = 1.4207

------------------------------------------------------------------------------
q | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p | .3379816 .0236408 14.30 0.000 .2916465 .3843166
pf | -1.000909 .0782929 -12.78 0.000 -1.154361 -.8474581
_cons | 20.0328 1.160349 17.26 0.000 17.75856 22.30704
------------------------------------------------------------------------------
Instrumented: p
Instruments: pf ps inc

. predict u, resid
. reg u pf ps inc

Source | SS df MS Number of obs = 30

-------------+---------------------------------- F(3, 26) = 0.47
Model | 3.0948454 3 1.03161513 Prob > F = 0.7080
Residual | 57.4597199 26 2.20998923 R-squared = 0.0511
-------------+---------------------------------- Adj R-squared = -0.0584
Total | 60.5545653 29 2.08808846 Root MSE = 1.4866

------------------------------------------------------------------------------
u | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pf | .0363318 .067262 0.54 0.594 -.1019273 .1745909
ps | .0790798 .0790635 1.00 0.326 -.0834376 .2415971
inc | -.4023461 .3885424 -1.04 0.310 -1.201007 .3963143
_cons | -1.149104 1.799078 -0.64 0.529 -4.847162 2.548953
------------------------------------------------------------------------------

• Then $nR^2 = 30 \times 0.0511 = 1.533$
• $\chi^2_{crit}$ (𝛼 = 0.05; 𝑑𝑓 = 𝑞 = 1) = 3.841
• $nR^2 < \chi^2_{crit}$, therefore cannot reject 𝐻0
• Therefore there is no evidence that the instruments used are not exogenous.
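The arithmetic of the test can be reproduced in a few lines. This Python sketch is illustrative and uses only the numbers reported above. The p-value line relies on a shortcut that is valid only for one degree of freedom, since a chi-squared variable with df = 1 is the square of a standard normal.

```python
import math

n = 30                    # number of observations
r_squared = 0.0511        # R-squared from regressing the 2SLS residuals on pf, ps, inc
q = 1                     # number of over-identifying restrictions (supply equation)

stat = n * r_squared      # test statistic n * R^2
chi2_crit = 3.841         # 5% critical value for chi-squared with df = 1

# For df = 1 only: P(chi2 > s) = erfc(sqrt(s / 2))
p_value = math.erfc(math.sqrt(stat / 2))

reject_h0 = stat > chi2_crit
print(round(stat, 3), round(p_value, 3), reject_h0)
```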

Now try Exercise 3, Question 3.3!

5.4 TESTING FOR ENDOGENEITY
• It is ‘costly’ to use IV if there is no endogeneity:
– IV is less efficient (has larger standard errors) than OLS.

• Statistical Properties of OLS and IV:

    | Endogeneity  | No endogeneity
OLS | Inconsistent | Consistent and efficient
IV  | Consistent   | Consistent but inefficient

• In the presence of endogeneity:

– Only IV is consistent
– BUT may have bias in small samples
• Recall: consistency is an asymptotic property
A. Regression-based Test
• Consider the equation:
Y1 = β1 + β2X + β3Y2 + u
where X is exogenous and Y2 may be endogenous.
• Estimate the reduced form equation for Y2
– i.e. regress Y2 on all the truly exogenous variables
– and obtain the residuals, e.

• Now include these residuals in the model of interest:

Y1 = β1 + β2X + β3Y2 + θe + u
• Hypotheses: H0: θ = 0, i.e. Y2 is exogenous
H1: θ ≠ 0, i.e. Y2 is endogenous
• Thus a standard t-test on the coefficient on e in the above regression:
– constitutes a test of the null hypothesis of Y2 being exogenous.
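The steps of the regression-based test can be sketched on simulated data. This Python example is illustrative only: the data-generating process, the parameter values and the helpers `ols1` and `ols2` are assumptions, not from the slides. To keep it short it omits the exogenous regressor X and uses a single instrument Z as the only exogenous variable in the reduced form.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def ols1(y, x):
    """Simple regression of y on a constant and x: returns (intercept, slope)."""
    my, mx = sum(y) / len(y), sum(x) / len(x)
    yd, xd = demean(y), demean(x)
    slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    return my - slope * mx, slope

def ols2(y, x1, x2):
    """OLS of y on a constant, x1 and x2: returns (slopes), (standard errors)."""
    n = len(y)
    y, x1, x2 = demean(y), demean(x1), demean(x2)
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = sum((yi - b1 * a - b2 * b) ** 2 for yi, a, b in zip(y, x1, x2))
    sigma2 = rss / (n - 3)   # three estimated coefficients, including the constant
    return (b1, b2), ((sigma2 * s22 / det) ** 0.5, (sigma2 * s11 / det) ** 0.5)

rng = random.Random(5)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
# Assumption: Y2 depends on u, so it is endogenous by construction
y2 = [0.8 * zi + 0.6 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y1 = [1.0 + 2.0 * yi + ui for yi, ui in zip(y2, u)]

# Reduced form for Y2 and its residuals e
c0, c1 = ols1(y2, z)
e = [yi - c0 - c1 * zi for yi, zi in zip(y2, z)]

# Augmented regression: Y1 on Y2 and e; t-test on the coefficient of e
(b_y2, theta_hat), (se_y2, se_theta) = ols2(y1, y2, e)
t_theta = theta_hat / se_theta

print(round(b_y2, 3), round(theta_hat, 3), round(t_theta, 1))
```

Because Y2 is endogenous by construction here, the t-statistic on e is far from zero. The coefficient on Y2 in the augmented regression is also close to the assumed true value of 2.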
B. Hausman Test
• Estimate the model by both OLS and IV:
– Compare (statistically) the coefficient values and their variances.

• H0: no endogeneity bias (both OLS and IV estimators will be consistent, but
OLS is more efficient)
• H1: endogeneity (only IV will be consistent – the difference between the OLS and IV
coefficients will not converge to zero as n → ∞)

• If there is a systematic difference in the OLS and IV estimates:

– the explanatory variable/s is/are endogenous.
• The test statistic is based on the differences between all of the coefficients:
– follows a chi-squared distribution (with df = number of instrumented variables).

Stata example
A. Regression-based test:
To test whether price is endogenous in the demand equation, estimate the
reduced form equation for price, then include its residuals in the demand equation:
• Reduced form equation: regress the potentially endogenous variable, p, on all exogenous
variables in the model
. reg p ps inc pf
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 3, 26) = 69.19
Model | 9034.77551 3 3011.59184 Prob > F = 0.0000
Residual | 1131.69721 26 43.5268157 R-squared = 0.8887
-------------+------------------------------ Adj R-squared = 0.8758
Total | 10166.4727 29 350.568025 Root MSE = 6.5975

------------------------------------------------------------------------------
p | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ps | 1.708147 .3508806 4.87 0.000 .9869017 2.429393
inc | 7.602491 1.724336 4.41 0.000 4.058068 11.14691
pf | 1.353906 .2985062 4.54 0.000 .7403175 1.967494
_cons | -32.51242 7.984235 -4.07 0.000 -48.92425 -16.10059
------------------------------------------------------------------------------
• Predict the residuals from the reduced form equation
. predict e, resid
• Include the residuals as an extra variable in the demand equation
. regress q p ps inc e
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 4, 25) = 60.88
Model | 559.677099 4 139.919275 Prob > F = 0.0000
Residual | 57.4597181 25 2.29838873 R-squared = 0.9069
-------------+------------------------------ Adj R-squared = 0.8920
Total | 617.136817 29 21.2805799 Root MSE = 1.516

------------------------------------------------------------------------------
q | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p | -.3744591 .0506639 -7.39 0.000 -.4788032 -.2701149
ps | 1.296033 .1092277 11.87 0.000 1.071074 1.520992
inc | 5.013977 .702231 7.14 0.000 3.567705 6.460249
e | .7124655 .0678067 10.51 0.000 .5728149 .852116
_cons | -4.279471 1.704836 -2.51 0.019 -7.790645 -.7682958
------------------------------------------------------------------------------

• p-value on the coefficient of the residuals e = 0.000: reject H0 at all conventional levels
of significance
• Therefore reject H0: θ = 0 (p is exogenous)
• Therefore price is endogenous in the demand equation.
B. Hausman test:
• Command for the Hausman test, comparing the two sets of estimates:
. hausman IV OLS, cons sigmamore
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| IV OLS Difference S.E.
-------------+----------------------------------------------------------------
p | -.3744591 .0232954 -.3977545 .0863877
ps | 1.296033 .7100395 .5859938 .1272711
inc | 5.013977 .0764442 4.937533 1.072376
_cons | -4.279471 1.091045 -5.370516 1.166414
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from ivregress
B = inconsistent under Ha, efficient under Ho; obtained from regress
Test: Ho: difference in coefficients not systematic

chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 21.20
Prob>chi2 = 0.0000
Reject H0 at all conventional levels of significance

• H0: no endogeneity bias

• Therefore endogeneity does exist in the demand equation:
– We must estimate the equation using IV, not OLS.
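In this particular output, the reported chi2(1) statistic can be reproduced from the price row alone, as the squared ratio of the coefficient difference to its standard error. The Python sketch below is illustrative and is an observation about these numbers, with one instrumented variable, rather than a general formula for the Hausman test.

```python
# Values copied from the Hausman output above (row for p)
b_iv = -0.3744591          # IV (2SLS) coefficient on price
b_ols = 0.0232954          # OLS coefficient on price
se_diff = 0.0863877        # sqrt(diag(V_b - V_B)) for price

diff = b_iv - b_ols
hausman_stat = (diff / se_diff) ** 2

chi2_crit = 3.841          # 5% critical value for chi-squared with df = 1
reject_h0 = hausman_stat > chi2_crit
print(round(diff, 7), round(hausman_stat, 2), reject_h0)
```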
6. CONCLUSION

• Endogeneity is one of the key issues in empirical econometrics:

– It violates an assumption that is required to have unbiased, consistent estimators
– It means that relationships can no longer be interpreted as causal

• The way in which endogeneity is discussed and dealt with is a crucial determinant of:
– Reliability of empirical estimates
– Whether an empirical paper is published
– Success of empirical dissertations for advanced degrees

• In this topic, we’ve gone through some key tools for dealing with this issue:
– It remains a complex conceptual and empirical issue which is difficult to grapple with.
