Endogeneity and Instrumental Variables

This document discusses instrumental variables and their use in regression analysis. It defines instrumental variables as variables that are correlated with endogenous regressors but uncorrelated with the error term. This allows instrumental variables to isolate the portion of the endogenous regressor that is uncorrelated with the error. The document also discusses issues like weak instruments, overidentification, and tests for instrumental variable validity like the J-test, Sargan test, and Durbin-Wu-Hausman test. Examples from studies by Angrist and Krueger and Bound, Jaeger, and Baker are provided to illustrate instrumental variables estimation.

Endogeneity and Instrumental Variables
Presented by Justin Balthrop
September 28, 2015

What Exactly Are Instrumental Variables?

Take the simple case of ordinary least squares regression with a single explanatory variable:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

A fundamental assumption in estimating $\beta_1$ is that the correlation between X and u is zero.

If this is not the case (X is endogenous), then instrumental variables Z can be used to isolate the part of X which is *not* correlated with the error term.

An Instrument Must Satisfy Relevance and Exclusion

Loosely speaking, the relevance restriction requires that the instrument Z be non-trivially related to the endogenous regressor X.

Exclusion, or exogeneity, requires that Z not be systematically related to the error term, u.

Why do we need IV?

Before we worry about external validity and the big picture implications of
our results, we need to satisfy internal validity.

Three main sources of internal validity issues are:

Omitted variables bias
Simultaneity bias
Errors-in-variables bias

Appropriately instrumenting for endogenous regressors can eliminate these biases

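Univariate IV Estimation: Two Stage Least Squares

Stage 1: Identify the portion of X that is uncorrelated with the error, u, by regressing X on Z:

$$X_i = \pi_0 + \pi_1 Z_i + v_i$$

This gives estimates of $\pi_0$ and $\pi_1$, which are used to get predicted X values:

$$\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$$

Stage 2: Replace the X values with the estimated $\hat{X}$ and regress Y on $\hat{X}$:

$$Y_i = \beta_0 + \beta_1 \hat{X}_i + u_i$$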
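The two stages can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not code from the presentation: the data-generating values (true slope 2, first-stage coefficient 0.8) and the helper names `simple_ols` and `two_stage_least_squares` are assumptions made up for the example.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

def two_stage_least_squares(y, x, z):
    """Univariate 2SLS: one endogenous regressor x, one instrument z."""
    pi0, pi1 = simple_ols(z, x)             # Stage 1: regress X on Z
    x_hat = [pi0 + pi1 * zi for zi in z]    # predicted X values
    return simple_ols(x_hat, y)             # Stage 2: regress Y on X-hat

# Simulated data (hypothetical values): true beta_0 = 1, beta_1 = 2.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
# X depends on Z (relevance) and on u (endogeneity).
x = [0.8 * zi + 0.6 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

b0_ols, b1_ols = simple_ols(x, y)
b0_iv, b1_iv = two_stage_least_squares(y, x, z)
print(f"OLS slope:  {b1_ols:.3f}")   # pulled away from 2 because Cov(X, u) > 0
print(f"2SLS slope: {b1_iv:.3f}")    # should land near the true value 2
```

In this design OLS is inconsistent because X loads on u, while the 2SLS slope only uses the variation in X that comes from Z.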

Underlying Assumptions of 2SLS

Instrument validity

$\pi_0$ and $\pi_1$ (the first-stage coefficients) are well estimated in the first stage (large samples)

Why it works:
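One way to spell out the argument, as a sketch that assumes the simple structural model $Y_i = \beta_0 + \beta_1 X_i + u_i$ with a single instrument $Z_i$ (the derivation is a standard one and is not taken from the slide itself):

```latex
\begin{aligned}
\operatorname{Cov}(Z_i, Y_i)
  &= \operatorname{Cov}(Z_i,\ \beta_0 + \beta_1 X_i + u_i) \\
  &= \beta_1 \operatorname{Cov}(Z_i, X_i) + \operatorname{Cov}(Z_i, u_i) \\
  &= \beta_1 \operatorname{Cov}(Z_i, X_i)
     \qquad \text{(exogeneity: } \operatorname{Cov}(Z_i, u_i) = 0\text{)} \\[4pt]
\Rightarrow\quad
\beta_1 &= \frac{\operatorname{Cov}(Z_i, Y_i)}{\operatorname{Cov}(Z_i, X_i)}
     \qquad \text{(relevance: } \operatorname{Cov}(Z_i, X_i) \neq 0\text{)}
\end{aligned}
```

Replacing the population covariances with sample covariances gives the IV estimator, which in the one-instrument case coincides with the 2SLS slope.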

Careful about Standard Errors

Second-stage OLS standard errors are not correct

They need to be adjusted for the fact that the explanatory variables are estimated
See Wooldridge for the math and Stata for the code (ivreg, robust)

Other considerations:

Heteroskedasticity
Appropriate clustering
Instrument relevance: more relevance means lower estimator variance and a higher R-squared in the first stage
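The standard-error problem can be made concrete with a short sketch. It assumes homoskedastic errors and a single instrument, and the function name `tsls_standard_errors` and the simulated data are hypothetical. The only difference between the two standard errors is which regressor is used to build the residuals: the predicted X (naive second-stage OLS) or the actual X (corrected).

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def tsls_standard_errors(y, x, z):
    """Return (slope, naive_se, corrected_se) for univariate 2SLS,
    assuming homoskedastic errors."""
    n = len(y)
    pi0, pi1 = simple_ols(z, x)
    x_hat = [pi0 + pi1 * zi for zi in z]
    b0, b1 = simple_ols(x_hat, y)
    mxh = mean(x_hat)
    sxx_hat = sum((xh - mxh) ** 2 for xh in x_hat)
    # Naive second-stage OLS: residuals are built from the predicted X.
    ssr_naive = sum((yi - b0 - b1 * xh) ** 2 for yi, xh in zip(y, x_hat))
    # Corrected: residuals are built from the actual X.
    ssr_correct = sum((yi - b0 - b1 * xi) ** 2 for yi, xi in zip(y, x))
    naive_se = math.sqrt(ssr_naive / (n - 2) / sxx_hat)
    corrected_se = math.sqrt(ssr_correct / (n - 2) / sxx_hat)
    return b1, naive_se, corrected_se

# Simulated data (hypothetical values): true slope is 2.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.6 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

b1, naive_se, corrected_se = tsls_standard_errors(y, x, z)
print(f"slope {b1:.3f}, naive s.e. {naive_se:.4f}, corrected s.e. {corrected_se:.4f}")
```

Packaged IV routines apply this correction automatically, and the heteroskedasticity and clustering considerations above still call for robust variants.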

IV Regression in a Multivariate Model

Aside from messy algebra, estimation generalizes rather easily

Key identification criterion: at least as many Z as endogenous X

Don't forget to instrument for interactions between endogenous X

Underidentified = too few Z to estimate the coefficient vector $\beta$ (correctly, anyway)
Exactly identified = equal number of Z and endogenous X
Overidentified = more Z than endogenous X

Testing for Instrument Relevance

Assume one endogenous X

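The first-stage regression is therefore:

$$X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + v_i$$

Relevance comes from at least one $\pi_i$ (i = 1, ..., m) being different from zero

If none is, the instruments are irrelevant; if the $\pi_i$ are only barely different from zero, the instruments are weak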

Why are weak instruments so bad?

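Back to the simple model:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

With estimator:

$$\hat{\beta}_1^{IV} = \frac{s_{ZY}}{s_{ZX}}$$

where $s_{ZY}$ and $s_{ZX}$ are the sample covariances of Z with Y and of Z with X.

Weak instruments lead to a near-zero denominator, and the resulting sampling distribution cannot be accurately approximated by its asymptotic distribution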
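A small Monte Carlo experiment illustrates the point. This is a sketch under assumed values, not a result from the presentation: the true slope is set to 1, the sample size to 200, and the first-stage coefficient `pi1` to either 1.0 (strong) or 0.02 (weak). The function names are hypothetical.

```python
import random
import statistics

def iv_estimate(y, x, z):
    """Simple IV estimator: sample Cov(Z, Y) / sample Cov(Z, X)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    s_zy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    s_zx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    return s_zy / s_zx

def simulate(pi1, n=200, reps=500, seed=0):
    """Draw `reps` samples and return the IV estimates of the slope (true value 1)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        u = [rng.gauss(0, 1) for _ in range(n)]
        x = [pi1 * zi + 0.8 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
        y = [1.0 * xi + ui for xi, ui in zip(x, u)]
        estimates.append(iv_estimate(y, x, z))
    return estimates

def iqr(values):
    """Interquartile range, a spread measure that tolerates extreme draws."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

strong = simulate(pi1=1.0)
weak = simulate(pi1=0.02)
print("strong instrument: median", round(statistics.median(strong), 3),
      "IQR", round(iqr(strong), 3))
print("weak instrument:   median", round(statistics.median(weak), 3),
      "IQR", round(iqr(weak), 3))
```

The median and interquartile range are used instead of the mean and variance because the weak-instrument estimates include extreme values whenever the sample covariance in the denominator is close to zero.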

Measuring the Strength of Instruments

The first-stage F-test

Tests the hypothesis that the instruments Z_i do not enter the first-stage regression
A small F-statistic (less than 10, as a rule of thumb) indicates weak instruments

If the set of instruments is weak, get better instruments

If that is impossible, consider dropping the weakest to improve the first-stage F

This is somewhat ad-hoc
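For a single instrument, the first-stage F-statistic can be computed directly from the first-stage R-squared. The sketch below assumes homoskedastic errors and one instrument. The function name `first_stage_f` and the simulated first-stage coefficients (0.5 and 0.02) are hypothetical.

```python
import random

def first_stage_f(x, z):
    """F-statistic for H0: pi_1 = 0 in X = pi_0 + pi_1 * Z + v (one instrument)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    s_zz = sum((zi - mz) ** 2 for zi in z)
    s_xx = sum((xi - mx) ** 2 for xi in x)
    s_zx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    r2 = s_zx ** 2 / (s_zz * s_xx)        # first-stage R-squared
    return r2 / ((1 - r2) / (n - 2))      # one restriction, n - 2 residual d.o.f.

rng = random.Random(1)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + rng.gauss(0, 1) for zi in z]
x_weak = [0.02 * zi + rng.gauss(0, 1) for zi in z]

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f"strong instrument: F = {f_strong:.1f}")
print(f"weak instrument:   F = {f_weak:.1f}")
```

Each value would then be compared with the rule-of-thumb threshold of 10.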

Too Many Instruments = Tests for Overidentifying Restrictions

Assume we have multiple valid instruments with a single endogenous regressor.

Intuition: If we perform 2SLS using each instrument separately and arrive at completely different results, it cannot be that both instruments are valid.

Statistics: J-test
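The intuition can be checked on simulated data. In this sketch, which uses assumed values only, `z1` is a valid instrument and `z2` is deliberately contaminated by the error term, so the two single-instrument estimates disagree. The true slope is 2.

```python
import random

def iv_slope(y, x, z):
    """IV slope using a single instrument: Cov(Z, Y) / Cov(Z, X)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    s_zy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    s_zx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    return s_zy / s_zx

rng = random.Random(42)
n = 20000
u = [rng.gauss(0, 1) for _ in range(n)]
z1 = [rng.gauss(0, 1) for _ in range(n)]           # valid: independent of u
z2 = [rng.gauss(0, 1) + 0.5 * ui for ui in u]      # invalid: correlated with u
x = [0.7 * a + 0.7 * b + 0.5 * ui + rng.gauss(0, 1)
     for a, b, ui in zip(z1, z2, u)]
y = [2.0 * xi + ui for xi, ui in zip(x, u)]

b_z1 = iv_slope(y, x, z1)
b_z2 = iv_slope(y, x, z2)
print(f"slope using z1 only: {b_z1:.3f}")   # should be close to 2
print(f"slope using z2 only: {b_z2:.3f}")   # pulled away from 2 by Cov(z2, u)
```

A large gap between the two estimates signals that at least one instrument is invalid, but it does not say which one.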

J-Test of Overidentifying Restrictions

Step 1: estimate the conditional expectation function using TSLS and both instruments

Step 2: Compute predicted Y values using the actual Xs

Step 3: Compute residuals

Step 4: Regress residuals against all instruments Z and the included exogenous regressors

Step 5: Test the hypothesis that all coefficients on Z_i are zero, with J-statistic J = mF

Here, F is the F-stat from testing coefficients on Z_i, and m is the number of instruments

If some instruments are exogenous and others endogenous, the J-stat will be large, rejecting the null that all instruments are exogenous
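The five steps can be sketched as follows for one endogenous regressor, two instruments, and a constant as the only exogenous regressor. This assumes homoskedastic errors. The helper names (`ols`, `fitted`, `j_statistic`, `simulate`) and the simulated designs are hypothetical. With m = 2 instruments and one endogenous regressor, J is compared with a chi-square distribution with one degree of freedom (5% critical value 3.84).

```python
import random

def ols(columns, y):
    """OLS coefficients via the normal equations. `columns` is a list of
    regressor lists and must include a column of ones for the intercept."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def fitted(columns, coefs):
    return [sum(b * c[i] for b, c in zip(coefs, columns))
            for i in range(len(columns[0]))]

def j_statistic(y, x, instruments):
    n, m = len(y), len(instruments)
    ones = [1.0] * n
    first = [ones] + instruments
    # Step 1: TSLS using all instruments
    x_hat = fitted(first, ols(first, x))
    b0, b1 = ols([ones, x_hat], y)
    # Steps 2-3: predicted Y and residuals use the actual X
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
    # Step 4: regress the residuals on the instruments and the constant
    resid_hat = fitted(first, ols(first, resid))
    mean_r = sum(resid) / n
    ss_res = sum((r - f) ** 2 for r, f in zip(resid, resid_hat))
    ss_tot = sum((r - mean_r) ** 2 for r in resid)
    r2 = 1 - ss_res / ss_tot
    # Step 5: F-statistic for "all instrument coefficients are zero"; J = m * F
    f_stat = (r2 / m) / ((1 - r2) / (n - m - 1))
    return m * f_stat

def simulate(n, leak, seed):
    """Hypothetical data. `leak` controls how strongly z2 loads on the
    error u (0 means both instruments are exogenous)."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) + leak * ui for ui in u]
    x = [0.7 * a + 0.7 * b + 0.5 * ui + rng.gauss(0, 1)
         for a, b, ui in zip(z1, z2, u)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return y, x, [z1, z2]

j_valid = j_statistic(*simulate(5000, 0.0, 1))
j_invalid = j_statistic(*simulate(5000, 0.5, 1))
print(f"both instruments exogenous: J = {j_valid:.2f}")
print(f"z2 correlated with u:       J = {j_invalid:.2f}")
```

The normal-equations solver keeps the example dependency-free. A numerical library's least-squares routine would be the more robust choice in practice.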

Sargan Test for Overidentification

1. Estimate the 2SLS IV regression and extract the residuals
2. Regress these residuals on all exogenous variables (the instruments and any included exogenous regressors) and extract $R^2$
3. Calculate $nR^2$, which is $\chi^2$ distributed
4. Compare the value with the critical value in the chi-square table with
degrees of freedom equal to # instruments less #

If the statistic ($nR^2$) exceeds the critical $\chi^2$ value, conclude that at least one instrument is invalid.
Such instruments are not uncorrelated with the error term and hence have some explanatory power in the main equation.
Be very careful: The test assumes that at least one instrument is valid.

If none of the instruments fulfills the criterion Cov(z_i, u_i) = 0, then the test might suggest that the instruments are valid, even when they are not
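The Sargan steps can be sketched in the same way. The example assumes one endogenous regressor, two instruments, a constant as the only included exogenous variable, and homoskedastic errors. The helper names and the simulated data, in which `z2` is deliberately correlated with the error, are hypothetical. With two instruments and one endogenous regressor there is one overidentifying restriction, so the 5% critical value is 3.84.

```python
import random

def ols(columns, y):
    """OLS coefficients via the normal equations. `columns` must include
    a column of ones for the intercept."""
    k = len(columns)
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def fitted(columns, coefs):
    return [sum(b * c[i] for b, c in zip(coefs, columns))
            for i in range(len(columns[0]))]

def sargan_statistic(y, x, instruments):
    n = len(y)
    ones = [1.0] * n
    exog = [ones] + instruments
    # 1. Estimate the 2SLS regression and extract residuals (using actual X)
    x_hat = fitted(exog, ols(exog, x))
    b0, b1 = ols([ones, x_hat], y)
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
    # 2. Regress the residuals on all exogenous variables and extract R^2
    resid_hat = fitted(exog, ols(exog, resid))
    mean_r = sum(resid) / n
    ss_res = sum((r - f) ** 2 for r, f in zip(resid, resid_hat))
    ss_tot = sum((r - mean_r) ** 2 for r in resid)
    r2 = 1 - ss_res / ss_tot
    # 3. The test statistic is n * R^2
    return n * r2

# Simulated data (hypothetical values): z1 is valid, z2 is correlated with u.
rng = random.Random(3)
n = 5000
u = [rng.gauss(0, 1) for _ in range(n)]
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) + 0.5 * ui for ui in u]
x = [0.7 * a + 0.7 * b + 0.5 * ui + rng.gauss(0, 1)
     for a, b, ui in zip(z1, z2, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

sargan = sargan_statistic(y, x, [z1, z2])
CRITICAL_5PCT_DF1 = 3.84   # chi-square(1) critical value at the 5% level
print(f"nR^2 = {sargan:.1f}; reject instrument validity: {sargan > CRITICAL_5PCT_DF1}")
```

The test flags a problem here because only one of the two instruments is contaminated. If both were contaminated in a similar way, the statistic could stay small, which is the caveat stated above.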

Durbin-Wu-Hausman Test

Balances the consistency of IV against the efficiency of LS

H0: IV and LS both consistent, but LS is efficient
H1: Only IV is consistent

DWH test for a single endogenous regressor:

$$\mathrm{DWH} = \frac{b_{IV} - b_{LS}}{\sqrt{s^2_{b_{IV}} - s^2_{b_{LS}}}} \sim N(0,1)$$

If |DWH| > 1.96, then X is endogenous and IV is the preferred estimator despite its inefficiency

A roughly equivalent procedure for DWH:

1. Estimate the first-stage model

2. Include the first-stage residual in the structural model along with the endogenous X
3. Test for significance of the coefficient on residual

Note: The coefficient on the endogenous X in this model is $b_{IV}$ (the standard error is smaller, though)
The first-stage residual is a generated regressor
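The three-step procedure can be sketched as follows. The example assumes one instrument and homoskedastic errors. The helper names and simulated data are hypothetical, and the significance test is done as an F-test (the square of the t-statistic) that ignores the generated-regressor issue noted above. The sketch also checks numerically that the coefficient on X in the augmented regression equals the IV estimate.

```python
import random

def ols(columns, y):
    """OLS coefficients via the normal equations. `columns` must include
    a column of ones for the intercept."""
    k = len(columns)
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def ssr(columns, coefs, y):
    """Sum of squared residuals for the given coefficients."""
    return sum((yi - sum(b * c[i] for b, c in zip(coefs, columns))) ** 2
               for i, yi in enumerate(y))

# Simulated data (hypothetical values): X is endogenous, true slope is 2.
rng = random.Random(7)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.6 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
ones = [1.0] * n

# 1. Estimate the first-stage model and keep its residuals
pi0, pi1 = ols([ones, z], x)
v_hat = [xi - pi0 - pi1 * zi for xi, zi in zip(x, z)]

# 2. Include the first-stage residual in the structural model alongside X
augmented = [ones, x, v_hat]
coef_aug = ols(augmented, y)

# 3. Test the coefficient on the residual (F-test with one restriction)
coef_restricted = ols([ones, x], y)
ssr_r = ssr([ones, x], coef_restricted, y)
ssr_u = ssr(augmented, coef_aug, y)
f_stat = (ssr_r - ssr_u) / (ssr_u / (n - 3))

# Simple IV estimate for comparison
mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
b_iv = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
        / sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)))

print(f"coefficient on X in augmented model: {coef_aug[1]:.4f}")
print(f"IV estimate:                         {b_iv:.4f}")
print(f"F-statistic on first-stage residual: {f_stat:.1f} (compare with 3.84)")
```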

The following example is taken from the University of Albany Center for Social and Demographic Analysis presentation on IV Estimation

Angrist and Krueger (1991), Q.J.E.

Returns to education (Y = wages)

Problem of omitted ability bias

Years of schooling vary by quarter of birth (QOB)

Compulsory schooling laws, age-at-entry rules

Someone born in Q1 is a little older at school entry and will be able to drop out sooner than someone born in Q4

QOB can be treated as a useful source of exogenous variation in schooling

Angrist and Krueger (1991), Q.J.E.

People born in Q1 do obtain less schooling

But pay close attention to the scale of the y-axis
Mean difference between Q1 and Q4 is only 0.124 years, or 1.5 months

So...need large N since $R^2_{X,Z}$ will be very small
A&K had over 300k observations for the 1930-39 cohort

Source: Angrist and Krueger (1991), Figure I

Angrist and Krueger (1991), Q.J.E.

Final 2SLS model interacted QOB with year of birth (30), state of birth (150)
OLS: b = .0628 (s.e. = .0003)
2SLS: b = .0811 (s.e. = .0109)

Least squares estimate does not appear to be badly biased by omitted variables
But...replication effort identified some pitfalls in this analysis that are instructive
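As an illustration only, the reported estimates can be plugged into the single-regressor DWH statistic, DWH = (b_IV - b_LS) / sqrt(s²_bIV - s²_bLS). This calculation is not part of the original analysis and relies on the formula's assumption that LS is efficient under the null.

```python
import math

# Estimates reported on the slide
b_ols, se_ols = 0.0628, 0.0003
b_iv, se_iv = 0.0811, 0.0109

# Difference in estimates scaled by the square root of the difference in variances
dwh = (b_iv - b_ols) / math.sqrt(se_iv ** 2 - se_ols ** 2)
print(f"DWH = {dwh:.2f}")   # about 1.68, below the 1.96 cutoff
```

On these numbers the statistic does not reach 1.96, which is consistent with the statement that the least squares estimate does not appear to be badly biased.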

Bound, Jaeger, and Baker (1995), J.A.S.A.

Potential problems with QOB as an IV

Correlation between QOB and schooling is weak

Small Cov(X,Z) introduces finite-sample bias, which will be exacerbated with the inclusion of many IVs

QOB may not be completely exogenous

Even small Cov(Z,e) will cause inconsistency, and this will be exacerbated when Cov(X,Z) is small

QOB qualifies as a weak instrument that may be correlated with unobserved determinants of
wages (e.g., family income)
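Both problems can be read off the probability limits of the two estimators. The expressions below are a sketch for the simple model with one regressor X, one instrument Z, and error e. They are the standard textbook forms and are not copied from the slide.

```latex
\begin{aligned}
\operatorname{plim} \hat{\beta}_{IV}  &= \beta + \frac{\operatorname{Cov}(Z, e)}{\operatorname{Cov}(Z, X)}, \qquad
\operatorname{plim} \hat{\beta}_{OLS} = \beta + \frac{\operatorname{Cov}(X, e)}{\operatorname{Var}(X)} \\[6pt]
\frac{\operatorname{plim} \hat{\beta}_{IV} - \beta}{\operatorname{plim} \hat{\beta}_{OLS} - \beta}
  &= \frac{\operatorname{Cov}(Z, e) \,/\, \operatorname{Cov}(X, e)}{\operatorname{Cov}(Z, X) \,/\, \operatorname{Var}(X)}
\end{aligned}
```

When Cov(Z, X) is small, the denominator of the ratio is small, so even a slight correlation between Z and e can leave IV more inconsistent than OLS.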

Bound, Jaeger, and Baker (1995), J.A.S.A.

Even if the instrument is good, matters can be made far worse with IV as opposed to LS
Weak correlation between IV and endogenous regressor can pose severe finite-sample bias

And really large samples won't help, especially if there is even weak correlation between the IV and the error

First-stage diagnostics provide a sense of how good an IV is in a given setting

F-test and partial $R^2$ on IVs

Lewbel (2012) Method of Identification

Mostly applicable to models with an unobserved common factor

Identification is achieved by having regressors that are uncorrelated with the product of
heteroskedastic errors

Consider $Y_1, Y_2$ as observed endogenous variables, X a vector of observed exogenous regressors, and $\varepsilon = (\varepsilon_1, \varepsilon_2)$ as unobserved error processes.

Consider a structural model of the form:

$$Y_1 = X\beta_1 + Y_2\gamma_1 + \varepsilon_1 \quad (1)$$
$$Y_2 = X\beta_2 + Y_1\gamma_2 + \varepsilon_2 \quad (2)$$

Higher-moment considerations (restricting correlations of $\varepsilon\varepsilon'$ with X): in the presence of heteroskedasticity related to at least some elements of X, identification can be achieved.
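The restrictions can be written out as moment conditions. The block below is a sketch of the conditions as they are usually stated for the triangular case ($\gamma_2 = 0$), with Z denoting a subset of X (possibly all of it). Treat the exact form as an assumption to be checked against Lewbel (2012).

```latex
\begin{aligned}
&E[X \varepsilon_1] = 0, \qquad E[X \varepsilon_2] = 0
  && \text{(X is exogenous)} \\
&\operatorname{Cov}(Z,\ \varepsilon_1 \varepsilon_2) = 0
  && \text{(Z is uncorrelated with the product of the errors)} \\
&\operatorname{Cov}(Z,\ \varepsilon_2^2) \neq 0
  && \text{(heteroskedasticity of } \varepsilon_2 \text{ related to Z)}
\end{aligned}
```

Under these conditions $(Z - \bar{Z})\hat{\varepsilon}_2$, built from the residuals of a regression of $Y_2$ on X, can serve as a constructed instrument for $Y_2$ in equation (1).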
