Chapter 1 - Instrumental Variable Method

The document discusses the endogeneity problem in econometrics, highlighting how it leads to biased and inconsistent coefficient estimates in regression models. It introduces instrumental variables (IV) as a solution to this problem, detailing the conditions for a valid instrument and the process of IV estimation. The document also provides examples and explanations of different sources of endogeneity, such as omitted variables and measurement errors.

Uploaded by

Linh Phạm
Copyright: © All Rights Reserved
National Economic University

Chapter 1 - Instrumental Variable Method

Dr. Phung Minh Duc


Contents
1. Endogeneity Problem
2. Instrumental Variables
3. IV Estimation
4. 2SLS Estimation
5. Testing for Endogeneity
6. Testing for instrumental variables
7. Commands on Stata
8. Practice
Endogeneity Problem

𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑢
❖ An endogeneity problem arises when an independent variable is correlated
with the error term:
𝑐𝑜𝑣(𝑋, 𝑢) ≠ 0
❖ If the model contains endogenous variables, the coefficients estimated by
OLS are biased and inconsistent.
❖ Endogeneity is a frequent problem in economics and econometrics.
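❖ In the regression model $Y = \beta_0 + \beta_1 X + u$, we have
$$\mathrm{cov}(Y, X) = \mathrm{cov}(\beta_0 + \beta_1 X + u, X) = \beta_1 \mathrm{cov}(X, X) + \mathrm{cov}(u, X)$$
$$\Rightarrow \beta_1 = \frac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)} - \frac{\mathrm{cov}(u, X)}{\mathrm{var}(X)}$$

In OLS regression, $\hat{\beta}_1 = \dfrac{\widehat{\mathrm{cov}}(Y, X)}{\widehat{\mathrm{var}}(X)}$
▪ If $\mathrm{cov}(u, X) = 0$ then $\beta_1 = \dfrac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)} \Rightarrow E(\hat{\beta}_1) = \beta_1$
▪ If $\mathrm{cov}(u, X) \neq 0$ then $\beta_1 = \dfrac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)} - \dfrac{\mathrm{cov}(u, X)}{\mathrm{var}(X)} \Rightarrow E(\hat{\beta}_1) \neq \beta_1$, so the OLS estimator is biased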
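The decomposition above can be checked numerically. The following is a minimal sketch on simulated data, not part of the original slides: the parameter values, the sample size, and the way X is made to depend on u are all assumptions chosen for illustration. It shows that the OLS slope equals the true coefficient plus the sample analogue of cov(u, X)/var(X).

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divides by n; the divisor cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

random.seed(0)
n = 5000
beta0, beta1 = 1.0, 2.0  # assumed "true" parameters for the simulation

u = [random.gauss(0, 1) for _ in range(n)]
# X is constructed to depend on u, so cov(X, u) != 0 by design
x = [random.gauss(0, 1) + 0.8 * ui for ui in u]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b_ols = cov(y, x) / cov(x, x)   # OLS slope
bias = cov(u, x) / cov(x, x)    # sample analogue of cov(u, X) / var(X)

print(f"OLS slope:                 {b_ols:.4f}")
print(f"beta1 + cov(u,X)/var(X):   {beta1 + bias:.4f}")
```

In real data u is unobserved, so the bias term cannot be computed; the simulation only makes the algebra visible.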
Sources of endogeneity
❖ Omitted variables: relevant explanatory variables that are not observed end
up in the error term, so the error term is correlated with the
independent variables used in the model
▪ In the model: 𝑄 = 𝛽0 + 𝛽1 𝑃 + 𝑢, where 𝑄 is the yield of rice and 𝑃 is the
amount of fertilizer used.
▪ 𝑃 is correlated with the variable 𝑍, the "natural quality of the
soil", which also affects the yield. There is often no data for 𝑍, so it
stays in 𝑢 and 𝑐𝑜𝑣(𝑃, 𝑢) ≠ 0 => 𝑃 is an endogenous variable.
▪ In the model: 𝑙𝑤𝑎𝑔𝑒 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝑢, where 𝑙𝑤𝑎𝑔𝑒 is the logarithm of
the wage, and 𝑒𝑑𝑢𝑐 is the level of education of the worker.
▪ 𝑒𝑑𝑢𝑐 is correlated with the variable 𝑍, "intelligence", which also
affects the wage. There is often no data for 𝑍, so 𝑐𝑜𝑣(𝑒𝑑𝑢𝑐, 𝑢) ≠ 0 and
𝑒𝑑𝑢𝑐 is an endogenous variable.
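The size of the omitted-variable bias in the wage example can be written out explicitly. The derivation below is a sketch under assumptions that are not stated in the slides: the true model is linear in the omitted variable $Z$ with coefficient $\gamma$, and the remaining error $e$ is uncorrelated with $educ$.

```latex
% Assumed true model (gamma and e are introduced here for illustration):
%   lwage = beta_0 + beta_1 educ + gamma Z + e,   cov(educ, e) = 0
% Omitting Z puts it into the error term: u = gamma Z + e
\begin{align*}
\mathrm{cov}(educ, u) &= \mathrm{cov}(educ, \gamma Z + e) = \gamma\,\mathrm{cov}(educ, Z) \\
\operatorname{plim}\hat{\beta}_1^{OLS}
  &= \beta_1 + \frac{\mathrm{cov}(educ, u)}{\mathrm{var}(educ)}
   = \beta_1 + \gamma\,\frac{\mathrm{cov}(educ, Z)}{\mathrm{var}(educ)}
\end{align*}
```

Under these assumptions, if intelligence raises wages ($\gamma > 0$) and is positively correlated with education, OLS overstates the return to education.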
❖ Measurement error: error in measuring an explanatory variable can cause
correlation between the mismeasured variable and the error term
▪ In the model: $Y = \beta_0 + \beta_1 X + u$ (1), suppose that $X$ is measured
with error as $X^*$, that is, $X^* = X + v$. Substituting $X = X^* - v$, model (1) becomes
$Y = \beta_0 + \beta_1 X^* + \bar{u}$, in which $\bar{u} = u - \beta_1 v$ (2)
▪ If the measurement error $v$ is not negligible and $\beta_1 \neq 0$, then $X^*$ is an
endogenous variable: both $X^*$ and $\bar{u}$ contain $v$, so $\mathrm{cov}(X^*, \bar{u}) \neq 0$.
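The covariance between the mismeasured regressor and the new error term can be worked out under the classical measurement-error assumptions. These assumptions are not stated in the slides and are introduced here only for illustration: $v$ is uncorrelated with both $X$ and $u$, and $\mathrm{cov}(X, u) = 0$.

```latex
% Assumptions (introduced for illustration): cov(X, v) = cov(u, v) = cov(X, u) = 0
\begin{align*}
\mathrm{cov}(X^*, \bar{u}) &= \mathrm{cov}(X + v,\; u - \beta_1 v) = -\beta_1\,\mathrm{var}(v) \\
\operatorname{plim}\hat{\beta}_1^{OLS}
  &= \beta_1 + \frac{\mathrm{cov}(X^*, \bar{u})}{\mathrm{var}(X^*)}
   = \beta_1\,\frac{\mathrm{var}(X)}{\mathrm{var}(X) + \mathrm{var}(v)}
\end{align*}
```

Under these assumptions the OLS estimate is pulled toward zero (attenuation bias), and the distortion grows with the variance of the measurement error.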
▪ In the model: 𝑐𝑜𝑛𝑠 = 𝛽0 + 𝛽1 𝑖𝑛𝑐 + 𝑢 (1), where 𝑐𝑜𝑛𝑠 is the
consumption and 𝑖𝑛𝑐 is the income of a household.
▪ Households usually do not remember their exact income, so 𝑖𝑛𝑐 is often
reported with error as 𝑖𝑛𝑐 ∗ , that is, 𝑖𝑛𝑐 ∗ = 𝑖𝑛𝑐 + 𝑣, so 𝑖𝑛𝑐 ∗ is an
endogenous variable.
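❖ Simultaneity: $X$ and $Y$ are determined jointly
▪ Assume that
$$Y = \beta_0 + \beta_1 X + u \quad (1)$$
$$X = \alpha_0 + \alpha_1 Y + v \quad (2)$$
Substituting (1) into (2), we have: $X = \alpha_0 + \alpha_1(\beta_0 + \beta_1 X + u) + v$
If $1 - \alpha_1\beta_1 \neq 0$ then
$$X = \frac{\alpha_0 + \alpha_1\beta_0}{1 - \alpha_1\beta_1} + \frac{\alpha_1 u + v}{1 - \alpha_1\beta_1} \Rightarrow \mathrm{cov}(X, u) \neq 0$$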
Solutions for endogeneity

❖ Find and include the unobserved variable in the model
❖ Find and include a proxy variable in the model
❖ Use a fixed-effects estimator with panel data, which eliminates
time-invariant individual-specific effects
❖ Use the instrumental variable (IV) method, which replaces the endogenous
variable with a predicted value that contains only exogenous information
Instrumental Variable

Instrumental Variable - Definition


❖ An instrumental variable (or instrument, or IV) is a variable that is used in a
regression model to correct for the endogeneity problem
❖ A variable 𝑍 can serve as an instrument for the endogenous variable 𝑋 if
two conditions are satisfied:
1. Instrument relevance: 𝑐𝑜𝑣(𝑍, 𝑋) ≠ 0 (𝑍 is correlated with the
endogenous variable 𝑋, but 𝑍 does not itself appear as a regressor in the model)
2. Instrument exogeneity: 𝑐𝑜𝑣(𝑍, 𝑢) = 0 (𝑍 is not correlated with the
error term 𝑢)
Model for log wages (𝑙𝑤𝑎𝑔𝑒) explained by education 𝑒𝑑𝑢𝑐, which is endogenous
𝑙𝑤𝑎𝑔𝑒 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝑢
❖ 𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐 (father's education) is a good instrument for 𝑒𝑑𝑢𝑐 provided it has three
properties:
▪ The instrument 𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐 does not appear in the original model
▪ The instrument 𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐 is correlated with the endogenous variable 𝑒𝑑𝑢𝑐, so
𝑐𝑜𝑣(𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐, 𝑒𝑑𝑢𝑐) ≠ 0
▪ The instrument 𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐 is uncorrelated with the error term 𝑢, so
𝑐𝑜𝑣(𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐, 𝑢) = 0
❖ Other potential instruments: mother's education, the number of siblings, the month
of birth, …
Example
❖ The model: 𝑠𝑐𝑜𝑟𝑒 = 𝛽0 + 𝛽1 𝑠𝑘𝑖𝑝𝑝𝑒𝑑 + 𝑢, in which 𝑠𝑐𝑜𝑟𝑒 is the final exam score;
𝑠𝑘𝑖𝑝𝑝𝑒𝑑 is the total number of lectures missed during the semester
❖ Why is 𝑠𝑘𝑖𝑝𝑝𝑒𝑑 correlated with other factors in 𝑢?
(If it is, 𝑠𝑘𝑖𝑝𝑝𝑒𝑑 is an endogenous variable)
❖ Can 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 (the distance between living quarters and campus) act as an instrumental
variable for 𝑠𝑘𝑖𝑝𝑝𝑒𝑑?
▪ Relevance: 𝑐𝑜𝑣(𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒, 𝑠𝑘𝑖𝑝𝑝𝑒𝑑) ≠ 0?
▪ Exogeneity: 𝑐𝑜𝑣(𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒, 𝑢) = 0?
Strong, weak and valid instruments

❖ The instrument 𝑍 is said to be a strong instrument if it is highly correlated
with the endogenous variable 𝑋, and a weak instrument otherwise.
❖ An instrumental variable is called a valid instrument if it is both a strong
instrument and exogenous.
Note: The exogeneity condition on the instrumental variable 𝑍 involves the
covariance between 𝑍 and the unobserved error term, so it generally cannot be
tested directly. In most cases, the researcher must rely on the economic nature of the
problem to justify the exogeneity of the chosen instrumental variable.
IV Estimation

𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑢 (1)
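The instrument $Z$ satisfies $\mathrm{cov}(X, Z) \neq 0$ and $\mathrm{cov}(Z, u) = 0$.

❖ Moment conditions:
$$\begin{cases} E(u) = 0 \\ \mathrm{cov}(Z, u) = 0 \end{cases} \Rightarrow \begin{cases} E(u) = 0 \\ E(Zu) = 0 \end{cases} \Rightarrow \begin{cases} E(Y - \beta_0 - \beta_1 X) = 0 \\ E[Z(Y - \beta_0 - \beta_1 X)] = 0 \end{cases}$$
❖ In the sample:
$$\begin{cases} \frac{1}{n}\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \\ \frac{1}{n}\sum Z_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \end{cases}$$
Solving these two equations gives the IV estimator:
$$\hat{\beta}_1^{IV} = \frac{\sum (Y_i - \bar{Y})(Z_i - \bar{Z})}{\sum (X_i - \bar{X})(Z_i - \bar{Z})} = \frac{\widehat{\mathrm{cov}}(Y, Z)}{\widehat{\mathrm{cov}}(X, Z)}$$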
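Consistency of the IV estimator

For the IV estimator $\hat{\beta}_1^{IV} = \dfrac{\widehat{\mathrm{cov}}(Y, Z)}{\widehat{\mathrm{cov}}(X, Z)}$, we have:
$$\mathrm{cov}(Y, Z) = \mathrm{cov}(\beta_0 + \beta_1 X + u, Z) = \beta_1 \mathrm{cov}(X, Z) + \mathrm{cov}(u, Z)$$
Because $\mathrm{cov}(u, Z) = 0$ and $\mathrm{cov}(X, Z) \neq 0$ we have:
$$\beta_1 = \frac{\mathrm{cov}(Y, Z)}{\mathrm{cov}(X, Z)} \quad \text{and therefore} \quad \operatorname{plim} \hat{\beta}_1^{IV} = \beta_1$$

The coefficient estimated using the IV formula is therefore consistent. In finite samples it is generally not exactly unbiased, but the bias vanishes as the sample grows when the instrument is strong.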
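The IV formula can be tried on simulated data in which the true slope is known. This is a minimal sketch rather than anything taken from the slides: the data-generating process, the parameter values, and the helper names `ols_slope` and `iv_slope` are all assumptions made for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divides by n; the divisor cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def ols_slope(y, x):
    return cov(y, x) / cov(x, x)

def iv_slope(y, x, z):
    """Simple IV estimator: cov(Y, Z) / cov(X, Z)."""
    return cov(y, z) / cov(x, z)

random.seed(1)
n = 20000
beta0, beta1 = 1.0, 2.0  # assumed "true" parameters

z = [random.gauss(0, 1) for _ in range(n)]   # instrument: independent of u
u = [random.gauss(0, 1) for _ in range(n)]   # structural error
# X depends on Z (relevance) and on u (endogeneity)
x = [zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b_ols = ols_slope(y, x)
b_iv = iv_slope(y, x, z)
print(f"true beta1 = {beta1}, OLS = {b_ols:.3f}, IV = {b_iv:.3f}")
```

With this design the OLS slope settles visibly above the true value, while the IV slope stays close to it.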
2SLS Estimation

Step 1: Regress the endogenous variable $X$ on the instrumental variable $Z$:
$$X = \delta_0 + \delta_1 Z + v$$
and obtain the fitted values $\hat{X} = \hat{\delta}_0 + \hat{\delta}_1 Z$, which contain only exogenous
information from the instrument $Z$.
Step 2: Regress the dependent variable $Y$ on the predicted value $\hat{X}$:
$$Y = \beta_0 + \beta_1 \hat{X} + u$$
The coefficient $\beta_1$ estimated with 2SLS is consistent because $\hat{X}$ is exogenous
and uncorrelated with the error term $u$.
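The two steps can be sketched directly in code. The example below is a minimal illustration on simulated data, with assumed parameter values and a hypothetical helper `ols`. It also checks that, with one endogenous variable and one instrument, the second-stage slope coincides with the ratio cov(Y, Z)/cov(X, Z).

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def ols(y, x):
    """Simple OLS of y on x with a constant; returns (intercept, slope)."""
    slope = cov(y, x) / cov(x, x)
    return mean(y) - slope * mean(x), slope

random.seed(2)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 + zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]  # assumed true slope: 2.0

# Step 1: regress X on Z and keep the fitted values
d0, d1 = ols(x, z)
x_hat = [d0 + d1 * zi for zi in z]

# Step 2: regress Y on the fitted values
b0_2sls, b1_2sls = ols(y, x_hat)

# Simple IV estimator for comparison
b1_iv = cov(y, z) / cov(x, z)
print(f"2SLS slope = {b1_2sls:.4f}, IV slope = {b1_iv:.4f}")
```

Running the second stage by hand gives the right coefficient but not the right standard errors, which is one reason to prefer a dedicated routine such as Stata's `ivregress` in applied work.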
2SLS Standard Error:

❖ The standard errors from the second stage regression need to be corrected
❖ In OLS, $\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma^2}{SST_X}$, but in 2SLS, $\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma^2}{SST_X \cdot R^2_{X,Z}}$, in which $\sigma^2$ is the
variance of the error term $u$, and $SST_X$ is the total variation in $X$;
❖ $R^2_{X,Z}$ is the $R^2$ from the regression of $X$ on $Z$, so the variance of coefficients using
the 2SLS estimation will be higher than the variance of coefficients using
OLS, because $R^2_{X,Z}$ is less than 1.
❖ A weaker relationship between $X$ and $Z$ results in a lower $R^2_{X,Z}$ and
a higher variance of the 2SLS coefficient, so the estimate is less likely to be statistically significant.
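A short worked calculation shows how quickly precision deteriorates. The $R^2_{X,Z}$ values below are hypothetical and chosen only to illustrate the formula.

```latex
% Ratio of the 2SLS variance to the OLS variance, from the two formulas above
\[
\frac{\mathrm{var}(\hat{\beta}_1^{2SLS})}{\mathrm{var}(\hat{\beta}_1^{OLS})}
  = \frac{\sigma^2 / (SST_X \cdot R^2_{X,Z})}{\sigma^2 / SST_X}
  = \frac{1}{R^2_{X,Z}}
\]
% Hypothetical values:
%   R^2_{X,Z} = 0.10  =>  variance is 10 times larger,  standard error about 3.16 times larger
%   R^2_{X,Z} = 0.01  =>  variance is 100 times larger, standard error 10 times larger
```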
Testing for Endogeneity

❖ Multiple regression model

$Y = \beta_0 + \beta_1 X + \beta_2 W + u$ with $\begin{cases} H_0: \mathrm{cov}(X, u) = 0 \\ H_1: \mathrm{cov}(X, u) \neq 0 \end{cases}$
▪ If $H_0$ is true, then $\hat{\beta}_{OLS}$ and $\hat{\beta}_{IV}$ are both consistent, and OLS is preferred because it is more efficient (BLUE)
▪ If $H_0$ is false, then OLS is inconsistent and 2SLS is consistent
❖ Testing for endogeneity of a single explanatory variable
▪ Regress the independent variable $X$ on the instrumental variable $Z$ and the
exogenous independent variable $W$, and obtain the residuals $\hat{v}$
▪ Add $\hat{v}$ to the structural equation (which includes $X$) and test the significance of $\hat{v}$
using an OLS regression. If the coefficient on $\hat{v}$ is statistically different from zero,
we conclude that $X$ is indeed endogenous.
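Hausman Test
❖ The Hausman test examines the difference between $\hat{\beta}_{OLS}$ and $\hat{\beta}_{IV}$:
$H_0: d = \hat{\beta}_{OLS} - \hat{\beta}_{IV} = 0$ (the difference is only sampling noise)
$H_1: d = \hat{\beta}_{OLS} - \hat{\beta}_{IV} \neq 0$ (the difference is systematic)
❖ Test statistic:
$$H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' \left[\mathrm{var}(\hat{\beta}_{IV}) - \mathrm{var}(\hat{\beta}_{OLS})\right]^{-1} (\hat{\beta}_{IV} - \hat{\beta}_{OLS}) \sim \chi^2$$
▪ If $H_0$ is true, then OLS is the best (BLUE)
▪ If $H_0$ is false, then OLS is inconsistent and IV is consistent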
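For a single coefficient the Hausman statistic reduces to a squared difference divided by a difference of variances, compared with a chi-square distribution with one degree of freedom. The sketch below uses made-up estimates and variances purely as an illustration; `hausman_scalar` is a hypothetical helper, not a Stata or library function.

```python
def hausman_scalar(b_iv, b_ols, var_iv, var_ols):
    """Hausman statistic for one coefficient: (b_iv - b_ols)^2 / (var_iv - var_ols)."""
    diff_var = var_iv - var_ols
    if diff_var <= 0:
        # Can happen in finite samples; the statistic is then not usable as is
        raise ValueError("var_iv - var_ols must be positive")
    return (b_iv - b_ols) ** 2 / diff_var

# 5% critical value of the chi-square distribution with 1 degree of freedom
CHI2_1_CRIT_5PCT = 3.841

# Hypothetical estimates, for illustration only
h = hausman_scalar(b_iv=0.10, b_ols=0.06, var_iv=0.0004, var_ols=0.0001)
reject_h0 = h > CHI2_1_CRIT_5PCT

print(f"H = {h:.3f}, reject H0 at 5%: {reject_h0}")
```

With these illustrative numbers the statistic is about 5.33, which exceeds 3.841, so exogeneity of the regressor would be rejected at the 5% level.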
Testing for instrumental variables

Testing the exogeneity of the instruments

Consider the model 𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑢, in which 𝑋 is the endogenous variable
❖ The model can be estimated by IV only if the number of instrumental
variables is greater than or equal to the number of endogenous independent
variables. The exogeneity test for the instruments can be performed only if it
is strictly greater.
▪ If greater: over-identification (the exogeneity test is possible)
▪ If equal: exact identification (the exogeneity test is not possible)
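Consider the model $Y = \beta_0 + \beta_1 X + u$, in which $\mathrm{cov}(X, u) \neq 0$ and $Z_1, Z_2$ are
two instrumental variables.
❖ Way 1: (the Sargan test)
▪ Estimate the structural equation by 2SLS using the instruments $Z_1, Z_2$ and obtain
the 2SLS residuals $\hat{u}$
▪ Regress $\hat{u}$ on all exogenous variables:
$$\hat{u} = \alpha_0 + \alpha_1 Z_1 + \alpha_2 Z_2 + v$$
▪ Test the hypothesis $H_0: \alpha_1 = \alpha_2 = 0$. In practice the statistic $nR^2$ from this regression is compared
with a $\chi^2_q$ distribution, where $q$ is the number of instruments minus the number of endogenous variables
(here $q = 1$). If the p-value is small (e.g. below 0.05), then at least one of the instruments
$Z_1, Z_2$ is not exogenous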
❖ Way 2: (the Hausman test)
▪ Estimate the structural equation by 2SLS using only the instrument $Z_1$ and obtain the
coefficient $\hat{\beta}_1$
▪ Estimate the structural equation by 2SLS using only the instrument $Z_2$ and obtain the
coefficient $\hat{\beta}_1'$
▪ Test the hypothesis $H_0: d = \hat{\beta}_1 - \hat{\beta}_1' = 0$. If the p-value is small (e.g. below 0.05), then at least one of the
instruments $Z_1, Z_2$ is not exogenous.
Testing for weak instruments

❖ The instrumental variable 𝑍 is said to be a strong instrument if it is highly
correlated with the endogenous independent variable, and a weak
instrument otherwise.
▪ Regress the endogenous variable 𝑋 on the instrumental variables
𝑍1 , 𝑍2 (first stage):
𝑋 = 𝛿0 + 𝛿1 𝑍1 + 𝛿2 𝑍2 + 𝑣
▪ Use the F-statistic for the joint significance of 𝑍1 , 𝑍2 (𝐻0 : 𝛿1 = 𝛿2 = 0) to judge whether they are weak
instruments. As a rule of thumb, if the value of F is less than 10, the instruments
𝑍1 and 𝑍2 are considered weak.
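When the first stage contains only the instruments and a constant, the joint F-statistic can be computed from the first-stage R-squared. The sketch below applies the standard overall-F formula; the function name `first_stage_f` and the input values are hypothetical and serve only to illustrate the rule of thumb.

```python
def first_stage_f(r2, n, q):
    """Overall F-statistic of a regression with q regressors plus a constant.

    Assumes the first stage contains only the q instruments (no other
    exogenous regressors), so the overall F tests the instruments jointly.
    """
    return (r2 / q) / ((1.0 - r2) / (n - q - 1))

n, q = 200, 2  # hypothetical sample size and number of instruments

f_low = first_stage_f(r2=0.05, n=n, q=q)   # instruments explain little of X
f_high = first_stage_f(r2=0.20, n=n, q=q)  # instruments explain more of X

for label, f in [("R2 = 0.05", f_low), ("R2 = 0.20", f_high)]:
    verdict = "weak" if f < 10 else "not weak by the rule of thumb"
    print(f"{label}: F = {f:.2f} -> {verdict}")
```

If the first stage also includes exogenous regressors W, the relevant statistic is the partial F-test on the instruments only, which this shortcut does not compute.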
The general model with endogenous variables

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \beta_{k+1} W_1 + \dots + \beta_{k+m} W_m + u \quad (*)$$

In which $X_1, \dots, X_k$ are the endogenous variables; $W_1, \dots, W_m$ are the exogenous variables;
and $Z_1, \dots, Z_p$ ($p \geq k$) are the instrumental variables.
Step 1: Regress $X_1$ on all instrumental variables $Z_1, \dots, Z_p$ and all the exogenous
independent variables $W_1, \dots, W_m$ by OLS, and store the fitted values as $\hat{X}_1$. Repeat
the same for the remaining endogenous independent variables, saving the fitted
values $\hat{X}_1, \hat{X}_2, \dots, \hat{X}_k$.
Step 2: Estimate equation (*) by OLS with the exogenous independent variables
$W_1, \dots, W_m$, where the endogenous independent variables $X_1, \dots, X_k$ are replaced by the fitted
values $\hat{X}_1, \hat{X}_2, \dots, \hat{X}_k$.
Command on Stata

𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝛽2 𝑊 + 𝑢 (*)
in which 𝑋 is the endogenous variable, 𝑊 is the exogenous variable, and 𝑍 is the instrumental variable
❖ Estimate 2SLS
ivregress 2sls Y W (X=Z)
ivregress 2sls Y W (X=Z), small (if the sample is small)
ivregress 2sls Y W (X=Z), vce(robust) small (if the errors are heteroskedastic)
❖ Test the endogeneity of the variable X:
▪ Way 1:
reg X Z W
predict Vhat, residuals
reg Y X W Vhat
If the p-value of Vhat is small (e.g. below 0.05), then X is endogenous
▪ Way 2:
reg Y X W
est store ls
ivregress 2sls Y W (X=Z)
est store iv
hausman iv ls, constant sigmamore
If the p-value is small (e.g. below 0.05), then X is endogenous
With two instrumental variables 𝑍1 , 𝑍2 for 𝑋 in model (*):
❖ Test the exogeneity of the instrumental variables 𝑍1 , 𝑍2 :
▪ Way 1:
ivregress 2sls Y (X=Z1 Z2) W, small
estat overid
If the p-value is small (e.g. below 0.05), then at least one of the instruments 𝑍1 , 𝑍2 is not exogenous
▪ Way 2:
ivregress 2sls Y (X=Z1) W, small
est store z1
ivregress 2sls Y (X=Z2) W, small
est store z2
hausman z1 z2, constant sigmamore
If the p-value is small (e.g. below 0.05), then at least one of the instruments 𝑍1 , 𝑍2 is not exogenous
❖ Test whether 𝑍1 , 𝑍2 are weak instruments:
ivregress 2sls Y (X=Z1 Z2) W, small
estat firststage
If the first-stage F-statistic is below 10, then 𝑍1 , 𝑍2 are weak instruments
If the first-stage F-statistic is above 10, then 𝑍1 , 𝑍2 are not considered weak instruments
Practice
