Chapter 1 - Instrumental Variable Method
Endogeneity Problem
𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑢
❖ Endogeneity problem is when the independent variable is correlated with the
error term:
𝑐𝑜𝑣(𝑋, 𝑢) ≠ 0
❖ If the model contains endogenous variables => The coefficients estimated by
the OLS method are biased and inconsistent
❖ Endogeneity is a frequent problem in economics and econometrics.
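❖ In the regression model $Y = \beta_0 + \beta_1 X + u$, we have
$\mathrm{cov}(Y, X) = \mathrm{cov}(\beta_0 + \beta_1 X + u, X) = \beta_1 \mathrm{cov}(X, X) + \mathrm{cov}(u, X)$
$\Rightarrow \beta_1 = \frac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)} - \frac{\mathrm{cov}(u, X)}{\mathrm{var}(X)}$
In OLS regression, $\hat{\beta}_1 = \frac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)}$
▪ If $\mathrm{cov}(u, X) = 0$ then $\beta_1 = \frac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)} \Rightarrow E(\hat{\beta}_1) = \beta_1$
▪ If $\mathrm{cov}(u, X) \neq 0$ then $\beta_1 = \frac{\mathrm{cov}(Y, X)}{\mathrm{var}(X)} - \frac{\mathrm{cov}(u, X)}{\mathrm{var}(X)} \Rightarrow E(\hat{\beta}_1) \neq \beta_1$ (biased)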
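The decomposition above can be checked numerically. The following is a minimal sketch, assuming illustrative true coefficients (𝛽0 = 1, 𝛽1 = 2) and an error term built to be correlated with 𝑋; none of these numbers come from the slides. It shows that the OLS slope overshoots 𝛽1 by exactly the sample value of cov(u, X)/var(X).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


random.seed(0)
n = 5000
beta0, beta1 = 1.0, 2.0  # assumed true coefficients (illustrative only)

# u is constructed to depend on X, so X is endogenous by design.
x = [random.gauss(0, 1) for _ in range(n)]
u = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

ols_slope = cov(y, x) / cov(x, x)   # OLS estimate of beta1
bias_term = cov(u, x) / cov(x, x)   # sample cov(u, X) / var(X)

print("OLS slope:", ols_slope)
print("OLS slope minus bias term:", ols_slope - bias_term)
```

In real data 𝑢 is unobserved, so the bias term cannot be computed and subtracted this way; the simulation only illustrates where the bias comes from.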
Sources of endogeneity
❖ Omitted variable: a relevant explanatory variable is not observed and
ends up in the error term, so the error term is correlated with the
independent variables used in the model
▪ In the model: 𝑄 = 𝛽0 + 𝛽1 𝑃 + 𝑢, where 𝑄 is the yield of rice and 𝑃 is the
amount of fertilizer used.
▪ 𝑃 is correlated with the variable 𝑍, the "natural quality of the soil",
which also affects 𝑄. There is often no data for 𝑍, so 𝑍 stays in 𝑢 and
𝑐𝑜𝑣(𝑃, 𝑢) ≠ 0 => 𝑃 is an endogenous variable.
▪ In the model: 𝑙𝑤𝑎𝑔𝑒 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝑢, where 𝑙𝑤𝑎𝑔𝑒 is the logarithm of
the wage and 𝑒𝑑𝑢𝑐 is the level of education of the worker.
▪ 𝑒𝑑𝑢𝑐 is correlated with the variable 𝑍, "intelligence", which also affects
wages. There is often no data for 𝑍, so 𝑐𝑜𝑣(𝑒𝑑𝑢𝑐, 𝑢) ≠ 0 and 𝑒𝑑𝑢𝑐 is an
endogenous variable.
❖ Measurement: Measurement error can cause correlation between
the mismeasured variable and the error term
▪ In the model: 𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝑢 (1), suppose that 𝑋 is wrongly
measured as 𝑋∗, that is, 𝑋∗ = 𝑋 + 𝑣. Model (1) becomes
𝑌 = 𝛽0 + 𝛽1 𝑋∗ + ū, in which ū = 𝑢 − 𝛽1 𝑣 (2)
▪ If the measurement error 𝑣 is not negligible, then 𝑋∗ is an endogenous
variable: 𝑣 enters both 𝑋∗ and ū, so 𝑐𝑜𝑣(𝑋∗, ū) ≠ 0.
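The sign and size of this covariance can be worked out directly. The derivation below is a sketch under the classical measurement-error assumptions, which the slides do not state explicitly: 𝑣 is uncorrelated with both 𝑋 and 𝑢, and 𝑋 itself is exogenous in model (1).

```latex
\begin{aligned}
\operatorname{cov}(X^*, \bar{u})
  &= \operatorname{cov}(X + v,\; u - \beta_1 v) \\
  &= \operatorname{cov}(X, u) - \beta_1 \operatorname{cov}(X, v)
     + \operatorname{cov}(v, u) - \beta_1 \operatorname{var}(v) \\
  &= -\beta_1 \operatorname{var}(v)
\end{aligned}
```

Under these assumptions the covariance is nonzero whenever 𝛽1 ≠ 0 and var(𝑣) > 0, and it grows in magnitude with the variance of the measurement error.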
▪ In the model: 𝑐𝑜𝑛𝑠 = 𝛽0 + 𝛽1 𝑖𝑛𝑐 + 𝑢 (1), where 𝑐𝑜𝑛𝑠 is the
consumption and 𝑖𝑛𝑐 is the income of a household.
▪ Households usually do not remember their exact income, so 𝑖𝑛𝑐 is often
mismeasured as 𝑖𝑛𝑐∗ = 𝑖𝑛𝑐 + 𝑣, and 𝑖𝑛𝑐∗ is an endogenous variable.
❖ Simultaneity:
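▪ Assume that
$\begin{cases} Y = \beta_0 + \beta_1 X + u & (1) \\ X = \alpha_0 + \alpha_1 Y + v & (2) \end{cases}$
Substituting (1) into (2), we have: $X = \alpha_0 + \alpha_1 (\beta_0 + \beta_1 X + u) + v$
If $1 - \alpha_1 \beta_1 \neq 0$ then $X = \frac{\alpha_0 + \alpha_1 \beta_0}{1 - \alpha_1 \beta_1} + \frac{\alpha_1 u + v}{1 - \alpha_1 \beta_1} \Rightarrow \mathrm{cov}(X, u) \neq 0$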
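The reduced form for 𝑋 can be simulated to see the correlation with 𝑢 appear. This is a minimal sketch with made-up parameter values (𝛼0 = 0.5, 𝛼1 = 0.4, 𝛽0 = 1, 𝛽1 = 0.5), chosen only so that 1 − 𝛼1𝛽1 ≠ 0; 𝑢 and 𝑣 are drawn independently, which is an additional assumption.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


random.seed(1)
n = 5000
alpha0, alpha1 = 0.5, 0.4   # assumed values for equation (2)
beta0, beta1 = 1.0, 0.5     # assumed values for equation (1)
denom = 1 - alpha1 * beta1  # must be nonzero for the system to be solvable

u = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]

# Reduced form for X, then Y from structural equation (1).
x = [(alpha0 + alpha1 * beta0 + alpha1 * ui + vi) / denom for ui, vi in zip(u, v)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

cov_xu = cov(x, u)
# Covariance implied by the reduced form: (alpha1*var(u) + cov(v, u)) / denom
implied = (alpha1 * cov(u, u) + cov(v, u)) / denom
# Largest violation of structural equation (2) by the simulated data.
eq2_gap = max(abs(xi - (alpha0 + alpha1 * yi + vi)) for xi, yi, vi in zip(x, y, v))

print("sample cov(X, u):", cov_xu)
print("implied by reduced form:", implied)
```

Because 𝑢 feeds into 𝑋 through 𝑌, the covariance is nonzero whenever 𝛼1 ≠ 0, even though 𝑢 and 𝑣 are independent here.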
Endogeneity Problem
Model for log wages (𝑙𝑤𝑎𝑔𝑒) explained by education (𝑒𝑑𝑢𝑐), which is endogenous:
𝑙𝑤𝑎𝑔𝑒 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝑢
❖ 𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐 (father’s education) is a good instrument for 𝑒𝑑𝑢𝑐 because it has three
properties:
▪ The instrument 𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐 does not appear in the original model
▪ The instrument 𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐 is correlated with the endogenous variable 𝑒𝑑𝑢𝑐, so
𝑐𝑜𝑣(𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐, 𝑒𝑑𝑢𝑐) ≠ 0
▪ The instrument 𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐 is uncorrelated with the error term 𝑢, so
𝑐𝑜𝑣(𝑓𝑎𝑡ℎ𝑒𝑑𝑢𝑐, 𝑢) = 0
❖ Other potential instruments: mother's education, the number of siblings, the month
of birth, …
Instrumental Variable
Example
❖ The model: 𝑠𝑐𝑜𝑟𝑒 = 𝛽0 + 𝛽1 𝑠𝑘𝑖𝑝𝑝𝑒𝑑 + 𝑢, in which 𝑠𝑐𝑜𝑟𝑒 is the final exam score and
𝑠𝑘𝑖𝑝𝑝𝑒𝑑 is the total number of lectures missed during the semester
❖ Why is 𝑠𝑘𝑖𝑝𝑝𝑒𝑑 correlated with other factors in 𝑢?
(So, 𝑠𝑘𝑖𝑝𝑝𝑒𝑑 is an endogenous variable)
❖ Can 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 (the distance between living quarters and campus) act as an instrumental
variable for 𝑠𝑘𝑖𝑝𝑝𝑒𝑑?
▪ Is 𝑐𝑜𝑣(𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒, 𝑠𝑘𝑖𝑝𝑝𝑒𝑑) ≠ 0?
▪ Is 𝑐𝑜𝑣(𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒, 𝑢) = 0?
Instrumental Variable
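$Y = \beta_0 + \beta_1 X + u$ (1)
The instrument $Z$ is such that: $\mathrm{cov}(X, Z) \neq 0$ and $\mathrm{cov}(Z, u) = 0$
❖ $\begin{cases} E(u) = 0 \\ \mathrm{cov}(Z, u) = 0 \end{cases} \Rightarrow \begin{cases} E(Y - \beta_0 - \beta_1 X) = 0 \\ \mathrm{cov}(Z, Y - \beta_0 - \beta_1 X) = 0 \end{cases}$
Since $E(u) = 0$, the second condition is equivalent to $E[Z (Y - \beta_0 - \beta_1 X)] = 0$.
❖ In the sample: $\begin{cases} \frac{1}{n} \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \\ \frac{1}{n} \sum Z_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \end{cases}$ Solving this system, we have the IV estimator:
$\hat{\beta}_1^{IV} = \frac{\sum (Y_i - \bar{Y})(Z_i - \bar{Z})}{\sum (X_i - \bar{X})(Z_i - \bar{Z})} = \frac{\widehat{\mathrm{cov}}(Y, Z)}{\widehat{\mathrm{cov}}(X, Z)}$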
IV Estimation
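$Y = \beta_0 + \beta_1 X + u$ (1)
The instrument $Z$ is such that: $\mathrm{cov}(X, Z) \neq 0$ and $\mathrm{cov}(Z, u) = 0$
$\hat{\beta}_1^{IV} = \frac{\sum (Y_i - \bar{Y})(Z_i - \bar{Z})}{\sum (X_i - \bar{X})(Z_i - \bar{Z})} = \frac{\widehat{\mathrm{cov}}(Y, Z)}{\widehat{\mathrm{cov}}(X, Z)}$
We have:
$\mathrm{cov}(Y, Z) = \mathrm{cov}(\beta_0 + \beta_1 X + u, Z) = \beta_1 \mathrm{cov}(X, Z) + \mathrm{cov}(u, Z)$
Because $\mathrm{cov}(u, Z) = 0$ we have:
$\beta_1 = \frac{\mathrm{cov}(Y, Z)}{\mathrm{cov}(X, Z)}$, so the sample ratio converges to the true value: $\mathrm{plim}\, \hat{\beta}_1^{IV} = \beta_1$
Then, the coefficient estimated using the IV formula is consistent. In finite samples it is
generally not exactly unbiased, because it is a ratio of two sample covariances.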
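The contrast between OLS and IV can be illustrated with simulated data. The sketch below assumes a hypothetical data-generating process that is not from the slides: an unobserved factor `a` drives both 𝑋 and 𝑢 (so 𝑋 is endogenous), while 𝑍 affects 𝑋 but is independent of 𝑢. The true slope is set to 𝛽1 = 2.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


random.seed(42)
n = 20000
beta0, beta1 = 1.0, 2.0  # assumed true coefficients (illustrative only)

z = [random.gauss(0, 1) for _ in range(n)]  # instrument: relevant and exogenous
a = [random.gauss(0, 1) for _ in range(n)]  # unobserved factor (omitted variable)

x = [zi + ai + random.gauss(0, 1) for zi, ai in zip(z, a)]  # X depends on Z and a
u = [ai + random.gauss(0, 1) for ai in a]                   # u contains a
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

beta1_ols = cov(y, x) / cov(x, x)  # inconsistent, because cov(X, u) != 0
beta1_iv = cov(y, z) / cov(x, z)   # IV ratio of sample covariances

print("OLS estimate of beta1:", beta1_ols)
print("IV estimate of beta1: ", beta1_iv)
```

With this design the OLS estimate settles above 2 while the IV estimate stays close to 2 as the sample grows; a weaker link between 𝑍 and 𝑋 would make the IV estimate noisier.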
2SLS Estimation
Hausman Test
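❖ The Hausman test compares the difference between $\hat{\beta}_{OLS}$ and $\hat{\beta}_{IV}$:
$\begin{cases} H_0: d = \hat{\beta}_{OLS} - \hat{\beta}_{IV} = 0 \\ H_1: d = \hat{\beta}_{OLS} - \hat{\beta}_{IV} \neq 0 \end{cases}$
❖ Test statistic:
$H = (\hat{\beta}_{IV} - \hat{\beta}_{OLS})' \, [\mathrm{var}(\hat{\beta}_{IV}) - \mathrm{var}(\hat{\beta}_{OLS})]^{-1} \, (\hat{\beta}_{IV} - \hat{\beta}_{OLS}) \sim \chi^2$
▪ If $H_0$ is true, then both OLS and IV are consistent and OLS is efficient (BLUE), so OLS is preferred
▪ If $H_0$ is false, then OLS is inconsistent while IV is still consistent, so IV is preferred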
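For a single coefficient the statistic reduces to a squared difference divided by a difference of variances. The sketch below computes that scalar version; the helper name `hausman_scalar` and all input numbers are hypothetical and chosen only for illustration, and the comparison uses the 5% critical value of the chi-square distribution with one degree of freedom (3.841).

```python
def hausman_scalar(b_iv, b_ols, var_iv, var_ols):
    """Hausman statistic for one coefficient: (b_iv - b_ols)^2 / (var_iv - var_ols)."""
    diff = b_iv - b_ols
    var_diff = var_iv - var_ols
    if var_diff <= 0:
        # In finite samples the variance difference can be non-positive,
        # in which case this scalar form of the statistic is not usable.
        raise ValueError("var_iv - var_ols must be positive")
    return diff * diff / var_diff


CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom

# Hypothetical estimates and variances, not taken from any data set.
h = hausman_scalar(b_iv=0.5, b_ols=0.3, var_iv=0.02, var_ols=0.01)
reject_h0 = h > CHI2_1_CRIT_5PCT

print("Hausman statistic:", h)
print("Reject H0 at the 5% level:", reject_h0)
```

With several coefficients the same idea applies, using the matrix quadratic form from the slide rather than this scalar shortcut.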
Testing for instrumental variables
𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝛽2 𝑊 + 𝑢 (*)
in which 𝑋 is the endogenous variable, 𝑊 is the exogenous variable, and 𝑍 is the instrumental variable
❖ Estimate 2SLS
ivregress 2sls Y W (X=Z)
ivregress 2sls Y W (X=Z), small (if the sample is small)
ivregress 2sls Y W (X=Z), vce(robust) small (if the errors are heteroskedastic)
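What the 2SLS commands compute can be sketched by running the two stages by hand. The example below is a simplified illustration, assuming one endogenous regressor, one instrument, and no exogenous control 𝑊; the simulated data and coefficient values are hypothetical. It shows that in this case the second-stage slope coincides with the IV ratio cov(Y, Z)/cov(X, Z).

```python
import random


def mean(a):
    """Arithmetic mean of a list."""
    return sum(a) / len(a)


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


random.seed(7)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]  # instrument
a = [random.gauss(0, 1) for _ in range(n)]  # unobserved factor making X endogenous
x = [0.8 * zi + ai + random.gauss(0, 1) for zi, ai in zip(z, a)]
y = [1.0 + 2.0 * xi + ai + random.gauss(0, 1) for xi, ai in zip(x, a)]

# Stage 1: regress X on Z and keep the fitted values.
pi1 = cov(x, z) / cov(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]

# Stage 2: regress Y on the fitted values from stage 1.
beta1_2sls = cov(y, x_hat) / cov(x_hat, x_hat)

# Direct IV formula for comparison.
beta1_iv = cov(y, z) / cov(x, z)

print("2SLS slope (two manual stages):", beta1_2sls)
print("IV slope (covariance ratio):   ", beta1_iv)
```

The manual second stage reproduces the point estimate only; its usual OLS standard errors are not valid for 2SLS, which is one reason to rely on `ivregress` rather than two separate `reg` calls.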
Command on Stata
❖ Test the endogeneity of the variable X:
▪ Way 1 (regression-based test):
    reg X Z W
    predict Vhat, residuals
    reg Y X W Vhat
  If the p-value of Vhat is small (e.g. below 0.05), then X is endogenous
▪ Way 2 (Hausman test):
    reg Y X W
    est store ls
    ivregress 2sls Y W (X=Z)
    est store iv
    hausman iv ls, constant sigmamore
  If the p-value is small (e.g. below 0.05), then X is endogenous
Now suppose that in model (*) there are two instrumental variables, 𝑍1 and 𝑍2, for the endogenous variable 𝑋.
❖ Test the exogeneity of the instrumental variables 𝑍1, 𝑍2:
▪ Way 1 (overidentification test):
    ivregress 2sls Y (X=Z1 Z2) W, small
    estat overid
  If the p-value is small (e.g. below 0.05), then at least one of the instruments 𝑍1, 𝑍2 is not exogenous
▪ Way 2 (Hausman test comparing the two instruments):
    ivregress 2sls Y (X=Z1) W, small
    est store z1
    ivregress 2sls Y (X=Z2) W, small
    est store z2
    hausman z1 z2, constant sigmamore
  If the p-value is small (e.g. below 0.05), then at least one of the instruments 𝑍1, 𝑍2 is not exogenous
❖ Test whether the instruments 𝑍1, 𝑍2 are weak:
    ivregress 2sls Y (X=Z1 Z2) W, small
    estat firststage
  If the first-stage F-statistic is below 10 (a common rule of thumb), the instruments 𝑍1, 𝑍2 are weak
  If the first-stage F-statistic is above 10, the instruments 𝑍1, 𝑍2 are jointly strong
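The rule of thumb can be illustrated with a hand-computed first-stage F-statistic. The sketch below is a simplification of the two-instrument case on the slide: it assumes a single instrument and no exogenous controls, so that F = R² / (1 − R²) × (n − 2) from the regression of 𝑋 on 𝑍. The helper `first_stage_f` and the simulated coefficients (0.5 for a strong instrument, 0.02 for a weak one) are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def first_stage_f(x, z):
    """F-statistic of the simple regression of x on z (one instrument, no controls)."""
    n = len(x)
    r2 = cov(x, z) ** 2 / (cov(x, x) * cov(z, z))  # squared sample correlation
    return r2 / (1 - r2) * (n - 2)


random.seed(3)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]

# Strong instrument: Z explains a sizeable share of the variation in X.
x_strong = [0.5 * zi + random.gauss(0, 1) for zi in z]
# Weak instrument: Z barely moves X.
x_weak = [0.02 * zi + random.gauss(0, 1) for zi in z]

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)

print("First-stage F, strong instrument:", f_strong)
print("First-stage F, weak instrument:  ", f_weak)
```

With several instruments and controls, `estat firststage` reports the corresponding joint F-statistic for the excluded instruments, which is the number to compare against 10.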
Practice