Jiang Ting (江艇) + IRID metrics 2016 slides
Applied Microeconometrics
Ting JIANG (江艇)
National Academy of Development and Strategy and School of Economics, Renmin University of China
August 8-10, 2016, Shanghai
Correlation is causation!
[1]Stuart Firestein: “Sometimes science is like looking for a black cat in a dark
room. It’s difficult — especially when there is no cat.”
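• An example: we want to study the relationship between D and Y.
D: some binary treatment (whether one attends college)
Y: some outcome (wage level at age 40)
• Assume the data $(D_i, Y_i)$ are i.i.d., so we write down a linear model:
$$Y_i = \gamma_0 + \gamma_1 D_i + \varepsilon_i$$
• So here is the question: what exactly is $\varepsilon_i$? Clearly, only once we know what $\varepsilon_i$ means can we know what $\gamma_0$ and $\gamma_1$ mean.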
Observed outcome
$$Y_i = (1 - D_i)\, Y_i^0 + D_i\, Y_i^1$$
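• The presence of multiple units does not by itself solve the fundamental problem of causal inference: the number of treatment levels and potential outcomes explodes.
That is exactly why we need:
• Stable unit treatment value assumption (SUTVA). Each unit's potential outcome does not vary with the treatments received by other units; each treatment a unit receives produces only one potential outcome.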
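$$
\begin{aligned}
& E(Y \mid D=1) - E(Y \mid D=0) \\
={}& \alpha_1 - \alpha_0 + E(X \mid D=1)'\beta_1 - E(X \mid D=0)'\beta_0 + E(U^1 \mid D=1) - E(U^0 \mid D=0) \\
={}& \underbrace{\alpha_1 - \alpha_0 + E(X)'(\beta_1 - \beta_0)}_{\text{average treatment effect}} \\
&+ \underbrace{[E(X \mid D=1) - E(X)]'\beta_1 - [E(X \mid D=0) - E(X)]'\beta_0}_{\text{selection bias due to observables}} \\
&+ \underbrace{E(U^1 \mid D=1) - E(U^0 \mid D=0)}_{\text{selection bias due to unobservables}}
\end{aligned}
$$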
$$
\begin{aligned}
P(X=1) &= \frac{263+80}{700} = 49\% \\
P(X=0) &= \frac{87+270}{700} = 51\% \\
P(X=1 \mid D=1) &= \frac{263}{350} = 75\% \\
P(X=1 \mid D=0) &= \frac{80}{350} = 23\%
\end{aligned}
$$
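$$
\begin{aligned}
E(Y \mid D=1) &= E(Y \mid X=1, D=1)\,P(X=1 \mid D=1) + E(Y \mid X=0, D=1)\,P(X=0 \mid D=1) \\
&= 73\% \times 75\% + 93\% \times 25\% = 78\% \\
E(Y \mid D=0) &= E(Y \mid X=1, D=0)\,P(X=1 \mid D=0) + E(Y \mid X=0, D=0)\,P(X=0 \mid D=0) \\
&= 69\% \times 23\% + 87\% \times 77\% = 83\% \\
\Longrightarrow\; E(Y \mid D=1) - E(Y \mid D=0) &= -5\%
\end{aligned}
$$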
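$$
\begin{aligned}
\tau &= E\left[E(Y \mid X, D=1) - E(Y \mid X, D=0)\right] \\
&= \tau(X=1)\,P(X=1) + \tau(X=0)\,P(X=0) \\
&= 4\% \times 49\% + 6\% \times 51\% = 5\% \\
\tau_1 &= E\left[E(Y \mid X, D=1) - E(Y \mid X, D=0) \mid D=1\right] \\
&= \tau(X=1)\,P(X=1 \mid D=1) + \tau(X=0)\,P(X=0 \mid D=1) \\
&= 4\% \times 75\% + 6\% \times 25\% = 4.5\% \\
\tau_0 &= E\left[E(Y \mid X, D=1) - E(Y \mid X, D=0) \mid D=0\right] \\
&= \tau(X=1)\,P(X=1 \mid D=0) + \tau(X=0)\,P(X=0 \mid D=0) \\
&= 4\% \times 23\% + 6\% \times 77\% = 5.5\% \\
\tau &= \tau_1\,P(D=1) + \tau_0\,P(D=0)
\end{aligned}
$$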
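The arithmetic in this example can be reproduced in a few lines. The sketch below uses only the counts (263, 87, 80, 270) and the four conditional success rates quoted on the slides; the dictionary layout and variable names are illustrative choices, not part of the original material.

```python
# Cell counts n[(x, d)] and success rates E(Y | X=x, D=d) as quoted on the slides.
n = {(1, 1): 263, (0, 1): 87, (1, 0): 80, (0, 0): 270}
rate = {(1, 1): 0.73, (0, 1): 0.93, (1, 0): 0.69, (0, 0): 0.87}

N = sum(n.values())
n_d = {d: n[(1, d)] + n[(0, d)] for d in (0, 1)}              # group sizes
p_x = {x: (n[(x, 1)] + n[(x, 0)]) / N for x in (0, 1)}        # P(X = x)
p_x_d = {(x, d): n[(x, d)] / n_d[d] for x in (0, 1) for d in (0, 1)}  # P(X = x | D = d)

# Naive comparison E(Y | D=1) - E(Y | D=0)
mean_y = {d: sum(rate[(x, d)] * p_x_d[(x, d)] for x in (0, 1)) for d in (0, 1)}
naive = mean_y[1] - mean_y[0]

# Conditional effects tau(x), then three different averages of them
tau_x = {x: rate[(x, 1)] - rate[(x, 0)] for x in (0, 1)}
ate = sum(tau_x[x] * p_x[x] for x in (0, 1))                  # weights P(X = x)
att = sum(tau_x[x] * p_x_d[(x, 1)] for x in (0, 1))           # weights P(X = x | D = 1)
atu = sum(tau_x[x] * p_x_d[(x, 0)] for x in (0, 1))           # weights P(X = x | D = 0)

print(f"naive = {naive:.3f}, ATE = {ate:.3f}, ATT = {att:.3f}, ATU = {atu:.3f}")
```

The naive difference is negative even though the effect is positive within both X cells, because the two treatment groups have very different X distributions.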
© Ting JIANG, 2016 Summer, Renmin Univ of China.
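Think about this problem from the perspective of a linear model.
$$E(Y \mid X, D=d) = \alpha_d + \beta_d X$$
We can either run separate regressions by group or use a model with an interaction term:
$$E(Y \mid D, X) = \alpha_0 + (\alpha_1 - \alpha_0) D + \beta_0 X + (\beta_1 - \beta_0)\, D \cdot X$$
$$
\begin{cases}
E(Y \mid X=1, D=1) = \alpha_1 + \beta_1 = 73\% \\
E(Y \mid X=0, D=1) = \alpha_1 = 93\% \\
E(Y \mid X=1, D=0) = \alpha_0 + \beta_0 = 69\% \\
E(Y \mid X=0, D=0) = \alpha_0 = 87\%
\end{cases}
$$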
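Solving the four equations above gives the parameters of the interaction model directly. The worked step below uses only the percentages in the system, plus P(X = 1) = 49% from the earlier table; it is a consistency check rather than a new result.

```latex
\begin{aligned}
\alpha_1 &= 93\%, \qquad \beta_1 = 73\% - 93\% = -20\% \\
\alpha_0 &= 87\%, \qquad \beta_0 = 69\% - 87\% = -18\% \\
\alpha_1 - \alpha_0 &= 6\%, \qquad \beta_1 - \beta_0 = -2\% \\
(\alpha_1 - \alpha_0) + (\beta_1 - \beta_0)\,E(X) &= 6\% - 2\% \times 49\% \approx 5\%
\end{aligned}
```

The last line matches the average treatment effect obtained by averaging the cell-level effects with weights P(X = x).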
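$$\sum_{i=1}^{n} x_i \left(y_i - x_i'\tilde\beta\right) = 0$$
$$b = \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1} \sum_{i=1}^{n} x_i y_i$$
$$\min_{\tilde\beta}\; S(\tilde\beta) = (y - X\tilde\beta)'(y - X\tilde\beta)$$
$$b = (X'X)^{-1} X'y$$
$$y = Xb + e$$
$$X'e = 0$$
$$E(y \mid X) = X\beta$$
• Unbiasedness of OLS estimator.
$$E(b \mid X) = \beta$$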
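The OLS algebra above can be checked numerically. This is a minimal sketch on simulated data; the data-generating process, seed, and coefficient values are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen arbitrarily
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])       # hypothetical true coefficients
y = X @ beta + rng.normal(size=n)

# Matrix form: b = (X'X)^{-1} X'y, solved without forming the inverse explicitly
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                           # residuals, so that y = Xb + e

# Summation form: b = (sum_i x_i x_i')^{-1} sum_i x_i y_i
Sxx = sum(np.outer(x_i, x_i) for x_i in X)
Sxy = sum(x_i * y_i for x_i, y_i in zip(X, y))
b_sum = np.linalg.solve(Sxx, Sxy)

print("b        =", b)
print("max|X'e| =", np.abs(X.T @ e).max())   # zero up to floating-point error
```

The two forms give the same estimate, and the normal equations X'e = 0 hold by construction, not by assumption.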
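• In a regression that includes control variables, what we are implicitly assuming is usually conditional mean independence, provided we do not care about the causal effects of the control variables themselves. (What exactly do control variables control for?)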
$$(b - \beta) \mid X \sim N\left(0,\; \sigma^2 (X'X)^{-1}\right)$$
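• $t$-test of individual regression coefficients.
$$H_0: \beta_k = \bar\beta_k$$
$$t_k \triangleq \frac{b_k - \bar\beta_k}{\widehat{SE}(b_k)} = \frac{b_k - \bar\beta_k}{\sqrt{s^2 \left[(X'X)^{-1}\right]_{kk}}} \sim t(n-K)$$
$$t_{stat} \triangleq \frac{r'b - q}{\widehat{SE}(r'b)} = \frac{r'b - q}{\sqrt{s^2\, r'(X'X)^{-1} r}} \sim t(n-K)$$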
• $F$-test of general linear hypotheses.
$$H_0: R\beta = q$$
$$F_{stat} = (Rb - q)'\left[s^2 R (X'X)^{-1} R'\right]^{-1}(Rb - q) \,/\, \dim(q) \sim F(\dim(q),\, n-K)$$
where $\dim(q)$ denotes the number of linear restrictions.
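The two test statistics can be computed directly from the formulas above. The sketch below uses simulated data (an assumption for illustration) and checks the familiar fact that, for a single restriction, the F statistic equals the square of the t statistic. It computes statistics only; p-values would need the t and F distribution functions.

```python
import numpy as np

rng = np.random.default_rng(1)          # seed chosen arbitrarily
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)   # hypothetical DGP

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)                    # s^2, the usual estimate of sigma^2

# t statistic for H0: beta_k = 0, with k indexing the last coefficient
k = 2
t_k = b[k] / np.sqrt(s2 * XtX_inv[k, k])

def f_stat(R, q):
    """F statistic for H0: R beta = q, following the formula above."""
    diff = R @ b - q
    V = s2 * R @ XtX_inv @ R.T
    return float(diff @ np.linalg.solve(V, diff)) / len(q)

F_single = f_stat(np.array([[0.0, 0.0, 1.0]]), np.array([0.0]))
F_joint = f_stat(np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]), np.zeros(2))

print(f"t_k = {t_k:.3f}, t_k^2 = {t_k**2:.3f}, F_single = {F_single:.3f}, F_joint = {F_joint:.3f}")
```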
[Figure: heading of the note "MISCELLANEA: ON HETEROS*EDASTICITY" by J. Huston McCulloch.]
• Stata still reports results of t -test and F -test, which are asymptotically
valid and may work better for moderate sample sizes. (But there is no
guarantee.)
• Comments.
– In both FE and FD, parameters are identified off the variation within
groups over time, not between groups. This variation may require
justification.
– FE and FD are equivalent when T = 2.
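• Must not be controlled for
[Diagrams: causal graphs over D, Y, X, X1 and X2; the arrows are not recoverable from the extracted text.]
• Example: a dilemma
[Diagram: causal graph over D, Y, X1 and X2.]
X1: unobserved ability
D: participation in primary-school math olympiad competitions
X2: educational attainment
Y: income level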
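3.2.2 Assessing overlap
• Compute the normalized difference
$$\Delta = \frac{\bar X_T - \bar X_C}{\sqrt{\left(s_C^2 + s_T^2\right)/2}}$$
$$s_C^2 = \frac{1}{N_C - 1}\sum_{c \in C}\left(X_c - \bar X_C\right)^2, \qquad s_T^2 = \frac{1}{N_T - 1}\sum_{t \in T}\left(X_t - \bar X_T\right)^2$$
• This statistic looks much like the t statistic for testing whether two sample means are equal, but using the latter is not recommended. (Why?)
$$t_{stat} = \frac{\bar X_T - \bar X_C}{\sqrt{s_C^2/N_C + s_T^2/N_T}}$$
• If the normalized difference is large, consider trimming the sample.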
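One way to see the difference between the two statistics is to replicate the same sample many times. The sketch below uses made-up covariate values; one common argument against the t statistic here is that it grows with sample size even when the imbalance between groups is unchanged, and the example is meant to illustrate only that point.

```python
from math import sqrt
from statistics import mean, variance   # variance() divides by N - 1

def normalized_difference(x_t, x_c):
    """(mean_T - mean_C) / sqrt((s_C^2 + s_T^2) / 2)."""
    return (mean(x_t) - mean(x_c)) / sqrt((variance(x_c) + variance(x_t)) / 2)

def t_stat(x_t, x_c):
    """(mean_T - mean_C) / sqrt(s_C^2 / N_C + s_T^2 / N_T)."""
    return (mean(x_t) - mean(x_c)) / sqrt(
        variance(x_c) / len(x_c) + variance(x_t) / len(x_t)
    )

# Hypothetical covariate values for a small treated and control sample
x_t = [2.0, 3.0, 4.0, 5.0]
x_c = [1.0, 2.0, 3.0, 4.0]

# Same distributions, 100 times as many observations
big_t, big_c = x_t * 100, x_c * 100

nd_small, nd_big = normalized_difference(x_t, x_c), normalized_difference(big_t, big_c)
t_small, t_big = t_stat(x_t, x_c), t_stat(big_t, big_c)

print(f"normalized difference: {nd_small:.2f} -> {nd_big:.2f}")
print(f"t statistic:           {t_small:.2f} -> {t_big:.2f}")
```

The normalized difference barely moves, while the t statistic increases by roughly a factor of ten.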
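where
$$Y_i^{s1} = D_i Y_i + (1 - D_i)\,\frac{1}{M}\sum_{t \in T_i} Y_t$$
$$Y_i^{s0} = (1 - D_i)\,Y_i + D_i\,\frac{1}{M}\sum_{c \in C_i} Y_c$$
$K_M(i)$ is the number of times observation $i$ is used as a match; $l_j(i)$ is the $j$-th closest observation to $i$ within the same (treatment or control) group.
• Implementation: teffects nnmatch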
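The imputation step in the formulas above can be sketched for a single scalar covariate. This is a toy illustration with made-up data, absolute distance, and matching with replacement; it is not a reimplementation of `teffects nnmatch` and omits bias correction and variance estimation.

```python
def nn_impute(x, d, y, M=1):
    """Impute both potential outcomes for every unit.

    The observed outcome is kept for the unit's own treatment status; the
    missing one is the average outcome of the M nearest units in the
    opposite group (ties broken by position in the data).
    """
    y1_hat, y0_hat = [], []
    for i in range(len(y)):
        opposite = sorted(
            (j for j in range(len(y)) if d[j] != d[i]),
            key=lambda j: abs(x[j] - x[i]),
        )
        match_mean = sum(y[j] for j in opposite[:M]) / M
        if d[i] == 1:
            y1_hat.append(y[i])
            y0_hat.append(match_mean)
        else:
            y1_hat.append(match_mean)
            y0_hat.append(y[i])
    return y1_hat, y0_hat

# Hypothetical data: three treated and three control units
x = [1.0, 2.0, 3.0, 1.1, 2.1, 3.1]
d = [1, 1, 1, 0, 0, 0]
y = [3.0, 5.0, 7.0, 1.0, 2.0, 3.0]

y1_hat, y0_hat = nn_impute(x, d, y, M=1)
effects = [a - b for a, b in zip(y1_hat, y0_hat)]
ate = sum(effects) / len(effects)
att = sum(eff for eff, d_i in zip(effects, d) if d_i == 1) / sum(d)
print(f"matching ATE = {ate}, ATT = {att}")
```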
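$$
\begin{aligned}
\pi(X) &= E\left[\pi(X) \mid \pi(X)\right] \\
&= E\left[E(D \mid X) \mid \pi(X)\right] \\
&= E\left[D \mid \pi(X)\right] \\
&= P\left(D=1 \mid \pi(X)\right)
\end{aligned}
$$
$$P\left(X \le t \mid D=1, \pi(X)\right) = \frac{P\left(D=1, X \le t \mid \pi(X)\right)}{P\left(D=1 \mid \pi(X)\right)} = P\left(X \le t \mid \pi(X)\right)$$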
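• A comparison:
– Matching on $X$: makes $X$ identical (or at least close) between the treatment and control groups.
– Randomized experiment: makes the distributions of $X$ and $\varepsilon$ identical.
– Matching on $\pi(X)$: makes the distribution of $X$ identical.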
$$E\{Y \mid \pi(X), D=1\} - E\{Y \mid \pi(X), D=0\} = E\{Y^1 - Y^0 \mid \pi(X)\}$$
$$\Rightarrow\; E\left\{E\{Y^1 - Y^0 \mid \pi(X)\}\right\} = E\{Y^1 - Y^0\}$$
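• Estimator
$$\hat\tau_1^{PSM} = \frac{1}{N_T}\sum_{t \in T}\left(Y_t - \frac{1}{M}\sum_{c \in C_t} Y_c\right)$$
• Abadie and Imbens (2016, ECMA) derive the asymptotic distribution of the PSM estimator.
• Implementation: teffects psmatch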
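• Weighting.
$$
\begin{aligned}
E\left[\frac{DY}{\pi(X)}\right] &= E\left[\frac{D\left((1-D)Y^0 + DY^1\right)}{\pi(X)}\right] = E\left[\frac{DY^1}{\pi(X)}\right] \\
&= E\left[\frac{E(DY^1 \mid X)}{\pi(X)}\right] = E\left[\frac{E(D \mid X)\,E(Y^1 \mid X)}{\pi(X)}\right] \\
&= E\left\{E(Y^1 \mid X)\right\} = E(Y^1)
\end{aligned}
$$
$$E\left[\frac{(1-D)Y}{1 - \pi(X)}\right] = E(Y^0)$$
$$\hat\tau^{W} = \frac{1}{N}\sum_i \left(\frac{D_i}{\hat\pi(X_i)} - \frac{1 - D_i}{1 - \hat\pi(X_i)}\right) Y_i$$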
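The weighting estimator can be illustrated on a tiny discrete dataset. Everything in the sketch below is hypothetical: the data are constructed so that the untreated outcome equals X and the treatment effect is 2 for every unit, and the propensity score is estimated by the treated share within each X cell rather than by a logit or probit.

```python
# Hypothetical data: (x, d, y). Treatment is more likely when x = 1.
data = [
    (0, 1, 2.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 3.0), (1, 1, 3.0), (1, 1, 3.0), (1, 0, 1.0),
]

def fitted_propensity(rows):
    """Estimate pi(x) = P(D = 1 | X = x) by the treated share in each x cell."""
    cells = {}
    for x, d, _ in rows:
        count, treated = cells.get(x, (0, 0))
        cells[x] = (count + 1, treated + d)
    return {x: treated / count for x, (count, treated) in cells.items()}

pi_hat = fitted_propensity(data)
N = len(data)

# Inverse-probability-weighting estimate of the average treatment effect
tau_w = sum((d / pi_hat[x] - (1 - d) / (1 - pi_hat[x])) * y for x, d, y in data) / N

# Naive difference in means for comparison
treated = [y for _, d, y in data if d == 1]
control = [y for _, d, y in data if d == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(f"pi_hat = {pi_hat}, IPW = {tau_w}, naive = {naive}")
```

In this construction the naive comparison overstates the effect because treated units have higher X, while reweighting by the estimated propensity score recovers the built-in effect of 2.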
$$\bar\tau = \frac{1}{N}\sum_i X_i'\hat\beta_1 - \frac{1}{N}\sum_i X_i'\hat\beta_0 = \frac{1}{N}\sum_i X_i'\left(\hat\beta_1 - \hat\beta_0\right)$$
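• Multivalued and continuous treatments.
– For a multivalued treatment, define analogously
$$\pi_d(X) = P(D = d \mid X)$$
– For a continuous treatment, define $f_{D \mid X}(d \mid X)$ as the generalized propensity score.
– Implementation: other options of teffects. See SJ14-3, SJ14-1, SJ13-3, SJ8-3.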
$$\hat X = Z(Z'Z)^{-1}Z'X$$
• Key reference: Baum et al. (2007, SJ7-4). “Enhanced Routines for Instrumental Variables/Generalized Method of Moments Estimation and Testing.”
• Report the first stage result and think about whether it (magnitude and
sign) makes sense.
• Report the reduced-form regression of the dependent variable on instruments.
• Pick your best single instrument and report just-identified estimates.
• Check over-identified 2SLS and GMM (CUE) estimates. Worry if they
are very different.
• Carry out specification tests.
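The first three items of this checklist can be sketched with a single instrument. The simulated data-generating process below (coefficients, seed, one confounder `u`) is an assumption for illustration only; the sketch shows the first stage, the reduced form, and the just-identified 2SLS estimate built from the projection of X on Z. It does not cover over-identification or specification tests.

```python
import numpy as np

rng = np.random.default_rng(2)               # seed chosen arbitrarily
n = 5000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)         # endogenous regressor
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, x])
Z = np.column_stack([const, z])

def ols(A, v):
    """Least-squares coefficients of v on the columns of A."""
    return np.linalg.solve(A.T @ A, A.T @ v)

first_stage = ols(Z, x)                      # x on instrument
reduced_form = ols(Z, y)                     # y on instrument

# 2SLS: regress y on the projection of X onto the column space of Z
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = ols(X_hat, y)
b_ols = ols(X, y)

print("first-stage slope :", first_stage[1])
print("reduced-form slope:", reduced_form[1])
print("2SLS slope        :", b_2sls[1])
print("OLS slope         :", b_ols[1])
```

With one instrument and one endogenous regressor, the 2SLS slope equals the reduced-form slope divided by the first-stage slope, which is one reason reporting both is informative.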
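$$
\begin{aligned}
E\left(Y^1 - Y^0 \mid \text{compliers}\right) &= \frac{E(Y \mid Z=1) - E(Y \mid Z=0)}{P(\text{compliers})} \\
&= \frac{E(Y \mid Z=1) - E(Y \mid Z=0)}{E(D \mid Z=1) - E(D \mid Z=0)}
\end{aligned}
$$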
because
$$
\begin{aligned}
& E(D \mid Z=1) - E(D \mid Z=0) \\
={}& P(D=1 \mid Z=1) - P(D=1 \mid Z=0) \\
={}& P(\text{always-takers or compliers}) - P(\text{always-takers}) \\
={}& P(\text{compliers})
\end{aligned}
$$
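The Wald ratio can be verified on a small constructed population. In the sketch below the type counts and potential outcomes are hypothetical, there are no defiers, and the instrument is balanced across types by construction; the point is only that the ratio recovers the complier effect even though always-takers and never-takers have different effects.

```python
# (type, number of units per instrument arm, Y0, Y1); all values hypothetical
types = [
    ("always-taker", 2, 5.0, 6.0),
    ("complier", 5, 1.0, 4.0),
    ("never-taker", 3, 0.0, 9.0),
]

def treatment(kind, z):
    """Treatment actually taken by each type given instrument value z."""
    if kind == "always-taker":
        return 1
    if kind == "never-taker":
        return 0
    return z                      # compliers follow the instrument

rows = []                         # observed (z, d, y)
for z in (0, 1):
    for kind, count, y0, y1 in types:
        d = treatment(kind, z)
        rows += [(z, d, y1 if d == 1 else y0)] * count

def mean(values):
    values = list(values)
    return sum(values) / len(values)

itt_y = mean(y for z, d, y in rows if z == 1) - mean(y for z, d, y in rows if z == 0)
itt_d = mean(d for z, d, y in rows if z == 1) - mean(d for z, d, y in rows if z == 0)
wald = itt_y / itt_d

print(f"E(D|Z=1) - E(D|Z=0) = {itt_d:.2f} (complier share)")
print(f"Wald estimate = {wald:.2f} (complier effect is 4.0 - 1.0)")
```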
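         D = 1                                          D = 0
Z = 1    Complier ($\pi_c$) & Always-taker ($\pi_a$)    Never-taker ($\pi_n$)
Z = 0    Always-taker ($\pi_a$)                         Complier ($\pi_c$) & Never-taker ($\pi_n$)
$$\pi_c = 1 - \pi_a - \pi_n = \frac{\pi_{11}\pi_{00} - \pi_{10}\pi_{01}}{(\pi_{11} + \pi_{10})(\pi_{01} + \pi_{00})}$$
$$E(Y \mid Z=0, D=0) = \frac{\pi_c}{\pi_c + \pi_n}\,E(Y^0 \mid \text{complier}) + \frac{\pi_n}{\pi_c + \pi_n}\,E(Y^0 \mid \text{never-taker})$$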
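[Figure: outcome variable (Y) plotted against assignment variable (X), with points c″, c, c′ on the X axis, points A″ and B′, and the jump τ.]
[Figure: E[W|X] and E[D|X] plotted against x.]
[Diagram: causal graphs over δ, D, Y and ε.]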
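If
1. $E(D \mid X)$ has a discontinuity at $X = c$: $\lim_{X \downarrow c} E(D \mid X) \ne \lim_{X \uparrow c} E(D \mid X)$
2. $m(X)$ is continuous at $X = c$: $\lim_{X \downarrow c} m(X) = \lim_{X \uparrow c} m(X)$
then
$$\tau^{RD} = \frac{\lim_{X \downarrow c} E(Y \mid X) - \lim_{X \uparrow c} E(Y \mid X)}{\lim_{X \downarrow c} E(D \mid X) - \lim_{X \uparrow c} E(D \mid X)} \tag{5.1}$$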
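• How can this structural model be justified from the perspective of the potential outcomes framework? Take sharp RD as an example.
$$\lim_{X \downarrow c} E(D \mid X) - \lim_{X \uparrow c} E(D \mid X) = 1$$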
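$$
\begin{aligned}
\tau^{RD} &= \lim_{X \downarrow c} E(Y^1 \mid X) - \lim_{X \uparrow c} E(Y^0 \mid X) \\
&= \lim_{X \downarrow c} E(Y^1 \mid X) - \lim_{X \downarrow c} E(Y^0 \mid X) \qquad \text{(if } E(Y^0 \mid X) \text{ is continuous at } X = c\text{)} \\
&= \underbrace{\lim_{X \downarrow c} E(Y^1 - Y^0 \mid X)}_{\text{treatment effect on the just treated}}
\end{aligned}
$$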
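$$
\begin{aligned}
Y &= (1 - D)Y^0 + DY^1 \\
&= (Y^1 - Y^0)D + Y^0 \\
&= \underbrace{E(Y^1 - Y^0 \mid X)}_{\tau}\,D + \underbrace{\left[Y^1 - Y^0 - E(Y^1 - Y^0 \mid X)\right]}_{\eta}\,D + Y^0
\end{aligned}
$$
$$E(Y \mid X) = \tau D + E(\eta D + Y^0 \mid X) = \tau D + E(Y^0 \mid X)$$
(because $E(\eta D \mid X) = D\,E(\eta \mid X) = 0$.)
So continuity of $m(X)$ at $X = c$ simply means that $E(Y^0 \mid X)$ is continuous at $X = c$.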
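5.2.1 OLS/2SLS
• OLS for sharp RD
$$Y = \tau D + m(X) + U$$
$$m(X) = \gamma_0 + \gamma_{1+}\,\delta(X - c) + \gamma_{1-}(1 - \delta)(X - c) + \gamma_{2+}\,\delta(X - c)^2 + \gamma_{2-}(1 - \delta)(X - c)^2$$
• 2SLS for fuzzy RD
$$Y = \tau D + m(X) + U$$
$$D = \alpha\delta + m(X) + \varepsilon$$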
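The sharp RD regression can be sketched on simulated data. The sketch assumes that δ is the indicator 1(X ≥ c), so that D = δ in the sharp design; the data-generating process, the seed, and the use of the full sample instead of a bandwidth around the cutoff are simplifications made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)          # seed chosen arbitrarily
n, c, tau = 2000, 0.0, 2.0              # hypothetical cutoff and true jump
x = rng.uniform(-1.0, 1.0, size=n)      # assignment variable
d = (x >= c).astype(float)              # sharp RD: treatment switches on at c
y = tau * d + 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(scale=0.5, size=n)

xc = x - c
# Regressors: D, constant, and a quadratic in (X - c) with separate
# coefficients on each side of the cutoff, as in m(X) above.
W = np.column_stack([
    d,
    np.ones(n),
    d * xc, (1 - d) * xc,
    d * xc**2, (1 - d) * xc**2,
])
coef = np.linalg.lstsq(W, y, rcond=None)[0]
tau_hat = coef[0]                       # coefficient on D = estimated jump at c

print(f"estimated jump at the cutoff: {tau_hat:.3f}")
```

Centering X at c is what makes the coefficient on D equal to the estimated discontinuity at the cutoff.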
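• Identification from the linear-model perspective. Consider the following linear model (omitting X for simplicity):
$$
\begin{aligned}
Y_{d0}^{0} &= \beta_0 + \beta_1 D + U_0 \\
Y_{d1}^{0} &= \beta_0 + \beta_1 D + \beta_2 + U_1 \\
Y_{d1}^{1} &= \beta_0 + \beta_1 D + \beta_2 + \beta_3 + U_1
\end{aligned}
$$
$\beta_3$ is the treatment effect.
$$
\begin{aligned}
Y_{dt} &= (1 - T)\,Y_{d0}^{0} + T\left[(1 - D)\,Y_{01}^{0} + D\,Y_{11}^{1}\right] \\
&= \beta_0 + \beta_1 D + \beta_2 T + \beta_3\, D \cdot T + (1 - T)\,U_0 + T\,U_1
\end{aligned}
$$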
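The earlier identification condition can be written as
$$E(U_1 \mid D=1, T=1) - E(U_0 \mid D=1, T=0) = E(U_1 \mid D=0, T=1) - E(U_0 \mid D=0, T=0)$$
In that case
$$
\begin{aligned}
\tau(X) ={}& \left[E(Y \mid X, D=1, T=1) - E(Y \mid X, D=1, T=0)\right] \\
&- \left[E(Y \mid X, D=0, T=1) - E(Y \mid X, D=0, T=0)\right] \\
={}& \beta_3
\end{aligned}
$$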
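The link between the four conditional means and the coefficients of the interaction model can be made concrete. The cell means below are hypothetical numbers; the sketch shows that the saturated model reproduces every cell mean and that the interaction coefficient is exactly the difference in differences.

```python
# Hypothetical cell means E(Y | D = d, T = t), keyed by (d, t)
cell_mean = {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 13.0, (1, 1): 18.0}

beta0 = cell_mean[(0, 0)]                           # control group, before
beta1 = cell_mean[(1, 0)] - cell_mean[(0, 0)]       # group gap before treatment
beta2 = cell_mean[(0, 1)] - cell_mean[(0, 0)]       # time change in the control group
beta3 = (cell_mean[(1, 1)] - cell_mean[(1, 0)]) - (cell_mean[(0, 1)] - cell_mean[(0, 0)])

def fitted(d, t):
    """Fitted value of beta0 + beta1*D + beta2*T + beta3*D*T."""
    return beta0 + beta1 * d + beta2 * t + beta3 * d * t

print(f"beta0={beta0}, beta1={beta1}, beta2={beta2}, beta3 (DID)={beta3}")
```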
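• DID is simply before-after comparison + matching.
• Every interaction-term model can be understood from the DID perspective.
• Threats to DID
– $D \cdot T$ may pick up some other treatment effect (cannot be fixed).
– A bad control group: testing for parallel trends is important!
– $D$ is unstable, that is, the treatment and control groups undergo compositional changes. In other words, entry into the treatment group can be manipulated, for example through program application, policy threshold, or migration. (We usually describe this as policy endogeneity.)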
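Example 9. The price of tea and missing women (Qian, 2008, QJE).
• Bertrand et al. (2004, QJE) on estimating standard errors: use clustered standard errors.
• DID estimates may not be robust to how the dependent variable is defined. (What kind of trend is robust?)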
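• The DID regression equation for panel data is
$$Y_{idt} = \beta_0 + \tau\, D_i \cdot T_t + u_i + \lambda_t + \varepsilon_{idt}$$
• The DID regression equation for repeated cross-section data is
$$Y_{idt} = \beta_0 + \tau\, D_i \cdot T_t + \gamma D_i + \lambda_t + \varepsilon_{idt}$$
• With more than two periods, a more flexible specification (taking panel data as an example) is
$$Y_{idt} = \beta_0 + \sum_{l=2}^{T} \tau_l \left(D_i \cdot T_{tl}\right) + u_i + \lambda_t + \varepsilon_{idt}$$
where
$$T_{tl} = \begin{cases} 1 & t = l \\ 0 & \text{otherwise} \end{cases}$$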
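• First construct the control group by matching, then run DID.
• First difference the data, then apply a matching estimator to the differenced outcome.
• Suppose a province introduces a public policy targeting people aged 70 and above, and the outcome of interest is health. One approach is to compare how the health of those aged 70 and above and of those aged 60-70 in that province changes around the introduction of the policy. The problem is that factors unrelated to the policy may also affect the health gap between the two age groups, for example a central-government policy rolled out at the same time. Another approach is to compare how the health of those aged 70 and above changes around the policy in that province and in other provinces. The problem is that the time path of health among those aged 70 and above in that province may differ systematically from the path in other provinces, and this difference may come from differences in economic growth across provinces rather than from the policy. A more robust approach is to compare the DID estimate for that province with the DID estimate for the other provinces.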
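The comparison of two DID estimates described above (a triple difference) can be written out with cell means. All numbers and labels below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical mean health by (province, age group, period); period 1 = after the policy
mean_health = {
    ("treated", "70+", 0): 50.0, ("treated", "70+", 1): 56.0,
    ("treated", "60-70", 0): 60.0, ("treated", "60-70", 1): 63.0,
    ("other", "70+", 0): 52.0, ("other", "70+", 1): 54.0,
    ("other", "60-70", 0): 61.0, ("other", "60-70", 1): 62.0,
}

def did(province):
    """Within-province DID: change for ages 70+ minus change for ages 60-70."""
    change_old = mean_health[(province, "70+", 1)] - mean_health[(province, "70+", 0)]
    change_young = mean_health[(province, "60-70", 1)] - mean_health[(province, "60-70", 0)]
    return change_old - change_young

did_treated = did("treated")      # also contains age-specific shocks common to all provinces
did_other = did("other")          # estimates those common age-specific shocks
ddd = did_treated - did_other

print(f"DID (policy province) = {did_treated}, DID (other provinces) = {did_other}, DDD = {ddd}")
```

Subtracting the second DID removes age-specific shocks shared by all provinces, while the within-province age comparison has already removed province-specific trends common to both age groups.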