Control Function Approach Slides
Reduced form for $y_2$:
where $v_2$ is an explanatory variable in the equation. The new error, $e_1$,
• The OLS estimates from (6) are control function estimates.
• The OLS estimates of $\delta_1$ and $\alpha_1$ from (6) are identical to the 2SLS estimates starting from (1).
• Now extend the model:
$y_1 = z_1\delta_1 + \alpha_1 y_2 + \gamma_1 y_2^2 + u_1$ (7)
$E(u_1 \mid z) = 0$. (8)
Let $z_2$ be a scalar not also in $z_1$. Under (8), which is stronger than (2) and is essential for nonlinear models, we can use, say, $z_2^2$ as an instrument for $y_2^2$. So the IVs would be $(z_1, z_2, z_2^2)$ for $(z_1, y_2, y_2^2)$.

• What does the CF approach entail? Now assume
$E(u_1 \mid z, y_2) = E(u_1 \mid v_2) = \rho_1 v_2$, (9)
where independence of $(u_1, v_2)$ and $z$ is sufficient for the first equality; linearity is a substantive restriction. Then,
$E(y_1 \mid z, y_2) = z_1\delta_1 + \alpha_1 y_2 + \gamma_1 y_2^2 + \rho_1 v_2$, (10)
and a CF approach is immediate: replace $v_2$ with $\hat{v}_2$ and use OLS on (10). Not equivalent to a 2SLS estimate. CF likely more efficient but less robust.
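The claim that OLS with the first-stage residual added reproduces the 2SLS estimates in the linear model can be checked numerically. The following is a minimal sketch, not taken from the slides: the data-generating values, the variable names, and the small `ols` helper are all assumptions made for illustration, and only the Python standard library is used.

```python
import random

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [a / piv for a in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Simulated data (all parameter values are hypothetical).
random.seed(0)
n = 500
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]          # excluded instrument
v2 = [random.gauss(0, 1) for _ in range(n)]
u1 = [0.6 * v + random.gauss(0, 1) for v in v2]      # u1 correlated with v2
y2 = [0.5 * a + 1.0 * b + v for a, b, v in zip(z1, z2, v2)]
y1 = [1.0 + 0.5 * a + 2.0 * y + u for a, y, u in zip(z1, y2, u1)]

# First stage: regress y2 on all exogenous variables, keep fitted values and residuals.
Z = [[1.0, a, b] for a, b in zip(z1, z2)]
pi_hat = ols(Z, y2)
y2_fit = [sum(p * z for p, z in zip(pi_hat, row)) for row in Z]
v2_hat = [y - f for y, f in zip(y2, y2_fit)]

# 2SLS point estimates: OLS of y1 on (1, z1, fitted y2).
b_2sls = ols([[1.0, a, f] for a, f in zip(z1, y2_fit)], y1)
# Control function: OLS of y1 on (1, z1, y2, v2_hat).
b_cf = ols([[1.0, a, y, v] for a, y, v in zip(z1, y2, v2_hat)], y1)

print("2SLS:", b_2sls)
print("CF  :", b_cf[:3], "coefficient on v2_hat:", b_cf[3])
```

The first three CF coefficients match the 2SLS coefficients up to floating-point error because $\hat{v}_2$ is orthogonal to $z_1$ and to the fitted values. The equivalence concerns point estimates only; standard errors from the second-stage OLS are not valid as reported.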
• CF approaches can impose extra assumptions when we base it on $E(y_1 \mid z, y_2)$. The estimating equation is
$E(y_1 \mid z, y_2) = z_1\delta_1 + \alpha_1 y_2 + E(u_1 \mid z, y_2)$. (11)
If $y_2 = 1[z\delta_2 + e_2 \ge 0]$, $(u_1, e_2)$ is independent of $z$, $E(u_1 \mid e_2) = \rho_1 e_2$, and $e_2 \sim \mathrm{Normal}(0, 1)$, then
$E(u_1 \mid z, y_2) = \rho_1 [y_2 \lambda(z\delta_2) - (1 - y_2)\lambda(-z\delta_2)]$, (12)
where $\lambda(\cdot)$ is the inverse Mills ratio (IMR). Heckman two-step approach (for endogeneity, not sample selection): (i) Probit to get $\hat{\delta}_2$ and compute $\widehat{gr}_{i2} \equiv y_{i2}\lambda(z_i\hat{\delta}_2) - (1 - y_{i2})\lambda(-z_i\hat{\delta}_2)$ (generalized residual). (ii) Regress $y_{i1}$ on $z_{i1}$, $y_{i2}$, $\widehat{gr}_{i2}$, $i = 1, \ldots, N$.

• Consistency of the CF estimators hinges on the model for $D(y_2 \mid z)$ being correctly specified, along with linearity in $E(u_1 \mid v_2)$. If we just apply 2SLS directly to (1), it makes no distinction among discrete, continuous, or some mixture for $y_2$.
• How might we robustly use the binary nature of $y_2$ in IV estimation? Obtain the fitted probabilities, $\Phi(z_i\hat{\delta}_2)$, from the first stage probit, and then use these as IVs (not regressors!) for $y_{i2}$. Fully robust to misspecification of the probit model, usual standard errors from IV asymptotically valid. Efficient IV estimator if $P(y_2 = 1 \mid z) = \Phi(z\delta_2)$ and $\mathrm{Var}(u_1 \mid z) = \sigma_1^2$.
• Similar suggestions work for $y_2$ a count variable or a corner solution.
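The generalized residual used in step (i) of the Heckman two-step approach is simple to compute once a probit index is available. The sketch below is illustrative only: the function names and the index values are hypothetical, and the index $z_i\hat{\delta}_2$ is taken as given rather than estimated.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inverse_mills(a):
    """Inverse Mills ratio: lambda(a) = phi(a) / Phi(a)."""
    return norm_pdf(a) / norm_cdf(a)

def generalized_residual(y2, index):
    """gr = y2 * lambda(index) - (1 - y2) * lambda(-index), for binary y2."""
    return y2 * inverse_mills(index) - (1 - y2) * inverse_mills(-index)

# Hypothetical probit index values (stand-ins for z_i * delta2_hat).
for a in [-1.0, 0.0, 0.7]:
    p = norm_cdf(a)  # P(y2 = 1 | z) under the probit model
    mean_gr = p * generalized_residual(1, a) + (1 - p) * generalized_residual(0, a)
    print(a, generalized_residual(1, a), generalized_residual(0, a), mean_gr)
```

Under the probit model the generalized residual has conditional mean zero given $z$, which the last printed column confirms up to rounding. In step (ii) this variable would simply be added as a regressor.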
2. Correlated Random Coefficient Models
• Modify the original equation as
$y_1 = \eta_1 + z_1\delta_1 + a_1 y_2 + u_1$, (13)
where $a_1$ is the “random coefficient” on $y_2$. Heckman and Vytlacil (1998) call (13) a correlated random coefficient (CRC) model.
• Write $a_1 = \alpha_1 + v_1$, where $\alpha_1 = E(a_1)$ is the object of interest. We can rewrite the equation as
$y_1 = \eta_1 + z_1\delta_1 + \alpha_1 y_2 + v_1 y_2 + u_1$ (14)
$\equiv \eta_1 + z_1\delta_1 + \alpha_1 y_2 + e_1$. (15)

• The potential problem with applying instrumental variables is that the error term $v_1 y_2 + u_1$ is not necessarily uncorrelated with the instruments $z$, even if
$E(u_1 \mid z) = E(v_1 \mid z) = 0$. (16)
We want to allow $y_2$ and $v_1$ to be correlated, $\mathrm{Cov}(v_1, y_2) \equiv \tau_1 \ne 0$. A sufficient condition that allows for any unconditional correlation is
$\mathrm{Cov}(v_1, y_2 \mid z) = \mathrm{Cov}(v_1, y_2)$, (17)
and this is sufficient for IV to consistently estimate $(\alpha_1, \delta_1)$.
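A short derivation may help show why the constant conditional covariance condition (17) is enough for IV. This is a sketch of the standard argument; the composite error $r_1$ is notation introduced here for illustration and does not appear in the slides.

```latex
\begin{align*}
E(v_1 y_2 \mid z) &= \mathrm{Cov}(v_1, y_2 \mid z) + E(v_1 \mid z)\,E(y_2 \mid z) \\
                  &= \mathrm{Cov}(v_1, y_2 \mid z) && \text{by (16)} \\
                  &= \tau_1 && \text{by (17)}, \\
y_1 &= (\eta_1 + \tau_1) + z_1\delta_1 + \alpha_1 y_2 + r_1,
      \qquad r_1 \equiv v_1 y_2 - \tau_1 + u_1, \\
E(r_1 \mid z) &= 0 .
\end{align*}
```

Since $r_1$ has zero conditional mean given $z$, the usual IV estimator is consistent for $\alpha_1$ and $\delta_1$. The intercept it estimates is $\eta_1 + \tau_1$, not $\eta_1$.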
• The usual IV estimator that ignores the randomness in $a_1$ is more robust than Garen’s (1984) CF estimator, which adds $\hat{v}_2$ and $\hat{v}_2 y_2$ to the original model, or the Heckman/Vytlacil (1998) “plug-in” estimator, which replaces $y_2$ with $\hat{y}_2 = z\hat{\pi}_2$.
• The condition $\mathrm{Cov}(v_1, y_2 \mid z) = \mathrm{Cov}(v_1, y_2)$ cannot really hold for discrete $y_2$. Further, Card (2001) shows how it can be violated even if $y_2$ is continuous. Wooldridge (2005) shows how to allow parametric heteroskedasticity.

• In the case of binary $y_2$, we have what is often called the “switching regression” model. If $y_2 = 1[z\delta_2 + v_2 \ge 0]$ and $v_2 \mid z$ is $\mathrm{Normal}(0, 1)$, then
$E(y_1 \mid z, y_2) = \eta_1 + z_1\delta_1 + \alpha_1 y_2 + \rho_1 h_2(y_2, z\delta_2) + \xi_1 h_2(y_2, z\delta_2) y_2$,
where
$h_2(y_2, z\delta_2) = y_2\lambda(z\delta_2) - (1 - y_2)\lambda(-z\delta_2)$
is the generalized residual function.
• Can interact exogenous variables with $h_2(y_{i2}, z_i\hat{\delta}_2)$. Or, allow $E(v_1 \mid v_2)$ to be more flexible [Heckman and MaCurdy (1986)].
3. Nonlinear Models and Limitations of the CF Approach
• Typically three approaches to nonlinear models with EEVs.
(1) Plug in fitted values from a first step regression (in an attempt to mimic 2SLS in linear model). More generally, try to find $E(y_1 \mid z)$ or $D(y_1 \mid z)$ and then impose identifying restrictions.
(2) CF approach: plug in residuals in an attempt to obtain $E(y_1 \mid y_2, z)$ or $D(y_1 \mid y_2, z)$.
(3) Maximum Likelihood (often limited information): Use models for $D(y_1 \mid y_2, z)$ and $D(y_2 \mid z)$ jointly.
• All strategies are more difficult with nonlinear models when $y_2$ is discrete. Some poor practices have lingered.

Binary and Fractional Responses
Probit model:
$y_1 = 1[z_1\delta_1 + \alpha_1 y_2 + u_1 \ge 0]$, (18)
where $u_1 \mid z \sim \mathrm{Normal}(0, 1)$. Analysis goes through if we replace $(z_1, y_2)$ with any known function $x_1 \equiv g_1(z_1, y_2)$.
• The Rivers-Vuong (1988) approach is to make a homoskedastic-normal assumption on the reduced form for $y_2$,
$y_2 = z\pi_2 + v_2$, $v_2 \mid z \sim \mathrm{Normal}(0, \tau_2^2)$. (19)
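$P(y_1 = 1 \mid z, y_2) = \Phi(z_1\delta_{\rho 1} + \alpha_{\rho 1} y_2 + \theta_{\rho 1} v_2)$ (22)

$\widehat{\mathrm{ASF}}(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \Phi(x_1\hat{\beta}_{\rho 1} + \hat{\theta}_{\rho 1}\hat{v}_{i2})$, (23)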
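Equation (23) averages the probit response over the first-stage residuals while holding the structural index fixed. A minimal sketch of that averaging follows, along with the implied partial effect for a continuous regressor. All numbers, and the names `asf` and `ape_continuous`, are hypothetical placeholders and not estimates from any data.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def asf(index, theta, v_hat):
    """Average Phi(index + theta * v_i) over the first-stage residuals v_i."""
    return sum(norm_cdf(index + theta * v) for v in v_hat) / len(v_hat)

def ape_continuous(index, coef, theta, v_hat):
    """Derivative of the ASF with respect to a continuous regressor whose
    scaled coefficient is coef: coef times the average of phi(.)."""
    return coef * sum(norm_pdf(index + theta * v) for v in v_hat) / len(v_hat)

# Hypothetical inputs: residuals and scaled coefficients from a two-step fit.
v_hat = [-1.2, -0.4, 0.0, 0.5, 1.1]
alpha_rho, theta_rho = 0.8, -0.5
y2_value = 1.0
index = 0.3 + alpha_rho * y2_value   # stands in for x1 * beta_rho at a chosen point

asf_value = asf(index, theta_rho, v_hat)
ape_value = ape_continuous(index, alpha_rho, theta_rho, v_hat)
print(asf_value, ape_value)
```

The key design point is that $\hat{v}_{i2}$ is averaged out rather than set to a fixed value, so the scaled coefficients are turned into effects on the structural response probability.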
• The two-step CF approach easily extends to fractional responses:
$E(y_1 \mid z, y_2, q_1) = \Phi(x_1\beta_1 + q_1)$, (24)
where $x_1$ is a function of $(z_1, y_2)$ and $q_1$ contains unobservables. Can use the same two-step because the Bernoulli log likelihood is in the linear exponential family. Still estimate scaled coefficients. APEs must be obtained from (23). In inference, we should only assume the mean is correctly specified. The method can be used in the binary and fractional cases. To account for first-stage estimation, the bootstrap is convenient.
• Wooldridge (2005) describes some simple ways to make the analysis starting from (24) more flexible, including allowing $\mathrm{Var}(q_1 \mid v_2)$ to be heteroskedastic.

• The control function approach has some decided advantages over another two-step approach, one that appears to mimic the 2SLS estimation of the linear model. Rather than conditioning on $v_2$ along with $z$ (and therefore $y_2$) to obtain $P(y_1 = 1 \mid z, v_2) = P(y_1 = 1 \mid z, y_2, v_2)$, we can obtain $P(y_1 = 1 \mid z)$. To find the latter probability, we plug in the reduced form for $y_2$ to get $y_1 = 1[z_1\delta_1 + \alpha_1(z\delta_2) + \alpha_1 v_2 + u_1 > 0]$. Because $\alpha_1 v_2 + u_1$ is independent of $z$ and normally distributed, $P(y_1 = 1 \mid z) = \Phi\{[z_1\delta_1 + \alpha_1(z\delta_2)]/\omega_1\}$. So first do OLS on the reduced form, and get fitted values, $\hat{y}_{i2} = z_i\hat{\delta}_2$. Then, probit of $y_{i1}$ on $z_{i1}$, $\hat{y}_{i2}$. Harder to estimate APEs and test for endogeneity.
• Danger with plugging in fitted values for $y_2$ is that one might be tempted to plug $\hat{y}_2$ into nonlinear functions, say $y_2^2$ or $y_2 z_1$. This does not result in consistent estimation of the scaled parameters or the partial effects. If we believe $y_2$ has a linear RF with additive normal error independent of $z$, the addition of $\hat{v}_2$ solves the endogeneity problem regardless of how $y_2$ appears. Plugging in fitted values for $y_2$ only works in the case where the model is linear in $y_2$. Plus, the CF approach makes it much easier to test for endogeneity of $y_2$ as well as compute APEs.

• Can understand the limits of CF approach by returning to $E(y_1 \mid z, y_2, q_1) = \Phi(z_1\delta_1 + \alpha_1 y_2 + q_1)$, where $y_2$ is discrete. Rivers-Vuong approach does not generally work.
• Some poor strategies still linger. Suppose $y_1$ and $y_2$ are both binary and
$y_2 = 1[z\delta_2 + v_2 \ge 0]$ (25)
and we maintain joint normality of $(u_1, v_2)$. We should not try to mimic 2SLS as follows: (i) Do probit of $y_2$ on $z$ and get the fitted probabilities, $\hat{\Phi}_2 = \Phi(z\hat{\delta}_2)$. (ii) Do probit of $y_1$ on $z_1$, $\hat{\Phi}_2$, that is, just replace $y_2$ with $\hat{\Phi}_2$.
• Currently, the only strategy we have is maximum likelihood estimation based on $f(y_1 \mid y_2, z) f(y_2 \mid z)$. [Perhaps this is why some, such as Angrist (2001), promote the notion of just using linear probability models estimated by 2SLS.]
• “Bivariate probit” software can be used to estimate the probit model with a binary endogenous variable.
• Parallel discussions hold for ordered probit, Tobit.

Multinomial Responses
• Recent push by Petrin and Train (2006), among others, to use control function methods where the second step estimation is something simple (such as multinomial logit, or nested logit) rather than being derived from a structural model. So, if we have reduced forms
$y_2 = z\Pi_2 + v_2$, (26)
then we jump directly to convenient models for $P(y_1 = j \mid z_1, y_2, v_2)$. The average structural functions are obtained by averaging the response probabilities across $\hat{v}_{i2}$. No convincing way to handle discrete $y_2$, though.
4. Semiparametric and Nonparametric Approaches
• Blundell and Powell (2004) show how to relax distributional assumptions on $(u_1, v_2)$ in the model $y_1 = 1[x_1\beta_1 + u_1 > 0]$, where $x_1$ can be any function of $(z_1, y_2)$. Their key assumption is that $y_2$ can be written as $y_2 = g_2(z) + v_2$, where $(u_1, v_2)$ is independent of $z$, which

• Two-step estimation: Estimate the function $g_2(\cdot)$ and then obtain residuals $\hat{v}_{i2} = y_{i2} - \hat{g}_2(z_i)$. BP (2004) show how to estimate $H$ and $\beta_1$ (up to scale) and $G(\cdot)$, the distribution of $u_1$. The ASF is obtained from $G(x_1\beta_1)$ or
• BP (2003) consider a very general setup: $y_1 = g_1(z_1, y_2, u_1)$, with
$\mathrm{ASF}_1(z_1, y_2) = \int g_1(z_1, y_2, u_1)\, dF_1(u_1)$, (33)
where $F_1$ is the distribution of $u_1$. Key restrictions are that $y_2$ can be written as
$y_2 = g_2(z) + v_2$, (34)
where $(u_1, v_2)$ is independent of $z$.
• Key: ASF can be obtained from $E(y_1 \mid z_1, y_2, v_2) = h_1(z_1, y_2, v_2)$ by averaging out $v_2$, and fully nonparametric two-step estimates are available. Can also justify flexible parametric approaches and just skip modeling $g_1(\cdot)$.

5. Methods for Panel Data
• Combine methods for correlated random effects models with CF methods for nonlinear panel data models with unobserved heterogeneity and EEVs.
• Illustrate a parametric approach used by Papke and Wooldridge (2008), which applies to binary and fractional responses.
• Nothing appears to be known about applying “fixed effects” probit to estimate the fixed effects while also dealing with endogeneity. Likely to be poor for small $T$.
• Model with time-constant unobserved heterogeneity, $c_{i1}$, and time-varying unobservables, $v_{it1}$, as

• Write $z_{it} = (z_{it1}, z_{it2})$, so that the time-varying IVs $z_{it2}$ are excluded from the “structural” equation.
• Rules out discrete $y_{it2}$ because
$r_{it1} = \eta_1 v_{it2} + e_{it1}$, (38)
$e_{it1} \mid (z_i, v_{it2}) \sim \mathrm{Normal}(0, \sigma_{e1}^2)$, $t = 1, \ldots, T$. (39)
Then
$E(y_{it1} \mid z_i, y_{it2}, v_{it2}) = \Phi(\alpha_{e1} y_{it2} + z_{it1}\delta_{e1} + \psi_{e1} + \bar{z}_i\xi_{e1} + \eta_{e1} v_{it2})$, (40)
where the “e” subscript denotes division by $(1 + \sigma_{e1}^2)^{1/2}$. This equation is the basis for CF estimation.

• Simple two-step procedure: (i) Estimate the reduced form for $y_{it2}$ (pooled across $t$, or maybe for each $t$ separately; at a minimum, different time period intercepts should be allowed). Obtain the residuals, $\hat{v}_{it2}$, for all $(i, t)$ pairs. The estimate of $\delta_2$ is the fixed effects estimate. (ii) Use the pooled probit (quasi)-MLE of $y_{it1}$ on $y_{it2}$, $z_{it1}$, $\bar{z}_i$, $\hat{v}_{it2}$ to estimate $\alpha_{e1}$, $\delta_{e1}$, $\psi_{e1}$, $\xi_{e1}$ and $\eta_{e1}$.
• Delta method or bootstrapping (resampling cross section units) for standard errors. Can ignore first-stage estimation to test $\eta_{e1} = 0$ (but test should be fully robust to variance misspecification and serial dependence).
• Estimates of average partial effects are based on the average structural function,
$E_{(c_{i1}, v_{it1})}[\Phi(\alpha_1 y_{t2} + z_{t1}\delta_1 + c_{i1} + v_{it1})]$, (41)
which is consistently estimated as
$N^{-1} \sum_{i=1}^{N} \Phi(\hat{\alpha}_{e1} y_{t2} + z_{t1}\hat{\delta}_{e1} + \hat{\psi}_{e1} + \bar{z}_i\hat{\xi}_{e1} + \hat{\eta}_{e1}\hat{v}_{it2})$. (42)
These APEs, typically with further averaging out across $t$ and perhaps over $y_{t2}$ and $z_{t1}$, can be compared directly with fixed effects IV estimates.

Model:               Linear                   Fractional Probit
Estimation Method:   Instrumental Variables   Pooled QMLE
                     Coefficient              Coefficient    APE
log(arexppp)         .555                     1.731          .583
                     (.221)                   (.759)         (.255)
lunch                -.062                    -.298          -.100
                     (.074)                   (.202)         (.068)
log(enroll)          .046                     .286           .096
                     (.070)                   (.209)         (.070)
$\hat{v}_2$          -.424                    -1.378         -
                     (.232)                   (.811)         -
Scale Factor         -                        .337
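The APE column of the table can be cross-checked against the fractional probit coefficients and the reported scale factor. The short sketch below assumes the usual relationship for continuous regressors (APE equals coefficient times scale factor); the dictionary names are hypothetical, and the numbers are copied from the table.

```python
# Values copied from the table: pooled QMLE coefficients and reported APEs.
scale_factor = 0.337
coefs = {"log(arexppp)": 1.731, "lunch": -0.298, "log(enroll)": 0.286}
reported_ape = {"log(arexppp)": 0.583, "lunch": -0.100, "log(enroll)": 0.096}

# Implied APE for each regressor: coefficient times the scale factor.
implied_ape = {name: round(b * scale_factor, 3) for name, b in coefs.items()}

for name in coefs:
    print(name, implied_ape[name], reported_ape[name])
```

The implied values agree with the reported APEs to three decimals, which also makes the fractional probit APEs directly comparable with the linear IV coefficients in the first column.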