
A Course in Applied Econometrics

Lecture 14: Control Functions and Related Methods

Jeff Wooldridge
IRP Lectures, UW Madison, August 2008

1. Linear-in-Parameters Models: IV versus Control Functions
2. Correlated Random Coefficient Models
3. Common Nonlinear Models and CF Limitations
4. Semiparametric and Nonparametric Approaches
5. Methods for Panel Data


1. Linear-in-Parameters Models: IV versus Control Functions

• Most models that are linear in parameters are estimated using standard IV methods – two stage least squares (2SLS) or generalized method of moments (GMM).
• An alternative, the control function (CF) approach, relies on the same kinds of identification conditions.
• Let y_1 be the response variable, y_2 the endogenous explanatory variable (EEV), and z the 1 \times L vector of exogenous variables (with z_{11} \equiv 1):

    y_1 = z_1\delta_1 + \alpha_1 y_2 + u_1,   (1)

where z_1 is a 1 \times L_1 strict subvector of z.

• First consider the exogeneity assumption

    E(z'u_1) = 0.   (2)

Reduced form for y_2:

    y_2 = z\pi_2 + v_2,   E(z'v_2) = 0,   (3)

where \pi_2 is L \times 1. Write the linear projection of u_1 on v_2, in error form, as

    u_1 = \rho_1 v_2 + e_1,   (4)

where \rho_1 = E(v_2 u_1)/E(v_2^2) is the population regression coefficient. By construction, E(v_2 e_1) = 0 and E(z'e_1) = 0.

Plug (4) into (1):

    y_1 = z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2 + e_1,   (5)

where v_2 is an explanatory variable in the equation. The new error, e_1, is uncorrelated with y_2 as well as with v_2 and z.
• Two-step procedure: (i) Regress y_2 on z and obtain the reduced form residuals, \hat{v}_2; (ii) Regress

    y_1 on z_1, y_2, and \hat{v}_2.   (6)

The implicit error in (6) is e_{i1} + \rho_1 z_i(\hat{\pi}_2 - \pi_2), which depends on the sampling error in \hat{\pi}_2 unless \rho_1 = 0 (exogeneity test). OLS estimators from (6) will be consistent for \delta_1, \alpha_1, and \rho_1.
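A minimal sketch of the two-step procedure in (6), using simulated data and statsmodels; the data-generating design, sample size, and parameter values are illustrative assumptions rather than part of the lecture. The t statistic on \hat{v}_2 provides the exogeneity test mentioned above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
z = sm.add_constant(rng.normal(size=(n, 2)))          # z = (1, z_a, z_b); z_1 = (1, z_a)
u1 = rng.normal(size=n)
v2 = 0.5 * u1 + rng.normal(size=n)                    # endogeneity: Corr(u_1, v_2) != 0
y2 = z @ np.array([0.3, 1.0, -1.0]) + v2              # reduced form: y_2 = z*pi_2 + v_2
y1 = z[:, :2] @ np.array([1.0, 0.5]) + 0.7 * y2 + u1  # structural equation (1)

# Step (i): regress y_2 on z, keep the residuals v2_hat
v2_hat = sm.OLS(y2, z).fit().resid
# Step (ii): regress y_1 on z_1, y_2, v2_hat; the coefficient on v2_hat estimates rho_1
cf = sm.OLS(y1, np.column_stack([z[:, :2], y2, v2_hat])).fit()
print(cf.params)       # approximately (1.0, 0.5, 0.7, rho_1)
print(cf.tvalues[-1])  # t statistic on v2_hat: test of the null that y_2 is exogenous
```

As the next slide notes, the coefficients on z_1 and y_2 from this regression reproduce the 2SLS estimates of (1).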
• The OLS estimates from (6) are control function estimates.
• The OLS estimates of \delta_1 and \alpha_1 from (6) are identical to the 2SLS estimates starting from (1).
• Now extend the model:

    y_1 = z_1\delta_1 + \alpha_1 y_2 + \gamma_1 y_2^2 + u_1   (7)

    E(u_1|z) = 0.   (8)

Let z_2 be a scalar not also in z_1. Under (8) – which is stronger than (2), and is essential for nonlinear models – we can use, say, z_2^2 as an instrument for y_2^2. So the IVs would be (z_1, z_2, z_2^2) for (z_1, y_2, y_2^2).
• What does the CF approach entail? Now assume

    E(u_1|z, y_2) = E(u_1|v_2) = \rho_1 v_2,   (9)

where independence of (u_1, v_2) and z is sufficient for the first equality; linearity is a substantive restriction. Then,

    E(y_1|z, y_2) = z_1\delta_1 + \alpha_1 y_2 + \gamma_1 y_2^2 + \rho_1 v_2,   (10)

and a CF approach is immediate: replace v_2 with \hat{v}_2 and use OLS on (10). Not equivalent to a 2SLS estimate. CF likely more efficient but less robust.
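Below is a sketch contrasting the two estimators just described for the quadratic model (7): 2SLS with instruments (z_1, z_2, z_2^2), and the CF regression (10) that adds \hat{v}_2. The simulated design and parameter values are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
z1 = sm.add_constant(rng.normal(size=n))             # z_1 = (1, z_11)
z2 = rng.normal(size=n)                               # exogenous variable excluded from z_1
u1 = rng.normal(size=n)
v2 = 0.5 * u1 + rng.normal(size=n)
y2 = 0.2 + z1[:, 1] + z2 + v2                         # linear reduced form for y_2
y1 = z1 @ np.array([1.0, 0.5]) + 0.7 * y2 - 0.2 * y2**2 + u1   # model (7)

X = np.column_stack([z1, y2, y2**2])                  # regressors (z_1, y_2, y_2^2)
Z = np.column_stack([z1, z2, z2**2])                  # instruments (z_1, z_2, z_2^2)

# 2SLS "by hand": project X onto Z, then regress y_1 on the projections
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_2sls = np.linalg.lstsq(Xhat, y1, rcond=None)[0]

# CF alternative: the first-stage OLS residual v2_hat enters (10) as an extra regressor
v2_hat = sm.OLS(y2, np.column_stack([z1, z2])).fit().resid
b_cf = sm.OLS(y1, np.column_stack([X, v2_hat])).fit().params

print(b_2sls)      # consistent under E(u_1|z) = 0
print(b_cf[:-1])   # consistent under (9); the two differ in any finite sample
```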

• CF approaches can impose extra assumptions when we base them on E(y_1|z, y_2). The estimating equation is

    E(y_1|z, y_2) = z_1\delta_1 + \alpha_1 y_2 + E(u_1|z, y_2).   (11)

If y_2 = 1[z\delta_2 + e_2 > 0], (u_1, e_2) is independent of z, E(u_1|e_2) = \rho_1 e_2, and e_2 ~ Normal(0, 1), then

    E(u_1|z, y_2) = \rho_1 [y_2 \lambda(z\delta_2) - (1 - y_2) \lambda(-z\delta_2)],   (12)

where \lambda(\cdot) is the inverse Mills ratio (IMR). Heckman two-step approach (for endogeneity, not sample selection): (i) Probit to get \hat{\delta}_2 and compute \hat{gr}_{i2} \equiv y_{i2} \lambda(z_i\hat{\delta}_2) - (1 - y_{i2}) \lambda(-z_i\hat{\delta}_2) (generalized residual). (ii) Regress y_{i1} on z_{i1}, y_{i2}, \hat{gr}_{i2}, i = 1, ..., N.
• Consistency of the CF estimators hinges on the model for D(y_2|z) being correctly specified, along with linearity in E(u_1|v_2). If we just apply 2SLS directly to (1), it makes no distinction among discrete, continuous, or some mixture for y_2.
• How might we robustly use the binary nature of y_2 in IV estimation? Obtain the fitted probabilities, \Phi(z_i\hat{\delta}_2), from the first stage probit, and then use these as IVs (not regressors!) for y_{i2}. Fully robust to misspecification of the probit model, and the usual standard errors from IV are asymptotically valid. Efficient IV estimator if P(y_2 = 1|z) = \Phi(z\delta_2) and Var(u_1|z) = \sigma_1^2.
• Similar suggestions work for y_2 a count variable or a corner solution.
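A sketch of the last suggestion: obtain probit fitted probabilities for binary y_2 and use them as instruments, not regressors. The just-identified IV estimator is written out directly below; the simulated design and parameter values are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
z = sm.add_constant(rng.normal(size=(n, 2)))             # z = (1, z_a, z_b); z_1 = (1, z_a)
u1 = rng.normal(size=n)
e2 = 0.5 * u1 + np.sqrt(0.75) * rng.normal(size=n)       # Var(e_2) = 1, correlated with u_1
y2 = (z @ np.array([0.2, 1.0, -1.0]) + e2 > 0).astype(float)   # binary EEV
y1 = z[:, :2] @ np.array([1.0, 0.5]) + 0.7 * y2 + u1

phat = sm.Probit(y2, z).fit(disp=0).predict(z)           # fitted P(y_2 = 1 | z) = Phi(z*delta_2_hat)

X = np.column_stack([z[:, :2], y2])                      # regressors (z_1, y_2)
W = np.column_stack([z[:, :2], phat])                    # instruments (z_1, Phi(z*delta_2_hat))
b_iv = np.linalg.solve(W.T @ X, W.T @ y1)                # just-identified IV estimator
print(b_iv)                                              # approximately (1.0, 0.5, 0.7)
```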
2. Correlated Random Coefficient Models

• Modify the original equation as

    y_1 = \eta_1 + z_1\delta_1 + a_1 y_2 + u_1,   (13)

where a_1 is the "random coefficient" on y_2. Heckman and Vytlacil (1998) call (13) a correlated random coefficient (CRC) model.
• Write a_1 = \alpha_1 + v_1 where \alpha_1 = E(a_1) is the object of interest. We can rewrite the equation as

    y_1 = \eta_1 + z_1\delta_1 + \alpha_1 y_2 + v_1 y_2 + u_1   (14)
        \equiv \eta_1 + z_1\delta_1 + \alpha_1 y_2 + e_1.   (15)

• The potential problem with applying instrumental variables is that the error term v_1 y_2 + u_1 is not necessarily uncorrelated with the instruments z, even under

    E(u_1|z) = E(v_1|z) = 0.   (16)

We want to allow y_2 and v_1 to be correlated, Cov(v_1, y_2) \equiv \tau_1 \neq 0. A sufficient condition that allows for any unconditional correlation is

    Cov(v_1, y_2|z) = Cov(v_1, y_2),   (17)

and this is sufficient for IV to consistently estimate (\alpha_1, \delta_1).

• The usual IV estimator that ignores the randomness in a_1 is more robust than Garen's (1984) CF estimator, which adds \hat{v}_2 and \hat{v}_2 y_2 to the original model, or the Heckman/Vytlacil (1998) "plug-in" estimator, which replaces y_2 with \hat{y}_2 = z\hat{\pi}_2.
• The condition Cov(v_1, y_2|z) = Cov(v_1, y_2) cannot really hold for discrete y_2. Further, Card (2001) shows how it can be violated even if y_2 is continuous. Wooldridge (2005) shows how to allow parametric heteroskedasticity.
• In the case of binary y_2, we have what is often called the "switching regression" model. If y_2 = 1[z\delta_2 + v_2 > 0] and v_2|z is Normal(0, 1), then

    E(y_1|z, y_2) = \eta_1 + z_1\delta_1 + \alpha_1 y_2 + \rho_1 h_2(y_2, z\delta_2) + \xi_1 h_2(y_2, z\delta_2) y_2,

where

    h_2(y_2, z\delta_2) \equiv y_2 \lambda(z\delta_2) - (1 - y_2) \lambda(-z\delta_2)

is the generalized residual function.
• Can interact exogenous variables with \hat{h}_2(y_{i2}, z_i\hat{\delta}_2). Or, allow E(v_1|v_2) to be more flexible [Heckman and MaCurdy (1986)].
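A sketch of the switching-regression CF steps just described: a first-stage probit, then OLS with the generalized residual \hat{h}_2 and its interaction with y_2. The simulated design has no regime-specific slope, so the interaction coefficient comes out near zero; the values and names are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 5000
z = sm.add_constant(rng.normal(size=(n, 2)))
v2 = rng.normal(size=n)
u1 = 0.5 * v2 + rng.normal(size=n)
y2 = (z @ np.array([0.2, 1.0, -1.0]) + v2 > 0).astype(float)
y1 = z[:, :2] @ np.array([1.0, 0.5]) + 0.7 * y2 + u1

delta2_hat = sm.Probit(y2, z).fit(disp=0).params
idx = z @ delta2_hat
imr = lambda a: norm.pdf(a) / norm.cdf(a)                  # inverse Mills ratio lambda(.)
h2 = y2 * imr(idx) - (1.0 - y2) * imr(-idx)                # generalized residual h_2(y_2, z*delta_2_hat)

X = np.column_stack([z[:, :2], y2, h2, h2 * y2])           # add h_2 and the interaction h_2 * y_2
print(sm.OLS(y1, X).fit().params)
```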
3. Nonlinear Models and Limitations of the CF Approach

• Typically three approaches to nonlinear models with EEVs.
(1) Plug in fitted values from a first step regression (in an attempt to mimic 2SLS in the linear model). More generally, try to find E(y_1|z) or D(y_1|z) and then impose identifying restrictions.
(2) CF approach: plug in residuals in an attempt to obtain E(y_1|y_2, z) or D(y_1|y_2, z).
(3) Maximum Likelihood (often limited information): use models for D(y_1|y_2, z) and D(y_2|z) jointly.
• All strategies are more difficult with nonlinear models when y_2 is discrete. Some poor practices have lingered.

Binary and Fractional Responses

Probit model:

    y_1 = 1[z_1\delta_1 + \alpha_1 y_2 + u_1 > 0],   (18)

where u_1|z ~ Normal(0, 1). Analysis goes through if we replace (z_1, y_2) with any known function x_1 \equiv g_1(z_1, y_2).
• The Rivers-Vuong (1988) approach is to make a homoskedastic-normal assumption on the reduced form for y_2,

    y_2 = z\pi_2 + v_2,   v_2|z ~ Normal(0, \tau_2^2).   (19)

• The RV approach comes close to requiring

    (u_1, v_2) independent of z.   (20)

If we also assume

    (u_1, v_2) ~ Bivariate Normal   (21)

with \rho_1 = Corr(u_1, v_2), then we can proceed with MLE based on f(y_1, y_2|z). A CF approach is available, too, based on

    P(y_1 = 1|z, y_2) = \Phi(z_1\delta_{\rho 1} + \alpha_{\rho 1} y_2 + \theta_{\rho 1} v_2),   (22)

where each coefficient is multiplied by (1 - \rho_1^2)^{-1/2}.
• The RV two-step approach is
(i) OLS of y_2 on z, to obtain the residuals, \hat{v}_2.
(ii) Probit of y_1 on z_1, y_2, \hat{v}_2 to estimate the scaled coefficients. A simple t test on \hat{v}_2 is valid to test H_0: \rho_1 = 0.
• Can recover the original coefficients, which appear in the partial effects. Or,

    ASF(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \Phi(x_1\hat{\beta}_{\rho 1} + \hat{\theta}_{\rho 1} \hat{v}_{i2}),   (23)

that is, we average out the reduced form residuals, \hat{v}_{i2}. This formulation is useful for more complicated models.
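A sketch of the RV two-step and the ASF in (23), again on simulated data; the design and parameter values are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 5000
z = sm.add_constant(rng.normal(size=(n, 2)))
v2 = rng.normal(size=n)
u1 = 0.5 * v2 + np.sqrt(0.75) * rng.normal(size=n)         # Var(u_1) = 1, Corr(u_1, v_2) = 0.5
y2 = z @ np.array([0.2, 1.0, -1.0]) + v2                   # reduced form (19)
y1 = (z[:, :2] @ np.array([0.3, 0.5]) + 0.7 * y2 + u1 > 0).astype(float)

# Step (i): OLS reduced form, keep the residuals
v2_hat = sm.OLS(y2, z).fit().resid
# Step (ii): probit of y_1 on z_1, y_2, v2_hat gives the scaled coefficients;
# the t statistic on v2_hat tests H_0: rho_1 = 0
rv = sm.Probit(y1, np.column_stack([z[:, :2], y2, v2_hat])).fit(disp=0)
print(rv.params, rv.tvalues[-1])

# ASF at chosen (z_1, y_2): average Phi over the reduced-form residuals, as in (23)
def asf(z1_val, y2_val):
    index = np.concatenate([z1_val, [y2_val]]) @ rv.params[:-1]
    return norm.cdf(index + rv.params[-1] * v2_hat).mean()

# ASF-based effect of raising y_2 from 0 to 1, evaluated at z_11 = 0
print(asf(np.array([1.0, 0.0]), 1.0) - asf(np.array([1.0, 0.0]), 0.0))
```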
• The two-step CF approach easily extends to fractional responses:

    E(y_1|z, y_2, q_1) = \Phi(x_1\beta_1 + q_1),   (24)

where x_1 is a function of (z_1, y_2) and q_1 contains unobservables. Can use the same two-step procedure because the Bernoulli log likelihood is in the linear exponential family. Still estimate scaled coefficients. APEs must be obtained from (23). In inference, we should only assume the mean is correctly specified; the same method can be used in the binary and fractional cases. To account for first-stage estimation, the bootstrap is convenient.
• Wooldridge (2005) describes some simple ways to make the analysis starting from (24) more flexible, including allowing Var(q_1|v_2) to be heteroskedastic.
• The control function approach has some decided advantages over another two-step approach – one that appears to mimic the 2SLS estimation of the linear model. Rather than conditioning on v_2 along with z (and therefore y_2) to obtain P(y_1 = 1|z, v_2) = P(y_1 = 1|z, y_2, v_2), we can obtain P(y_1 = 1|z). To find the latter probability, we plug in the reduced form for y_2 to get y_1 = 1[z_1\delta_1 + \alpha_1(z\pi_2) + \alpha_1 v_2 + u_1 > 0]. Because \alpha_1 v_2 + u_1 is independent of z and normally distributed, P(y_1 = 1|z) = \Phi{[z_1\delta_1 + \alpha_1(z\pi_2)]/\omega_1}. So first do OLS on the reduced form, and get fitted values, \hat{y}_{i2} = z_i\hat{\pi}_2. Then, probit of y_{i1} on z_{i1}, \hat{y}_{i2}. Harder to estimate APEs and test for endogeneity.
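For the fractional case, the same second step can be run as a Bernoulli quasi-MLE with a probit conditional mean. A sketch using statsmodels GLM is below; the link class is spelled links.Probit() in recent statsmodels releases (older versions use links.probit), and the simulated data and parameter values are illustrative assumptions. Standard errors that account for the first stage can be obtained by bootstrapping both steps.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 5000
z = sm.add_constant(rng.normal(size=(n, 2)))
v2 = rng.normal(size=n)
q1 = 0.5 * v2 + rng.normal(size=n)
y2 = z @ np.array([0.2, 1.0, -1.0]) + v2
y1 = norm.cdf(z[:, :2] @ np.array([0.3, 0.5]) + 0.7 * y2 + q1)   # fractional outcome in (0, 1)

v2_hat = sm.OLS(y2, z).fit().resid
X = np.column_stack([z[:, :2], y2, v2_hat])
fam = sm.families.Binomial(link=sm.families.links.Probit())
frac = sm.GLM(y1, X, family=fam).fit()   # Bernoulli QMLE: only the conditional mean must be right
print(frac.params)                        # scaled coefficients; obtain APEs by averaging as in (23)
```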

• The danger with plugging in fitted values for y_2 is that one might be tempted to plug \hat{y}_2 into nonlinear functions, say y_2^2 or y_2 z_1. This does not result in consistent estimation of the scaled parameters or the partial effects. If we believe y_2 has a linear RF with additive normal error independent of z, the addition of \hat{v}_2 solves the endogeneity problem regardless of how y_2 appears. Plugging in fitted values for y_2 only works in the case where the model is linear in y_2. Plus, the CF approach makes it much easier to test for endogeneity of y_2 as well as compute APEs.
• Can understand the limits of the CF approach by returning to E(y_1|z, y_2, q_1) = \Phi(z_1\delta_1 + \alpha_1 y_2 + q_1), where y_2 is discrete. The Rivers-Vuong approach does not generally work.
• Some poor strategies still linger. Suppose y_1 and y_2 are both binary and

    y_2 = 1[z\delta_2 + v_2 > 0]   (25)

and we maintain joint normality of (u_1, v_2). We should not try to mimic 2SLS as follows: (i) Do probit of y_2 on z and get the fitted probabilities, \hat{\Phi}_2 = \Phi(z\hat{\delta}_2). (ii) Do probit of y_1 on z_1, \hat{\Phi}_2; that is, just replace y_2 with \hat{\Phi}_2.
• Currently, the only strategy we have is maximum likelihood estimation based on f(y_1|y_2, z) f(y_2|z). [Perhaps this is why some, such as Angrist (2001), promote the notion of just using linear probability models estimated by 2SLS.]
• "Bivariate probit" software can be used to estimate the probit model with a binary endogenous variable.
• Parallel discussions hold for ordered probit, Tobit.

Multinomial Responses

• Recent push by Petrin and Train (2006), among others, to use control function methods where the second step estimation is something simple – such as multinomial logit, or nested logit – rather than being derived from a structural model. So, if we have reduced forms

    y_2 = z\Pi_2 + v_2,   (26)

then we jump directly to convenient models for P(y_1 = j|z_1, y_2, v_2). The average structural functions are obtained by averaging the response probabilities across \hat{v}_{i2}. No convincing way to handle discrete y_2, though.
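A sketch of this style of two-step for a multinomial response: an OLS reduced form, then a multinomial logit that simply includes the residual, with average structural probabilities obtained by averaging over \hat{v}_{i2}. The simulated design, names, and parameter values are illustrative assumptions, and the second step is only a convenient model, not derived from the data-generating process.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5000
z = sm.add_constant(rng.normal(size=(n, 2)))
v2 = rng.normal(size=n)
y2 = z @ np.array([0.2, 1.0, -1.0]) + v2
# three alternatives; the latent utilities share the unobservable driving v_2
u = np.column_stack([rng.gumbel(size=n),
                     0.5 * y2 + 0.8 * v2 + rng.gumbel(size=n),
                     -0.5 * y2 + 0.8 * v2 + rng.gumbel(size=n)])
y1 = u.argmax(axis=1)

v2_hat = sm.OLS(y2, z).fit().resid
X = sm.add_constant(np.column_stack([z[:, 1], y2, v2_hat]))   # (1, z_1 element, y_2, v2_hat)
mnl = sm.MNLogit(y1, X).fit(disp=0)

# average structural probabilities at a chosen (z_1, y_2): average over the v2_hat_i
def asf_probs(z1_val, y2_val):
    Xg = np.column_stack([np.ones(n), np.full(n, z1_val), np.full(n, y2_val), v2_hat])
    return mnl.predict(Xg).mean(axis=0)

print(asf_probs(0.0, 1.0))
```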

Exponential Models

• IV and CF approaches available for exponential models. Write

    E(y_1|z, y_2, r_1) = exp(z_1\delta_1 + \alpha_1 y_2 + r_1),   (27)

where r_1 is the omitted variable. As usual, CF methods based on

    E(y_1|z, y_2) = exp(z_1\delta_1 + \alpha_1 y_2) E[exp(r_1)|z, y_2].

For continuous y_2, can find E(y_1|z, y_2) when D(y_2|z) is homoskedastic normal (Wooldridge, 1997) and when D(y_2|z) follows a probit (Terza, 1998). In the probit case,

    E(y_1|z, y_2) = exp(z_1\delta_1 + \alpha_1 y_2) h(y_2, z\pi_2, \theta_1)

    h(y_2, z\pi_2, \theta_1) = exp(\theta_1^2/2) {y_2 \Phi(\theta_1 + z\pi_2)/\Phi(z\pi_2) + (1 - y_2)[1 - \Phi(\theta_1 + z\pi_2)]/[1 - \Phi(z\pi_2)]}.

• IV methods that work for any y_2 are available [Mullahy (1997)]. If

    E(y_1|z, y_2, r_1) = exp(x_1\beta_1 + r_1)   (28)

and r_1 is independent of z, then

    E[exp(-x_1\beta_1) y_1|z] = E[exp(r_1)|z] = 1,   (29)

where E[exp(r_1)] = 1 is a normalization. The moment conditions are

    E[exp(-x_1\beta_1) y_1 - 1|z] = 0.   (30)
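The moment conditions in (30) can be turned into a GMM estimator by interacting the residual function with the instruments. A minimal sketch follows; the first-step weight matrix, the optimizer, and the function name are illustrative choices, and X1 is whatever function of (z_1, y_2) appears in (28).

```python
import numpy as np
from scipy.optimize import minimize

def mullahy_gmm(y1, X1, Z, beta0):
    """GMM based on E[exp(-x_1*beta_1)*y_1 - 1 | z] = 0, using instrument matrix Z."""
    n = len(y1)
    W = np.linalg.inv(Z.T @ Z / n)                 # standard first-step weight matrix
    def obj(beta):
        u = np.exp(-X1 @ beta) * y1 - 1.0          # residual function from (30)
        g = Z.T @ u / n                            # sample moments
        return g @ W @ g
    return minimize(obj, beta0, method="BFGS").x
```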
4. Semiparametric and Nonparametric Approaches

• Blundell and Powell (2004) show how to relax distributional assumptions on (u_1, v_2) in the model y_1 = 1[x_1\beta_1 + u_1 > 0], where x_1 can be any function of (z_1, y_2). Their key assumption is that y_2 can be written as y_2 = g_2(z) + v_2, where (u_1, v_2) is independent of z, which rules out discreteness in y_2. Then

    P(y_1 = 1|z, v_2) = E(y_1|z, v_2) = H(x_1\beta_1, v_2)   (31)

for some (generally unknown) function H(\cdot, \cdot). The average structural function is just ASF(z_1, y_2) = E_{v_{i2}}[H(x_1\beta_1, v_{i2})].
• Two-step estimation: estimate the function g_2(\cdot) and then obtain residuals \hat{v}_{i2} = y_{i2} - \hat{g}_2(z_i). BP (2004) show how to estimate H(\cdot) and \beta_1 (up to scale) and G(\cdot), the distribution of u_1. The ASF is obtained from \hat{G}(x_1\hat{\beta}_1) or

    ASF(z_1, y_2) = N^{-1} \sum_{i=1}^{N} \hat{H}(x_1\hat{\beta}_1, \hat{v}_{i2}).   (32)

• Blundell and Powell (2003) allow P(y_1 = 1|z, y_2) to have the general form H(z_1, y_2, v_2), and the second-step estimation is entirely nonparametric. Further, \hat{g}_2(\cdot) can be fully nonparametric. Parametric approximations might produce good estimates of the APEs.

• BP (2003) consider a very general setup: y_1 = g_1(z_1, y_2, u_1), with

    ASF_1(z_1, y_2) = \int g_1(z_1, y_2, u_1) dF_1(u_1),   (33)

where F_1 is the distribution of u_1. Key restrictions are that y_2 can be written as

    y_2 = g_2(z) + v_2,   (34)

where (u_1, v_2) is independent of z.
• Key: the ASF can be obtained from E(y_1|z_1, y_2, v_2) \equiv h_1(z_1, y_2, v_2) by averaging out v_2, and fully nonparametric two-step estimates are available. Can also justify flexible parametric approaches and just skip modeling g_1(\cdot).

5. Methods for Panel Data

• Combine methods for correlated random effects models with CF methods for nonlinear panel data models with unobserved heterogeneity and EEVs.
• Illustrate a parametric approach used by Papke and Wooldridge (2008), which applies to binary and fractional responses.
• Nothing appears to be known about applying "fixed effects" probit to estimate the fixed effects while also dealing with endogeneity. Likely to be poor for small T.
• Model with time-constant unobserved heterogeneity, c_{i1}, and time-varying unobservables, v_{it1}, as

    E(y_{it1}|y_{it2}, z_i, c_{i1}, v_{it1}) = \Phi(\alpha_1 y_{it2} + z_{it1}\delta_1 + c_{i1} + v_{it1}).   (35)

Allow the heterogeneity, c_{i1}, to be correlated with y_{it2} and z_i, where z_i = (z_{i1}, ..., z_{iT}) is the vector of strictly exogenous variables (conditional on c_{i1}). The time-varying omitted variable, v_{it1}, is uncorrelated with z_i – strict exogeneity – but may be correlated with y_{it2}. As an example, y_{it1} is a female labor force participation indicator and y_{it2} is other sources of income.
• Write z_{it} = (z_{it1}, z_{it2}), so that the time-varying IVs z_{it2} are excluded from the "structural" equation.
• Chamberlain approach:

    c_{i1} = \psi_1 + \bar{z}_i\xi_1 + a_{i1},   a_{i1}|z_i ~ Normal(0, \sigma_{a_1}^2).   (36)

Next step:

    E(y_{it1}|y_{it2}, z_i, r_{it1}) = \Phi(\alpha_1 y_{it2} + z_{it1}\delta_1 + \psi_1 + \bar{z}_i\xi_1 + r_{it1}),

where r_{it1} = a_{i1} + v_{it1}. Next, assume a linear reduced form for y_{it2}:

    y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2},   t = 1, ..., T.   (37)

• Rules out discrete y_{it2} because

    r_{it1} = \eta_1 v_{it2} + e_{it1},   (38)

    e_{it1}|(z_i, v_{it2}) ~ Normal(0, \sigma_{e_1}^2),   t = 1, ..., T.   (39)

Then

    E(y_{it1}|z_i, y_{it2}, v_{it2}) = \Phi(\alpha_{e1} y_{it2} + z_{it1}\delta_{e1} + \psi_{e1} + \bar{z}_i\xi_{e1} + \eta_{e1} v_{it2}),   (40)

where the "e" subscript denotes division by (1 + \sigma_{e_1}^2)^{1/2}. This equation is the basis for CF estimation.
• Simple two-step procedure: (i) Estimate the reduced form for y_{it2} (pooled across t, or maybe for each t separately; at a minimum, different time period intercepts should be allowed). Obtain the residuals, \hat{v}_{it2}, for all (i, t) pairs. The estimate of \delta_2 is the fixed effects estimate. (ii) Use the pooled probit (quasi-)MLE of y_{it1} on y_{it2}, z_{it1}, \bar{z}_i, \hat{v}_{it2} to estimate \alpha_{e1}, \delta_{e1}, \psi_{e1}, \xi_{e1} and \eta_{e1}.
• Delta method or bootstrapping (resampling cross section units) for standard errors. Can ignore first-stage estimation to test \eta_{e1} = 0 (but the test should be fully robust to variance misspecification and serial dependence).
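A sketch of the two-step procedure above for a fractional y_{it1}, with the Chamberlain device implemented through time averages and year dummies. The long-format data layout and the column names (id, year, and the arguments) are illustrative assumptions; standard errors should come from the delta method or from bootstrapping cross-section units over both steps.

```python
import pandas as pd
import statsmodels.api as sm

def papke_wooldridge_cf(df, y1, y2, z1_cols, z2_cols):
    d = df.copy()
    z_cols = z1_cols + z2_cols
    # time averages of all strictly exogenous variables, by cross-section unit
    zbar = d.groupby("id")[z_cols].transform("mean").add_suffix("_bar")
    years = pd.get_dummies(d["year"], prefix="yr", drop_first=True).astype(float)
    # Step (i): pooled linear reduced form for y_it2; keep the residuals v2_hat
    X2 = sm.add_constant(pd.concat([d[z_cols], zbar, years], axis=1))
    d["v2_hat"] = sm.OLS(d[y2], X2).fit().resid
    # Step (ii): pooled probit QMLE of y_it1 on y_it2, z_it1, zbar_i, year dummies, v2_hat
    X1 = sm.add_constant(pd.concat([d[[y2] + z1_cols], zbar, years, d[["v2_hat"]]], axis=1))
    fam = sm.families.Binomial(link=sm.families.links.Probit())
    return sm.GLM(d[y1], X1, family=fam).fit(), X1

# example call with hypothetical column names:
# res, X1 = papke_wooldridge_cf(panel, "pass_rate", "log_spend", ["lunch", "log_enroll"], ["grant"])
```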
• Estimates of average partial effects are based on the average structural function,

    E_{(c_{i1}, v_{it1})}[\Phi(\alpha_1 y_{t2} + z_{t1}\delta_1 + c_{i1} + v_{it1})],   (41)

which is consistently estimated as

    N^{-1} \sum_{i=1}^{N} \Phi(\hat{\alpha}_{e1} y_{t2} + z_{t1}\hat{\delta}_{e1} + \hat{\psi}_{e1} + \bar{z}_i\hat{\xi}_{e1} + \hat{\eta}_{e1} \hat{v}_{it2}).   (42)

These APEs, typically with further averaging out across t and perhaps over y_{t2} and z_{t1}, can be compared directly with fixed effects IV estimates.

Model:               Linear                   Fractional Probit
Estimation Method:   Instrumental Variables   Pooled QMLE
                     Coefficient              Coefficient        APE
log(arexppp)         .555                     1.731              .583
                     (.221)                   (.759)             (.255)
lunch                -.062                    -.298              -.100
                     (.074)                   (.202)             (.068)
log(enroll)          .046                     .286               .096
                     (.070)                   (.209)             (.070)
\hat{v}_2            -.424                    -1.378             —
                     (.232)                   (.811)             —
Scale Factor         —                        .337
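One way to produce an APE column and scale factor like those above is to average the standard normal density over the estimated index, in the spirit of (42); for a continuous regressor the APE is then its scaled coefficient times this factor. The sketch below assumes res and X1 are the objects returned by the earlier two-step sketch, and the column name in the example call is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def scale_factor(res, X1):
    index = np.asarray(X1, dtype=float) @ np.asarray(res.params)   # fitted index for every (i, t)
    return norm.pdf(index).mean()                                  # average derivative of Phi at the index

# APE for a continuous regressor = scaled coefficient times the scale factor, e.g.:
# ape_spending = res.params["log_spend"] * scale_factor(res, X1)
```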
