Estimation_of_dynamic_panel_data_models_2
Jeffrey M. Wooldridge
Department of Economics
Michigan State University
East Lansing, MI 48824-1038
March 2, 2011
Summary
We propose a new method for estimating dynamic panel data models with selection.
The method uses backward substitution for the lagged dependent variable, which leads to
an estimating equation that requires correcting for contemporaneous selection only. The
estimator is valid under relatively weak assumptions about errors and permits avoiding
the weak instruments problem associated with differencing. We also propose a simple test
for selection bias that is based on the addition of a selection term to the first-difference
equation and subsequent testing for significance of this term. The methods are applied
to estimating dynamic earnings equations for women.
Key words: Sample selection, Panel data, Dynamic models, Two-step estimation.
1 Introduction
Recently developed methods for estimating dynamic unobserved effects panel data mod-
els have become widely used in applied economics research. In the present paper, we
contribute to the literature by developing a new estimation method for the models, where
the panel is not balanced due to nonrandom selection.
In the absence of selection, the traditional approach to estimating dynamic panel data
models is to remove the unobserved effect by first-differencing and then use instrumental
variables methods for estimating the differenced equation. This approach was initially
proposed by Anderson and Hsiao (1981) and was later considered within a more efficient
generalized method of moments (GMM) framework by Holtz-Eakin, Newey and Rosen
(1988), Arellano and Bond (1991), Ahn and Schmidt (1995), and others.
Blundell and Bond (1998) raised the problem of weak instruments in the context of
the first-differenced GMM estimation. This problem arises when the series are highly
persistent, which happens in a simple AR(1) model with the autoregressive coefficient
close to unity.[1] Blundell and Bond show that imposing restrictions on the initial condition
results in additional linear moments that can help to improve the performance of the GMM
estimator. As an alternative solution, they model the relationship between the unobserved
effect and initial condition through a linear function and suggest using the generalized
least squares estimator on the extended model, where the initial value is included in the
conditioning set.
Several previous studies considered estimation of dynamic panel data models with
selectivity; most of them use differencing to remove the unobserved effect.[2] Ziliak and
Kniesner (1998) and Wooldridge (2002) propose a solution to the selection problem that
arises because of nonrandom attrition. Because attrition is an absorbing state (a unit
observed in the current period was also observed in the previous period), Ziliak and
Kniesner, and Wooldridge show that accounting for the current period selection
in the differenced equation results in consistent estimation. Under the assumption that
errors in the selection equation are normally distributed, the selection correction term is
the inverse Mills ratio.
[1] Binder, Hsiao and Pesaran (2005) show that the same problem arises in panel vector
autoregressive models.
[2] Dynamic panel data models with censoring are considered, for example, by Honore and Hu
(2004), Hu (2002) and Labeaga (1999). See also Bover and Arellano (1997).
Arellano, Bover and Labeaga (1999) consider autoregressive panel data models with
sample selection. They model the conditional expectation of the unobserved effect as a
linear function of the past values of the dependent variable and consider the distribution
of the dependent variable conditional on its past. For each t, the resulting reduced-form
equation is estimated on a sub-sample of data, which includes cross-section units without
missing past values. Arellano, Bover and Labeaga assume normality of the error terms
in both primary and selection equations and use the inverse Mills ratio to account for
the fact that only the sub-samples with observed past values are used. The structural
autoregressive coefficient is then recovered from the reduced-form coefficients using the
restrictions imposed on parameters.
Another solution to the incidental truncation problem in dynamic panel data models
was proposed by Kyriazidou (2001), who suggested taking differences between any two
periods in which the selection index for the given unit is the same or “similar.” Under the
assumption that the vector of errors is independent and identically distributed over time
conditional on the exogenous variables, differencing eliminates both the unobserved effect
and selection effect. For consistency, it is crucial that the assumptions of strict stationarity
and conditional serial independence of the errors hold. Moreover, the estimator converges
at a rate that is slower than the usual square root of the cross-section sample size.
Another semiparametric estimator was proposed by Gayle and Viauroux (2007), who
consider a three-step sieve estimator. In the first step the selection probabilities in each
period are estimated nonparametrically by a kernel estimator. In the second step the
inverse probability function is linearized, the unobserved effect is removed by differencing
and the parameters in the linearized specification of the inverse probability function are
estimated using a sieve minimum distance estimator (a GMM estimator with series used
to approximate unknown functions). In the third step the GMM estimator is used to
estimate the differenced primary equation augmented by the correction term, where the
differenced correction term is again approximated by series estimators.
As mentioned above, most earlier studies use differencing. A benefit of differencing for
unbalanced panels is that it removes additive heterogeneity, and therefore any selection is
allowed to be arbitrarily correlated with the heterogeneity in the levels equation.
Unfortunately, if selection depends on the idiosyncratic shocks, consistent estimation requires
either imposing relatively strong assumptions on the error distributions or deriving a
complicated selection correction term that accounts for selection in several consecutive
periods. As noted by Blundell and Bond (1998), differencing
may also lead to a weak instruments problem. Furthermore, in the case of incidental trun-
cation – such as labor force participation – units may drop out and appear again in any
period; therefore, the use of first-differencing or otherwise conditioning on observability
of the dependent variable in multiple consecutive periods in dynamic panel data models
with arbitrary selection patterns implies that much of the data is lost.
In this paper we consider an alternative method for estimating dynamic panel data
models with selection, which does not rely on differencing. One of the key assumptions
is that the initial condition is observed for all cross-section units. To account for unob-
served heterogeneity, rather than using differencing we follow Blundell and Bond (1998)
and Chamberlain (1980, 1982, 1984), and model the conditional expectation of the unob-
served effect as a linear function of the exogenous variables and initial condition. Then,
backward substitution for the lagged dependent variable is used to obtain the equation
that contains the lags of the exogenous explanatory variables (which are assumed to be
always observed) and the initial condition, but no lags of the dependent variable. As a
result, selection correction reduces to a contemporaneous selection problem of the type
studied in Wooldridge (1995) with strictly exogenous variables. The ability to focus on
selection period-by-period greatly simplifies the derivation of the correction term while
allowing general serial correlation in the error of the selection equation. The simplest ap-
proach relies on the assumption that the error terms in the selection equation are normally
distributed, but we also briefly discuss the possibility of semiparametric estimation. Once
the correction term is obtained, the augmented equation can be consistently estimated
by nonlinear least squares (NLS) or GMM.
The new estimation methods have several important advantages. Modeling the un-
observed effects allows us to estimate the equation of interest in levels, thereby avoiding
the weak instruments problem often associated with the estimators that use differencing.
In the discussed context the error terms in both primary and selection equations may
be heterogeneously distributed over time, and the error in the selection equation may be
arbitrarily serially dependent. We also discuss how estimation can be modified, so that
the observability of the initial condition is not required, and serial dependence in the error
terms is permitted in both equations. Additionally, the approach proposed here makes
use of all cross-section units observed at least once after the initial period, which helps to
avoid losing data.
2 The Model
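Consider the dynamic unobserved effects model
$$y_{it} = \rho y_{i,t-1} + x_{it}\beta + c_{i1} + u_{it1}, \quad t = 1, \ldots, T, \qquad (1)$$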
where xit is a 1×K vector of time-varying variables, β is a K×1 vector of parameters, ρ
is a scalar parameter, ci1 is a time-constant unobserved effect, and uit1 is an idiosyncratic
error. Variables in xit are assumed to be strictly exogenous conditional on the unobserved
effect, but may be correlated with ci1 .
Selection occurs because of the partial observability of the dependent variable, yit .
This is modeled by specifying a selection rule
$$s_{it} = 1[z_{it}\delta_{2t} + c_{i2} + u_{it2} > 0], \quad t = 1, \ldots, T, \qquad (2)$$
where sit is a selection indicator that equals one if yit is observed and is zero otherwise, ci2
is a time-constant unobserved effect, uit2 is an idiosyncratic error, zit is a 1×L (L > K)
vector of variables that are strictly exogenous conditional on the unobserved effect, and
δ2t is an L×1 vector of parameters. In what follows, it is assumed that zit contains all of
the regressors from the primary equation, but must also contain at least one additional
time-varying variable. Additional variables may be the factors that affect selection but
not the dependent variable in the primary equation. Alternatively, if selection is partly
determined by the lagged values of yit (as in some labor supply models, for example),
vector zit would include lagged values of xit .
Given the selection problem, estimation of equation (1) by differencing is complicated
for several reasons. First, we need to observe the dependent variable and explanatory
variables in the current and previous periods. Because of the lagged dependent variable,
we would only be able to use observations where yit is observed in three consecutive peri-
ods. Moreover, any selection correction term would involve conditioning on observability
in three different periods, making its derivation and estimation difficult.
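The need for three consecutive periods can be seen from a short worked derivation. It is only a sketch, writing the primary equation in the first-order autoregressive form that the definitions above describe (scalar ρ, time-constant effect ci1, idiosyncratic error uit1):

```latex
\begin{aligned}
y_{it} &= \rho\, y_{i,t-1} + x_{it}\beta + c_{i1} + u_{it1}, \\
\Delta y_{it} &= \rho\, \Delta y_{i,t-1} + \Delta x_{it}\beta + \Delta u_{it1}, \\
\Delta y_{it} &\equiv y_{it} - y_{i,t-1}, \qquad
\Delta y_{i,t-1} \equiv y_{i,t-1} - y_{i,t-2}.
\end{aligned}
```

Differencing removes ci1, but the differenced equation involves yit, yi,t−1 and yi,t−2, so an observation can be used only if sit = si,t−1 = si,t−2 = 1.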
We can avoid these problems by substituting back for yi,t−1 and expressing yit through
the current and lagged values of the explanatory variables and the initial condition, yi0 :
$$y_{it} = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1}\rho^j x_{i,t-j}\right)\beta + c_{i1}\sum_{j=0}^{t-1}\rho^j + \sum_{j=0}^{t-1}\rho^j u_{i,t-j,1}, \quad t = 1, \ldots, T. \qquad (3)$$
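The backward substitution can be checked numerically. The following is a minimal sketch, assuming a scalar regressor and arbitrary illustrative parameter values (the function names and numbers are hypothetical, not taken from the paper): it iterates the first-order recursion and compares each period with the backward-substituted expression.

```python
import random


def recursive_path(rho, beta, c, y0, x, u):
    """Iterate y_t = rho*y_{t-1} + x_t*beta + c + u_t for t = 1..T (scalar x_t)."""
    y, path = y0, []
    for x_t, u_t in zip(x, u):
        y = rho * y + x_t * beta + c + u_t
        path.append(y)
    return path


def backward_substituted(rho, beta, c, y0, x, u, t):
    """Right-hand side of the backward-substituted equation at period t (1-based)."""
    total = rho ** t * y0
    for j in range(t):
        # x[t - 1 - j] stores x_{t-j}: the lists start at period 1 (index 0).
        total += rho ** j * (x[t - 1 - j] * beta + c + u[t - 1 - j])
    return total


rng = random.Random(0)
T = 6
x = [rng.gauss(0, 1) for _ in range(T)]
u = [rng.gauss(0, 1) for _ in range(T)]
path = recursive_path(0.8, 1.5, 0.3, 2.0, x, u)
closed = [backward_substituted(0.8, 1.5, 0.3, 2.0, x, u, t) for t in range(1, T + 1)]
max_gap = max(abs(a - b) for a, b in zip(path, closed))  # rounding error only
```

The two paths agree up to floating-point rounding, which illustrates that the substituted equation contains no lagged values of the dependent variable other than the initial condition.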
Denote zi ≡ (zi1 , zi2 , . . . , ziT ). Given (3), the estimating equation can be derived
under the following assumption:
ASSUMPTION 2.1
(i) yi0 and zi are always observed, while yit , t = 1, . . . , T , are observed only for sit = 1.
(ii) $E(u_{it1} \mid x_{it}, y_{i,t-1}, x_{i,t-1}, \ldots, y_{i0}, c_{i1}) = 0$, so that $\mathrm{Cov}(u_{it1}, u_{is1}) = 0$ for all $s \neq t$.
(iv) $c_{i1} = \eta_1 + \sum_{s=1}^{T} \xi_s z_{is} + \gamma_1 y_{i0} + a_{i1}$, $E(a_{i1} \mid z_i, y_{i0}) = 0$.
(v) $c_{i2} = \eta_2 + \sum_{s=1}^{T} \psi_s z_{is} + \gamma_2 y_{i0} + a_{i2}$.
(vi) For $v_{it2} = a_{i2} + u_{it2}$, $v_{it2} \mid z_i, y_{i0} \sim \mathrm{Normal}(0, \sigma_{2t}^2)$, $t = 1, \ldots, T$.
(vii) For $v_{it1} = \sum_{j=0}^{t-1} \rho^j (u_{i,t-j,1} + a_{i1})$, $E(v_{it1} \mid z_i, y_{i0}, v_{it2}) = \phi_{2t} v_{it2}$, $t = 1, \ldots, T$.
According to part (ii) of Assumption 2.1, the conditional mean in equation (1) is
assumed to be dynamically complete, which is a rather standard assumption in the lit-
erature. This part of the assumption ensures that yi0 is exogenous with respect to the
final error in (3). At the end of this section we discuss an alternative set of assumptions
and the corresponding estimating equation, where the dynamic completeness assumption
is dropped, so that {uit1 } may be serially correlated.
Part (iv) of Assumption 2.1 uses Chamberlain’s (1980, 1982, 1984) device to model the
conditional mean of the unobserved effect, ci1 , as a linear function of exogenous variables
(see also Blundell and Bond, 1998). This approach was used by Wooldridge (2005) in
the context of nonlinear dynamic panel data models with balanced panels. In general,
zit may contain time-constant variables; of course, the leads and lags of such variables
would not be included in the conditional mean of ci1 . A non-zero correlation between
the time-constant variables and ci1 implies that the effect of these variables cannot be
distinguished from that of the unobserved heterogeneity. However, it may still be useful
to include the time-invariant characteristics in zit because controlling for more variables
can help to improve on the precision of the estimator.
Under Assumption 2.1, parts (i)-(iv), the primary equation can be written as
$$y_{it} = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1}\rho^j x_{i,t-j}\right)\beta + \frac{1-\rho^t}{1-\rho}\left(\eta_1 + \sum_{s=1}^{T}\xi_s z_{is} + \gamma_1 y_{i0}\right) + v_{it1},$$
$$E(v_{it1} \mid z_i, y_{i0}) = 0, \quad t = 1, \ldots, T, \qquad (4)$$
where $v_{it1} = \sum_{j=0}^{t-1}\rho^j (u_{i,t-j,1} + a_{i1})$, t = 1, . . . , T , are the new error terms, which will be
serially correlated even though the initial idiosyncratic errors were not.
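One way to see the serial correlation is to rewrite the composite error recursively. The step below follows directly from the definition of vit1 and introduces no additional assumptions:

```latex
v_{it1} = \sum_{j=0}^{t-1}\rho^{j}\,(u_{i,t-j,1} + a_{i1})
        = \rho\, v_{i,t-1,1} + u_{it1} + a_{i1}, \qquad t = 2, \ldots, T,
\qquad v_{i11} = u_{i11} + a_{i1}.
```

Each vit1 therefore carries over ρ times the previous composite error and also contains the time-constant term ai1, so the composite errors are correlated across periods even when the uit1 are not.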
Equation (4) can be used to estimate the parameters when the panel is balanced.[3]
Estimating equation (4) by NLS or GMM can serve as an alternative to traditional
estimators that combine first differencing with instrumental variables methods. As mentioned in
the introduction, a GMM estimator that uses first-differenced data suffers from the weak
instruments problem when the series are highly persistent. Specifically, for a sequentially
exogenous variable ωit , such as a lagged dependent variable, we can write the data
generating process as ωit = ρωi,t−1 + εit , where Cov(εis , εit ) = 0 for s ≠ t. In the extreme case,
where ρ = 1, ∆ωit = εit , so that past values (ωi,t−1 , . . . , ωi1 ) are not correlated with ∆ωit
and hence cannot be used as instruments. When ρ is close to one, the lagged values are
correlated with ∆ωit , but the correlation is weak, which results in the weak instruments
problem. It is important to note, however, that this problem arises only when the
estimation method is GMM. Binder, Hsiao and Pesaran (2005) proposed a quasi maximum
likelihood estimator that uses differencing to remove unobserved heterogeneity, but does
not suffer from the weak instruments problem. Similarly, Hsiao, Pesaran and
Tahmiscioglu (2002) propose a transformed likelihood approach and show that their maximum
likelihood estimator that uses differenced data performs better than the GMM estimator.
[3] We thank the anonymous referee for bringing this fact to our attention. The referee also noted that an
interesting question is whether our approach is less efficient than the Blundell and Bond (1998) approach.
This is difficult to say, as the two approaches make different assumptions about the initial condition.
In equation (4), the weak instruments problem does not arise. Because all variables in
(4) are in levels, all of them are exogenous under Assumption 2.1 parts (ii)-(iv) and hence
are used as their own instruments. Although the estimator relies on time variation in the
variables, the source of this variation does not matter. Even if ρ = 1, the parameters in
(4) can be consistently estimated by NLS or GMM, as long as Var(εit ) ≠ 0. As is true
for all panel data models with large N and fixed T , the autoregressive coefficient can be
identified from the cross-sectional variation in the data.
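The weak instruments argument for differenced data can be illustrated with a small simulation. This is a sketch under added assumptions that are not part of the paper: the process is a stationary AR(1) with |ρ| < 1, unit-variance normal shocks, and a single cross-section of 20,000 units. Under these assumptions the population correlation between the lagged level and the first difference is −sqrt((1 − ρ)/2), which shrinks toward zero as ρ approaches one.

```python
import math
import random


def correlation(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def lag_vs_difference_corr(rho, n_units=20000, seed=0):
    """Correlation between w_{t-1} and (w_t - w_{t-1}) for a stationary AR(1), |rho| < 1."""
    rng = random.Random(seed)
    sd_level = 1.0 / math.sqrt(1.0 - rho ** 2)  # stationary std. dev. with unit-variance shocks
    lagged, diffs = [], []
    for _ in range(n_units):
        w_prev = rng.gauss(0.0, sd_level)
        w_curr = rho * w_prev + rng.gauss(0.0, 1.0)
        lagged.append(w_prev)
        diffs.append(w_curr - w_prev)
    return correlation(lagged, diffs)


corr_moderate = lag_vs_difference_corr(0.5)     # population value is -0.5
corr_persistent = lag_vs_difference_corr(0.99)  # population value is about -0.07
```

With ρ = 0.99 the lagged level explains very little of the variation in the difference, which is the sense in which lagged levels become weak instruments for the differenced equation.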
In the context of an unbalanced panel, under Assumption 2.1, parts (v) and (vi), the
selection equation can be written as
$$s_{it} = 1\left[\eta_2 + z_{it}\delta_{2t} + \sum_{s=1}^{T}\psi_s z_{is} + \gamma_2 y_{i0} + v_{it2} > 0\right], \quad t = 1, \ldots, T, \qquad (5)$$
$$v_{it2} \mid z_i, y_{i0} \sim \mathrm{Normal}(0, \sigma_{2t}^2), \quad t = 1, \ldots, T, \qquad (6)$$
where Chamberlain's modeling device is used to model the distribution of the
time-constant unobserved effect, ci2 . Note that due to the presence of the unobserved effect,
the composite errors, vit2 = uit2 + ai2 , t = 1, . . . , T , are necessarily serially correlated.
Also, error variances are allowed to vary over time. The normality assumption is not
crucial for estimating the selection equation. As long as vit2 is independent of (zi , yi0 )
and the appropriate regularity conditions hold, parameters in (5) can be consistently
estimated using a semiparametric estimator (see, for example, Ichimura 1993, Klein and
Spady 1991). However, as discussed below, the derivation of the selection correction term
is substantially simplified if Assumption 2.1(vi) holds.
To correct for the selection bias, we consider a two-step estimator and use the as-
sumptions similar to the standard selection literature in a cross-sectional context; see, for
example, Wooldridge (2002, Chapter 17). Specifically, from Assumption 2.1(vii) it follows
that
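$$E[v_{it1} \mid z_i, y_{i0}, s_{it} = 1] = E[E(v_{it1} \mid v_{it2}) \mid z_i, y_{i0}, s_{it} = 1] = E[\phi_{2t} v_{it2} \mid z_i, y_{i0}, s_{it} = 1] \equiv h_{it}, \qquad (7)$$
where $h_{it} \equiv h_t\left(\eta_2 + z_{it}\delta_{2t} + \sum_{s=1}^{T}\psi_s z_{is} + \gamma_2 y_{i0}\right)$, and ht (·) is an unknown function.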
From (7), it follows that for sit = 1, equation (4) can be written as
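$$y_{it} = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1}\rho^j x_{i,t-j}\right)\beta + \frac{1-\rho^t}{1-\rho}\left(\eta_1 + \sum_{s=1}^{T}\xi_s z_{is} + \gamma_1 y_{i0}\right) + h_{it} + e_{it1},$$
$$E(e_{it1} \mid z_i, y_{i0}, s_{it} = 1) = 0, \quad t = 1, \ldots, T. \qquad (8)$$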
$$h_t(\cdot) = \phi_{2t}\frac{\varphi(\cdot)}{\Phi(\cdot)} \equiv \phi_{2t}\lambda(\cdot), \qquad (9)$$
where φ(·) and Φ(·) are the standard normal pdf and cdf, respectively, and λ(·) is the inverse
Mills ratio. Thus, with some abuse of notation we can write the primary equation for the
selected sample as
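$$y_{it} = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1}\rho^j x_{i,t-j}\right)\beta + \frac{1-\rho^t}{1-\rho}\left(\eta_1 + \sum_{s=1}^{T}\xi_s z_{is} + \gamma_1 y_{i0}\right) + \phi_{2t}\lambda_{it2} + e_{it1},$$
$$E(e_{it1} \mid z_i, y_{i0}, s_{it} = 1) = 0, \quad t = 1, \ldots, T, \qquad (10)$$
where $\lambda_{it2} \equiv \lambda\left(\eta_2 + z_{it}\delta_{2t} + \sum_{s=1}^{T}\psi_s z_{is} + \gamma_2 y_{i0}\right)$. Under Assumption 2.1, equation (10) is
the final estimating equation that can be consistently estimated by NLS or GMM.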
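The inverse Mills ratio that enters the estimating equation is simple to compute. The sketch below uses Python's standard-library `statistics.NormalDist`; the function names and the index value 0.3 are illustrative assumptions. It also checks by simulation the standard fact behind the correction term: for a standard normal error v and selection rule v > −a, the mean of v in the selected sample equals λ(a).

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_mills(a):
    """lambda(a) = pdf(a) / cdf(a) for the standard normal distribution."""
    return STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)


def truncated_mean_mc(a, n_draws=200000, seed=0):
    """Monte Carlo estimate of E[v | v > -a] for standard normal v."""
    rng = random.Random(seed)
    kept = [v for v in (rng.gauss(0.0, 1.0) for _ in range(n_draws)) if v > -a]
    return sum(kept) / len(kept)


index = 0.3                          # hypothetical value of the selection index
analytic = inverse_mills(index)      # pdf(0.3) / cdf(0.3)
simulated = truncated_mean_mc(index)
```

For very negative index values the cdf underflows toward zero, so a production implementation would need a numerically safer evaluation in the left tail; this sketch does not attempt that.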
As an alternative approach, one could treat the initial condition as an unobserved
effect and model its conditional expectation as a linear function of exogenous variables,
as suggested by Chamberlain (1984).[4] In this case, the dynamic completeness of the
conditional mean in equation (1) is not needed (and most likely will not hold), so that
the idiosyncratic errors in (1) may be serially correlated. Formally, the set of assumptions
can be summarized as follows:
ASSUMPTION 2.2
(i) yi0 is not observed, zi is always observed, and yit , t = 1, . . . , T , are observed only
for sit = 1.
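(iii) $y_{i0} = \sum_{s=1}^{T} \kappa_s z_{is} + b_i$, $E(b_i \mid z_i) = 0$.
(iv) $c_{i1} = \eta_1 + \sum_{s=1}^{T} \xi_s z_{is} + \gamma_1 y_{i0} + a_{i1} = \eta_1 + \sum_{s=1}^{T} (\xi_s + \gamma_1 \kappa_s) z_{is} + a_{i1} + \gamma_1 b_i$, $E(a_{i1} \mid z_i) = 0$.
(v) $c_{i2} = \eta_2 + \sum_{s=1}^{T} \psi_s z_{is} + \gamma_2 y_{i0} + a_{i2} = \eta_2 + \sum_{s=1}^{T} (\psi_s + \gamma_2 \kappa_s) z_{is} + a_{i2} + \gamma_2 b_i$.
(vi) For $v_{it2} = a_{i2} + \gamma_2 b_i + u_{it2}$, $v_{it2} \mid z_i \sim \mathrm{Normal}(0, \sigma_{2t}^2)$, $t = 1, \ldots, T$.
(vii) For $v_{it1} = \rho^t b_i + \sum_{j=0}^{t-1} \rho^j (u_{i,t-j,1} + a_{i1} + \gamma_1 b_i)$, $E(v_{it1} \mid z_i, v_{it2}) = \phi_{2t} v_{it2}$, $t = 1, \ldots, T$.
[4] We thank the anonymous referee for suggesting that we consider this approach.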
Under Assumption 2.2, for sit = 1, the primary equation can be written as
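$$y_{it} = \rho^t \sum_{s=1}^{T}\kappa_s z_{is} + \left(\sum_{j=0}^{t-1}\rho^j x_{i,t-j}\right)\beta + \frac{1-\rho^t}{1-\rho}\left[\eta_1 + \sum_{s=1}^{T}\tilde{\xi}_s z_{is}\right] + \phi_{2t}\lambda_{it2} + e_{it1},$$
$$E(e_{it1} \mid z_i, s_{it} = 1) = 0, \quad t = 1, \ldots, T. \qquad (11)$$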
3 NLS Estimation
Under Assumption 2.1(v) and (vi), equation (5) can be consistently estimated by probit after
the error variance is normalized to equal unity. Since error variances may differ across
time periods, it is most appropriate to estimate the selection equation separately for each
time period. Denote the first-step estimators
$\hat{\pi}_t = (\hat{\eta}_{t2}, \hat{\psi}_{1t}, \ldots, \widehat{\delta_{2t} + \psi_{tt}}, \ldots, \hat{\psi}_{Tt}, \hat{\gamma}_{t2})'$,
π̂ = (π̂1′ , . . . , π̂T′ )′ , and the first-step vector of regressors qit = (1, zi1 , . . . , ziT , yi0 ). These
can be used to obtain λ̂it2 ≡ λ(qit π̂t ), and then λ̂it2 can be used instead of λit2 in equation
(10).
Denote the 1×[K + LT + T + 3] vector of the parameters θ ≡ (ρ, β, η1 , ξ1 , . . ., ξT , γ1 ,
ϕ21 , . . ., ϕ2T ). Parameters in θ can be consistently estimated by pooled nonlinear least
squares (NLS) on the selected sample.
Define the conditional expectation of yit :
mit (θ) ≡ m(zi , yi0 , sit = 1; θ) = E(yit |zi , yi0 , sit = 1), (12)
where
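$$m(z_i, y_{i0}, s_{it} = 1; \theta) = \rho^t y_{i0} + \left(\sum_{j=0}^{t-1}\rho^j x_{i,t-j}\right)\beta + \frac{1-\rho^t}{1-\rho}\left(\eta_1 + \sum_{s=1}^{T}\xi_s z_{is} + \gamma_1 y_{i0}\right) + \phi_{2t}\lambda_{it2}. \qquad (13)$$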
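The conditional mean in (13) can be written as a small function. This is a minimal sketch, assuming a scalar xit and a scalar zit per period; the function name, argument layout and example numbers are hypothetical. The geometric factor is computed as an explicit sum so that the function remains well defined at ρ = 1, where the ratio form would divide by zero.

```python
def conditional_mean(t, rho, beta, eta1, xi, gamma1, phi_t, y0, x, z, mills_t):
    """Selected-sample conditional mean at period t (1-based), scalar x_t and z_t.

    x and z are lists over periods 1..T (index 0 is period 1); xi has one
    coefficient per period of z; mills_t is the (estimated) inverse Mills ratio.
    """
    # Sum of rho^j for j = 0..t-1; equals (1 - rho^t) / (1 - rho) when rho != 1.
    geometric = sum(rho ** j for j in range(t))
    # x[t - 1 - j] stores x_{t-j} because index 0 is period 1.
    distributed_lag = sum(rho ** j * x[t - 1 - j] for j in range(t)) * beta
    heterogeneity = eta1 + sum(xi_s * z_s for xi_s, z_s in zip(xi, z)) + gamma1 * y0
    return rho ** t * y0 + distributed_lag + geometric * heterogeneity + phi_t * mills_t


# Hand-checkable example: 0.25*4 + (1 + 0.5)*2 + 1.5*1 + 0 = 5.5
example = conditional_mean(
    t=2, rho=0.5, beta=2.0, eta1=1.0, xi=[0.0, 0.0], gamma1=0.0,
    phi_t=0.0, y0=4.0, x=[1.0, 1.0], z=[0.7, -0.2], mills_t=0.6,
)
```

Given ρ, the function is linear in the remaining parameters, while ρ itself enters through powers and the geometric factor, so the estimation problem is nonlinear in parameters.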
The correction term, λit2 , is not available, but it can be replaced by a consistent estimator
mentioned above. In general, let mit (θ, π̂) be a conditional expectation obtained using the
estimators of the parameters in the selection equation. Then, the pooled NLS estimator
of θ is the solution to the minimization problem
$$\min_{\theta} \frac{1}{2}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,[y_{it} - m_{it}(\theta, \hat{\pi})]^2, \qquad (14)$$
where one half is used as a multiplier for convenience. The first-order condition for this
problem is
$$\sum_{i=1}^{N}\sum_{t=1}^{T} -s_{it}\,\nabla_\theta m_{it}(\hat{\theta}, \hat{\pi})'\,[y_{it} - m_{it}(\hat{\theta}, \hat{\pi})] = 0, \qquad (15)$$
which can be solved for θ̂ using iterative procedures. As is standard in panel data
models, for identification it is necessary that T ≥ 2.
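The pooled objective in (14) can be sketched as follows. The data layout and the toy mean function are hypothetical stand-ins chosen so the result can be verified by hand; in an application the mean function would be the conditional mean mit(θ, π̂) and the objective would be passed to a numerical optimizer.

```python
def pooled_nls_objective(theta, panel, mean_fn):
    """Half the sum of squared residuals over observations with s_it = 1.

    panel: list of units; each unit is a list of (s_it, y_it, data_it) tuples.
    mean_fn(theta, data_it) returns the fitted conditional mean.
    """
    total = 0.0
    for unit in panel:
        for s_it, y_it, data_it in unit:
            if s_it == 1:  # unselected periods contribute nothing; y_it may be None
                total += 0.5 * (y_it - mean_fn(theta, data_it)) ** 2
    return total


def toy_mean(theta, data_it):
    """Hypothetical stand-in for m_it: intercept plus slope times one regressor."""
    intercept, slope = theta
    return intercept + slope * data_it


# Outcomes generated without noise from intercept 1 and slope 2.
panel = [
    [(1, 3.0, 1.0), (0, None, 2.0), (1, 7.0, 3.0)],
    [(1, 5.0, 2.0), (1, 9.0, 4.0)],
]
at_truth = pooled_nls_objective((1.0, 2.0), panel, toy_mean)
perturbed = pooled_nls_objective((1.0, 2.5), panel, toy_mean)
```

The selection indicator acts as a weight of zero or one, so the panel may be unbalanced in an arbitrary pattern without any change to the objective.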
In summary, if Assumption 2.1 holds, a consistent estimator of θ can be obtained
from the following two-step procedure:
PROCEDURE 3.1
2. For sit = 1, estimate equation (10) with λit2 replaced by λ̂it2 by pooled NLS. Esti-
mate the asymptotic variance as described in Appendix A.
From Procedure 3.1 it is apparent that one needs at least one additional exogenous
variable in the selection equation (L > K). Although the inverse Mills ratio, λ̂it2 , is a
nonlinear function of its argument, it is approximately linear over most of its range,
which may lead to multicollinearity. Thus, it is necessary to have at least one exclusion
restriction in order to make the estimation convincing.
Even though the resulting estimator is consistent, it is not efficient. From equations
(3) and (4) it can be seen that the error terms in (10) are serially correlated. In addition, the
errors are heteroskedastic because of selection. A nonlinear analog of the
seemingly unrelated regressions estimator (see Wooldridge 2002, Problem 12.7) cannot be
used in this context because selection is not strictly exogenous in the selection equation.
However, one can improve efficiency by using a GMM estimator, as discussed in the next
section.
4 GMM Estimation
The efficiency of the two-step estimator can be improved by using GMM at the second
step. Equation (10) is linear in regressors, but nonlinear in parameters, which results in
overidentification and permits obtaining a more efficient estimator than pooled NLS.
To specify a GMM estimator, define a 1×(LT +3) vector of instruments ω̂it ≡ ωit (π̂t ) ≡
(1, yi0 , zi1 , . . . , ziT , λ̂it2 ), t = 1, . . . , T , and a T ×T (LT + 3) matrix of instruments Ŵi ,
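$$\hat{W}_i \equiv W_i(\hat{\pi}) \equiv \begin{pmatrix} \hat{\omega}_{i1} & 0 & \cdots & 0 \\ 0 & \hat{\omega}_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{\omega}_{iT} \end{pmatrix}, \qquad (16)$$
$$\hat{g}_{it} \equiv g_{it}(\theta, \hat{\pi}_t) \equiv s_{it}\,[y_{it} - m_{it}(\theta, \hat{\pi})], \quad t = 1, \ldots, T. \qquad (17)$$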
From equation (10) it follows that the following moment conditions are available:
Since the conditional expectation of yit is different in each time period, equation (18)
implies T (LT + 3) moment conditions. Moreover, because mit (θ, π̂) is nonlinear in θ,
these conditions are not redundant and can be used to enhance efficiency.
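The way the block-diagonal instrument matrix generates a separate set of moments for each period can be sketched in a few lines. The instrument values, residuals and function names below are hypothetical; each row plays the role of ω̂it and each entry of `g` the role of ĝit.

```python
def block_diagonal_instruments(omega_rows):
    """Build the T x (T*q) block-diagonal matrix from T instrument rows of length q."""
    T, q = len(omega_rows), len(omega_rows[0])
    W = [[0.0] * (T * q) for _ in range(T)]
    for t, row in enumerate(omega_rows):
        W[t][t * q:(t + 1) * q] = row  # period-t instruments occupy block t
    return W


def stacked_moments(W, g):
    """Compute W' g, a vector of length T*q, for a T-vector of residuals g."""
    n_cols = len(W[0])
    return [sum(W[t][k] * g[t] for t in range(len(g))) for k in range(n_cols)]


omega = [[1.0, 0.5, 2.0], [1.0, -1.0, 0.3]]  # T = 2 periods, q = 3 instruments each
g = [0.2, -0.4]                              # selected-sample residuals per period
W = block_diagonal_instruments(omega)
moments = stacked_moments(W, g)              # first 3 entries use g[0], last 3 use g[1]
```

Because each period's instruments multiply only that period's residual, the product has T times q entries rather than q, which is where the T(LT + 3) count comes from when q = LT + 3.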
The GMM estimator of θ is the solution to the minimization problem
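$$\min_{\theta^*} \left[\sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta^*, \hat{\pi})\right]' \hat{\Omega}^{-1} \left[\sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta^*, \hat{\pi})\right], \qquad (19)$$
$$\left[\sum_{i=1}^{N} W_i(\hat{\pi})' \nabla_\theta g_i(\hat{\theta}, \hat{\pi})\right]' \hat{\Omega}^{-1} \left[\sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\hat{\theta}, \hat{\pi})\right] = 0. \qquad (20)$$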
Then, θ can be consistently estimated using a procedure similar to Procedure 3.1, where
the GMM estimator is used instead of the pooled NLS estimator.
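The GMM criterion is a quadratic form in the summed moments. The sketch below takes the inverse weighting matrix as given; the moment values and the two weighting matrices are hypothetical and serve only to show how the choice of weights changes the criterion.

```python
def gmm_objective(unit_moments, weight_inverse):
    """Quadratic form (sum_i m_i)' A (sum_i m_i), with A the inverse weighting matrix."""
    n_moments = len(unit_moments[0])
    total = [sum(m[k] for m in unit_moments) for k in range(n_moments)]
    return sum(
        total[r] * weight_inverse[r][c] * total[c]
        for r in range(n_moments)
        for c in range(n_moments)
    )


unit_moments = [[0.2, -0.1], [0.1, 0.3]]   # stacked moments for two hypothetical units
identity = [[1.0, 0.0], [0.0, 1.0]]
downweight_second = [[1.0, 0.0], [0.0, 0.25]]
value_identity = gmm_objective(unit_moments, identity)           # 0.3^2 + 0.2^2
value_weighted = gmm_objective(unit_moments, downweight_second)  # 0.3^2 + 0.25 * 0.2^2
```

A moment that is measured with more noise receives a smaller weight in the inverse matrix, so it influences the criterion less.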
Notice that the pooled NLS estimator is identical to a GMM estimator, which exploits
the moment conditions
$$\sum_{t=1}^{T} E[\nabla_\theta g_{it}(\theta, \pi)' g_{it}(\theta, \pi)] = 0 \qquad (21)$$
$$\left\{\sum_{t=1}^{T} E[\nabla_\theta g_{it}(\theta, \pi)' \nabla_\theta g_{it}(\theta, \pi)]\right\}^{-1}. \qquad (22)$$
Thus, in the NLS estimation, the instruments are “stacked” on top of each other, and
each time period receives an equal weight. In contrast, a general GMM estimator that
uses a block-diagonal matrix of instruments, as in equation (16), assigns different weights
to each time period, which can be used to improve efficiency. In the discussion below,
"the GMM estimator" refers to the solution to the minimization problem (19).
The proposed GMM estimator will be consistent for any positive definite matrix Ω;
however, a particular form is preferred. Specifically, we formulate an additional
assumption:
ASSUMPTION 4.1
(ii) Ω = Λ.
(iii) $\hat{\Omega} \xrightarrow{p} \Omega$.
which can be estimated as $(\hat{G}\hat{\Omega}^{-1}\hat{G})^{-1}/N$, using the formulae provided in Appendix A.
We can now summarize a two-step estimation procedure. Let Assumptions 2.1 and
4.1 hold. Then, an estimator of θ that is asymptotically more efficient than the estimator
discussed in Section 3 can be obtained using the procedure:
PROCEDURE 4.1
2. In equation (10), replace λit2 with λ̂it2 . For sit = 1, estimate the equation by
GMM that uses moment conditions (18) and the weighting matrix that satisfies
Assumption 4.1. Estimate the asymptotic variance as described in Appendix A.
It is important to note that there are more moment conditions available in addition
to those specified in equation (18). Equation (10) implies that eit1 is uncorrelated with
any function of zi and yi0 . Therefore, any nonlinear functions of the exogenous variables
and the initial condition should be valid instruments and can be used to obtain additional
moment conditions.
The proposed two-step estimator can also be formulated as a joint GMM estimator
of (θ, π). As suggested by Newey and McFadden (1994, Section 6.1), such an estimator
can be obtained by “stacking” the moment conditions from the two steps. The moment
conditions from the second step are given in (18), while the first-order conditions from
the first-step estimation generate the additional moment conditions:
$$E\left(\{\Phi(q_{it}\pi_t)[1 - \Phi(q_{it}\pi_t)]\}^{-1} \varphi(q_{it}\pi_t)\, q_{it}'\, [y_{it2} - \Phi(q_{it}\pi_t)]\right) = 0, \quad t = 1, \ldots, T. \qquad (24)$$
The conditions in (18) and (24) can be used to form a vector of moment conditions
for the joint GMM estimation. In that way the additional conditions can be used for
estimating θ, which can help to improve efficiency. However, since the first-step equa-
tions are exactly identified, the efficiency gain may be modest or even not present at all.
Moreover, the two-step GMM estimator appears to be computationally more tractable
than the joint GMM estimator in applications where the number of the first-step moment
conditions is large, for example, due to T being relatively large.
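The first-step moment conditions in (24) are the probit score. As a hedged illustration, the sketch below assumes that the binary outcome in (24) is the selection indicator for period t and evaluates the score for an intercept-only probit, where the maximum likelihood estimate has a closed form. The data and function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def probit_score(s, index, q):
    """Score contribution: pdf * q * (s - cdf) / (cdf * (1 - cdf)), evaluated at the index."""
    cdf, pdf = STD_NORMAL.cdf(index), STD_NORMAL.pdf(index)
    scale = pdf * (s - cdf) / (cdf * (1.0 - cdf))
    return [scale * q_k for q_k in q]


# Intercept-only probit: the ML estimate solves cdf(pi) = sample share of ones.
selection = [1, 1, 1, 0, 1, 0, 1, 1]
pi_hat = STD_NORMAL.inv_cdf(sum(selection) / len(selection))
score_sum = sum(probit_score(s, pi_hat, [1.0])[0] for s in selection)  # ~0 up to rounding
```

With as many moment conditions as first-step parameters, these conditions are exactly identified, which is why stacking them with the second-step moments need not improve efficiency.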
To study the properties of the proposed estimators in finite samples we performed
Monte Carlo experiments.[5] In the experiments, among the three estimators that account
for the selection bias (two-step NLS, two-step GMM and joint GMM that uses the moment
conditions for both equations) the two-step NLS estimator has the smallest standard
deviations and root mean square errors (RMSEs) in small samples (N = 200), which is
likely due to the fact that the GMM estimators use estimated weighting matrices, Ω̂, that
cannot be precisely estimated in small samples. However, in large samples (N = 4000)
both GMM estimators are more efficient than the two-step NLS estimator. The joint
GMM estimator tends to have slightly smaller standard deviations and RMSEs than the
two-step GMM estimator, but the differences are minor and virtually disappear when N
is large (N = 4000).
[5] A detailed description of the experiments and all results are provided in the supplement to the
paper, which is available from the authors upon request.
The two-step NLS, two-step GMM and joint GMM estimators also perform reasonably
well when testing simple hypotheses about parameters. Although for all three estimators
the true null is rejected too often in small samples (with the over-rejection being most
severe for the two-step GMM estimator), the computed size gets closer to the nominal
size as N grows. Both the two-step GMM and joint GMM estimators outperform the
two-step NLS estimator in terms of the power of the tests.
It is possible to test for selection bias by testing the hypothesis H0 : ϕ2t = 0 in equation
(10). A variety of tests for GMM estimators described in Newey and McFadden (1994,
Section 9) can be used for this purpose. However, such tests require estimation of either
the restricted or the unrestricted model, or both, prior to testing. Since estimation of equation
(10) may be computationally costly due to nonlinearity in the parameters, it is useful to
have a simple alternative.
A simple test can be developed based on the initial linear model (1). To construct
a test, introduce a new selection indicator which identifies observability of yit in three
consecutive periods, and nominally assume that this new indicator follows an index model
with unobserved heterogeneity:
$$d_{it} = 1[z_{it}\delta_{30t} + z_{i,t-1}\delta_{31t} + z_{i,t-2}\delta_{32t} + c_{i3} + u_{it3} > 0], \quad t = 3, \ldots, T, \qquad (25)$$
where ci3 is the unobserved effect and uit3 is the idiosyncratic error. Moreover, (nominally)
assume that uit3 is normally distributed and independent of the explanatory variables and
unobserved effect,
$$u_{it3} \mid z_i, c_{i3} \sim \mathrm{Normal}(0, 1). \qquad (26)$$
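The new indicator can be constructed directly from the period-by-period selection indicators. The sketch below assumes that observability in three consecutive periods means sit = si,t−1 = si,t−2 = 1, so that the indicator is their product; the function name and the example sequence are hypothetical.

```python
def three_period_indicator(s):
    """Return {t: d_t} for t = 3..T, where d_t = s_t * s_{t-1} * s_{t-2}.

    s is a list of 0/1 selection indicators for periods 1..T (index 0 is period 1).
    """
    return {t: s[t - 1] * s[t - 2] * s[t - 3] for t in range(3, len(s) + 1)}


# One missing period (t = 4) removes periods 4, 5 and 6 from the differenced sample.
d = three_period_indicator([1, 1, 1, 0, 1, 1, 1])
```

The example shows how a single gap in the outcome removes three periods from the differenced sample, which is the data loss that the levels approach avoids.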
Using Chamberlain’s approach and assuming normality, write the unobserved effect as
$$c_{i3} = \eta_3 + \sum_{s=1}^{T} z_{is}\zeta_s + a_{i3},$$
$$a_{i3} \mid z_i \sim \mathrm{Normal}(0, \sigma_{3t}), \quad t = 3, \ldots, T. \qquad (27)$$
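$$d_{it} = 1\left[\eta_3 + z_{it}\delta_{30t} + z_{i,t-1}\delta_{31t} + z_{i,t-2}\delta_{32t} + \sum_{s=1}^{T} z_{is}\zeta_s + v_{it3} > 0\right],$$
$$v_{it3} \mid z_i \sim \mathrm{Normal}(0, 1 + \sigma_{3t}), \quad t = 3, \ldots, T, \qquad (28)$$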
where vit3 ≡ ai3 + uit3 is a new composite error term. With regard to the error terms in
the primary equation, assume
which, when combined with the normality assumption, gives
After applying first differencing to equation (1), with some abuse of notation we can
write the differenced primary equation for dit = 1 as
Thus, the unobserved effect is removed by first differencing and ϕ3t λit3 captures the se-
lection effect. Naturally, time-constant variables drop out from the equation. The test is
then performed using the following procedure:
PROCEDURE 5.1
2. For dit = 1, augment the first-differenced primary equation by λ̂it3 and its interac-
tions with time dummies and estimate the augmented equation by pooled two stage
least squares or GMM using yi,t−2 and leads and lags of zit as instruments for ∆yi,t−1
(∆xit , λ̂it3 and the interaction terms should be used as their own instruments). Use
the Wald test to test the hypothesis ϕ31 = . . . = ϕ3T = 0.
As an extension to the proposed procedure, it is possible to impose a restriction of
equal variances in the selection equation and estimate equation (28) by pooled probit.
Similarly, one may assume that the effect of selection is the same in all time periods and
omit the interaction terms in the second-step estimation. A test for selection bias in that
case is the usual t-test of the significance of the coefficient on λ̂it3 . Note that for testing,
the usual variance-covariance matrix should be used; there is no need to adjust for the
first-step estimation.
If in some period, t − j (for j = 3, . . . , t − 1), yi,t−j is observed for all cross-section
units, then yi,t−j can be used as an additional instrument in the second-step estimation.
Otherwise, if there are missing values for at least some i, then the observable variable is
(si,t−j · yi,t−j ), and this is not a valid instrument, since we did not account for selection in
period t − j when constructing λ̂it3 .
Importantly, the proposed test is valid regardless of whether or not the model in (25)
is correct and whether or not the normality assumption holds. All we need for testing is
a reasonable proxy for the selection effect, and the correct specification of the selection
term is not essential. If a selection problem is present, hopefully this will still be captured
by a non-zero coefficient on the inverse Mills ratio in the differenced equation. Similar to
the estimators discussed above, having additional variables in zit that are not also in xit
helps to make the test more reliable.
When the hypothesis of no selection bias is not rejected, the pooled two stage least
squares or GMM estimation of the first-differenced equation with ∆xit , yi,t−2 , and leads
and lags of strictly exogenous variables used as instruments will produce consistent
estimators. More distant lags can be used as additional instruments if observed for all
cross-section units. However, if the null is rejected, Procedure 5.1 will be a valid
correction procedure only if all the assumptions specified in this section are correct. Given that
the model for dit in equation (25) is quite restrictive, Procedure 5.1 is unlikely to perform
well as a correction method. Therefore, the methodology described in the previous two
sections should be used instead.
6 Empirical Application
This section illustrates the proposed methodology with an empirical example by applying
the new methods to the estimation of dynamic earnings equations for females. This
example is appropriate because earnings are largely determined by different historical
factors and tend to be correlated over time.
The data come from the Panel Study of Income Dynamics (PSID), years 1980 to
1992. The sample consists of white females who were followed over the considered
period.6 Because estimating equation (10) requires that the initial condition
is observed, we keep only those females for whom 1980 earnings are available. The final
sample consists of 579 women, or 6,948 observations over the 12-year period (1981-1992).
For this period, the earnings sample comprises 5,891 observations. Thus, about 15%
of earnings data are missing due to non-participation.
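The sample counts above can be reproduced with simple arithmetic. The variable names below are ours and serve only as a quick consistency check of the reported figures.

```python
n_women = 579          # women with 1980 earnings available
n_periods = 12         # years 1981-1992
n_observed = 5891      # person-years with observed earnings

n_total = n_women * n_periods              # person-years in the panel
missing_share = 1 - n_observed / n_total   # share lost to non-participation

print(n_total, round(100 * missing_share, 1))
```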
Because we define the population as women working in 1980, this exercise should
be viewed as an evaluation of the effects of movement in and out of the labor force
on estimated earnings equations. Such a question is of considerable interest in labor
economics.
The dependent variable in the primary equation is the natural logarithm of the average
annual hourly earnings, while the independent variables include age, age squared and
time dummies. We assume that age is strictly exogenous and is not correlated with the
unobserved effect. This assumption implies that the mean ability of women born in different
years is about the same. Our sample is restricted to women who have completed their
education (i.e. years of schooling do not vary over time); hence, the effect of education
is not separable from unobserved heterogeneity. Therefore, we only include education as
part of the unobserved effect. Additionally, to control for unobserved heterogeneity, we
include the number of children in all time periods (i.e. the number of children is assumed
to belong to zit , but not xit ).
6
We consider working-age women (ages 18-65) who were either household heads or “wives,” have
completed their education and are neither self-employed nor agricultural workers. A woman was excluded
from the analysis if her self-reported age exceeded the age constructed using information on the year of
birth by more than two years, if her self-reported age was smaller than the constructed age by more than
one year, or if she reported positive work hours and zero earnings.
The selection rule is for labor force participation. A woman is considered to be a
participant if she reports positive work hours in a given year. When estimating selection
equations, in the probit regressions in each time period we include education, age, age
squared, and the number of children in all time periods, where the number of children
may have a direct effect on the labor force participation. Log of hourly earnings in 1980
is included depending on whether the methodology of Sections 2-4 or the methodology of
Section 5 is used for the analysis.
Before applying the more advanced methods developed in Sections 2 through 4, we first
estimate equation (1) using the simple approach of Section 5. From the total 1980-1992
sample we keep observations for which earnings data are available in three consecutive
periods and use first differencing to remove the unobserved effect. As a result, the sample
size reduces to 5,033 observations; age and education drop out from the equation. Then,
we estimate the first-differenced equation by pooled instrumental variables using the log
of hourly earnings in t − 2 as an instrument for ∆yi,t−1 . We call this estimator the first
difference instrumental variables (FD-IV) estimator.
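The mechanics of the FD-IV estimator can be illustrated on simulated data. The sketch below is not the estimation code used for Table 1: it assumes a balanced panel with no selection, no covariates, and a scalar AR(1) process, and all function names are hypothetical. With a single instrument y(i,t−2) for ∆y(i,t−1), the pooled IV estimator reduces to a ratio of two sample cross-moments.

```python
import random


def simulate_panel(n_units, n_periods, rho, seed=0):
    """Simulate y_it = rho * y_{i,t-1} + c_i + u_it for a balanced panel (assumed design)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        c = rng.gauss(0.0, 1.0)                       # unobserved effect
        y = c / (1.0 - rho) + rng.gauss(0.0, 1.0)     # initial condition near the unit mean
        series = [y]
        for _ in range(n_periods):
            y = rho * y + c + rng.gauss(0.0, 1.0)
            series.append(y)
        panel.append(series)
    return panel


def fd_iv_rho(panel):
    """Pooled IV estimate of rho in the first-differenced equation.

    Uses y_{i,t-2} as the single instrument for the lagged difference, so the
    estimator is sum(z * dy_t) / sum(z * dy_{t-1}).
    """
    num = 0.0
    den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy_t = y[t] - y[t - 1]        # differencing removes c_i
            dy_lag = y[t - 1] - y[t - 2]
            z = y[t - 2]                  # instrument dated t - 2
            num += z * dy_t
            den += z * dy_lag
    return num / den


panel = simulate_panel(n_units=20000, n_periods=6, rho=0.5, seed=0)
rho_hat = fd_iv_rho(panel)
print(round(rho_hat, 3))
```

In this design the instrument is reasonably strong because the series is only moderately persistent; as the autoregressive coefficient approaches one, the denominator shrinks toward zero, which is the weak instruments problem discussed in the Introduction.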
The estimates for the log earnings equations are reported in Table 1. The first column
of the table contains the estimates from FD-IV regressions without inverse Mills
ratios. The second column contains the test of selection bias in the first-differenced
equation using the results in Section 5. The estimate of ρ is rather similar in the two
columns; it is about 0.15-0.16 and is statistically significant at the 1% level. However, the
test suggests that selection bias may be present. The null of no selection is rejected at
the 8% significance level. Thus, one might conclude from the test using the FD equation
that selection into the work force may be systematically related to idiosyncratic shocks
to earnings.
The estimates obtained using the methods discussed in Sections 2-4 are reported in
the remaining three columns of Table 1. Columns (3) and (4) show estimation results
from regressions where the NLS estimator is used at the second step. Column (5) contains
the estimates obtained using Procedure 4.1, which employs GMM at the second step. The
estimates for the augmented log earnings equation are reported in columns (4) and (5).
Based on the Wald tests of the joint significance of the selection terms, the hypothesis
of no selection bias is rejected at the 5% level in both cases. Thus, we again find
evidence of selection bias.
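The p-values attached to the Wald statistics in Table 1 can be reproduced from the chi-square distribution. The sketch below uses only the standard library and the closed-form tail expressions for integer degrees of freedom; the function name is ours, and `scipy.stats.chi2.sf` could be used instead where SciPy is available.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: 2 * (1 - Phi(sqrt(x))) plus a finite sum of odd-factorial terms.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return tail + 2.0 * density * total


# Wald statistics and degrees of freedom as reported in Table 1.
p_fd = chi2_sf(18.35, 11)
p_nls = chi2_sf(22.13, 12)
p_gmm = chi2_sf(41.28, 12)
print(round(p_fd, 3), round(p_nls, 3), round(p_gmm, 3))
```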
The NLS and GMM estimates of ρ are very similar in all three regressions. The
estimate is about 0.6 and is significant at the 1% level, which provides evidence of state
dependence in earnings offers. This estimate is rather different from the one obtained using
first-differencing. Interestingly, similar results were obtained in Monte Carlo simulations,
where the FD-IV estimator had substantially larger biases than the NLS estimator that
did not account for selection. For all coefficient estimates, standard errors are smaller
when the GMM estimator is used at the second step.
Columns (3)-(5) show an estimated effect of another year of schooling of about 3%,
which is statistically significant at the 1% level. We emphasize, however, that this effect
is not distinguishable from unobserved heterogeneity. Moreover, the coefficient on years
of schooling in these regressions is not a true return to education because education has
an additional effect on earnings through the autoregressive earnings term.
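To see why the schooling coefficient understates the total effect, note that in the back-substituted equation the components of the unobserved effect are multiplied by (1 − ρ^t)/(1 − ρ), as the derivatives in Appendix B show. The sketch below plugs in the rounded estimates from the text (a coefficient of 0.03 and ρ = 0.6) purely as an illustration; the function names are hypothetical and the numbers are not additional estimates.

```python
def cumulative_effect(coef, rho, t):
    """Effect on y_it after t periods of a time-constant variable with coefficient coef."""
    return coef * (1.0 - rho ** t) / (1.0 - rho)


def long_run_effect(coef, rho):
    """Limit of the cumulative effect as t grows, assuming |rho| < 1."""
    return coef / (1.0 - rho)


coef_schooling = 0.03   # rounded value from the text
rho = 0.6               # rounded value from the text

after_one = cumulative_effect(coef_schooling, rho, 1)
after_two = cumulative_effect(coef_schooling, rho, 2)
limit = long_run_effect(coef_schooling, rho)
print(after_one, after_two, limit)
```

Under these illustrative values, the accumulated effect would approach roughly 7.5% rather than 3%, which is the sense in which the coefficient alone is not a full return to education.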
The coefficients on age and age squared reveal the usual U-shaped profile, although
the corresponding estimates are less precise, particularly in the NLS regressions.
As a robustness check, we re-estimated the earnings equation using the data from
years 1981-1992. The sample was restricted to only include women who reported earnings
in 1981 (583 women).7 The resulting coefficient estimates and standard errors were very
similar to the ones reported in Table 1. The only noticeable change was observed for the
two-step GMM estimates of the coefficients on age and age squared, which became some-
what smaller and statistically insignificant. Based on the results of the joint Wald tests,
the null of no selection bias could not be rejected; however, several selection correction
terms were individually significant. Specifically, in the FD-IV regression the inverse Mills
ratios for years 1984, 1985 and 1991 were significant at the 5% significance level. The
correction term for year 1991 was also significant at the 5% level in the two-step NLS and
two-step GMM regressions. The table with detailed estimation results is available from
the authors upon request.
Returning to the discussion of the estimating equation in Section 2, we note that one
could also estimate the parameters using equation (11). In such a case, identification would
rely on time variation in strictly exogenous variables, age and age squared. Moreover, the
autoregressive coefficient, ρ, would only capture the observed dynamics. In applications
where there are no time-varying strictly exogenous variables in the model (i.e. xit is
empty), the data would not provide a distinction between the observed and unobserved
dynamics.8
7
The cross-section sample size increased because more women were working in 1981 than in 1980.
8
We thank the anonymous referee for suggesting that we include the discussion of this issue.
7 Conclusions
In this paper, we proposed new methods for estimating dynamic panel data models with
selectivity. A distinctive feature of the new estimators is that they do not rely on
differencing to remove the unobserved heterogeneity. This feature makes it possible to avoid the
weak instruments problem, which arises in the context of differencing if the series are highly
persistent or close to a unit root. The proposed correction is relatively simple because the
method requires correcting for selection in the current period only. The errors in both the selection
and primary equations may be heterogeneously distributed. The errors in the selection
equation may also be serially dependent, and a general form of heteroskedasticity is
allowed in the primary equation. Additionally, this paper develops a simple test for
sample selection bias.
The proposed methods are applied to the estimation of dynamic earnings equations
for females using the Panel Study of Income Dynamics data. Evidence of selection
bias is found in both the first-differenced equation and the equation obtained after
back-substitution. NLS and GMM estimation based on the new methodology produces an
estimate of the autoregressive coefficient of about 0.6, which is rather different from the estimate
obtained from the instrumental variables estimation of the first-differenced equation.
The proposed correction procedure is parametric and assumes normality of the er-
rors in the selection equation. An important topic for future research is developing a
semiparametric estimator, which would not require parametric assumptions regarding the
error distributions. Such an estimator can be implemented within the framework of this
paper using methods similar to those considered in Semykina and Wooldridge (2010).
Appendix A
This section starts with a derivation of the variance of the GMM estimator. The derivation
of the variance of the pooled NLS estimator follows by analogy. Using the notation from
Section 3, let $\hat{\pi}_t = (\hat{\eta}_{t2}, \hat{\psi}_{1t}, \ldots, \widehat{\delta_{2t} + \psi_{tt}}, \ldots, \hat{\psi}_{Tt}, \hat{\gamma}_{t2})'$ and $\hat{\pi} = (\hat{\pi}_1', \ldots, \hat{\pi}_T')'$ be the first-step
estimators, and let $q_{it} = (1, z_{i1}, \ldots, z_{iT}, y_{i0})$ be the first-step vector of regressors. Also,
denote the vector of parameters $\theta \equiv (\rho, \beta, \eta_1, \xi_1, \ldots, \xi_T, \gamma_1, \varphi_{21}, \ldots, \varphi_{2T})$.
Under the standard regularity conditions given, for example, in Wooldridge (2002,
Theorem 14.1), the GMM estimator, θ̂, is consistent when π is known. If π̂ is a consistent
estimator of π, the first stage estimation will not affect consistency of θ̂.
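By definition, if $\hat{\Omega}$ is a consistent estimator of a positive definite matrix $\Omega$, then
$\hat{\Omega} \xrightarrow{p} \Omega$. Also, by consistency of $\hat{\theta}$ and $\hat{\pi}$, and the weak law of large numbers,
\[
N^{-1} \sum_{i=1}^{N} W_i(\hat{\pi})' \nabla_\theta g_i(\hat{\theta}, \hat{\pi}) \xrightarrow{p} G.
\]
A mean-value expansion of the GMM first-order condition around $\theta$ then gives
\[
G'\Omega^{-1} N^{-1/2} \sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta, \hat{\pi}) + G'\Omega^{-1} N^{-1/2} \sum_{i=1}^{N} W_i(\hat{\pi})' \nabla_\theta g_i(\theta, \hat{\pi}) (\hat{\theta} - \theta) + o_p(1) = 0,
\]
which implies
\[
\sqrt{N}(\hat{\theta} - \theta) = -C^{-1} G'\Omega^{-1} N^{-1/2} \sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta, \hat{\pi}) + o_p(1), \tag{32}
\]
where $C \equiv G'\Omega^{-1}G$.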
Next, we need to account for the first-stage estimation of π. In equation (32), both the
matrix of instruments and function gi depend on π̂. However, as is known, the use of gen-
erated instruments does not affect the asymptotic variance of the GMM estimator. This
result follows from the conditional moment restrictions in equation (10), which imply that
E[gi (θ, π)|xi , yi0 , sit = 1] = 0, so that gi (θ, π) is uncorrelated with any function of (xi , yi0 )
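conditional on $s_{it} = 1$. Therefore, the mean-value expansion of $N^{-1/2} \sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta, \hat{\pi})$
around $\pi$ gives
\[
N^{-1/2} \sum_{i=1}^{N} W_i(\hat{\pi})' g_i(\theta, \hat{\pi}) = N^{-1/2} \sum_{i=1}^{N} W_i(\pi)' g_i(\theta, \pi) + F \sqrt{N}(\hat{\pi} - \pi) + o_p(1), \tag{33}
\]
where $F \equiv E[W_i(\pi)' \nabla_\pi g_i(\theta, \pi)]$ and
\[
\nabla_\pi g_i(\theta, \pi) =
\begin{pmatrix}
-s_{i1} q_{i1} \varphi_{21} \lambda_{i12} (q_{i1}\pi_1 + \lambda_{i12}) & 0 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & -s_{iT} q_{iT} \varphi_{2T} \lambda_{iT2} (q_{iT}\pi_T + \lambda_{iT2})
\end{pmatrix}. \tag{34}
\]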
Here we used the fact that the derivative of the inverse Mills ratio is equal to −qit λit2 (qit π+
λit2 ) [see, for example, Wooldridge 2002, p. 522].
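The derivative of the inverse Mills ratio used above can be checked numerically. The following is a minimal sketch with hypothetical function names; it works with the scalar index a = qπ, so the derivative with respect to π follows by multiplying by q (chain rule).

```python
import math


def norm_pdf(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def norm_cdf(a):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))


def inverse_mills(a):
    """lambda(a) = phi(a) / Phi(a)."""
    return norm_pdf(a) / norm_cdf(a)


def inverse_mills_derivative(a):
    """Analytic derivative: lambda'(a) = -lambda(a) * (a + lambda(a))."""
    lam = inverse_mills(a)
    return -lam * (a + lam)


def numerical_derivative(f, a, h=1e-6):
    """Central finite difference, used only to check the analytic formula."""
    return (f(a + h) - f(a - h)) / (2.0 * h)


for a in (-2.0, 0.0, 1.5):
    gap = abs(inverse_mills_derivative(a) - numerical_derivative(inverse_mills, a))
    print(a, gap < 1e-6)
```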
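Since $\hat{\pi}_t$, $t = 1, \ldots, T$, are maximum likelihood estimators, $\hat{\pi}$ satisfies
\[
\sqrt{N}(\hat{\pi} - \pi) = N^{-1/2} \sum_{i=1}^{N} d_i(\pi) + o_p(1), \tag{35}
\]
with the probit Hessian term given by
\[
H_{it}(\pi_t) = -\{\Phi(q_{it}\pi_t)[1 - \Phi(q_{it}\pi_t)]\}^{-1} [\phi(q_{it}\pi_t)]^2 q_{it}' q_{it}. \tag{36}
\]
Substituting (33) and (35) into (32) gives
\[
\sqrt{N}(\hat{\theta} - \theta) = -C^{-1} G'\Omega^{-1} N^{-1/2} \sum_{i=1}^{N} [W_i(\pi)' g_i(\theta, \pi) + F d_i(\pi)] + o_p(1), \tag{37}
\]
and by the central limit theorem,
\[
\sqrt{N}(\hat{\theta} - \theta) \xrightarrow{d} \mathrm{Normal}(0, C^{-1} G'\Omega^{-1} P \Omega^{-1} G C^{-1}), \tag{38}
\]
where
\[
P \equiv E[p_i p_i'], \qquad p_i \equiv W_i(\pi)' g_i(\theta, \pi) + F d_i(\pi). \tag{39}
\]
By choosing $\Omega = P$, we obtain
\[
\mathrm{Avar}\, \sqrt{N}(\hat{\theta} - \theta) = C^{-1} = (G'\Omega^{-1}G)^{-1}, \tag{40}
\]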
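and the asymptotic variance of $\hat{\theta}$ is given in equation (23). The variance can be estimated
using the estimators of $\theta$ and $\pi$ instead of the true parameter values and by replacing
matrices $G$, $A$, $F$, and $\Omega$ with their consistent estimators:
\[
\hat{G} \equiv N^{-1} \sum_{i=1}^{N} W_i(\hat{\pi})' \nabla_\theta g_i(\hat{\theta}, \hat{\pi}),
\]
\[
\hat{A}_t \equiv N^{-1} \sum_{i=1}^{N} [-H_{it}(\hat{\pi}_t)],
\]
\[
\hat{F} \equiv N^{-1} \sum_{i=1}^{N} W_i(\hat{\pi})' \nabla_\pi g_i(\hat{\theta}, \hat{\pi}),
\]
\[
\hat{\Omega} \equiv N^{-1} \sum_{i=1}^{N} [\hat{p}_i \hat{p}_i'],
\]
where $\hat{p}_i \equiv W_i(\hat{\pi})' g_i(\hat{\theta}, \hat{\pi}) + \hat{F} d_i(\hat{\pi})$.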
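The step from (38) to (40) can be spelled out in one line. The check below uses only the definition $C \equiv G'\Omega^{-1}G$ and the choice $\Omega = P$; it adds no assumptions beyond invertibility of $P$ and $C$.

```latex
\begin{align*}
C^{-1} G' \Omega^{-1} P \, \Omega^{-1} G C^{-1}
  &= C^{-1} G' P^{-1} P \, P^{-1} G C^{-1} && (\Omega = P) \\
  &= C^{-1} \left( G' P^{-1} G \right) C^{-1} \\
  &= C^{-1} C \, C^{-1} = C^{-1} && (C = G' P^{-1} G).
\end{align*}
```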
The asymptotic variance of the pooled NLS estimator can be derived using the same
logic. First, following the same steps as the ones used to obtain (32) [see also Wooldridge
2002, Section 12.3], we can write
\[
\sqrt{N}(\hat{\theta}_{NLS} - \theta) = -D^{-1} N^{-1/2} \sum_{i=1}^{N} \sum_{t=1}^{T} \nabla_\theta g_{it}(\theta, \hat{\pi})' g_{it}(\theta, \hat{\pi}) + o_p(1),
\]
where
\[
D = E\left[ \sum_{t=1}^{T} \nabla_\theta g_{it}(\theta, \pi)' \nabla_\theta g_{it}(\theta, \pi) \right]. \tag{42}
\]
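The mean-value expansion of $N^{-1/2} \sum_{i=1}^{N} \sum_{t=1}^{T} \nabla_\theta g_{it}(\theta, \hat{\pi})' g_{it}(\theta, \hat{\pi})$ around $\pi$ gives
\[
N^{-1/2} \sum_{i=1}^{N} \sum_{t=1}^{T} \nabla_\theta g_{it}(\theta, \hat{\pi})' g_{it}(\theta, \hat{\pi})
= N^{-1/2} \sum_{i=1}^{N} \sum_{t=1}^{T} \nabla_\theta g_{it}(\theta, \pi)' g_{it}(\theta, \pi) + Q \sqrt{N}(\hat{\pi} - \pi) + o_p(1),
\]
where
\[
Q = E\left[ \sum_{t=1}^{T} \nabla_\theta g_{it}(\theta, \pi)' \nabla_\pi g_{it}(\theta, \pi) \right]
= E\left[ \nabla_\theta g_{i1}(\theta, \pi)' \nabla_{\pi_1} g_{i1}(\theta, \pi), \ldots, \nabla_\theta g_{iT}(\theta, \pi)' \nabla_{\pi_T} g_{iT}(\theta, \pi) \right]. \tag{43}
\]
Using (35),
\[
\sqrt{N}(\hat{\theta}_{NLS} - \theta) = -D^{-1} N^{-1/2} \sum_{i=1}^{N} \left[ \sum_{t=1}^{T} \nabla_\theta g_{it}(\theta, \pi)' g_{it}(\theta, \pi) + Q d_i(\pi) \right] + o_p(1), \tag{44}
\]
so that
\[
\sqrt{N}(\hat{\theta}_{NLS} - \theta) \xrightarrow{d} \mathrm{Normal}(0, D^{-1} R D^{-1}),
\]
where
\[
R \equiv E[r_i r_i'], \qquad r_i = \sum_{t=1}^{T} \nabla_\theta g_{it}(\theta, \pi)' g_{it}(\theta, \pi) + Q d_i(\pi). \tag{45}
\]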
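Then, the asymptotic variance of the pooled NLS estimator can be estimated as
\[
\widehat{\mathrm{Avar}}(\hat{\theta}_{NLS}) = \hat{D}^{-1} \hat{R} \hat{D}^{-1} / N, \tag{46}
\]
where
\[
\hat{D} \equiv N^{-1} \sum_{i=1}^{N} \left[ \sum_{t=1}^{T} \nabla_\theta g_{it}(\hat{\theta}, \hat{\pi})' \nabla_\theta g_{it}(\hat{\theta}, \hat{\pi}) \right],
\]
\[
\hat{R} \equiv N^{-1} \sum_{i=1}^{N} \hat{r}_i \hat{r}_i',
\]
\[
\hat{r}_i \equiv \sum_{t=1}^{T} \nabla_\theta g_{it}(\hat{\theta}, \hat{\pi})' g_{it}(\hat{\theta}, \hat{\pi}) + \hat{Q} d_i(\hat{\pi}),
\]
\[
\hat{Q} \equiv N^{-1} \sum_{i=1}^{N} \left[ \sum_{t=1}^{T} \nabla_\theta g_{it}(\hat{\theta}, \hat{\pi})' \nabla_\pi g_{it}(\hat{\theta}, \hat{\pi}) \right]. \tag{47}
\]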
Appendix B
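Expressions for the derivatives in matrix $\nabla_\theta g_i(\theta, \pi) \equiv (\nabla_\theta g_{i1}(\theta, \pi)', \ldots, \nabla_\theta g_{iT}(\theta, \pi)')'$ are
summarized below:
\[
\nabla_\theta g_{it}(\theta, \pi) \equiv \left( \frac{\partial g_{it}}{\partial \rho}, \frac{\partial g_{it}}{\partial \beta}, \frac{\partial g_{it}}{\partial \eta_1}, \frac{\partial g_{it}}{\partial \xi_1}, \ldots, \frac{\partial g_{it}}{\partial \xi_T}, \frac{\partial g_{it}}{\partial \gamma_1}, \frac{\partial g_{it}}{\partial \varphi_{12}}, \ldots, \frac{\partial g_{it}}{\partial \varphi_{T2}} \right), \quad t = 1, \ldots, T,
\]
\begin{align*}
\frac{\partial g_{it}}{\partial \rho} &= -s_{it} \left[ t\rho^{t-1} y_{i0} + \sum_{j=1}^{t-1} j \cdot \rho^{j-1} x_{i,t-j} \beta
  + \left( \frac{1 - \rho^t}{(1 - \rho)^2} - \frac{t\rho^{t-1}}{1 - \rho} \right) \left( \eta_1 + \sum_{r=1}^{T} \xi_r z_{ir} + \gamma_1 y_{i0} \right) \right], \\
\frac{\partial g_{it}}{\partial \beta} &= -s_{it} \sum_{j=0}^{t-1} \rho^j x_{i,t-j}, \\
\frac{\partial g_{it}}{\partial \eta_1} &= -s_{it} \frac{1 - \rho^t}{1 - \rho}, \\
\frac{\partial g_{it}}{\partial \xi_r} &= -s_{it} \frac{1 - \rho^t}{1 - \rho} z_{ir}, \quad r = 1, \ldots, T, \\
\frac{\partial g_{it}}{\partial \gamma_1} &= -s_{it} \frac{1 - \rho^t}{1 - \rho} y_{i0}, \\
\frac{\partial g_{it}}{\partial \varphi_{t2}} &= -s_{it} \lambda_{it}, \\
\frac{\partial g_{it}}{\partial \varphi_{r2}} &= 0, \quad r \neq t. \tag{48}
\end{align*}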
Also, one can easily obtain [sit ∇θ mit (θ, π)] as −∇θ git (θ, π).
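The derivative with respect to ρ is the least obvious entry in (48), and it can be checked against a finite difference. The sketch below assumes a scalar regressor and the back-substituted mean function implied by the derivatives above, namely ρ^t y0 + Σ ρ^j x(t−j) β + [(1 − ρ^t)/(1 − ρ)] c, where c stands for η1 + Σ ξr z(ir) + γ1 y0; the selection term and the factor −s(it) are omitted, and all names are hypothetical.

```python
def mean_function(rho, t, y0, x, beta, c):
    """Back-substituted mean for period t; x[p] is the scalar regressor in period p (x[0] unused)."""
    total = rho ** t * y0
    total += sum(rho ** j * x[t - j] * beta for j in range(t))
    total += (1.0 - rho ** t) / (1.0 - rho) * c
    return total


def d_mean_d_rho(rho, t, y0, x, beta, c):
    """Analytic derivative with respect to rho, term by term as in (48) without the factor -s_it."""
    total = t * rho ** (t - 1) * y0
    total += sum(j * rho ** (j - 1) * x[t - j] * beta for j in range(1, t))
    total += ((1.0 - rho ** t) / (1.0 - rho) ** 2 - t * rho ** (t - 1) / (1.0 - rho)) * c
    return total


def numerical_d_rho(rho, t, y0, x, beta, c, h=1e-6):
    """Central finite difference in rho, used only as a check."""
    up = mean_function(rho + h, t, y0, x, beta, c)
    down = mean_function(rho - h, t, y0, x, beta, c)
    return (up - down) / (2.0 * h)


x = [0.0, 0.5, -0.3, 0.8, 1.1]   # illustrative values for periods 1, ..., 4
args = dict(rho=0.6, t=4, y0=1.2, x=x, beta=0.7, c=0.4)
gap = abs(d_mean_d_rho(**args) - numerical_d_rho(**args))
print(gap < 1e-6)
```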
References
Ahn SC, Schmidt P. 1995. Efficient estimation of models for dynamic panel data. Journal
of Econometrics 68: 5-27.
Anderson TW, Hsiao C. 1981. Estimation of dynamic models with error components.
Journal of the American Statistical Association 76: 598-606.
Arellano M, Bond SR. 1991. Some tests of specification for panel data: Monte Carlo
evidence and an application to employment equations. Review of Economic Studies
58: 277-297.
Arellano M, Bover O, Labeaga JM. 1999. Autoregressive models with sample selectivity
for panel data. In Analysis of Panels and Limited Dependent Variable Models:
In honour of G. S. Maddala, Hsiao C, Lahiri K, Lee L, and Pesaran MH (eds).
Cambridge University Press.
Binder M, Hsiao C, Pesaran MH. 2005. Estimation and inference in short panel vector
autoregressions with unit roots and cointegration. Econometric Theory 21: 795-837.
Blundell RW, Bond SR. 1998. Initial conditions and moment restrictions in dynamic
panel data models. Journal of Econometrics 87: 115-143.
Bover O, Arellano M. 1997. Estimating dynamic limited dependent variable models from
panel data. Investigaciones Economicas 21: 141-165.
Chamberlain G. 1980. Analysis of covariance with qualitative data. Review of Economic
Studies 47: 225-238.
Chamberlain G. 1982. Multivariate regression models for panel data. Journal of Econo-
metrics 18: 5-46.
Holtz-Eakin D, Newey WK, Rosen HS. 1988. Estimating vector autoregressions with
panel data. Econometrica 56: 1371-1395.
Honore BE, Hu L. 2004. Estimation of cross sectional and panel data censored regression
models with endogeneity. Journal of Econometrics 122: 293-316.
Hsiao C, Pesaran MH, Tahmiscioglu AK. 2002. Maximum likelihood estimation of fixed
effects dynamic panel data models covering short time periods. Journal of Econo-
metrics 109: 107-150.
Ichimura H. 1993. Semiparametric least squares (SLS) and weighted SLS estimation of
single-index models. Journal of Econometrics 58: 71-120.
Klein RL, Spady RH. 1993. An efficient semiparametric estimator for binary response
models. Econometrica 61: 387-421.
Kyriazidou E. 2001. Estimation of dynamic panel data sample selection models. Review
of Economic Studies 68: 543-572.
Labeaga JM. 1999. A Double-hurdle rational addiction model with heterogeneity: Esti-
mating the demand for tobacco. Journal of Econometrics 93: 49-72.
Newey WK, McFadden D. 1994. Large sample estimation and hypothesis testing. In
Handbook of Econometrics, Volume 4, Engle RF, McFadden D (eds). Amsterdam:
North Holland, 2111-2245.
Semykina A, Wooldridge JM. 2010. Estimating panel data models in the presence of
endogeneity and selection. Journal of Econometrics 157: 375-380.
Wooldridge JM. 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press:
Cambridge, MA.
Wooldridge JM. 2005. Simple solutions to the initial conditions problem in dynamic,
nonlinear panel data models with unobserved heterogeneity. Journal of Applied
Econometrics 20: 39-54.
Ziliak JP, Kniesner TJ. 1998. The importance of sample attrition in life cycle labor
supply estimation. Journal of Human Resources 33: 507-530.
Table 1: Estimates for the Dynamic Log(Hourly Earnings) Equation
Wald test of joint significance of the inverse Mills ratios (p-values in parentheses),
for columns (2), (4), and (5), respectively:
χ²(11) = 18.35 (0.074); χ²(12) = 22.13 (0.036); χ²(12) = 41.28 (0.000)