Papke Wooldridge 1996

Econometric Methods for Fractional Response Variables With an Application to 401 (K) Plan
Participation Rates
Author(s): Leslie E. Papke and Jeffrey M. Wooldridge
Source: Journal of Applied Econometrics, Vol. 11, No. 6 (Nov. - Dec., 1996), pp. 619-632
Published by: John Wiley & Sons
Stable URL: http://www.jstor.org/stable/2285155 .
Accessed: 22/05/2011 17:56

Department of Economics, Michigan State University, Marshall Hall, East Lansing, MI 48824-1038, USA
SUMMARY
We develop attractive functional forms and simple quasi-likelihood estimation methods for regression
models with a fractional dependent variable. Compared with log-odds type procedures, there is no
difficulty in recovering the regression function for the fractional variable, and there is no need to use ad
hoc transformations to handle data at the extreme values of zero and one. We also offer some new, robust
specification tests by nesting the logit or probit function in a more general functional form. We apply these
methods to a data set of employee participation rates in 401 (k) pension plans.
1. INTRODUCTION
Fractional response variables arise naturally in many economic settings. The fraction of total
weekly hours spent working, the proportion of income spent on charitable contributions, and
participation rates in voluntary pension plans are just a few examples of economic variables
bounded between zero and one. The bounded nature of such variables and the possibility of
observing values at the boundaries raise interesting functional form and inference issues. In this
paper we specify and analyse a class of functional forms with satisfying econometric properties.
We also synthesize and expand on the generalized linear models (GLM) literature from statistics
and the quasi-likelihood literature from econometrics to obtain robust methods for estimation
and inference with fractional response variables.
We apply the methods to estimate a model of employee participation rates in 401 (k) pension
plans. The key explanatory variable of interest is the plan's 'match rate,' the rate at which a firm
matches a dollar of employee contributions. The empirical work extends that of Papke (1995),
who studied this problem using linear spline methods. Spline methods are flexible, but they do
not ensure that predicted values lie in the unit interval.
To illustrate the methodological issues that arise with fractional dependent variables, suppose
that a variable $y$, $0 \le y \le 1$, is to be explained by a $1 \times K$ vector of explanatory variables
$x \equiv (x_1, x_2, \ldots, x_K)$, with the convention that $x_1 \equiv 1$. The population model
$$E(y \mid x) = \beta_1 + \beta_2 x_2 + \cdots + \beta_K x_K \equiv x\beta \tag{1}$$
where $\beta$ is a $K \times 1$ vector, rarely provides the best description of $E(y \mid x)$. The primary reason
is that y is bounded between 0 and 1, and so the effect of any particular xj cannot be constant
throughout the range of x (unless the range of xj is very limited). To some extent this problem
can be overcome by augmenting a linear model with non-linear functions of x, but the predicted
values from an OLS regression can never be guaranteed to lie in the unit interval. Thus, the
drawbacks of linear models for fractional data are analogous to the drawbacks of the linear
probability model for binary data.

CCC 0883-7252/96/060619-14. © 1996 by John Wiley & Sons, Ltd. Received 25 October 1993; revised 19 February 1996.

The most common alternative to equation (1) has been to model the log-odds ratio as a linear
function. If y is strictly between zero and one then a linear model for the log-odds ratio is
$$E(\log[y/(1 - y)] \mid x) = x\beta \tag{2}$$
Equation (2) is attractive because $\log[y/(1 - y)]$ can take on any real value as $y$ varies between
0 and 1, so it is natural to model its population regression as a linear function. Nevertheless,
there are two potential problems with equation (2). First, the equation cannot be true if y takes
on the values 0 or 1 with positive probability. Consequently, given a set of data, if any
observation $y_i$ equals 0 or 1 then an adjustment must be made before computing the log-odds
ratio. When the $y_i$ are proportions from a fixed number of groups with known group sizes,
adjustments are available in the literature; see, for example, Maddala (1983, p. 30). Estimation
of the log-odds model then corresponds to Berkson's minimum chi-square method.
Unfortunately, the minimum chi-square method for a fixed number of categories is not
applicable to certain economic problems. First, the fraction y may not be a proportion from a
discrete group size-for example, yi could be the fraction of county land area containing toxic
waste dumps, or the proportion of income given in charitable contributions. Second, one may be
hesitant to adjust the extreme values in the data if a large percentage is at the extremes. In our
application to 401(k) plan participation rates, about 40% of the yi takes on the value unity. It
seems more natural to treat such examples in a regression-type framework.
Even when model (2) is well defined, there is still a problem. Without further assumptions,
we cannot recover $E(y \mid x)$, which is our primary interest. Under model (2) the expected value
of $y$ given $x$ is
$$E(y \mid x) = \int_{-\infty}^{\infty} \frac{\exp(x\beta + v)}{1 + \exp(x\beta + v)}\, f(v \mid x)\, dv \tag{3}$$
where $f(\cdot \mid x)$ denotes the conditional density of $u \equiv \log[y/(1 - y)] - x\beta$ given $x$ and $v$ is a
dummy argument of integration. Even if $u$ and $x$ are assumed to be independent,
$E(y \mid x) \ne \exp(x\beta)/[1 + \exp(x\beta)]$, although $E(y \mid x)$ can be estimated using, for example,
Duan's (1983) smearing method. If $u$ and $x$ are not independent, model (3) cannot be estimated
without estimating $f(\cdot \mid x)$. This is either difficult or non-robust, depending on whether a
non-parametric or a parametric approach is adopted. Instead, we prefer to specify models for
$E(y \mid x)$ directly, without having to estimate the density of $u$ given $x$.
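The retransformation problem in model (3) can be illustrated numerically. The sketch below is not from the paper: it assumes, purely for illustration, that $u$ is independent of $x$ and normally distributed with standard deviation `sd`, picks a hypothetical value of the index $x\beta$, and evaluates the integral in (3) by the trapezoid rule. The function names are illustrative.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_pdf(v, sd):
    return math.exp(-0.5 * (v / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mean_of_y(index, sd, half_width=10.0, steps=4000):
    """Trapezoid approximation to the integral in (3), with f taken to be
    the N(0, sd^2) density (an assumption made only for this sketch)."""
    lo, hi = -half_width * sd, half_width * sd
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        v = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * logistic(index + v) * normal_pdf(v, sd)
    return total * h

index = 1.5   # hypothetical value of x*beta
sd = 1.0      # hypothetical standard deviation of u

naive = logistic(index)        # exp(x*beta) / [1 + exp(x*beta)]
exact = mean_of_y(index, sd)   # E(y | x) implied by the log-odds model
print(naive, exact)
```

Under these assumptions the two quantities differ whenever the index is non-zero, which is why plugging $x\hat\beta$ from the log-odds regression into the logistic function does not in general recover $E(y \mid x)$.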
Naturally, it is always possible to estimate $E(y \mid x)$ by assuming a particular distribution for $y$
given $x$ and estimating the parameters of the conditional distribution by maximum likelihood.
One plausible distribution for fractional $y$ is the beta distribution; Mullahy (1990) suggests this
as one possible approach. Unfortunately, the estimates of $E(y \mid x)$ that one obtains are known
not to be robust to distributional failure (this follows from Gourieroux, Monfort, and
Trognon (1984); more on this below). Clearly, standard distributional assumptions can fail in
certain applications. One important limitation of the beta distribution is that it implies that each
value in [0, 1] is taken on with probability zero. Thus, the beta distribution is difficult to justify
in applications where at least some portion of the sample is at the extreme values of zero or one.
In the next section we specify a reasonable class of functional forms for E( y Ix) and show
how to estimate the parameters using Bernoulli quasi-likelihood methods. These functional
forms and estimators circumvent the problems raised above and are easily implemented. Some
new specification tests are offered in Section 3, and Section 4 contains the empirical application
relating 401 (k) plan participation rates to the plan's matching rate and other plan characteristics.
2. FUNCTIONAL FORMS AND QUASI-LIKELIHOOD METHODS

We assume the availability of an independent (though not necessarily identically distributed)
sequence of observations $\{(x_i, y_i) : i = 1, 2, \ldots, N\}$, where $0 \le y_i \le 1$ and $N$ is the sample size.
The asymptotic analysis is carried out as $N \to \infty$. Our maintained assumption is that, for all $i$,
$$E(y_i \mid x_i) = G(x_i\beta) \tag{4}$$
where $G(\cdot)$ is a known function satisfying $0 < G(z) < 1$ for all $z \in \mathbb{R}$. This ensures that the
predicted values of $y$ lie in the interval (0, 1). Equation (4) is well defined even if $y_i$ can take on
0 or 1 with positive probability. Typically, $G(\cdot)$ is chosen to be a cumulative distribution
function (cdf), with the two most popular examples being $G(z) \equiv \Lambda(z) \equiv \exp(z)/[1 + \exp(z)]$
(the logistic function) and $G(z) \equiv \Phi(z)$, where $\Phi(\cdot)$ is the standard normal cdf.
However, $G(\cdot)$ need not even be a cdf in what follows.
In stating equation (4) we make no assumption about an underlying structure used to obtain
$y_i$. In the special case that $y_i$ is a proportion from a group of known size $n_i$, the methods in this
paper ignore the information on $n_i$. There are some advantages to ignoring $n_i$. First, one does not
always want to condition on $n_i$, in which case $y_i$ contains all relevant information. Second, the
methods here are computationally simple. Third, under the assumptions we impose, the method
suggested here need not be less efficient than methods that use information on group size. (See
Papke and Wooldridge (1993) for methods that incorporate information on ni in a similar
framework.)
We have stated the functional form directly in terms of $E(y_i \mid x_i)$, where $x_i$ is observable.
Stating the model of interest in terms of $E(y_i \mid x_i, \phi_i)$, where $\phi_i$ is unobserved heterogeneity
independent of $x_i$, requires one to specify a distribution for $\phi_i$ in order to obtain $E(y_i \mid x_i)$ (which
is ultimately of interest in any case). Generally, although not always, this will lead to a different
functional form from equation (4). Allowing for functional forms other than the index structure
in equation (4) may be worth-while, but it is not within the scope of this paper. In Section 3 we
present a general functional form test that has power against a variety of functional form
misspecifications, including those that arise from models of unobserved heterogeneity.
Under equation (4), $\beta$ can be consistently estimated by non-linear least squares (NLS). The
fact that equation (4) is non-linear in $\beta$ is perhaps the leading reason a linear model for $y_i$ or for
the log-odds ratio is used in applied work. Further, heteroscedasticity is likely to be present
since $\mathrm{Var}(y_i \mid x_i)$ is unlikely to be constant when $0 \le y_i \le 1$. Obtaining the NLS estimates and
heteroscedasticity-robust standard errors and test statistics requires special programming, and
the NLS estimator will not have any efficiency properties when $\mathrm{Var}(y_i \mid x_i)$ is not constant. Still,
the motivation underlying NLS is sound because it directly estimates $E(y \mid x)$. See also Mullahy
(1990), who suggests NLS for continuously distributed outcomes on a bounded interval.
The estimation procedure we propose is a particular quasi-likelihood method, as in
Gourieroux, Monfort, and Trognon (1984) (hereafter GMT) and McCullagh and Nelder (1989)
(hereafter MN). The Bernoulli log-likelihood function, given by
$$l_i(b) \equiv y_i \log[G(x_i b)] + (1 - y_i) \log[1 - G(x_i b)] \tag{5}$$
is well defined for $0 < G(\cdot) < 1$ and is attractive for several reasons. First, maximizing the
Bernoulli log-likelihood is easy. Second, because equation (5) is a member of the linear
exponential family (LEF), the quasi-maximum likelihood estimator (QMLE) of $\beta$, obtained
from the maximization problem
$$\max_b \sum_{i=1}^{N} l_i(b)$$
is consistent for $\beta$ provided that equation (4) holds. (This follows from GMT (1984) and is also
easily seen by computing the score $s_i(b) \equiv \nabla_b l_i(b)'$ and showing that $E[s_i(\beta) \mid x_i] = 0$ under
equation (4).) In other words, the Bernoulli QMLE $\hat\beta$ is consistent and $\sqrt{N}$-asymptotically
normal regardless of the distribution of $y_i$ conditional on $x_i$; $y_i$ could be a continuous variable, a
discrete variable, or have both continuous and discrete characteristics. As we will see below, in
some cases for fractional data the Bernoulli QMLE is efficient in a class of estimators containing
all QMLEs in the LEF and weighted NLS estimators.
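A minimal sketch of the Bernoulli QMLE follows, assuming the logistic choice of $G$ and using Newton's method in NumPy. It is an illustration only, not the authors' GAUSS code. The simulated design, the parameter values, and the function names are hypothetical. The outcome is generated as an average of a random number of binary draws, so it takes the values 0 and 1 with positive probability and needs no adjustment before estimation.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit_qmle(X, y, max_iter=50, tol=1e-10):
    """Maximize the sum of l_i(b) in (5) with logistic G by Newton's method."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        G = logistic(X @ b)
        score = X.T @ (y - G)                            # sum_i x_i'(y_i - G_i)
        neg_hessian = (X * (G * (1.0 - G))[:, None]).T @ X
        step = np.linalg.solve(neg_hessian, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

# Hypothetical simulated data: y_i is the average of n_i binary outcomes.
rng = np.random.default_rng(0)
N = 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([1.0, 0.5])
n = rng.integers(1, 20, size=N)                          # group sizes 1..19
y = rng.binomial(n, logistic(X @ beta_true)) / n         # fractions in [0, 1]

beta_hat = fit_logit_qmle(X, y)
print(beta_hat)
```

The estimator never uses `n`; only the conditional mean assumption (4) is needed for consistency.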
A special case of equation (5)-namely, when G(.) is the logistic function-has been
suggested in the GLM framework by MN (1989). The GLM approach has two drawbacks for
economic applications. First, for the logit QMLE it assumes that
$$\mathrm{Var}(y_i \mid x_i) = \sigma^2 G(x_i\beta)[1 - G(x_i\beta)] \quad \text{for some } \sigma^2 > 0 \tag{6}$$
where $G(\cdot) \equiv \Lambda(\cdot)$. While we prefer equation (6) as a nominal variance assumption to the
nominal NLS homoscedasticity assumption $\mathrm{Var}(y_i \mid x_i) = \sigma^2$, imposing any particular
conditional variance when performing inference is too restrictive. Mechanisms for which
equation (6) fails are common and are related to the literature on binary choice models with
over-dispersion; see, for example, MN (1989, section 4.5). Briefly, if each $y_i$ is computed as the
average of $n_i$ independent binary variables, say $y_{ij}$, such that $P(y_{ij} = 1 \mid x_i, n_i) = G(x_i\beta)$, then it
can be shown that
$$\mathrm{Var}(y_i \mid x_i) = E(n_i^{-1} \mid x_i)\, G(x_i\beta)[1 - G(x_i\beta)]$$
Unless $n_i$ and $x_i$ are independent, equation (6) generally fails. In our application, where $y_{ij}$ is a
binary indicator for whether worker $j$ at firm $i$ contributes to a 401 (k) plan, $n_i$ is the number of
workers at firm $i$, and $x_i$ contains firm characteristics, $n_i$ and $x_i$ are unlikely to be independent in
the population. In addition, equation (6) can fail if there are unobserved group effects. Notice,
however, that neither of these situations invalidates equation (4), which is all that is needed to
consistently estimate $\beta$ using the Bernoulli QMLE.
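The variance expression for group averages can be obtained from the law of total variance. The following is a short sketch of that step, assuming (as in the text) that the $y_{ij}$ are independent binary variables conditional on $(x_i, n_i)$:

```latex
\begin{aligned}
\mathrm{Var}(y_i \mid x_i, n_i)
  &= n_i^{-2} \sum_{j=1}^{n_i} \mathrm{Var}(y_{ij} \mid x_i, n_i)
   = n_i^{-1}\, G(x_i\beta)[1 - G(x_i\beta)], \\
E(y_i \mid x_i, n_i) &= G(x_i\beta), \\
\mathrm{Var}(y_i \mid x_i)
  &= E[\mathrm{Var}(y_i \mid x_i, n_i) \mid x_i]
   + \mathrm{Var}[E(y_i \mid x_i, n_i) \mid x_i] \\
  &= E(n_i^{-1} \mid x_i)\, G(x_i\beta)[1 - G(x_i\beta)] + 0 .
\end{aligned}
```

If $n_i$ is independent of $x_i$, then $E(n_i^{-1} \mid x_i)$ is a constant that plays the role of $\sigma^2$ in equation (6); otherwise the scale factor varies with $x_i$ and (6) fails.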
The second drawback to the GLM approach is related to the first: if the variance assumption
(6) fails, MN (1989, p. 330) reject the logit quasi-likelihood approach and suggest a more
complicated quasi-likelihood. But this begs the issue of whether the conditional mean model (4)
is appropriate. Here, we are primarily interested in the conditional mean. Rather than
abandoning the Bernoulli QMLE because equation (6) might fail, we propose asymptotically
robust inference for the conditional mean parameters.
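To find the asymptotic variance of the Bernoulli QMLE, define $g(z) \equiv dG(z)/dz$,
$\hat G_i \equiv G(x_i\hat\beta) \equiv \hat y_i$, and $\hat g_i \equiv g(x_i\hat\beta)$. Then the estimated information matrix is
$$\hat A \equiv \sum_{i=1}^{N} \frac{\hat g_i^2\, x_i' x_i}{\hat G_i(1 - \hat G_i)} \tag{7}$$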
The standard error of $\hat\beta_j$ reported from standard binary response analysis (regardless of the
nature of $y_i$) would be obtained as the square root of the $j$th diagonal element of $\hat A^{-1}$. Under
equation (4) only, this is not a consistent estimator of the true asymptotic standard error; we
also need the outer product of the score. Let $\hat u_i \equiv y_i - G(x_i\hat\beta)$ be the residuals (deviations
between $y_i$ and its estimated conditional expectation), and define
$$\hat B \equiv \sum_{i=1}^{N} \frac{\hat u_i^2\, \hat g_i^2\, x_i' x_i}{[\hat G_i(1 - \hat G_i)]^2} \tag{8}$$
Then a valid estimate of the asymptotic variance of $\hat\beta$ is
$$\hat A^{-1} \hat B \hat A^{-1} \tag{9}$$
The standard errors are obtained as the square roots of the diagonal elements of equation (9); see
GMT (1984) and Wooldridge (1991b) for general treatments.
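Equations (7) to (9) translate directly into a few lines of matrix code. The sketch below assumes the logistic $G$ and hypothetical simulated data in which each $y_i$ averages between 5 and 49 binary draws, so the binary-response standard errors from $\hat A^{-1}$ overstate the sampling variation. Function and variable names are illustrative.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit_qmle(X, y, max_iter=50, tol=1e-10):
    """Bernoulli QMLE with logistic G, by Newton's method."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        G = logistic(X @ b)
        step = np.linalg.solve((X * (G * (1.0 - G))[:, None]).T @ X, X.T @ (y - G))
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

def qmle_variances(X, y, b):
    """Return (A^{-1}, A^{-1} B A^{-1}) following (7), (8) and (9)."""
    G = logistic(X @ b)
    g = G * (1.0 - G)                 # g(z) = dG/dz for the logistic function
    u = y - G                         # residuals u_i
    A = (X * (g ** 2 / (G * (1.0 - G)))[:, None]).T @ X
    B = (X * (u ** 2 * g ** 2 / (G * (1.0 - G)) ** 2)[:, None]).T @ X
    A_inv = np.linalg.inv(A)
    return A_inv, A_inv @ B @ A_inv

# Hypothetical simulated data: averages of n_i binary draws.
rng = np.random.default_rng(1)
N = 4000
X = np.column_stack([np.ones(N), rng.uniform(0.0, 1.0, size=N)])
n = rng.integers(5, 50, size=N)
y = rng.binomial(n, logistic(X @ np.array([1.0, 1.0]))) / n

b_hat = fit_logit_qmle(X, y)
A_inv, V_robust = qmle_variances(X, y, b_hat)
se_binary = np.sqrt(np.diag(A_inv))      # what binary-response output would report
se_robust = np.sqrt(np.diag(V_robust))   # square roots of the diagonal of (9)
print(se_binary, se_robust)
```

In this design the robust standard errors are smaller than the binary-response ones because the fractional outcome is less variable than a single Bernoulli draw; in other designs the comparison can go either way.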
Interestingly, the robust standard errors from equation (9) in the context of ordinary logit
and probit are computed almost routinely by certain statistics and econometrics packages,
such as STATA® and SST®. Unfortunately, the packages with which we are familiar
automatically transform the dependent variable used in logit or probit into a binary variable
before estimation, or do not allow non-binary variables at all (STATA® and SST® fall into
the first category). With the minor change of allowing for fractional y in so-called binary
response analysis, standard software packages could be used to estimate the parameters in
equation (4) and to perform asymptotically valid inference. Alternatively, programming the
estimator in a language such as GAUSS®, as we do for our application in Section 4, is fairly
straightforward.
If the GLM assumption (6) is maintained in addition to (4) then $\sigma^2$ is consistently estimated
by
$$\hat\sigma^2 \equiv (N - K)^{-1} \sum_{i=1}^{N} \tilde u_i^2 \tag{10}$$
where $\tilde u_i$ are the weighted residuals (sometimes called the Pearson residuals):
$$\tilde u_i \equiv \hat u_i / [\hat G_i(1 - \hat G_i)]^{1/2} \tag{11}$$
(It is standard practice in the GLM literature to use the degrees-of-freedom adjustment in
equation (10) in estimating $\sigma^2$.) Then the asymptotic variance of $\hat\beta$ is estimated as $\hat\sigma^2 \hat A^{-1}$; see
also MN (1989, p. 327). In addition, under equation (6) $\mathrm{Var}(y_i \mid x_i)$ is proportional to the
variance in the Bernoulli distribution, and so by the results of GMT (1984), the Bernoulli
QMLE is efficient in the class of QMLEs in the LEF. This is essentially the same as the class of
all weighted NLS estimators, and so it is a non-trivial efficiency result.
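The GLM variance estimate in (10) and (11) can be sketched as follows, again assuming the logistic $G$. The data are hypothetical: every $y_i$ is an average of exactly 10 binary draws, so assumption (6) holds with $\sigma^2 = 1/10$ and the estimate from (10) should be close to that value.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit_qmle(X, y, max_iter=50, tol=1e-10):
    """Bernoulli QMLE with logistic G, by Newton's method."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        G = logistic(X @ b)
        step = np.linalg.solve((X * (G * (1.0 - G))[:, None]).T @ X, X.T @ (y - G))
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

def glm_standard_errors(X, y, b):
    """Pearson residuals (11), the sigma^2 estimate (10), and GLM standard errors."""
    N, K = X.shape
    G = logistic(X @ b)
    u_tilde = (y - G) / np.sqrt(G * (1.0 - G))
    sigma2 = np.sum(u_tilde ** 2) / (N - K)
    A = (X * (G * (1.0 - G))[:, None]).T @ X   # (7) with g = G(1 - G) for the logit
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A)))
    return sigma2, se

# Hypothetical simulated data with a fixed group size of 10.
rng = np.random.default_rng(2)
N, n = 5000, 10
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = rng.binomial(n, logistic(X @ np.array([0.5, 0.5]))) / n

b_hat = fit_logit_qmle(X, y)
sigma2_hat, se_glm = glm_standard_errors(X, y, b_hat)
print(sigma2_hat, se_glm)
```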
To summarize, we have chosen a functional form that ensures estimates of $E(y \mid x)$ are
between zero and one, and a quasi-likelihood function that leads to a relatively efficient QMLE
under a popular auxiliary assumption, namely equation (6). In addition, we guard against
failure of this variance assumption by using equation (9) as the variance estimator. In the next
section we suggest specification tests that are valid with and without equation (6).
3. SPECIFICATION TESTING
Specification testing in this framework can be carried out by applying the results of Wooldridge
(1991a,b). We discuss two forms of the test. The first is valid under equations (4) and (6); these
are non-robust tests because they maintain the GLM variance assumption. The second, robust
form of the test requires only equation (4).
We focus primarily on Lagrange multiplier or score tests that nest $E(y \mid x) = G(x\beta)$ within a
more general model. Let $m(x, z, \beta, \gamma)$ be a model for $E(y \mid x, z)$, where $z$ is a $1 \times J$ vector of
additional variables; the elements of $z$ can be non-linear functions of $x$ (in which case
$E(y \mid x) = E(y \mid x, z)$), or variables not functionally related to $x$, or both. The vector $\gamma$ is a $Q \times 1$
vector of additional parameters. The null is assumed to be $H_0$: $\gamma = \gamma_0$ for a specified vector $\gamma_0$
(often $\gamma_0 = 0$). Then, by definition,
$$G(x\beta) \equiv m(x, z, \beta, \gamma_0) \tag{12}$$
Given the estimates under the null, $\hat\beta$, define the $1 \times K$ vector $\nabla_\beta \hat m_i \equiv \partial m(x_i, z_i, \hat\beta, \gamma_0)/\partial\beta = \hat g_i x_i$
and the $1 \times Q$ vector $\nabla_\gamma \hat m_i \equiv \partial m(x_i, z_i, \hat\beta, \gamma_0)/\partial\gamma$; these are the gradients of the regression
function with respect to $\beta$ and $\gamma$, respectively, evaluated under the null hypothesis. Define the
weighted residuals $\tilde u_i$ as in equation (11) and the weighted gradients as
$$\nabla_\beta \tilde m_i \equiv \nabla_\beta \hat m_i/[\hat G_i(1 - \hat G_i)]^{1/2} = \hat g_i x_i/[\hat G_i(1 - \hat G_i)]^{1/2} \tag{13}$$
$$\nabla_\gamma \tilde m_i \equiv \nabla_\gamma \hat m_i/[\hat G_i(1 - \hat G_i)]^{1/2} \tag{14}$$
As in equation (11), the weights are proportional to the inverse of the estimated nominal
standard deviation (see equation (6)). A valid test of $H_0$: $\gamma = \gamma_0$ depends on what is maintained
under the null hypothesis. Under the assumptions
$$E(y_i \mid x_i, z_i) = G(x_i\beta) \tag{15}$$
and
$$\mathrm{Var}(y_i \mid x_i, z_i) = \sigma^2 G(x_i\beta)[1 - G(x_i\beta)] \tag{16}$$
a valid statistic is obtained as $N R_u^2$ from the OLS regression
$$\tilde u_i \text{ on } \nabla_\beta \tilde m_i,\ \nabla_\gamma \tilde m_i, \quad i = 1, 2, \ldots, N \tag{17}$$
where $R_u^2$ is the constant-unadjusted r-squared. Under equations (15) and (16), $N R_u^2$ is
distributed asymptotically as $\chi^2_Q$; see Wooldridge (1991a).
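The $N R_u^2$ statistic from regression (17) can be sketched for the omitted-variables case $m(x, z, \beta, \gamma) = G(x\beta + z\gamma)$ with logistic $G$ and $\gamma_0 = 0$, in which the gradient with respect to $\gamma$ is $\hat g_i z_i$. The simulated data, the coefficient values, and the function names below are hypothetical; the fixed group size of 10 is chosen so that (16) holds.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit_qmle(X, y, max_iter=50, tol=1e-10):
    """Bernoulli QMLE with logistic G, by Newton's method."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        G = logistic(X @ b)
        step = np.linalg.solve((X * (G * (1.0 - G))[:, None]).T @ X, X.T @ (y - G))
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

def lm_statistic(X, Z, y):
    """N * R_u^2 from regression (17) for H0: gamma = 0 in G(x*beta + z*gamma)."""
    b = fit_logit_qmle(X, y)                       # estimates under the null
    G = logistic(X @ b)
    g = G * (1.0 - G)
    w = np.sqrt(G * (1.0 - G))
    u_tilde = (y - G) / w                          # weighted residuals (11)
    grads = np.column_stack([X, Z]) * (g / w)[:, None]   # (13) and (14)
    coef, *_ = np.linalg.lstsq(grads, u_tilde, rcond=None)
    ssr = np.sum((u_tilde - grads @ coef) ** 2)
    r2_u = 1.0 - ssr / np.sum(u_tilde ** 2)        # uncentered r-squared
    return len(y) * r2_u

# Hypothetical simulated data with one candidate omitted variable z (Q = 1).
rng = np.random.default_rng(3)
N, n = 3000, 10
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Z = rng.normal(size=(N, 1))
index = X @ np.array([0.5, 0.5])
y_null = rng.binomial(n, logistic(index)) / n                  # z is irrelevant
y_alt = rng.binomial(n, logistic(index + 0.5 * Z[:, 0])) / n   # z belongs in the model

lm_null = lm_statistic(X, Z, y_null)
lm_alt = lm_statistic(X, Z, y_alt)
print(lm_null, lm_alt)
```

Under the null the statistic would be compared with a chi-square critical value with $Q = 1$ degree of freedom.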
For binary choice models, Engle (1984) and Davidson and MacKinnon (1984) suggest a test
based on regression (17) for logit and probit. Gurmu and Trivedi (1993) present results for a
class of models that allows testing the logit function against a more general index function. But
for fractional dependent variables it is important to use the $N R_u^2$ form rather than the explained
sum of squares form suggested in Davidson and MacKinnon (1984): the latter test requires
$\sigma^2 = 1$, which is always the case for binary response variables but is too restrictive for fractional
response variables. Alternatively, as in Gurmu and Trivedi (1993), each term in regression (17)
can be divided by $\hat\sigma$ and then the explained sum of squares can be used. This is essentially the
same as the $N R_u^2$ statistic (although they will differ if $\hat\sigma$ is estimated with the degrees-of-freedom
adjustment in equation (10)).
It is often useful to have a likelihood-based statistic, especially for testing exclusion
restrictions. Under the same two assumptions (15) and (16), a quasi-likelihood ratio (QLR)
statistic has a limiting chi-square distribution. Let $\mathcal{L}_N(\hat\beta, \gamma_0)$ denote the log-likelihood
evaluated under the null, and let $\mathcal{L}_N(\ddot\beta, \ddot\gamma)$ denote the log-likelihood from the unrestricted
model (that is, the Bernoulli log-likelihood with $m(x_i, z_i, \beta, \gamma)$ used in place of $G(x_i\beta)$).
Further, define $\ddot m_i \equiv m(x_i, z_i, \ddot\beta, \ddot\gamma)$, and let the variance estimator based on the unrestricted
estimates be
$$\ddot\sigma^2 \equiv (N - K - Q)^{-1} \sum_{i=1}^{N} (y_i - \ddot m_i)^2 / [\ddot m_i(1 - \ddot m_i)] \tag{18}$$
(note that the summation is simply the sum of weighted squared residuals from the unrestricted
model). Then the QLR statistic, defined by
$$\mathrm{QLR} \equiv 2[\mathcal{L}_N(\ddot\beta, \ddot\gamma) - \mathcal{L}_N(\hat\beta, \gamma_0)] / \ddot\sigma^2 \tag{19}$$
is distributed asymptotically as $\chi^2_Q$ under the null hypothesis, provided equation (16) holds in
addition to (15). The validity of this statistic follows because the usual information matrix
equality holds up to the scalar $\sigma^2$ when the conditional mean and conditional variance are
correctly specified.
A form of the LM statistic that is valid under equation (15) alone can be computed from an
additional regression. First, regress $\nabla_\gamma \tilde m_i$ on $\nabla_\beta \tilde m_i$ and save the $1 \times Q$ residuals,
$\tilde r_i \equiv (\tilde r_{i1}, \tilde r_{i2}, \ldots, \tilde r_{iQ})$, $i = 1, 2, \ldots, N$. (This is the same as regressing each element of $\nabla_\gamma \tilde m_i$ on the
entire vector $\nabla_\beta \tilde m_i$, and collecting the residuals.) Next, obtain the $1 \times Q$ vector
$\tilde u_i \tilde r_i = (\tilde u_i \tilde r_{i1}, \tilde u_i \tilde r_{i2}, \ldots, \tilde u_i \tilde r_{iQ})$. The robust LM statistic is obtained as $N - \mathrm{SSR}$, where SSR is the
usual sum of squared residuals from the auxiliary regression of unity on $\tilde u_i \tilde r_i$:
$$1 \text{ on } \tilde u_i \tilde r_i, \quad i = 1, \ldots, N \tag{20}$$
Under $H_0$, which is equation (15) in this case, $N - \mathrm{SSR} \overset{a}{\sim} \chi^2_Q$. The validity of this procedure is
discussed further in Wooldridge (1991a,b). Briefly, $N - \mathrm{SSR}$ from equation (20) is a quadratic
form in the vector $N^{-1/2} \sum_{i=1}^{N} \tilde r_i' \tilde u_i$, with a weighting matrix that is the inverse of a consistent
estimator of its asymptotic variance whether or not equation (16) holds.
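The two auxiliary regressions behind the robust LM statistic can be sketched as below for the omitted-variables case with logistic $G$. The data are hypothetical and deliberately violate (16): the group size depends on $x_i$, so the scale of the conditional variance is not constant. Function names are illustrative.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit_qmle(X, y, max_iter=50, tol=1e-10):
    """Bernoulli QMLE with logistic G, by Newton's method."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        G = logistic(X @ b)
        step = np.linalg.solve((X * (G * (1.0 - G))[:, None]).T @ X, X.T @ (y - G))
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

def ols_residuals(X, Y):
    """Residuals from OLS of Y (one or several columns) on X."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef

def robust_lm_statistic(X, Z, y):
    """N - SSR from regression (20) for H0: gamma = 0 in G(x*beta + z*gamma)."""
    b = fit_logit_qmle(X, y)
    G = logistic(X @ b)
    g = G * (1.0 - G)
    w = np.sqrt(G * (1.0 - G))
    u_tilde = (y - G) / w                          # weighted residuals (11)
    grad_beta = X * (g / w)[:, None]               # (13)
    grad_gamma = Z * (g / w)[:, None]              # (14)
    r_tilde = ols_residuals(grad_beta, grad_gamma) # first auxiliary regression
    ur = r_tilde * u_tilde[:, None]
    ones = np.ones(len(y))
    e = ols_residuals(ur, ones)                    # regression of unity on u*r
    return len(y) - e @ e

# Hypothetical simulated data in which the group size depends on x.
rng = np.random.default_rng(4)
N = 3000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
Z = rng.normal(size=(N, 1))
n = np.where(x > 0, 30, 3)
index = X @ np.array([0.5, 0.5])
y_null = rng.binomial(n, logistic(index)) / n
y_alt = rng.binomial(n, logistic(index + 0.5 * Z[:, 0])) / n

stat_null = robust_lm_statistic(X, Z, y_null)
stat_alt = robust_lm_statistic(X, Z, y_alt)
print(stat_null, stat_alt)
```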
In testing for omitted variables, one can use the QLR statistic or the usual LM statistic under
equations (15) and (16), or the robust LM statistic under equation (15) only. (Of course, Wald
statistics can also be defined for these two cases, but they are computationally more
cumbersome than the QLR and LM statistics.) For omitted variables tests,
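$m(x_i, z_i, \beta, \gamma) = G(x_i\beta + z_i\gamma)$, $\nabla_\gamma \hat m_i = \hat g_i z_i = g(x_i\hat\beta) z_i$, and $\nabla_\gamma \tilde m_i = \hat g_i z_i/[\hat G_i(1 - \hat G_i)]^{1/2}$. One
way to test for functional form is to define $z_i$ as polynomials, interactions, or other functions of
$x_i$.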
A general functional form diagnostic is obtained by extending Ramsey's (1969) RESET
procedure to index models. For example, let the alternative model be
$$E(y_i \mid x_i) = G(x_i\beta + \gamma_1 (x_i\beta)^2 + \gamma_2 (x_i\beta)^3) \tag{21}$$
where, again, $G(\cdot)$ is typically the logistic function or the standard normal cdf. This alternative
functional form (or including even higher powers of $x_i\beta$) can be motivated quite generally.
Since $G(\cdot)$ is a strictly increasing function in most applications, any index model of the form
$E(y_i \mid x_i) = H(x_i\beta)$ for unknown $H$ can be arbitrarily well approximated by $G(\sum_{j=1}^{J} \gamma_j (x_i\beta)^j)$ for
$J$ large enough (by standard approximation results for polynomials). Since models with
unobserved heterogeneity of the form $E(y_i \mid x_i, \phi_i) = G(x_i\beta + \phi_i)$, where $\phi_i$ is independent of $x_i$,
have an index structure, a test of the null model against equation (21) should have power for
alternatives that can be derived explicitly from models of unobserved heterogeneity. In practice,
the first few terms in the expansion are the most important, and we use only the quadratic and
cubic terms.
In the context of equation (21), the hypothesis that equation (15) holds (with $z_i \equiv x_i$) is
stated as $H_0$: $\gamma_1 = 0$, $\gamma_2 = 0$. This is easily tested using the LM procedures outlined above.
(By contrast, the QLR statistic is computationally difficult as well as nonrobust.) First,
estimate the model under the assumption $\gamma_1 = \gamma_2 = 0$, as is always done. Define $\hat\beta$, $\hat G_i$, $\hat g_i$, $\hat u_i$,
$\nabla_\beta \tilde m_i$, and $\tilde u_i$ as before. The gradient with respect to $\gamma \equiv (\gamma_1, \gamma_2)'$ is
$\nabla_\gamma \hat m_i = \{\hat g_i (x_i\hat\beta)^2, \hat g_i (x_i\hat\beta)^3\}$, and $\nabla_\gamma \tilde m_i$ is defined in equation (14). The statistic obtained from regression (17)
is distributed approximately as $\chi^2_2$ under (15) and (16). The robust form is obtained from
regression (20).
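The RESET extension in equation (21) can be sketched by reusing the $N R_u^2$ recipe with the squared and cubed fitted index as the added terms. The example assumes the logistic $G$; the two simulated designs (one correctly specified, one with a quadratic term in the true index) and all names are hypothetical. The p-value helper relies on the fact that the upper tail probability of a chi-square with two degrees of freedom is $\exp(-s/2)$.

```python
import math
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit_qmle(X, y, max_iter=50, tol=1e-10):
    """Bernoulli QMLE with logistic G, by Newton's method."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        G = logistic(X @ b)
        step = np.linalg.solve((X * (G * (1.0 - G))[:, None]).T @ X, X.T @ (y - G))
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

def reset_lm(X, y):
    """N * R_u^2 from regression (17) with gradients for gamma_1, gamma_2 in (21)."""
    b = fit_logit_qmle(X, y)
    idx = X @ b                                    # fitted index x_i * beta_hat
    G = logistic(idx)
    g = G * (1.0 - G)
    w = np.sqrt(G * (1.0 - G))
    u_tilde = (y - G) / w
    powers = np.column_stack([idx ** 2, idx ** 3])
    grads = np.column_stack([X, powers]) * (g / w)[:, None]
    coef, *_ = np.linalg.lstsq(grads, u_tilde, rcond=None)
    ssr = np.sum((u_tilde - grads @ coef) ** 2)
    return max(len(y) * (1.0 - ssr / np.sum(u_tilde ** 2)), 0.0)

def chi2_pvalue_df2(stat):
    """Upper tail probability of a chi-square with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

# Hypothetical simulated data: averages of 20 binary draws.
rng = np.random.default_rng(5)
N, n = 4000, 20
x = rng.uniform(-2.0, 2.0, size=N)
X = np.column_stack([np.ones(N), x])
y_ok = rng.binomial(n, logistic(0.5 + x)) / n                   # index model holds
y_bad = rng.binomial(n, logistic(0.5 + x + 0.5 * x ** 2)) / n   # neglected quadratic

reset_ok = reset_lm(X, y_ok)
reset_bad = reset_lm(X, y_bad)
print(reset_ok, chi2_pvalue_df2(reset_ok))
print(reset_bad, chi2_pvalue_df2(reset_bad))
```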
4. EMPIRICAL APPLICATION: PARTICIPATION IN 401 (k) PENSION PLANS

401(k) plans differ from traditional employer-sponsored pension plans in that employees are
permitted to make pre-tax contributions and the employer may match part of the contribution.
Since participation in these plans is voluntary, the sensitivity of participation to plan
characteristics-specifically the employer matching rate-will play a critical role in retirement
saving.
Pension plan administrators are required to file Form 5500 annually with the Internal Revenue
Service, describing participation and contribution behavior for each plan offered. Papke (1995)
uses the plan level data to study, among other things, the relationship between the participation
rate and various plan characteristics, including the rate at which a firm matches employee
contributions. Papke (1995) also contains a discussion of the theoretical underpinnings relating
participation and the size of the match rate. Not surprisingly, under standard assumptions on the
utility function, participation is positively related to the match rate.
The participation rate (PRATE) is constructed as the number of active accounts divided by the
number of employees eligible to participate. An active account is any existing 401(k)
account; a contribution need not have been made that plan year. The plan match rate (MRATE)
is not reported directly on Form 5500, but can be approximated by the ratio of employer to
employee contributions for plans that provide some matching. This calculated match rate may
exceed the plan's marginal rate because employer contributions include any flat per participant
contribution or any helper contribution made to pass anti-discrimination tests. While the
calculated match rate exceeds the marginal incentive facing each saver, it may be a better
indicator of overall plan generosity. See Papke (1995) for additional discussion.
Papke (1995) uses a spline method to estimate models with the participation rate, PRATE, as
the dependent variable. She finds a statistically significant positive relationship between PRATE
and MRATE, with some evidence of a diminishing marginal effect. Here, we allow for a
diminishing marginal effect of MRATE on PRATE by using a conditional mean of the form (4)
with G(.) taken to be the logit function. We compare this directly with linear models where
PRATE is the dependent variable.
Table I presents summary statistics for the sample of 401 (k) plans from the 1987 plan year.
Statistics are presented separately for the 80% of the plans with match rates less than or equal to
one. Match rates well above one likely indicate end-of-plan year employer contributions made to
avoid IRS disqualification; see Papke (1995) for further discussion. Initially, we focus on the
subsample with MRATE $\le$ 1.
Participation rates in 401 (k) plans are high, averaging about 85% in our sample. Over 40%
of the plans (42.73%) have a participation proportion of exactly unity: all eligible employees
have an active account. This characteristic of the data would make a log-odds approach
especially awkward because an adjustment would have to be made to 40% of the observations.
The plan match rate averages about 41 cents on the dollar. Other explanatory variables include
total firm employment (EMP) which averages 4,622 across the plans. The plans average 12
years in age (AGE). SOLE is a binary indicator for whether the 401 (k) plan is the only pension
plan offered by the employer. Sole plans comprise about 37% of the sample.
Table I. Summary statistics

Variable        Mean       Standard deviation   Minimum   Maximum
Full sample (number of observations = 4734)
PRATE           0.869      0.167                0.023     1
MRATE           0.746      0.844                0.011     5
EMPLOYMENT      4621.01    16299.64             53        443040
AGE             13.14      9.63                 4         76
SOLE            0.415      0.493                0         1
Restricted sample (MRATE ≤ 1; number of observations = 3874)
PRATE           0.848      0.170                0.023     1
MRATE           0.408      0.228                0.011     1
EMPLOYMENT      4621.91    17037.11             53        443040
AGE             12.24      8.91                 4         76
SOLE            0.373      0.484                0         1

We begin with the linear model
$$E(\mathrm{PRATE} \mid x) = \beta_1 + \beta_2 \mathrm{MRATE} + \beta_3 \log(\mathrm{EMP}) + \beta_4 \log(\mathrm{EMP})^2 + \beta_5 \mathrm{AGE} + \beta_6 \mathrm{AGE}^2 + \beta_7 \mathrm{SOLE} \tag{22}$$
which we estimate by ordinary least squares (OLS), initially using the subsample for which
MRATE $\le$ 1. The results are given in the first column of Table II. Because of the anticipated
heteroscedasticity in this equation, the heteroscedasticity-robust standard errors are reported in
brackets below the usual OLS standard errors.
All variables are highly statistically significant except for the sole plan indicator.
Interestingly, there is very little difference between the usual OLS standard errors and the
heteroscedasticity-robust ones. The key variable MRATE has a t-statistic well over 10. Its
coefficient of 0.156 implies that if the match rate increases by 10 cents on the dollar, the
participation rate would increase on average by almost 1.6 percentage points. This is not a small
effect considering that the average participation rate is about 85% in the subsample. The linear
model implies a constant marginal effect throughout the range of MRATE that cannot literally be
true.
That the linear model does not fit as well as it should can be seen by computing Ramsey's
(1969) RESET (and its heteroscedasticity-robust version). Let $\hat u_i$ be the OLS residuals and let $\hat y_i$
be the OLS fitted values. Then, the LM version of RESET is obtained as $N R^2$ from the
regression
$$\hat u_i \text{ on } x_i,\ \hat y_i^2,\ \hat y_i^3, \quad i = 1, 2, \ldots, N$$
Under the null that equation (22) is true, $N R^2 \overset{a}{\sim} \chi^2_2$ (homoscedasticity is also maintained). The
heteroscedasticity-robust version is obtained as $N - \mathrm{SSR}$ from regression (20) given the proper
definitions: let $\tilde u_i \equiv \hat u_i$ and let $\tilde r_i$ be the $1 \times 2$ residuals from the regression of $(\hat y_i^2, \hat y_i^3)$ on $x_i$; see
Wooldridge (1991a) for more details. Using either non-robust RESET or its robust form,
equation (22) is strongly rejected (the 1% critical value for a $\chi^2_2$ is 9.21). Because RESET is a
test of functional form, we conclude that equation (22) misses some potentially important
non-linearities. (As usual, there is a potential difference between a statistical rejection of a model and
the economic importance of any misspecification.)
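Both forms of RESET for the linear model can be sketched with ordinary least squares alone. The data below are hypothetical (a fractional outcome whose true mean is a logistic curve, fitted by a straight line) and are not the 401 (k) sample; the function names are illustrative.

```python
import numpy as np

def ols_residuals(X, Y):
    """Residuals from OLS of Y (one or several columns) on X."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef

def reset_linear(X, y):
    """Return (N * R^2, N - SSR): non-robust and robust RESET for a linear model."""
    N = len(y)
    u = ols_residuals(X, y)                        # OLS residuals
    fitted = y - u                                 # OLS fitted values
    powers = np.column_stack([fitted ** 2, fitted ** 3])
    # Non-robust form: regress u on (x, yhat^2, yhat^3) and compute N * R^2.
    e = ols_residuals(np.column_stack([X, powers]), u)
    lm = N * (1.0 - (e @ e) / (u @ u))
    # Robust form: residuals r of (yhat^2, yhat^3) on x, then regress 1 on u * r.
    r = ols_residuals(X, powers)
    ur = r * u[:, None]
    e1 = ols_residuals(ur, np.ones(N))
    robust = N - e1 @ e1
    return lm, robust

# Hypothetical simulated data: a strongly non-linear mean fitted by a line.
rng = np.random.default_rng(6)
N, n = 3000, 20
x = rng.uniform(-3.0, 3.0, size=N)
X = np.column_stack([np.ones(N), x])
y = rng.binomial(n, 1.0 / (1.0 + np.exp(-2.0 * x))) / n

lm, robust = reset_linear(X, y)
print(lm, robust)
```

Either statistic would be compared with the $\chi^2_2$ critical value quoted in the text (9.21 at the 1% level).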
We next use the logit QMLE analysed in Section 2 to estimate the non-linear model
$$E(\mathrm{PRATE} \mid x) = G(\beta_1 + \beta_2 \mathrm{MRATE} + \beta_3 \log(\mathrm{EMP}) + \beta_4 \log(\mathrm{EMP})^2 + \beta_5 \mathrm{AGE} + \beta_6 \mathrm{AGE}^2 + \beta_7 \mathrm{SOLE}) \tag{23}$$
Table II. Results for the restricted sample

Variable         (1) OLS       (2) QMLE      (3) OLS       (4) QMLE
MRATE            0.156         1.390         0.239         1.218
                 (0.012)       (0.100)       (0.042)       (0.342)
                 [0.011]       [0.108]       [0.046]       [0.378]
MRATE²                                       -0.087        0.196
                                             (0.043)       (0.373)
                                             [0.044]       [0.425]
log(EMP)         -0.112        -1.002        -0.112        -1.002
                 (0.014)       (0.111)       (0.014)       (0.111)
                 [0.013]       [0.110]       [0.013]       [0.110]
log(EMP)²        0.0057        0.052         0.0057        0.0522
                 (0.0009)      (0.0071)      (0.0009)      (0.0071)
                 [0.0009]      [0.0071]      [0.0009]      [0.0071]
AGE              0.0060        0.0501        0.0059        0.0503
                 (0.0010)      (0.0087)      (0.0010)      (0.0087)
                 [0.0009]      [0.0089]      [0.0009]      [0.0088]
AGE²             -0.00007      -0.00052      -0.00007      -0.00052
                 (0.00002)     (0.00021)     (0.00002)     (0.00021)
                 [0.00002]     [0.00021]     [0.00002]     [0.00021]
SOLE             -0.0001       0.0080        0.0008        0.0061
                 (0.0058)      (0.0468)      (0.0058)      (0.0470)
                 [0.0060]      [0.0502]      [0.0060]      [0.0504]
ONE              1.213         5.058         1.198         5.085
                 (0.051)       (0.427)       (0.052)       (0.430)
                 [0.048]       [0.421]       [0.049]       [0.423]
Observations     3784          3784          3784          3784
SSR              93.67         92.70         93.56         92.69
SER              0.157         0.438         0.157         0.438
R-squared        0.143         0.152         0.144         0.152
RESET            39.55         0.606         35.06         0.732
                 (0.000)       (0.738)       (0.000)       (0.693)
Robust RESET     45.36         0.782         40.08         0.836
                 (0.000)       (0.676)       (0.000)       (0.658)

Notes: The quantities in (·) below estimates are the OLS standard errors or, for QMLE, the GLM standard errors; the
quantities in [·] are the standard errors robust to variance misspecification. SSR is the sum of squared residuals and
SER is the standard error of the regression; for QMLE, the SER is defined in terms of the weighted residuals. The
values in parentheses below the RESET statistics are p-values; these are obtained from a chi-square distribution with
two degrees-of-freedom.
where $G(\cdot)$ is the logistic function. (The GAUSS® code used for the estimation and testing is
available on request from the authors.) The partial effect of MRATE on $E(PRATE \mid x)$ is
$\partial E(PRATE \mid x)/\partial MRATE$, or, for specification (23), $g(x\beta)\beta_2$, where
$g(z) = dG(z)/dz = \exp(z)/[1 + \exp(z)]^2$. Because $g(z) \to 0$ as $z \to \infty$, the marginal effect
falls to zero as MRATE becomes large, holding other variables fixed.
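The expression for $g$ follows from differentiating the logistic function. A short derivation, using only the definitions in the text, is sketched here; the identity $g(z) = G(z)[1 - G(z)]$ is a standard property of the logistic function rather than something stated in this section.

```latex
\begin{align*}
G(z) &= \frac{\exp(z)}{1 + \exp(z)}, \\
g(z) &= \frac{dG(z)}{dz}
      = \frac{\exp(z)\,[1 + \exp(z)] - \exp(z)\exp(z)}{[1 + \exp(z)]^2}
      = \frac{\exp(z)}{[1 + \exp(z)]^2}
      = G(z)\,[1 - G(z)], \\
\frac{\partial E(PRATE \mid x)}{\partial MRATE} &= g(x\beta)\,\beta_2 .
\end{align*}
```

Since $G(z) \to 1$ as $z \to \infty$, the factor $1 - G(z)$ drives $g(z)$, and hence the partial effect, to zero when $\beta_2 > 0$ and MRATE grows.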
Column (2) of Table II contains the results of estimating equation (23). The variable MRATE
is highly statistically significant and, with the exception of SOLE (which is still not significant),
the directions of effects of all other variables are the same as in the linear model. Unlike the
linear model, the RESET statistic reveals no misspecification in equation (23); the p-value for
the robust statistic is 0.676, and it is even larger for the non-robust statistic. Based on this
RESET analog, equation (23) appears to capture the non-linear relationship between PRATE
and the explanatory variables for MRATE < 1.
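The RESET p-values quoted here can be checked by hand. The sketch below assumes only the standard closed form for the upper tail of a chi-square distribution with an even number of degrees of freedom; the helper name `chi2_sf_even_df` is made up for this example.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    For df = 2m the tail is exp(-x/2) * sum_{j<m} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

# RESET statistics for the logit QMLE, Table II column (2).
p_nonrobust = chi2_sf_even_df(0.606, 2)
p_robust = chi2_sf_even_df(0.782, 2)

# The 1% critical value for two degrees of freedom solves exp(-c/2) = 0.01.
crit_1pct = -2.0 * math.log(0.01)

print(round(p_nonrobust, 3), round(p_robust, 3), round(crit_1pct, 2))
```

With two degrees of freedom the tail probability reduces to $\exp(-x/2)$, which is why the critical value has the simple form used in the last step.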
There is other evidence that equation (23) fits better than (22). Table II also contains an
r-squared for each model, which in either case is defined as $1 - \mathrm{SSR}/\mathrm{SST}$, where SST is
the total sum of squares of the $y_i$. The SSRs, reported in Table II, are based on the unweighted
residuals, $\hat{u}_i = y_i - \hat{y}_i$, for OLS and QMLE. Thus, the r-squareds are comparable across
any model for $E(PRATE \mid x)$ and for any estimation methods. From Table II we see that the r-squared
from the logit model is about 6% higher than the r-squared for the linear model. Also, while OLS
chooses $\hat{\beta}$ to maximize the r-squared over all linear functions of x, the logit QMLE does not
maximize r-squared given the logit functional form; yet the logit model has a higher r-squared than
the linear model. Since we are only modelling the conditional expectation, with other features of the
conditional distribution left unspecified, the r-squared is the most appropriate goodness-of-fit
measure.
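As a rough consistency check on the r-squared comparison, the figures in Table II can be combined as follows. SST is not reported in the table, so the sketch backs it out from the OLS column; `sst_implied` is therefore an assumption that inherits the rounding of the published numbers.

```python
def r_squared(ssr, sst):
    """R-squared defined as 1 - SSR/SST, using unweighted residuals."""
    return 1.0 - ssr / sst

# Table II, columns (1) and (2).
ssr_ols, ssr_qmle = 93.67, 92.70
r2_ols_reported, r2_qmle_reported = 0.143, 0.152

# SST implied by the OLS column (an inferred value, not a reported one).
sst_implied = ssr_ols / (1.0 - r2_ols_reported)

# R-squared for the logit QMLE recomputed from its SSR and the implied SST.
r2_qmle_implied = r_squared(ssr_qmle, sst_implied)

# Relative improvement of the logit r-squared over the linear one.
relative_gain = r2_qmle_reported / r2_ols_reported - 1.0

print(round(sst_implied, 1), round(r2_qmle_implied, 3), round(relative_gain, 3))
```

Because both r-squareds share the same SST, the comparison comes down to which model has the smaller unweighted SSR.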
Before directly comparing estimates of the response functions and the marginal effects, some
other comments are worth making about Table II. First, each method comes with an SER
(standard error of the regression). These SERs are the estimates of $\sigma$ for the different models,
and thus are not directly comparable. For OLS, $\hat{\sigma}^2$ is based on the unweighted OLS residuals,
while for QMLE, $\hat{\sigma}^2$ is based on the weighted residuals; see equation (11). Because
$\hat{\sigma} = 0.438$ for the QMLE, this implies that the usual logit standard errors obtained from the
inverse of the Hessian, $\hat{A}^{-1}$, are over twice as large as the GLM standard errors that are
obtained as the square roots of the diagonal elements of $\hat{\sigma}^2 \hat{A}^{-1}$. The latter
(smaller) standard errors are the appropriate ones under the GLM assumption (6) because they do not
assume that $\sigma^2 = 1$. PRATE is underdispersed ($\sigma^2 < 1$) relative to the Bernoulli variance
($\sigma^2 = 1$).
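The "over twice as large" statement is a matter of arithmetic on the reported SER. The worked step below uses only $\hat{\sigma} = 0.438$ from Table II; the 0.228 figure for MRATE is an implied value, not one reported by the authors.

```latex
\begin{align*}
\mathrm{se}_{\mathrm{GLM}}(\hat{\beta}_j) &= \hat{\sigma}\,\sqrt{[\hat{A}^{-1}]_{jj}}
  = \hat{\sigma}\;\mathrm{se}_{\mathrm{logit}}(\hat{\beta}_j), \\
\frac{\mathrm{se}_{\mathrm{logit}}(\hat{\beta}_j)}{\mathrm{se}_{\mathrm{GLM}}(\hat{\beta}_j)}
  &= \frac{1}{\hat{\sigma}} = \frac{1}{0.438} \approx 2.28, \\
\hat{\sigma}^2 &= 0.438^2 \approx 0.192 < 1 .
\end{align*}
```

For example, the GLM standard error of 0.100 on MRATE in column (2) would correspond to a usual logit standard error of roughly $0.100/0.438 \approx 0.228$.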
We now turn to a direct comparison of the linear and logistic models. To compare the
estimated response functions and marginal effects, we need to choose values for MRATE, EMP,
AGE, and SOLE. Because most 401 (k) plans are accompanied by other pension plans, we set
SOLE = 0. We also set AGE at roughly its sample average, AGE = 13. To gauge the differences
across firms of different sizes we choose three firm sizes: small (EMP = 200), average
(EMP = 4620), and large (EMP = 100,000). The estimated relationships between $E(PRATE \mid x)$
and MRATE for the three different firm sizes are graphed in Figure 1. Interestingly, for a small
firm the linear and logistic predictions are most different at high match rates; for the average
sized firm, the difference is largest at low match rates; and for a large firm the largest difference
is at a match rate between 0.5 and 0.75.
As is seen from Table II, the marginal effect of MRATE on $E(PRATE \mid x)$ for the linear model
is 0.156 for any value of x. For the logistic model, we set SOLE = 0, AGE = 13, and
EMP = 4,620, and compute the estimated partial effect at three different match rates:
MRATE = 0, MRATE = 0.50, and MRATE = 1.0. The estimated derivatives are 0.288, 0.197,
and 0.118, respectively, which illustrates the diminishing marginal effect as MRATE increases.
Perhaps not surprisingly, the marginal effect estimated from the linear model is bracketed by the
low and high estimates from the non-linear model. The differences in the estimated marginal
effects are not trivial; for example, the non-linear model predicts an increase in participation of
approximately 2.9 percentage points in moving from a zero match rate to MRATE = 0.10, rather
than the 1.6 percentage point increase obtained from the linear model. Similarly, at high match
rates the marginal effect from increasing the match rate is estimated to be lower in the non-linear
model.
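The partial effects quoted above can be approximately reproduced from the rounded coefficients in column (2) of Table II. The sketch below is an illustration under that assumption: because the published coefficients are rounded, the results agree with the reported 0.288, 0.197, and 0.118 only to about two decimal places. The dictionary keys and function names are invented for the example.

```python
import math

# Logit QMLE estimates, Table II column (2), as printed (rounded).
beta = {
    "ONE": 5.058, "MRATE": 1.390, "logEMP": -1.002, "logEMP2": 0.052,
    "AGE": 0.0501, "AGE2": -0.00052, "SOLE": 0.0080,
}

def index(mrate, emp=4620, age=13, sole=0):
    """Linear index x*beta for specification (23)."""
    log_emp = math.log(emp)
    return (beta["ONE"] + beta["MRATE"] * mrate
            + beta["logEMP"] * log_emp + beta["logEMP2"] * log_emp**2
            + beta["AGE"] * age + beta["AGE2"] * age**2
            + beta["SOLE"] * sole)

def G(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def partial_effect(mrate, **kwargs):
    """g(x*beta) * beta_MRATE, with g(z) = G(z) * (1 - G(z))."""
    p = G(index(mrate, **kwargs))
    return p * (1.0 - p) * beta["MRATE"]

effects = [partial_effect(m) for m in (0.0, 0.5, 1.0)]
print([round(e, 3) for e in effects])

# First-order approximations to the change in PRATE from MRATE = 0 to 0.10.
logit_change = 0.10 * effects[0]     # non-linear model
linear_change = 0.10 * 0.156         # linear model, Table II column (1)
print(round(logit_change, 3), round(linear_change, 3))
```

The last two lines use the derivative at MRATE = 0 times the change in MRATE, which appears to be how the 2.9 and 1.6 percentage-point figures arise; that reading is an inference from the numbers, not something the text states.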
Figure 1. PRATE versus MRATE for various firm sizes: (a) EMP = 200; (b) EMP = 4620; (c)
EMP = 100,000. [Figure not reproduced; each panel plots the linear and logistic predictions against
MRATE.]

One way to try to salvage the linear model is to use a more flexible functional form in the
match rate. A popular functional form that allows a diminishing marginal effect is a quadratic.
Column (3) contains estimates of the linear model that includes a quadratic in MRATE. The
squared term is marginally significant (robust t-statistic = -1.98), and this does give a
diminishing marginal effect. But even with this additional regressor the model in column (3)
does not fit as well as the logistic model without the quadratic term (the r-squared for the linear
model with the quadratic term is only 0.144). Further, the rejection of the model by RESET is
almost as strong as it was without the quadratic. Thus, we conclude that simply adding $MRATE^2$
to equation (22) is not sufficient. (The spline approach used by Papke (1995) is more effective in
capturing a diminishing effect in this application, but the coefficients are more difficult to
interpret.)
When $MRATE^2$ is added to equation (23) it turns out to be insignificant. Thus, the logistic
functional form, with the term linear in MRATE, appears to be enough to capture the
diminishing effect, at least for MRATE < 1. This is a useful lesson: a significant quadratic term
in a linear model might be indicating that an entirely different, more parsimonious, functional
form can provide a better fit. Model (23) is clearly the preferred specification thus far.
As another test of model (23), we interact log(EMP) with each of MRATE, AGE, $AGE^2$, and
SOLE and test for exclusion of these four interactions using the LM and QLR tests discussed in
Section 3. This is similar in spirit to a Chow test where the sample is split based on firm size, but
here we do not need to make an arbitrary choice about where to split the sample. The LM statistic
is 16.52, the robust LM statistic is 14.41, and the QLR statistic, computed from equation (19), is
15.78 ($\mathcal{L}_{ur} = -1547.33$, $\mathcal{L}_{r} = -1548.84$, and $\hat{\sigma}^2 = 0.1914$). The
associated p-value for the robust LM statistic is 0.006, which rejects equation (23) at the 1%
significance level. Thus, equation (23) apparently misses some non-linearities, although the
significance level is not very small given the large sample size (compare the p-value for RESET in
the linear model).
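The reported QLR statistic and robust LM p-value can be checked from the numbers in this paragraph. Equation (19) is not shown in this part of the paper, so the sketch assumes the usual quasi-likelihood ratio form, twice the difference in log-likelihoods divided by $\hat{\sigma}^2$, with -1547.33 taken as the unrestricted value because it is the larger of the two. The variable names are illustrative.

```python
import math

# Quasi-log-likelihoods and dispersion estimate quoted in the text.
ll_unrestricted = -1547.33   # model with the four interactions (assumed labelling)
ll_restricted = -1548.84     # equation (23)
sigma2_hat = 0.1914

# Assumed form of the QLR statistic: 2 * (L_ur - L_r) / sigma^2.
qlr = 2.0 * (ll_unrestricted - ll_restricted) / sigma2_hat

def chi2_sf_4df(x):
    """Upper-tail probability for a chi-square with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Four exclusion restrictions, so the reference distribution has 4 df.
p_robust_lm = chi2_sf_4df(14.41)

print(round(qlr, 2), round(p_robust_lm, 3))
```

That the assumed formula returns the published 15.78 suggests the labelling of the two log-likelihoods is the intended one.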
From a practical perspective, the story about the relationship between expected PRATE and
MRATE does not change: the t-statistic on the term log(EMP)·MRATE is only -1.27 (the
robust t-statistic is -1.13). In fact, when log(EMP)·MRATE is dropped from the more general
model, the coefficient on MRATE becomes 1.396, which is a trivial change from 1.390, the
estimate from equation (23). The most significant interaction term is log(EMP)·SOLE, with a
t-statistic of -3.48 (robust t-statistic = -3.47). We report only equation (23) because of its
simplicity and because it captures the economically important relationship between PRATE and
MRATE. The full set of results is available on request from the authors.
The basic story does not change when we estimate the models over the entire sample. One
notable difference is that a quadratic term in MRATE is now significant in equation (23),
reflecting a faster diminishing effect at high match rates. Table III presents the same models as
Table II, now estimated over the full sample. First consider the models without $MRATE^2$. The
discrepancy in r-squareds between equations (23) and (22) is even greater than before, but
RESET now rejects both equations, although the logistic model is rejected less strongly. In
columns (3) and (4) we put $MRATE^2$ into each equation. Model (22) is still soundly rejected,
whereas (23) with $MRATE^2$ passes the RESET test with a p-value above 0.50. For the full
sample, it seems that a quadratic in MRATE, or some other way to capture additional
non-linearities, is needed to provide a reasonable fit.
Table III. Results for the full sample

                      (1)          (2)          (3)          (4)
Variable              OLS          QMLE         OLS          QMLE

MRATE                 0.034        0.542        0.143        1.665
                     (0.003)      (0.045)      (0.008)      (0.089)
                     [0.003]      [0.079]      [0.008]      [0.104]
MRATE^2                                        -0.029       -0.332
                                               (0.002)      (0.021)
                                               [0.002]      [0.026]
log(EMP)             -0.101       -1.038       -0.099       -1.030
                     (0.012)      (0.121)      (0.012)      (0.112)
                     [0.012]      [0.110]      [0.012]      [0.110]
log(EMP)^2            0.0051       0.0540       0.0050       0.0536
                     (0.0008)     (0.0078)     (0.0008)     (0.0072)
                     [0.0008]     [0.0071]     [0.0008]     [0.0071]
AGE                   0.0064       0.0621       0.0056       0.0548
                     (0.0008)     (0.0089)     (0.0008)     (0.0082)
                     [0.0007]     [0.0078]     [0.0007]     [0.0077]
AGE^2                -0.00008     -0.00071     -0.00007     -0.00063
                     (0.00002)    (0.00021)    (0.00002)    (0.00019)
                     [0.00002]    [0.00018]    [0.00001]    [0.00018]
SOLE                  0.0140       0.1190       0.0066       0.0642
                     (0.0050)     (0.0510)     (0.0049)     (0.0471)
                     [0.0052]     [0.0503]     [0.0051]     [0.0498]
ONE                   1.213        5.429        1.170        5.105
                     (0.045)      (0.467)      (0.044)      (0.431)
                     [0.044]      [0.422]      [0.042]      [0.416]

Observations:         4734         4734         4734         4734
SSR:                  120.70       109.51       107.76       105.73
SER:                  0.154        0.502        0.151        0.461
R-squared:            0.144        0.168        0.182        0.197
RESET:                85.22        50.56        83.80        1.370
                     (0.000)      (0.000)      (0.000)      (0.504)
Robust RESET:         69.15        9.666        98.51        1.275
                     (0.000)      (0.008)      (0.000)      (0.529)

Note: See Table II.
Putting $MRATE^2$ into equation (23) has the usual drawback for quadratics: it implies an eventual
negative marginal effect. In this case, the marginal effect becomes negative at a match rate of about
2.51. This is a high value for MRATE, but there are some match rates this large in the full sample.
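The turning point can be recovered from column (4) of Table III. In the worked step below, $\beta_q$ denotes the coefficient on $MRATE^2$; that symbol is introduced here for illustration and is not the paper's notation.

```latex
\begin{align*}
\frac{\partial E(PRATE \mid x)}{\partial MRATE}
  &= g(x\beta)\,\bigl(\beta_2 + 2\beta_q\,MRATE\bigr), \qquad g(\cdot) > 0, \\
MRATE^{*} &= -\frac{\hat{\beta}_2}{2\hat{\beta}_q}
  = \frac{1.665}{2 \times 0.332} \approx 2.51 .
\end{align*}
```

Because $g(\cdot)$ is strictly positive, the sign of the marginal effect is determined entirely by the linear term in parentheses, so the turning point does not depend on the values of the other covariates.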
5. CONCLUSION
The functional forms offered in this paper are viable alternatives to linear models that use either
y or the log-odds ratio of y as the dependent variable. No special data adjustments are needed for
the extreme values of zero and one, and the conditional expectation of y given the explanatory
variables is estimated directly. The quasi-likelihood method we propose is fully robust and
relatively efficient under the GLM assumption (6). The empirical application to 401(k) plan
participation rates illustrates the usefulness of these methods: while a linear model to explain the
fraction of participants is strongly rejected, the logistic conditional mean specification is not.
Methods for fractional dependent variables have many applications in economics. For
example, Hausman and Leonard (1994) have recently applied the methods suggested here to
estimate a model for Nielsen ratings for telecasts of NBA basketball games.
ACKNOWLEDGEMENTS
We are grateful to John Mullahy and two anonymous referees for helpful comments. The second
author would like to thank the Alfred P. Sloan Foundation for financial support.
REFERENCES
Davidson, R. and J. G. MacKinnon (1984), 'Convenient specification tests for logit and probit models',
Journal of Econometrics, 24, 241-262.
Duan, N. (1983), 'Smearing estimate: a nonparametric retransformation method', Journal of the American
Statistical Association, 78, 605-610.
Engle, R. F. (1984), 'Wald, likelihood ratio, and Lagrange multiplier statistics in econometrics', in Z.
Griliches and M. D. Intriligator (eds), Handbook of Econometrics, Volume 2, 776-828, North-Holland,
Amsterdam.
Gourieroux, C., A. Monfort and A. Trognon (1984), 'Pseudo-maximum likelihood methods: theory',
Econometrica, 52, 681-700.
Gurmu, S. and P. K. Trivedi (1993), 'Variable augmentation specification tests in the exponential family',
Econometric Theory, 9, 94-113.
Hausman, J. A. and G. K. Leonard (1994), 'Superstars in the NBA: economic value and policy', MIT
Department of Economics Working Paper No. 95-2.
Maddala, G. S. (1983), Limited Dependent and Qualitative Variables in Econometrics, Cambridge
University Press, Cambridge.
McCullagh, P. and J. A. Nelder (1989), Generalized Linear Models, 2nd edition, Chapman and Hall, New York.
Mullahy, J. (1990), 'Regression models and transformations for beta-distributed outcomes', mimeo,
Trinity College Department of Economics.
Papke, L. E. (1995), 'Participation in and contributions to 401(k) pension plans: evidence from plan data',
Journal of Human Resources, 30, 311-325.
Papke, L. E. and J. M. Wooldridge (1993), 'Econometric methods for fractional response variables with
an application to 401(k) plan participation rates', National Bureau of Economic Research Technical
Working Paper No. 147.
Ramsey, J. B. (1969), 'Tests for specification errors in classical linear least squares regression analysis',
Journal of the Royal Statistical Society, Series B, 31, 350-371.
Wooldridge, J. M. (1991a), 'On the application of robust, regression-based diagnostics to models of
conditional means and conditional variances', Journal of Econometrics, 47, 5-46.
Wooldridge, J. M. (1991b), 'Specification testing and quasi-maximum likelihood estimation', Journal of
Econometrics, 48, 29-55.
