Specification and Testing of Some Modified Count Data Models
North-Holland
John MULLAHY
Yale University, New Haven, CT 06520, USA
Resources for the Future, Washington, DC 20036, USA
This paper explores the specification and testing of some modified count data models. These
alternatives permit more flexible specification of the data-generating process (dgp) than do
familiar count data models (e.g., the Poisson), and provide a natural means for modeling data that
are over- or underdispersed by the standards of the basic models. In the cases considered, the
familiar forms of the distributions result as parameter-restricted versions of the proposed modified
distributions. Accordingly, score tests of the restrictions that use only the easily-computed ML
estimates of the standard models are proposed. The tests proposed by Hausman (1978) and White
(1982) are also considered. The tests are then applied to count data models estimated using survey
microdata on beverage consumption.
1. Introduction
Interest in corner-solution problems in econometrics has given rise to an
assortment of methods designed to allow consistent parameter estimation in a
corresponding assortment of model specifications. Recent extensions of the
econometric research on corner-solution outcomes have concentrated on the
specification and testing of models for non-negative data that are measured as
integers, or count data.1 In addition to discussion of basic model structures,
*This paper is a substantial revision of a paper originally circulated as ‘Hurdle Models for
Discrete and Grouped Dependent Variables’. The research has been supported in part by a
cooperative agreement between the U.S. Environmental Protection Agency and Resources for the
Future but should not be inferred to represent views of EPA. Thanks are due to two referees and
Paul Portney for helpful and constructive comments on earlier drafts. Any errors that remain must
be attributed to the author.
1Much of the interest in count data modeling appears to stem from the recognition that the
use of continuous distributions to model integer outcomes might have unwelcome consequences,
including inconsistent parameter estimates. Even if it is maintained that the integer outcomes are
generated by latent continuous variates, the arguments presented by Stapleton and Young (1984)
suggest that because such integer outcomes are actually continuous realizations measured with
error, inconsistent parameter estimates can result if standard methods like ML Tobit are used for
estimation. Rosenzweig and Wolpin (1982) explicitly rule out the use of a Tobit estimator in their
analysis of fertility outcomes, in which the fertility measure used is the number of children born to
a mother during some time interval. In another application, Portney and Mullahy (1986) conduct
some tests for a Tobit specification of their count measure and find considerable evidence of
misspecification.
where $\Gamma_+ = \Gamma \setminus \{0\}$, a standard count data model specifies $\phi_1(y, \theta_1) = \phi_2(y, \theta_2)$
for all $y \in \Gamma$, so that
Proposed here are two types of modifications to the basic count data models
in which (1) is satisfied but where $\phi_1(y, \theta_1) \neq \phi_2(y, \theta_2)$. These are termed
hurdle 4 and with-zeros (WZ) 5 models. While the two types of modifications in
general have different structures, it is shown later that they collapse into the
same model under some circumstances. The basic idea underlying these
modifications is that both permit the relative probabilities of zero and non-zero
realizations to differ from those implied by the parent distributions that they
modify.
A particularly interesting feature of the modified count data specifications
considered here is that they provide a natural means for modeling overdisper-
sion or underdispersion of the data.6 Specifically, overdispersion and underdis-
persion are viewed as arising from a misspecification of the maintained parent
dgp in which the relative probabilities of zero and non-zero (positive) realiza-
tions implied by the parent distribution are not supported by the data. By
2See variously Cameron and Trivedi (1986), Gourieroux, Montfort and Trognon (1984b),
Hausman, Hall and Griliches (1984), Hausman, Ostro and Wise (1984), Lee (1984a), Manning,
Lillard and Phelps (1983), and Terza (1985).
3The adjectives ‘standard’, ‘basic’, and ‘parent’ when describing count data models are used
interchangeably in this paper, and refer to models having structures like that in eq. (2) below.
4This term is borrowed from Cragg (1971).
5This term is used by Johnson and Kotz (1969, p. 205).
6Cox (1983) and McCullagh and Nelder (1983) are good references on overdispersion. Section 2
treats the issue in greater detail, and provides additional citations.
J. Mullahy, Modified count data models 343
$$L^{P} = \sum_{t \in \Omega} \left[ -\exp(X_t\beta) + y_t X_t\beta - \log(y_t!) \right] \qquad (4)$$
7For example, a popular generalization of the Poisson model assumes that a random component
in the basic Poisson expectation function is gamma-distributed in the population, so that a
negative binomial model results as a gamma mixture of the Poissons. See Hausman, Hall and
Griliches (1984) and Cameron and Trivedi (1986) for additional discussion.
8In the following, $f_\delta = \partial f/\partial\delta$ denotes the vector of first partial derivatives of $f$ with respect to
$\delta$, and $f_{\delta\zeta} = \partial^2 f/\partial\delta\,\partial\zeta'$ is the matrix of second partials of $f$ with respect to $\delta$ and $\zeta'$. Where no
ambiguity is possible, $\nabla f$ and $\nabla^2 f$ are occasionally used to denote the gradient and Hessian of $f$.
$\Omega_0 = \{t \mid y_t = 0\}$, $\Omega_1 = \{t \mid y_t \in \Gamma_+\}$, and $\Omega = \Omega_0 \cup \Omega_1$. The symbols $L$ and $\Lambda$ refer in general to
loglikelihood functions of basic and modified models, respectively.
= 0, else,
where
$E(Y_t) = \gamma_t$ and $\mathrm{var}(Y_t) = \gamma_t(1 + \gamma_t)$.
9Discussion is confined here to the geometric version of the negative binomial. The analysis is
extended to the general negative binomial distribution and other count data models in a
straightforward manner.
The first modified count data models considered here are termed hurdle
models, following the terminology developed by Cragg (1971).10 The idea
underlying the hurdle formulations is that a binomial probability model
governs the binary outcome of whether a count variate has a zero or a positive
realization. If the realization is positive, the ‘hurdle’ is crossed, and the
conditional distribution of the positives is governed by a truncated-at-zero
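count data model. Formally, for a random variable Y, the conditional distribution
of the positives is $\phi_2(y, \theta_2)/\Phi_2(\theta_2)$, $y \in \Gamma_+$, where $\phi_2$ satisfies (2) and
$\Phi_2$, the summation of $\phi_2$ on the support of the conditional density, is the
truncation normalization. The probability that the threshold is crossed is
$\Phi_1(\theta_1)$.11 Thus, the general form of the hurdle model likelihood function is
$$\exp(\Lambda_H) = \prod_{t \in \Omega_0} [1 - \Phi_1(\theta_1)] \prod_{t \in \Omega_1} [\Phi_1(\theta_1)\,\phi_2(y_t, \theta_2)/\Phi_2(\theta_2)], \qquad (11)$$
and, when $\Phi_1(\theta_1) = \Phi_2(\theta_2)$, it reduces to
$$\exp(\Lambda_H) = \prod_{t \in \Omega_0} [1 - \Phi_2(\theta_2)] \prod_{t \in \Omega_1} [\phi_2(y_t, \theta_2)], \qquad (12)$$
which resembles the likelihood function of a Tobit model. If $\Phi_1(\theta_1) = \Phi_2(\theta_2)$
as a result of parameter restrictions $\theta_1 = \theta_2$, then the model is akin to that
investigated for normal distributions by Cragg and by Lin and Schmidt (1984),
who demonstrate that Tobit results as a parameter-restricted version of one of
Cragg’s original specifications.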
In any particular application, there will likely exist numerous plausible
specifications of both the binary probability model and the conditional distri-
bution of the positives. For present purposes, only specifications where (11)
reduces to (12) as a result of the parameter restrictions $\theta_1 = \theta_2$ are of interest,
the objective here being the development of count data analogs of Cragg’s
Tobit modifications.
To motivate the Poisson hurdle specification, consider the dgp:
$\Pr(y = 0) = \exp(-\lambda_1)\lambda_1^{0}/0! = \exp(-\lambda_1)$, (13)
10See Cragg (1971) or Lin and Schmidt (1984) for additional discussion.
11In general, $\Phi_1$, $\phi_2$, and $\Phi_2$ also depend on covariates $X_t$.
$$\Lambda^{PH} = \log\Bigl\{ \prod_{t \in \Omega_0} \{\exp[-\exp(X_t\beta_1)]\} \prod_{t \in \Omega_1} \{1 - \exp[-\exp(X_t\beta_1)]\} \times \prod_{t \in \Omega_1} \bigl[ \exp(y_t X_t\beta_2)/(\{\exp[\exp(X_t\beta_2)] - 1\}\, y_t!) \bigr] \Bigr\} \qquad (16)$$
$$= [\Lambda^{P1}(\beta_1)] + [\Lambda^{P2}(\beta_2)],$$
which reduces to (4) when $\beta_1 = \beta_2$. $\Lambda^{P1}$ can be regarded as a loglikelihood
function for the binary (zero/positive) outcome and $\Lambda^{P2}$ as a loglikelihood
function for a truncated-Poisson model. Thus, the ML estimates of $\beta_1$ and $\beta_2$
can be obtained by separate maximization of $\Lambda^{P1}$ and $\Lambda^{P2}$, respectively.
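The additive structure of (16) can be illustrated with a minimal sketch. The Python fragment below evaluates the binary and truncated-Poisson components separately and checks that their sum equals the basic Poisson loglikelihood when $\beta_1 = \beta_2$. The function names, the single scalar covariate, and the data values are illustrative assumptions and are not taken from the application in section 4.

```python
import math

def poisson_loglik(y, x, beta):
    """Basic Poisson loglikelihood with mean exp(x_t * beta)."""
    return sum(-math.exp(xt * beta) + yt * xt * beta - math.lgamma(yt + 1)
               for yt, xt in zip(y, x))

def hurdle_binary_part(y, x, beta1):
    """Zero/positive component: Pr(y = 0) = exp(-exp(x_t * beta1))."""
    total = 0.0
    for yt, xt in zip(y, x):
        lam1 = math.exp(xt * beta1)
        # -expm1(-lam1) equals 1 - exp(-lam1), the probability of a positive count
        total += -lam1 if yt == 0 else math.log(-math.expm1(-lam1))
    return total

def hurdle_truncated_part(y, x, beta2):
    """Truncated-at-zero Poisson component, using positive counts only."""
    total = 0.0
    for yt, xt in zip(y, x):
        if yt > 0:
            lam2 = math.exp(xt * beta2)
            total += (yt * xt * beta2 - math.log(math.expm1(lam2))
                      - math.lgamma(yt + 1))
    return total

# Hypothetical data: counts y and one scalar covariate x.
y = [0, 0, 1, 2, 0, 3, 1, 5]
x = [1.0, 0.4, 0.8, 1.2, 0.1, 1.5, 0.9, 2.0]

beta = 0.3
two_part = hurdle_binary_part(y, x, beta) + hurdle_truncated_part(y, x, beta)
basic = poisson_loglik(y, x, beta)
print(abs(two_part - basic) < 1e-9)
```

Because the two components share no parameters when $\beta_1$ and $\beta_2$ are unrestricted, each function could be handed to a separate optimizer.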
For the geometric model, an interesting hurdle specification corresponding
to (11) is the dgp:
and
$\Pr(y \mid y > 0) = \gamma_2^{(y-1)}/[(1 + \gamma_2)^{y}]$, $y \in \Gamma_+$,
(19)
= 0, else.
Parameterizing $\gamma_{jt} = \exp(X_t\beta_j)$, it is seen that the binomial probabilities (17)
and (18) are identically those of a standard binomial logit model.12 Eq. (19) is
in the form of a truncated-at-zero geometric model. The complete loglikeli-
12This result suggests that when the basic specification is correct, the geometric parameters can
be estimated consistently, though not efficiently, by standard binomial logit programs. Such an
approach is analogous to consistent but inefficient estimation of a Tobit model’s parameters using
a probit model [see Amemiya (1984) and Ruud (1984)]. In the Tobit/probit case, however, only
the scaled parameters $\beta/\sigma$ can be estimated by probit. Due to the functional dependence of the
location and scale parameters in the geometric specification, the natural or unscaled parameters
can be estimated by logit. Appendix A demonstrates the relative inefficiency of both the logit and
truncated geometric estimators of the basic geometric model.
(17)-(19) is
$= [\Lambda^{G1}(\beta_1)] + [\Lambda^{G2}(\beta_2)]$, (20)
which reduces to (8) when $\beta_1 = \beta_2$. Again the ML estimates can be obtained
by separate maximization of $\Lambda^{G1}$ and $\Lambda^{G2}$.
The second class of modified count models is termed the WZ class,
following the terminology developed by Johnson and Kotz (1969, pp. 204-206).
Like the hurdle models, the idea motivating the WZ specifications is that the
conditional distribution of the positives is properly characterized by the
truncated-at-zero version of the parent distribution. The probabilities of
the positives relative to the probability of the zero outcome, however, are no
longer as specified by the parent distribution. Instead, the WZ model specifies
that the probability of the zero outcome is additively augmented or reduced by
an amount $\psi$, so that, in the notation of (1) and (2),
$\phi_1(y, \psi, \theta) = \psi + (1 - \psi)\phi(y, \theta)$, $y = 0$, (21)
and
$\phi_2(y, \psi, \theta) = (1 - \psi)\phi(y, \theta)$, $y \in \Gamma_+$, (22)
where $\phi(y, \theta)$ are the probabilities specified for all $y \in \Gamma$ by the parent
density, and the terms $(1 - \psi)$ ensure that (21) and (22) constitute a proper
discrete probability distribution.13 When $\psi > 0$ ($\psi < 0$), the relative probabilities
$\phi_1(0, \psi, \theta)/\phi_2(y, \psi, \theta)$, $y \in \Gamma_+$, are greater (less) than those specified by the
parent distribution; similarly, when $\psi > 0$ ($\psi < 0$), E(Y) is less (greater) than in
the parent model. When $\psi = 0$, the basic distribution obtains.
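A small numerical sketch may help fix ideas about (21) and (22). It assumes a geometric parent with mean $\gamma$, so that $\phi(y, \theta) = \gamma^{y}/(1 + \gamma)^{y+1}$, and checks that the with-zeros probabilities sum to one, that the mean is scaled by $(1 - \psi)$, and that the lower bound on $\psi$ noted in footnote 13 drives the zero probability to zero. The function names and the parameter values are hypothetical.

```python
def geometric_pmf(y, gamma):
    """Parent geometric probability with mean gamma."""
    return (gamma / (1.0 + gamma)) ** y / (1.0 + gamma)

def wz_pmf(y, psi, gamma):
    """With-zeros probability: psi is added at zero, parent mass is scaled by 1 - psi."""
    scaled = (1.0 - psi) * geometric_pmf(y, gamma)
    return psi + scaled if y == 0 else scaled

def psi_lower_bound(gamma):
    """Smallest admissible psi, -phi0 / (1 - phi0), at which Pr(y = 0) becomes zero."""
    phi0 = geometric_pmf(0, gamma)
    return -phi0 / (1.0 - phi0)

gamma = 1.5            # hypothetical parent mean
support = range(2000)  # truncation point for the numerical sums
for psi in (-0.3, 0.0, 0.3):
    total = sum(wz_pmf(y, psi, gamma) for y in support)
    mean = sum(y * wz_pmf(y, psi, gamma) for y in support)
    print(psi, round(total, 6), round(mean, 6))
```

A positive $\psi$ shifts mass toward zero and lowers the mean, while a negative $\psi$ does the reverse, in line with the discussion above.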
The loglikelihood functions of the Poisson and geometric WZ models are,
respectively,
13Johnson and Kotz (1969, p. 205) note that the constraint $\psi \in \Psi = [-\phi_0/(1 - \phi_0), 1)$, where
$\phi_0 = \phi(0, \theta)$, is also required.
and
$\Lambda^{GZ} = \sum \log[\psi\exp(X_t\beta) + 1]$
$\psi = \{1 - \exp[\exp(\beta) - \exp(\beta + \alpha)]\}/\{1 - \exp[\exp(\beta)]\}$
in the Poisson hurdle model, while similar manipulation of (20) and (24) gives
$\psi = [1 - \exp(\alpha)]/[1 + \exp(\beta + \alpha)]$
in the geometric hurdle model. In both instances, $\mathrm{sign}(\alpha) = -\mathrm{sign}(\psi)$, and, for
finite $\beta$, $\psi$ approaches $\sup(\Psi)$ and $\inf(\Psi)$ as $\alpha$ approaches $-\infty$ and $+\infty$,
respectively. It is also the case in the intercept-only specifications that the ML
estimate of the intercept parameter in the WZ models is identical to that
obtained by ML estimation of the intercept parameter in the truncated-at-zero
variant of the parent distribution.15
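The intercept-only mappings between the hurdle parameters $(\alpha, \beta)$ and the WZ parameter $\psi$ are easy to evaluate directly. The sketch below codes the Poisson and geometric expressions given above and, as an example, plugs in the tea-model values reported in section 4 ($\beta = 0.124$, $\alpha = -0.598$). The function names are assumptions introduced for illustration.

```python
import math

def psi_poisson(alpha, beta):
    """psi implied by an intercept-only Poisson hurdle model."""
    lam = math.exp(beta)
    lam1 = math.exp(beta + alpha)
    return (1.0 - math.exp(lam - lam1)) / (1.0 - math.exp(lam))

def psi_geometric(alpha, beta):
    """psi implied by an intercept-only geometric hurdle model."""
    return (1.0 - math.exp(alpha)) / (1.0 + math.exp(beta + alpha))

# Tea model, intercept-only case: values quoted in the empirical section.
psi_tea = psi_geometric(alpha=-0.598, beta=0.124)
print(round(psi_tea, 3))

# Limits in alpha for the geometric case: psi tends to 1 and to -exp(-beta).
print(psi_geometric(-50.0, 0.124), psi_geometric(50.0, 0.124))
```

In the geometric case the lower limit $-\exp(-\beta)$ coincides with $-\phi_0/(1 - \phi_0)$, since $\phi_0 = 1/[1 + \exp(\beta)]$.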
Overdispersion in count data models has been discussed extensively.16
Overdispersion is meaningful only in reference to some maintained dgp for y,
and for present purposes can be defined as a situation where the ratio
var( Y)/E(Y) exceeds that implied by the maintained dgp for y. For example,
overdispersion is present in the basic Poisson and geometric models if
var( Y)/E( Y) > 1 and var( Y)/E( Y) > 1 + E(Y), respectively. Underdisper-
sion is defined by reversing the inequalities.
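As a hedged illustration of these definitions, the following sketch computes var(Y)/E(Y) numerically for a geometric hurdle distribution and compares it with the geometric benchmark 1 + E(Y). It assumes the zero outcome has probability $1/(1 + \gamma_1)$ and the positives follow a truncated-at-zero geometric with parameter $\gamma_2$; the function names and parameter values are hypothetical.

```python
def hurdle_geometric_pmf(y, gamma1, gamma2):
    """Zero governed by gamma1; positives by a truncated geometric with gamma2."""
    p_positive = gamma1 / (1.0 + gamma1)
    if y == 0:
        return 1.0 - p_positive
    return p_positive * (gamma2 / (1.0 + gamma2)) ** (y - 1) / (1.0 + gamma2)

def mean_and_variance(gamma1, gamma2, upper=2000):
    """Moments by direct summation over a truncated support."""
    mean = sum(y * hurdle_geometric_pmf(y, gamma1, gamma2) for y in range(upper))
    second = sum(y * y * hurdle_geometric_pmf(y, gamma1, gamma2) for y in range(upper))
    return mean, second - mean ** 2

def dispersion_gap(gamma1, gamma2):
    """var/mean minus the geometric benchmark 1 + mean (positive: overdispersion)."""
    mean, var = mean_and_variance(gamma1, gamma2)
    return var / mean - (1.0 + mean)

for g1, g2 in ((1.0, 1.0), (0.5, 1.0), (2.0, 1.0)):
    print(g1, g2, round(dispersion_gap(g1, g2), 6))
```

Under these assumptions the gap is zero when $\gamma_1 = \gamma_2$, positive when zeros are more likely than the parent implies ($\gamma_1 < \gamma_2$), and negative otherwise.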
To see that the hurdle models naturally admit overdispersion or underdis-
persion, consider var(Y)/E(Y) in a general hurdle formulation. In the nota-
14The gradients and Hessians of (20) and (24) are presented in appendix B.
15Johnson and Kotz (1969, pp. 205-206).
16See, for example, Cox (1983), Hausman, Hall and Griliches (1984), Cameron and Trivedi
(1986), and McCullagh and Nelder (1983).
3. Specification testing
This section discusses several specification tests for the models described in
section 2. Score tests and Hausman (1978) tests are proposed for testing the
basic specifications against specific alternatives, the hurdle and WZ models.
White’s (1982) information matrix test is proposed as an omnibus test of the
null hypothesis that the basic specification is a correct characterization of the
dgp.
The score (or Lagrange multiplier) principle for specification testing in
econometrics has been discussed extensively.17 Because ML estimates of the
17See Breusch and Pagan (1980) and Engle (1984) for detailed discussions.
basic count data models can be easily obtained,18 the score test approach is
appealing here. The general form of the score statistic for testing $H_0$: $h(\theta) = 0$
is
$\xi = s(\tilde\theta)'\,\mathcal{I}(\tilde\theta)^{-1}s(\tilde\theta)$, (27)
where $s(\tilde\theta)$ is the k × 1 score vector and $\mathcal{I}(\tilde\theta)$ is the k × k information matrix,
both evaluated at the ML estimates of the restricted model; $\theta = (\theta_1', \theta_2')'$ is the
k × 1 parameter vector, where $\theta_1$ is p × 1 and $\theta_2$ is (k - p) × 1; and $h(\theta)$ is
an r × 1 vector of restrictions, where for present purposes $h(\theta) = \theta_1 = 0$ and
p = r.19 Since some elements of $s(\tilde\theta)$ are identically zero, only the non-zero
subvector of $s(\tilde\theta)$ and corresponding submatrix of $\mathcal{I}(\tilde\theta)^{-1}$ are required to
compute $\xi$. Under $H_0$, $\xi$ is asymptotically distributed as a central $\chi^2_r$ variate.
Much discussion of the score test has focused on computational methods.20
Given $s(\tilde\theta)$ and $\mathcal{I}(\tilde\theta)$, $\xi$ can of course be computed using matrix calculations
according to (27). Typically $\mathcal{I}$ will be estimated by either the negative Hessian
of the restricted loglikelihood function or by the gradient outer product $G'G$,
where $G$ is the T × k matrix having typical element $[\partial\Lambda_t/\partial\theta_j]$, evaluated in
either case at the restricted ML estimates. Alternatively, $\xi$ can often be
obtained as a function of the $R^2$ of some auxiliary linear regression.21
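The equivalence between the quadratic form in (27), with the information matrix estimated by $G'G$, and T times the uncentered $R^2$ from regressing a vector of ones on $G$ can be checked numerically. The sketch below does so for a two-column $G$ using only elementary 2 × 2 algebra. The matrix entries are hypothetical gradient contributions, and the helper names are assumptions made for illustration.

```python
def solve2(m, v):
    """Solve the 2x2 linear system m x = v by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]

def cross_products(G):
    """Return G'G and G'iota, where iota is a vector of ones."""
    gtg = [[sum(r[i] * r[j] for r in G) for j in range(2)] for i in range(2)]
    gti = [sum(r[j] for r in G) for j in range(2)]  # column sums = score vector
    return gtg, gti

def score_stat_quadratic_form(G):
    """s'(G'G)^{-1}s with s = G'iota."""
    gtg, s = cross_products(G)
    x = solve2(gtg, s)
    return s[0] * x[0] + s[1] * x[1]

def score_stat_regression(G):
    """T times the uncentered R^2 from regressing ones on G (no intercept)."""
    T = len(G)
    gtg, gti = cross_products(G)
    coef = solve2(gtg, gti)
    fitted = [r[0] * coef[0] + r[1] * coef[1] for r in G]
    r2_uncentered = sum(f * f for f in fitted) / T  # iota'iota = T
    return T * r2_uncentered

# Hypothetical T x 2 matrix of per-observation gradient contributions.
G = [(0.5, -0.2), (-1.0, 0.3), (0.5, 0.1), (0.5, -0.4), (-1.0, 0.6), (0.5, 0.0)]
print(score_stat_quadratic_form(G), score_stat_regression(G))
```

The two numbers agree up to rounding because the fitted values satisfy $\hat\iota'\hat\iota = \iota'G(G'G)^{-1}G'\iota$.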
Computations of $\xi$ based on different estimates of $\mathcal{I}$ will yield different
values of the test statistic in finite samples, even when the null hypothesis is
true and the model is correctly specified. A separate complication arises when
the probabilities underlying the model's likelihood function are misspecified.
Although consistent parameter estimation under such circumstances is possible
when the expectation function has been correctly specified,22 inferences
based on standard estimates of the parameter covariance matrix will generally
not be robust against such misspecification. White (1982) and Engle (1984)
have suggested an amendment to the standard form of the score statistic (27)
18Since the loglikelihood functions of both the basic Poisson and basic geometric models (4)
and (8) are concave, convergence to the ML estimates using a Newton-Raphson algorithm has
proven in practice to be quite rapid. Alternatively, non-linear weighted least squares can be used
to obtain the ML estimates of these models; see Hausman, Hall and Griliches (1984) and
Hausman, Ostro and Wise (1984).
19Since the paper is ultimately concerned with applying the finite-sample analogs of these test
statistics, whose known properties are largely asymptotic, $\mathcal{I}$ is taken here to be $-(\nabla^2\Lambda)$ rather
than $-E(\nabla^2\Lambda/T)$.
20See Engle (1984) and Davidson and MacKinnon (1984a, b).
21For example, since $\mathcal{I}$ can be estimated by $G'G$, $\xi$ can be calculated as
$\xi = \iota'G(\tilde\theta)(G(\tilde\theta)'G(\tilde\theta))^{-1}G(\tilde\theta)'\iota$, (*)
where $\iota$ is a T × 1 vector of ones. Since $\iota'G(\tilde\theta) = s(\tilde\theta)'$, (*) is simply an alternative expression of
(27). Moreover, since $\iota'\iota = T$, $\xi$ in (*) is seen to be T times the uncentered $R^2$ from the regression
of $\iota$ on $G(\tilde\theta)$, or, alternatively, $\hat\iota'\hat\iota$ from the same regression.
22See Gourieroux, Montfort and Trognon (1984a) and Cameron and Trivedi (1986).
that ensures a test of the proper size when such misspecification is present.
Defining $A = -\nabla^2\Lambda(\tilde\theta)$ where $\Lambda$ is the maintained loglikelihood function,
$B = G(\tilde\theta)'G(\tilde\theta)$, and $C = A^{-1}BA^{-1}$, then the finite-sample analog of the statistic
proposed by White and Engle is
$\xi^* = s_1(\tilde\theta)'A^{11}(C^{11})^{-1}A^{11}s_1(\tilde\theta)$, (28)
where the (1,1) blocks of $A^{-1}$ and $C$ correspond to the p non-zero elements of
$s(\tilde\theta)$, $s_1(\tilde\theta) = [\partial\Lambda/\partial\theta_1]|_{\theta = \tilde\theta}$. For purposes of comparison, the empirical
illustrations presented below in section 4 present score statistics calculated
using both standard parameter covariance estimates [i.e., $(-\nabla^2\Lambda)^{-1}$ and
$(G'G)^{-1}$] and the approach suggested by White and Engle.
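To make the three variants concrete, the sketch below computes the Hessian-based, gradient-based, and White-Engle score statistics for a test of $\psi = 0$ in an intercept-only geometric WZ model. It is a sketch under stated assumptions: the derivatives are obtained by differentiating the WZ loglikelihood implied by (21) and (22) with a geometric parent and $\gamma = \exp(\beta)$, the restricted ML estimate is taken to be $\gamma = \bar{y}$, and the data and helper names are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def wz_geometric_score_tests(y):
    """Score statistics for H0: psi = 0, parameters ordered (psi, beta)."""
    T = len(y)
    gamma = sum(y) / T  # restricted ML: exp(beta) equals the sample mean
    n0 = sum(1 for v in y if v == 0)
    n1 = T - n0
    # Per-observation gradients at psi = 0: d/dpsi is gamma for zeros, -1 for positives.
    g = [(gamma if v == 0 else -1.0, (v - gamma) / (1.0 + gamma)) for v in y]
    s1 = sum(row[0] for row in g)  # the beta element of the score is zero
    w = gamma / (1.0 + gamma) ** 2
    A = [[n0 * gamma ** 2 + n1, -n0 * gamma],          # minus the Hessian
         [-n0 * gamma, sum((v + 1) * w for v in y)]]
    B = [[sum(r[i] * r[j] for r in g) for j in range(2)] for i in range(2)]
    A_inv, B_inv = inv2(A), inv2(B)
    C = matmul2(matmul2(A_inv, B), A_inv)
    return {
        "hessian": s1 * A_inv[0][0] * s1,
        "gradient": s1 * B_inv[0][0] * s1,
        "white_engle": s1 * A_inv[0][0] * (1.0 / C[0][0]) * A_inv[0][0] * s1,
    }

y = [0, 0, 0, 1, 2, 3, 0, 5, 1, 0, 0, 4]  # hypothetical counts
stats = wz_geometric_score_tests(y)
print(stats)
```

Since $\psi$ is scalar, the (1,1) blocks in (28) are scalars here, which keeps the algebra short; with covariates the same steps apply with matrix blocks.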
The score test strategy for the count data hurdle models draws conceptually
on the work of Lin and Schmidt (1984). It was demonstrated in section 2 that
the basic count model specifications result when the restriction $\beta_1 = \beta_2$ is
imposed in the hurdle models. Computation of the score test statistic is
simplified by reparameterizing the hurdle models along the lines suggested by
Lin and Schmidt where, given the new parameters $(\alpha, \beta)$, with $\beta_1 = \alpha + \beta$ and
$\beta_2 = \beta$, the score test is of $H_0$: $\alpha = 0$. Under this reparameterization, however,
the Hessian of the hurdle model loglikelihood function is no longer block-diagonal
(see appendix B).
The score test for the WZ specifications is of $H_0$: $\psi = 0$. Since $H_0$ specifies a
point in $\mathrm{int}(\Psi)$, standard methods of inference can be used. Note that only
one element of $s(\tilde\theta)$ is non-zero since $\psi$ is scalar. Under $H_0$, $\xi$ is asymptotically
distributed as $\chi^2_1$. The computation of the score test is complicated,
however, because the Hessian of the WZ loglikelihood function is not block-diagonal.
Appendix B provides the formulae used to compute the geometric
hurdle and WZ model score tests.
An alternative test strategy for the hurdle models recognizes that when
$\beta_1 = \beta_2$ in (16) or (20), $\beta$ ($= \beta_1 = \beta_2$) can be estimated consistently by
maximizing the full loglikelihood function $\Lambda^{jH}$ ($j = P, G$), or either of its
components ($\Lambda^{j1}$, $\Lambda^{j2}$). However, as noted earlier and demonstrated in appendix A
for the geometric model, the latter estimates are inefficient relative to
the former. Of course, when $\beta_1 \neq \beta_2$, the three estimators will diverge asymptotically.
These properties suggest that a Hausman test approach can be used
to test $H_0$: $\beta_1 = \beta_2$.
A finite-sample version of the Hausman test statistic is used here:
$H = (\tilde\beta_j - \hat\beta)'\,[V(\tilde\beta_j) - V(\hat\beta)]^{-1}(\tilde\beta_j - \hat\beta)$, (29)
where $\hat\beta$ is the ML estimate of the restricted (i.e., basic) model, $\tilde\beta_j$ is either of
the two estimates of the parameters of the unrestricted model ($\tilde\beta_1$ or $\tilde\beta_2$), and
4. Empirical analysis
To illustrate the specifications and tests described above, data from the 1980
Wave II of the National Survey of Personal Health Practices and Conse-
quences (NSPHPC) are used. 25 Among the data reported in the NSPHPC are
individuals’ daily consumption of various beverages. Although beverage quan-
tity is a continuous measure, the protocol in the NSPHPC is to report
consumption in integer amounts (number of cups, glasses, etc.). Such beverage
consumption measures serve well to illustrate the points discussed above.
Analyzed here are individuals’ daily consumption of coffee (COFFEE), tea
(TEA), and milk (MILK). The explanatory variables used are an intercept
(INT), age in years (AGE), years of completed schooling (EDUC), family
income (INCOME), and 0-1 dummies for sex (MALE = 1 if male), race
(WHITE = 1 if white, = 0 if black), and marital status (MARRIED = 1 if
Table 1
Sample frequency distribution of dependent variables (T= 1,900).
26The education and income variables are pseudo-continuous, constructed using interval
midpoints. For the open-ended intervals, the value 17 was used for the schooling category ‘16 or
more’ years, and the value 35,000 was used for the income category ‘$25,000 or more’. Some
variables required to properly interpret the estimated models as demand functions are not
available (e.g., own and substitute goods’ prices); similarly, information about other determinants
of beverage consumption (e.g., religion) is not provided in the NSPHPC.
27For example, 13 observations for which daily coffee consumption was reported as greater than
15 cups, 19 observations for which daily tea consumption was reported as greater than 8 cups, and
10 observations for which daily milk consumption was reported as greater than 6 glasses were
deleted.
28Estimation is performed using a program written in SAS PROC MATRIX, which is
available from the author on request.
29For all the WZ models, the requirement that the estimate $\hat\psi$ be in the interval $\Psi_t =
[-\phi(0, \hat\theta_t)/(1 - \phi(0, \hat\theta_t)), 1)$ was found to hold for each observation in the sample.
Table 2
Sample descriptive statistics (T = 1,900)
Table 3
Estimation results: Dependent variable COFFEE (covariates included).a
aFigures in parentheses are estimated asymptotic standard errors derived from the negative
inverse Hessian of L evaluated at the ML estimates. Figures in square brackets are estimated
asymptotic standard errors derived from the parameter covariance estimates obtained using the
method proposed by White (1982) and Royall (1984).
Table 4
Estimation results: Dependent variable TEA (covariates included).a
aFigures in parentheses are estimated asymptotic standard errors derived from the negative
inverse Hessian of L evaluated at the ML estimates. Figures in square brackets are estimated
asymptotic standard errors derived from the parameter covariance estimates obtained using the
method proposed by White (1982) and Royall (1984).
Table 5
Estimation results: Dependent variable MILK (covariates included).a
aFigures in parentheses are estimated asymptotic standard errors derived from the negative
inverse Hessian of L evaluated at the ML estimates. Figures in square brackets are estimated
asymptotic standard errors derived from the parameter covariance estimates obtained using the
method proposed by White (1982) and Royall (1984).
alternative estimator where the implied variance/mean ratio is less than that
of the geometric (e.g., Poisson) might be appropriate.
Table 6 presents the ML estimates of the intercept-only models. Among
other things, table 6 demonstrates two points noted earlier: first, that the
estimates of the intercept parameters in the truncated and the WZ models will
be identical when only an intercept term is included; and, second, that the
relationship $\psi = [1 - \exp(\alpha)]/[1 + \exp(\beta + \alpha)]$ obtains between the estimated
parameters in the WZ and hurdle models.30
Tables 7 and 8 summarize the specification test results for the models with
covariates included and the intercept-only models, respectively. In the first
30Considering the tea model as an example, and using the reparameterizations $\hat\beta = \hat\beta_2 = 0.124$
and $\hat\alpha = \hat\beta_1 - \hat\beta_2 = -0.598$, where $\hat\beta_1$ and $\hat\beta_2$ are the logit and truncated-geometric intercepts,
then $\hat\psi = 0.277 = (1 - \exp(-0.598))/(1 + \exp(-0.474))$.
Table 6
Estimation results: Intercept-only models.a

                 Geometric    Logit        Truncated    WZ
                                           geometric
COFFEE
  INT            0.995        1.032        0.982        0.982
                 (0.027)      (0.052)      (0.031)      (0.031)
                 [0.024]
  $\psi$                                                -0.0137
                                                        (0.016)
  L, $\Lambda$   -4105.35     -1094.00     -3010.99     -4105.00
TEA
  INT            -0.201       -0.474       0.124        0.124
                 (0.034)      (0.047)      (0.051)      (0.051)
                 [0.038]
  $\psi$                                                0.277
                                                        (0.027)
  L, $\Lambda$   -2376.77     -1265.09     -1074.18     -2339.28
MILK
  INT            0.104        0.390        -0.150       -0.150
                 (0.032)      (0.047)      (0.044)      (0.044)
                 [0.026]
  $\psi$                                                -0.289
                                                        (0.039)
  L, $\Lambda$   -2772.73     -1281.51     -1455.23     -2736.73

aFigures in parentheses are estimated asymptotic standard errors derived from the negative
inverse Hessian of L evaluated at the ML estimates. Figures in square brackets are estimated
asymptotic standard errors derived from the parameter covariance estimates obtained using the
method proposed by White (1982) and Royall (1984).
rows are presented the results of White’s IM test applied to the basic
geometric models. As argued in section 3, the IM test can be viewed as an
omnibus test for model misspecification. For both the models with covariates
and the intercept-only models, the IM tests strongly suggest that the basic
model is a misspecification of the dgp, as the null hypothesis of no misspecifi-
cation is rejected in all but one instance at greater than the 0.9999 level.31
31When an intercept term is included in the $X_t$ vectors, the presence of 0-1 dummy variables in
$X_t$ reduces the number of distinct upper triangular elements in $\nabla^2 L_t$ to at most $q = 0.5m(m + 1) - d$,
where $L_t$ is the contribution of the tth observation to the restricted loglikelihood, m is the
number of columns in $X_t$, and d is the number of 0-1 dummy variables. Since in the present
application m = 7 and d = 3, the information matrix test statistics in the models with covariates
are distributed $\chi^2_{25}$. The method proposed by Lancaster (1984, eq. 6) is used to calculate the IM
test statistics.
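The degrees-of-freedom count in this footnote is simple arithmetic, sketched here for convenience; the function name is an assumption.

```python
def im_test_degrees_of_freedom(m, d):
    """Distinct upper-triangular Hessian elements, less those lost to 0-1 dummies."""
    return m * (m + 1) // 2 - d

# Present application: seven regressors (including the intercept), three dummies.
q = im_test_degrees_of_freedom(m=7, d=3)
print(q)
```

With m = 7 the Hessian has 28 distinct upper-triangular elements, and the three dummies remove one each.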
Table 7
Specification test results (models with covariates included).a
aFigures in parentheses are $\Pr(\chi^2_q < s)$, where s is the test statistic and q is the degrees of
freedom of the test statistic; a value of 0.9999 signifies $\Pr(\chi^2_q < s) \geq 0.9999$. Figures in square
brackets are the degrees of freedom of the test statistics. The test statistics have asymptotic central
$\chi^2_q$ distributions under the null.
The results of the tests designed to test the restricted models against their
corresponding hurdle variants are presented in rows 2-7 of tables 7 and 8. For
each model, the range of the six test statistics is quite small, and in all cases
except the intercept-only coffee model, rejection of the parameter restrictions
specified under the null hypothesis is indicated.32 Moreover, in each instance
32White (1982, p. 8) has noted that the use of the standard likelihood ratio test is not
appropriate in cases where the probability densities that form the sample likelihood function are
misspecified. In addition, the standard Hausman test approach uses estimates of the two
covariance matrixes that are consistent under the null hypothesis of no misspecification; accordingly,
no attempt was made to utilize alternative covariance estimators in calculating the Hausman
tests. The covariance estimates used to construct the Hausman test statistics are the inverses of the
matrixes in (A.1)-(A.3) in appendix A evaluated at the ML estimates. As shown in appendix A,
these estimates guarantee that the difference of the covariance matrix estimates in (29) will be
positive semidefinite as required for the Hausman test.
Table 8
Specification test results (intercept-only models).a

                                        COFFEE      TEA         MILK
Hurdle models
  Likelihood ratio [1]                  0.6916      74.9872     72.0001
                                        (0.5964)    (0.9999)    (0.9999)
  Score (Hessian) [1]                   0.6951      14.8446     71.7445
                                        (0.5956)    (0.9999)    (0.9999)
  Score (Gradient) [1]                  0.7616      82.3494     80.1355
                                        (0.6172)    (0.9999)    (0.9999)
  Score (White-Engle) [1]               0.7512      81.3942     16.2101
                                        (0.6139)    (0.9999)    (0.9999)
  Hausman test: geometric               0.6906      70.5412     69.1072
    vs. logit [1]                       (0.5940)    (0.9999)    (0.9999)
  Hausman test: geometric               0.1010      74.5735     71.1284
    vs. truncated-geometric [1]         (0.6000)    (0.9999)    (0.9999)
With-zeros models
  Likelihood ratio [1]                  0.6916      74.9872     72.0001
                                        (0.5964)    (0.9999)    (0.9999)
  Score (Hessian) [1]                   0.6976      119.125     58.8129
                                        (0.5964)    (0.9999)    (0.9999)
  Score (Gradient) [1]                  0.7616      82.3494     80.1355
                                        (0.6172)    (0.9999)    (0.9999)
  Score (White-Engle) [1]               0.7484      11.0294     71.5301
                                        (0.6130)    (0.9999)    (0.9999)

aFigures in parentheses are $\Pr(\chi^2_q < s)$, where s is the test statistic and q is the degrees of
freedom of the test statistic; a value of 0.9999 signifies $\Pr(\chi^2_q < s) \geq 0.9999$. Figures in square
brackets are the degrees of freedom of the test statistics. The test statistics have asymptotic central
$\chi^2_q$ distributions under the null.
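The parenthesized probabilities for the one-degree-of-freedom statistics can be reproduced with the error function, since $\Pr(\chi^2_1 < s) = \mathrm{erf}(\sqrt{s/2})$. The sketch below applies this identity to three of the coffee entries in table 8; the function name is an assumption, and only the standard library is used.

```python
import math

def chi2_cdf_1df(s):
    """Pr(chi-square with one degree of freedom < s)."""
    return math.erf(math.sqrt(s / 2.0))

# Coffee entries from table 8: statistic and reported probability.
reported = [(0.7616, 0.6172), (0.7512, 0.6139), (0.6906, 0.5940)]
for stat, prob in reported:
    print(stat, prob, round(chi2_cdf_1df(stat), 4))
```

For statistics with more degrees of freedom a general chi-square distribution function would be needed, which is outside the scope of this sketch.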
the score test statistics calculated using the White-Engle approach are smaller
and larger than those calculated using the gradient outer product and negative
Hessian, respectively, to estimate the information matrix.
It is interesting that the range of the test statistics for each model is
relatively small: although each statistic has the same asymptotic distribution
under the null hypothesis, the similarity of their finite-sample behavior when
rejection of the null is favored was not anticipated ex ante.
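In the intercept-only case the Hausman statistic is a scalar ratio, so it can be approximated from the rounded entries of table 6. The sketch below uses the coffee estimates (geometric intercept 0.995 with standard error 0.027; logit intercept 1.032 with standard error 0.052) and compares the result with the value 0.6906 reported in table 8. The function name is an assumption, and agreement can only be approximate because the inputs are rounded to three decimals.

```python
def hausman_scalar(b_unrestricted, se_unrestricted, b_restricted, se_restricted):
    """Scalar Hausman statistic: squared difference over the variance difference."""
    diff = b_unrestricted - b_restricted
    return diff ** 2 / (se_unrestricted ** 2 - se_restricted ** 2)

# Coffee, intercept-only: logit (inefficient) versus geometric (efficient under H0).
h_logit = hausman_scalar(1.032, 0.052, 0.995, 0.027)
print(round(h_logit, 3))
```

The denominator is positive because the restricted estimator is the more precise of the two under the null.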
The results of the tests of the basic models against the corresponding WZ
specifications are presented in rows 8-11 of tables 7 and 8. Again the values of
the test statistics fall within narrow ranges. Two results are particularly
interesting here. First, in the covariates-included and intercept-only tea mod-
els, the score test calculated using the White-Engle method is smaller than
those based on the gradient outer product and negative Hessian methods.
Second, although still indicating rejection of the null hypothesis, the test
statistics for the coffee model with covariates included are substantially
smaller than those for the tea and milk models. In the intercept-only coffee
model, none of the test statistics recommends rejection of the null at conven-
tional confidence levels. Upon examination of the ratio of the estimates of $\psi$
to their asymptotic standard errors in the coffee models, such results are not
surprising.
5. Summary
This paper has explored the specification and testing of some variants on
familiar count data models. The alternative specifications considered were
termed hurdle and with-zeros models, from which the familiar models were
demonstrated to arise through parameter restrictions. Both alternatives were
shown to allow for a degree of flexibility in model specification that is
precluded by the basic model. In particular, it was seen that overdispersion
and underdispersion could be accounted for by both alternatives. Score,
Hausman, and information matrix tests for misspecification were proposed.
The ideas were illustrated by estimating count data models of beverage
consumption using survey microdata. In virtually all instances, the specifica-
tion tests recommended rejection of the null hypothesis of no misspecification.
For a given model, the different test statistics tended to behave quite similarly.
Appendix A
Since the ML logit and truncated-geometric estimators of the geometric
model fail to utilize all sample information, their inefficiency relative to the
ML geometric estimator follows immediately from the fact that the geometric
estimator, which uses all sample information, is FIML. The following demon-
stration is illustrative. Let $L^G$, $\Lambda^L$, and $\Lambda^T$ denote the loglikelihood functions
of geometric, logit, and truncated-geometric models. Then
(A.2)
Since $y_t \geq 0$ for all t, then $\Theta_G$, $\Theta_L$, and $\Theta_T$ are each positive semidefinite. It is
easy to see from (A.1)-(A.3) that $(\Theta_G - \Theta_L) = \Theta_T$ and $(\Theta_G - \Theta_T) = \Theta_L$ if the
$\Theta_i$ are all evaluated at the same $\beta$. The logit and truncated-geometric estimators
are inefficient relative to the geometric estimator since both $(\Theta_G - \Theta_L)$
and $(\Theta_G - \Theta_T)$, and therefore $(\Theta_L^{-1} - \Theta_G^{-1})$ and $(\Theta_T^{-1} - \Theta_G^{-1})$, are positive
semidefinite.
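The additive relation among the three matrices can be checked numerically in a scalar-covariate case. The sketch below assumes that each $\Theta$ is the negative Hessian of the corresponding loglikelihood with $\gamma_t = \exp(X_t\beta)$, which gives per-observation weights of $(y_t + 1)$, 1, and $y_t$ times $\gamma_t X_t^2/(1 + \gamma_t)^2$ for the geometric, logit, and truncated-geometric models, respectively. These expressions are derived here by direct differentiation rather than copied from (A.1)-(A.3), and the data and names are hypothetical.

```python
import math

def information_terms(y, x, beta):
    """Negative Hessians (scalar beta) of geometric, logit and truncated-geometric models."""
    theta_g = theta_l = theta_t = 0.0
    for yt, xt in zip(y, x):
        gam = math.exp(xt * beta)
        w = gam / (1.0 + gam) ** 2 * xt * xt
        theta_g += (yt + 1) * w  # geometric
        theta_l += w             # logit: every observation contributes
        theta_t += yt * w        # truncated geometric: zeros contribute nothing
    return theta_g, theta_l, theta_t

y = [0, 2, 1, 0, 4, 3, 0, 1]
x = [1.0, 0.5, 1.5, 0.2, 2.0, 0.7, 1.1, 0.9]
theta_g, theta_l, theta_t = information_terms(y, x, beta=0.25)
print(abs(theta_g - (theta_l + theta_t)) < 1e-12)
```

With counts confined to zeros and ones the logit term dominates the truncated-geometric term, and with strictly positive counts the ordering reverses, which matches the two extreme cases discussed next.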
The relative efficiencies of the logit and truncated-geometric estimators
cannot in general be determined without knowledge of the sample $(y_t, X_t)$
values. In one extreme case where all $y_t$ tend toward zeros and ones, it can be
seen from (A.2) and (A.3) that $(\Theta_L - \Theta_T)$ becomes positive semidefinite, so
that logit is efficient relative to truncated-geometric.33 In another extreme
instance where all $y_t$ tend toward strictly positive integers, $(\Theta_T - \Theta_L)$ becomes
positive semidefinite, so that truncated-geometric is efficient relative to
logit.
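
The orderings described in this appendix can be checked numerically. The sketch below is illustrative only: it assumes a geometric model with mean $\exp(X_t\beta)$, builds the three information matrices as weighted cross-products of the regressors (weights $\exp(X_t\beta)/[1+\exp(X_t\beta)]^2$, multiplied by $1+y_t$, $1$, and $y_t$ for the geometric, logit, and truncated-geometric cases), and uses simulated data. The function names and the simulation design are hypothetical and are not taken from the paper.

```python
import numpy as np

def information_matrices(y, X, beta):
    """Negative Hessians of the geometric, logit and truncated-geometric
    loglikelihoods, assuming a geometric model with mean exp(X @ beta)."""
    lam = np.exp(X @ beta)
    w = lam / (1.0 + lam) ** 2

    def weighted_cross_product(weights):
        return (X * weights[:, None]).T @ X

    theta_L = weighted_cross_product(w)              # binary (logit) part
    theta_T = weighted_cross_product(w * y)          # rows with y = 0 drop out
    theta_G = weighted_cross_product(w * (1.0 + y))  # full geometric model
    return theta_G, theta_L, theta_T

def is_psd(M, tol=1e-9):
    """True if the symmetrized matrix has no eigenvalue below -tol."""
    return bool(np.all(np.linalg.eigvalsh((M + M.T) / 2.0) >= -tol))

# Simulated data (assumed design, for illustration only).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.2, 0.5])
lam = np.exp(X @ beta)
# NumPy's geometric counts trials (support 1, 2, ...); subtracting 1 gives
# counts on 0, 1, 2, ... with mean lam when the success probability is 1/(1+lam).
y = (rng.geometric(1.0 / (1.0 + lam)) - 1).astype(float)

theta_G, theta_L, theta_T = information_matrices(y, X, beta)

# Decomposition: geometric information = logit + truncated-geometric.
decomposition_holds = np.allclose(theta_G, theta_L + theta_T)

# Both limited-information estimators have weakly larger covariance matrices.
logit_gap_psd = is_psd(np.linalg.inv(theta_L) - np.linalg.inv(theta_G))
trunc_gap_psd = is_psd(np.linalg.inv(theta_T) - np.linalg.inv(theta_G))

# Extreme case 1: only zeros and ones, so logit dominates truncated-geometric.
_, L01, T01 = information_matrices(np.minimum(y, 1.0), X, beta)
zeros_ones_case = is_psd(L01 - T01)

# Extreme case 2: strictly positive counts, so truncated-geometric dominates.
_, Lpos, Tpos = information_matrices(y + 1.0, X, beta)
all_positive_case = is_psd(Tpos - Lpos)

print(decomposition_holds, logit_gap_psd, trunc_gap_psd,
      zeros_ones_case, all_positive_case)
```

Outside the two extreme cases neither difference need be semidefinite, which is the sense in which the ranking of logit and truncated-geometric depends on the sample values.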
Appendix B
This appendix presents the gradient vectors and Hessian matrixes for the
geometric hurdle and WZ models, which are used to construct the specifica-
tion tests described in section 3 and implemented in section 4. For economy of
space, the corresponding Poisson formulae are omitted here, but are available
on request from the author.
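The loglikelihood function of the geometric hurdle model (20) written in
terms of the parameters $(\beta + \alpha) = \beta_1$ and $\beta = \beta_2$ is

$$\Lambda^{GH} = \sum_{t\in\Omega_0} -\log\{1+\exp[X_t(\beta+\alpha)]\} + \sum_{t\in\Omega_1}\Big( X_t(\beta+\alpha) - \log\{1+\exp[X_t(\beta+\alpha)]\} + (y_t-1)X_t\beta - y_t\log[1+\exp(X_t\beta)] \Big), \tag{B.1}$$

$$\Lambda^{GH}_{\alpha} = \sum_{t\in\Omega_0}\left(\frac{-\exp[X_t(\beta+\alpha)]}{1+\exp[X_t(\beta+\alpha)]}\right)X_t' + \sum_{t\in\Omega_1}\left(\frac{1}{1+\exp[X_t(\beta+\alpha)]}\right)X_t', \tag{B.2}$$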
33 Even in this extreme case the geometric estimator remains efficient relative to the logit
estimator, as the former uses information on the magnitude of the positive $y_t$ while the latter
recognizes only their sign.
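$$\Lambda^{GH}_{\alpha}\Big|_{\alpha=0} = \sum_{t\in\Omega_0}\left(\frac{-\exp(X_t\beta)}{1+\exp(X_t\beta)}\right)X_t' + \sum_{t\in\Omega_1}\left(\frac{1}{1+\exp(X_t\beta)}\right)X_t', \tag{B.2'}$$

$$\Lambda^{GH}_{\beta} = \sum_{t\in\Omega_0}\left(\frac{-\exp[X_t(\beta+\alpha)]}{1+\exp[X_t(\beta+\alpha)]}\right)X_t' + \sum_{t\in\Omega_1}\left[\frac{1}{1+\exp[X_t(\beta+\alpha)]} + \frac{y_t-1-\exp(X_t\beta)}{1+\exp(X_t\beta)}\right]X_t', \tag{B.3}$$

$$\Lambda^{GH}_{\beta}\Big|_{\alpha=0} = \sum_{t\in\Omega_0\cup\Omega_1}\left(\frac{y_t-\exp(X_t\beta)}{1+\exp(X_t\beta)}\right)X_t', \tag{B.3'}$$

$$\Lambda^{GH}_{\alpha\alpha} = \Lambda^{GH}_{\alpha\beta} = -\sum_{t\in\Omega_0\cup\Omega_1}\left(\frac{\exp[X_t(\beta+\alpha)]}{\{1+\exp[X_t(\beta+\alpha)]\}^{2}}\right)X_t'X_t, \tag{B.4}$$

$$\Lambda^{GH}_{\beta\beta} = \Lambda^{GH}_{\alpha\alpha} - \sum_{t\in\Omega_1}\left(\frac{y_t\exp(X_t\beta)}{[1+\exp(X_t\beta)]^{2}}\right)X_t'X_t, \tag{B.5}$$

$$\Lambda^{GH}_{\beta\beta}\Big|_{\alpha=0} = -\sum_{t\in\Omega_0\cup\Omega_1}\left(\frac{(y_t+1)\exp(X_t\beta)}{[1+\exp(X_t\beta)]^{2}}\right)X_t'X_t, \tag{B.5'}$$

$$\Lambda^{GH}_{\cdot\cdot} = \begin{bmatrix} \Lambda^{GH}_{\alpha\alpha} & \Lambda^{GH}_{\alpha\beta} \\ \Lambda^{GH}_{\beta\alpha} & \Lambda^{GH}_{\beta\beta} \end{bmatrix}. \tag{B.6}$$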
Note that adding to and subtracting from the numerator of each term in the
$\Omega_1$ summation in (B.2') the expression $[y_t - \exp(X_t\beta)]$, and using (B.3'), the
non-zero elements of the score vector (B.2') can be expressed as

$$\begin{bmatrix} A^{-1}B(B-A)^{-1} & -(B-A)^{-1} \\ -(B-A)^{-1} & (B-A)^{-1} \end{bmatrix},$$

the (1,1) block of $(-\Lambda^{GH}_{\cdot\cdot})^{-1}$ required to calculate the score test is given by
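$$\Lambda^{GZ}_{\beta} = \sum_{t\in\Omega_0}\left(\frac{\psi\exp(X_t\beta)}{\psi\exp(X_t\beta)+1} - \frac{\exp(X_t\beta)}{1+\exp(X_t\beta)}\right)X_t' + \sum_{t\in\Omega_1}\left(\frac{y_t-\exp(X_t\beta)}{1+\exp(X_t\beta)}\right)X_t' \tag{B.9}$$

$$= \sum_{t\in\Omega_0\cup\Omega_1}\left(\frac{y_t-\exp(X_t\beta)}{1+\exp(X_t\beta)}\right)X_t', \tag{B.9'}$$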
(B.lO)
(B.lO’)
(B.ll’)
where $T_1 = \#\Omega_1$, and the equalities in (B.8')-(B.12') hold under the restriction
$\psi = 0$. Two points are noteworthy. First, unlike the geometric hurdle model,
the non-zero elements of the score vector (B.8') are not clearly interpretable as
a function of residuals. Second, a simplification analogous to (B.7) is not
apparent here, so that the full Hessian of $-\Lambda^{GZ}$ would have to be inverted to
calculate the score test statistic.
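
To make the contrast with the hurdle case concrete, the following is a minimal sketch of how a score test of the geometric model against the geometric hurdle alternative might be computed from the restricted (geometric) ML estimates alone. It rests on several assumptions not confirmed by the text above: the geometric model has mean $\exp(X_t\beta)$; $\Omega_1$ is the set of positive observations; under $\alpha = 0$ the negative Hessian has the block form with $A = \sum_t w_t X_t'X_t$ and $B = \sum_t (1+y_t) w_t X_t'X_t$, where $w_t = \exp(X_t\beta)/[1+\exp(X_t\beta)]^2$; and the information matrix is estimated by the negative Hessian. The function names `fit_geometric` and `hurdle_score_test` are hypothetical.

```python
import numpy as np

def fit_geometric(y, X, iters=50, tol=1e-10):
    """Newton-Raphson ML for a geometric model with mean exp(X @ beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        lam = np.exp(X @ beta)
        grad = X.T @ ((y - lam) / (1.0 + lam))
        neg_hess = (X * ((1.0 + y) * lam / (1.0 + lam) ** 2)[:, None]).T @ X
        step = np.linalg.solve(neg_hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def hurdle_score_test(y, X, beta_hat):
    """Score statistic for alpha = 0, evaluated at the restricted estimates."""
    lam = np.exp(X @ beta_hat)
    pos = y > 0
    w = lam / (1.0 + lam) ** 2
    # Score for alpha in residual form: positive observations only.
    score = -X[pos].T @ ((y[pos] - 1.0 - lam[pos]) / (1.0 + lam[pos]))
    # Same score written directly as a sum over zeros and positives.
    score_raw = (X[~pos].T @ (-lam[~pos] / (1.0 + lam[~pos]))
                 + X[pos].T @ (1.0 / (1.0 + lam[pos])))
    A = (X * w[:, None]).T @ X
    B = (X * ((1.0 + y) * w)[:, None]).T @ X
    # (1,1) block of the inverse of [[A, A], [A, B]].
    block = np.linalg.inv(A) @ B @ np.linalg.inv(B - A)
    stat_block = float(score @ block @ score)
    # Cross-check by inverting the full negative Hessian; the beta part of
    # the score is zero at the restricted ML estimates.
    k = X.shape[1]
    full = np.block([[A, A], [A, B]])
    full_score = np.concatenate([score, np.zeros(k)])
    stat_full = float(full_score @ np.linalg.solve(full, full_score))
    return stat_block, stat_full, score, score_raw

# Simulated data generated under the null (assumed design, illustration only).
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, 0.4])
y = (rng.geometric(1.0 / (1.0 + np.exp(X @ beta_true))) - 1).astype(float)

beta_hat = fit_geometric(y, X)
stat_block, stat_full, score, score_raw = hurdle_score_test(y, X, beta_hat)
print(stat_block, stat_full)
```

Under the usual regularity conditions such a statistic would be compared with a chi-squared critical value with as many degrees of freedom as there are elements in $\alpha$. The block shortcut avoids inverting the full Hessian in the hurdle case, whereas for the WZ model the discussion above indicates that no comparable shortcut is apparent.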
References
Amemiya, T., 1984, Tobit models: A survey, Journal of Econometrics 24, 3-61.
Breusch, T. and A.R. Pagan, 1980, The Lagrange multiplier test and its application to model
specification in econometrics, Review of Economic Studies 47, 239-253.
Cameron, A.C. and P.K. Trivedi, 1986, Econometric models based on count data: Comparisons
and applications of some estimators and tests, Journal of Applied Econometrics 1, 29-53.
Cox, D.R., 1983, Some remarks on overdispersion, Biometrika 70, 269-274.
Cragg, J.G., 1971, Some statistical models for limited dependent variables with application to the
demand for durable goods, Econometrica 39, 829-844.
Davidson, R. and J.G. MacKinnon, 1984a, Convenient specification tests for logit and probit
models, Journal of Econometrics 25, 241-262.
Davidson, R. and J.G. MacKinnon, 1984b, Model specification tests based on artificial linear
regressions, International Economic Review 25, 485-502.
Engle, R.F., 1984, Wald, likelihood ratio, and Lagrange multiplier tests in econometrics, in: Z.
Griliches and M.D. Intriligator, eds., Handbook of econometrics, Vol. II (North-Holland,
Amsterdam) 775-826.
Gourieroux, C., A. Monfort and A. Trognon, 1984a, Pseudo maximum likelihood methods:
Theory, Econometrica 52, 681-700.
Gourieroux, C., A. Monfort and A. Trognon, 1984b, Pseudo maximum likelihood methods:
Applications to Poisson models, Econometrica 52, 701-720.
Hausman, J.A., 1978, Specification tests in econometrics, Econometrica 46, 1251-1271.
Hausman, J.A., B. Hall and Z. Griliches, 1984, Econometric methods for count data with an
application to the patents-R&D relationship, Econometrica 52, 909-938.
Hausman, J.A., B. Ostro and D. Wise, 1984, Air pollution and lost work, Working paper no. 1263
(National Bureau of Economic Research, Cambridge, MA).
Johnson, N.L. and S. Kotz, 1969, Distributions in statistics: Discrete distributions (Wiley, New
York).
Lancaster, T., 1984, The covariance matrix of the information matrix test, Econometrica 52,
1051-1053.
Lee, L.-F., 1984a, Specification tests for Poisson regression models, Discussion paper no. 208
(Center for Economic Research, University of Minnesota, Minneapolis, MN).
Lee, L.-F., 1984b, Comment to tests of specification in econometrics, Econometric Reviews 3,
257-259.
Lin, T.-F. and P. Schmidt, 1984, A test of the Tobit specification against an alternative suggested
by Cragg, Review of Economics and Statistics 66, 174-177.
Manning, W., L. Lillard and C.E. Phelps, 1983, Preventive medical care and its consequences
(Rand Corporation, Santa Monica, CA).
McCullagh, P. and J.A. Nelder, 1983, Generalized linear models (Chapman and Hall, London).
Portney, P.R. and J. Mullahy, 1986, Urban air quality and acute respiratory illness, Journal of
Urban Economics 20, 21-38.
Rosenzweig, M.R. and K.I. Wolpin, 1982, Governmental interventions and household behavior in
a developing country: Anticipating the unanticipated consequences of social programs, Jour-
nal of Development Economics 10, 209-225.
Royall, R.M., 1984, Robust inference using maximum likelihood estimators, Working paper no.
549 (Department of Biostatistics, Johns Hopkins University, Baltimore, MD).
Ruud, P.A., 1984, Tests of specification in econometrics, Econometric Reviews 3, 211-242.
Stapleton, D.C. and D.J. Young, 1984, Censored normal regression with measurement error on the
dependent variable, Econometrica 52, 737-760.
Terza, J.V., 1985, A Tobit-type estimator for the censored Poisson regression model, Economics
Letters 18, 361-365.
U.S. Department of Commerce, 1982, Micro-data tape documentation for wave I and II of the
national survey of personal health practices and consequences, NTIS publication no. PB83-
104315.
White, H., 1982, Maximum likelihood estimation of misspecified models, Econometrica 50, 1-25.
White, H., 1983, Corrigendum, Econometrica 51, 513.