Specification and Testing of Some Modified Count Data Models

Journal of Econometrics 33 (1986) 341-365.
North-Holland
SPECIFICATION AND TESTING OF SOME MODIFIED COUNT DATA MODELS*
John MULLAHY
Yale University, New Haven, CT 06520, USA
Resources for the Future, Washington, DC 20036, USA
Received October 1984, final version received April 1986
This paper explores the specification and testing of some modified count data models. These
alternatives permit more flexible specification of the data-generating process (dgp) than do
familiar count data models (e.g., the Poisson), and provide a natural means for modeling data that
are over- or underdispersed by the standards of the basic models. In the cases considered, the
familiar forms of the distributions result as parameter-restricted versions of the proposed modified
distributions. Accordingly, score tests of the restrictions that use only the easily-computed ML
estimates of the standard models are proposed. The tests proposed by Hausman (1978) and White
(1982) are also considered. The tests are then applied to count data models estimated using survey
microdata on beverage consumption.
1. Introduction
Interest in corner-solution problems in econometrics has given rise to an
assortment of methods designed to allow consistent parameter estimation in a
corresponding assortment of model specifications. Recent extensions of the
econometric research on corner-solution outcomes have concentrated on the
specification and testing of models for non-negative data that are measured as
integers, or count data.1 In addition to discussion of basic model structures,
*This paper is a substantial revision of a paper originally circulated as ‘Hurdle Models for
Discrete and Grouped Dependent Variables’. The research has been supported in part by a
cooperative agreement between the U.S. Environmental Protection Agency and Resources for the
Future but should not be inferred to represent views of EPA. Thanks are due to two referees and
Paul Portney for helpful and constructive comments on earlier drafts. Any errors that remain must
be attributed to the author.
1Much of the interest in count data modeling appears to stem from the recognition that the
use of continuous distributions to model integer outcomes might have unwelcome consequences,
including inconsistent parameter estimates. Even if it is maintained that the integer outcomes are
generated by latent continuous variates, the arguments presented by Stapleton and Young (1984)
suggest that because such integer outcomes are actually continuous realizations measured with
error, inconsistent parameter estimates can result if standard methods like ML Tobit are used for
estimation. Rosenzweig and Wolpin (1982) explicitly rule out the use of a Tobit estimator in their
analysis of fertility outcomes, in which the fertility measure used is the number of children born to
a mother during some time interval. In another application, Portney and Mullahy (1986) conduct
some tests for a Tobit specification of their count measure and find considerable evidence of
misspecification.
0304-4076/86/$3.50 © 1986, Elsevier Science Publishers B.V. (North-Holland)

342 J. Mullahy, Modified count data models
the interesting complications introduced by random and fixed effects, overdispersion,
distributional misspecification, censoring, and multivariate dependent
variables have also been treated in the econometrics literature.2
One issue yet to be addressed in detail is whether the statistical model
governing the binary outcome of the count being either zero or positive might
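differ from that determining the magnitude of the positive counts. In standard3
count data models familiar to economists (e.g., the Poisson), these two
processes are constrained to be identical. That is, letting $\phi_1(y, \theta_1)$ and
$\phi_2(y, \theta_2)$ be two functions defined on $y \in \Gamma = \{0, 1, 2, \ldots\}$ satisfying $\phi_1, \phi_2 \geq 0$
and
$$\phi_1(0, \theta_1) + \sum_{y \in \Gamma_+} \phi_2(y, \theta_2) = 1, \tag{1}$$
where $\Gamma_+ = \Gamma \setminus \{0\}$, a standard count data model specifies $\phi_1(y, \theta_1) = \phi_2(y, \theta_2)$
for all $y \in \Gamma$, so that
$$\sum_{y \in \Gamma} \phi_1(y, \theta_1) = \sum_{y \in \Gamma} \phi_2(y, \theta_2) = 1. \tag{2}$$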
Proposed here are two types of modifications to the basic count data models
in which (1) is satisfied but where $\phi_1(y, \theta_1) \neq \phi_2(y, \theta_2)$. These are termed
hurdle4 and with-zeros (WZ)5 models. While the two types of modifications in
general have different structures, it is shown later that they collapse into the
same model under some circumstances. The basic idea underlying these
modifications is that both permit the relative probabilities of zero and non-zero
realizations to differ from those implied by the parent distributions that they
modify.
A particularly interesting feature of the modified count data specifications
considered here is that they provide a natural means for modeling overdisper-
sion or underdispersion of the data.6 Specifically, overdispersion and underdis-
persion are viewed as arising from a misspecification of the maintained parent
dgp in which the relative probabilities of zero and non-zero (positive) realiza-
tions implied by the parent distribution are not supported by the data. By
2See variously Cameron and Trivedi (1986), Gourieroux, Montfort and Trognon (1984b),
Hausman, Hall and Griliches (1984), Hausman, Ostro and Wise (1984), Lee (1984a), Manning,
Lillard and Phelps (1983), and Terza (1985).
3The adjectives ‘standard’, ‘basic’, and ‘parent’ when describing count data models are used
interchangeably in this paper, and refer to models having structures like that in eq. (2) below.
4This term is borrowed from Cragg (1971).
5This term is used by Johnson and Kotz (1969, p. 205).
6Cox (1983) and McCullagh and Nelder (1983) are good references on overdispersion. Section 2
treats the issue in greater detail, and provides additional citations.
permitting flexible specification of the relative probabilities of zeros and
positives, the modified distributions represent alternatives to standard methods
for modeling overdispersion and underdispersion7 that might be appealing
in a variety of circumstances.
The paper is organized as follows. Section 2 describes the basic models and
the proposed alternatives. The ideas are illustrated in the context of the
Poisson and geometric distributions; however, the results can be extended in a
straightforward manner to other count data models of interest. Section 3
discusses alternative strategies for specification testing. Section 4 illustrates the
ideas with count data models estimated using survey microdata on beverage
consumption. Section 5 summarizes the paper.
2. Some modified count data models

This section considers the specification of two types of modified count data
models. First, the basic Poisson and geometric models, which are used to focus
the analysis, are summarized. The two types of modifications are then described
in detail. Finally, the relationships between overdispersion and the modified
distributions are discussed.
The Poisson distribution of count variate $Y_t$ is defined by
$$P(y, \lambda_t) = \begin{cases} \exp(-\lambda_t)\lambda_t^{y}/y!, & y \in \Gamma, \\ 0, & \text{else}, \end{cases} \tag{3}$$
with
$$\lambda_t = E(Y_t) = \mathrm{var}(Y_t) > 0.$$
Since $\lambda_t > 0$ the influence of covariates is admitted by specifying $\lambda_t = \exp(X_t\beta)$,
$X_t'$ and $\beta$ here and throughout being $k \times 1$ vectors of fixed covariates and
unknown parameters, respectively. Thus, the loglikelihood function for a
sample of $T$ independent observations (suppressing terms not depending on
$\beta$) is8
$$L^{P} = \sum_{t \in \Omega} \left[ -\exp(X_t\beta) + y_t X_t\beta \right], \tag{4}$$
7For example, a popular generalization of the Poisson model assumes that a random component
in the basic Poisson expectation function is gamma-distributed in the population, so that a
negative binomial model results as a gamma mixture of the Poissons. See Hausman, Hall and
Griliches (1984) and Cameron and Trivedi (1986) for additional discussion.
8In the following, $f_\delta = \partial f/\partial\delta$ denotes the vector of first partial derivatives of $f$ with respect to
$\delta$, and $f_{\delta\zeta} = \partial^2 f/\partial\delta\,\partial\zeta'$ is the matrix of second partials of $f$ with respect to $\delta$ and $\zeta'$. Where no
ambiguity is possible, $\nabla f$ and $\nabla^2 f$ are occasionally used to denote the gradient and Hessian of $f$.
$\Omega_0 = \{t \mid y_t = 0\}$, $\Omega_1 = \{t \mid y_t \in \Gamma_+\}$, and $\Omega = \Omega_0 \cup \Omega_1$. The symbols $L$ and $\Lambda$ refer in general to
loglikelihood functions of basic and modified models, respectively.
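where $y_t$ is the count for the $t$th observation. The ML estimate of $\beta$ satisfies
$$L^{P}_{\beta} = \sum_{t \in \Omega} [y_t - \exp(X_t\beta)] X_t' = 0, \tag{5}$$
and the Hessian of $L^{P}$ is
$$L^{P}_{\beta\beta} = \sum_{t \in \Omega} -\exp(X_t\beta) X_t' X_t. \tag{6}$$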
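The score (5) and Hessian (6) are all that a Newton-Raphson iteration requires. The following is a minimal sketch in Python, not the author's program: the function names, the data, the starting value, and the restriction to k = 2 (which permits a closed-form 2x2 inverse) are illustrative assumptions.

```python
import math

def poisson_score_hessian(beta, X, y):
    """Score (5) and Hessian (6) of the Poisson loglikelihood (4)."""
    k = len(beta)
    score = [0.0] * k
    hess = [[0.0] * k for _ in range(k)]
    for x_t, y_t in zip(X, y):
        lam = math.exp(sum(b * x for b, x in zip(beta, x_t)))  # exp(X_t beta)
        for i in range(k):
            score[i] += (y_t - lam) * x_t[i]
            for j in range(k):
                hess[i][j] -= lam * x_t[i] * x_t[j]
    return score, hess

def newton_raphson_k2(X, y, iters=25):
    """Newton-Raphson for k = 2; the first column of X is assumed to be an intercept."""
    # Start at the intercept-only solution log(mean(y)) with a zero slope.
    beta = [math.log(sum(y) / len(y)), 0.0]
    for _ in range(iters):
        s, h = poisson_score_hessian(beta, X, y)
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        step = [(h[1][1] * s[0] - h[0][1] * s[1]) / det,
                (h[0][0] * s[1] - h[1][0] * s[0]) / det]
        beta = [beta[0] - step[0], beta[1] - step[1]]  # beta - H^{-1} s
    return beta

# Hypothetical data: an intercept and one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1, 0, 2, 3, 5]
beta_hat = newton_raphson_k2(X, y)
score_at_hat, _ = poisson_score_hessian(beta_hat, X, y)
print(beta_hat, score_at_hat)
```

Because the Hessian (6) is negative definite whenever the covariates are not collinear, the iteration is well behaved from reasonable starting values; at convergence both elements of the score are zero to machine precision.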
Should circumstances suggest an alternative to the Poisson distribution be
considered, the geometric distribution is one possible candidate.9 The geometric
model is not encumbered by the Poisson’s mean = variance property, as the
variance of a geometric variate exceeds its mean. The geometric characterizes
discrete decay phenomena in the sense that its probabilities obey $\Pr(y) > \Pr(y + 1)$
for all $y \in \Gamma$. Thus, the Poisson distribution is in one sense both
more flexible than the geometric, since non-decay count models [i.e., $\Pr(y) < \Pr(y + 1)$
for some $y$] can be admitted, and more restrictive, since such decay
processes obtain only for $\lambda < 1$ in the Poisson.
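The geometric distribution of a count variate $Y_t$ is defined by
$$\Pr(Y_t = y) = \begin{cases} \gamma_t^{y}/(1 + \gamma_t)^{y+1}, & y \in \Gamma, \\ 0, & \text{else}, \end{cases} \tag{7}$$
where
$$E(Y_t) = \gamma_t \quad \text{and} \quad \mathrm{var}(Y_t) = \gamma_t(1 + \gamma_t).$$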
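Since $\gamma_t > 0$, $\gamma_t = \exp(X_t\beta)$ is the obvious parameterization. The loglikelihood
function can thus be written as
$$L^{G} = \sum_{t \in \Omega} \{ y_t X_t\beta - (y_t + 1)\log[1 + \exp(X_t\beta)] \}. \tag{8}$$
The ML estimate of $\beta$ satisfies
$$L^{G}_{\beta} = \sum_{t \in \Omega} \{[y_t - \exp(X_t\beta)]/[1 + \exp(X_t\beta)]\} X_t' = 0, \tag{9}$$
and the Hessian is
$$L^{G}_{\beta\beta} = \sum_{t \in \Omega} -\{(y_t + 1)\exp(X_t\beta)/[1 + \exp(X_t\beta)]^2\} X_t' X_t. \tag{10}$$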
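The moment and score properties of the geometric model are easy to confirm numerically. The sketch below is illustrative only: the function names, the value of gamma, the truncation of the infinite support at 400 terms, and the data are assumptions. It checks that probabilities of the form $\gamma^{y}/(1+\gamma)^{y+1}$ sum to one with mean $\gamma$ and variance $\gamma(1+\gamma)$, and that in the intercept-only case the score (9) vanishes at $\beta = \log \bar{y}$.

```python
import math

def geometric_pmf(y, gamma):
    """Geometric probability gamma**y / (1 + gamma)**(y + 1)."""
    # Written with the ratio gamma / (1 + gamma) to avoid overflow for large y.
    return (gamma / (1.0 + gamma)) ** y / (1.0 + gamma)

def geometric_score(beta, y):
    """Score (9) when X_t contains only an intercept (X_t = 1)."""
    g = math.exp(beta)
    return sum((y_t - g) / (1.0 + g) for y_t in y)

gamma = 1.5
support = range(400)  # truncated support; the omitted tail is negligible here
total = sum(geometric_pmf(n, gamma) for n in support)
mean = sum(n * geometric_pmf(n, gamma) for n in support)
var = sum(n * n * geometric_pmf(n, gamma) for n in support) - mean ** 2

y = [0, 0, 1, 3, 4, 0, 2, 6]          # hypothetical counts
beta_hat = math.log(sum(y) / len(y))  # intercept-only ML estimate, log of the sample mean
print(total, mean, var, geometric_score(beta_hat, y))
```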
9Discussion is confined here to the geometric version of the negative binomial. The analysis is
extended to the general negative binomial distribution and other count data models in a
straightforward manner.
The first modified count data models considered here are termed hurdle
models, following the terminology developed by Cragg (1971).10 The idea
underlying the hurdle formulations is that a binomial probability model
governs the binary outcome of whether a count variate has a zero or a positive
realization. If the realization is positive, the ‘hurdle’ is crossed, and the
conditional distribution of the positives is governed by a truncated-at-zero
count data model. Formally, for a random variable $Y$, the conditional distribution
of the positives is $\phi_2(y, \theta_2)/\Phi_2(\theta_2)$, $y \in \Gamma_+$, where $\phi_2$ satisfies (2) and
$\Phi_2$, the summation of $\phi_2$ on the support of the conditional density, is the
truncation normalization. The probability that the threshold is crossed is
$\Phi_1(\theta_1)$.11 Thus, the general form of the hurdle model likelihood function is
$$\exp(\Lambda^{H}) = \prod_{t \in \Omega_0} [1 - \Phi_1(\theta_1)] \prod_{t \in \Omega_1} [\Phi_1(\theta_1)\phi_2(y_t, \theta_2)/\Phi_2(\theta_2)]. \tag{11}$$
When $\Phi_1(\theta_1) = \Phi_2(\theta_2)$, (11) reduces to
$$\exp(\Lambda^{H}) = \prod_{t \in \Omega_0} [1 - \Phi_2(\theta_2)] \prod_{t \in \Omega_1} [\phi_2(y_t, \theta_2)], \tag{12}$$
which resembles the likelihood function of a Tobit model. If $\Phi_1(\theta_1) = \Phi_2(\theta_2)$
as a result of parameter restrictions $\theta_1 = \theta_2$, then the model is akin to that
investigated for normal distributions by Cragg and by Lin and Schmidt (1984)
who demonstrate that Tobit results as a parameter-restricted version of one of
Cragg’s original specifications.
In any particular application, there will likely exist numerous plausible
specifications of both the binary probability model and the conditional distribution
of the positives. For present purposes, only specifications where (11)
reduces to (12) as a result of the parameter restrictions $\theta_1 = \theta_2$ are of interest,
the objective here being the development of count data analogs of Cragg’s
Tobit modifications.
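To motivate the Poisson hurdle specification, consider the dgp:
$$\Pr(y = 0) = \exp(-\lambda_1)\lambda_1^{y}/y! = \exp(-\lambda_1), \tag{13}$$
$$[1 - \Pr(y = 0)] = \sum_{y \in \Gamma_+} \Pr(y) = [1 - \exp(-\lambda_1)], \tag{14}$$
and
$$\Pr(y \mid y > 0) = \begin{cases} \lambda_2^{y}/\{[\exp(\lambda_2) - 1]\,y!\}, & y \in \Gamma_+, \\ 0, & \text{else}, \end{cases} \tag{15}$$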
10See Cragg (1971) or Lin and Schmidt (1984) for additional discussion.
11In general, $\Phi_1$, $\phi_2$, and $\Phi_2$ also depend on covariates $X_t$.
where $\lambda_1$ is the parameter of a Poisson/exponential distribution governing the
probability of observing a positive count and where (15) has the form of a
truncated- or positive-Poisson distribution. Parameterizing $\lambda_j$ as $\exp(X_t\beta_j)$,
the loglikelihood function based on (13)-(15) is
$$\begin{aligned}
\Lambda^{PH} &= \log\Big\{ \Big[ \prod_{t \in \Omega_0} \{\exp[-\exp(X_t\beta_1)]\} \prod_{t \in \Omega_1} \{1 - \exp[-\exp(X_t\beta_1)]\} \Big] \\
&\qquad \times \Big[ \prod_{t \in \Omega_1} \exp(y_t X_t\beta_2)/(\{\exp[\exp(X_t\beta_2)] - 1\}\,y_t!) \Big] \Big\} \\
&= [\Lambda^{P1}(\beta_1)] + [\Lambda^{P2}(\beta_2)],
\end{aligned} \tag{16}$$
which reduces to (4) when $\beta_1 = \beta_2$. $\Lambda^{P1}$ can be regarded as a loglikelihood
function for the binary (zero/positive) outcome and $\Lambda^{P2}$ as a loglikelihood
function for a truncated-Poisson model. Thus, the ML estimates of $\beta_1$ and $\beta_2$
can be obtained by separate maximization of $\Lambda^{P1}$ and $\Lambda^{P2}$, respectively.
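The additive structure of (16), and its collapse to the Poisson loglikelihood under $\beta_1 = \beta_2$, can be checked numerically. The following sketch is an illustration under stated assumptions: the function names, data, and parameter values are hypothetical, and the $-\log(y_t!)$ terms that (4) suppresses are retained in both functions so that the two totals are directly comparable.

```python
import math

def poisson_loglik(beta, X, y):
    """Poisson loglikelihood, keeping the -log(y!) terms suppressed in (4)."""
    ll = 0.0
    for x_t, y_t in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, x_t))
        ll += y_t * xb - math.exp(xb) - math.lgamma(y_t + 1)
    return ll

def poisson_hurdle_loglik(beta1, beta2, X, y):
    """Return the two additive pieces of (16): (binary part, truncated-Poisson part)."""
    binary, truncated = 0.0, 0.0
    for x_t, y_t in zip(X, y):
        lam1 = math.exp(sum(b * x for b, x in zip(beta1, x_t)))
        xb2 = sum(b * x for b, x in zip(beta2, x_t))
        lam2 = math.exp(xb2)
        if y_t == 0:
            binary += -lam1                         # log Pr(y = 0), eq. (13)
        else:
            binary += math.log1p(-math.exp(-lam1))  # log Pr(y > 0), eq. (14)
            # log of the truncated-Poisson probability, eq. (15)
            truncated += y_t * xb2 - math.log(math.expm1(lam2)) - math.lgamma(y_t + 1)
    return binary, truncated

# Hypothetical data and parameter values.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, -0.3], [1.0, 2.0], [1.0, 0.0]]
y = [0, 2, 0, 4, 1]
beta = [0.2, 0.3]
part1, part2 = poisson_hurdle_loglik(beta, beta, X, y)
print(part1 + part2, poisson_loglik(beta, X, y))
```

Since the binary part depends only on the first parameter vector and the truncated part only on the second, each can be handed to a separate optimizer.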
For the geometric model, an interesting hurdle specification corresponding
to (11) is the dgp:
$$\Pr(y = 0) = 1/(1 + \gamma_1), \tag{17}$$
$$[1 - \Pr(y = 0)] = \sum_{y \in \Gamma_+} \Pr(y) = \gamma_1/(1 + \gamma_1), \tag{18}$$
and
$$\Pr(y \mid y > 0) = \begin{cases} \gamma_2^{(y-1)}/[(1 + \gamma_2)^{y}], & y \in \Gamma_+, \\ 0, & \text{else}. \end{cases} \tag{19}$$
Parameterizing $\gamma_{jt} = \exp(X_t\beta_j)$, it is seen that the binomial probabilities (17)
and (18) are identically those of a standard binomial logit model.12 Eq. (19) is
in the form of a truncated-at-zero geometric model. The complete loglikelihood function based on
12This result suggests that when the basic specification is correct, the geometric parameters can
be estimated consistently, though not efficiently, by standard binomial logit programs. Such an
approach is analogous to consistent but inefficient estimation of a Tobit model’s parameters using
a probit model [see Amemiya (1984) and Ruud (1984)]. In the Tobit/probit case, however, only
the scaled parameters $\beta/\sigma$ can be estimated by probit. Due to the functional dependence of the
location and scale parameters in the geometric specification, the natural or unscaled parameters
can be estimated by logit. Appendix A demonstrates the relative inefficiency of both the logit and
truncated geometric estimators of the basic geometric model.
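(17)-(19) is
$$\begin{aligned}
\Lambda^{GH} &= \log\Big\{ \Big[ \prod_{t \in \Omega_0} \{1/[1 + \exp(X_t\beta_1)]\} \times \prod_{t \in \Omega_1} \{\exp(X_t\beta_1)/[1 + \exp(X_t\beta_1)]\} \Big] \\
&\qquad \times \Big[ \prod_{t \in \Omega_1} \exp[(y_t - 1)X_t\beta_2]/\{[1 + \exp(X_t\beta_2)]^{y_t}\} \Big] \Big\} \\
&= [\Lambda^{G1}(\beta_1)] + [\Lambda^{G2}(\beta_2)],
\end{aligned} \tag{20}$$
which reduces to (8) when $\beta_1 = \beta_2$. Again the ML estimates can be obtained
by separate maximization of $\Lambda^{G1}$ and $\Lambda^{G2}$.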
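Separate maximization is especially transparent when the only regressor is an intercept, because both pieces of (20) then have closed-form maximizers. The sketch below is an illustration rather than the paper's procedure: the function names and data are hypothetical, and the closed forms ($\gamma_1 = n_1/n_0$ for the logit part, $\gamma_2 = \bar{y}_+ - 1$ for the truncated part, with $\bar{y}_+$ the mean of the positive counts) are obtained here by setting the derivative of each piece to zero.

```python
import math

def logit_part(b1, y):
    """Lambda^{G1}: binary (zero/positive) loglikelihood with gamma_1 = exp(b1)."""
    n1 = sum(1 for v in y if v > 0)
    return n1 * b1 - len(y) * math.log1p(math.exp(b1))

def truncated_part(b2, y):
    """Lambda^{G2}: truncated-at-zero geometric loglikelihood with gamma_2 = exp(b2)."""
    return sum((v - 1) * b2 - v * math.log1p(math.exp(b2)) for v in y if v > 0)

def basic_geometric(b, y):
    """Loglikelihood (8) with an intercept only."""
    return sum(v * b - (v + 1) * math.log1p(math.exp(b)) for v in y)

y = [0, 0, 0, 1, 1, 2, 3, 5]  # hypothetical counts
positives = [v for v in y if v > 0]
n0 = len(y) - len(positives)
b1_hat = math.log(len(positives) / n0)                    # gamma_1 = n1 / n0
b2_hat = math.log(sum(positives) / len(positives) - 1.0)  # gamma_2 = mean of positives - 1
print(b1_hat, b2_hat)
print(logit_part(0.3, y) + truncated_part(0.3, y), basic_geometric(0.3, y))
```

The last line confirms the reduction to (8): with a common parameter value the two pieces sum to the basic geometric loglikelihood.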
The second class of modified count models is termed the WZ class,
following the terminology developed by Johnson and Kotz (1969, pp. 204-206).
Like the hurdle models, the idea motivating the WZ specifications is that the
conditional distribution of the positives is properly characterized by the
truncated-at-zero version of the parent distribution. The probabilities of
the positives relative to the probability of the zero outcome, however, are no
longer as specified by the parent distribution. Instead, the WZ model specifies
that the probability of the zero outcome is additively augmented or reduced by
an amount $\psi$, so that, in the notation of (1) and (2),
$$\phi_1(y, \psi, \theta) = \psi + (1 - \psi)\phi(y, \theta), \quad y = 0, \tag{21}$$
and
$$\phi_2(y, \psi, \theta) = (1 - \psi)\phi(y, \theta), \quad y \in \Gamma_+, \tag{22}$$
where $\phi(y, \theta)$ are the probabilities specified for all $y \in \Gamma$ by the parent
density, and the terms $(1 - \psi)$ ensure that (21) and (22) constitute a proper
discrete probability distribution.13 When $\psi > 0$ ($\psi < 0$), the relative probabilities
$\phi_1(0, \theta)/\phi_2(y, \theta)$, $y \in \Gamma_+$, are greater (less) than those specified by the
parent distribution; similarly, when $\psi > 0$ ($\psi < 0$) $E(Y)$ is less (greater) than in
the parent model. When $\psi = 0$, the basic distribution obtains.
The loglikelihood functions of the Poisson and geometric WZ models are,
respectively,
$$\Lambda^{PZ} = \sum_{t \in \Omega_0} \log\{\psi + (1 - \psi)\exp[-\exp(X_t\beta)]\} + \sum_{t \in \Omega_1} \{\log(1 - \psi) - \exp(X_t\beta) + y_t X_t\beta\}, \tag{23}$$
13Johnson and Kotz (1969, p. 205) note that the constraint $\psi \in \Psi = [-\phi_0/(1 - \phi_0), 1)$, where
$\phi_0 = \phi(0, \theta)$, is also required.
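and
$$\begin{aligned}
\Lambda^{GZ} &= \sum_{t \in \Omega_0} \{\log[\psi\exp(X_t\beta) + 1] - \log[1 + \exp(X_t\beta)]\} \\
&\quad + \sum_{t \in \Omega_1} \{\log(1 - \psi) + y_t X_t\beta - (1 + y_t)\log[1 + \exp(X_t\beta)]\},
\end{aligned} \tag{24}$$
where terms not depending on $(\beta, \psi)$ are suppressed.14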
Although the hurdle and WZ specifications represent different modifications
of the basic count data models, it is interesting that they collapse into the same
specification in the case where only an intercept is included in the $X_t$ vectors.
Here the basic models specify $E(Y) = \exp(\beta)$ and the hurdle models have
parameters $\beta_1 = \beta + \alpha$ and $\beta_2 = \beta$, where $\beta$, $\beta_1$, $\beta_2$, and $\alpha$ are scalars. Some
manipulation of (16) and (23) yields
$$\psi = \{1 - \exp[\exp(\beta) - \exp(\beta + \alpha)]\}/\{1 - \exp[\exp(\beta)]\}$$
in the Poisson hurdle model, while similar manipulation of (20) and (24) gives
$$\psi = [1 - \exp(\alpha)]/[1 + \exp(\beta + \alpha)]$$
in the geometric hurdle model. In both instances, $\mathrm{sign}(\alpha) = -\mathrm{sign}(\psi)$, and, for
finite $\beta$, $\psi$ approaches $\sup(\Psi)$ and $\inf(\Psi)$ as $\alpha$ approaches $-\infty$ and $+\infty$,
respectively. It is also the case in the intercept-only specifications that the ML
estimate of the intercept parameter in the WZ models is identical to that
obtained by ML estimation of the intercept parameter in the truncated-at-zero
variant of the parent distribution.15
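The intercept-only equivalence can be confirmed by computing both sets of probabilities directly. In the sketch below (function names and the values of $\beta$ and $\alpha$ are illustrative assumptions), the hurdle probabilities are built from a zero probability governed by $\exp(\beta + \alpha)$ and a truncated parent governed by $\exp(\beta)$, the WZ probabilities are built from (21)-(22), and the two expressions for $\psi$ given above link them.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)

def geometric_pmf(y, g):
    return g ** y / (1.0 + g) ** (y + 1)

def hurdle_pmf(y, pmf, theta1, theta2):
    """Hurdle probabilities: zero governed by theta1, positives by the truncated parent at theta2."""
    if y == 0:
        return pmf(0, theta1)
    return (1.0 - pmf(0, theta1)) * pmf(y, theta2) / (1.0 - pmf(0, theta2))

def wz_pmf(y, pmf, psi, theta):
    """With-zeros probabilities (21)-(22)."""
    base = (1.0 - psi) * pmf(y, theta)
    return psi + base if y == 0 else base

def psi_poisson(beta, alpha):
    return (1.0 - math.exp(math.exp(beta) - math.exp(beta + alpha))) / (1.0 - math.exp(math.exp(beta)))

def psi_geometric(beta, alpha):
    return (1.0 - math.exp(alpha)) / (1.0 + math.exp(beta + alpha))

beta = 0.4
checks = []
for alpha in (-0.7, 0.5):
    for pmf, psi_fn in ((poisson_pmf, psi_poisson), (geometric_pmf, psi_geometric)):
        psi = psi_fn(beta, alpha)
        gap = max(abs(hurdle_pmf(y, pmf, math.exp(beta + alpha), math.exp(beta))
                      - wz_pmf(y, pmf, psi, math.exp(beta))) for y in range(15))
        checks.append((alpha, psi, gap))
print(checks)
```

Each recorded gap is at the level of rounding error, and $\psi$ takes the opposite sign to $\alpha$ in every case.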
Overdispersion in count data models has been discussed extensively.16
Overdispersion is meaningful only in reference to some maintained dgp for y,
and for present purposes can be defined as a situation where the ratio
var( Y)/E(Y) exceeds that implied by the maintained dgp for y. For example,
overdispersion is present in the basic Poisson and geometric models if
var( Y)/E( Y) > 1 and var( Y)/E( Y) > 1 + E(Y), respectively. Underdisper-
sion is defined by reversing the inequalities.
14The gradients and Hessians of (20) and (24) are presented in appendix B.
15Johnson and Kotz (1969, pp. 205-206).
16See, for example, Cox (1983), Hausman, Hall and Griliches (1984), Cameron and Trivedi
(1986), and McCullagh and Nelder (1983).
To see that the hurdle models naturally admit overdispersion or underdispersion,
consider $\mathrm{var}(Y)/E(Y)$ in a general hurdle formulation. In the notation
of (11), if the dgp of $y$ has the hurdle structure then
$$\mathrm{var}(Y)/E(Y) = \Big\{ \sum y^2\phi_2(y, \theta_2) - (\Phi_1/\Phi_2)\Big[\sum y\,\phi_2(y, \theta_2)\Big]^2 \Big\} \Big/ \Big\{ \sum y\,\phi_2(y, \theta_2) \Big\}, \tag{25}$$
where each summation is over $y \in \Gamma$. If $(\Phi_1/\Phi_2) = 1$, then $\mathrm{var}(Y)/E(Y)$ is
identical to that given by the basic model. When $(\Phi_1/\Phi_2) \neq 1$, the hurdle
formulation characterizes overdispersion when $(\Phi_1/\Phi_2) \in (0, 1)$ and underdispersion
when $(\Phi_1/\Phi_2) \in (1, +\infty)$.
The WZ specifications allow for overdispersion and underdispersion in a
similar manner, viz,
$$\mathrm{var}(Y)/E(Y) = \Big\{ \sum y^2\phi(y, \theta) - (1 - \psi)\Big[\sum y\,\phi(y, \theta)\Big]^2 \Big\} \Big/ \Big\{ \sum y\,\phi(y, \theta) \Big\}, \tag{26}$$
where the summation is again over $y \in \Gamma$, and the $\phi(y, \theta)$ are as specified in
the basic model. Overdispersion and underdispersion are present as $\psi \in (0, 1)$
and $\psi \in [-\phi_0/(1 - \phi_0), 0)$, respectively, while the basic models result when
$\psi = 0$.
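A short numerical check of (26) for a Poisson parent follows. It is a sketch only: the function names, the parameter values, and the truncation of the sums at 100 terms are assumptions. For a Poisson parent the parent sums are $\lambda$ and $\lambda + \lambda^2$, so (26) simplifies to $1 + \psi\lambda$, which exceeds the Poisson ratio of one when $\psi > 0$ and falls below it when $\psi < 0$.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)

def wz_ratio_direct(psi, lam, n_terms=100):
    """var(Y)/E(Y) computed directly from the WZ probabilities (21)-(22)."""
    probs = [(1.0 - psi) * poisson_pmf(y, lam) for y in range(n_terms)]
    probs[0] += psi
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return (second - mean ** 2) / mean

def wz_ratio_formula(psi, lam, n_terms=100):
    """Right-hand side of (26), using sums over the parent distribution only."""
    m1 = sum(y * poisson_pmf(y, lam) for y in range(n_terms))
    m2 = sum(y * y * poisson_pmf(y, lam) for y in range(n_terms))
    return (m2 - (1.0 - psi) * m1 ** 2) / m1

lam = 2.0
over = wz_ratio_formula(0.2, lam)    # psi > 0: ratio above the Poisson value of 1
under = wz_ratio_formula(-0.1, lam)  # psi < 0 (within the admissible interval): ratio below 1
print(over, wz_ratio_direct(0.2, lam), under, wz_ratio_direct(-0.1, lam))
```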
Thus, tests of the null hypotheses $\theta_1 = \theta_2$ (i.e., $\Phi_1 = \Phi_2$) in the hurdle
models or $\psi = 0$ in the WZ models are implicitly tests of the null hypothesis of
no overdispersion or underdispersion of the kind described here. It should be
noted, however, that overdispersion and underdispersion can be manifested in
forms other than those examined here.
3. Specification testing
This section discusses several specification tests for the models described in
section 2. Score tests and Hausman (1978) tests are proposed for testing the
basic specifications against specific alternatives, the hurdle and WZ models.
White’s (1982) information matrix test is proposed as an omnibus test of the
null hypothesis that the basic specification is a correct characterization of the
dgp.
The score (or Lagrange multiplier) principle for specification testing in
econometrics has been discussed extensively.17 Because ML estimates of the
17See Breusch and Pagan (1980) and Engle (1984) for detailed discussions.
basic count data models can be easily obtained,18 the score test approach is
appealing here. The general form of the score statistic for testing $H_0$: $h(\theta) = 0$
is
$$\xi = s(\tilde\theta)'\,\mathcal{I}(\tilde\theta)^{-1}s(\tilde\theta), \tag{27}$$
where $s(\tilde\theta)$ is the $k \times 1$ score vector and $\mathcal{I}(\tilde\theta)$ is the $k \times k$ information matrix,
both evaluated at the ML estimates of the restricted model; $\theta = (\theta_1', \theta_2')'$ is the
$k \times 1$ parameter vector, where $\theta_1$ is $p \times 1$ and $\theta_2$ is $(k - p) \times 1$; and $h(\theta)$ is
an $r \times 1$ vector of restrictions, where for present purposes $h(\theta) = \theta_1 = 0$ and
$p = r$.19 Since some elements of $s(\tilde\theta)$ are identically zero, only the non-zero
subvector of $s(\tilde\theta)$ and corresponding submatrix of $\mathcal{I}(\tilde\theta)^{-1}$ are required to
compute $\xi$. Under $H_0$, $\xi$ is asymptotically distributed as a central $\chi^2_r$ variate.
Much discussion of the score test has focused on computational methods.20
Given $s(\tilde\theta)$ and $\mathcal{I}(\tilde\theta)$, $\xi$ can of course be computed using matrix calculations
according to (27). Typically $\mathcal{I}$ will be estimated by either the negative Hessian
of the restricted loglikelihood function or by the gradient outer product $G'G$,
where $G$ is the $T \times k$ matrix having typical element $[\partial\Lambda_t/\partial\theta_j]$, evaluated in
either case at the restricted ML estimates. Alternatively, $\xi$ can often be
obtained as a function of the $R^2$ of some auxiliary linear regression.21
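The outer-product form of the score statistic can be illustrated for the simplest case, a test of $\psi = 0$ in an intercept-only geometric WZ model. The sketch below is hypothetical throughout (function name and data are assumptions): the rows of $G$ are obtained by differentiating each observation's contribution to the geometric WZ loglikelihood with respect to $(\psi, \beta)$ and evaluating at $\psi = 0$, $\exp(\beta) = \bar{y}$, which gives $\gamma$ (zeros) or $-1$ (positives) for $\psi$ and $(y_t - \gamma)/(1 + \gamma)$ for $\beta$.

```python
def opg_score_test_wz_geometric(y):
    """Outer-product score statistic for H0: psi = 0, intercept-only geometric WZ model."""
    T = len(y)
    gamma = sum(y) / T  # restricted ML estimate: exp(beta) equals the sample mean
    # Rows of G: per-observation derivatives with respect to (psi, beta) at psi = 0.
    G = [((gamma if y_t == 0 else -1.0), (y_t - gamma) / (1.0 + gamma)) for y_t in y]
    s_psi = sum(g[0] for g in G)
    s_beta = sum(g[1] for g in G)    # zero at the restricted ML estimate
    a = sum(g[0] * g[0] for g in G)  # elements of G'G
    b = sum(g[0] * g[1] for g in G)
    d = sum(g[1] * g[1] for g in G)
    det = a * d - b * b
    # xi = s' (G'G)^{-1} s, written out for the 2x2 case
    xi = (d * s_psi ** 2 - 2.0 * b * s_psi * s_beta + a * s_beta ** 2) / det
    return xi, s_psi, s_beta

# Hypothetical sample with more zeros (60%) than a geometric with mean 1 implies (50%).
y = [0] * 60 + [1] * 10 + [2] * 10 + [3] * 10 + [4] * 10
xi, s_psi, s_beta = opg_score_test_wz_geometric(y)
print(xi, s_psi, s_beta)
```

For these made-up data the statistic is 100/7, roughly 14.3, well above the 5% critical value of 3.84 for one degree of freedom, and the positive score for $\psi$ points toward excess zeros.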
Computations of $\xi$ based on different estimates of $\mathcal{I}$ will yield different
values of the test statistic in finite samples, even when the null hypothesis is
true and the model is correctly specified. A separate complication arises when
the probabilities underlying the model's likelihood function are misspecified.
Although consistent parameter estimation under such circumstances is possible
when the expectation function has been correctly specified,22 inferences
based on standard estimates of the parameter covariance matrix will generally
not be robust against such misspecification. White (1982) and Engle (1984)
have suggested an amendment to the standard form of the score statistic (27)
18Since the loglikelihood functions of both the basic Poisson and basic geometric models (4)
and (8) are concave, convergence to the ML estimates using a Newton-Raphson algorithm has
proven in practice to be quite rapid. Alternatively, non-linear weighted least squares can be used
to obtain the ML estimates of these models; see Hausman, Hall and Griliches (1984) and
Hausman, Ostro and Wise (1984).
19Since the paper is ultimately concerned with applying the finite-sample analogs of these test
statistics, whose known properties are largely asymptotic, $\mathcal{I}$ is taken here to be $-(\nabla^2\Lambda)$ rather
than $-E(\nabla^2\Lambda/T)$.
20See Engle (1984) and Davidson and MacKinnon (1984a, b).
21For example, since $\mathcal{I}$ can be estimated by $G'G$, $\xi$ can be calculated as
$$\xi = \iota'G(\tilde\theta)(G(\tilde\theta)'G(\tilde\theta))^{-1}G(\tilde\theta)'\iota, \tag{*}$$
where $\iota$ is a $T \times 1$ vector of ones. Since $\iota'G(\tilde\theta) = s(\tilde\theta)'$, (*) is simply an alternative expression of
(27). Moreover, since $\iota'\iota = T$, $\xi$ in (*) is seen to be $T$ times the uncentered $R^2$ from the regression
of $\iota$ on $G(\tilde\theta)$, or, alternatively, $\hat\iota'\hat\iota$ (the sum of squared fitted values) from the same regression.
22See Gourieroux, Montfort and Trognon (1984a) and Cameron and Trivedi (1986).
that ensures a test of the proper size when such misspecification is present.
Defining $A = -\nabla^2\Lambda(\tilde\theta)$ where $\Lambda$ is the maintained loglikelihood function,
$B = G(\tilde\theta)'G(\tilde\theta)$, and $C = A^{-1}BA^{-1}$, then the finite-sample analog of the statistic
proposed by White and Engle is
$$\xi^{*} = s_1(\tilde\theta)'A^{11}(C^{11})^{-1}A^{11}s_1(\tilde\theta), \tag{28}$$
where the (1,1) blocks of $A^{-1}$ and $C$ correspond to the $p$ non-zero elements of
$s(\tilde\theta)$, $s_1(\tilde\theta) = [\partial\Lambda/\partial\theta_1]|_{\theta = \tilde\theta}$. For purposes of comparison, the empirical
illustrations presented below in section 4 present score statistics calculated
using both standard parameter covariance estimates [i.e., $(-\nabla^2\Lambda)^{-1}$ and
$(G'G)^{-1}$] and the approach suggested by White and Engle.
The score test strategy for the count data hurdle models draws conceptually
on the work of Lin and Schmidt (1984). It was demonstrated in section 2 that
the basic count model specifications result when the restriction $\beta_1 = \beta_2$ is
imposed in the hurdle models. Computation of the score test statistic is
simplified by reparameterizing the hurdle models along the lines suggested by
Lin and Schmidt where, given the new parameters $(\alpha, \beta)$, with $\beta_1 = \alpha + \beta$ and
$\beta_2 = \beta$, the score test is of $H_0$: $\alpha = 0$. Under this reparameterization, however,
the Hessian of the hurdle model loglikelihood function is no longer block-diagonal
(see appendix B).
The score test for the WZ specifications is of $H_0$: $\psi = 0$. Since $H_0$ specifies a
point in $\mathrm{int}(\Psi)$, standard methods of inference can be used. Note that only
one element of $s(\tilde\theta)$ is non-zero since $\psi$ is scalar. Under $H_0$, $\xi$ is asymptotically
distributed as $\chi^2_1$. The computation of the score test is complicated,
however, because the Hessian of the WZ loglikelihood function is not block-diagonal.
Appendix B provides the formulae used to compute the geometric
hurdle and WZ model score tests.
An alternative test strategy for the hurdle models recognizes that when
$\beta_1 = \beta_2$ in (16) or (20), $\beta$ ($= \beta_1 = \beta_2$) can be estimated consistently by
maximizing the full loglikelihood function $\Lambda^{jH}$ ($j = P, G$), or either of its
components ($\Lambda^{j1}$, $\Lambda^{j2}$). However, as noted earlier and demonstrated in appendix A
for the geometric model, the latter estimates are inefficient relative to
the former. Of course, when $\beta_1 \neq \beta_2$, the three estimators will diverge asymptotically.
These properties suggest that a Hausman test approach can be used
to test $H_0$: $\beta_1 = \beta_2$.
A finite-sample version of the Hausman test statistic is used here:
$$H = (\hat\beta_j - \hat\beta)'(\hat{V}(\hat\beta_j) - \hat{V}(\hat\beta))^{-1}(\hat\beta_j - \hat\beta),$$
where $\hat\beta$ is the ML estimate of the restricted (i.e., basic) model, $\hat\beta_j$ is either of
the two estimates of the parameters of the unrestricted model ($\hat\beta_1$ or $\hat\beta_2$), and
$\hat{V}(\hat\beta)$ and $\hat{V}(\hat\beta_j)$ are estimates of the corresponding covariance matrices.
Under $H_0$: $\beta_1 = \beta_2$, the asymptotic analog of $H$ is distributed $\chi^2_{(k)}$. This test
is particularly appealing in the case of the geometric distribution, where one
estimate of $(\hat\beta_1, \hat{V}(\hat\beta_1))$ is easily obtained using familiar logit techniques.
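In the scalar, intercept-only geometric case the Hausman statistic can be written out by hand, which makes the role of the logit estimate concrete. The sketch below is an illustration on hypothetical data, not a reproduction of the paper's calculations; the function name is an assumption. The variance expressions follow from the Hessian (10) with $X_t = 1$ evaluated at $\exp(\hat\beta) = \bar{y}$, and from the usual logit information for an intercept.

```python
import math

def hausman_intercept_only(y):
    """Scalar Hausman statistic comparing the logit and basic geometric intercepts."""
    T = len(y)
    n0 = sum(1 for v in y if v == 0)
    n1 = T - n0
    ybar = sum(y) / T
    b_basic = math.log(ybar)             # restricted (basic geometric) estimate
    v_basic = (1.0 + ybar) / (T * ybar)  # inverse of minus the Hessian (10) with X_t = 1
    b_logit = math.log(n1 / n0)          # logit estimate of the intercept
    v_logit = T / (n0 * n1)              # logit variance, 1 / (T p (1 - p)) with p = n1 / T
    # In finite samples the variance difference need not be positive; no guard is applied here.
    return (b_logit - b_basic) ** 2 / (v_logit - v_basic)

# Hypothetical sample: 60 zeros and 10 observations each at 1, 2, 3, and 4.
y = [0] * 60 + [1] * 10 + [2] * 10 + [3] * 10 + [4] * 10
H = hausman_intercept_only(y)
print(H)
```

With $k = 1$ the statistic is compared with a $\chi^2$ variate with one degree of freedom; for these made-up data it is about 7.6.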
White (1982) has analyzed estimation in cases where the wrong probability
model is used to construct what the researcher believes to be the likelihood
function. Such misspecification is destined to plague many applications of
economic count data models, as theory will typically suggest little about the
model’s probability structure.23 Accordingly, the information matrix (IM) test
developed by White is a potentially valuable diagnostic tool in the analysis of
economic count data models. Although both the IM and score tests use only
the ML estimates of the parameters of the basic model, the IM test differs
from the score tests proposed above since no specific alternative specifications
or parameter restrictions are involved in its computation. The IM test princi-
ple relies solely on properties of the specification being tested that must obtain
under the null hypothesis of no misspecification. As such, the IM test is an
omnibus specification test; rejection of the null hypothesis of no misspecifica-
tion would not specifically favor either the hurdle or the WZ variant, or any
other specific alternative for that matter.24
4. Empirical analysis
To illustrate the specifications and tests described above, data from the 1980
Wave II of the National Survey of Personal Health Practices and Conse-
quences (NSPHPC) are used.25 Among the data reported in the NSPHPC are
individuals’ daily consumption of various beverages. Although beverage quan-
tity is a continuous measure, the protocol in the NSPHPC is to report
consumption in integer amounts (number of cups, glasses, etc.). Such beverage
consumption measures serve well to illustrate the points discussed above.
Analyzed here are individuals’ daily consumption of coffee (COFFEE), tea
(TEA), and milk (MILK). The explanatory variables used are an intercept
(INT), age in years (AGE), years of completed schooling (EDUC), family
income (INCOME), and 0-1 dummies for sex (MALE = 1 if male), race
(WHITE = 1 if white, = 0 if black), and marital status (MARRIED = 1 if
23 Cameron and Trivedi (1986, p. 30).

24Lee (1984a, b) notes circumstances under which the IM test is inconsistent against alternative
specifications.
25The NSPHPC is a national, random-digit telephone survey conducted in two waves in Spring
1979 and Spring 1980. The total sample is comprised of non-institutionalized adults aged 20-64
residing in the coterminous U.S. There were 3,025 survey respondents in Wave I, 2,436 of whom
also responded in Wave II. Additional details on the NSPHPC are available in U.S. Department
of Commerce (1982).
Table 1
Sample frequency distribution of dependent variables (T = 1,900).

 n    COFFEE     TEA    MILK
 0       499    1171     767
 1       259     310     557
 2       341     214     333
 3       235      99     142
 4       189      54      62
 5       123      21      23
 6        97      24      16
 7        12       2       0
 8        53       5       0
 9         3       0       0
10        58       0       0
12        13       0       0
15        18       0       0

currently married).26 After screening for outliers,27 the number of observations

having all data necessary for estimation is 1,900. Two sets of models are
estimated: the first uses the entire set of regressors, the second uses only the
intercept. The sample frequency distributions of the dependent variables are
presented in table 1, and the descriptive statistics for the estimation sample are
presented in table 2.
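The frequencies in table 1 are sufficient to recover the sample means and standard deviations of the three dependent variables reported in table 2, and to compare each unconditional variance/mean ratio with the value $1 + E(Y)$ implied by a basic geometric model with no covariates. The sketch below is simple arithmetic on table 1; the variable and function names are assumptions, and the comparison is unconditional, so it is only a rough companion to the covariate-based results.

```python
import math

counts = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15]
table1 = {
    "COFFEE": [499, 259, 341, 235, 189, 123, 97, 12, 53, 3, 58, 13, 18],
    "TEA":    [1171, 310, 214, 99, 54, 21, 24, 2, 5, 0, 0, 0, 0],
    "MILK":   [767, 557, 333, 142, 62, 23, 16, 0, 0, 0, 0, 0, 0],
}

def moments(freqs):
    """Sample size, mean, and standard deviation (T - 1 divisor) from a frequency table."""
    T = sum(freqs)
    mean = sum(n * f for n, f in zip(counts, freqs)) / T
    ss = sum(f * (n - mean) ** 2 for n, f in zip(counts, freqs))
    return T, mean, math.sqrt(ss / (T - 1))

results = {}
for name, freqs in table1.items():
    T, mean, sd = moments(freqs)
    ratio = sd ** 2 / mean  # sample variance/mean ratio
    results[name] = (T, mean, sd, ratio, 1.0 + mean)
    print(name, T, round(mean, 3), round(sd, 3), round(ratio, 3), round(1.0 + mean, 3))
```

The ratios fall below $1 + \bar{y}$ for COFFEE and MILK and above it for TEA.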
For parsimony, only the geometric model and its variants are estimated
here.28 Tables 3 through 5 present the estimation results with covariates
included for the coffee, tea, and milk models, respectively. The first column in
each table gives the ML estimates of the basic geometric models. The second
and third columns present the estimates of the unrestricted hurdle models
which, as discussed earlier, can be obtained by separate ML estimation of a
binary logit model estimated over the entire sample and a truncated-geometric
model estimated on the sample having positive realizations of the dependent
variable. The fourth column presents the ML estimates of the unrestricted
geometric WZ models.29
26The education and income variables are pseudo-continuous, constructed using interval
midpoints. For the open-ended intervals, the value 17 was used for the schooling category ‘16 or
more’ years, and the value 35,000 was used for the income category ‘$25,000 or more’. Some
variables required to properly interpret the estimated models as demand functions are not
available (e.g., own and substitute goods’ prices); similarly, information about other determinants
of beverage consumption (e.g., religion) is not provided in the NSPHPC.
27For example, 13 observations for which daily coffee consumption was reported as greater than
15 cups, 19 observations for which daily tea consumption was reported as greater than 8 cups, and
10 observations for which daily milk consumption was reported as greater than 6 glasses were
deleted.
28Estimation is performed using a program written in SAS PROC MATRIX, which is
available from the author on request.
29For all the WZ models, the requirement that the estimate $\hat\psi$ be in the interval $\Psi_t =
[-\phi_{0t}/(1 - \phi_{0t}), 1)$ was found to hold for each observation in the sample.
Table 2
Sample descriptive statistics (T = 1,900).

Variable        Mean         S.D.        Min         Max
COFFEE         2.705        2.845       0.00       15.00
TEA            0.818        1.349       0.00        8.00
MILK           1.109        1.251       0.00        6.00
AGE           39.976       12.603      21.00       65.00
EDUC          13.001        2.697       0.00       17.00
INCOME     21J72.368   10,724.967   2,500.00   35,000.00
MALE           0.395        0.489       0.00        1.00
WHITE          0.931        0.253       0.00        1.00
MARRIED        0.705        0.456       0.00        1.00

Table 3
Estimation results: Dependent variable COFFEE (covariates included).a

            Restricted    Binary        Truncated     Geometric
Variable    model         logit         geometric     with zeros
INT         -0.566        -1.419        -0.284        -0.676
            (0.213)       (0.390)       (0.271)       (0.205)
            [0.178]
AGE          0.0157        0.0443        0.0061        0.0169
            (0.0023)      (0.0048)      (0.0028)      (0.0022)
            [0.0019]
EDUC        -0.0299       -0.0559       -0.0229       -0.0308
            (0.0111)      (0.0226)      (0.0129)      (0.0106)
            [0.0092]
INCOME       6.1E-6        1.5E-5        2.8E-6        6.5E-6
            (2.9E-6)      (5.8E-6)      (3.4E-6)      (2.8E-6)
            [2.4E-6]
MALE         0.174         0.180         0.189         0.173
            (0.056)       (0.113)       (0.065)       (0.053)
            [0.048]
WHITE        1.111         1.155         1.179         1.108
            (0.134)       (0.198)       (0.191)       (0.129)
            [0.120]
MARRIED      0.041         0.062         0.021         0.045
            (0.065)       (0.125)       (0.077)       (0.062)
            [0.060]
ψ                                                     -0.0701
                                                      (0.0168)
L, Λ        -4029.55      -1013.17      -2982.11      -4021.00

aFigures in parentheses are estimated asymptotic standard errors derived from the negative
inverse Hessian of L evaluated at the ML estimates. Figures in square brackets are estimated
asymptotic standard errors derived from the parameter covariance estimates obtained using the
method proposed by White (1982) and Royall (1984).

Table 4
Estimation results: Dependent variable TEA (covariates included).a

            Restricted    Binary        Truncated     Geometric
Variable    model         logit         geometric     with zeros
INT          0.0018       -0.353         0.308         0.321
            (0.2580)      (0.348)       (0.396)       (0.295)
            [0.2818]
AGE         -0.0046       -0.0053       -0.0023       -0.0042
            (0.0028)      (0.0039)      (0.0042)      (0.0032)
            [0.0031]
EDUC        -0.0337       -0.0141       -0.0480       -0.0371
            (0.0144)      (0.0194)      (0.0222)      (0.0166)
            [0.0154]
INCOME      -5.1E-6       -3.1E-6       -6.5E-6       -4.8E-6
            (3.6E-6)      (5.0E-6)      (5.5E-6)      (4.2E-6)
            [3.9E-6]
MALE        -0.177        -0.320         0.046        -0.152
            (0.072)       (0.098)       (0.108)       (0.082)
            [0.079]
WHITE        0.457         0.292         0.602         0.470
            (0.155)       (0.198)       (0.252)       (0.173)
            [0.149]
MARRIED      0.218         0.269         0.085         0.196
            (0.082)       (0.113)       (0.122)       (0.092)
            [0.093]
ψ                                                      0.263
                                                      (0.028)
L, Λ        -2360.79      -1254.91      -1067.33      -2327.72

aFigures in parentheses are estimated asymptotic standard errors derived from the negative
inverse Hessian of L evaluated at the ML estimates. Figures in square brackets are estimated
asymptotic standard errors derived from the parameter covariance estimates obtained using the
method proposed by White (1982) and Royall (1984).

The estimates appear generally to be plausible, and in many cases the
parameters are estimated with fair precision. It is noteworthy that in almost all
instances the basic model parameter estimates are bounded by the logit and
truncated-geometric estimates. Also noteworthy is that, with the exception of
the intercept parameter, the estimates of the basic and the WZ model
parameters are quite comparable.
When interpreting the estimates of $\psi$ as diagnostic tests for overdispersion
or underdispersion, it is interesting that the results suggest the presence of
overdispersion in the tea models and underdispersion in both the coffee and
milk models. Since underdispersion of this nature requires the variance/mean
ratio to be less than that implied by the parent model, it is possible that an
alternative estimator where the implied variance/mean ratio is less than that
of the geometric (e.g., Poisson) might be appropriate.

Table 5
Estimation results: Dependent variable MILK (covariates included).a

            Restricted   Binary       Truncated    Geometric
Variable    model        logit        geometric    with zeros
INT         0.719        1.089        0.627        0.429
            (0.231)      (0.345)      (0.322)      (0.197)
            [0.179]
AGE         -0.0144      -0.0226      -0.0124      -0.0154
            (0.0026)     (0.0039)     (0.0036)     (0.0022)
            [0.0021]
EDUC        -0.0232      0.0032       -0.0536      -0.0205
            (0.0127)     (0.0194)     (0.0175)     (0.0107)
            [0.0104]
INCOME      -3.3E-6      -7.6E-6      -1.4E-6      -3.7E-6
            (3.4E-6)     (5.0E-6)     (4.6E-6)     (2.8E-6)
            [2.7E-6]
MALE        0.372        0.559        0.337        0.381
            (0.065)      (0.100)      (0.089)      (0.054)
            [0.051]
WHITE       0.154        0.120        0.260        0.157
            (0.132)      (0.191)      (0.191)      (0.122)
            [0.096]
MARRIED     0.0106       0.0125       -0.00017     -0.0018
            (0.0754)     (0.113)      (0.104)      (0.0629)
            [0.0574]
$\psi$                                             -0.354
                                                   (0.0401)
L, $\Lambda$  -2736.81   -1245.12     -1436.60     -2684.73

a Figures in parentheses are estimated asymptotic standard errors derived from the negative
inverse Hessian of L evaluated at the ML estimates. Figures in square brackets are estimated
asymptotic standard errors derived from the parameter covariance estimates obtained using the
method proposed by White (1982) and Royall (1984).
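The link between the sign of $\psi$ and the direction of dispersion can be illustrated numerically. The following is a minimal sketch, not the author's program: it assumes the geometric with-zeros probabilities Pr(y = 0) = $\psi$ + (1 - $\psi$)/(1 + $\mu$) and Pr(y = k) = (1 - $\psi$)$\mu^k$/(1 + $\mu$)^(k+1) for k > 0, with $\mu$ = exp($\beta$), and takes the intercept-only tea and milk estimates as inputs. The function names and the truncation point `kmax` are hypothetical choices made for illustration.

```python
import math

def wz_geometric_pmf(k, beta, psi):
    """Pr(y = k) under the assumed geometric with-zeros form."""
    mu = math.exp(beta)
    q = mu / (1.0 + mu)                 # success ratio of the parent geometric
    parent = (1.0 - q) * q ** k         # basic geometric probability of k
    if k == 0:
        return psi + (1.0 - psi) * parent
    return (1.0 - psi) * parent

def mean_and_dispersion(beta, psi, kmax=2000):
    """Mean and variance/mean ratio by direct summation up to kmax."""
    probs = [wz_geometric_pmf(k, beta, psi) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, (second - mean ** 2) / mean

# Intercept-only estimates (beta, psi) for tea and milk.
for name, beta, psi in [("TEA", 0.124, 0.277), ("MILK", -0.150, -0.289)]:
    mean, ratio = mean_and_dispersion(beta, psi)
    # A basic geometric with the same mean has variance/mean equal to 1 + mean.
    print(name, "mean:", round(mean, 3),
          "var/mean:", round(ratio, 3),
          "geometric benchmark:", round(1.0 + mean, 3))
```

Under these assumptions a positive $\psi$ places the variance/mean ratio above the geometric benchmark and a negative $\psi$ places it below, which is the sense in which the sign of $\psi$ is read as a dispersion diagnostic.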
Table 6 presents the ML estimates of the intercept-only models. Among
other things, table 6 demonstrates two points noted earlier: first, that the
estimates of the intercept parameters in the truncated and the WZ models will
be identical when only an intercept term is included; and, second, that the
relationship $\hat{\psi} = [1 - \exp(\hat{\alpha})]/[1 + \exp(\hat{\beta} + \hat{\alpha})]$ obtains between the estimated
parameters in the WZ and hurdle models.30

Tables 7 and 8 summarize the specification test results for the models with
covariates included and the intercept-only models, respectively. The first
rows present the results of White's IM test applied to the basic
geometric models. As argued in section 3, the IM test can be viewed as an
omnibus test for model misspecification. For both the models with covariates
and the intercept-only models, the IM tests strongly suggest that the basic
model is a misspecification of the dgp, as the null hypothesis of no
misspecification is rejected in all but one instance at greater than the 0.9999 level.31

30 Considering the tea model as an example, and using the reparameterizations $\hat{\beta} = \hat{\beta}_T = 0.124$
and $\hat{\alpha} = \hat{\beta}_L - \hat{\beta}_T = -0.598$, where $\hat{\beta}_L$ and $\hat{\beta}_T$ are the logit and truncated-geometric intercepts,
then $\hat{\psi} = 0.277 = (1 - \exp(-0.598))/(1 + \exp(-0.474))$.
31 When an intercept term is included in the $X_t$ vectors, the presence of 0-1 dummy variables in
$X_t$ reduces the number of distinct upper triangular elements in $\nabla^2 L_t$ to at most $q = 0.5m(m + 1) - d$,
where $L_t$ is the contribution of the tth observation to the restricted loglikelihood, m is the
number of columns in $X_t$, and d is the number of 0-1 dummy variables. Since in the present
application m = 7 and d = 3, the information matrix test statistics in the models with covariates
are distributed $\chi^2_{25}$. The method proposed by Lancaster (1984, eq. 6) is used to calculate the IM
test statistics.

Table 6
Estimation results: Intercept-only models.a

            Restricted   Binary       Truncated    Geometric
            model        logit        geometric    with zeros
COFFEE
INT         0.995        1.032        0.982        0.982
            (0.027)      (0.052)      (0.031)      (0.031)
            [0.024]
$\psi$                                             -0.0137
                                                   (0.016)
L, $\Lambda$  -4105.35   -1094.00     -3010.99     -4105.00
TEA
INT         -0.201       -0.474       0.124        0.124
            (0.034)      (0.047)      (0.051)      (0.051)
            [0.038]
$\psi$                                             0.277
                                                   (0.027)
L, $\Lambda$  -2376.77   -1265.09     -1074.18     -2339.28
MILK
INT         0.104        0.390        -0.150       -0.150
            (0.032)      (0.047)      (0.044)      (0.044)
            [0.026]
$\psi$                                             -0.289
                                                   (0.039)
L, $\Lambda$  -2772.73   -1281.51     -1455.23     -2736.73

a Figures in parentheses are estimated asymptotic standard errors derived from the negative
inverse Hessian of L evaluated at the ML estimates. Figures in square brackets are estimated
asymptotic standard errors derived from the parameter covariance estimates obtained using the
method proposed by White (1982) and Royall (1984).
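The relationship between $\psi$ and the intercept-only hurdle estimates can be checked by direct arithmetic. The sketch below is illustrative only: it plugs the binary logit and truncated-geometric intercepts of table 6 into $\psi = [1 - \exp(\alpha)]/[1 + \exp(\beta + \alpha)]$ with $\beta$ set to the truncated-geometric intercept and $\alpha$ to the difference of the two intercepts. The function name `psi_from_hurdle` is hypothetical.

```python
import math

def psi_from_hurdle(beta_logit, beta_trunc):
    """WZ parameter implied by intercept-only hurdle estimates.

    Sets beta = beta_trunc and alpha = beta_logit - beta_trunc, then
    evaluates psi = [1 - exp(alpha)] / [1 + exp(beta + alpha)].
    """
    alpha = beta_logit - beta_trunc
    return (1.0 - math.exp(alpha)) / (1.0 + math.exp(beta_trunc + alpha))

# (binary logit intercept, truncated-geometric intercept) from table 6
intercepts = {
    "COFFEE": (1.032, 0.982),
    "TEA": (-0.474, 0.124),
    "MILK": (0.390, -0.150),
}
implied_psi = {name: psi_from_hurdle(bl, bt) for name, (bl, bt) in intercepts.items()}
for name, value in implied_psi.items():
    print(name, round(value, 4))
```

Because the inputs are rounded to three decimals, the implied values agree with the reported estimates of $\psi$ only up to rounding.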
Table 7
Specification test results (models with covariates included).a

                                   COFFEE       TEA          MILK
Information matrix [25]            266.130      56.1775      403.064
                                   (0.9999)     (0.9997)     (0.9999)
Hurdle models
  Likelihood ratio [7]             68.5451      77.0992      100.192
                                   (0.9999)     (0.9999)     (0.9999)
  Score (Hessian) [7]              64.1771      76.9506      108.685
                                   (0.9999)     (0.9999)     (0.9999)
  Score (Gradient) [7]             77.4802      89.7063      123.968
                                   (0.9999)     (0.9999)     (0.9999)
  Score (White-Engle) [7]          76.8902      87.0582      116.124
                                   (0.9999)     (0.9999)     (0.9999)
  Hausman test: geometric          57.5672      70.5046      99.7153
    vs. logit [7]                  (0.9999)     (0.9999)     (0.9999)
    vs. truncated-geometric [7]    (0.9999)     (0.9999)     (0.9999)
With-zeros models
  Likelihood ratio [1]             17.0983      66.1365      104.158
                                   (0.9999)     (0.9999)     (0.9999)
  Score (Hessian) [1]              17.4171      100.113      83.8957
                                   (0.9999)     (0.9999)     (0.9999)
  Score (Gradient) [1]             18.7919      73.6177      115.890
                                   (0.9999)     (0.9999)     (0.9999)
  Score (White-Engle) [1]          18.3835      68.3445      100.599
                                   (0.9999)     (0.9999)     (0.9999)

a Figures in parentheses are $\Pr(\chi^2_q < s)$, where s is the test statistic and q is the degrees of
freedom of the test statistic; a value of 0.9999 signifies $\Pr(\chi^2_q < s) \geq 0.9999$. Figures in square
brackets are the degrees of freedom of the test statistics. The test statistics have asymptotic central
$\chi^2$ distributions under the null.
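The likelihood ratio entries can be reproduced from the maximized loglikelihoods reported in the estimation tables. The sketch below assumes, as the additive form of the hurdle loglikelihood suggests, that the maximized hurdle loglikelihood is the sum of the binary logit and truncated-geometric values; the function and variable names are hypothetical, and the inputs are the rounded coffee and tea figures, so agreement is only up to rounding.

```python
def lr_statistic(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic: twice the gain in the maximized loglikelihood."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

# Reported maximized loglikelihoods:
# (restricted geometric, binary logit, truncated geometric, geometric with zeros)
logliks = {
    "COFFEE": (-4029.55, -1013.17, -2982.11, -4021.00),
    "TEA": (-2360.79, -1254.91, -1067.33, -2327.72),
}

lr_results = {}
for name, (l_geo, l_logit, l_trunc, l_wz) in logliks.items():
    lr_hurdle = lr_statistic(l_geo, l_logit + l_trunc)   # hurdle vs. basic model
    lr_wz = lr_statistic(l_geo, l_wz)                    # with-zeros vs. basic model
    lr_results[name] = (lr_hurdle, lr_wz)
    print(name, round(lr_hurdle, 2), round(lr_wz, 2))
```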
The results of the tests designed to test the restricted models against their
corresponding hurdle variants are presented in rows 2-7 of tables 7 and 8. For
each model, the range of the six test statistics is quite small, and in all cases
except the intercept-only coffee model, rejection of the parameter restrictions
specified under the null hypothesis is indicated.32 Moreover, in each instance
the score test statistics calculated using the White-Engle approach are smaller
and larger than those calculated using the gradient outer product and negative
Hessian, respectively, to estimate the information matrix.

32 White (1982, p. 8) has noted that the use of the standard likelihood ratio test is not
appropriate in cases where the probability densities that form the sample likelihood function are
misspecified. In addition, the standard Hausman test approach uses estimates of the two
covariance matrixes that are consistent under the null hypothesis of no misspecification;
accordingly, no attempt was made to utilize alternative covariance estimators in calculating the Hausman
tests. The covariance estimates used to construct the Hausman test statistics are the inverses of the
matrixes in (A.1)-(A.3) in appendix A evaluated at the ML estimates. As shown in appendix A,
these estimates guarantee that the difference of the covariance matrix estimates in (29) will be
positive semidefinite as required for the Hausman test.

Table 8
Specification test results (intercept-only models).a

                                   COFFEE       TEA          MILK
Information matrix [1]             40.8520      25.8139      208.268
                                   (0.9999)     (0.9999)     (0.9999)
Hurdle models
  Likelihood ratio [1]             0.6976       74.9872      72.0001
                                   (0.5964)     (0.9999)     (0.9999)
  Score (Hessian) [1]              0.6951       74.8446      71.7445
                                   (0.5956)     (0.9999)     (0.9999)
  Score (Gradient) [1]             0.7616       82.3494      80.1355
                                   (0.6172)     (0.9999)     (0.9999)
  Score (White-Engle) [1]          0.7512       81.3942      76.2101
                                   (0.6139)     (0.9999)     (0.9999)
  Hausman test: geometric
    vs. logit [1]                  (0.5940)     (0.9999)     (0.9999)
    vs. truncated-geometric [1]    (0.6000)     (0.9999)     (0.9999)
With-zeros models
  Likelihood ratio [1]             0.6976       74.9872      72.0001
                                   (0.5964)     (0.9999)     (0.9999)
  Score (Hessian) [1]              0.6976       119.125      58.8129
                                   (0.5964)     (0.9999)     (0.9999)
  Score (Gradient) [1]             0.7616       82.3494      80.1355
                                   (0.6172)     (0.9999)     (0.9999)
  Score (White-Engle) [1]          0.7484       11.0294      71.5301
                                   (0.6130)     (0.9999)     (0.9999)

a Figures in parentheses are $\Pr(\chi^2_q < s)$, where s is the test statistic and q is the degrees of
freedom of the test statistic; a value of 0.9999 signifies $\Pr(\chi^2_q < s) \geq 0.9999$. Figures in square
brackets are the degrees of freedom of the test statistics. The test statistics have asymptotic central
$\chi^2$ distributions under the null.
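The probabilities reported beneath the one-degree-of-freedom statistics can be checked without statistical tables, since a chi-square variate with one degree of freedom is the square of a standard normal. The sketch below is an illustration under that standard result; the function name `chi2_cdf_1df` is hypothetical, and the two statistics are intercept-only coffee entries from table 8.

```python
import math

def chi2_cdf_1df(s):
    """Pr(chi-square with 1 d.f. < s), computed as Pr(|Z| < sqrt(s)) = erf(sqrt(s / 2))."""
    return math.erf(math.sqrt(s / 2.0))

# Intercept-only coffee statistics and the probabilities reported with them.
checks = [
    ("Score (Gradient)", 0.7616, 0.6172),
    ("Score (White-Engle), hurdle", 0.7512, 0.6139),
]
for label, stat, reported in checks:
    print(label, round(chi2_cdf_1df(stat), 4), "reported:", reported)
```

A statistic must exceed roughly 15.1 before this probability reaches 0.9999, which gives a quick consistency check on the one-degree-of-freedom entries.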
It is interesting that the range of the test statistics for each model is
relatively small: although each statistic has the same asymptotic distribution
under the null hypothesis, the similarity of their finite-sample behavior when
rejection of the null is favored was not anticipated ex ante.
The results of the tests of the basic models against the corresponding WZ
specifications are presented in rows 8-11 of tables 7 and 8. Again the values of
the test statistics fall within narrow ranges. Two results are particularly
interesting here. First, in the covariates-included and intercept-only tea mod-
els, the score test calculated using the White-Engle method is smaller than
those based on the gradient outer product and negative Hessian methods.
Second, although still indicating rejection of the null hypothesis, the test
statistics for the coffee model with covariates included are substantially
smaller than those for the tea and milk models. In the intercept-only coffee
model, none of the test statistics recommends rejection of the null at
conventional confidence levels. Upon examination of the ratio of the estimates of $\psi$
to their asymptotic standard errors in the coffee models, such results are not
surprising.
5. Summary
This paper has explored the specification and testing of some variants on
familiar count data models. The alternative specifications considered were
termed hurdle and with-zero models, from which the familiar models were
demonstrated to arise through parameter restrictions. Both alternatives were
shown to allow for a degree of flexibility in model specification that is
precluded by the basic model. In particular, it was seen that overdispersion
and underdispersion could be accounted for by both alternatives. Score,
Hausman, and information matrix tests for misspecification were proposed.
The ideas were illustrated by estimating count data models of beverage
consumption using survey microdata. In virtually all instances, the specifica-
tion tests recommended rejection of the null hypothesis of no misspecification.
For a given model, the different test statistics tended to behave quite similarly.
Appendix A
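Since the ML logit and truncated-geometric estimators of the geometric
model fail to utilize all sample information, their inefficiency relative to the
ML geometric estimator follows immediately from the fact that the geometric
estimator, which uses all sample information, is FIML. The following
demonstration is illustrative. Let $L^G$, $\Lambda^L$, and $\Lambda^T$ denote the loglikelihood functions
of geometric, logit, and truncated-geometric models. Then

$$\Theta_G = -L^G_{\beta\beta} = \sum_t \{(y_t + 1)\exp(X_t\beta)/[1 + \exp(X_t\beta)]^2\} X_t'X_t, \tag{A.1}$$

$$\Theta_L = -\Lambda^L_{\beta\beta} = \sum_t \{\exp(X_t\beta)/[1 + \exp(X_t\beta)]^2\} X_t'X_t, \tag{A.2}$$

$$\Theta_T = -\Lambda^T_{\beta\beta} = \sum_t \{y_t\exp(X_t\beta)/[1 + \exp(X_t\beta)]^2\} X_t'X_t. \tag{A.3}$$

Since $y_t \geq 0$ for all t, then $\Theta_G$, $\Theta_L$, and $\Theta_T$ are each positive semidefinite. It is
easy to see from (A.1)-(A.3) that $(\Theta_G - \Theta_L) = \Theta_T$ and $(\Theta_G - \Theta_T) = \Theta_L$ if the
$\Theta_i$ are all evaluated at the same $\beta$. The logit and truncated-geometric estimators
are inefficient relative to the geometric estimator since both $(\Theta_G - \Theta_L)$
and $(\Theta_G - \Theta_T)$, and therefore $(\Theta_L^{-1} - \Theta_G^{-1})$ and $(\Theta_T^{-1} - \Theta_G^{-1})$, are positive
semidefinite.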
The relative efficiencies of the logit and truncated-geometric estimators
cannot in general be determined without knowledge of the sample $(y_t, X_t)$
values. In one extreme case where all $y_t$ tend toward zeros and ones, it can be
seen from (A.2) and (A.3) that $(\Theta_L - \Theta_T)$ becomes positive semidefinite, so
that logit is efficient relative to truncated-geometric.33 In another extreme
instance where all $y_t$ tend toward strictly positive integers, $(\Theta_T - \Theta_L)$
becomes positive semidefinite, so that truncated-geometric is efficient relative to
logit.
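The decomposition underlying this argument can be verified on a small example. The sketch below is illustrative and rests on an assumption: that the three negative Hessians are weighted cross-products of the $X_t$ with weights $(y_t + 1)w_t$, $w_t$, and $y_t w_t$ for the geometric, logit, and truncated-geometric models, where $w_t = \exp(X_t\beta)/[1 + \exp(X_t\beta)]^2$. The data, parameter values, and function names are hypothetical.

```python
import math

def weighted_cross_product(xs, weights):
    """Sum over t of w_t * x_t' x_t for row vectors x_t, as a nested list."""
    k = len(xs[0])
    out = [[0.0] * k for _ in range(k)]
    for x, w in zip(xs, weights):
        for i in range(k):
            for j in range(k):
                out[i][j] += w * x[i] * x[j]
    return out

def negative_hessians(xs, ys, beta):
    """Assumed negative Hessians: geometric, logit, truncated-geometric."""
    base = []
    for x in xs:
        e = math.exp(sum(b * v for b, v in zip(beta, x)))
        base.append(e / (1.0 + e) ** 2)
    theta_g = weighted_cross_product(xs, [(y + 1) * w for y, w in zip(ys, base)])
    theta_l = weighted_cross_product(xs, base)
    theta_t = weighted_cross_product(xs, [y * w for y, w in zip(ys, base)])
    return theta_g, theta_l, theta_t

# Hypothetical sample: an intercept and one covariate.
xs = [(1.0, 0.2), (1.0, -1.0), (1.0, 0.5), (1.0, 1.5), (1.0, -0.3)]
ys = [0, 2, 1, 4, 0]
theta_g, theta_l, theta_t = negative_hessians(xs, ys, beta=(0.1, 0.3))

# Largest elementwise gap in theta_g - theta_l - theta_t.
max_gap = max(abs(theta_g[i][j] - theta_l[i][j] - theta_t[i][j])
              for i in range(2) for j in range(2))
print("max |theta_g - theta_l - theta_t| =", max_gap)
```

Because the weights are nonnegative, each matrix is a sum of nonnegative multiples of outer products, which is why the differences in the text are positive semidefinite.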
Appendix B
This appendix presents the gradient vectors and Hessian matrixes for the
geometric hurdle and WZ models, which are used to construct the specifica-
tion tests described in section 3 and implemented in section 4. For economy of
space, the corresponding Poisson formulae are omitted here, but are available
on request from the author.
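The loglikelihood function of the geometric hurdle model (20) written in
terms of the parameters $(\beta + \alpha) = \beta_L$ and $\beta = \beta_T$ is

$$\Lambda^{GH} = \sum_{t\in\Omega_0} -\log\{1 + \exp[X_t(\beta + \alpha)]\} + \sum_{t\in\Omega_1} \bigl( X_t(\beta + \alpha) - \log\{1 + \exp[X_t(\beta + \alpha)]\} + (y_t - 1)X_t\beta - y_t\log[1 + \exp(X_t\beta)] \bigr). \tag{B.1}$$

The gradient and Hessian components are

$$\Lambda^{GH}_{\alpha} = \sum_{t\in\Omega_0} \bigl(-\exp[X_t(\beta + \alpha)]/\{1 + \exp[X_t(\beta + \alpha)]\}\bigr)X_t' + \sum_{t\in\Omega_1} \bigl(1/\{1 + \exp[X_t(\beta + \alpha)]\}\bigr)X_t' \tag{B.2}$$

$$\Lambda^{GH}_{\alpha}\big|_{\alpha=0} = \sum_{t\in\Omega_0} \{-\exp(X_t\beta)/[1 + \exp(X_t\beta)]\}X_t' + \sum_{t\in\Omega_1} \{1/[1 + \exp(X_t\beta)]\}X_t', \tag{B.2'}$$

$$\Lambda^{GH}_{\beta} = \sum_{t\in\Omega_0} \bigl(-\exp[X_t(\beta + \alpha)]/\{1 + \exp[X_t(\beta + \alpha)]\}\bigr)X_t' + \sum_{t\in\Omega_1} \bigl[(1/\{1 + \exp[X_t(\beta + \alpha)]\}) + ([y_t - 1 - \exp(X_t\beta)]/[1 + \exp(X_t\beta)])\bigr]X_t' \tag{B.3}$$

$$\Lambda^{GH}_{\beta}\big|_{\alpha=0} = \sum_{t} \{[y_t - \exp(X_t\beta)]/[1 + \exp(X_t\beta)]\}X_t', \tag{B.3'}$$

$$\Lambda^{GH}_{\alpha\alpha} = \Lambda^{GH}_{\alpha\beta} = -\sum_{t} \bigl(\exp[X_t(\beta + \alpha)]/\{1 + \exp[X_t(\beta + \alpha)]\}^2\bigr)X_t'X_t \tag{B.4}$$

$$\Lambda^{GH}_{\alpha\alpha}\big|_{\alpha=0} = -\sum_{t} \bigl(\exp(X_t\beta)/[1 + \exp(X_t\beta)]^2\bigr)X_t'X_t, \tag{B.4'}$$

$$\Lambda^{GH}_{\beta\beta} = \Lambda^{GH}_{\alpha\alpha} - \sum_{t\in\Omega_1} \bigl(y_t\exp(X_t\beta)/[1 + \exp(X_t\beta)]^2\bigr)X_t'X_t \tag{B.5}$$

$$\Lambda^{GH}_{\beta\beta}\big|_{\alpha=0} = -\sum_{t} \bigl((y_t + 1)\exp(X_t\beta)/[1 + \exp(X_t\beta)]^2\bigr)X_t'X_t, \tag{B.5'}$$

where the equalities in (B.2')-(B.5') hold under the restriction $\alpha = 0$. The
Hessian of (B.1) is

$$\Lambda^{GH}_{\theta\theta} = \begin{bmatrix} \Lambda^{GH}_{\alpha\alpha} & \Lambda^{GH}_{\alpha\beta} \\ \Lambda^{GH}_{\beta\alpha} & \Lambda^{GH}_{\beta\beta} \end{bmatrix}. \tag{B.6}$$

33 Even in this extreme case the geometric estimator remains efficient relative to the logit
estimator, as the former uses information on the magnitude of the positive $y_t$ while the latter
recognizes only their sign.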
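That the hurdle specification nests the basic geometric model under $\alpha = 0$ can be confirmed numerically. The sketch below is a minimal illustration, assuming the hurdle loglikelihood combines a binary logit part in $X_t(\beta + \alpha)$ with a truncated-geometric part in $X_t\beta$, and that the basic geometric loglikelihood is the sum of $y_t X_t\beta - (y_t + 1)\log[1 + \exp(X_t\beta)]$. The data and function names are hypothetical.

```python
import math

def log1pexp(z):
    """log(1 + exp(z))."""
    return math.log1p(math.exp(z))

def geometric_loglik(ys, xbs):
    """Basic geometric loglikelihood; xbs holds X_t * beta for each observation."""
    return sum(y * xb - (y + 1) * log1pexp(xb) for y, xb in zip(ys, xbs))

def hurdle_loglik(ys, xbs, xas):
    """Geometric hurdle loglikelihood; xas holds X_t * alpha for each observation."""
    total = 0.0
    for y, xb, xa in zip(ys, xbs, xas):
        total -= log1pexp(xb + xa)                      # binary part, all observations
        if y > 0:
            total += xb + xa                            # positive outcome in the binary part
            total += (y - 1) * xb - y * log1pexp(xb)    # truncated-geometric part
    return total

# Hypothetical counts and index values.
ys = [0, 2, 1, 4, 0, 3]
xbs = [0.1, -0.2, 0.4, 0.9, -0.5, 0.3]
zeros = [0.0] * len(ys)

restricted = hurdle_loglik(ys, xbs, zeros)
basic = geometric_loglik(ys, xbs)
shifted = hurdle_loglik(ys, xbs, [0.25] * len(ys))
print(restricted, basic, shifted)
```

With all $X_t\alpha$ set to zero the two functions return the same value, which is the sense in which the basic model is a parameter-restricted version of the hurdle model.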
Note that adding to and subtracting from the numerator of each term in the
$\Omega_1$ summation in (B.2') the expression $[y_t - \exp(X_t\beta)]$, and using (B.3'), the
non-zero elements of the score vector (B.2') can be expressed as

$$\Lambda^{GH}_{\alpha}\big|_{\alpha=0} = -\sum_{t\in\Omega_1} \bigl(\{y_t - [1 + \exp(X_t\beta)]\}/\{1 + \exp(X_t\beta)\}\bigr)X_t'. \tag{B.2''}$$

Eq. (B.2'') corresponds to the non-zero elements of the score vector in the
score test. Eq. (B.2'') is also equivalent to the first-order conditions for ML
estimation of a truncated-geometric model, or, alternatively, is the cross-product
of explanatory variables and the (variance-normalized) residuals from the
truncated-geometric model.34
Using the equality35

$$\begin{bmatrix} A & A \\ A & B \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1}B(B - A)^{-1} & -(B - A)^{-1} \\ -(B - A)^{-1} & (B - A)^{-1} \end{bmatrix},$$

the (1,1) block of $(-\Lambda^{GH}_{\theta\theta})^{-1}$ required to calculate the score test is given by

$$(-\Lambda^{GH}_{\theta\theta})^{11} = (-\Lambda^{GH}_{\alpha\alpha})^{-1}(-\Lambda^{GH}_{\beta\beta})(-\Lambda^{GH}_{\beta\beta} + \Lambda^{GH}_{\alpha\alpha})^{-1}, \tag{B.7}$$

with all evaluations at $\alpha = 0$. Eqs. (B.4) and (B.5) show that $(-\Lambda^{GH}_{\beta\beta} + \Lambda^{GH}_{\alpha\alpha})$ is
positive semidefinite.
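The partitioned-inverse equality behind (B.7) can be checked on a small numerical example. The sketch below is illustrative only: it builds a partitioned matrix with blocks A, A, A, B, inverts it by Gauss-Jordan elimination, and compares the upper-left block with $A^{-1}B(B - A)^{-1}$. The matrices A and B are hypothetical, chosen symmetric and non-singular with B - A also non-singular, and the helper functions are written for this example.

```python
def matmul(a, b):
    """Product of two dense matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def subtract(a, b):
    """Elementwise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical blocks.
A = [[2.0, 0.5], [0.5, 1.0]]
B = [[5.0, 1.0], [1.0, 4.0]]

# Partitioned matrix [[A, A], [A, B]].
full = [A[0] + A[0], A[1] + A[1], A[0] + B[0], A[1] + B[1]]
block11 = [row[:2] for row in inverse(full)[:2]]
formula = matmul(matmul(inverse(A), B), inverse(subtract(B, A)))

max_gap = max(abs(block11[i][j] - formula[i][j]) for i in range(2) for j in range(2))
print("max discrepancy:", max_gap)
```

The practical point of the equality is that only matrices of the dimension of $\beta$ need to be inverted, rather than the full partitioned Hessian.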
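The gradient and Hessian components for the geometric WZ loglikelihood
function (24) are

$$\Lambda^{GZ}_{\psi} = \sum_{t\in\Omega_0} \{\exp(X_t\beta)/[\psi\exp(X_t\beta) + 1]\} - \sum_{t\in\Omega_1} (1 - \psi)^{-1} \tag{B.8}$$

$$\Lambda^{GZ}_{\psi}\big|_{\psi=0} = -T_1 + \sum_{t\in\Omega_0} \exp(X_t\beta), \tag{B.8'}$$

$$\Lambda^{GZ}_{\beta} = \sum_{t\in\Omega_0} \bigl(\{\psi\exp(X_t\beta)/[\psi\exp(X_t\beta) + 1]\} - \{\exp(X_t\beta)/[1 + \exp(X_t\beta)]\}\bigr)X_t' + \sum_{t\in\Omega_1} \{[y_t - \exp(X_t\beta)]/[1 + \exp(X_t\beta)]\}X_t' \tag{B.9}$$

$$\Lambda^{GZ}_{\beta}\big|_{\psi=0} = \sum_{t} \{[y_t - \exp(X_t\beta)]/[1 + \exp(X_t\beta)]\}X_t', \tag{B.9'}$$

$$\Lambda^{GZ}_{\psi\psi} = -\sum_{t\in\Omega_0} \{\exp(X_t\beta)/[\psi\exp(X_t\beta) + 1]\}^2 - \sum_{t\in\Omega_1} (1 - \psi)^{-2} \tag{B.10}$$

$$\Lambda^{GZ}_{\psi\psi}\big|_{\psi=0} = -T_1 - \sum_{t\in\Omega_0} \exp(2X_t\beta), \tag{B.10'}$$

$$\Lambda^{GZ}_{\psi\beta} = \sum_{t\in\Omega_0} \bigl(\exp(X_t\beta)/[\psi\exp(X_t\beta) + 1]^2\bigr)X_t \tag{B.11}$$

$$\Lambda^{GZ}_{\psi\beta}\big|_{\psi=0} = \sum_{t\in\Omega_0} \exp(X_t\beta)X_t, \tag{B.11'}$$

$$\Lambda^{GZ}_{\beta\beta} = \sum_{t\in\Omega_0} \{\psi\exp(X_t\beta)/[\psi\exp(X_t\beta) + 1]^2\}X_t'X_t - \sum_{t} \bigl((y_t + 1)\exp(X_t\beta)/[1 + \exp(X_t\beta)]^2\bigr)X_t'X_t \tag{B.12}$$

$$\Lambda^{GZ}_{\beta\beta}\big|_{\psi=0} = -\sum_{t} \bigl((y_t + 1)\exp(X_t\beta)/[1 + \exp(X_t\beta)]^2\bigr)X_t'X_t, \tag{B.12'}$$

where $T_1 = \#\Omega_1$, and the equalities in (B.8')-(B.12') hold under the restriction
$\psi = 0$. Two points are noteworthy. First, unlike the geometric hurdle model,
the non-zero elements of the score vector (B.8') are not clearly interpretable as
a function of residuals. Second, a simplification analogous to (B.7) is not
apparent here, so that the full negative Hessian of $\Lambda^{GZ}$ would have to be inverted to
calculate the score test statistic.

34 $E(y_t) = [1 + \exp(X_t\beta)]$ in the truncated-geometric model, so that the residual is
$y_t - [1 + \exp(X_t\beta)]$. The interpretation of (B.2'') as a function of the residuals of the truncated variant of
the parent model is the same as that given by Lin and Schmidt (1984) of the corresponding
equation used in construction of their Tobit model score test.
35 This assumes the arbitrary square matrixes A and B are symmetric and non-singular.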
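The score with respect to $\psi$ under the restriction $\psi = 0$ can be checked against a numerical derivative. The sketch below is a minimal illustration, assuming the geometric WZ loglikelihood contributes $\log[\psi\exp(X_t\beta) + 1] - \log[1 + \exp(X_t\beta)]$ for zero counts and $\log(1 - \psi) + y_t X_t\beta - (1 + y_t)\log[1 + \exp(X_t\beta)]$ for positive counts. The data, step size, and function names are hypothetical.

```python
import math

def wz_loglik(ys, xbs, psi):
    """Geometric with-zeros loglikelihood under the assumed form; xbs holds X_t * beta."""
    total = 0.0
    for y, xb in zip(ys, xbs):
        log_denom = math.log1p(math.exp(xb))
        if y == 0:
            total += math.log(psi * math.exp(xb) + 1.0) - log_denom
        else:
            total += math.log(1.0 - psi) + y * xb - (1 + y) * log_denom
    return total

def score_psi_at_zero(ys, xbs):
    """Analytic score at psi = 0: minus the number of positives plus sum of exp(xb) over zeros."""
    t1 = sum(1 for y in ys if y > 0)
    return -t1 + sum(math.exp(xb) for y, xb in zip(ys, xbs) if y == 0)

# Hypothetical counts and index values.
ys = [0, 2, 1, 4, 0, 0]
xbs = [0.1, -0.2, 0.4, 0.9, -0.5, 0.3]

h = 1e-6  # step for the central difference
numeric = (wz_loglik(ys, xbs, h) - wz_loglik(ys, xbs, -h)) / (2.0 * h)
analytic = score_psi_at_zero(ys, xbs)
print(numeric, analytic)
```

The analytic expression depends on the data only through the number of positive counts and the fitted values at the zeros, which is consistent with the remark that it has no obvious residual interpretation.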
References
Amemiya, T., 1984, Tobit models: A survey, Journal of Econometrics 24, 3-61.
Breusch, T. and A.R. Pagan, 1980, The Lagrange multiplier test and its application to model
specification in econometrics, Review of Economic Studies 47, 239-253.
Cameron, A.C. and P.K. Trivedi, 1986, Econometric models based on count data: Comparisons
and applications of some estimators and tests, Journal of Applied Econometrics 1, 29-53.
Cox, D.R., 1983, Some remarks on overdispersion, Biometrika 70, 269-274.
Cragg, J.G., 1971, Some statistical models for limited dependent variables with application to the
demand for durable goods, Econometrica 39, 829-844.
Davidson, R. and J.G. MacKinnon, 1984a, Convenient specification tests for logit and probit
models, Journal of Econometrics 25, 241-262.
Davidson, R. and J.G. MacKinnon, 1984b, Model specification tests based on artificial linear
regressions, International Economic Review 25, 485-502.
Engle, R.F., 1984, Wald, likelihood ratio, and Lagrange multiplier tests in econometrics, in: Z.
Griliches and M.D. Intriligator, eds., Handbook of econometrics, Vol. II (North-Holland,
Amsterdam) 775-826.
Gourieroux, C., A. Montfort and A. Trognon, 1984a, Pseudo maximum likelihood methods:
Theory, Econometrica 52, 681-700.
Gourieroux, C., A. Montfort and A. Trognon, 1984b, Pseudo maximum likelihood methods:
Applications to Poisson models, Econometrica 52, 701-720.
Hausman, J.A., 1978, Specification tests in econometrics, Econometrica 46, 1251-1271.
Hausman, J.A., B. Hall and Z. Griliches, 1984, Econometric methods for count data with an
application to the patents-R&D relationship, Econometrica 52, 909-938.
Hausman, J.A., B. Ostro and D. Wise, 1984, Air pollution and lost work, Working paper no. 1263
(National Bureau of Economic Research, Cambridge, MA).
Johnson, N.L. and S. Kotz, 1969, Distributions in statistics: Discrete distributions (Wiley, New
York).
Lancaster, T., 1984, The covariance matrix of the information matrix test, Econometrica 52,
1051-1053.
Lee, L.-F., 1984a, Specification tests for Poisson regression models, Discussion paper no. 208
(Center for Economic Research, University of Minnesota, Minneapolis, MN).
Lee, L.-F., 1984b, Comment to tests of specification in econometrics, Econometric Reviews 3,
257-259.
Lin, T.-F. and P. Schmidt, 1984, A test of the Tobit specification against an alternative suggested
by Cragg, Review of Economics and Statistics 66, 174-177.
Manning, W., L. Lillard and C.E. Phelps, 1983, Preventive medical care and its consequences
(Rand Corporation, Santa Monica, CA).
McCullagh, P. and J.A. Nelder, 1983, Generalized linear models (Chapman and Hall, London).
Portney, P.R. and J. Mullahy, 1986, Urban air quality and acute respiratory illness, Journal of
Urban Economics 20, 21-38.
Rosenzweig, M.R. and K.I. Wolpin, 1982, Governmental interventions and household behavior in
a developing country: Anticipating the unanticipated consequences of social programs,
Journal of Development Economics 10, 209-225.
Royall, R.M., 1984, Robust inference using maximum likelihood estimators, Working paper no.
549 (Department of Biostatistics, Johns Hopkins University, Baltimore, MD).
Ruud, P.A., 1984, Tests of specification in econometrics, Econometric Reviews 3, 211-242.
Stapleton, D.C. and D.J. Young, 1984, Censored normal regression with measurement error on the
dependent variable, Econometrica 52, 737-760.
Terza, J.V., 1985, A Tobit-type estimator for the censored Poisson regression model, Economics
Letters 18, 361-365.
U.S. Department of Commerce, 1982, Micro-data tape documentation for wave I and II of the
national survey of personal health practices and consequences, NTIS publication no. PB83-
104315.
White, H., 1982, Maximum likelihood estimation of misspecified models, Econometrica 50, 1-25.
White, H., 1983, Corrigendum, Econometrica 51, 513.
