Maximum Likelihood Estimation of Limited and Discrete Dependent Variable Models With Nested Random Effects

New approaches to maximum likelihood estimation of random effects models. Gauss-Hermite quadrature is often used to evaluate and maximize the likelihood for random component probit models. We extend the adaptive quadrature approach to general random coefficient models with limited and discrete dependent variables. Models can include several nested random effects (intercepts and coefficients) representing unobserved heterogeneity at different levels of a hierarchical dataset.

ARTICLE IN PRESS

Journal of Econometrics 128 (2005) 301-323 www.elsevier.com/locate/econbase

Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects☆

Sophia Rabe-Hesketh (a,*), Anders Skrondal (b), Andrew Pickles (c)

(a) Graduate School of Education, University of California, 3659 Tolman Hall, Berkeley, CA 94720, USA
(b) Biostatistics Group, Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway
(c) School of Epidemiology and Health Sciences & CCSR, The University of Manchester, UK

Received 28 March 2002; received in revised form 15 July 2003
Available online 19 October 2004

Abstract

Gauss-Hermite quadrature is often used to evaluate and maximize the likelihood for random component probit models. Unfortunately, the estimates are biased for large cluster sizes and/or intraclass correlations. We show that adaptive quadrature largely overcomes these problems. We then extend the adaptive quadrature approach to general random coefficient models with limited and discrete dependent variables. The models can include several nested random effects (intercepts and coefficients) representing unobserved heterogeneity at different levels of a hierarchical dataset. The required multivariate integrals are evaluated efficiently using spherical quadrature rules. Simulations show that adaptive quadrature performs well in a wide range of situations. © 2004 Published by Elsevier B.V.

JEL classification: C1; C8
Keywords: Random effects; Random coefficients; Multilevel models; Hierarchical models; Numerical integration; Adaptive quadrature; Spherical quadrature rules; GLLAMM

☆ This paper was completed while the first author was employed by the Institute of Psychiatry, King's College London.
* Corresponding author. E-mail addresses: [email protected] (S. Rabe-Hesketh), [email protected] (A. Skrondal), [email protected] (A. Pickles).

0304-4076/$ - see front matter © 2004 Published by Elsevier B.V. doi:10.1016/j.jeconom.2004.08.017


1. Introduction

We consider novel approaches to maximum likelihood estimation of random effects models for limited and discrete dependent variables based on numerical integration. The simplest model includes a single random component or intercept that varies between clusters of observations and induces dependence within these clusters. Random effects models are useful for modeling panel data or grouped cross-sectional data where the responses for the same person or group cannot be assumed to be independent after conditioning on exogenous variables. In the grouped cross-sectional case the groups or clusters could be, for instance, households, firms or geographical entities. Multilevel or hierarchical models accommodate more than one level of clustering, an example being panel data with time-points (level 1) nested in individuals (level 2) who are nested in firms (level 3). Nested random intercepts at the firm and individual levels can then be used to model unobserved heterogeneity between firms and between individuals within firms. The firm-level random intercept induces dependence among individuals in the same firm and the individual-level random intercept induces additional dependence among observations on the same individual. Random coefficients can be included to model unobserved heterogeneity in the effects of variables between firms and/or individuals.

Recent publications on random effects and multilevel models in economics and econometrics include Antweiler (2001), Baltagi et al. (2001), Beron et al. (1999), Blundell and Windmeijer (1997), Cardoso (2000), Carey (2000), Davis (2002) and Rice and Jones (1997). We also refer to Baltagi (2001) and Hsiao (2003) for discussions of multilevel models.

In limited and discrete dependent variable models with normally distributed random effects, the marginal likelihood generally does not have a closed form. A standard approach to parameter estimation is therefore to evaluate the marginal likelihood numerically using Gauss-Hermite quadrature.
For two-level random component (also called random intercept) binary probit models, this approach is often attributed to Butler and Moffitt (1982) although it was introduced earlier for closely related models by Bock and Lieberman (1970). Gaussian quadrature tends to work well with moderate cluster sizes as typically found in panel data. However, with large cluster sizes, which are common in grouped cross-sectional data, the estimates become biased. This problem was pointed out recently by Borjas and Sueyoshi (1994) and Lee (2000) for probit models, by Albert and Follmann (2000) for Poisson models and by Lesaffre and Spiessens (2001) for logit models. Lee (2000) attributes the poor performance of quadrature to numerical underflow and develops an algorithm to overcome this problem. For probit models his algorithm works well in simulations with clusters as large as 100 when the intraclass correlation is 0.3 but produces biased estimates when the correlation is increased to 0.6. A likely reason for this is that for large clusters and high intraclass correlations, the integrands of the cluster contributions to the likelihood have very sharp peaks that may be located between adjacent quadrature points. Albert and Follmann (2000) and Lesaffre and Spiessens (2001) illustrate this problem for Poisson and logit models, respectively. Naylor and Smith (1982) suggest a solution to a similar problem encountered in Bayesian statistics where numerical integration is often used to compute posterior densities. Essentially, the solution consists of scaling and translating the quadrature locations to place them under the peak of the integrand. A slightly different version of this adaptive quadrature approach has been suggested by Liu and Pierce (1994).

In this paper we initially describe and implement Naylor and Smith's version of adaptive quadrature for random component probit models. In a simulation study we show that, in contrast to the method suggested by Lee (2000), adaptive quadrature provides unbiased estimates for random component probit models with clusters as large as 500 and intraclass correlations as high as 0.9. Even for smaller cluster sizes and intraclass correlations, where ordinary quadrature is adequate, adaptive quadrature is superior since it requires fewer quadrature points.

We extend the estimation method to models including (1) nested random effects and (2) random coefficients in addition to random intercepts. Although adaptive quadrature has previously been implemented for generalized linear mixed models with a single level of clustering (Pinheiro and Bates, 1995) and for multidimensional probit item factor analysis (Bock and Schilling, 1997), this is to our knowledge the first generalization for multilevel models. We carry out simulations to assess the performance of adaptive quadrature in the multilevel setting.

For models including random coefficients, the likelihood involves multidimensional integrals which are usually evaluated using cartesian product quadrature (e.g. Bock and Aitkin, 1981; Lillard, 1993). We suggest using spherical quadrature rules specifically designed for integrating over multivariate normal densities (Stroud, 1971) since these rules require fewer quadrature points to achieve a given accuracy. Simulations are carried out to assess the performance of adaptive quadrature using spherical rules.

2. Estimation using adaptive and spherical quadrature

In Section 2.1 we describe adaptive quadrature for random component binary probit models. In Section 2.2 we extend adaptive quadrature to multilevel random coefficient models. Here cartesian product quadrature is used to evaluate multivariate integrals. Section 2.3 describes spherical quadrature rules as a more efficient alternative to cartesian quadrature. Section 4 shows how the methods are applied to models with other types of discrete and limited dependent variables.

2.1. Adaptive quadrature for random component probit models

The random component binary probit model can be written as

$$y^*_{ij} = x'_{ij}\beta + u_j + \epsilon_{ij}, \qquad y_{ij} = I(y^*_{ij} > 0),$$

where $i = 1, \ldots, n_j$ indexes the individual observations, $j = 1, \ldots, N$ indexes clusters of observations, $x_{ij}$ is a vector of explanatory variables, $\beta$ is a vector of corresponding regression coefficients, $u_j$ is the random intercept for cluster $j$ and $\epsilon_{ij}$ is an error term. In a panel data setting, $i$ is a time-point, $j$ an individual and $u_j$ represents time-constant unobserved heterogeneity in the behavior of the individual which renders his or her $n_j$ observations correlated. The random terms $u_j$ and $\epsilon_{ij}$ are mutually independent, $u_j \sim N(0, \sigma^2)$ and $\epsilon_{ij} \sim N(0, 1)$, and independent of the explanatory variables $x_{ij}$. The residual intraclass correlation for the underlying responses is

$$\rho \equiv \mathrm{Cor}(y^*_{ij}, y^*_{i'j} \mid x_{ij}, x_{i'j}) = \frac{\sigma^2}{\sigma^2 + 1}.$$

The likelihood contribution of the $j$th cluster is a multivariate integral over the correlated total error terms $u_j + \epsilon_{ij}$, $i = 1, \ldots, n_j$. Using an idea at least known since Dunnett and Sobel (1955), Bock and Lieberman (1970) and Butler and Moffitt (1982) simplify this integral to a univariate integral by exploiting the fact that the error terms are conditionally independent given the random effect. For a given cluster $j$, the likelihood contribution therefore is

$$f_j^{(2)}(\theta) = \int g(u_j; 0, \sigma^2) \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid u_j)\, du_j, \tag{1}$$

where $\theta$ is the vector of all parameters, $g(\cdot\,; \mu, \sigma^2)$ is the normal density with mean $\mu$ and variance $\sigma^2$ and $f_{ij}^{(1)}(\theta \mid u_j)$ is the conditional likelihood contribution of unit $ij$ given the random effect,

$$f_{ij}^{(1)}(\theta \mid u_j) = y_{ij}\,\Phi(\eta_{ij}) + (1 - y_{ij})\,\Phi(-\eta_{ij}), \tag{2}$$

where $\Phi$ is the standard normal cumulative distribution function and $\eta_{ij}$ is the linear predictor $\eta_{ij} = x'_{ij}\beta + u_j$.

The integral, which cannot be solved analytically, can instead be evaluated numerically using Gauss-Hermite quadrature (see e.g., Stroud and Secrest, 1966). Instead of integrating over $u_j$, we will integrate over $v_j = u_j/\sigma$ with standard normal density $\phi(v_j)$. The approximation then is

$$f_j^{(2)}(\theta) = \int \phi(v_j) \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid v_j)\, dv_j \approx \sum_{r=1}^{R} p_r \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid a_r), \tag{3}$$

where $\sqrt{\pi}\,p_r$ and $a_r/\sqrt{2}$ are the weights and locations of $R$-point Gaussian quadrature for integrals of the form $\int \exp(-x^2) f(x)\, dx$. The method is exact if $f(x)$ is a polynomial of degree up to $2R - 1$.

In the context of Bayesian inference, Naylor and Smith (1982) suggest an improved integration method that is adaptive in the sense that it takes into account the properties of the integrand $\phi(v_j) \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid v_j)$. Note that the integrand is the product of the prior density of $v_j$ and the joint probability of the responses given $v_j$ which, after normalization with respect to $v_j$, is just the posterior density of $v_j$ given the observed responses. According to the Bayesian central limit theorem (e.g., Carlin and Louis, 2000, pp. 122-124), posterior densities are approximately normal for large sample sizes, corresponding to large cluster sizes $n_j$ in this application. If $\mu_j$ and $\tau_j^2$ are the mean and variance of the posterior density, we would therefore expect the ratio $\phi(v_j) \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid v_j) / g(v_j; \mu_j, \tau_j^2)$ to be well approximated by a low-degree polynomial. Writing the integral as

$$f_j^{(2)}(\theta) = \int g(v_j; \mu_j, \tau_j^2) \left[ \frac{\phi(v_j) \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid v_j)}{g(v_j; \mu_j, \tau_j^2)} \right] dv_j,$$

changing the variable of integration from $v_j$ to $z_j = (v_j - \mu_j)/\tau_j$ and applying the standard quadrature rule yields

$$f_j^{(2)}(\theta) \approx \sum_{r=1}^{R} \pi_{jr} \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid \alpha_{jr}), \tag{4}$$

where

$$\alpha_{jr} = \mu_j + \tau_j a_r, \tag{5}$$

$$\pi_{jr} = \sqrt{2\pi}\,\tau_j \exp(a_r^2/2)\,\phi(\mu_j + \tau_j a_r)\,p_r. \tag{6}$$
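The contrast between the ordinary rule (3) and the adaptive rule (4)-(6) can be sketched numerically. The following is a minimal illustration only, not the authors' implementation: the cluster (100 binary responses, no covariates, $\sigma = 1.5$) is hypothetical, the helper names `gh_rule`, `cond_lik`, `ordinary` and `adaptive` are assumptions, and the posterior mean and standard deviation are taken from a brute-force grid purely so that the two rules can be compared on equal terms.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

_erfc = np.vectorize(math.erfc, otypes=[float])

def Phi(x):
    """Standard normal cdf, written with erfc for accuracy in the tails."""
    return 0.5 * _erfc(-np.asarray(x, dtype=float) / math.sqrt(2.0))

def phi(x):
    """Standard normal density."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def gh_rule(R):
    """Locations a_r and weights p_r for integrals against the N(0, 1) density."""
    x, w = hermgauss(R)  # rule for the weight function exp(-x^2)
    return math.sqrt(2.0) * x, w / math.sqrt(math.pi)

def cond_lik(v, y, xb, sigma):
    """prod_i f_ij(theta | v) as in (2), with eta_ij = xb_i + sigma * v, per location v."""
    eta = xb[None, :] + sigma * np.asarray(v, dtype=float)[:, None]
    return np.prod(np.where(y[None, :] == 1, Phi(eta), Phi(-eta)), axis=1)

def ordinary(y, xb, sigma, R):
    """Ordinary Gauss-Hermite approximation (3)."""
    a, p = gh_rule(R)
    return float(np.sum(p * cond_lik(a, y, xb, sigma)))

def adaptive(y, xb, sigma, R, mu, tau):
    """Adaptive approximation (4) with locations (5) and weights (6)."""
    a, p = gh_rule(R)
    alpha = mu + tau * a
    pi_w = math.sqrt(2.0 * math.pi) * tau * np.exp(a ** 2 / 2.0) * phi(alpha) * p
    return float(np.sum(pi_w * cond_lik(alpha, y, xb, sigma)))

# Hypothetical cluster: 70 successes, 30 failures, fixed part x'beta = 0.
y = np.array([1] * 70 + [0] * 30)
xb = np.zeros(100)
sigma = 1.5

# Brute-force reference on a dense grid (illustration only).
grid = np.linspace(-5.0, 5.0, 2001)
dens = phi(grid) * cond_lik(grid, y, xb, sigma)
ref = float(np.sum(dens) * (grid[1] - grid[0]))
mu = float(np.sum(grid * dens) / np.sum(dens))
tau = math.sqrt(float(np.sum(grid ** 2 * dens) / np.sum(dens)) - mu ** 2)

err_ord = abs(ordinary(y, xb, sigma, 5) / ref - 1.0)
err_ada = abs(adaptive(y, xb, sigma, 5, mu, tau) / ref - 1.0)
print(f"relative error, ordinary R=5: {err_ord:.3g}")
print(f"relative error, adaptive R=5: {err_ada:.3g}")
```

In this hypothetical setting the posterior is sharply peaked between the ordinary quadrature locations, so the five-point ordinary rule misses most of the mass, whereas the five-point adaptive rule places its locations under the peak.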

Pinheiro and Bates (1995) point out that this approach is essentially a deterministic version of importance sampling with $g(v_j; \mu_j, \tau_j^2)$ as importance density. The advantage of adaptive quadrature can be seen in Fig. 1 which illustrates for $R = 5$ how adaptive quadrature translates and scales the locations so that they lie directly under the integrand.

[Figure 1: two panels, "Ordinary quadrature" and "Adaptive quadrature"; vertical axis: density.]

Fig. 1. Prior (dotted curve) and posterior (solid curve) densities and quadrature weights (bars) for ordinary and adaptive quadrature. The integrand is proportional to the posterior density.

The posterior means and standard deviations required for adaptive quadrature are themselves obtained using adaptive quadrature so that the integration is iterative. Using starting values $\mu_j^0 = 0$ and $\tau_j^0 = 1$ to define $\alpha_{jr}^0$ and $\pi_{jr}^0$, the posterior means and standard deviations are updated in the $k$th iteration using

$$f_j^{(2)k}(\theta) \approx \sum_{r=1}^{R} \pi_{jr}^{k-1} \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid \alpha_{jr}^{k-1}),$$

$$\mu_j^k = \frac{\sum_{r=1}^{R} \alpha_{jr}^{k-1}\, \pi_{jr}^{k-1} \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid \alpha_{jr}^{k-1})}{f_j^{(2)k}(\theta)},$$

$$\tau_j^k = \sqrt{\frac{\sum_{r=1}^{R} (\alpha_{jr}^{k-1})^2\, \pi_{jr}^{k-1} \prod_{i=1}^{n_j} f_{ij}^{(1)}(\theta \mid \alpha_{jr}^{k-1})}{f_j^{(2)k}(\theta)} - (\mu_j^k)^2}, \tag{7}$$

followed by evaluation of $\alpha_{jr}^k$ and $\pi_{jr}^k$ using (5) and (6). This sequence is repeated until convergence. A similar iterative algorithm is described in another context by Naylor and Smith (1988). The algorithm can converge very slowly or fail to converge if insufficient quadrature points are used to evaluate the posterior moments accurately, giving a useful warning that the approximation is poor.

Liu and Pierce (1994) describe an integration method based on a first-order Laplace approximation (Tierney and Kadane, 1986) where $\mu_j$ is the mode of the integrand and $\tau_j$ is the standard deviation of the normal density approximating the integrand at the mode. Pinheiro and Bates (1995) use this method in the context of two-level random coefficient models. An advantage of their approach is that $\mu_j$ and $\tau_j$ do not themselves rely on the quadrature approximation so that an iterative process of the kind described above is not required. However, the method is also computationally demanding since numerical optimization and differentiation are required to determine $\mu_j$ and $\tau_j$ for each cluster. In addition, the posterior mean and standard deviation may better reflect the shape of the integrand when its tails are heavier than that of a normal density. Most importantly for our purposes, the first-order Laplace approximation cannot be readily extended to multilevel problems as we will see in the next section. Both methods are of course equivalent if the posterior distribution is normal.

So far we have only addressed the problem of evaluating the marginal likelihood for given parameter values $\theta$. The next problem is to maximize this marginal likelihood with respect to $\theta$. Bock and Aitkin (1981) and others use Gaussian quadrature within an EM algorithm. We use a Newton-Raphson algorithm where the Hessian is obtained by numerical differentiation.
Interestingly, numerical derivatives may be more accurate than numerically integrated analytical derivatives since the integrals for the derivatives are often very poorly approximated by quadrature or adaptive quadrature (Lesaffre and Spiessens, 2001). Numerical differentiation requires repeated evaluation of the marginal likelihood in the neighborhood of the current parameter values. We do not update the quadrature locations and weights for each of these evaluations but keep them fixed for a full iteration of the Newton-Raphson procedure. The algorithm alternates between a step of Newton-Raphson to update the parameter values and the set of iterations in (7) to update the quadrature locations and weights. The reasons for not updating the quadrature locations and weights during numerical differentiation are that it would be computationally demanding and that large changes in these quantities could make the likelihood surface appear discontinuous.
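The inner iterations in (7), for fixed parameter values, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the cluster (20 binary responses, no covariates, $\sigma = 1$), the function name `adapt`, the convergence tolerance and the iteration cap are all hypothetical choices, and a dense grid is used only to check the result.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

_erfc = np.vectorize(math.erfc, otypes=[float])

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * _erfc(-np.asarray(x, dtype=float) / math.sqrt(2.0))

def phi(x):
    """Standard normal density."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def cond_lik(v, y, xb, sigma):
    """prod_i f_ij(theta | v) as in (2), with eta_ij = xb_i + sigma * v, per location v."""
    eta = xb[None, :] + sigma * np.asarray(v, dtype=float)[:, None]
    return np.prod(np.where(y[None, :] == 1, Phi(eta), Phi(-eta)), axis=1)

def adapt(y, xb, sigma, R, tol=1e-10, max_iter=200):
    """Iterate (7) from mu = 0, tau = 1; returns (mu, tau, likelihood, iterations)."""
    x, w = hermgauss(R)
    a, p = math.sqrt(2.0) * x, w / math.sqrt(math.pi)
    mu, tau = 0.0, 1.0
    for k in range(1, max_iter + 1):
        alpha = mu + tau * a                                            # (5)
        pi_w = math.sqrt(2.0 * math.pi) * tau * np.exp(a ** 2 / 2.0) * phi(alpha) * p  # (6)
        terms = pi_w * cond_lik(alpha, y, xb, sigma)
        f2 = float(np.sum(terms))          # likelihood from the previous moments
        mu_new = float(np.sum(alpha * terms)) / f2
        var_new = float(np.sum(alpha ** 2 * terms)) / f2 - mu_new ** 2
        if var_new <= 0.0:
            raise ValueError("non-positive posterior variance; try more quadrature points")
        tau_new = math.sqrt(var_new)
        done = abs(mu_new - mu) < tol and abs(tau_new - tau) < tol
        mu, tau = mu_new, tau_new
        if done:
            break
    return mu, tau, f2, k

# Hypothetical cluster: 14 successes, 6 failures, fixed part x'beta = 0.
y = np.array([1] * 14 + [0] * 6)
xb = np.zeros(20)
mu_hat, tau_hat, f2_hat, n_iter = adapt(y, xb, sigma=1.0, R=10)

# Brute-force check on a dense grid (illustration only).
grid = np.linspace(-8.0, 8.0, 4001)
dens = phi(grid) * cond_lik(grid, y, xb, 1.0)
ref = float(np.sum(dens) * (grid[1] - grid[0]))
mu_ref = float(np.sum(grid * dens) / np.sum(dens))
tau_ref = math.sqrt(float(np.sum(grid ** 2 * dens) / np.sum(dens)) - mu_ref ** 2)
print(n_iter, mu_hat, tau_hat, f2_hat)
```

With the starting values $\mu = 0$ and $\tau = 1$ the first pass coincides with ordinary quadrature, since the weights in (6) then reduce to $p_r$. In a full estimation routine these iterations would alternate with Newton-Raphson steps as described above; that outer loop is not shown.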

2.2. Adaptive quadrature for multilevel random coefficient models

A general three-level random coefficient model can be written as

$$\eta_{ijk} = x'_{ijk}\beta + x^{(2)\prime}_{ijk} u^{(2)}_{jk} + x^{(3)\prime}_{ijk} u^{(3)}_{k}, \tag{8}$$

where $i$, $j$ and $k$ index the units at levels 1, 2 and 3, respectively (e.g. time-points in individuals in firms), $x'_{ijk}\beta$ is the fixed effects part, $x^{(2)}_{ijk}$ is a vector of explanatory variables with random effects $u^{(2)}_{jk}$ at level 2 and $x^{(3)}_{ijk}$ is a vector of explanatory variables with random effects $u^{(3)}_{k}$ at level 3. The random effects at a given level have a multivariate normal distribution and the random effects at different levels are mutually independent and independent of the residual error term $\epsilon_{ijk}$ and explanatory variables. The general $L$-level version of this model can be written as

$$\eta = x'\beta + \sum_{l=2}^{L} x^{(l)\prime} u^{(l)},$$

where subscripts are omitted to simplify notation. The marginal log-likelihood is

$$\mathcal{L}(\theta) = \sum \ln f^{(L)}(\theta),$$

where $f^{(L)}(\theta)$ is the likelihood contribution of a unit at the highest level $L$. Let $U^{(l)} = (u^{(l)\prime}, \ldots, u^{(L)\prime})'$ for $l \le L$. Exploiting conditional independence among level-$(l-1)$ units given the random effects $U^{(l)}$ at levels $l$ and above, the likelihood contribution of a given level-$l$ unit can be obtained recursively as

$$f^{(l)}(\theta \mid U^{(l+1)}) = \int g(u^{(l)}; 0, \Sigma^{(l)}) \prod f^{(l-1)}(\theta \mid U^{(l)})\, du^{(l)}, \quad l = 2, \ldots, L - 1,$$

$$f^{(L)}(\theta) = \int g(u^{(L)}; 0, \Sigma^{(L)}) \prod f^{(L-1)}(\theta \mid u^{(L)})\, du^{(L)}, \tag{9}$$

where $f^{(1)}(\theta \mid U^{(2)})$ is the conditional level-1 likelihood contribution given in (2) for binary probit models (see Section 4 for other response models), $g(u^{(l)}; 0, \Sigma^{(l)})$ is the multivariate normal density of $u^{(l)}$ with covariance matrix $\Sigma^{(l)}$ and the product is over all level-$(l-1)$ units within the level-$l$ unit as shown explicitly for a two-level model in (1). Instead of integrating over the correlated random effects $u^{(l)}$, we will integrate over independent standard normal variables $v^{(l)}$ with

$$u^{(l)} = Q^{(l)} v^{(l)}, \tag{10}$$

where $Q^{(l)}$ is the Cholesky decomposition of $\Sigma^{(l)}$. Letting $V^{(l)} = (v^{(l)\prime}, \ldots, v^{(L)\prime})'$, the integral over the $M^{(l)}$ random effects at level $l$ can then be approximated by cartesian product quadrature,

$$f^{(l)}(\theta \mid V^{(l+1)}) = \int \cdots \int \phi(v_M) \cdots \phi(v_1) \prod f^{(l-1)}(\theta \mid v_1, \ldots, v_M, V^{(l+1)})\, dv_1 \cdots dv_M \approx \sum_{r_M} p_{r_M} \cdots \sum_{r_1} p_{r_1} \prod f^{(l-1)}(\theta \mid a_{r_1}, \ldots, a_{r_M}, V^{(l+1)}), \tag{11}$$

where we have omitted the $l$ superscript for $M$ and the variables being integrated over and will continue to do so in the remainder of this section.

We can improve the approximation by using adaptive quadrature. Although the multivariate integrals in (11) are evaluated as nested sets of univariate integrals, first over $v_1$, then over $v_2$, up to $v_M$, we cannot simply apply the adaptive quadrature rule in (5) and (6) to each univariate integral. This is because when integrating over a given $v_m$, the integrand is proportional to the posterior density of $v_m$ conditional on all random effects not yet integrated over, i.e. $v_{m+1}$ to $v_M$ and all higher level random effects. Since the random effects will generally have non-zero posterior correlations, we would therefore require the conditional posterior moments of $v_m$ given all random effects not yet integrated over. We can simplify the problem considerably by transforming to a new set of random effects with zero posterior correlations so that the marginal moments can be used. Naylor and Smith (1988) discuss this problem in a Bayesian context and suggest the orthogonalizing transformation

$$w_1 = v_1, \qquad w_s = v_s - \sum_{t=1}^{s-1} \gamma_{st} w_t, \quad s = 2, \ldots, S, \tag{12}$$

with $\gamma_{st} = \mathrm{cov}(v_s, w_t)/\mathrm{var}(w_t)$, where we have omitted the $l$ superscript and let $v_s$ denote the $s$th of all random effects (in some order) with $s = 1, \ldots, S$ and $S = \sum_l M^{(l)}$. The transformation has unit Jacobian. The sequence of transformations therefore starts with random effects $z_s$ with zero posterior means and covariances and unit posterior variances which are evaluated at the Gauss-Hermite quadrature locations $a_r$, $r = 1, \ldots, R$. These random effects are rescaled to $w_s = \mu_s + \tau_s z_s$, giving the adaptive quadrature locations for univariate integration, $\alpha_{sr}$ in (5), and transformed to $v_s$ via (12). The adaptive quadrature locations for multivariate integration are therefore given by

$$A_{sr} = \alpha_{sr} + \sum_{t=1}^{s-1} \gamma_{st}\, \alpha_{tr}$$

with corresponding weights

$$\Pi_{sr} = \sqrt{2\pi}\, \tau_s \exp(a_r^2/2)\, \phi(A_{sr})\, p_r.$$

The weights $\Pi_{sr}$ for the $s$th random effect depend on $A_{sr}$ and hence on the locations $\alpha_{tr}$ of all preceding random effects $t < s$. In order to keep the weights of higher level effects constant when integrating over the lower level effects, the $v_s$ should be ordered from the highest to lowest level, the ordering within a level being arbitrary. For two random effects, the transformation from $(z_1, z_2)$ to $(v_1, v_2)$ and hence from $(a_{r_1}, a_{r_2})$ to $(A_{1r_1}, A_{2r_2})$ is illustrated in the first row of Fig. 2. It is clear that, for given posterior means and standard deviations, adaptive quadrature will be particularly superior to ordinary quadrature when the variables $v_s$ have marked posterior correlations. Note that we would expect substantial negative posterior correlations between random intercepts at different levels since the effect (on the posterior distribution) of increasing the higher level random intercept can to some degree be counteracted by decreasing the lower level one and vice versa.

The $\gamma_{st}$ required for the transformations in (12) as well as the posterior moments $\mu$ and $\tau$ of $w$ can be obtained from the posterior means, variances and covariances of $v$. For given adaptive quadrature locations and weights, the algorithm computes the marginal likelihood and posterior moments of $v$ recursively from level 2 to $L$. The terms evaluated at a given level $l$ are displayed in Table 1. After evaluating all terms up to level $L$, the posterior variances and covariances are found using

$$\mathrm{cov}(v_m^{(k)}, v_n^{(l)}) = E(v_m^{(k)} v_n^{(l)}) - E(v_m^{(k)})\, E(v_n^{(l)}).$$
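How the coefficients $\gamma_{st}$ and the moments of $w$ follow from the posterior moments of $v$, and how the multivariate locations and weights are then built, can be sketched as below. This is an illustrative sketch only: the function names `orthogonalize` and `adaptive_rule` are assumptions, and the posterior moments (means 1 and 2, unit variances, correlation 0.5) are hypothetical inputs rather than quantities estimated from data. As a check, the rule is applied to an integrand whose posterior is exactly bivariate normal, for which the adaptive rule should reproduce the integral exactly.

```python
import itertools
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

def phi(x):
    """Standard normal density."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def orthogonalize(mean_v, cov_v):
    """Sequential transformation (12): returns gamma, E(w) and sd(w)."""
    S = len(mean_v)
    gamma = np.eye(S)            # unit lower triangular; gamma[s, t] for t < s
    c = np.zeros((S, S))         # c[s, t] = cov(v_s, w_t)
    for t in range(S):
        for s in range(t, S):
            c[s, t] = cov_v[s, t] - sum(gamma[t, u] * c[s, u] for u in range(t))
        for s in range(t + 1, S):
            gamma[s, t] = c[s, t] / c[t, t]     # c[t, t] equals var(w_t)
    mu_w = np.linalg.solve(gamma, mean_v)        # since E(v) = gamma @ E(w)
    return gamma, mu_w, np.sqrt(np.diag(c))

def adaptive_rule(mean_v, cov_v, R):
    """Cartesian adaptive locations A and product weights for S random effects."""
    gamma, mu_w, tau_w = orthogonalize(mean_v, cov_v)
    x, wts = hermgauss(R)
    a, p = math.sqrt(2.0) * x, wts / math.sqrt(math.pi)
    idx = np.array(list(itertools.product(range(R), repeat=len(mean_v))))
    z = a[idx]                                   # standardized locations, shape (R**S, S)
    w = mu_w + tau_w * z                         # univariate adaptive locations
    A = w @ gamma.T                              # back-transform: v = gamma @ w
    Pi = np.prod(math.sqrt(2.0 * math.pi) * tau_w * np.exp(z ** 2 / 2.0)
                 * phi(A) * p[idx], axis=1)
    return A, Pi

def mvn_pdf(v, m, C):
    """Multivariate normal density evaluated at the rows of v."""
    d = v - m
    q = np.einsum("ij,jk,ik->i", d, np.linalg.inv(C), d)
    return np.exp(-0.5 * q) / ((2.0 * math.pi) ** (len(m) / 2.0) * math.sqrt(np.linalg.det(C)))

# Hypothetical posterior moments of (v_1, v_2).
mean_v = np.array([1.0, 2.0])
cov_v = np.array([[1.0, 0.5], [0.5, 1.0]])
gamma, mu_w, tau_w = orthogonalize(mean_v, cov_v)
A, Pi = adaptive_rule(mean_v, cov_v, R=5)

# Integrand phi(v1) phi(v2) f(v) chosen to equal a bivariate normal density,
# so the exact value of the integral is 1.
f = mvn_pdf(A, mean_v, cov_v) / np.prod(phi(A), axis=1)
total = float(np.sum(Pi * f))
print(gamma, mu_w, tau_w, total)
```

The sequential construction is an $LDL'$ factorization of the posterior covariance matrix: `gamma @ diag(tau_w**2) @ gamma.T` reproduces `cov_v`. The sketch ignores the ordering of effects by level discussed above, which matters only for nested integration.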

[Figure 2: four panels. Columns: "Ordinary quadrature" (axes $z_1$, $z_2$) and "Adaptive quadrature" (axes $v_1$, $v_2$). Rows: "Cartesian (64 points)" and "Spherical (44 points)".]

Fig. 2. Locations for ordinary and adaptive integration in two dimensions using cartesian and spherical quadrature rules with $d = 7$, where $\mu_1 = 1$, $\mu_2 = 2$, $\tau_1 = \tau_2 = 1$ and the posterior correlation is 0.5.


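Table 1
Quantities evaluated at level $l$ to obtain the likelihood by adaptive quadrature

Likelihood:

$$f^{(l)}(\theta \mid V^{(l+1)}) = \sum_{r_M} \Pi_{M r_M} \cdots \sum_{r_1} \Pi_{1 r_1} \prod f^{(l-1)}(\theta \mid A_{1 r_1}, \ldots, A_{M r_M}, V^{(l+1)}).$$

First order moments:

$$E(v_m^{(l)} \mid V^{(l)}) = A_m^{(l)}, \quad m = 1, \ldots, M^{(l)},$$

$$E(v_m^{(k)} \mid V^{(l+1)}) = \frac{\sum_{r_M} \Pi_{M r_M} \cdots \sum_{r_1} \Pi_{1 r_1}\, E(v_m^{(k)} \mid V^{(l)}) \prod f^{(l-1)}(\theta \mid A_{1 r_1}, \ldots, A_{M r_M}, V^{(l+1)})}{f^{(l)}(\theta \mid V^{(l+1)})}, \quad k = 1, \ldots, l; \; m = 1, \ldots, M^{(k)}.$$

Second order moments:

$$E(v_m^{(l)} v_n^{(l+i)} \mid V^{(l)}) = A_m^{(l)} A_n^{(l+i)}, \quad m = 1, \ldots, M^{(l)}; \; i = 0, \ldots, L - l; \; n = \begin{cases} m, \ldots, M^{(l)} & i = 0 \\ 1, \ldots, M^{(l+i)} & i > 0, \end{cases}$$

$$E(v_m^{(k)} v_n^{(k+i)} \mid V^{(l+1)}) = \frac{\sum_{r_M} \Pi_{M r_M} \cdots \sum_{r_1} \Pi_{1 r_1}\, E(v_m^{(k)} v_n^{(k+i)} \mid V^{(l)}) \prod f^{(l-1)}(\theta \mid A_{1 r_1}, \ldots, A_{M r_M}, V^{(l+1)})}{f^{(l)}(\theta \mid V^{(l+1)})},$$

$$k = 1, \ldots, l; \; m = 1, \ldots, M^{(k)}; \; i = 0, \ldots, L - k; \; n = \begin{cases} m, \ldots, M^{(k)} & i = 0 \\ 1, \ldots, M^{(k+i)} & i > 0. \end{cases}$$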
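The nested structure of the recursion (9), evaluated by quadrature one level at a time, can be sketched for a three-level random intercept probit model. This sketch uses ordinary (non-adaptive) Gauss-Hermite quadrature at both levels to keep it short, so it illustrates only the nesting and not the adaptive moments in Table 1. The data, parameter values and function names (`level2_lik`, `level3_lik`) are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

_erfc = np.vectorize(math.erfc, otypes=[float])

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * _erfc(-np.asarray(x, dtype=float) / math.sqrt(2.0))

def gh_rule(R):
    """Locations and weights for integrals against the N(0, 1) density."""
    x, w = hermgauss(R)
    return math.sqrt(2.0) * x, w / math.sqrt(math.pi)

def level2_lik(y_j, beta0, sigma2, u3, a, p):
    """Contribution of one level-2 unit given the level-3 intercept u3:
    integrates the level-1 product over the level-2 intercept."""
    eta = beta0 + u3 + sigma2 * a[:, None]                     # shape (R, 1)
    lik = np.prod(np.where(y_j[None, :] == 1, Phi(eta), Phi(-eta)), axis=1)
    return float(np.sum(p * lik))

def level3_lik(y_k, beta0, sigma2, sigma3, R):
    """Contribution of one level-3 unit: for every level-3 location, all
    level-2 units inside it are integrated completely before multiplying."""
    a, p = gh_rule(R)
    total = 0.0
    for a3, p3 in zip(a, p):
        prod = 1.0
        for y_j in y_k:
            prod *= level2_lik(y_j, beta0, sigma2, sigma3 * a3, a, p)
        total += p3 * prod
    return total

# Hypothetical level-3 unit containing three level-2 units of unequal size.
y_k = [np.array([1, 1, 0, 1]), np.array([0, 0, 1]), np.array([1, 1, 1, 1, 0])]
lik = level3_lik(y_k, beta0=0.2, sigma2=0.8, sigma3=0.5, R=10)

# With no level-3 variance the level-2 units are independent, so the
# contribution factorizes into a product of two-level contributions.
lik_indep = level3_lik(y_k, 0.2, 0.8, 0.0, 10)
prod_l2 = float(np.prod([level2_lik(y_j, 0.2, 0.8, 0.0, *gh_rule(10)) for y_j in y_k]))
print(lik, lik_indep, prod_l2)
```

The inner call is repeated for each level-3 location, which is the "complete integration" over lower-level effects that the recursion requires.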

These moments can be used to update the quadrature locations and weights and we can iterate as in the univariate case until convergence. This set of iterations is then alternated with single steps of a Newton-Raphson procedure as described in the univariate case.

Note that adaptive quadrature as described here, based on the posterior moments, can be applied as easily to multilevel models as to two-level models. This is in contrast to the first-order Laplace approximation suggested by Liu and Pierce (1994). Applying their method to two-level models is straightforward: the mode with respect to all the random effects is found and the covariance matrix of the approximating multivariate normal density is found from the inverse Hessian matrix of the log of the integrand. However, in multilevel models, finding the mode with respect to $v^{(l)}$ would require integrating out all lower level random effects $v^{(2)}, \ldots, v^{(l-1)}$ for each value of $v^{(l)}$ during numerical optimization and differentiation with respect to $v^{(l)}$.

2.3. Multivariate integration using spherical quadrature rules

Cartesian product quadrature in (11) is a straightforward application of Gauss-Hermite quadrature to multidimensional integration. However, as pointed out in Naylor and Smith (1988), integrals of the form

$$\int \cdots \int \exp(-x_1^2 - \cdots - x_M^2)\, f(x_1, \ldots, x_M)\, dx_1 \cdots dx_M$$

can often be integrated more efficiently using spherical quadrature rules. These rules are located on concentric hyperspheres as illustrated for two dimensions in the bottom left panel of Fig. 2.

A rule of degree $d$ is exact if $f(x_1, \ldots, x_M)$ is a linear combination of monomials of the form $x_1^{k_1} \cdots x_M^{k_M}$ with $k_1 + \cdots + k_M \le d$. Cartesian product quadrature with $R$ points per dimension is exact for monomials with degree $d = 2R - 1$. In addition, cartesian product quadrature is exact for monomials with $k_1 + \cdots + k_M > d$ as long as $k_1 \le d, \ldots, k_M \le d$.

A compilation of quadrature rules for multidimensional integrals is given in Stroud (1971) and has been updated by Cools and Rabinowitz (1993) and Cools (1999). The most efficient quadrature rules for a certain dimension $M$ and degree $d$ are those that require the fewest points; rules with positive weights are generally more accurate than rules with some negative weights. For integrals of the form above, the most efficient published degree 7 rules with positive weights that we are aware of use $2^M + 2M^2 + 1$ points for $M = 3, 4, 6$ and $2^{M+1} + 4M^2$ for $M \ge 3$ and are given in Stroud (1971). For example, in six dimensions, the rule requires 137 points compared with 4096 ($= 4^M$) for cartesian product quadrature. Unfortunately, we are aware of published higher degree rules with positive weights only for $M = 2, 3$.

We can use these spherical rules to evaluate the $M^{(l)}$ dimensional integrals at each level $l = 2, \ldots, L$ using ordinary or adaptive quadrature. However, we cannot use a single $S = \sum_l M^{(l)}$ dimensional spherical rule for integrating over the random effects at all levels. This is because, as shown in (9), integration with respect to $u^{(l)}$ to compute $f^{(l)}(\theta \mid U^{(l+1)})$ requires $f^{(l-1)}(\theta \mid U^{(l)})$ to be evaluated by complete integration with respect to $u^{(l-1)}$ for each value of $u^{(l)}$. Cartesian quadrature provides such nested integration as seen in (11) and by considering the case $S = 2$ illustrated in the first row of Fig. 2 where the sum along a given column of quadrature points corresponds to complete integration with respect to $v_2$ for a given value of $v_1$. In contrast, as remarked by Naylor and Smith (1988), spherical quadrature does not permit such marginalization.
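The exactness properties of cartesian product quadrature stated above, and the point counts quoted for degree 7, can be checked with a short sketch. The spherical rules themselves are tabulated in Stroud (1971) and are not reproduced here; only the cartesian rule is built, and the function names are illustrative assumptions.

```python
import itertools
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

def cartesian_rule(R, M):
    """R-point Gauss-Hermite rule per dimension, combined over M dimensions
    for the weight function exp(-x_1^2 - ... - x_M^2)."""
    x, w = hermgauss(R)
    nodes = np.array(list(itertools.product(x, repeat=M)))
    weights = np.prod(np.array(list(itertools.product(w, repeat=M))), axis=1)
    return nodes, weights

def exact_monomial(k):
    """Integral of exp(-x^2) x^k over the real line."""
    return 0.0 if k % 2 else math.gamma((k + 1) / 2.0)

# R = 4 points per dimension gives degree d = 2R - 1 = 7.
nodes, weights = cartesian_rule(4, 2)

# Total degree 12 exceeds 7, but each exponent is at most 7: still exact.
approx_66 = float(np.sum(weights * nodes[:, 0] ** 6 * nodes[:, 1] ** 6))
exact_66 = exact_monomial(6) ** 2

# A single exponent of 8 exceeds 7: no longer exact.
approx_8 = float(np.sum(weights * nodes[:, 0] ** 8))
exact_8 = exact_monomial(8) * exact_monomial(0)

def degree7_spherical_points(M):
    """Point count 2^M + 2 M^2 + 1 quoted in the text for M = 3, 4, 6."""
    return 2 ** M + 2 * M ** 2 + 1

print(approx_66 / exact_66, approx_8 / exact_8)
print(degree7_spherical_points(6), 4 ** 6)
```

The second check shows why the cartesian rule spends points inefficiently: it integrates many monomials of total degree above $d$ exactly, which a rule of degree $d$ is not required to do.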


3. Simulation study

3.1. Simple random component probit model

We first investigate the bias in parameter estimates using both ordinary and adaptive quadrature for the random component or random intercept binary probit model. The following model was simulated:

$$y^*_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2j} + u_j + \epsilon_{ij}, \qquad \mathrm{var}(\epsilon_{ij}) = 1,$$

where $x_{1ij}$ varies between level-1 units $ij$ and takes on the values 0 and 1 with probabilities equal to 0.5, whereas $x_{2j}$ varies between clusters $j$, also taking on values 0 and 1 with probabilities 0.5 independently of $x_{1ij}$. The fixed parameters were set to $\beta_0 = 0$, $\beta_1 = 1$, $\beta_2 = 1$; $\sigma$ was varied so that $\rho = 0.30$, 0.45, 0.60, 0.75, 0.90 and combined with cluster sizes $n_j = 10, 100, 500$. We used 1000 clusters to obtain precise estimates of the biases with 50 replications. For each simulated dataset, the parameters were estimated by ordinary quadrature with 10, 20 and 40 points and by adaptive quadrature with 5, 10 and 20 points. If the relative change in mean log-likelihood with increasing numbers of quadrature points was no more than $5 \times 10^{-5}$, the smaller number of quadrature points was considered adequate; otherwise the maximum number of quadrature points was used even if it appeared inadequate. Table 2 shows the number of quadrature points and the means and standard deviations of $\hat\sigma$. The corresponding results for the regression coefficients are given in Table 3. Fig. 3 shows boxplots of the relative bias of $\hat\sigma$ defined as $(\hat\sigma - \sigma)/\sigma$.
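One way to generate a dataset from this design is sketched below. It is an illustration under stated assumptions, not the authors' simulation code: the function names, the seed handling and the returned array layout are hypothetical. The random intercept standard deviation is obtained by inverting the intraclass correlation, $\sigma = \sqrt{\rho/(1-\rho)}$.

```python
import math
import numpy as np

def sigma_from_rho(rho):
    """Invert rho = sigma^2 / (sigma^2 + 1)."""
    return math.sqrt(rho / (1.0 - rho))

def simulate(n_clusters, cluster_size, rho, beta=(0.0, 1.0, 1.0), seed=0):
    """Draw one dataset from the random intercept probit design above.
    Returns arrays of shape (n_clusters, cluster_size): y, x1 and x2."""
    rng = np.random.default_rng(seed)
    sigma = sigma_from_rho(rho)
    x1 = rng.integers(0, 2, size=(n_clusters, cluster_size))   # varies within clusters
    x2 = rng.integers(0, 2, size=(n_clusters, 1))               # varies between clusters
    u = rng.normal(0.0, sigma, size=(n_clusters, 1))            # random intercept
    eps = rng.normal(0.0, 1.0, size=(n_clusters, cluster_size))
    y_star = beta[0] + beta[1] * x1 + beta[2] * x2 + u + eps
    y = (y_star > 0).astype(int)
    return y, x1, np.broadcast_to(x2, x1.shape)

# Standard deviations implied by the intraclass correlations of the design.
sigmas = [round(sigma_from_rho(r), 3) for r in (0.30, 0.45, 0.60, 0.75, 0.90)]
print(sigmas)

y, x1, x2 = simulate(n_clusters=1000, cluster_size=10, rho=0.60, seed=1)
print(y.shape, y.mean())
```

The implied values of $\sigma$ agree with the $\sigma$ column of Table 2.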
Table 2
Estimates of $\sigma$ using R-point ordinary and adaptive quadrature

                          Ordinary quadrature               Adaptive quadrature
n_j    rho     sigma      R    Mean sigma-hat   (sd)        R    Mean sigma-hat   (sd)
 10    0.30    0.655     10    0.658            (0.025)      5    0.659            (0.025)
 10    0.45    0.905     20    0.903            (0.037)     10    0.904            (0.038)
 10    0.60    1.225     20    1.224            (0.041)     10    1.225            (0.042)
 10    0.75    1.732     40    1.739            (0.067)     20    1.740            (0.067)
 10    0.90    3.000     40    2.812            (0.106)*    20    2.986            (0.133)
100    0.30    0.655     40    0.649            (0.017)*     5    0.653            (0.018)
100    0.45    0.905     40    0.878            (0.028)*     5    0.910            (0.024)
100    0.60    1.225     40    1.073            (0.030)*     5    1.229            (0.030)
100    0.75    1.732     40    1.332            (0.044)*    20    1.713            (0.067)
100    0.90    3.000     40    1.768            (0.062)*    20    2.935            (0.090)*
500    0.30    0.655     40    0.543            (0.023)*     5    0.654            (0.019)
500    0.45    0.905     40    0.661            (0.023)*     5    0.910            (0.021)
500    0.60    1.225     40    0.782            (0.030)*     5    1.240            (0.034)*
500    0.75    1.732     40    0.951            (0.043)*    20    1.732            (0.050)
500    0.90    3.000     40    1.224            (0.056)*    20    2.991            (0.081)

* True value outside approximate 95% confidence interval.


Table 3
Estimates of $\beta_0$, $\beta_1$ and $\beta_2$ (true values 0, 1, 1) using ordinary and adaptive quadrature with the same number of quadrature points R as in Table 2

Ordinary quadrature
                 beta0-hat          beta1-hat          beta2-hat
n_j    rho       Mean   (sd)        Mean   (sd)        Mean   (sd)
 10    0.30      0.01   (0.04)      0.99   (0.03)      1.02   (0.06)
 10    0.45      0.01   (0.05)      1.00   (0.04)      1.00   (0.07)
 10    0.60      0.01   (0.07)      1.00   (0.04)      1.01   (0.09)
 10    0.75      0.00   (0.08)      0.99   (0.04)      0.99   (0.11)
 10    0.90      0.05   (0.23)      1.00   (0.05)      1.02   (0.30)
100    0.30      0.01   (0.03)      1.00   (0.01)      1.01   (0.04)
100    0.45      0.00   (0.08)      1.00   (0.01)      1.00   (0.11)
100    0.60      0.03   (0.15)      1.00   (0.01)      0.95   (0.21)
100    0.75      0.05   (0.25)      0.99   (0.01)      1.02   (0.33)
100    0.90      0.11   (0.36)      0.98   (0.02)      0.90   (0.47)
500    0.30      0.00   (0.08)      1.00   (0.01)      0.99   (0.08)
500    0.45      0.00   (0.13)      1.00   (0.00)      0.98   (0.16)
500    0.60      0.01   (0.18)      1.00   (0.01)      0.88   (0.26)
500    0.75      0.03   (0.39)      0.99   (0.01)      0.75   (0.41)
500    0.90      0.08   (0.62)      0.99   (0.01)      0.56   (0.89)

Adaptive quadrature
                 beta0-hat          beta1-hat          beta2-hat
n_j    rho       Mean   (sd)        Mean   (sd)        Mean   (sd)
 10    0.30      0.01   (0.04)      0.99   (0.03)      1.02   (0.06)
 10    0.45      0.01   (0.05)      1.00   (0.04)      1.01   (0.07)
 10    0.60      0.01   (0.07)      1.00   (0.04)      1.01   (0.09)
 10    0.75      0.00   (0.08)      0.99   (0.04)      0.99   (0.11)
 10    0.90      0.02   (0.15)      1.01   (0.05)      0.96   (0.22)
100    0.30      0.00   (0.03)      1.00   (0.01)      1.01   (0.04)
100    0.45      0.00   (0.04)      1.00   (0.01)      1.02   (0.06)
100    0.60      0.01   (0.06)      1.00   (0.01)      1.02   (0.06)
100    0.75      0.01   (0.07)      1.00   (0.01)      1.04   (0.08)
100    0.90      0.06   (0.14)      1.00   (0.02)      0.97   (0.20)
500    0.30      0.00   (0.03)      1.00   (0.01)      1.01   (0.04)
500    0.45      0.01   (0.04)      1.00   (0.00)      1.01   (0.06)
500    0.60      0.01   (0.05)      1.00   (0.01)      1.02   (0.07)
500    0.75      0.01   (0.08)      1.00   (0.01)      0.99   (0.13)
500    0.90      0.06   (0.15)      1.00   (0.01)      0.96   (0.23)

Fig. 3. Relative bias of σ̂ for ordinary and adaptive quadrature. [Two panels, "Ordinary quadrature" and "Adaptive quadrature"; vertical axis: relative bias; horizontal axis: correlation (0.3, 0.45, 0.6, 0.75, 0.9) within cluster size (10, 100, 500).]
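The contrast between the two panels of Fig. 3 can be illustrated on a single large cluster. The following is a minimal sketch under stated assumptions, not the implementation used for the simulations: it assumes an intercept-only two-level probit model, so the cluster likelihood depends only on the number k of responses equal to 1 among n; it centres and scales the quadrature locations using the posterior mode and curvature (the variant of Liu and Pierce, 1994) rather than iteratively updated posterior means and standard deviations; and all function names are hypothetical. A brute-force trapezoidal rule serves as a benchmark.

```python
import math


def hermite_e(n, x):
    """Probabilists' Hermite polynomial He_n(x) via the three-term recurrence."""
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h if n > 0 else 1.0


def gauss_hermite_normal(n):
    """Nodes z, weights w with sum(w * f(z)) ~ E[f(Z)], Z ~ N(0, 1)."""
    roots = []
    for deg in range(1, n + 1):  # roots of He_deg interlace those of He_(deg-1)
        bound = 2.0 * math.sqrt(deg) + 1.0
        edges = [-bound] + roots + [bound]
        new_roots = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            f_lo = hermite_e(deg, lo)
            for _ in range(100):  # bisection inside each bracket
                mid = 0.5 * (lo + hi)
                f_mid = hermite_e(deg, mid)
                if (f_mid > 0) == (f_lo > 0):
                    lo, f_lo = mid, f_mid
                else:
                    hi = mid
            new_roots.append(0.5 * (lo + hi))
        roots = new_roots
    weights = [math.factorial(n) / (n * hermite_e(n - 1, z)) ** 2 for z in roots]
    return roots, weights


def log_phi(x):  # log standard normal density
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def log_Phi(x):  # log standard normal cdf; erfc keeps the lower tail accurate
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))


def log_joint(v, k, n, beta0, sigma):
    """Log of N(0, 1) prior times probit likelihood (k ones among n responses)."""
    eta = beta0 + sigma * v
    return log_phi(v) + k * log_Phi(eta) + (n - k) * log_Phi(-eta)


def loglik_ordinary(k, n, beta0, sigma, points):
    z, w = gauss_hermite_normal(points)  # locations fixed by the prior
    terms = [wr * math.exp(log_joint(zr, k, n, beta0, sigma) - log_phi(zr))
             for zr, wr in zip(z, w)]
    return math.log(sum(terms))


def mode_and_scale(k, n, beta0, sigma):
    """Posterior mode (bisection on the score) and curvature-based sd."""
    def mills(eta):  # phi(eta)/Phi(eta) and phi(eta)/Phi(-eta)
        return (math.exp(log_phi(eta) - log_Phi(eta)),
                math.exp(log_phi(eta) - log_Phi(-eta)))
    lo, hi = -8.0, 8.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lam1, lam0 = mills(beta0 + sigma * mid)
        score = -mid + sigma * (k * lam1 - (n - k) * lam0)
        lo, hi = (mid, hi) if score > 0 else (lo, mid)
    mode = 0.5 * (lo + hi)
    eta = beta0 + sigma * mode
    lam1, lam0 = mills(eta)
    info = 1.0 + sigma ** 2 * (k * lam1 * (lam1 + eta) + (n - k) * lam0 * (lam0 - eta))
    return mode, 1.0 / math.sqrt(info)


def loglik_adaptive(k, n, beta0, sigma, points):
    mu, tau = mode_and_scale(k, n, beta0, sigma)
    z, w = gauss_hermite_normal(points)  # locations shifted and scaled to the posterior
    terms = [wr * tau * math.exp(log_joint(mu + tau * zr, k, n, beta0, sigma) - log_phi(zr))
             for zr, wr in zip(z, w)]
    return math.log(sum(terms))


def loglik_reference(k, n, beta0, sigma, lo=-8.0, hi=8.0, steps=16000):
    """Trapezoidal rule on a fine grid, used only as a benchmark."""
    h = (hi - lo) / steps
    total = sum((0.5 if i in (0, steps) else 1.0)
                * math.exp(log_joint(lo + i * h, k, n, beta0, sigma))
                for i in range(steps + 1))
    return math.log(total * h)


for name, fn in [("ordinary", loglik_ordinary), ("adaptive", loglik_adaptive)]:
    print(name, round(fn(70, 100, 0.0, 3.0, 10), 3))
print("reference", round(loglik_reference(70, 100, 0.0, 3.0), 3))
```

The illustrative values (70 ones among 100 responses, σ = 3, i.e. ρ = 0.9, and 10 points) are chosen so that the integrand is a sharp peak lying between the locations of the ordinary rule. The adaptive rule places its locations under the peak, which is why it should track the benchmark far more closely with the same number of points.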

Adaptive quadrature requires considerably fewer quadrature points than ordinary quadrature to achieve a stable log-likelihood. Using ordinary quadrature, the standard deviation estimates become increasingly biased as the cluster size and intraclass correlation increase, 40 points being clearly inadequate for correlations above 0.45 when $n_j = 100$ and above 0.3 when $n_j = 500$. Adaptive quadrature performs very well for all combinations of $n_j$ and $\rho$ with no more than 20 quadrature points, and with fewer for lower intraclass correlations. As expected, adaptive


quadrature appears to work better for larger cluster sizes, where the posterior distribution is closer to normal. Somewhat surprisingly, the estimates of the intercept $\beta_0$ and of the regression coefficient $\beta_1$ of the within-cluster covariate are fairly unbiased even where the estimates of the standard deviation $\sigma$ are biased using ordinary quadrature. However, using ordinary quadrature, the estimates of the regression coefficient $\beta_2$ of the between-cluster covariate have severe downward bias for large clusters and high intraclass correlation. Moreover, in many cases the standard deviations of the estimates of $\beta_0$ and $\beta_2$ are substantially larger than for adaptive quadrature, meaning that the estimates for a particular dataset can be very poor.

3.2. Three-level probit model

We now consider three-level binary probit models of the form

$$y^*_{ijk} = \beta_0 + u^{(2)}_{jk} + u^{(3)}_{k} + \epsilon_{ijk}, \qquad \mathrm{var}(\epsilon_{ijk}) = 1,$$

where the level-2 random intercept $u^{(2)}_{jk}$ has variance $\sigma_2^2$ and the level-3 random intercept $u^{(3)}_{k}$ has variance $\sigma_3^2$. In particular, we will assess the performance of adaptive quadrature for different cluster sizes and intraclass correlations. There are two cluster sizes for the three-level model: the number of level-1 units in each level-2 unit, $n_2$, and the number of level-2 units in each level-3 unit, $n_3$. The posterior density of $u^{(2)}_{jk}$, conditional on $u^{(3)}_{k}$, becomes increasingly normal as $n_2$ increases. Therefore fewer quadrature points should be required at level 2 for larger $n_2$. The posterior density of $u^{(3)}_{k}$ is the product of the prior density and a product of $n_3$ level-2 likelihood contributions. This density will become increasingly normal as $n_3$ increases but also as $n_2$ increases, since the level-2 likelihood contributions themselves then become closer to normal. Therefore, generally, fewer quadrature points may be required at level 3 than at level 2. In addition to estimating the parameters with 5 and 10 point quadrature per dimension, we will therefore also try using a larger number of quadrature points at level 2 (10 points) than level 3 (5 points).

There are several ways of defining intraclass correlations. The marginal correlation between units in the same level-2 and level-3 units is

$$\rho_{23} \equiv \mathrm{cor}(y^*_{ijk}, y^*_{i'jk}) = \frac{\sigma_2^2 + \sigma_3^2}{\sigma_2^2 + \sigma_3^2 + 1},$$

whereas the conditional correlation, conditioning on the level-3 random effect, is

$$\rho_{2|3} \equiv \mathrm{cor}(y^*_{ijk}, y^*_{i'jk} \mid u^{(3)}_{k}) = \frac{\sigma_2^2}{\sigma_2^2 + 1}.$$

The correlation between units in the same level-3 unit but different level-2 units is

$$\rho_{3} \equiv \mathrm{cor}(y^*_{ijk}, y^*_{i'j'k}) = \frac{\sigma_3^2}{\sigma_2^2 + \sigma_3^2 + 1}.$$
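As a quick numerical check of these three definitions, the following sketch (with hypothetical function names) maps standard deviations to intraclass correlations and back. It uses only the formulas above together with the unit level-1 variance of the probit model.

```python
import math


def three_level_correlations(sigma2, sigma3):
    """Return (rho_23, rho_2|3, rho_3) for given random-intercept sds."""
    var2, var3 = sigma2 ** 2, sigma3 ** 2
    total = var2 + var3 + 1.0          # level-1 variance is fixed at 1
    rho_23 = (var2 + var3) / total     # same level-2 and level-3 unit
    rho_2_given_3 = var2 / (var2 + 1.0)  # conditional on the level-3 effect
    rho_3 = var3 / total               # same level-3, different level-2 unit
    return rho_23, rho_2_given_3, rho_3


def sds_from_correlations(rho_2_given_3, rho_3):
    """Invert the definitions: standard deviations implied by rho_2|3 and rho_3."""
    var2 = rho_2_given_3 / (1.0 - rho_2_given_3)
    var3 = rho_3 * (var2 + 1.0) / (1.0 - rho_3)
    return math.sqrt(var2), math.sqrt(var3)


print(three_level_correlations(1.225, 1.035))
print(sds_from_correlations(0.6, 0.6))
```

With the standard deviations 1.225 and 1.035 the function returns values that round to 0.72, 0.6 and 0.3, and inverting $\rho_{2|3} = \rho_3 = 0.6$ gives standard deviations close to 1.225 and 1.936, in line with the true values listed in Table 4.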


The mean parameter estimates over 50 simulations for different combinations of $n_2$, $n_3$, $\rho_{2|3}$ and $\rho_3$ are given in Table 4. Consistent with the results of the previous section, 5-point adaptive quadrature at level 2 is inadequate when the level-2 cluster size, $n_2$, is 10 and the intraclass correlation $\rho_{2|3}$ is 0.6. These biases are greater when $\rho_3$ is large but, surprisingly, lower when $n_3$ is small. When both intraclass correlations are high, $\sigma_3$ is poorly estimated with 5-point quadrature even when $n_2$ is large. A striking result is that in all simulations where 10-point quadrature per dimension performed better than 5-point quadrature, the combination of 10 points at level 2 and 5 points at level 3 worked nearly as well.
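To make the idea of using different numbers of quadrature points at the two levels concrete, here is a minimal sketch of the nested integral for one level-3 unit. It is an illustration under stated assumptions rather than the method evaluated in Table 4: it uses ordinary (non-adaptive) Gauss-Hermite rules at both levels, assumes the intercept-only model of this section with each level-2 unit summarized by its number of responses equal to 1, and all function names are hypothetical.

```python
import math


def hermite_e(n, x):
    """Probabilists' Hermite polynomial He_n(x) via the three-term recurrence."""
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h if n > 0 else 1.0


def gauss_hermite_normal(n):
    """Nodes z, weights w with sum(w * f(z)) ~ E[f(Z)], Z ~ N(0, 1)."""
    roots = []
    for deg in range(1, n + 1):  # roots of He_deg interlace those of He_(deg-1)
        bound = 2.0 * math.sqrt(deg) + 1.0
        edges = [-bound] + roots + [bound]
        new_roots = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            f_lo = hermite_e(deg, lo)
            for _ in range(100):  # bisection inside each bracket
                mid = 0.5 * (lo + hi)
                f_mid = hermite_e(deg, mid)
                if (f_mid > 0) == (f_lo > 0):
                    lo, f_lo = mid, f_mid
                else:
                    hi = mid
            new_roots.append(0.5 * (lo + hi))
        roots = new_roots
    weights = [math.factorial(n) / (n * hermite_e(n - 1, z)) ** 2 for z in roots]
    return roots, weights


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def loglik_level3_unit(successes, n2, beta0, sigma2, sigma3, points2, points3):
    """Log-likelihood of one level-3 unit by nested (non-adaptive) quadrature.

    successes[j] is the number of responses equal to 1 among the n2 level-1
    units of level-2 unit j.
    """
    z2, w2 = gauss_hermite_normal(points2)
    z3, w3 = gauss_hermite_normal(points3)
    total = 0.0
    for a, wa in zip(z3, w3):              # level-3 random intercept
        product = 1.0
        for k in successes:                # level-2 units in this level-3 unit
            inner = 0.0
            for b, wb in zip(z2, w2):      # level-2 random intercept
                eta = beta0 + sigma2 * b + sigma3 * a
                inner += wb * Phi(eta) ** k * Phi(-eta) ** (n2 - k)
            product *= inner
        total += wa * product
    return math.log(total)


data = [2, 1, 3]  # illustrative counts for three level-2 units of size 3
print(loglik_level3_unit(data, 3, 0.0, 1.225, 1.035, 10, 5))
print(loglik_level3_unit(data, 3, 0.0, 1.225, 1.035, 10, 10))
```

The work grows with the product of `points2` and `points3`, so reducing the level-3 rule from 10 to 5 points roughly halves the cost of each likelihood evaluation.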

3.3. Random coefficient probit models

We simulated data for 1000 clusters $j$, each with 10 level-1 units $ij$, from the two-level binary probit model

$$y^*_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + \epsilon_{ij}, \qquad \mathrm{var}(\epsilon_{ij}) = 1,$$

where $x_{ij}$ varies between level-1 units and equals 0 or 1 with equal probabilities, $\beta_0 = 0.5$, $\beta_1 = 1$, and the random intercept $u_{0j}$ and slope $u_{1j}$ have unit standard deviations $\sigma_0 = \sigma_1 = 1$ and covariance $\sigma_{01} = 0.5$. Conditional on $x_{ij}$, the correlation between $y^*_{ij}$ and $y^*_{i'j}$ is then 0.5 if $x_{ij} = x_{i'j} = 0$, 0.53 if $x_{ij} \neq x_{i'j}$ and 0.75 if $x_{ij} = x_{i'j} = 1$.

We estimated the parameters using adaptive quadrature with both Cartesian and spherical rules of degrees 7, 11 and 15, requiring 16, 36 and 64 points for Cartesian quadrature and 12, 28 and 44 for spherical quadrature. The results are shown in Table 5. The spherical rules give nearly identical results to the Cartesian rules of the same degree, and the degree 11 rule appears to be adequate.

We repeated the simulations with $\sigma_0 = \sigma_1 = 1.5$ and $\sigma_{01} = 1.25$ so that the intraclass correlations ranged between 0.67 and 0.87. As expected due to the higher intraclass correlations, the degree 11 rule no longer appears adequate and a degree 15 rule is required. The spherical rules of a given degree now appear a little inferior to the Cartesian rules of the same degree.

To illustrate the usefulness of spherical rules for estimating models with many random coefficients, we simulated a dataset with six correlated random effects from the model

$$y^*_{ij} = \mathbf{x}'_{ij}\boldsymbol{\beta} + \mathbf{x}'_{ij}\mathbf{u}_j + \epsilon_{ij}, \qquad \mathrm{var}(\epsilon_{ij}) = 1,$$

where $x_{1ij} = 1$ and $x_{2ij}$ to $x_{6ij}$ are mutually independent, equal to 0 and 1 with probabilities 0.5. We simulated 100 clusters of size 100 and estimated the parameters using adaptive quadrature with a 137-point degree 7 spherical rule (rule 7-1 in Stroud, 1971). The true and estimated parameters are given in Table 6. Out of the 27 parameters, 20 were within a standard error of the true value and only 3 were more than two standard errors away from the true value.
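Two of the numerical statements in this section can be reproduced with a few lines. The sketch below, with hypothetical function names, computes the conditional correlation of the latent responses implied by the random intercept and slope, and the number of points in a Cartesian product Gauss-Hermite rule of a given degree. The latter relies on the standard fact that an n-point Gauss rule is exact for polynomials of degree up to 2n - 1.

```python
import math


def latent_correlation(x, x_other, s0, s1, s01):
    """Correlation of y*_ij and y*_i'j given covariate values x and x_other."""
    def total_var(value):
        # var(u0 + u1 * value) plus the unit level-1 variance
        return s0 ** 2 + 2.0 * value * s01 + (value * s1) ** 2 + 1.0
    cov = s0 ** 2 + (x + x_other) * s01 + x * x_other * s1 ** 2
    return cov / math.sqrt(total_var(x) * total_var(x_other))


def cartesian_points(degree, dims):
    """Number of points in a product Gauss-Hermite rule of the given odd degree."""
    per_dim = (degree + 1) // 2   # an n-point Gauss rule has degree 2n - 1
    return per_dim ** dims


for pair in [(0, 0), (0, 1), (1, 1)]:
    print(pair, round(latent_correlation(*pair, s0=1.0, s1=1.0, s01=0.5), 2))
print([cartesian_points(d, 2) for d in (7, 11, 15)])
print(cartesian_points(7, 6))
```

For two random effects the point counts are 16, 36 and 64, as quoted above. For six random effects a degree 7 product rule would need 4^6 = 4096 points, which shows why a 137-point spherical rule of the same degree is attractive.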


Table 4. Mean parameter estimates (standard deviations) using adaptive quadrature for some three-level models

                                                        Mean estimate
n2, n3   ρ2|3  ρ3   ρ23   Param.    True value  5, 5 points     10, 10 points   10, 5 points
10, 100  0.6   0.3  0.72  σ2        1.225       1.217 (0.014)   1.222 (0.015)   1.222 (0.014)
                          σ3        1.035       2.542 (1.125)   1.076 (0.117)   1.065 (0.092)
                          β0        0.000       0.026 (0.876)   0.017 (0.110)   0.020 (0.104)
                          Log-lik.              −46399.57       −46345.76       −46345.25
100, 10  0.6   0.3  0.72  σ2        1.225       1.246 (0.025)   1.237 (0.040)   1.232 (0.031)
                          σ3        1.035       1.231 (0.111)   1.102 (0.092)   1.032 (0.068)
                          β0        0.000       0.041 (0.158)   0.016 (0.133)   0.004 (0.111)
                          Log-lik.              −39690.86       −39686.62       −39685.72
10, 100  0.6   0.6  0.84  σ2        1.225       *               1.226 (0.021)   1.221 (0.019)
                          σ3        1.936       *               2.021 (0.166)   2.033 (0.187)
                          β0        0.000       *               0.072 (0.221)   0.066 (0.187)
                          Log-lik.              *               −35623.16       −35623.19
100, 10  0.6   0.6  0.84  σ2        1.225       1.249 (0.050)   1.235 (0.037)   1.235 (0.037)
                          σ3        1.936       2.919 (0.422)   1.965 (0.138)   1.979 (0.140)
                          β0        0.000       0.021 (0.407)   0.005 (0.245)   0.011 (0.244)
                          Log-lik.              −30317.49       −30302.50       −30302.57
10, 10   0.6   0.6  0.84  σ2        1.225       1.222 (0.050)   1.231 (0.051)   1.231 (0.051)
                          σ3        1.936       2.110 (0.238)   1.967 (0.192)   1.969 (0.201)
                          β0        0.000       0.002 (0.293)   0.003 (0.187)   0.004 (0.190)
                          Log-lik.              −3665.78        −3664.53        −3664.61

* Converged only for 40 datasets and gave very many large estimates of σ3.

Table 5. Mean estimates (standard deviations) using Cartesian and spherical adaptive quadrature of different degrees for random coefficient probit models

                  Degree 7                         Degree 11                        Degree 15
Param.   True     Cartesian       Spherical       Cartesian       Spherical       Cartesian       Spherical
β0       (0.5)    0.500 (0.041)   0.499 (0.041)   0.500 (0.041)   0.503 (0.042)   0.501 (0.042)   0.501 (0.041)
β1       (1.0)    1.006 (0.048)   1.001 (0.049)   1.003 (0.048)   1.000 (0.048)   1.001 (0.048)   1.002 (0.048)
σ0       (1.0)    1.007 (0.045)   1.011 (0.045)   0.993 (0.044)   0.993 (0.044)   0.994 (0.044)   0.994 (0.044)
σ1       (1.0)    1.001 (0.070)   1.008 (0.070)   0.999 (0.068)   0.995 (0.066)   1.001 (0.068)   0.997 (0.068)
σ01      (0.5)    0.539 (0.078)   0.552 (0.076)   0.508 (0.071)   0.501 (0.071)   0.507 (0.071)   0.507 (0.071)
Log-lik.          −5197.2 (64.75) −5196.7 (64.82) −5198.0 (64.64) −5198.3 (64.62) −5198.0 (64.64) −5198.0 (64.63)

β0       (0.5)    0.479 (0.055)   0.465 (0.054)   0.477 (0.065)   0.492 (0.057)   0.497 (0.059)   0.486 (0.059)
β1       (1.0)    1.019 (0.071)   1.040 (0.072)   1.007 (0.070)   1.002 (0.070)   0.997 (0.071)   1.003 (0.072)
σ0       (1.5)    1.534 (0.064)   1.554 (0.066)   1.479 (0.074)   1.486 (0.062)   1.489 (0.062)   1.493 (0.063)
σ1       (1.5)    1.512 (0.099)   1.538 (0.101)   1.476 (0.089)   1.436 (0.078)   1.498 (0.091)   1.463 (0.085)
σ01      (1.25)   1.184 (0.203)   1.323 (0.205)   1.090 (0.175)   1.036 (0.184)   1.098 (0.176)   1.097 (0.184)
Log-lik.          −4496.5 (73.84) −4494.8 (74.17) −4501.1 (73.70) −4502.2 (73.45) −4500.4 (73.38) −4501.3 (73.32)

Table 6. True parameters and estimates (standard errors) using adaptive spherical quadrature for the probit model with six random effects (log-likelihood = −5103.82)

β      True   Estimate (SE)
β1     .0     .03 (.05)
β2     .5     .50 (.06)
β3     1.0    1.00 (.06)
β4     1.0    .97 (.07)
β5     .5     .43 (.07)
β6     .0     .07 (.06)

Σ (6 × 6 covariance matrix of the random effects: true values and estimates with standard errors; the row and column layout did not survive extraction, so the entries are listed in their surviving order):
.25 .12 .16 .20 .05 .25 .12 .23 .06 .25 .12 .30 .06 .25 .37 .09 .12 .04 .12 .23 .05 .25 .28 .06 .17 .08 .09 .22 .05 .15 .04 .12 .15 .04 .19 .05 .16 .20 .05 .16 .15 .04 .12 .18 .04
.25
.14 .04
.13 .04 .12 .12 .07 .03 .12 .03 .09 .09 .10 .04 .12 .14 .03


4. Other types of dependent variables

The same adaptive quadrature method can be used for counts, durations, continuous, censored and ordinal dependent variables, discrete choices and rankings. The likelihoods have the same form except for the level-1 contribution $f^{(1)}(y \mid U^{(2)})$, given for binary dependent variables in (2). For counts, the likelihood contribution of a Poisson model is

$$f^{(1)}(y \mid U^{(2)}) = \frac{\exp(\eta)^{s}\,\exp\{-\exp(\eta)\}}{s!} \quad \text{if } y = s,\; s = 0, 1, \ldots \tag{13}$$

Random effects models for counts are discussed in Cameron and Trivedi (1998). If a piecewise exponential proportional hazards model is assumed for durations, with hazards remaining constant for intervals of time, Holford (1980) and Clayton (1988) show that each observed duration contributes a product of terms of the form of (13) to the likelihood, namely one term for each interval it exceeds.

For continuous dependent variables, we can specify for instance a normal, gamma or inverse Gaussian density depending on the shape of the distribution. For limited dependent variables, we assume that the underlying continuous variable can be modeled as

$$y^* = \eta + \epsilon,$$

where $\epsilon$ is normally distributed with standard deviation $\nu$. For continuous responses subject to left-censoring at $b_l$ (Tobin, 1958), right-censoring at $b_r$, or both (Rosett and Nelson, 1975), or for grouped (or interval censored) dependent variables with boundaries $b_l$ and $b_r$ (Stewart, 1983), the likelihood contribution is

$$f^{(1)}(y \mid U^{(2)}) = \begin{cases} \phi\{(y - \eta)/\nu\}/\nu & \text{if uncensored} \\ \Phi\{(\eta - b_r)/\nu\} & \text{if right-censored} \\ \Phi\{(b_l - \eta)/\nu\} & \text{if left-censored} \\ \Phi\{(b_r - \eta)/\nu\} - \Phi\{(b_l - \eta)/\nu\} & \text{if grouped,} \end{cases} \tag{14}$$

where $b_l$ and $b_r$ are usually constant but can vary across units. For ordinal responses with categories $s$, $s = 1, \ldots, S$, the likelihood is as for grouped dependent variables with unknown thresholds $\kappa_{s-1}$ and $\kappa_s$ in place of fixed censoring limits $b_l$ and $b_r$ when $y = s$, where $-\infty = \kappa_0 < \kappa_1 < \cdots < \kappa_S = \infty$ (Aitchison and Silvey, 1957). A number of other random effects models suitable for ordered responses and discrete time durations are described in Rabe-Hesketh et al. (2001c).

For discrete choices, we can model the utility for alternative $s$, $s = 1, 2, \ldots, S$, as

$$y^*_s = \eta_s + \epsilon_s,$$

so that $y = s$ if $y^*_s > y^*_t$ for all $t \neq s$.
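The level-1 contributions (13) and (14), and the ordinal case built on (14), are simple to evaluate once the linear predictor is known. The following sketch is illustrative only: the function names and the string labels for the censoring type are hypothetical, the Poisson term assumes a log link (so that the mean is exp(η)), and the ordinal version assumes a residual standard deviation of 1.

```python
import math


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def poisson_contribution(s, eta):
    """Equation (13): probability of count s when the log-mean is eta."""
    return math.exp(s * eta - math.exp(eta) - math.lgamma(s + 1))


def censored_contribution(kind, eta, nu, y=None, b_l=None, b_r=None):
    """Equation (14): density or probability for a limited response."""
    if kind == "uncensored":
        return phi((y - eta) / nu) / nu
    if kind == "right":
        return Phi((eta - b_r) / nu)
    if kind == "left":
        return Phi((b_l - eta) / nu)
    if kind == "grouped":
        return Phi((b_r - eta) / nu) - Phi((b_l - eta) / nu)
    raise ValueError(f"unknown kind: {kind}")


def ordinal_contribution(s, eta, thresholds):
    """Grouped form with thresholds kappa_1 < ... < kappa_(S-1); s runs from 1 to S."""
    kappa = [-math.inf] + list(thresholds) + [math.inf]
    return Phi(kappa[s] - eta) - Phi(kappa[s - 1] - eta)


print(poisson_contribution(2, 0.7))
print(censored_contribution("grouped", 0.3, 2.0, b_l=-1.0, b_r=1.5))
print([ordinal_contribution(s, 0.3, [-1.0, 0.0, 1.2]) for s in range(1, 5)])
```

Because only the level-1 term changes between response types, a function of this kind is the only piece that would need to be swapped in a quadrature routine when moving from binary to count, censored or ordinal responses.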


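If $\epsilon_s$ is Gumbel (extreme value of type I) distributed, with density function $\exp\{-\epsilon_s - \exp(-\epsilon_s)\}$, the likelihood contribution is a multinomial logit,

$$f^{(1)}(y \mid U^{(2)}) = \frac{\exp(\eta_s)}{\sum_{t=1}^{S} \exp(\eta_t)} \quad \text{if } y = s. \tag{15}$$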
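The link between Gumbel-distributed utilities and the multinomial logit probabilities in (15) can be checked by simulation. The sketch below uses hypothetical function names, an arbitrary seed and illustrative linear predictors. It draws utilities with independent standard Gumbel errors, records which alternative has the highest utility, and compares the resulting shares with the closed-form probabilities.

```python
import math
import random


def multinomial_logit(etas):
    """Equation (15) for every alternative, computed stably."""
    m = max(etas)
    expd = [math.exp(e - m) for e in etas]
    total = sum(expd)
    return [e / total for e in expd]


def simulate_choice_shares(etas, draws=100_000, seed=1):
    """Share of draws in which each alternative has the largest utility."""
    rng = random.Random(seed)
    counts = [0] * len(etas)
    for _ in range(draws):
        # -log(E) with E ~ Exp(1) is a standard Gumbel draw
        utilities = [eta - math.log(rng.expovariate(1.0)) for eta in etas]
        counts[utilities.index(max(utilities))] += 1
    return [c / draws for c in counts]


etas = [0.0, 0.5, 1.0]
print([round(p, 3) for p in multinomial_logit(etas)])
print([round(p, 3) for p in simulate_choice_shares(etas)])
```

The simulated shares should agree with the analytic probabilities up to Monte Carlo error, which is of the order of 0.002 with 100,000 draws.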

An exploded logit is obtained for rankings (e.g. Beggs et al., 1981; Hausman and Ruud, 1987). See Skrondal and Rabe-Hesketh (2003) for a treatment of multilevel random effects models for discrete choices and rankings. Skrondal and Rabe-Hesketh (2004) discuss models with many different types of dependent variables, including mixed types.

5. Discussion

As far as we are aware, this is the first generalization of adaptive quadrature for multilevel modeling. Our simulations show that the method performs well in a wide variety of situations, including large cluster sizes and high intraclass correlations where ordinary quadrature often fails. Adaptive quadrature requires lower degree integration rules than ordinary quadrature, particularly for the higher level random effects. Further gains in efficiency can be achieved by using spherical quadrature rules. Unfortunately, however, there are to our knowledge no published higher degree spherical rules for integrals in four or more dimensions to be used for problems where degree 7 rules are insufficient.

Another advantage of adaptive quadrature is that it gives empirical Bayes predictions of cluster or individual-specific random effects and their standard errors as a by-product. These are often both of substantive interest and important for checking model specification.

Adaptive quadrature is slower than alternative estimation methods such as penalized quasilikelihood (PQL) (Breslow and Clayton, 1993), for example as implemented in the iterative generalized least squares algorithm (Goldstein, 1991). Unfortunately, the parameter estimates from PQL tend to be biased for binary dependent variables with small cluster sizes and high intraclass correlations (e.g. Rodriguez and Goldman, 1995, 2001). Moreover, PQL does not involve a likelihood, which prohibits the use of likelihood based inference such as likelihood ratio tests and likelihood based confidence intervals.
Improved results can be achieved using a sixth-order Laplace approximation for the marginal likelihood, Laplace6 (Raudenbush et al., 2000), which worked as well as 7-point adaptive quadrature in simulations of a two-level binary dependent variable model. However, an advantage of adaptive quadrature is that the precision can be increased by simply using more quadrature points, whereas increasing the degree of the Taylor expansion for the Laplace method would require more work (Raudenbush et al., 2000).

Computer intensive alternatives to adaptive quadrature include simulation based approaches such as Markov chain Monte Carlo (MCMC) (e.g. Gelman et al., 2003) and maximum simulated likelihood (MSL) (Hajivassiliou and Ruud, 1994). The hierarchical structure of multilevel models lends itself naturally to MCMC using for instance Gibbs sampling. If vague priors are specified, the method essentially yields


maximum likelihood estimates. Unfortunately, a problem with this approach is how to ensure that a truly stationary distribution has been obtained. Another important shortcoming is that there is no diagnostic for assessing empirical identification (e.g., Keane, 1992). A merit of maximum simulated likelihood is that the conditional independence specifications implicit in standard multilevel models may be relaxed. This can be useful in panel data models where ARMA(p,q) processes and their special cases are sometimes specified for the level-1 errors $\epsilon_{ij}$. Furthermore, unlike methods based on quadrature, simulation methods allow statistical analysis of the approximation error.

We have confined the simulations to multilevel random effects probit models for binary dependent variables, although the estimation method can be used for many other types of dependent variable as outlined in Section 4. In comparison to binary responses, these other response types tend to yield more concentrated posterior densities where ordinary quadrature can be expected to perform poorly (see e.g. Albert and Follmann, 2000). An example with count data is given in Rabe-Hesketh et al. (2002), where adaptive quadrature recovers previous estimates and standard errors whereas ordinary quadrature fails.

The adaptive quadrature method can also be used for the more general class of multilevel factor and structural equation models (Rabe-Hesketh et al., 2004) since they have the same conditional independence structure as random coefficient models: variables (at level 1) are conditionally independent given the factors, which in turn are conditionally independent given higher level factors, etc. The marginal likelihood has the same form as that of random coefficient models, the only difference being the form of the linear predictor $\eta$. Factor models are useful for generating flexible covariance structures using only a small number of latent variables; see for example Rabe-Hesketh and Skrondal (2001).
They are also useful for inducing dependence between multiple processes, as required for selection and endogenous treatment models and their multilevel extensions (e.g. Skrondal and Rabe-Hesketh, 2004, Chapter 14).

Maximum likelihood estimation and empirical Bayes prediction for all of these models using adaptive quadrature is implemented in gllamm (Rabe-Hesketh et al., 2000, 2001a,b, 2002), which runs in Stata (StataCorp, 2003). The program can also handle discrete random effects including nonparametric maximum likelihood (Heckman and Singer, 1984; Rabe-Hesketh et al., 2003) and is available from http://www.gllamm.org.

References

Aitchison, J., Silvey, S., 1957. The generalization of probit analysis to the case of multiple responses. Biometrika 44, 131–140.
Albert, P.S., Follmann, D.A., 2000. Modeling repeated count data subject to informative dropout. Biometrics 56, 667–677.
Antweiler, W., 2001. Nested random effects estimation in unbalanced panel data. Journal of Econometrics 101, 295–313.
Baltagi, B.H., 2001. Econometric Analysis of Panel Data, 2nd Edition. Wiley, London.
Baltagi, B.H., Song, S., Jung, B., 2001. The unbalanced nested error component regression model. Journal of Econometrics 101, 357–381.
Beggs, S., Cardell, S., Hausman, J., 1981. Assessing the potential demand for electric cars. Journal of Econometrics 16, 1–19.
Beron, K., Murdoch, J., Thayer, M., 1999. Hierarchical linear models with application to air pollution in the South Coast air basin. American Journal of Agricultural Economics 81, 1123–1127.
Blundell, R., Windmeijer, F., 1997. Cluster effects and simultaneity in multilevel models. Health Economics 1, 6–13.
Bock, R.D., Aitkin, M., 1981. Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika 46, 443–459.
Bock, R.D., Lieberman, M., 1970. Fitting a response model for n dichotomously scored items. Psychometrika 33, 179–197.
Bock, R.D., Schilling, S., 1997. High-dimensional full-information item factor analysis. In: Berkane, M. (Ed.), Latent Variable Modelling and Applications to Causality. Springer, New York, NY, pp. 164–176.
Borjas, G.J., Sueyoshi, G.T., 1994. A two-stage estimator for probit models with structural group effects. Journal of Econometrics 64, 165–182.
Breslow, N.E., Clayton, D.G., 1993. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9–25.
Butler, J.S., Moffitt, R., 1982. A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica 50, 761–764.
Cameron, A.C., Trivedi, P.K., 1998. Regression Analysis of Count Data. Cambridge University Press, Cambridge.
Cardoso, A.R., 2000. Wage differentials across firms: an application of multilevel modelling. Journal of Applied Econometrics 15, 343–354.
Carey, K.A., 2000. Multilevel modelling approach to analysis of patient costs under managed care. Health Economics 9, 435–446.
Carlin, B.P., Louis, T.A., 2000. Bayes and Empirical Bayes Methods for Data Analysis, 2nd Edition. Chapman & Hall/CRC, Boca Raton, FL.
Clayton, D., 1988. The analysis of event history data: a review of progress and outstanding problems. Statistics in Medicine 7, 819–841.
Cools, R., 1999. Monomial cubature rules since Stroud: a compilation, part 2. Journal of Computational and Applied Mathematics 112, 21–27.
Cools, R., Rabinowitz, P., 1993. Monomial cubature rules since Stroud: a compilation. Journal of Computational and Applied Mathematics 48, 309–326.
Davis, P., 2002. Estimating multi-way error components models with unbalanced data structures. Journal of Econometrics 106, 67–95.
Dunnett, C.W., Sobel, M., 1955. Approximations to the probability integral and certain percentage points of a multivariate analogue of Student's t-distribution. Biometrika 42, 258–260.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2003. Bayesian Data Analysis, 2nd Edition. Chapman & Hall/CRC, Boca Raton, FL.
Goldstein, H., 1991. Nonlinear multilevel models with an application to discrete response data. Biometrika 78, 45–51.
Hajivassiliou, V.A., Ruud, P.A., 1994. Classical estimation methods for LDV models using simulation. In: Engle, R.F., McFadden, D.L. (Eds.), Handbook of Econometrics, Vol. IV. Elsevier, New York, NY, pp. 2383–2441.
Hausman, J.A., Ruud, P.A., 1987. Specifying and testing econometric models for rank-ordered data. Journal of Econometrics 34, 83–103.
Heckman, J.J., Singer, B., 1984. A method of minimizing the impact of distributional assumptions in econometric models for duration data. Econometrica 52, 271–320.
Holford, T.R., 1980. The analysis of rates and survivorship using log-linear models. Biometrics 36, 299–305.
Hsiao, C., 2003. Analysis of Panel Data, 2nd Edition. Cambridge University Press, Cambridge.
Keane, M.P., 1992. A note on identification in the multinomial probit model. Journal of Business and Economic Statistics 10, 193–200.
Lee, L.-F., 2000. A numerically stable quadrature procedure for the one-factor random-component discrete choice model. Journal of Econometrics 95, 117–129.
Lesaffre, E., Spiessens, B., 2001. On the effect of the number of quadrature points in a logistic random-effects model: an example. Applied Statistics 50, 325–335.
Lillard, L.A., 1993. Simultaneous equations for hazards: marriage duration and fertility timing. Journal of Econometrics 56, 189–217.
Liu, Q., Pierce, D.A., 1994. A note on Gauss-Hermite quadrature. Biometrika 81, 624–629.
Naylor, J.C., Smith, A.F.M., 1982. Applications of a method for the efficient computation of posterior distributions. Applied Statistics 31, 214–225.
Naylor, J.C., Smith, A.F.M., 1988. Econometric illustrations of novel numerical integration strategies for Bayesian inference. Journal of Econometrics 38, 103–125.
Pinheiro, J.C., Bates, D.M., 1995. Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics 4, 12–35.
Rabe-Hesketh, S., Skrondal, A., 2001. Parameterization of multivariate random effects models for categorical data. Biometrics 57, 1256–1264.
Rabe-Hesketh, S., Pickles, A., Taylor, C., 2000. sg129: Generalized linear latent and mixed models. Stata Technical Bulletin 53, 47–57.
Rabe-Hesketh, S., Pickles, A., Skrondal, A., 2001a. GLLAMM: A general class of multilevel models and a Stata program. Multilevel Modelling Newsletter 13, 17–23.
Rabe-Hesketh, S., Pickles, A., Skrondal, A., 2001b. GLLAMM Manual. Technical Report 2001/01. Department of Biostatistics and Computing, Institute of Psychiatry, King's College, University of London. Downloadable from http://www.gllamm.org.
Rabe-Hesketh, S., Yang, S., Pickles, A., 2001c. Multilevel models for censored and latent responses. Statistical Methods in Medical Research 10, 409–427.
Rabe-Hesketh, S., Skrondal, A., Pickles, A., 2002. Reliable estimation of generalized linear mixed models using adaptive quadrature. The Stata Journal 2, 1–21.
Rabe-Hesketh, S., Pickles, A., Skrondal, A., 2003. Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation. Statistical Modelling 3, 215–232.
Rabe-Hesketh, S., Skrondal, A., Pickles, A., 2004. Generalized multilevel structural equation modeling. Psychometrika 69, 167–190.
Raudenbush, S.W., Yang, M.L., Yosef, M., 2000. Maximum likelihood for generalized linear models with nested random effects via high-order, multivariate Laplace approximation. Journal of Computational and Graphical Statistics 9, 141–157.
Rice, N., Jones, A., 1997. Multilevel models and health economics. Health Economics 6, 561–575.
Rodriguez, G., Goldman, N., 1995. An assessment of estimation procedures for multilevel models with binary responses. Journal of the Royal Statistical Society, A 158, 73–89.
Rodriguez, G., Goldman, N., 2001. Improved estimation procedures for multilevel models with binary response: a case study. Journal of the Royal Statistical Society, A 164, 339–355.
Rosett, R.N., Nelson, F.D., 1975. Estimation of a two-limit probit regression model. Econometrica 43, 141–146.
Skrondal, A., Rabe-Hesketh, S., 2003. Multilevel logistic regression for polytomous data and rankings. Psychometrika 68, 267–287.
Skrondal, A., Rabe-Hesketh, S., 2004. Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Chapman & Hall/CRC, Boca Raton, FL.
StataCorp., 2003. Stata Statistical Software: Release 8. Stata Press, College Station, TX.
Stewart, M.B., 1983. On least-squares estimation when the dependent variable is grouped. Review of Economic Studies 50, 737–753.
Stroud, A.H., 1971. Approximate Calculation of Multiple Integrals. Prentice-Hall, Englewood Cliffs, NJ.
Stroud, A.H., Secrest, D., 1966. Gaussian Quadrature Formulas. Prentice-Hall, Englewood Cliffs, NJ.
Tierney, L., Kadane, J.B., 1986. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81, 82–86.
Tobin, J., 1958. Estimation of relationships for limited dependent variables. Econometrica 26, 24–36.
