Analysis of Binary Panel Data by Static and Dynamic Logit Models
Francesco Bartolucci
University of Perugia
[email protected]
Preliminaries
• Longitudinal (or panel) data consist of repeated observations on the same subjects at different occasions
• With respect to cross-sectional data, longitudinal data have the
advantage of allowing one to study (or to take into account in a natural
way):
unobserved heterogeneity
dynamic relationships
causal effects
Basic notation
• There are n subjects (or individuals) in the sample, with:
  T_i: number of occasions at which subject i is observed
  y_it: response variable (binary or categorical) for subject i at occasion t
  x_it: vector of covariates for subject i at occasion t
• The dataset is said to be balanced if all subjects are observed at the same occasions (T_1 = … = T_n); otherwise, it is said to be unbalanced
• For simplicity, we will usually refer to the balanced case and we will denote by T the number of occasions (common to all subjects)
Example (similar to Hyslop, 1999)
• We consider a sample of n = 1908 women, aged 19 to 59 in 1980, who
were followed from 1979 to 1985 (source: PSID)
Homogeneous static logit and probit models
• The probability of success is assumed to depend only on the observed covariates (the same for all subjects with the same covariate values):
  π(x_it) = p(y_it = 1 | x_it)
• The inverse link function is:
  logit: π(x_it) = exp(x_it'β) / [1 + exp(x_it'β)]
  probit: π(x_it) = Φ(x_it'β)
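To make the two inverse links concrete, here is a minimal Python sketch (not part of the original slides; function names are illustrative) that evaluates π at a grid of linear-predictor values like the one used in the figure below.

```python
# Minimal sketch: evaluating the logit and probit inverse link functions
# pi(x_it) at given values of the linear predictor eta = x_it' beta.
import numpy as np
from scipy.stats import norm

def pi_logit(eta):
    """Logit inverse link: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + np.exp(-eta))

def pi_probit(eta):
    """Probit inverse link: standard normal cdf Phi(eta)."""
    return norm.cdf(eta)

eta = np.linspace(-7.5, 7.5, 7)   # same grid as the linear-predictor axis below
print(pi_logit(eta))
print(pi_probit(eta))
```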
[Figure: logit and probit inverse link functions, π plotted against the linear predictor x_it'β]
Threshold model
• Logit and probit models may be interpreted on the basis of an underlying linear model for the propensity to experience a certain situation:
  y*_it = x_it'β + ε_it,   with y_it = 1 if y*_it > 0 and y_it = 0 otherwise
  ε_it: error term with standard logistic (logit model) or standard normal (probit model) distribution
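As an aside, the threshold interpretation can be checked by a small simulation; the sketch below (values and names are my own, not from the slides) draws the latent propensity y* = x'β + ε and verifies that thresholding it at zero reproduces the logit or probit success probability, depending on the error distribution.

```python
# Simulation sketch of the threshold model: P(y = 1) = P(y* > 0) with
# y* = x'beta + eps, for logistic (logit) and normal (probit) errors.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
beta = np.array([0.5, -1.0])          # hypothetical coefficients
x = np.array([1.0, 0.3])              # hypothetical covariate vector (intercept, x1)
eta = x @ beta                        # linear predictor x'beta

eps_logistic = rng.logistic(size=200_000)   # standard logistic errors
eps_normal = rng.normal(size=200_000)       # standard normal errors

print((eta + eps_logistic > 0).mean(), 1 / (1 + np.exp(-eta)))  # ~ logit pi
print((eta + eps_normal > 0).mean(), norm.cdf(eta))             # ~ probit pi
```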
Model estimation
• The most used method to fit logit and probit models is the maximum likelihood method, which is based on the maximization of the log-likelihood:
  L(β) = Σ_i Σ_t { y_it log[π(x_it)] + (1 − y_it) log[1 − π(x_it)] }
• The maximization may be performed by the Newton-Raphson algorithm, whose iterations are
  β^(h) = β^(h−1) + J(β^(h−1))^(−1) s(β^(h−1))
  s(β) = ∂L(β)/∂β: score vector
  J(β) = −∂²L(β)/∂β∂β': observed information matrix
• An alternative algorithm is Fisher-scoring, which uses the expected information matrix
  I(β) = −E[∂²L(β)/∂β∂β']
  instead of the observed information matrix
• For the logit model, the score vector and the information matrices are
  s(β) = Σ_i Σ_t [y_it − π(x_it)] x_it
  J(β) = I(β) = Σ_i Σ_t π(x_it)[1 − π(x_it)] x_it x_it'
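The iteration above is easy to code directly; the sketch below (simulated data, illustrative names) implements it for the homogeneous logit model, for which the observed and expected information coincide, so the same loop covers both Newton-Raphson and Fisher-scoring.

```python
# Newton-Raphson / Fisher-scoring for the homogeneous logit model, using the
# score s(beta) and information J(beta) = I(beta) given above.
import numpy as np

def fit_logit_newton(X, y, max_iter=25, tol=1e-10):
    """X: (N, p) stacked rows x_it; y: (N,) stacked binary responses."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - pi)                         # s(beta)
        info = (X * (pi * (1.0 - pi))[:, None]).T @ X  # J(beta) = I(beta)
        step = np.linalg.solve(info, score)
        beta = beta + step                             # beta^(h) = beta^(h-1) + J^{-1} s
        if np.max(np.abs(step)) < tol:
            break
    return beta

# quick check on simulated data
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(5000), rng.normal(size=(5000, 2))])
beta_true = np.array([-0.5, 1.0, -0.8])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
print(fit_logit_newton(X, y))   # should be close to beta_true
```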
Example
• Maximum likelihood estimates for the PSID dataset (logit model)
• Maximum likelihood estimates for the PSID dataset (probit model)
Static models with subject-specific effects
• To account for unobserved heterogeneity, a subject-specific intercept α_i is included in the model, so that the probability of success becomes
  π(α_i, x_it) = p(y_it = 1 | α_i, x_it)
• The most used estimation methods for this model are:
  joint maximum likelihood (fixed-parameters)
  conditional maximum likelihood (only for the logit model)
  marginal maximum likelihood (random-parameters)
Joint maximum likelihood (JML) method
• It consists of maximizing the log-likelihood
  L(α, β) = Σ_i Σ_t { y_it log[π(α_i, x_it)] + (1 − y_it) log[1 − π(α_i, x_it)] }
• The method is simple to implement for both logit and probit models
• The JML estimator:
  does not exist (for α_i) when y_i+ = 0 or y_i+ = T, with y_i+ = Σ_t y_it
  is not consistent with T fixed as n grows to infinity
• The likelihood equations are
  ∂L(α, β)/∂α_i = Σ_t [y_it − π(α_i, x_it)] = 0,   i = 1, …, n
  ∂L(α, β)/∂β = Σ_i Σ_t [y_it − π(α_i, x_it)] x_it = 0
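A rough sketch of how JML can be carried out in practice, under the assumption that it is implemented as an ordinary logit fit with one dummy variable per subject (simulated data; statsmodels is used only for convenience). Subjects with y_i+ = 0 or y_i+ = T are dropped, since their α_i estimate does not exist, and the resulting β̂ typically shows the fixed-T bias noted above.

```python
# JML for the static fixed-effects logit: alpha_i treated as ordinary
# parameters, i.e. a logit model with one dummy per subject.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, T = 150, 8
beta_true = np.array([1.0, -0.5])
alpha = rng.normal(size=n)                        # subject-specific intercepts
x = rng.normal(size=(n, T, 2))
p = 1.0 / (1.0 + np.exp(-(alpha[:, None] + x @ beta_true)))
y = rng.binomial(1, p)

keep = (y.sum(axis=1) > 0) & (y.sum(axis=1) < T)  # drop all-0 / all-1 subjects
y_k, x_k = y[keep], x[keep]
n_k = y_k.shape[0]
dummies = np.kron(np.eye(n_k), np.ones((T, 1)))   # subject dummies (alpha_i)
design = np.column_stack([dummies, x_k.reshape(-1, 2)])
fit = sm.Logit(y_k.ravel(), design).fit(disp=0)
print(fit.params[-2:])   # JML estimate of beta; alpha_i's occupy the first n_k entries
```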
Conditional maximum likelihood (CML) method
• This estimation method may be used only for the logit model
• For the logit model, the total score y_i+ = Σ_t y_it is a sufficient statistic for α_i, so that inference on β may be based on the distribution of y_i conditional on y_i+
• An important drawback, common to all fixed-parameters approaches, is that the regression parameters for the time-constant covariates are not estimable
• The conditional probability of the response configuration y_i given y_i+ is then
  p(y_i | X_i, y_i+) = exp(Σ_t y_it x_it'β) / Σ_{z: z_+ = y_i+} exp(Σ_t z_t x_it'β)
  where the sum in the denominator is over all binary vectors z = (z_1, …, z_T)' with total z_+ equal to y_i+; this probability does not depend on α_i
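For small T, the conditional probability above can be evaluated by brute force, summing over all binary vectors z with the same total as y_i. The sketch below (illustrative data and names) computes one subject's contribution to the conditional log-likelihood.

```python
# Conditional log-likelihood contribution of one subject in the static logit:
# sum_t y_it x_it'beta minus the log of the sum of exp(sum_t z_t x_it'beta)
# over all binary z with z_+ = y_i+.
import numpy as np
from itertools import combinations

def cond_loglik_i(beta, y_i, X_i):
    """y_i: (T,) binary responses; X_i: (T, p) covariate rows of subject i."""
    T = len(y_i)
    eta = X_i @ beta                                   # x_it' beta, t = 1..T
    log_num = eta @ y_i
    den = sum(np.exp(eta[list(ones)].sum())            # 'ones' = positions where z_t = 1
              for ones in combinations(range(T), int(y_i.sum())))
    return log_num - np.log(den)

beta = np.array([0.8, -0.4])                           # hypothetical parameter value
y_i = np.array([0, 1, 1, 0])
X_i = np.random.default_rng(3).normal(size=(4, 2))
print(cond_loglik_i(beta, y_i, X_i))
```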
• JML and CML estimates for the PSID dataset (logit model); s.e., t-statistic and p-value refer to the CML estimates

  Parameter     JML estimate   CML estimate     s.e.    t-statistic   p-value
  Intercept           -              -             -          -           -
  Age                 -              -             -          -           -
  Age^2/100           -              -             -          -           -
  Race                -              -             -          -           -
  Education           -              -             -          -           -
  Kids 0-2        -1.3660        -1.1537       0.0899    -12.8290     0.0000
  Kids 3-5        -0.9912        -0.8373       0.0840     -9.9638     0.0000
  Kids 6-17       -0.2096        -0.1764       0.0637     -2.7691     0.0056
  Perm. inc.          -              -             -          -           -
  Temp. inc.      -0.0162        -0.0136       0.0033     -4.1186     0.0000
Marginal maximum likelihood (MML) method
• This estimation method may be used for both logit and probit models
• The subject-specific intercepts α_i are treated as random effects with density f(α_i), and the manifest distribution of y_i is obtained by integrating them out:
  p(y_i | X_i) = ∫ p(y_i | α_i, X_i) f(α_i) dα_i,   p(y_i | α_i, X_i) = ∏_t p(y_it | α_i, x_it)
• The marginal log-likelihood is then
  L_m(β) = Σ_i log[p(y_i | X_i)]
Logit model with normal random effects
• Under the assumption α_i ∼ N(µ, σ²), for the logit model we have
  p(y_i | X_i) = ∫ p(y_i | w, X_i) φ(w) dw
  where w is a standard normal variable (so that α_i = µ + σw) and φ(·) is its density; the integral may be computed by numerical quadrature
• The score vector and the (empirical) information matrix are given by
  s_m(γ) = ∂L_m(γ)/∂γ = Σ_i s_m,i(γ),   s_m,i(γ) = [1/p(y_i | X_i)] ∫ [∂p(y_i | w, X_i)/∂γ] φ(w) dw
  J̃_m(γ) = Σ_i s_m,i(γ) s_m,i(γ)' − (1/n) s_m(γ) s_m(γ)'
  where γ denotes the vector of all the model parameters (µ, σ and β)
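A sketch of how p(y_i | X_i) can be approximated for one subject. The slides later mention 51 equally spaced quadrature points on [–5, 5]; the version below uses Gauss-Hermite nodes rescaled for the standard normal weight instead, which is an assumption of mine rather than the authors' exact rule; data and parameter values are illustrative.

```python
# Marginal likelihood of one subject in the random-effects logit model,
# approximated by (rescaled) Gauss-Hermite quadrature over w ~ N(0, 1).
import numpy as np
from numpy.polynomial.hermite import hermgauss

nodes, weights = hermgauss(51)
w_nodes = np.sqrt(2.0) * nodes           # quadrature points for phi(w)
w_weights = weights / np.sqrt(np.pi)     # corresponding weights (sum to ~1)

def marg_lik_i(mu, sigma, beta, y_i, X_i):
    """p(y_i | X_i) = int prod_t p(y_it | w, x_it) phi(w) dw, with alpha_i = mu + sigma*w."""
    eta = mu + sigma * w_nodes[:, None] + (X_i @ beta)[None, :]   # (Q, T)
    p1 = 1.0 / (1.0 + np.exp(-eta))
    lik_w = np.prod(np.where(y_i[None, :] == 1, p1, 1.0 - p1), axis=1)
    return np.sum(w_weights * lik_w)

rng = np.random.default_rng(4)
y_i = np.array([1, 0, 1, 1, 0])
X_i = rng.normal(size=(5, 2))
print(marg_lik_i(0.5, 1.2, np.array([0.7, -0.3]), y_i, X_i))
```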
Pros and cons of MML
• The MML method is more complicated to implement than the fixed-effects methods (JML, CML), but it allows us to estimate the regression parameters for both time-constant and time-varying covariates
• Its main drawbacks are that a parametric distribution (here the normal) must be assumed for the random effects and that these effects are assumed to be independent of the covariates
• The approach may be extended to overcome these drawbacks:
  a discrete distribution with free support points and mass probabilities may be used for the random effects; the approach is in this case of latent-class type and requires the implementation of an EM algorithm (Dempster et al., 1977) and the choice of the number of support points (a sketch of this formulation follows this list)
  the parameters of the distribution of the random effects are allowed to depend on the covariates; one possibility is the correlated-effects model of Chamberlain (1984)
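As referenced in the first point of the list, a brief sketch of the discrete (latent-class) formulation: the integral over α_i becomes a finite sum over support points ξ_c with mass probabilities π_c. The values below are hypothetical and the EM updates themselves are not shown.

```python
# Marginal likelihood of one subject under a discrete distribution for the
# random effects: sum_c pi_c * prod_t p(y_it | xi_c, x_it).
import numpy as np

def marg_lik_discrete_i(xi, pi_mass, beta, y_i, X_i):
    eta = xi[:, None] + (X_i @ beta)[None, :]        # (k, T) linear predictors
    p1 = 1.0 / (1.0 + np.exp(-eta))
    lik_c = np.prod(np.where(y_i[None, :] == 1, p1, 1.0 - p1), axis=1)
    return np.sum(pi_mass * lik_c)

xi = np.array([-2.0, 0.0, 2.0])          # hypothetical support points
pi_mass = np.array([0.3, 0.4, 0.3])      # hypothetical mass probabilities
y_i = np.array([1, 0, 1])
X_i = np.random.default_rng(5).normal(size=(3, 2))
print(marg_lik_discrete_i(xi, pi_mass, np.array([0.5, -0.5]), y_i, X_i))
```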
• JML, CML and MML-normal estimates for the PSID dataset (logit model); the MML algorithm uses 51 quadrature points from –5 to 5; s.e., t-statistic and p-value refer to the MML estimates

  Parameter       JML estimate   CML estimate   MML estimate     s.e.    t-statistic   p-value
  Intercept             -              -           -2.9448      1.3461     -2.1876      0.0287
  Std. dev. (σ)         -              -            3.2196      0.1066     30.2090      0.0000
  Age                   -              -            0.2652      0.0712      3.7243      0.0002
  Age^2/100             -              -           -0.4285      0.0906     -4.7271      0.0000
  Race                  -              -            0.6800      0.2162      3.1449      0.0017
  Education             -              -            0.6737      0.0643     10.4810      0.0000
  Kids 0-2          -1.3660        -1.1537         -1.3418      0.0773    -17.3490      0.0000
  Kids 3-5          -0.9912        -0.8373         -1.0260      0.0635    -16.1680      0.0000
  Kids 6-17         -0.2096        -0.1764         -0.2533      0.0438     -5.7775      0.0000
  Perm. inc.            -              -           -0.0427      0.0036    -11.9610      0.0000
  Temp. inc.        -0.0162        -0.0136         -0.0110      0.0023     -4.7554      0.0000
Summary of the models fit
• Estimates for the PSID dataset (logit model):

  Method                       Log-likelihood   n. parameters      AIC       BIC
  Homogeneous                      -7507.3             10        15034.6   15090.1
  Heterogeneous-JML                -2986.3           1912         9796.6   20415.5
  Heterogeneous-CML (*)            -2128.5              4         4265.0    4287.2
  Heterogeneous-MML-normal         -5264.4             11        10550.8   10611.9

  (*) not directly comparable with the others
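For reference, the criteria in the table follow the usual definitions AIC = −2ℓ + 2k and BIC = −2ℓ + k log n (Akaike, 1973; Schwarz, 1978); the small check below reproduces the homogeneous-logit row under the assumption that n = 1908 subjects enters the BIC penalty.

```python
# Check of the AIC and BIC values for the homogeneous logit row of the table.
import numpy as np

loglik, k, n = -7507.3, 10, 1908
print(-2 * loglik + 2 * k)           # AIC = 15034.6
print(-2 * loglik + k * np.log(n))   # BIC ~= 15090.1
```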
Dynamic logit and probit models
• Dynamic models include the lagged response y_i,t−1 among the predictors, so as to account for state dependence:
  logit: π(α_i, x_it, y_i,t−1) = exp(α_i + x_it'β + y_i,t−1 γ) / [1 + exp(α_i + x_it'β + y_i,t−1 γ)]
  or
  probit: π(α_i, x_it, y_i,t−1) = Φ(α_i + x_it'β + y_i,t−1 γ)
  π(α_i, x_it, y_i,t−1) = p(y_it = 1 | α_i, x_it, y_i,t−1): conditional probability of success
• The initial observation y_i0 must be known. When the parameters α_i are random, the initial condition problem arises. The simplest approach, which however can lead to a biased estimator of β and γ, is to treat y_i0 as an exogenous covariate
Estimation of dynamic models
• The subject-specific parameters α_i may be considered as fixed or random; in the latter case, the manifest distribution of y_i given the initial observation is
  p(y_i | X_i, y_i0) = ∫ p(y_i | α_i, X_i, y_i0) f(α_i) dα_i,
  p(y_i | α_i, X_i, y_i0) = ∏_t p(y_it | α_i, x_it, y_i,t−1)
• The most used estimation methods for dynamic models are the same
as for static models:
joint maximum likelihood (fixed-parameters)
conditional maximum likelihood (only for the logit model)
marginal maximum likelihood (random-parameters)
Joint maximum likelihood (JML) method
• The log-likelihood has again a simple form:
  L(α, β, γ) = Σ_i Σ_t { y_it log[π(α_i, x_it, y_i,t−1)] + (1 − y_it) log[1 − π(α_i, x_it, y_i,t−1)] }
• The algorithm is essentially the same as that used for static models, but with y_i,t−1 included among the covariates x_it
• The JML estimator has the same drawbacks it has for static models:
  it does not exist (for α_i) when y_i+ = 0 or y_i+ = T, with y_i+ = Σ_t y_it
  it is not consistent with T fixed as n grows to infinity
• JML estimates for the PSID dataset (static and dynamic logit models)

  Parameter          Static logit   Dynamic logit
  Kids 0-2              -1.3660        -1.2688
  Kids 3-5              -0.9912        -0.8227
  Kids 6-17             -0.2096        -0.1730
  Temp. inc.            -0.0162        -0.0112
  Lagged response           -           0.5696

• A positive state dependence is observed and the fit of the logit model improves considerably by including the lagged response variable

  Model           Log-likelihood   n. parameters      AIC       BIC
  Static logit        -2986.3           1912          9796.6   20415.5
  Dynamic logit       -2317.9           1913          8461.8   19086.2
Conditional maximum likelihood (CML) method
• The CML method may be used to estimate the dynamic logit model only in particular circumstances
• Consider, for instance, the case of T = 3 occasions beyond the initial one (so that y_i0, y_i1, y_i2, y_i3 are observed) and no covariates
• The response configurations y_i = (0, 1, y_i3)' and y_i = (1, 0, y_i3)' have conditional probabilities
  p[(0, 1, y_i3)' | α_i, y_i0] = exp[(1 + y_i3)α_i + y_i3 γ] / {[1 + exp(α_i + y_i0 γ)][1 + exp(α_i)][1 + exp(α_i + γ)]}
  p[(1, 0, y_i3)' | α_i, y_i0] = exp[(1 + y_i3)α_i + y_i0 γ] / {[1 + exp(α_i + y_i0 γ)][1 + exp(α_i + γ)][1 + exp(α_i)]}
• Conditioning on y_i1 + y_i2 = 1 (and on y_i0 and y_i3), the subject-specific parameter α_i cancels out:
  p[(1, 0, y_i3)' | α_i, y_i0, y_i1 + y_i2 = 1, y_i3] = exp(y_i0 γ) / [exp(y_i3 γ) + exp(y_i0 γ)] = exp[(y_i0 − y_i3)γ] / {1 + exp[(y_i0 − y_i3)γ]}
• The approach extends to more than three occasions; the corresponding conditional log-likelihood is maximized with respect to γ, and the resulting estimator has the same properties it has for T = 3 and, in particular, it is consistent for T fixed as n grows to infinity
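The last displayed expression shows that, for subjects with y_i1 + y_i2 = 1, γ enters through an ordinary logistic term in (y_i0 − y_i3). The sketch below (simulated data without covariates, illustrative names) maximizes the resulting conditional log-likelihood for the T = 3 case.

```python
# Conditional ML estimation of gamma in the dynamic logit with T = 3 and
# no covariates: subjects with y_i1 + y_i2 = 1 contribute
# y_i1 * d - log(1 + exp(d)), with d = (y_i0 - y_i3) * gamma.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_cond_loglik(gamma, y):
    """y: (n, 4) array with columns y_i0, y_i1, y_i2, y_i3."""
    keep = y[:, 1] + y[:, 2] == 1
    d = (y[keep, 0] - y[keep, 3]) * gamma
    return -np.sum(y[keep, 1] * d - np.log1p(np.exp(d)))

# simulate from the dynamic logit with random intercepts and gamma = 1
rng = np.random.default_rng(6)
n, gamma_true = 2000, 1.0
alpha = rng.normal(size=n)
y = np.empty((n, 4), dtype=int)
y[:, 0] = rng.binomial(1, 1.0 / (1.0 + np.exp(-alpha)))
for t in range(1, 4):
    y[:, t] = rng.binomial(1, 1.0 / (1.0 + np.exp(-(alpha + gamma_true * y[:, t - 1]))))

print(minimize_scalar(lambda g: neg_cond_loglik(g, y)).x)   # close to gamma_true
```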
• The conditional approach may also be used in the presence of
covariates, provided that:
the probability that each discrete covariate is time-constant is
positive (this rules out the possibility of time dummies)
the support of the distribution of the continuous covariates satisfies
suitable conditions
Marginal maximum likelihood (MML) method
• This estimation method may be used for both dynamic logit and probit models
• The algorithm is essentially the same as that for static models, but we have to use an extended vector of covariates which includes the lagged response variable
• For the dynamic logit model with normal random effects we have to maximize
  L_m(γ̃) = Σ_i log[p(y_i | X_i, y_i0)],   p(y_i | X_i, y_i0) = ∫ p(y_i | w, X_i, y_i0) φ(w) dw,
  p(y_i | w, X_i, y_i0) = ∏_t exp[y_it(µ + wσ + x_it'β + y_i,t−1 γ)] / [1 + exp(µ + wσ + x_it'β + y_i,t−1 γ)]
  where γ̃ collects all the model parameters (µ, σ, β and γ)
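A sketch of the integrand above for one subject, extending the earlier static quadrature sketch: the linear predictor now includes y_i,t−1 γ and the initial observation y_i0 is taken as given (the simplest treatment of the initial-condition problem). Data and parameter values are illustrative.

```python
# Marginal likelihood of one subject under the dynamic logit with normal
# random effects, p(y_i | X_i, y_i0), via rescaled Gauss-Hermite quadrature.
import numpy as np
from numpy.polynomial.hermite import hermgauss

nodes, weights = hermgauss(51)
w_nodes, w_weights = np.sqrt(2.0) * nodes, weights / np.sqrt(np.pi)

def marg_lik_dyn_i(mu, sigma, beta, gamma, y_i, y_i0, X_i):
    """y_i: (T,) responses for t = 1..T; y_i0: initial observation; X_i: (T, p)."""
    y_lag = np.concatenate(([y_i0], y_i[:-1]))                      # y_{i,t-1}
    eta = mu + sigma * w_nodes[:, None] + (X_i @ beta + gamma * y_lag)[None, :]
    p1 = 1.0 / (1.0 + np.exp(-eta))
    lik_w = np.prod(np.where(y_i[None, :] == 1, p1, 1.0 - p1), axis=1)
    return np.sum(w_weights * lik_w)

rng = np.random.default_rng(7)
y_i, y_i0 = np.array([1, 1, 0, 1]), 0
X_i = rng.normal(size=(4, 2))
print(marg_lik_dyn_i(-0.5, 1.0, np.array([0.6, -0.2]), 1.5, y_i, y_i0, X_i))
```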
• MML-normal estimates for the PSID dataset (static and dynamic logit models); s.e., t-statistic and p-value refer to the dynamic logit estimates

  Parameter         Static logit   Dynamic logit     s.e.    t-statistic   p-value
  Intercept            -2.9448        -2.3313       0.6609     -3.5275      0.0004
  Std. dev. (σ)         3.2196         1.1352       0.0930     12.2060      0.0000
  Age                   0.2652         0.1037       0.0360      2.8820      0.0040
  Age^2/100            -0.4285        -0.1813       0.0464     -3.9096      0.0001
  Race                  0.6800         0.3011       0.1054      2.8573      0.0043
  Education             0.6737         0.3034       0.0332      9.1456      0.0000
  Kids 0-2             -1.3418        -0.8832       0.0825    -10.7010      0.0000
  Kids 3-5             -1.0260        -0.4390       0.0736     -5.9629      0.0000
  Kids 6-17            -0.2533        -0.0819       0.0393     -2.0831      0.0372
  Perm. inc.           -0.0427        -0.0189       0.0019    -10.1030      0.0000
  Temp. inc.           -0.0110        -0.0036       0.0030     -1.1783      0.2387
  Lagged response           -          2.7974       0.0653     42.8420      0.0000
• For the above example, a much stronger state dependence effect is observed with the MML method than with the JML method (γ̂ = 2.7974 vs. γ̂ = 0.5696)
• MML-normal estimates for the PSID dataset (dynamic and extended dynamic logit models); s.e., t-statistic and p-value refer to the extended dynamic logit estimates

  Parameter         Dynamic logit   Extended dynamic logit     s.e.    t-statistic   p-value
  Intercept             -2.3313            -3.4484            0.8942     -3.8566      0.0001
  Std. dev. (σ)          1.1352             1.6473            0.0900     18.2930      0.0000
  Age                    0.1037             0.1103            0.0502      2.1970      0.0280
  Age^2/100             -0.1813            -0.1902            0.0647     -2.9410      0.0033
  Race                   0.3011             0.2744            0.1374      1.9971      0.0458
  Education              0.3034             0.2864            0.0419      6.8412      0.0000
  Kids 0-2              -0.8832            -1.0498            0.0917    -11.4470      0.0000
  Kids 3-5              -0.4390            -0.5865            0.0871     -6.7369      0.0000
  Kids 6-17             -0.0819            -0.1213            0.0624     -1.9426      0.0521
  Perm. inc.            -0.0189            -0.0164            0.0031     -5.3094      0.0000
  Temp. inc.            -0.0036            -0.0049            0.0032     -1.5133      0.1302
  Lagged response        2.7974             1.8165            0.0824     22.0550      0.0000
• The estimate of the state dependence effect now seems more reliable (γ̂ = 1.8165 vs. γ̂ = 2.7974), even if it remains strongly positive
• Estimates of the parameters for the mean of the distribution for α_i
• Fit of the dynamic logit models:

  Model           Log-likelihood   n. parameters      AIC       BIC
  JML                 -2317.9           1913          8461.8   19086.2
  MML                 -4188.1             12          8400.2    8466.8
  MML extended        -3976.2             17          7986.4    8080.8
References
Akaike, H. (1973), Information theory and an extension of the maximum likelihood
principle, Second International symposium on information theory, Petrov, B. N. and
Csaki F. (eds), pp. 267-281.
Bartolucci, F. (2006), Likelihood inference for a class of latent Markov models under
linear hypotheses on the transition probabilities, Journal of the Royal Statistical
Society, series B, 68, pp. 155-178.
Chamberlain, G. (1984), Panel data, in Handbook of Econometrics, vol. 2, Z. Griliches
and M.D. Intriligator (eds.), Elsevier Science, Amsterdam, pp. 1247-1318.
Chamberlain, G. (1993), Feedback in Panel Data Models, Mimeo, Department of
Economics, Harvard University.
Dempster A. P., Laird, N. M. and Rubin, D. B. (1977), Maximum likelihood from
incomplete data via the EM Algorithm (with discussion), Journal of the Royal
Statistical Society, Series B, 39, pp. 1-38.
Elliot, D. S., Huizinga, D. and Menard, S. (1989), Multiple Problem Youth: Delinquency,
Substance Use, and Mental Health Problems, Springer-Verlag, New York.
Frees, E. W. (2004), Longitudinal and Panel Data: Analysis and Applications in the
Social Sciences, Cambridge University Press.
Heckman, J. J. (1981a), Heterogeneity and state dependence, in Structural Analysis of
Discrete Data, D. L. McFadden and C. F. Manski (eds.), MIT Press, Cambridge, MA, pp. 91-
139.
Heckman, J. J. (1981b), The incidental parameter problem and the problem of initial
conditions in estimating a discrete time-discrete data stochastic process, in Structural
Analysis of Discrete Data with Econometric Applications, C.F. Manski and D.
McFadden (eds.), MIT Press, Cambridge, USA, 179-195.
Hsiao, C. (2005), Analysis of Panel Data, 2nd edition, Cambridge University Press.
Honoré, B. E. and Kyriazidou, E. (2000), Panel data discrete choice models with lagged
dependent variables, Econometrica, 68, pp. 839-874.
Hyslop, D. R. (1999), State dependence, serial correlation and heterogeneity in
intertemporal labor force participation of married women, Econometrica, 67, pp.
1255-1294.
Manski, C.F. (1975), Maximum score estimation of the stochastic utility model of
choice, Journal of Econometrics, 3, pp. 205-228.
Neyman, J. and Scott, E. (1948), Consistent estimates based on partially consistent
observations, Econometrica, 16, pp. 1-32.
Schwarz, G. (1978), Estimating the dimension of a model, Annals of Statistics, 6, pp.
461-464.