Analysis of Binary Panel Data by Static and Dynamic Logit Models
Francesco Bartolucci
University of Perugia
[email protected]
Preliminaries
• Longitudinal (or panel) data consist of repeated observations on the same subjects at different occasions
• With respect to cross-sectional data, longitudinal data have the
advantage of allowing one to study (or to take into account in a natural
way):
unobserved heterogeneity
dynamic relationships
causal effects
Basic notation
• There are n subjects (or individuals) in the sample, with:
  T_i: number of occasions at which subject i is observed
  y_it: response variable (binary or categorical) for subject i at occasion t
  x_it: vector of covariates for subject i at occasion t
• The dataset is said to be balanced if all subjects are observed at the same occasions (T_1 = … = T_n); otherwise, it is said to be unbalanced
• For simplicity, we will usually refer to the balanced case and we will denote by T the number of occasions (common to all subjects)
Example (similar to Hyslop, 1999)
• We consider a sample of n = 1908 women, aged 19 to 59 in 1980, who
were followed from 1979 to 1985 (source: PSID)
Homogeneous static logit and probit models
• The probability of success is assumed to depend only on the observed covariates (the same for all subjects with the same covariate values):
  π(x_it) = p(y_it = 1 | x_it)
• The inverse link function is:
  logit: π(x_it) = exp(x_it'β) / [1 + exp(x_it'β)]
  probit: π(x_it) = Φ(x_it'β)
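To make the two inverse links concrete, here is a minimal Python sketch (not part of the original slides; function names are illustrative) that evaluates π at a grid of linear-predictor values like the one used in the figure below.

```python
# Minimal sketch: evaluating the logit and probit inverse link functions
# pi(x_it) at given values of the linear predictor eta = x_it' beta.
import numpy as np
from scipy.stats import norm

def pi_logit(eta):
    """Logit inverse link: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + np.exp(-eta))

def pi_probit(eta):
    """Probit inverse link: standard normal cdf Phi(eta)."""
    return norm.cdf(eta)

eta = np.linspace(-7.5, 7.5, 7)   # same grid as the linear-predictor axis below
print(pi_logit(eta))
print(pi_probit(eta))
```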
[Figure: logit and probit inverse link functions, π plotted against the linear predictor x_it'β]
Threshold model
• Logit and probit models may be interpreted on the basis of an underlying linear model for the propensity to experience a certain situation:
  y*_it = x_it'β + ε_it,   with y_it = 1 if y*_it > 0 and y_it = 0 otherwise
  ε_it: error term with standard logistic (logit model) or standard normal (probit model) distribution
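As an aside, the threshold interpretation can be checked by a small simulation; the sketch below (values and names are my own, not from the slides) draws the latent propensity y* = x'β + ε and verifies that thresholding it at zero reproduces the logit or probit success probability, depending on the error distribution.

```python
# Simulation sketch of the threshold model: P(y = 1) = P(y* > 0) with
# y* = x'beta + eps, for logistic (logit) and normal (probit) errors.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
beta = np.array([0.5, -1.0])          # hypothetical coefficients
x = np.array([1.0, 0.3])              # hypothetical covariate vector (intercept, x1)
eta = x @ beta                        # linear predictor x'beta

eps_logistic = rng.logistic(size=200_000)   # standard logistic errors
eps_normal = rng.normal(size=200_000)       # standard normal errors

print((eta + eps_logistic > 0).mean(), 1 / (1 + np.exp(-eta)))  # ~ logit pi
print((eta + eps_normal > 0).mean(), norm.cdf(eta))             # ~ probit pi
```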
Model estimation
• The most used method to fit logit and probit models is the maximum likelihood method, which is based on the maximization of the log-likelihood:
  L(β) = Σ_i Σ_t { y_it log[π(x_it)] + (1 − y_it) log[1 − π(x_it)] }
• The maximization may be performed by the Newton-Raphson algorithm, whose iterations are
  β^(h) = β^(h−1) + J(β^(h−1))^(−1) s(β^(h−1))
  s(β) = ∂L(β)/∂β: score vector
  J(β) = −∂²L(β)/∂β∂β': observed information matrix
• An alternative algorithm is Fisher-scoring, which uses the expected information matrix
  I(β) = −E[∂²L(β)/∂β∂β']
  instead of the observed information matrix
• For the logit model, the score vector and the information matrices are
  s(β) = Σ_i Σ_t [y_it − π(x_it)] x_it
  J(β) = I(β) = Σ_i Σ_t π(x_it)[1 − π(x_it)] x_it x_it'
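The iteration above is easy to code directly; the sketch below (simulated data, illustrative names) implements it for the homogeneous logit model, for which the observed and expected information coincide, so the same loop covers both Newton-Raphson and Fisher-scoring.

```python
# Newton-Raphson / Fisher-scoring for the homogeneous logit model, using the
# score s(beta) and information J(beta) = I(beta) given above.
import numpy as np

def fit_logit_newton(X, y, max_iter=25, tol=1e-10):
    """X: (N, p) stacked rows x_it; y: (N,) stacked binary responses."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - pi)                         # s(beta)
        info = (X * (pi * (1.0 - pi))[:, None]).T @ X  # J(beta) = I(beta)
        step = np.linalg.solve(info, score)
        beta = beta + step                             # beta^(h) = beta^(h-1) + J^{-1} s
        if np.max(np.abs(step)) < tol:
            break
    return beta

# quick check on simulated data
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(5000), rng.normal(size=(5000, 2))])
beta_true = np.array([-0.5, 1.0, -0.8])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
print(fit_logit_newton(X, y))   # should be close to beta_true
```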
Example
• Maximum likelihood estimates for the PSID dataset (logit model)
• Maximum likelihood estimates for the PSID dataset (probit model)
Static models with subject-specific effects
• To account for unobserved heterogeneity, a subject-specific intercept α_i is included in the model, so that the probability of success becomes
  π(α_i, x_it) = p(y_it = 1 | α_i, x_it)
• The most used estimation methods for this model are:
  joint maximum likelihood (fixed-parameters)
  conditional maximum likelihood (only for the logit model)
  marginal maximum likelihood (random-parameters)
Joint maximum likelihood (JML) method
• It consists of maximizing the log-likelihood
  L(α, β) = Σ_i Σ_t { y_it log[π(α_i, x_it)] + (1 − y_it) log[1 − π(α_i, x_it)] }
• The method is simple to implement for both logit and probit models
• The JML estimator:
  does not exist (for α_i) when y_i+ = 0 or y_i+ = T, with y_i+ = Σ_t y_it
  is not consistent with T fixed as n grows to infinity
• The likelihood equations are
  ∂L(α, β)/∂α_i = Σ_t [y_it − π(α_i, x_it)] = 0,   i = 1, …, n
  ∂L(α, β)/∂β = Σ_i Σ_t [y_it − π(α_i, x_it)] x_it = 0
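A rough sketch of how JML can be carried out in practice, under the assumption that it is implemented as an ordinary logit fit with one dummy variable per subject (simulated data; statsmodels is used only for convenience). Subjects with y_i+ = 0 or y_i+ = T are dropped, since their α_i estimate does not exist, and the resulting β̂ typically shows the fixed-T bias noted above.

```python
# JML for the static fixed-effects logit: alpha_i treated as ordinary
# parameters, i.e. a logit model with one dummy per subject.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, T = 150, 8
beta_true = np.array([1.0, -0.5])
alpha = rng.normal(size=n)                        # subject-specific intercepts
x = rng.normal(size=(n, T, 2))
p = 1.0 / (1.0 + np.exp(-(alpha[:, None] + x @ beta_true)))
y = rng.binomial(1, p)

keep = (y.sum(axis=1) > 0) & (y.sum(axis=1) < T)  # drop all-0 / all-1 subjects
y_k, x_k = y[keep], x[keep]
n_k = y_k.shape[0]
dummies = np.kron(np.eye(n_k), np.ones((T, 1)))   # subject dummies (alpha_i)
design = np.column_stack([dummies, x_k.reshape(-1, 2)])
fit = sm.Logit(y_k.ravel(), design).fit(disp=0)
print(fit.params[-2:])   # JML estimate of beta; alpha_i's occupy the first n_k entries
```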
Conditional maximum likelihood (CML) method
• This estimation method may be used only for the logit model
• For the logit model, the total score y_i+ = Σ_t y_it is a sufficient statistic for α_i, so that inference on β may be based on the distribution of y_i conditional on y_i+
• An important drawback, common to all fixed-parameters approaches, is that the regression parameters for the time-constant covariates are not estimable
• The conditional probability of the response configuration y_i given y_i+ is then
  p(y_i | X_i, y_i+) = exp(Σ_t y_it x_it'β) / Σ_{z: z_+ = y_i+} exp(Σ_t z_t x_it'β)
  where the sum in the denominator is over all binary vectors z = (z_1, …, z_T)' with total z_+ equal to y_i+; this probability does not depend on α_i
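For small T, the conditional probability above can be evaluated by brute force, summing over all binary vectors z with the same total as y_i. The sketch below (illustrative data and names) computes one subject's contribution to the conditional log-likelihood.

```python
# Conditional log-likelihood contribution of one subject in the static logit:
# sum_t y_it x_it'beta minus the log of the sum of exp(sum_t z_t x_it'beta)
# over all binary z with z_+ = y_i+.
import numpy as np
from itertools import combinations

def cond_loglik_i(beta, y_i, X_i):
    """y_i: (T,) binary responses; X_i: (T, p) covariate rows of subject i."""
    T = len(y_i)
    eta = X_i @ beta                                   # x_it' beta, t = 1..T
    log_num = eta @ y_i
    den = sum(np.exp(eta[list(ones)].sum())            # 'ones' = positions where z_t = 1
              for ones in combinations(range(T), int(y_i.sum())))
    return log_num - np.log(den)

beta = np.array([0.8, -0.4])                           # hypothetical parameter value
y_i = np.array([0, 1, 1, 0])
X_i = np.random.default_rng(3).normal(size=(4, 2))
print(cond_loglik_i(beta, y_i, X_i))
```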
• JML and CML estimates for the PSID dataset (logit model); s.e., t-statistic and p-value refer to the CML estimates

  Parameter     JML estimate   CML estimate     s.e.    t-statistic   p-value
  Intercept           -              -             -          -           -
  Age                 -              -             -          -           -
  Age^2/100           -              -             -          -           -
  Race                -              -             -          -           -
  Education           -              -             -          -           -
  Kids 0-2        -1.3660        -1.1537       0.0899    -12.8290     0.0000
  Kids 3-5        -0.9912        -0.8373       0.0840     -9.9638     0.0000
  Kids 6-17       -0.2096        -0.1764       0.0637     -2.7691     0.0056
  Perm. inc.          -              -             -          -           -
  Temp. inc.      -0.0162        -0.0136       0.0033     -4.1186     0.0000
Marginal maximum likelihood (MML) method
• This estimation method may be used for both logit and probit models
• The subject-specific intercepts α_i are treated as random effects with density f(α_i), and the manifest distribution of y_i is obtained by integrating them out:
  p(y_i | X_i) = ∫ p(y_i | α_i, X_i) f(α_i) dα_i,   p(y_i | α_i, X_i) = ∏_t p(y_it | α_i, x_it)
• The marginal log-likelihood is then
  L_m(β) = Σ_i log[p(y_i | X_i)]
Logit model with normal random effects
• Under the assumption α_i ∼ N(µ, σ²), for the logit model we have
  p(y_i | X_i) = ∫ p(y_i | w, X_i) φ(w) dw
  where w is a standard normal variable (so that α_i = µ + σw) and φ(·) is its density; the integral may be computed by numerical quadrature
• The score vector and the (empirical) information matrix are given by
  s_m(γ) = ∂L_m(γ)/∂γ = Σ_i s_m,i(γ),   s_m,i(γ) = [1/p(y_i | X_i)] ∫ [∂p(y_i | w, X_i)/∂γ] φ(w) dw
  J̃_m(γ) = Σ_i s_m,i(γ) s_m,i(γ)' − (1/n) s_m(γ) s_m(γ)'
  where γ denotes the vector of all the model parameters (µ, σ and β)
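A sketch of how p(y_i | X_i) can be approximated for one subject. The slides later mention 51 equally spaced quadrature points on [–5, 5]; the version below uses Gauss-Hermite nodes rescaled for the standard normal weight instead, which is an assumption of mine rather than the authors' exact rule; data and parameter values are illustrative.

```python
# Marginal likelihood of one subject in the random-effects logit model,
# approximated by (rescaled) Gauss-Hermite quadrature over w ~ N(0, 1).
import numpy as np
from numpy.polynomial.hermite import hermgauss

nodes, weights = hermgauss(51)
w_nodes = np.sqrt(2.0) * nodes           # quadrature points for phi(w)
w_weights = weights / np.sqrt(np.pi)     # corresponding weights (sum to ~1)

def marg_lik_i(mu, sigma, beta, y_i, X_i):
    """p(y_i | X_i) = int prod_t p(y_it | w, x_it) phi(w) dw, with alpha_i = mu + sigma*w."""
    eta = mu + sigma * w_nodes[:, None] + (X_i @ beta)[None, :]   # (Q, T)
    p1 = 1.0 / (1.0 + np.exp(-eta))
    lik_w = np.prod(np.where(y_i[None, :] == 1, p1, 1.0 - p1), axis=1)
    return np.sum(w_weights * lik_w)

rng = np.random.default_rng(4)
y_i = np.array([1, 0, 1, 1, 0])
X_i = rng.normal(size=(5, 2))
print(marg_lik_i(0.5, 1.2, np.array([0.7, -0.3]), y_i, X_i))
```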
Pros and cons of MML
• The MML method is more complicated to implement than the fixed-effects methods (JML, CML), but it allows us to estimate the regression parameters for both time-constant and time-varying covariates
• Its main drawbacks are that a parametric distribution (here the normal) must be assumed for the random effects and that these effects are assumed to be independent of the covariates
• The approach may be extended to overcome these drawbacks:
  a discrete distribution with free support points and mass probabilities may be used for the random effects; the approach is in this case of latent-class type and requires the implementation of an EM algorithm (Dempster et al., 1977) and the choice of the number of support points (a sketch of this formulation follows this list)
  the parameters of the distribution of the random effects are allowed to depend on the covariates; one possibility is the correlated-effects model of Chamberlain (1984)
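As referenced in the first point of the list, a brief sketch of the discrete (latent-class) formulation: the integral over α_i becomes a finite sum over support points ξ_c with mass probabilities π_c. The values below are hypothetical and the EM updates themselves are not shown.

```python
# Marginal likelihood of one subject under a discrete distribution for the
# random effects: sum_c pi_c * prod_t p(y_it | xi_c, x_it).
import numpy as np

def marg_lik_discrete_i(xi, pi_mass, beta, y_i, X_i):
    eta = xi[:, None] + (X_i @ beta)[None, :]        # (k, T) linear predictors
    p1 = 1.0 / (1.0 + np.exp(-eta))
    lik_c = np.prod(np.where(y_i[None, :] == 1, p1, 1.0 - p1), axis=1)
    return np.sum(pi_mass * lik_c)

xi = np.array([-2.0, 0.0, 2.0])          # hypothetical support points
pi_mass = np.array([0.3, 0.4, 0.3])      # hypothetical mass probabilities
y_i = np.array([1, 0, 1])
X_i = np.random.default_rng(5).normal(size=(3, 2))
print(marg_lik_discrete_i(xi, pi_mass, np.array([0.5, -0.5]), y_i, X_i))
```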
• JML, CML and MML-normal estimates for the PSID dataset (logit model); the MML algorithm uses 51 quadrature points from –5 to 5; s.e., t-statistic and p-value refer to the MML estimates

  Parameter       JML estimate   CML estimate   MML estimate     s.e.    t-statistic   p-value
  Intercept             -              -           -2.9448      1.3461     -2.1876      0.0287
  Std. dev. (σ)         -              -            3.2196      0.1066     30.2090      0.0000
  Age                   -              -            0.2652      0.0712      3.7243      0.0002
  Age^2/100             -              -           -0.4285      0.0906     -4.7271      0.0000
  Race                  -              -            0.6800      0.2162      3.1449      0.0017
  Education             -              -            0.6737      0.0643     10.4810      0.0000
  Kids 0-2          -1.3660        -1.1537         -1.3418      0.0773    -17.3490      0.0000
  Kids 3-5          -0.9912        -0.8373         -1.0260      0.0635    -16.1680      0.0000
  Kids 6-17         -0.2096        -0.1764         -0.2533      0.0438     -5.7775      0.0000
  Perm. inc.            -              -           -0.0427      0.0036    -11.9610      0.0000
  Temp. inc.        -0.0162        -0.0136         -0.0110      0.0023     -4.7554      0.0000
Summary of the models fit
• Estimates for the PSID dataset (logit model):

  Method                       Log-likelihood   n. parameters      AIC       BIC
  Homogeneous                      -7507.3             10        15034.6   15090.1
  Heterogeneous-JML                -2986.3           1912         9796.6   20415.5
  Heterogeneous-CML (*)            -2128.5              4         4265.0    4287.2
  Heterogeneous-MML-normal         -5264.4             11        10550.8   10611.9

  (*) not directly comparable with the others
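For reference, the criteria in the table follow the usual definitions AIC = −2ℓ + 2k and BIC = −2ℓ + k log n (Akaike, 1973; Schwarz, 1978); the small check below reproduces the homogeneous-logit row under the assumption that n = 1908 subjects enters the BIC penalty.

```python
# Check of the AIC and BIC values for the homogeneous logit row of the table.
import numpy as np

loglik, k, n = -7507.3, 10, 1908
print(-2 * loglik + 2 * k)           # AIC = 15034.6
print(-2 * loglik + k * np.log(n))   # BIC ~= 15090.1
```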
Dynamic logit and probit models
• Dynamic models include the lagged response y_i,t−1 among the predictors, so as to account for state dependence:
  logit: π(α_i, x_it, y_i,t−1) = exp(α_i + x_it'β + y_i,t−1 γ) / [1 + exp(α_i + x_it'β + y_i,t−1 γ)]
  or
  probit: π(α_i, x_it, y_i,t−1) = Φ(α_i + x_it'β + y_i,t−1 γ)
  π(α_i, x_it, y_i,t−1) = p(y_it = 1 | α_i, x_it, y_i,t−1): conditional probability of success
• The initial observation y_i0 must be known. When the parameters α_i are random, the initial condition problem arises. The simplest approach, which however can lead to a biased estimator of β and γ, is to treat y_i0 as an exogenous covariate
Estimation of dynamic models
• The subject-specific parameters α_i may be considered as fixed or random; in the latter case, the manifest distribution of y_i given the initial observation is
  p(y_i | X_i, y_i0) = ∫ p(y_i | α_i, X_i, y_i0) f(α_i) dα_i,
  p(y_i | α_i, X_i, y_i0) = ∏_t p(y_it | α_i, x_it, y_i,t−1)
• The most used estimation methods for dynamic models are the same
as for static models:
joint maximum likelihood (fixed-parameters)
conditional maximum likelihood (only for the logit model)
marginal maximum likelihood (random-parameters)
Joint maximum likelihood (JML) method
• The log-likelihood has again a simple form:
  L(α, β, γ) = Σ_i Σ_t { y_it log[π(α_i, x_it, y_i,t−1)] + (1 − y_it) log[1 − π(α_i, x_it, y_i,t−1)] }
• The algorithm is essentially the same as that used for static models, but with y_i,t−1 included among the covariates x_it
• The JML estimator has the same drawbacks it has for static models:
  it does not exist (for α_i) when y_i+ = 0 or y_i+ = T, with y_i+ = Σ_t y_it
  it is not consistent with T fixed as n grows to infinity
• JML estimates for the PSID dataset (static and dynamic logit models)

  Parameter          Static logit   Dynamic logit
  Kids 0-2              -1.3660        -1.2688
  Kids 3-5              -0.9912        -0.8227
  Kids 6-17             -0.2096        -0.1730
  Temp. inc.            -0.0162        -0.0112
  Lagged response           -           0.5696

• A positive state dependence is observed and the fit of the logit model improves considerably by including the lagged response variable

  Model           Log-likelihood   n. parameters      AIC       BIC
  Static logit        -2986.3           1912          9796.6   20415.5
  Dynamic logit       -2317.9           1913          8461.8   19086.2
Conditional maximum likelihood (CML) method
• The CML method may be used to estimate the dynamic logit model only in particular circumstances
• Consider, for instance, the case of T = 3 occasions beyond the initial one (so that y_i0, y_i1, y_i2, y_i3 are observed) and no covariates
• The response configurations y_i = (0, 1, y_i3)' and y_i = (1, 0, y_i3)' have conditional probabilities
  p[(0, 1, y_i3)' | α_i, y_i0] = exp[(1 + y_i3)α_i + y_i3 γ] / {[1 + exp(α_i + y_i0 γ)][1 + exp(α_i)][1 + exp(α_i + γ)]}
  p[(1, 0, y_i3)' | α_i, y_i0] = exp[(1 + y_i3)α_i + y_i0 γ] / {[1 + exp(α_i + y_i0 γ)][1 + exp(α_i + γ)][1 + exp(α_i)]}
• Conditioning on y_i1 + y_i2 = 1 (and on y_i0 and y_i3), the subject-specific parameter α_i cancels out:
  p[(1, 0, y_i3)' | α_i, y_i0, y_i1 + y_i2 = 1, y_i3] = exp(y_i0 γ) / [exp(y_i3 γ) + exp(y_i0 γ)] = exp[(y_i0 − y_i3)γ] / {1 + exp[(y_i0 − y_i3)γ]}
• The approach extends to more than three occasions; the corresponding conditional log-likelihood is maximized with respect to γ, and the resulting estimator has the same properties it has for T = 3 and, in particular, it is consistent for T fixed as n grows to infinity
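The last displayed expression shows that, for subjects with y_i1 + y_i2 = 1, γ enters through an ordinary logistic term in (y_i0 − y_i3). The sketch below (simulated data without covariates, illustrative names) maximizes the resulting conditional log-likelihood for the T = 3 case.

```python
# Conditional ML estimation of gamma in the dynamic logit with T = 3 and
# no covariates: subjects with y_i1 + y_i2 = 1 contribute
# y_i1 * d - log(1 + exp(d)), with d = (y_i0 - y_i3) * gamma.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_cond_loglik(gamma, y):
    """y: (n, 4) array with columns y_i0, y_i1, y_i2, y_i3."""
    keep = y[:, 1] + y[:, 2] == 1
    d = (y[keep, 0] - y[keep, 3]) * gamma
    return -np.sum(y[keep, 1] * d - np.log1p(np.exp(d)))

# simulate from the dynamic logit with random intercepts and gamma = 1
rng = np.random.default_rng(6)
n, gamma_true = 2000, 1.0
alpha = rng.normal(size=n)
y = np.empty((n, 4), dtype=int)
y[:, 0] = rng.binomial(1, 1.0 / (1.0 + np.exp(-alpha)))
for t in range(1, 4):
    y[:, t] = rng.binomial(1, 1.0 / (1.0 + np.exp(-(alpha + gamma_true * y[:, t - 1]))))

print(minimize_scalar(lambda g: neg_cond_loglik(g, y)).x)   # close to gamma_true
```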
• The conditional approach may also be used in the presence of
covariates, provided that:
the probability that each discrete covariate is time-constant is
positive (this rules out the possibility of time dummies)
the support of the distribution of the continuous covariates satisfies
suitable conditions
Marginal maximum likelihood (MML) method
• This estimation method may be used for both dynamic logit and probit models
• The algorithm is essentially the same as that for static models, but we have to use an extended vector of covariates which includes the lagged response variable
• For the dynamic logit model with normal random effects we have to maximize
  L_m(γ̃) = Σ_i log[p(y_i | X_i, y_i0)],   p(y_i | X_i, y_i0) = ∫ p(y_i | w, X_i, y_i0) φ(w) dw,
  p(y_i | w, X_i, y_i0) = ∏_t exp[y_it(µ + wσ + x_it'β + y_i,t−1 γ)] / [1 + exp(µ + wσ + x_it'β + y_i,t−1 γ)]
  where γ̃ collects all the model parameters (µ, σ, β and γ)
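A sketch of the integrand above for one subject, extending the earlier static quadrature sketch: the linear predictor now includes y_i,t−1 γ and the initial observation y_i0 is taken as given (the simplest treatment of the initial-condition problem). Data and parameter values are illustrative.

```python
# Marginal likelihood of one subject under the dynamic logit with normal
# random effects, p(y_i | X_i, y_i0), via rescaled Gauss-Hermite quadrature.
import numpy as np
from numpy.polynomial.hermite import hermgauss

nodes, weights = hermgauss(51)
w_nodes, w_weights = np.sqrt(2.0) * nodes, weights / np.sqrt(np.pi)

def marg_lik_dyn_i(mu, sigma, beta, gamma, y_i, y_i0, X_i):
    """y_i: (T,) responses for t = 1..T; y_i0: initial observation; X_i: (T, p)."""
    y_lag = np.concatenate(([y_i0], y_i[:-1]))                      # y_{i,t-1}
    eta = mu + sigma * w_nodes[:, None] + (X_i @ beta + gamma * y_lag)[None, :]
    p1 = 1.0 / (1.0 + np.exp(-eta))
    lik_w = np.prod(np.where(y_i[None, :] == 1, p1, 1.0 - p1), axis=1)
    return np.sum(w_weights * lik_w)

rng = np.random.default_rng(7)
y_i, y_i0 = np.array([1, 1, 0, 1]), 0
X_i = rng.normal(size=(4, 2))
print(marg_lik_dyn_i(-0.5, 1.0, np.array([0.6, -0.2]), 1.5, y_i, y_i0, X_i))
```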
• MML-normal estimates for the PSID dataset (static and dynamic logit models); s.e., t-statistic and p-value refer to the dynamic logit estimates

  Parameter         Static logit   Dynamic logit     s.e.    t-statistic   p-value
  Intercept            -2.9448        -2.3313       0.6609     -3.5275      0.0004
  Std. dev. (σ)         3.2196         1.1352       0.0930     12.2060      0.0000
  Age                   0.2652         0.1037       0.0360      2.8820      0.0040
  Age^2/100            -0.4285        -0.1813       0.0464     -3.9096      0.0001
  Race                  0.6800         0.3011       0.1054      2.8573      0.0043
  Education             0.6737         0.3034       0.0332      9.1456      0.0000
  Kids 0-2             -1.3418        -0.8832       0.0825    -10.7010      0.0000
  Kids 3-5             -1.0260        -0.4390       0.0736     -5.9629      0.0000
  Kids 6-17            -0.2533        -0.0819       0.0393     -2.0831      0.0372
  Perm. inc.           -0.0427        -0.0189       0.0019    -10.1030      0.0000
  Temp. inc.           -0.0110        -0.0036       0.0030     -1.1783      0.2387
  Lagged response           -          2.7974       0.0653     42.8420      0.0000
• For the above example, a much stronger state dependence effect is observed with the MML method than with the JML method (γ̂ = 2.7974 vs. γ̂ = 0.5696)
• MML-normal estimates for the PSID dataset (dynamic and extended dynamic logit models); s.e., t-statistic and p-value refer to the extended dynamic logit estimates

  Parameter         Dynamic logit   Extended dynamic logit     s.e.    t-statistic   p-value
  Intercept             -2.3313            -3.4484            0.8942     -3.8566      0.0001
  Std. dev. (σ)          1.1352             1.6473            0.0900     18.2930      0.0000
  Age                    0.1037             0.1103            0.0502      2.1970      0.0280
  Age^2/100             -0.1813            -0.1902            0.0647     -2.9410      0.0033
  Race                   0.3011             0.2744            0.1374      1.9971      0.0458
  Education              0.3034             0.2864            0.0419      6.8412      0.0000
  Kids 0-2              -0.8832            -1.0498            0.0917    -11.4470      0.0000
  Kids 3-5              -0.4390            -0.5865            0.0871     -6.7369      0.0000
  Kids 6-17             -0.0819            -0.1213            0.0624     -1.9426      0.0521
  Perm. inc.            -0.0189            -0.0164            0.0031     -5.3094      0.0000
  Temp. inc.            -0.0036            -0.0049            0.0032     -1.5133      0.1302
  Lagged response        2.7974             1.8165            0.0824     22.0550      0.0000
• The estimate of the state dependence effect now seems more reliable (γ̂ = 1.8165 vs. γ̂ = 2.7974), even if it remains strongly positive
• Estimates of the parameters for the mean of the distribution for α_i
• Fit of the dynamic logit models:

  Model           Log-likelihood   n. parameters      AIC       BIC
  JML                 -2317.9           1913          8461.8   19086.2
  MML                 -4188.1             12          8400.2    8466.8
  MML extended        -3976.2             17          7986.4    8080.8
References
Akaike, H. (1973), Information theory and an extension of the maximum likelihood
principle, Second International symposium on information theory, Petrov, B. N. and
Csaki F. (eds), pp. 267-281.
Bartolucci, F. (2006), Likelihood inference for a class of latent Markov models under
linear hypotheses on the transition probabilities, Journal of the Royal Statistical
Society, series B, 68, pp. 155-178.
Chamberlain, G. (1984), Panel data, in Handbook of Econometrics, vol. 2, Z. Griliches
and M.D. Intriligator (eds.), Elsevier Science, Amsterdam, pp. 1247-1318.
Chamberlain, G. (1993), Feedback in Panel Data Models, Mimeo, Department of
Economics, Harvard University.
Dempster A. P., Laird, N. M. and Rubin, D. B. (1977), Maximum likelihood from
incomplete data via the EM Algorithm (with discussion), Journal of the Royal
Statistical Society, Series B, 39, pp. 1-38.
Elliot, D. S., Huizinga, D. and Menard, S. (1989), Multiple Problem Youth: Delinquency,
Substance Use, and Mental Health Problems, Springer-Verlag, New York.
Frees, E. W. (2004), Longitudinal and Panel Data: Analysis and Applications in the
Social Sciences, Cambridge University Press.
Heckman, J. J. (1981a), Heterogeneity and state dependence, in Structural Analysis of
Discrete Data, D. L. McFadden and C. F. Manski (eds.), MIT Press, Cambridge, MA, pp. 91-
139.
Heckman, J. J. (1981b), The incidental parameter problem and the problem of initial
conditions in estimating a discrete time-discrete data stochastic process, in Structural
Analysis of Discrete Data with Econometric Applications, C.F. Manski and D.
McFadden (eds.), MIT Press, Cambridge, USA, 179-195.
Hsiao, C. (2005), Analysis of Panel Data, 2nd edition, Cambridge University Press.
Honoré, B. E. and Kyriazidou, E. (2000), Panel data discrete choice models with lagged
dependent variables, Econometrica, 68, pp. 839-874.
Hyslop, D. R. (1999), State dependence, serial correlation and heterogeneity in
intertemporal labor force participation of married women, Econometrica, 67, pp.
1255-1294.
Manski, C.F. (1975), Maximum score estimation of the stochastic utility model of
choice, Journal of Econometrics, 3, pp. 205-228.
Neyman, J. and Scott, E. (1948), Consistent estimates based on partially consistent
observations, Econometrica, 16, pp. 1-32.
Schwarz, G. (1978), Estimating the dimension of a model, Annals of Statistics, 6, pp.
461-464.