
SC706: Longitudinal Data Analysis

Instructor: Natasha Sarkisian


Class notes: Categorical Dependent Variables

So far we’ve dealt only with continuous dependent variables, but panel data analyses often
examine categorical outcomes as well. Your dependent variable can be dichotomous (0/1),
categorical with multiple unordered categories, ordinal, or a count. In such cases, linear models
are inappropriate: there are no restrictions on the predicted values of the outcome, and the
random effect (i.e., level 1 residual) cannot be normally distributed and cannot have
homogeneous variance (the variance depends on the predicted value). As in non-longitudinal
models, analysis of such variables is accomplished by specifying a link function that transforms
the expected value of the dependent variable so that the predicted values are constrained to a
specific interval. Specifically, we use logit models for dichotomous variables, multinomial logit
for categorical variables with unordered categories, ordered logit for ordinal variables, and
Poisson models for count variables.
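As a compact reminder of what these link functions look like, the standard textbook forms are sketched below. The notation ($\mathbf{x}$, $\boldsymbol\beta$, $\kappa_k$, base category $b$) is introduced here for illustration and does not come from the notes; the ordered logit line follows the sign convention Stata uses for its cutpoints.

```latex
\begin{align*}
\text{logit (dichotomous):} \quad
  & \log\frac{P(y=1)}{1-P(y=1)} = \mathbf{x}\boldsymbol\beta \\
\text{multinomial logit (category } j \text{ vs. base } b\text{):} \quad
  & \log\frac{P(y=j)}{P(y=b)} = \mathbf{x}\boldsymbol\beta_j \\
\text{ordered logit (cutpoint } \kappa_k\text{):} \quad
  & \log\frac{P(y\le k)}{P(y>k)} = \kappa_k - \mathbf{x}\boldsymbol\beta \\
\text{Poisson (count):} \quad
  & \log E(y) = \mathbf{x}\boldsymbol\beta
\end{align*}
```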

Two-wave Datasets

Variable type | Lagged DV                 | Difference score                                      | First difference                                      | Cross-lagged
--------------+---------------------------+-------------------------------------------------------+-------------------------------------------------------+------------------------------------------
Continuous    | reg                       | reg                                                   | reg                                                   | sem
Dichotomous   | logit                     | mlogit (change categories: 00, 11, 01, 10)            | mlogit (change categories: 00, 11, 01, 10)            | biprobit, gsem (logit, probit options)
Ordinal       | ologit                    | reg                                                   | reg                                                   | bioprobit, gsem (ologit, oprobit options)
Nominal       | mlogit                    | mlogit (1, 2, 3 = 11, 12, 13, 21, 22, 23, 31, 32, 33) | mlogit (1, 2, 3 = 11, 12, 13, 21, 22, 23, 31, 32, 33) | gsem (mlogit option)
Count         | poisson, nbreg, zip, zinb | reg                                                   | reg                                                   | gsem (poisson, nbreg options)

Everything besides the cross-lagged model is estimated using regular regression commands. For
mlogit, however, you need to create transition categories. For example:
. use hrs_hours.dta, clear

. tab r1poorhealth r2poorhealth


Poor | Poor health
health | 0 1 | Total
-----------+----------------------+----------
0 | 4,445 420 | 4,865
1 | 314 785 | 1,099
-----------+----------------------+----------
Total | 4,759 1,205 | 5,964

. egen health=group(r1poorhealth r2poorhealth)


(627 missing values generated)

. tab health

group(r1poo |
rhealth |
r2poorhealt |
h) | Freq. Percent Cum.
------------+-----------------------------------
1 | 4,445 74.53 74.53
2 | 420 7.04 81.57
3 | 314 5.26 86.84
4 | 785 13.16 100.00
------------+-----------------------------------
Total | 5,964 100.00
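To see which group number corresponds to which transition, a toy sketch may help. The four-row dataset and the names w1, w2, and trans are made up for illustration. Because group() numbers the observed combinations in sorted order, two 0/1 variables yield 1 = 00, 2 = 01, 3 = 10, 4 = 11, which matches the frequencies in the crosstab above.

```stata
* Toy data: one row per possible wave 1 / wave 2 combination (hypothetical)
clear
input w1 w2
0 0
0 1
1 0
1 1
end

* group() numbers the observed combinations in sorted order;
* the label option attaches the underlying values as value labels
egen trans = group(w1 w2), label

* Show the numeric codes next to the original values
list w1 w2 trans, nolabel
tabulate trans
```

Observations with a missing value on either wave get a missing group code, which is where the "missing values generated" message in the log comes from.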

. mlogit health r1workhours80 r1married r1totalpar r1siblog h1childlg

Multinomial logistic regression Number of obs = 5558


LR chi2(15) = 723.19
Prob > chi2 = 0.0000
Log likelihood = -4209.2727 Pseudo R2 = 0.0791
-------------------------------------------------------------------------------
health | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
1 | (base outcome)
--------------+----------------------------------------------------------------
2 |
r1workhours80 | -.0130358 .0024295 -5.37 0.000 -.0177977 -.008274
r1married | -.5597468 .1395123 -4.01 0.000 -.8331859 -.2863078
r1totalpar | -.1197857 .0726706 -1.65 0.099 -.2622176 .0226461
r1siblog | .4067526 .0921614 4.41 0.000 .2261197 .5873856
h1childlg | .284537 .1008141 2.82 0.005 .0869451 .4821289
_cons | -2.343312 .2260522 -10.37 0.000 -2.786366 -1.900258
--------------+----------------------------------------------------------------
3 |
r1workhours80 | -.0180743 .0027839 -6.49 0.000 -.0235306 -.0126179
r1married | -.6254326 .1553243 -4.03 0.000 -.9298627 -.3210026
r1totalpar | -.2012291 .0870588 -2.31 0.021 -.3718613 -.0305969
r1siblog | .3246466 .104158 3.12 0.002 .1205006 .5287925
h1childlg | .1905984 .1145384 1.66 0.096 -.0338927 .4150895
_cons | -2.079726 .2496431 -8.33 0.000 -2.569017 -1.590435
--------------+----------------------------------------------------------------
4 |
r1workhours80 | -.0421077 .0020845 -20.20 0.000 -.0461933 -.0380222
r1married | -1.023912 .105026 -9.75 0.000 -1.229759 -.8180651
r1totalpar | -.1763883 .0614712 -2.87 0.004 -.2968697 -.0559069
r1siblog | .5245884 .0728385 7.20 0.000 .3818276 .6673492
h1childlg | .2207622 .0785831 2.81 0.005 .0667421 .3747823
_cons | -.7924487 .1702211 -4.66 0.000 -1.126076 -.4588215
-------------------------------------------------------------------------------

For a cross-lagged model with dichotomous dependent variables:

. biprobit (r2poorhealth r1poorhealth r1married r1workhours80 r1totalpar) (r2married
> r1married r1poorhealth r1workhours80 r1totalpar)

Seemingly unrelated bivariate probit Number of obs = 5880


Wald chi2(8) = 4194.46
Log likelihood = -2963.1838 Prob > chi2 = 0.0000
-------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
r2poorhealth |
r1poorhealth | 1.807378 .0496701 36.39 0.000 1.710026 1.90473
r1married | -.1549207 .0539901 -2.87 0.004 -.2607394 -.049102
r1workhours80 | -.0091585 .0010034 -9.13 0.000 -.0111251 -.0071919

r1totalpar | -.0330402 .030248 -1.09 0.275 -.0923252 .0262447
_cons | -.906089 .0672771 -13.47 0.000 -1.03795 -.7742284
--------------+----------------------------------------------------------------
r2married |
r1married | 3.291951 .0673929 48.85 0.000 3.159864 3.424039
r1poorhealth | -.1818643 .077882 -2.34 0.020 -.3345102 -.0292183
r1workhours80 | .0011923 .0014246 0.84 0.403 -.0015999 .0039844
r1totalpar | .0707012 .0413782 1.71 0.088 -.0103986 .151801
_cons | -1.559129 .0907986 -17.17 0.000 -1.737091 -1.381167
--------------+----------------------------------------------------------------
/athrho | -.0035238 .0515823 -0.07 0.946 -.1046233 .0975757
--------------+----------------------------------------------------------------
rho | -.0035238 .0515817 -.1042432 .0972672
-------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chi2(1) = .004666 Prob > chi2 = 0.9455

In this case, rho is the correlation between the residuals of the two equations (non-significant
here), and athrho is Fisher's Z transformation (the arc-hyperbolic tangent) of that correlation.
. test [r2poorhealth]r1married=[r2married]r1poorhealth

( 1) [r2poorhealth]r1married - [r2married]r1poorhealth = 0

chi2( 1) = 0.08
Prob > chi2 = 0.7762
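The link between athrho and rho described above can be checked by hand, since rho is the hyperbolic tangent of athrho. This is a minimal sketch using the estimate printed in the biprobit output; the scalar names are arbitrary, and tanh() and atanh() are built-in Stata functions.

```stata
* athrho as reported in the biprobit output above
scalar athrho = -.0035238

* Back-transform to the correlation scale: rho = tanh(athrho)
scalar rho = tanh(athrho)
display "rho    = " rho

* And forward again: athrho = atanh(rho)
display "athrho = " atanh(rho)
```

Because the correlation here is so close to zero, the two numbers agree to the printed precision; with stronger correlations they diverge noticeably.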

Another way is to use gsem, but note that correlated residuals are not available when the model is
estimated with logit etc.:
. gsem (r2poorhealth <- r1poorhealth r1married r1workhours80 r1totalpar) (r2married
> <- r1married r1poorhealth r1workhours80 r1totalpar), logit

Generalized structural equation model Number of obs = 5882


Log likelihood = -2962.1503
---------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
r2poorhealth <- |
r1poorhealth | 3.06475 .0869457 35.25 0.000 2.894339 3.23516
r1married | -.2836505 .1005597 -2.82 0.005 -.4807439 -.0865571
r1workhours80 | -.0172547 .001859 -9.28 0.000 -.0208983 -.0136111
r1totalpar | -.0554403 .0576515 -0.96 0.336 -.1684352 .0575547
_cons | -1.52599 .1248862 -12.22 0.000 -1.770762 -1.281217
----------------+----------------------------------------------------------------
r2married <- |
r1married | 5.922521 .1436862 41.22 0.000 5.640901 6.204141
r1poorhealth | -.3999421 .1703288 -2.35 0.019 -.7337804 -.0661037
r1workhours80 | .0030176 .0031457 0.96 0.337 -.0031479 .0091831
r1totalpar | .1628322 .0946266 1.72 0.085 -.0226325 .3482968
_cons | -2.795544 .1991486 -14.04 0.000 -3.185868 -2.40522
---------------------------------------------------------------------------------

. test [r2poorhealth]r1married=[r2married]r1poorhealth

( 1) [r2poorhealth]r1married - [r2married]r1poorhealth = 0

chi2( 1) = 0.35
Prob > chi2 = 0.5566

Here’s another example of a cross-lagged model in gsem, now using Poisson models:

. gsem (r2workhours80 <- r1allparhelptw r1poorhealth r1married r1workhours80
> r1totalpar) (r2allparhelptw <- r1allparhelptw r1married r1poorhealth r1workhours80
> r1totalpar), family(poisson) link(log)
note: r2allparhelptw has noncount values;
you are responsible for the family(poisson) interpretation

Generalized structural equation model Number of obs = 5803


Log likelihood = -69816.414
-----------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------+----------------------------------------------------------------
r2workhours80 <- |
r1allparhelptw | -.0108981 .0010395 -10.48 0.000 -.0129354 -.0088608
r1poorhealth | -.3278578 .0085658 -38.28 0.000 -.3446465 -.311069
r1married | .0185784 .0067215 2.76 0.006 .0054044 .0317523
r1workhours80 | .0283919 .0001318 215.49 0.000 .0281337 .0286501
r1totalpar | .0469421 .0031718 14.80 0.000 .0407255 .0531587
_cons | 2.218248 .0092682 239.34 0.000 2.200083 2.236413
------------------+----------------------------------------------------------------
r2allparhelptw <- |
r1allparhelptw | .0898139 .0020298 44.25 0.000 .0858355 .0937923
r1married | -.407909 .0278715 -14.64 0.000 -.4625362 -.3532818
r1poorhealth | -.1778003 .0326566 -5.44 0.000 -.2418061 -.1137945
r1workhours80 | -.0018228 .0005492 -3.32 0.001 -.0028991 -.0007465
r1totalpar | .1536927 .0152773 10.06 0.000 .1237499 .1836356
_cons | .2722775 .036161 7.53 0.000 .2014032 .3431518
-----------------------------------------------------------------------------------

Now let’s try an ordinal model (here a bivariate ordered probit, via bioprobit); we first need to
generate ordinal versions of two variables to use:
. recode r1workhours80 (0=0) (1/30=1) (31/50=2) (51/80=3), gen(r1workhours4)
(4675 differences between r1workhours80 and r1workhours4)

. recode r2workhours80 (0=0) (1/30=1) (31/50=2) (51/80=3), gen(r2workhours4)


(3935 differences between r2workhours80 and r2workhours4)

. recode r1allparhelptw (0=0) (1/5=1) (5.001/20=2), gen( r1allparhelptw3)


(441 differences between r1allparhelptw and r1allparhelptw3)

. recode r2allparhelptw (0=0) (1/5=1) (5.001/20=2), gen( r2allparhelptw3)


(1179 differences between r2allparhelptw and r2allparhelptw3)

. net search bioprobit

Install from: bioprobit from http://fmwww.bc.edu/RePEc/bocode/b

. bioprobit (r2allparhelptw3 r1workhours4 r1married r1poorhealth r1totalpar)
> (r2workhours4 r1allparhelptw3 r1married r1poorhealth r1totalpar)

group(r2wor |
khours4) | Freq. Percent Cum.
------------+-----------------------------------
1 | 1,914 33.93 33.93
2 | 603 10.69 44.62
3 | 2,432 43.11 87.73
4 | 692 12.27 100.00
------------+-----------------------------------
Total | 5,641 100.00

Seemingly unrelated bivariate ordered probit regression


Number of obs = 5641
Wald chi2(4) = 52.69
Log likelihood = -12884.726 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
r2allparhe~3 |
r1workhours4 | -.007293 .0228304 -0.32 0.749 -.0520398 .0374538
r1married | -.205262 .0432182 -4.75 0.000 -.2899682 -.1205559
r1poorhealth | -.1397515 .0480594 -2.91 0.004 -.2339463 -.0455568
r1totalpar | .1228002 .0221615 5.54 0.000 .0793644 .166236
-------------+----------------------------------------------------------------
r2workhours4 |
r1allparhe~3 | -.1803361 .0368463 -4.89 0.000 -.2525535 -.1081186
r1married | .0423012 .0389766 1.09 0.278 -.0340916 .118694
r1poorhealth | -.8130412 .0412363 -19.72 0.000 -.8938628 -.7322195
r1totalpar | .1512924 .0196722 7.69 0.000 .1127356 .1898492
-------------+----------------------------------------------------------------
athrho |
_cons | -.026925 .025833 -1.04 0.297 -.0775568 .0237068
-------------+----------------------------------------------------------------
/cut11 | .5303018 .0591844 .4143026 .6463011
/cut12 | .5308175 .0591851 .4148169 .6468181
/cut13 | .5328816 .0591878 .4168757 .6488876
/cut14 | .5349478 .0591905 .4189365 .6509591
/cut15 | .5370161 .0591932 .4209995 .6530327
/cut16 | .5432348 .0592011 .4272027 .6592669
/cut17 | .5458319 .0592044 .4297934 .6618703
/cut18 | .5468716 .0592057 .4308306 .6629127
/cut19 | .5473917 .0592063 .4313494 .663434
/cut110 | .5635893 .0592264 .4475077 .6796709
/cut111 | .5699004 .0592341 .4538037 .6859971
/cut112 | .5762354 .0592418 .4601235 .6923473
/cut113 | .5767644 .0592425 .4606512 .6928776
/cut114 | .6050495 .0592784 .4888661 .721233
/cut115 | .6093619 .0592839 .4931676 .7255562
/cut116 | .6267269 .0593053 .5104907 .7429631
/cut117 | .6272727 .0593059 .5110352 .7435101
/cut118 | .6278186 .0593066 .5115799 .7440573
/cut119 | .6283647 .0593072 .5121247 .7446047
/cut120 | .6415286 .0593231 .5252574 .7577997
/cut121 | .6453889 .0593277 .5291087 .7616692
/cut122 | .6459412 .0593284 .5296597 .7622227
/cut123 | .6659519 .0593511 .5496259 .7822778
/cut124 | .6665114 .0593517 .5501843 .7828386
/cut125 | .6698733 .0593552 .5535392 .7862074
/cut126 | .6704343 .0593558 .554099 .7867696
/cut127 | .6726802 .0593581 .5563404 .78902
/cut128 | .6732422 .0593587 .5569013 .7895831
/cut129 | .8039208 .0594996 .6873036 .9205379
/cut130 | .8063982 .0595025 .6897755 .9230209
/cut131 | 1.45921 .0612874 1.339089 1.579332
/cut21 | -.3239231 .0452237 -.4125599 -.2352863
/cut22 | -.025728 .0450239 -.1139733 .0625172
/cut23 | 1.334024 .0477328 1.24047 1.427579
-------------+----------------------------------------------------------------
rho | -.0269185 .0258143 -.0774017 .0237023
------------------------------------------------------------------------------
LR test of indep. eqns. : chi2(1) = 1.09 Prob > chi2 = 0.2973

Multiwave Panel Data

Types of models and commands:


Dependent variable type | Fixed effects (FE)         | Random effects (RE)        | Mixed effects (ME)
------------------------+----------------------------+----------------------------+---------------------
Continuous              | xtreg, fe                  | xtreg, re                  | mixed
Dichotomous             | xtlogit, fe                | xtlogit, re                | melogit
Ordinal                 | Not recommended            | meologit                   | meologit
Nominal                 | Not recommended            | gsem, mlogit option        | Outside Stata: HLM 7
Count                   | xtpoisson, fe; xtnbreg, fe | xtpoisson, re; xtnbreg, re | mepoisson, menbreg

Dichotomous DV

We’ll use the rpoorhealth variable in the hrs_hours_reshaped.dta file as the outcome for this
example. Because the dependent variable is binary, we use the xtlogit command, first
specifying the FE and then the RE model (Stata also offers an xtprobit command if you prefer
probit models):

. use hrs_hours_reshaped.dta, clear

. xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw,
> fe
note: multiple positive outcomes within groups encountered.
note: 4409 groups (19766 obs) dropped because of all positive or
all negative outcomes.

Conditional fixed-effects logistic regression Number of obs = 10780


Group variable: hhidpn Number of groups = 1837

Obs per group: min = 2


avg = 5.9
max = 9

LR chi2(6) = 480.51
Log likelihood = -3898.61 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
rpoorhealth | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rworkhours80 | -.0248479 .0015259 -16.28 0.000 -.0278387 -.0218571
rmarried | -.0069076 .129063 -0.05 0.957 -.2598664 .2460511
rtotalpar | -.2935712 .0407944 -7.20 0.000 -.3735267 -.2136156
rsiblog | -.3552165 .16598 -2.14 0.032 -.6805313 -.0299018
hchildlg | .4488363 .1593514 2.82 0.005 .1365133 .7611594
rallparhel~w | .010376 .0064248 1.61 0.106 -.0022164 .0229685
------------------------------------------------------------------------------

Note that those who experienced no change in health were omitted from this analysis. Also note
that the FE model doesn’t distinguish between the two stable health conditions (i.e., 0 → 0 and
1 → 1) and treats 0 → 1 and 1 → 0 as producing effects of the same size in opposite directions.

Random effects:
. xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw
> female raedyrs age minority, re

Random-effects logistic regression Number of obs = 30541


Group variable: hhidpn Number of groups = 6243

Random effects u_i ~ Gaussian Obs per group: min = 1


avg = 4.9
max = 9

Integration method: mvaghermite Integration points = 12

Wald chi2(10) = 1550.47


Log likelihood = -11019.627 Prob > chi2 = 0.0000
-------------------------------------------------------------------------------
rpoorhealth | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
rworkhours80 | -.0353336 .0013598 -25.99 0.000 -.0379987 -.0326685
rmarried | -.5402026 .0898874 -6.01 0.000 -.7163786 -.3640265
rtotalpar | -.1603994 .0359967 -4.46 0.000 -.2309517 -.0898472
rsiblog | -.0966548 .0698343 -1.38 0.166 -.2335276 .040218
hchildlg | .2094428 .0730742 2.87 0.004 .06622 .3526656
rallparhelptw | .0021523 .0059423 0.36 0.717 -.0094943 .0137989
female | -.6397422 .0902022 -7.09 0.000 -.8165353 -.4629491
raedyrs | -.3337791 .0157692 -21.17 0.000 -.3646861 -.3028721
age | -.0363513 .0144833 -2.51 0.012 -.064738 -.0079646
minority | 1.122804 .1038392 10.81 0.000 .9192827 1.326325
_cons | 4.836503 .866618 5.58 0.000 3.137963 6.535043
--------------+----------------------------------------------------------------
/lnsig2u | 1.921837 .046 1.831679 2.011995
--------------+----------------------------------------------------------------
sigma_u | 2.614096 .0601242 2.498872 2.734634
rho | .6750224 .0100909 .6549413 .6944799
-------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5476.80 Prob >= chibar2 = 0.000

You can evaluate the assumptions of the RE model using the Hausman test the same way you did
for regular RE models; however, do not specify the “sigmamore” option in the hausman command.
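A minimal sketch of that procedure follows, assuming hrs_hours_reshaped.dta is already xtset on hhidpn. Using the same time-varying predictors in both models is a choice made here so that the test compares an identical set of coefficients; it is not taken from the notes, and the stored-estimate names fe and re are arbitrary.

```stata
use hrs_hours_reshaped.dta, clear

* Fixed-effects (conditional) logit; store the estimates
xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw, fe
estimates store fe

* Random-effects logit with the same predictors
xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw, re
estimates store re

* Hausman test: consistent estimator (FE) first, efficient estimator (RE) second;
* the sigmamore option is deliberately left out
hausman fe re
```

A significant test statistic would suggest that the RE estimates differ systematically from the FE estimates, which casts doubt on the RE assumptions.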

Interpreting coefficients

The interpretation of coefficients is very similar to the interpretation of the results of regular
logistic regression. Interpreting the coefficients themselves allows us to discuss the direction and
significance of effects, but not their size. To talk about the size, we use odds ratios. Odds are
ratios of two probabilities: the probability of a positive outcome and the probability of a negative
outcome (e.g., the probability of voting divided by the probability of not voting). But since
probabilities vary depending on the values of X, such a ratio varies as well. What remains constant
is the ratio of such odds: e.g., the odds of poor health for a female divided by the odds of poor
health for a male will be the same number regardless of the values of the other variables in the model.

Odds ratios are exponentiated logistic regression coefficients. They are sometimes called factor
coefficients, because they are multiplicative coefficients. Odds ratios are equal to 1 if there is no
effect, smaller than 1 if the effect is negative, and larger than 1 if it is positive. To get them
from the xtlogit command, use the or option:

. xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw
> female raedyrs age minority, re or

Random-effects logistic regression Number of obs = 30541


Group variable: hhidpn Number of groups = 6243

Random effects u_i ~ Gaussian Obs per group: min = 1


avg = 4.9
max = 9

Wald chi2(10) = 1550.47


Log likelihood = -11019.627 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
rpoorhealth | OR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rworkhours80 | .9652833 .0013126 -25.99 0.000 .9627142 .9678593
rmarried | .5826302 .0523711 -6.01 0.000 .4885182 .6948728
rtotalpar | .8518035 .0306621 -4.46 0.000 .7937778 .9140709
rsiblog | .9078693 .0634004 -1.38 0.166 .7917358 1.041038
hchildlg | 1.232991 .0900998 2.87 0.004 1.068462 1.422855
rallparhel~w | 1.002155 .0059551 0.36 0.717 .9905506 1.013895
female | .5274284 .0475752 -7.09 0.000 .4419602 .6294247
raedyrs | .716212 .0112941 -21.17 0.000 .6944146 .7386936
age | .9643015 .0139663 -2.51 0.012 .937313 .992067
minority | 3.073459 .3191454 10.81 0.000 2.507491 3.767172
-------------+----------------------------------------------------------------
/lnsig2u | 1.921837 .046 1.831679 2.011995
-------------+----------------------------------------------------------------
sigma_u | 2.614096 .0601242 2.498872 2.734634
rho | .6750224 .0100909 .6549413 .6944799
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5476.80 Prob >= chibar2 = 0.000

So, for example, the odds ratio of .53 for females indicates that the odds of reporting poor health
for females are about half those for males, or we can say 47% lower. To get the percent change, we
subtract 1 from the odds ratio and then multiply the result by 100.
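The percent-change arithmetic can be checked directly in Stata. This is a small sketch using the female coefficient from the RE output above; the scalar names are arbitrary.

```stata
* Logit coefficient for female from the RE model
scalar b_female = -.6397422

* Odds ratio = exp(coefficient)
scalar or_female = exp(b_female)
display "odds ratio     = " or_female

* Percent change in odds = (OR - 1) * 100
scalar pct_female = (or_female - 1) * 100
display "percent change = " pct_female
```

A negative percent change means lower odds; a positive one means higher odds.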

Beware: if you would like to know what the change would be per, say, a 10-unit increase in the
independent variable (e.g., 10 years of age or education), you cannot simply multiply the odds
ratio by 10! The odds ratio for a 10-unit increase is, in fact, the odds ratio to the power of 10.
Alternatively, you could take the regular logit coefficient, multiply it by 10, and then
exponentiate it.
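To make the 10-unit rule concrete, the sketch below uses the age estimates from the RE output above and shows that the two correct routes agree, while multiplying the odds ratio by 10 gives a meaningless number. Scalar names are arbitrary.

```stata
* Age: logit coefficient and odds ratio from the RE output
scalar b_age  = -.0363513
scalar or_age = .9643015

* Route 1: raise the odds ratio to the power of 10
scalar or10_power = or_age^10
* Route 2: multiply the coefficient by 10, then exponentiate
scalar or10_exp = exp(10 * b_age)

display "OR^10          = " or10_power
display "exp(10*b)      = " or10_exp
display "10*OR (wrong!) = " 10 * or_age
```

Ten additional years of age thus correspond to about 30% lower odds of poor health in this model.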

In addition, since odds ratios are multiplicative coefficients, when you want to interpret, for
example, an interaction term, you would have to multiply rather than add the odds ratio numbers.
Alternatively, you can add the numbers presented in the coefficient column and then
exponentiate the result.
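As a sketch of that multiplication, the numbers below are the odds ratios for rworkhours80 and for its interaction with rmarried from the interaction model reported in the Interactions section of these notes. Scalar names are arbitrary.

```stata
* Odds ratio for an extra work hour among the unmarried (main effect)
scalar or_hours = .9556948
* Multiplicative adjustment for the married (interaction term)
scalar or_inter = 1.012447

* Odds ratio for an extra work hour among the married: multiply, don't add
scalar or_hours_married = or_hours * or_inter
display "OR for hours, married = " or_hours_married

* Same result via the coefficient scale: add the logits, then exponentiate
display "via coefficients      = " exp(ln(or_hours) + ln(or_inter))
```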

In addition to using odds ratios, we can use predicted probabilities (P) to interpret our results. We
can get them by calculating predicted logits (L) and then converting them into probabilities.
Since $L = \log(\text{odds}) = \log\left(\frac{P}{1-P}\right)$, then $P = \frac{e^L}{1+e^L}$.
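For a single value the conversion is easy to check by hand. The sketch below converts a logit of -2.724561 (the mean linear prediction reported later in these notes) into a probability, both with the formula and with Stata's built-in invlogit() function. Scalar names are arbitrary.

```stata
* A predicted logit (the mean of xb from the variance components section)
scalar L = -2.724561

* By formula: P = e^L / (1 + e^L)
scalar P_formula = exp(L) / (1 + exp(L))
* With the built-in inverse logit function
scalar P_builtin = invlogit(L)

display "P (formula)  = " P_formula
display "P (invlogit) = " P_builtin

* And back again: L = log(P / (1 - P))
display "logit(P)     = " logit(P_builtin)
```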

Luckily, we do not have to do that by hand: we can use the predict command, the adjust command,
or the margins command. The predict command allows us to estimate predicted probabilities for the
actual observations in the data. The following options can be used with predict, margins, or adjust
after xtlogit:

xb calculates the linear prediction. This is the default for the random-effects model.

stdp calculates the standard error of the linear prediction.

pc1 calculates the predicted probability of a positive outcome conditional on one positive
outcome within group. This is the default for the fixed-effects model.

pu0 calculates the probability of a positive outcome, assuming that the fixed or random effect for
that observation's panel is zero. This may not be similar to the proportion of observed outcomes
in the group.
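A brief sketch of how these options are used with predict, assuming the RE model from above is estimated on hrs_hours_reshaped.dta. The new variable names xbhat, se_xb, p_u0, and p_check are made up for illustration.

```stata
use hrs_hours_reshaped.dta, clear
quietly xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg ///
    rallparhelptw female raedyrs age minority, re

* Linear prediction (predicted logit) and its standard error
predict double xbhat, xb
predict double se_xb, stdp

* Predicted probability assuming the random effect is zero
predict double p_u0, pu0

* With u_i = 0, this probability is the inverse logit of the linear prediction
generate double p_check = invlogit(xbhat)
summarize p_u0 p_check if e(sample)
```

The pc1 option is not shown because it applies to the fixed-effects model.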

We can also use adjust or margins commands to calculate predicted probabilities for some
hypothetical, strategically selected cases and to construct graphs:

. qui margins, at(rworkhours80=(0(10)80) rmarried=(0/1)) predict(pu0)

. marginsplot, x(rworkhours80) noci plotop(msymbol(i)) scheme(s1mono)

Variables that uniquely identify margins: rworkhours80 rmarried

[Figure: Predictive Margins. Predicted probability (y-axis, 0 to .25) plotted against rworkhours80 (x-axis, 0 to 80), with separate lines for rmarried=0 and rmarried=1.]

Interactions

Note that interactions as a method to compare two or more groups can be problematic in logit
models because the coefficients are scaled according to the differences in residual dispersion. So
it is not as appropriate to rely on the significance test of the interaction term to establish whether
some process differs by group. The best approach to establish whether the two groups do differ is
to examine differences in predicted probabilities. You would have to decide which values to
assign to the rest of the variables in your model; the default is to use the mean.
. xtlogit rpoorhealth c.rworkhours80##i.rmarried rtotalpar rsiblog hchildlg
> rallparhelptw female raedyrs age minority, re or

Random-effects logistic regression Number of obs = 30541


Group variable: hhidpn Number of groups = 6243

Random effects u_i ~ Gaussian Obs per group: min = 1


avg = 4.9
max = 9

Integration method: mvaghermite Integration points = 12

Wald chi2(11) = 1561.99


Log likelihood = -11012.155 Prob > chi2 = 0.0000

-----------------------------------------------------------------------------------------
rpoorhealth | OR Std. Err. z P>|z| [95% Conf. Interval]
------------------------+----------------------------------------------------------------
rworkhours80 | .9556948 .0028358 -15.27 0.000 .9501529 .9612689
1.rmarried | .4733899 .0497145 -7.12 0.000 .3853251 .5815815
|
rmarried#c.rworkhours80 |
1 | 1.012447 .0032671 3.83 0.000 1.006064 1.018871
|
rtotalpar | .8485704 .0305472 -4.56 0.000 .7907624 .9106044
rsiblog | .908552 .0634898 -1.37 0.170 .7922599 1.041914
hchildlg | 1.231178 .0900109 2.84 0.004 1.066817 1.420862
rallparhelptw | 1.00155 .0059641 0.26 0.795 .9899281 1.013307
female | .5349173 .0482853 -6.93 0.000 .4481788 .6384427
raedyrs | .7170131 .0113045 -21.10 0.000 .6951955 .7395154
age | .9638097 .0139562 -2.55 0.011 .9368406 .9915553
minority | 3.054056 .3171545 10.75 0.000 2.491623 3.743447
_cons | 150.9996 131.0543 5.78 0.000 27.55549 827.453
------------------------+----------------------------------------------------------------
/lnsig2u | 1.920336 .045987 1.830203 2.010468
------------------------+----------------------------------------------------------------
sigma_u | 2.612135 .0600621 2.497028 2.732547
rho | .6746929 .0100933 .6546077 .6941559
-----------------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5472.85 Prob >= chibar2 = 0.000

. qui margins, at(rworkhours80=(0(10)80) rmarried=(0/1)) predict(pu0)

. marginsplot, x(rworkhours80) noci plotop(msymbol(i)) scheme(s1mono)

Variables that uniquely identify margins: rworkhours80 rmarried

[Figure: Predictive Margins. Predicted probability (y-axis, 0 to .25) plotted against rworkhours80 (x-axis, 0 to 80), with separate lines for rmarried=0 and rmarried=1.]

You can also create confidence intervals; for example:


. qui margins, at(rworkhours80=(0(10)80) rmarried=(0/1)) predict(pu0)

. marginsplot, x(rworkhours80) plotop(msymbol(i)) scheme(s1mono) recastci(rarea)
> ciopts(fintensity(5))

[Figure: Predictive Margins with 95% CIs. Predicted probability (y-axis, 0 to .3) plotted against rworkhours80 (x-axis, 0 to 80), with separate lines for rmarried=0 and rmarried=1.]

We can also more explicitly graph the difference between the two groups – let’s try at a few
levels of other variables:
. qui margins, dydx(rmarried) at(rworkhours80=(0(10)80) female=(0/1) minority=(0/1))
> predict(pu0)

. marginsplot, x(rworkhours80) plotop(msymbol(i)) scheme(s1mono) noci

Variables that uniquely identify margins: rworkhours80 female minority

[Figure: Average Marginal Effects of 1.rmarried. Marginal effect (y-axis, -.15 to 0) plotted against rworkhours80 (x-axis, 0 to 80), with separate lines for female=0, minority=0; female=0, minority=1; female=1, minority=0; and female=1, minority=1.]


For more detail on doing these graphical comparisons, see Scott Long’s article at:
http://www.indiana.edu/~jslsoc/files_research/groupdif/groupwithprobabilities/groups-with-prob-2009-06-25.pdf

Variance components

Note that the variance component does not contain an estimate of level 1 variance. That is
because in logistic regression models, it is not possible to estimate both the coefficients and the
error variance; therefore, in all logistic regression models, the error variance is always fixed to
the same number, which is $\pi^2/3 = 3.29$. That rule also applies to multilevel models, but only to
their level 1 residuals. Knowing this means that we can calculate the intraclass correlation
coefficient or the proportion of variance explained. For both, we can follow the procedures
described on pp. 224-227 of the Snijders and Bosker chapter on dichotomous outcomes. For
instance, the ICC would be calculated as

$$\rho = \frac{\tau_0^2}{\tau_0^2 + \pi^2/3}$$

where $\tau_0^2$ is the variance of the individual-level residuals U. And the proportion of
variance explained can be calculated as

$$R^2 = \frac{\sigma_F^2}{\sigma_F^2 + \tau_0^2 + \sigma_R^2}$$

Note that in addition to the variance of individual-level residuals U, denoted $\tau_0^2$, and the
level 1 variance $\sigma_R^2 = 3.29$, we need to know the variance of fitted values $\sigma_F^2$. That
refers to the variance of linear predictions, which are the values that result if we multiply our
coefficients by our variable values and add up these products. That is, we are talking about the
predicted values of logits. To obtain the variance of fitted values, we can use predict with the xb
option:

. qui xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg
> rallparhelptw female raedyrs age minority, re or

Random-effects logistic regression Number of obs = 30541


Group variable: hhidpn Number of groups = 6243

Random effects u_i ~ Gaussian Obs per group: min = 1


avg = 4.9
max = 9

Wald chi2(10) = 1550.47


Log likelihood = -11019.627 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
rpoorhealth | OR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rworkhours80 | .9652833 .0013126 -25.99 0.000 .9627142 .9678593
rmarried | .5826302 .0523711 -6.01 0.000 .4885182 .6948728
rtotalpar | .8518035 .0306621 -4.46 0.000 .7937778 .9140709
rsiblog | .9078693 .0634004 -1.38 0.166 .7917358 1.041038
hchildlg | 1.232991 .0900998 2.87 0.004 1.068462 1.422855
rallparhel~w | 1.002155 .0059551 0.36 0.717 .9905506 1.013895
female | .5274284 .0475752 -7.09 0.000 .4419602 .6294247
raedyrs | .716212 .0112941 -21.17 0.000 .6944146 .7386936
age | .9643015 .0139663 -2.51 0.012 .937313 .992067
minority | 3.073459 .3191454 10.81 0.000 2.507491 3.767172
-------------+----------------------------------------------------------------
/lnsig2u | 1.921837 .046 1.831679 2.011995
-------------+----------------------------------------------------------------
sigma_u | 2.614096 .0601242 2.498872 2.734634
rho | .6750224 .0100909 .6549413 .6944799
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5476.80 Prob >= chibar2 = 0.000

. predict xb, xb
(24127 missing values generated)

. sum xb if e(sample)

Variable | Obs Mean Std. Dev. Min Max


-------------+--------------------------------------------------------
xb | 30541 -2.724561 1.63337 -7.265207 3.923255

Rho:
. di 2.614096^2/(2.614096^2+c(pi)^2/3)
.67502231

R-squared:
. di 1.633347^2/(1.633347^2+2.614096^2+3.29)
.20856505
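
The two hand calculations above can also be done from stored results instead of retyping numbers. This is a minimal sketch, assuming the random-effects xtlogit model is still the active estimation result; the names xb_hat, v_fit, v_u, and v_e are illustrative and not part of the original notes.

```stata
* Assumes the xtlogit, re model above was just estimated.
* xb_hat, v_fit, v_u, v_e are illustrative names.
predict double xb_hat if e(sample), xb     // linear prediction (logit scale)
quietly summarize xb_hat
scalar v_fit = r(Var)                      // variance of fitted values
scalar v_u   = e(sigma_u)^2                // level 2 (random intercept) variance
scalar v_e   = c(pi)^2/3                   // level 1 variance fixed by the logit link
display "rho       = " v_u/(v_u + v_e)
display "R-squared = " v_fit/(v_fit + v_u + v_e)
```

Working from e(sigma_u) and r(Var) avoids rounding and transcription slips when copying values from the output.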

Note that such R-squared values are pseudo-R-squared values and are typically lower than the
values we are used to with OLS because the level 1 residual variance (3.29 in the formula above)
is a fixed number.

Obtaining residuals after xtlogit is not possible with the predict command, as we saw above. We
can do it, however, by reestimating the RE model using mixed effects logit syntax:
. melogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelpt
> w female raedyrs age minority || hhidpn: , or

Mixed-effects logistic regression               Number of obs      =     30541
Group variable: hhidpn                          Number of groups   =      6243

                                                Obs per group: min =         1
                                                               avg =       4.9
                                                               max =         9

Integration method: mvaghermite                 Integration points =         7

                                                Wald chi2(10)      =   1557.22
Log likelihood = -11008.605                     Prob > chi2        =    0.0000
-------------------------------------------------------------------------------
rpoorhealth | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
rworkhours80 | .965261 .0013117 -26.02 0.000 .9626936 .9678352
rmarried | .5875815 .052345 -5.97 0.000 .4934447 .6996773
rtotalpar | .847142 .0305427 -4.60 0.000 .7893455 .9091704
rsiblog | .9013827 .0618536 -1.51 0.130 .7879507 1.031144
hchildlg | 1.238295 .0890782 2.97 0.003 1.075455 1.425793
rallparhelptw | 1.002406 .0059541 0.40 0.686 .990804 1.014144
female | .5267198 .0463214 -7.29 0.000 .4433251 .6258021
raedyrs | .7136284 .0111674 -21.56 0.000 .6920729 .7358553
age | .9635443 .0135893 -2.63 0.008 .9372744 .9905504
minority | 3.124398 .3184031 11.18 0.000 2.558713 3.815146
_cons | 132.5952 112.0297 5.78 0.000 25.31322 694.5572
--------------+----------------------------------------------------------------
hhidpn |
var(_cons)| 7.442719 .3733355 6.745813 8.211621
-------------------------------------------------------------------------------
LR test vs. logistic regression: chibar2(01) = 5498.85 Prob>=chibar2 = 0.0000

After that we can use the predict command to get a range of residuals, as with xtmixed. For
example, you can use the reffects option to get random effects (both random intercepts and random
slopes, if you decide to introduce them). You can also use the xb option to get fitted values based
on coefficients only (random effects not included), and you can get three types of overall residuals:

pearson calculates Pearson residuals. Pearson residuals large in absolute value may indicate a
lack of fit. By default, residuals include both the fixed portion and the random portion of the
model. The fixedonly option modifies the calculation to include the fixed portion only.

deviance calculates deviance residuals. Deviance residuals are recommended by McCullagh
and Nelder (1989) as having the best properties for examining the goodness of fit of a GLM.
They are approximately normally distributed if the model is correctly specified. They may be
plotted against the fitted values or against a covariate to inspect the model's fit. By default,
residuals include both the fixed portion and the random portion of the model. The fixedonly
option modifies the calculation to include the fixed portion only.

anscombe calculates Anscombe residuals, residuals that are designed to closely follow a
normal distribution. By default, residuals include both the fixed portion and the random portion
of the model. The fixedonly option modifies the calculation to include the fixed portion only.
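
The predict options described above might be used as follows. This is a sketch that assumes the melogit model is the active estimation result; all new variable names are illustrative. The option names follow the descriptions in these notes, and other Stata releases may spell some of them differently (for example, the option for random effects or for fixed-portion-only residuals), so check the melogit postestimation help for your version.

```stata
* Assumes the melogit model above was just estimated.
* All new variable names are illustrative.
predict re_int, reffects               // predicted random intercepts
predict xb_fix, xb                     // fitted values from coefficients only
predict r_pearson, pearson             // Pearson residuals (fixed + random portion)
predict r_dev, deviance                // deviance residuals
predict r_ans, anscombe                // Anscombe residuals
predict r_dev_fix, deviance fixedonly  // fixed portion only (option name per the text above)

summarize r_pearson r_dev r_ans
histogram r_dev, normal                // compare to a normal curve
scatter r_dev xb_fix                   // residuals against fitted values
```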

Ordered Logit

Much of what we discussed applies to ordered logit models. To better understand interpretation
of coefficients in ordered logit, you should review my SC704 notes for that topic.

. recode rworkhours80 (0=0) (1/30=1) (31/50=2) (51/80=3), gen(rworkhours4)
(23290 differences between rworkhours80 and rworkhours4)

. meologit rworkhours4 rpoorhealth rmarried rtotalpar rsiblog hchildlg rallparhelpt
> w female raedyrs age minority || hhidpn: , or

Mixed-effects ologit regression                 Number of obs      =     30541
Group variable: hhidpn                          Number of groups   =      6243

                                                Obs per group: min =         1
                                                               avg =       4.9
                                                               max =         9

Integration method: mvaghermite                 Integration points =         7

                                                Wald chi2(10)      =   3191.62
Log likelihood = -29982.693                     Prob > chi2        =    0.0000
-------------------------------------------------------------------------------
rworkhours4 | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
rpoorhealth | .2844605 .0135639 -26.36 0.000 .2590801 .3123272
rmarried | .7912272 .0501324 -3.70 0.000 .6988256 .8958465
rtotalpar | 2.424358 .0562904 38.14 0.000 2.316504 2.537234
rsiblog | 1.100873 .0558983 1.89 0.058 .9965895 1.216069
hchildlg | .8502 .045471 -3.03 0.002 .7655905 .9441601
rallparhelptw | .9483347 .0037667 -13.36 0.000 .9409809 .955746
female | .2667214 .0174542 -20.19 0.000 .2346148 .3032217
raedyrs | 1.113577 .0126085 9.50 0.000 1.089138 1.138566
age | .869043 .0091889 -13.27 0.000 .8512184 .8872409
minority | 1.017714 .0796891 0.22 0.823 .8729212 1.186524
--------------+----------------------------------------------------------------
/cut1 | -7.122403 .6293412 -11.32 0.000 -8.355889 -5.888917
/cut2 | -6.146649 .6289257 -9.77 0.000 -7.37932 -4.913977
/cut3 | -2.693722 .6279935 -4.29 0.000 -3.924567 -1.462878
--------------+----------------------------------------------------------------
hhidpn |
var(_cons)| 4.819499 .1531122 4.528557 5.129133
-------------------------------------------------------------------------------
LR test vs. ologit regression: chibar2(01) = 8739.46 Prob>=chibar2 = 0.0000

Briefly, the odds ratios for ordered logit are cumulative odds ratios: they compare the odds of
belonging to a certain category or higher versus belonging to one of the lower categories (this is
how Stata parameterizes the model). For example, if our dependent variable is the level of
agreement with some statement and the categories are agree=3, not sure=2, and disagree=1, and if
the odds ratio for gender (female) as a predictor of that agreement is 2.00, we can say that the
odds of agreeing rather than being not sure or disagreeing are 2 times higher for women than for
men. Similarly, the odds of agreeing or being not sure rather than disagreeing are also twice as
high for women as for men.

What this means is that ologit assumes that these two odds ratios are essentially the same and
thus reports a single common value. That is called the parallel slopes assumption; if the two
odds ratios differ significantly, the assumption is violated.

Stata does not provide diagnostic tools for testing the parallel slopes assumption with panel
data, so to obtain a rough test, you might want to run your model without taking the panel nature
of the data into account (using the regular ologit command) and test the assumption that way, even
though such a test will be approximate.

Another way to do so would be to create the corresponding dichotomies and estimate models
separately for each dichotomy using xtlogit – we can then examine whether odds ratios indeed
look similar in such models.
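
The dichotomy approach could be sketched as follows. It assumes the data are already declared as panel data with hhidpn as the group variable, that rworkhours4 (values 0-3) exists as created above, and that the code is run from a do-file (because of the /// line continuation). The variable names hrs_ge1, hrs_ge2, and hrs_ge3 are illustrative.

```stata
* Cumulative dichotomies of rworkhours4: category k or higher vs. lower.
* hrs_ge1-hrs_ge3 are illustrative names, not from the notes.
generate byte hrs_ge1 = rworkhours4 >= 1 if !missing(rworkhours4)
generate byte hrs_ge2 = rworkhours4 >= 2 if !missing(rworkhours4)
generate byte hrs_ge3 = rworkhours4 >= 3 if !missing(rworkhours4)

* One random-effects logit per dichotomy; compare odds ratios across the three runs.
foreach y in hrs_ge1 hrs_ge2 hrs_ge3 {
    xtlogit `y' rpoorhealth rmarried rtotalpar rsiblog hchildlg rallparhelptw ///
        female raedyrs age minority, re or
}
```

If the parallel slopes assumption is reasonable, the odds ratio for a given predictor should be roughly similar across the three models; this is an informal comparison rather than a formal test.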

Multinomial Logit

The best way to estimate a random effects multinomial logit in Stata is to use the SEM module,
specifically the gsem command with the mlogit option. You can specify either a single random
effect shared across equations or a separate random effect for each equation (we have m-1
equations in a multinomial logit with m alternatives). The second option is less restrictive and
usually preferred. Here is an example using rworkhours4, which we generated above (its values are
0-3). We can either use the model builder to build the following GSEM model or enter a command:

. gsem (1.rworkhours4 <- rpoorhealth rmarried rtotalpar rsiblog hchildlg rallparhelptw
> female raedyrs age minority M1[hhidpn]) (2.rworkhours4 <- rpoorhealth rmarried
> rtotalpar rsiblog hchildlg rallparhelptw female raedyrs age minority M2[hhidpn])
> (3.rworkhours4 <- rpoorhealth rmarried rtotalpar rsiblog hchildlg rallparhelptw female
> raedyrs age minority M3[hhidpn]), mlogit

Generalized structural equation model Number of obs = 30541
Log likelihood = -27850.963

( 1) [1.rworkhours4]M1[hhidpn] = 1
( 2) [2.rworkhours4]M2[hhidpn] = 1
( 3) [3.rworkhours4]M3[hhidpn] = 1
----------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
0.rworkhours4 | (base outcome)
-----------------+----------------------------------------------------------------
1.rworkhours4 <- |
rpoorhealth | -1.277058 .0828982 -15.41 0.000 -1.439536 -1.114581
rmarried | -.0611107 .1034527 -0.59 0.555 -.2638743 .141653
rtotalpar | .7099914 .0435582 16.30 0.000 .6246189 .7953638
rsiblog | .1661102 .077469 2.14 0.032 .0142738 .3179466
hchildlg | -.096628 .0827913 -1.17 0.243 -.2588961 .06564
rallparhelptw | -.0141175 .0064669 -2.18 0.029 -.0267924 -.0014426
female | -.1735534 .0970918 -1.79 0.074 -.3638498 .016743
raedyrs | .1371333 .0171756 7.98 0.000 .1034697 .1707969
age | -.0867121 .0155403 -5.58 0.000 -.1171705 -.0562537
minority | -.0228635 .116838 -0.20 0.845 -.2518618 .2061348
|
M1[hhidpn] | 1 (constrained)
|
_cons | .9628375 .9322679 1.03 0.302 -.864374 2.790049
-----------------+----------------------------------------------------------------
2.rworkhours4 <- |
rpoorhealth | -1.750017 .0704374 -24.85 0.000 -1.888072 -1.611963
rmarried | -.4516769 .0962687 -4.69 0.000 -.6403599 -.2629938
rtotalpar | 1.442584 .0412238 34.99 0.000 1.361786 1.523381
rsiblog | .1445202 .0763168 1.89 0.058 -.0050579 .2940982
hchildlg | -.3232379 .080663 -4.01 0.000 -.4813346 -.1651413
rallparhelptw | -.0809335 .0061454 -13.17 0.000 -.0929784 -.0688887
female | -1.543764 .1004039 -15.38 0.000 -1.740552 -1.346976
raedyrs | .130158 .017173 7.58 0.000 .0964996 .1638165
age | -.2034751 .0160609 -12.67 0.000 -.2349538 -.1719963
minority | .1211653 .1181552 1.03 0.305 -.1104146 .3527453
|
M2[hhidpn] | 1 (constrained)
|
_cons | 9.600093 .9520152 10.08 0.000 7.734177 11.46601
-----------------+----------------------------------------------------------------
3.rworkhours4 <- |
rpoorhealth | -1.916659 .1124895 -17.04 0.000 -2.137134 -1.696184
rmarried | -.5334537 .1407739 -3.79 0.000 -.8093655 -.2575419
rtotalpar | 1.748915 .0562372 31.10 0.000 1.638692 1.859137
rsiblog | .2218076 .1072693 2.07 0.039 .0115636 .4320515
hchildlg | -.2066806 .1144466 -1.81 0.071 -.4309919 .0176307
rallparhelptw | -.1073193 .0103503 -10.37 0.000 -.1276055 -.0870331
female | -2.875229 .1459855 -19.70 0.000 -3.161355 -2.589102
raedyrs | .2372999 .0241533 9.82 0.000 .1899602 .2846395
age | -.2478874 .0225544 -10.99 0.000 -.2920931 -.2036816
minority | -.0932506 .166257 -0.56 0.575 -.4191083 .2326072
|
M3[hhidpn] | 1 (constrained)
|
_cons | 8.004163 1.330828 6.01 0.000 5.395788 10.61254
-----------------+----------------------------------------------------------------
var(M1[hhidpn])| 7.277483 .3634233 6.598935 8.025803
var(M2[hhidpn])| 10.11641 .4346091 9.299468 11.00512
var(M3[hhidpn])| 16.62133 .7747086 15.17022 18.21125
-----------------+----------------------------------------------------------------
cov(M2[hhidpn],|
M1[hhidpn])| 5.774199 .324637 17.79 0.000 5.137922 6.410476
cov(M3[hhidpn],|
M1[hhidpn])| 6.872237 .41161 16.70 0.000 6.065496 7.678978
cov(M3[hhidpn],|
M2[hhidpn])| 10.77613 .50847 21.19 0.000 9.779549 11.77272
----------------------------------------------------------------------------------

You would need to exponentiate the coefficients to get odds ratios. Note that the variables predict
membership in each group as compared to membership in the omitted group (rworkhours4=0).

Count Data Models

Stata can estimate count data models for panel data. Count variables are often treated as though
they are continuous and analyzed with regular regression, but this can result in inefficient,
inconsistent, and biased estimates. We need to use models that are developed specifically for
count data; the Poisson model is the most basic of them.

Characteristics of Poisson distribution:
1. E(y) = μ
2. The variance equals the mean: Var(y) = E(y) = μ (equidispersion). In practice, the variance is
often larger than μ: this is called overdispersion. The main reason for overdispersion is
heterogeneity. If there are different groups within the data that have different means, and each
group's mean is actually equal to its variance, then when you put all of these groups together, the
resulting combination will have a variance larger than the mean. Therefore, we need to control for
all those sources of heterogeneity. Thus, when using Poisson regression, we need to ensure that
the conditional variance equals the conditional mean, that is, Var(y|X) = E(y|X).
3. As μ increases, the probability of zeros decreases. But for many count variables, there are
more observed zeros than would be predicted from the Poisson distribution.
4. As μ increases, the Poisson distribution approximates the normal.
5. The assumption of independence of events: past outcomes don't affect future outcomes.
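
The heterogeneity argument in point 2 can be checked with a small simulation. This is a sketch on made-up data, not the dataset used in these notes: two equally sized groups are each drawn from a Poisson distribution (means 2 and 8, chosen arbitrarily), so within each group the variance is close to the mean, while in the pooled data the variance exceeds the mean.

```stata
* Simulated illustration only; group means 2 and 8 are arbitrary choices.
clear
set seed 12345
set obs 20000
generate byte grp = mod(_n, 2)                 // two equally sized groups
generate y = rpoisson(cond(grp == 1, 8, 2))    // Poisson within each group

* Within groups: mean and variance are close to each other.
tabstat y, by(grp) statistics(mean variance)

* Pooled: mean near 5, variance well above 5 (overdispersion).
summarize y
display "pooled mean = " r(mean) "   pooled variance = " r(Var)
```

In expectation the pooled mean is 5 and the pooled variance is 14 (the average within-group variance of 5 plus the between-group variance of 9), which is why omitting a grouping variable produces overdispersion.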

Poisson distributions:

Luckily, we can estimate a Poisson or negative binomial model with either random or fixed
effects. In addition, both xtpoisson and xtnbreg also allow controlling for so-called exposure,
which is usually a variable that indicates how long there has been an opportunity to accumulate
counts. For example, if we have a count of missed classes for students in different schools, but
different schools have different numbers of days in their school year, then some students have
more opportunity to miss classes than others and we need to adjust for exposure, i.e., rather
than examine the total count, we would examine the number of missed classes per school day.
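
For the missed-classes example, the adjustment might look like the sketch below. The dataset and every variable name in it (studentid, missed, schooldays, x1, x2) are hypothetical and stand in for whatever your data contain; only the exposure() option itself comes from xtpoisson.

```stata
* Hypothetical variables: studentid (panel id), missed (count of missed classes),
* schooldays (days in the school year), x1 x2 (predictors).
xtset studentid
xtpoisson missed x1 x2, fe exposure(schooldays) irr
```

The exposure() option enters the log of schooldays with its coefficient constrained to 1, so the model describes missed classes per school day rather than the raw count.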

Let’s examine an example of count data model for panel data using Poisson and briefly discuss
interpretation.

Fixed effects:
. xtpoisson rworkhours80 rpoorhealth rmarried rtotalpar rsiblog hchildlg rall
> parhelptw, fe
note: 445 groups (445 obs) dropped because of only one obs per group
note: 1257 groups (5712 obs) dropped because of all zero outcomes

Conditional fixed-effects Poisson regression    Number of obs      =     24384
Group variable: hhidpn                          Number of groups   =      4541

                                                Obs per group: min =         2
                                                               avg =       5.4
                                                               max =         9

                                                Wald chi2(6)       =  29732.01
Log likelihood  = -202220.54                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
rworkhours80 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rpoorhealth | -.3102104 .0052439 -59.16 0.000 -.3204882 -.2999325
rmarried | -.0456695 .0083243 -5.49 0.000 -.0619848 -.0293541
rtotalpar | .3171061 .0022524 140.78 0.000 .3126914 .3215208
rsiblog | .1568954 .0116364 13.48 0.000 .1340884 .1797024
hchildlg | -.1351277 .0105238 -12.84 0.000 -.1557539 -.1145014
rallparhel~w | -.0194958 .000431 -45.24 0.000 -.0203405 -.0186511
------------------------------------------------------------------------------

With regular coefficients, we can interpret sign and significance, but to interpret the size, we
exponentiate the coefficients – these are called incidence rate ratios. They are also multiplicative
coefficients, like odds ratios, and can be interpreted as percent change in the number of events.
. xtpoisson rworkhours80 rpoorhealth rmarried rtotalpar rsiblog hchildlg rall
> parhelptw, fe irr
note: 445 groups (445 obs) dropped because of only one obs per group
note: 1257 groups (5712 obs) dropped because of all zero outcomes

Conditional fixed-effects Poisson regression    Number of obs      =     24384
Group variable: hhidpn                          Number of groups   =      4541

                                                Obs per group: min =         2
                                                               avg =       5.4
                                                               max =         9

                                                Wald chi2(6)       =  29732.01
Log likelihood  = -202220.54                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
rworkhours80 | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rpoorhealth | .7332927 .0038453 -59.16 0.000 .7257946 .7408682
rmarried | .9553577 .0079527 -5.49 0.000 .9398972 .9710725
rtotalpar | 1.373148 .0030929 140.78 0.000 1.3671 1.379224
rsiblog | 1.169873 .0136132 13.48 0.000 1.143494 1.196861
hchildlg | .8736044 .0091936 -12.84 0.000 .8557698 .8918107
rallparhel~w | .980693 .0004227 -45.24 0.000 .979865 .9815217
------------------------------------------------------------------------------
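
The link between the two tables above can be verified by hand: exponentiating a coefficient from the first table gives the incidence rate ratio in the second, and subtracting 1 gives the percent change. A short sketch using the rpoorhealth and rtotalpar coefficients copied from the fixed effects output:

```stata
* Coefficients copied from the fixed-effects xtpoisson output above.
display exp(-.3102104)               // IRR for rpoorhealth, matches .7332927
display 100*(exp(-.3102104) - 1)     // percent change in expected count
display exp(.3171061)                // IRR for rtotalpar, matches 1.373148
display 100*(exp(.3171061) - 1)      // percent change in expected count
```

So being in poor health is associated with about a 27% lower expected count, while a one-unit increase in rtotalpar is associated with about a 37% higher expected count.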

Random effects:
. xtpoisson rworkhours80 rpoorhealth rmarried rtotalpar rsiblog hchildlg
> rallparhelptw female raedyrs age minority

Random-effects Poisson regression               Number of obs      =     30541
Group variable: hhidpn                          Number of groups   =      6243

Random effects u_i ~ Gamma                      Obs per group: min =         1
                                                               avg =       4.9
                                                               max =         9

                                                Wald chi2(10)      =  30050.74
Log likelihood  = -235735.15                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
rworkhours80 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rpoorhealth | -.3149189 .0052315 -60.20 0.000 -.3251724 -.3046654
rmarried | -.0494218 .0082059 -6.02 0.000 -.0655051 -.0333385
rtotalpar | .3160069 .0022463 140.68 0.000 .3116043 .3204096
rsiblog | .1383593 .0108824 12.71 0.000 .1170302 .1596884
hchildlg | -.128032 .0101043 -12.67 0.000 -.1478361 -.1082279
rallparhel~w | -.019533 .0004303 -45.40 0.000 -.0203764 -.0186897
female | -.4032303 .0378686 -10.65 0.000 -.4774514 -.3290091
raedyrs | .0396955 .0063164 6.28 0.000 .0273156 .0520754
age | -.0411759 .0062501 -6.59 0.000 -.0534259 -.028926
minority | -.0431882 .0446809 -0.97 0.334 -.1307613 .0443848
_cons | 4.773289 .3646625 13.09 0.000 4.058564 5.488015
-------------+----------------------------------------------------------------
/lnalpha | .7846428 .0184326 .7485156 .82077
-------------+----------------------------------------------------------------
alpha | 2.191624 .0403973 2.11386 2.272249
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 4.0e+05 Prob>=chibar2 = 0.000

With exponentiated coefficients:


. xtpoisson, irr

Random-effects Poisson regression               Number of obs      =     30541
Group variable: hhidpn                          Number of groups   =      6243

Random effects u_i ~ Gamma                      Obs per group: min =         1
                                                               avg =       4.9
                                                               max =         9

                                                Wald chi2(10)      =  30050.74
Log likelihood  = -235735.15                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
rworkhours80 | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rpoorhealth | .729848 .0038182 -60.20 0.000 .7224028 .73737
rmarried | .9517796 .0078102 -6.02 0.000 .9365943 .9672111
rtotalpar | 1.37164 .0030811 140.68 0.000 1.365614 1.377692
rsiblog | 1.148388 .0124972 12.71 0.000 1.124153 1.173145
hchildlg | .8798252 .00889 -12.67 0.000 .8625725 .897423
rallparhel~w | .9806565 .000422 -45.40 0.000 .9798298 .9814839
female | .6681582 .0253022 -10.65 0.000 .6203624 .7196365
raedyrs | 1.040494 .0065722 6.28 0.000 1.027692 1.053455
age | .9596603 .005998 -6.59 0.000 .9479762 .9714884
minority | .9577311 .0427923 -0.97 0.334 .8774272 1.045385
-------------+----------------------------------------------------------------
/lnalpha | .7846428 .0184326 .7485156 .82077
-------------+----------------------------------------------------------------
alpha | 2.191624 .0403973 2.11386 2.272249
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 4.0e+05 Prob>=chibar2 = 0.000

This model assumes that the random effects (level 2 error term) follow a log-gamma distribution;
that is, the exponentiated random effects are distributed as gamma with a mean of 1 and variance
equal to alpha. There is also a test of alpha=0: if the null is rejected, then it is appropriate to
use a RE model, and if it is not rejected, then there is no unique unexplained variance at the
level of the individual (level 2) and RE is not necessary.

Alternatively, we can assume a normal distribution of random effects:


. xtpoisson rworkhours80 rpoorhealth rmarried rtotalpar rsiblog hchildlg rall
> parhelptw female raedyrs age minority, normal irr

Random-effects Poisson regression               Number of obs      =     30541
Group variable: hhidpn                          Number of groups   =      6243

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       4.9
                                                               max =         9

                                                Wald chi2(10)      =  30744.85
Log likelihood  = -237577.42                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
rworkhours80 | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rpoorhealth | .7281977 .0038173 -60.51 0.000 .7207542 .735718
rmarried | .9545593 .0079021 -5.62 0.000 .9391965 .9701734
rtotalpar | 1.372548 .0030879 140.76 0.000 1.366509 1.378614
rsiblog | 1.167491 .0132196 13.68 0.000 1.141867 1.193691
hchildlg | .875792 .0090548 -12.83 0.000 .8582235 .8937202
rallparhel~w | .9806912 .0004224 -45.27 0.000 .9798637 .9815194
female | .3369295 .0198189 -18.49 0.000 .3002407 .3781016
raedyrs | 1.137889 .011243 13.07 0.000 1.116065 1.16014
age | .8849735 .0084538 -12.79 0.000 .8685586 .9016987
minority | .8809341 .061137 -1.83 0.068 .7689001 1.009292
-------------+----------------------------------------------------------------
/lnsig2u | 1.620728 .024484 66.20 0.000 1.57274 1.668716
-------------+----------------------------------------------------------------
sigma_u | 2.248727 .0275289 2.195413 2.303335
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01) = 3.9e+05 Pr>=chibar2 = 0.000

This same model (the one that assumes a normal distribution of random effects) can be estimated
using a mixed effects Poisson command; this syntax also allows for an examination of random
slopes, just like mixed (there is also a corresponding negative binomial regression command,
menbreg):

. mepoisson rworkhours80 rpoorhealth rmarried rtotalpar rsiblog hchildlg ra
> llparhelptw female raedyrs age minority || hhidpn:

Mixed-effects Poisson regression                Number of obs      =     30541
Group variable: hhidpn                          Number of groups   =      6243

                                                Obs per group: min =         1
                                                               avg =       4.9
                                                               max =         9

Integration points =   7                        Wald chi2(10)      =  30723.86
Log likelihood = -237592.87                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
rworkhours80 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rpoorhealth | -.317193 .0052424 -60.50 0.000 -.327468 -.3069181
rmarried | -.0465037 .0082793 -5.62 0.000 -.0627308 -.0302767
rtotalpar | .3166688 .0022498 140.75 0.000 .3122593 .3210784
rsiblog | .1548763 .0113293 13.67 0.000 .1326713 .1770813
hchildlg | -.1326281 .010343 -12.82 0.000 -.1529001 -.1123561
rallparhel~w | -.0194976 .0004307 -45.27 0.000 -.0203418 -.0186534
female | -1.088872 .0594299 -18.32 0.000 -1.205353 -.972392
raedyrs | .1293615 .009998 12.94 0.000 .1097657 .1489573
age | -.1223611 .0096613 -12.67 0.000 -.141297 -.1034253
minority | -.1271817 .0701653 -1.81 0.070 -.2647031 .0103396
_cons | 7.385015 .5605793 13.17 0.000 6.2863 8.483731
------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
hhidpn: Identity |
sd(_cons) | 2.247979 .0274216 2.194871 2.302372
------------------------------------------------------------------------------
LR test vs. Poisson regression: chibar2(01) = 3.9e+05 Prob>=chibar2 = 0.0000

Overall, some of the same concerns apply here as for logistic regression: for instance, we have
to be cautious when interpreting interactions and should examine predicted counts.

In terms of variance, the level 1 residual variance, which we assumed to be 3.29 in logit-based
models, is here assumed to be equal to the predicted mean, so you need to find the average
predicted count by generating predicted values, exponentiating them, and calculating their mean.
You can use that number as the level 1 variance in the formulas for percent of variance explained
discussed above, as well as when calculating the intraclass correlation coefficient.
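
The steps in the last paragraph could be carried out as in the sketch below. It assumes the mepoisson model above is the active estimation result and follows the approach described in these notes (average predicted count used as the level 1 variance); the names xb_cnt, mu_hat, v_e, and v_u are illustrative, and the random intercept standard deviation is copied from the output above.

```stata
* Assumes the mepoisson model above was just estimated.
* xb_cnt, mu_hat, v_e, v_u are illustrative names.
predict double xb_cnt if e(sample), xb    // linear prediction, coefficients only
generate double mu_hat = exp(xb_cnt)      // predicted counts
quietly summarize mu_hat
scalar v_e = r(mean)                      // level 1 variance = average predicted count
scalar v_u = 2.247979^2                   // sd(_cons) from the mepoisson output, squared
display "level 1 variance = " v_e
display "ICC              = " v_u/(v_u + v_e)
```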
