Modeling Ordinal Categorical Data (Agresti)
Alan Agresti
Department of Statistics
P (y = j), j = 1, 2, . . . , c,
depends on explanatory variables x, which can be categorical
and/or quantitative.
Outline
These notes are extracted from a two-day short course that I’ve
presented at Padova, Firenze, and Groningen.
Focus of tutorial
Model satisfies
\[
\log \frac{P(y \le j \mid x_1)/P(y > j \mid x_1)}{P(y \le j \mid x_2)/P(y > j \mid x_2)} = \beta(x_1 - x_2)
\]
for all j (proportional odds property)
logit[P (y ≤ j)] = αj + βx
Coefficients:
Value Std. Error t value
(Intercept):1 -0.71917 0.15881 -4.5285
(Intercept):2 -0.31860 0.15642 -2.0368
(Intercept):3 0.69165 0.15793 4.3796
(Intercept):4 2.05701 0.17369 11.8429
dose -0.17549 0.05632 -3.1159
Note: propodds() is another possible family for vglm; it defaults to cumulative(reverse = TRUE, link = "logit", parallel = TRUE)
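As a rough numerical check of the proportional odds property, the sketch below plugs the vglm estimates printed above into logit[P(y ≤ j)] = αj + βx. Python is used only for illustration, and the function names are hypothetical helpers, not part of any package mentioned in these notes.

```python
import math

# Estimates copied from the vglm output above (cumulative logit, parallel = TRUE).
ALPHAS = [-0.71917, -0.31860, 0.69165, 2.05701]
BETA = -0.17549

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def cumulative_probs(x, alphas=ALPHAS, beta=BETA):
    """P(y <= j | x), j = 1, ..., c-1, under logit[P(y <= j)] = alpha_j + beta*x."""
    return [expit(a + beta * x) for a in alphas]

def category_probs(x, alphas=ALPHAS, beta=BETA):
    """P(y = j | x), j = 1, ..., c, by differencing the cumulative probabilities."""
    cum = [0.0] + cumulative_probs(x, alphas, beta) + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]

def log_cumulative_odds_ratio(x1, x2, j):
    """log{odds(y <= j | x1) / odds(y <= j | x2)} for 0-based cutpoint index j."""
    p1 = cumulative_probs(x1)[j]
    p2 = cumulative_probs(x2)[j]
    return math.log(p1 / (1.0 - p1)) - math.log(p2 / (1.0 - p2))

probs_dose1 = category_probs(1)
log_ors = [log_cumulative_odds_ratio(4, 1, j) for j in range(4)]
print([round(p, 4) for p in probs_dose1])
print([round(v, 4) for v in log_ors])  # one common value, beta * (4 - 1), at every cutpoint
```

The second list repeats a single number because the log cumulative odds ratio does not depend on j, which is exactly the property stated above.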
Coefficients:
Value Std. Error t value
dose 0.1754816 0.05671224 3.094245
Intercepts:
Value Std. Error t value
1|2 -0.7192 0.1589 -4.5256
2|3 -0.3186 0.1569 -2.0308
3|4 0.6917 0.1597 4.3323
4|5 2.0570 0.1751 11.7493
> fitted(fit.clogit)
1 2 3 4 5
1 0.2901467 0.08878330 0.2473217 0.2415357 0.1322126
2 0.2901467 0.08878330 0.2473217 0.2415357 0.1322126
...
20 0.1944866 0.07043618 0.2325084 0.2975294 0.2050394
Note: This uses the model formula logit[P (y ≤ j)] = αj − β′ x based on a latent variable model (p. 18 of these notes),
for which β̂ has opposite sign.
data trauma;
input dose outcome count @@;
datalines;
1 1 59 1 2 25 1 3 46 1 4 48 1 5 32
2 1 48 2 2 21 2 3 44 2 4 47 2 5 30
3 1 44 3 2 14 3 3 54 3 4 64 3 5 31
4 1 43 4 2 4 4 3 49 4 4 58 4 5 41
;
proc logistic; freq count; * proportional odds cumulative logit model;
model outcome = dose / aggregate scale=none;
run;
----------------------------------------------------------------------
SOME OUTPUT:
logit[P (y ≤ j)] = αj − β ′ x
generated by latent variable model. For some details about the use of the ologit function, see
www.ats.ucla.edu/stat/stata/output/stata_ologit_output.htm
and
www.stata.com/help.cgi?ologit
----------------------------------------------------------------
. *using grouped count data
.
. infile dose y1 y2 y3 y4 y5 using trauma.txt in 2/5, clear
(eof not at end of obs)
(4 observations read)
. gen groupid=_n
.
. reshape long y, i(groupid)
(note: j = 1 2 3 4 5)
. rename y count
. rename _j y
.
. list
+----------------------------+
| groupid y dose count |
|----------------------------|
1. | 1 1 1 59 |
2. | 1 2 1 25 |
3. | 1 3 1 46 |
4. | 1 4 1 48 |
5. | 1 5 1 32 |
|----------------------------|
6. | 2 1 2 48 |
7. | 2 2 2 21 |
8. | 2 3 2 44 |
9. | 2 4 2 47 |
10. | 2 5 2 30 |
|----------------------------|
11. | 3 1 3 44 |
12. | 3 2 3 14 |
13. | 3 3 3 54 |
14. | 3 4 3 64 |
15. | 3 5 3 31 |
|----------------------------|
16. | 4 1 4 43 |
17. | 4 2 4 4 |
18. | 4 3 4 49 |
19. | 4 4 4 58 |
20. | 4 5 4 41 |
+----------------------------+
.
. ologit y dose [fw=count] // counts are used as frequency weights
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dose | .1754861 .0567122 3.09 0.002 .0643322 .28664
-------------+----------------------------------------------------------------
/cut1 | -.7191664 .1589164 -1.030637 -.407696
/cut2 | -.3186011 .1568861 -.6260921 -.0111101
/cut3 | .6916531 .1596505 .378744 1.004562
/cut4 | 2.057009 .1750751 1.713868 2.40015
------------------------------------------------------------------------------
.
Goodness-of-fit statistics:
Pearson X² = 15.8
deviance G² = 18.2
(df = 16 − 5 = 11)
P-values = 0.15 and 0.08
Model seems to fit adequately
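The tail probabilities behind these goodness-of-fit statistics can be computed without a statistics package. The sketch below is one way to do it in plain Python, using a standard recurrence for the regularized upper incomplete gamma function; chi2_sf is a hypothetical helper name.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-squared_df > x) for integer df >= 1.

    With t = x / 2, uses Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1),
    starting from Q(1, t) = exp(-t) (even df) or Q(1/2, t) = erfc(sqrt(t)) (odd df).
    """
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return q

p_pearson = chi2_sf(15.8, 11)    # Pearson X^2 on df = 11
p_deviance = chi2_sf(18.2, 11)   # deviance G^2 on df = 11
print(round(p_pearson, 3), round(p_deviance, 3))
```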
\[ e^{(4-1)(0.176)} = 1.69 \]
• Any equally-spaced scores (e.g. 0, 10, 20, 30) for dose provide
same fitted values and same test statistics (different β̂ , SE ).
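The invariance to equally spaced scores follows from a linear reparameterization, which the sketch below illustrates with the vglm estimates for dose scores 1 to 4. It does not refit the model; it only shows that rescoring dose as 0, 10, 20, 30 with β/10 and shifted intercepts reproduces the same probabilities and the same Wald ratio. The standard error 0.05632 is taken from the vglm output, and the helper names are hypothetical.

```python
import math

ALPHAS = [-0.71917, -0.31860, 0.69165, 2.05701]  # intercepts for dose scores 1, 2, 3, 4
BETA = -0.17549                                  # dose effect for those scores
SE_BETA = 0.05632                                # its reported standard error

def category_probs(x, alphas, beta):
    """P(y = j | x) under logit[P(y <= j)] = alpha_j + beta*x."""
    cum = [0.0] + [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in alphas] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]

# Rescored dose s = 10 * (x - 1), so that
# alpha_j + beta * x = (alpha_j + beta) + (beta / 10) * s.
alphas_rescored = [a + BETA for a in ALPHAS]
beta_rescored = BETA / 10.0
se_rescored = SE_BETA / 10.0   # the SE scales with the estimate under a linear rescaling

original = [category_probs(x, ALPHAS, BETA) for x in (1, 2, 3, 4)]
rescored = [category_probs(s, alphas_rescored, beta_rescored) for s in (0, 10, 20, 30)]
max_diff = max(abs(p - q)
               for row_p, row_q in zip(original, rescored)
               for p, q in zip(row_p, row_q))
z_original = BETA / SE_BETA
z_rescored = beta_rescored / se_rescored
print(max_diff, round(z_original, 4), round(z_rescored, 4))
```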
-------------------------------------------------------------------
> attach(trauma)
> library(VGAM)
> summary(fit2)
Coefficients:
Estimate Std. Error z value
(Intercept):1 -0.91880 0.13204 -6.95875
(Intercept):2 -0.51826 0.12856 -4.03122
(Intercept):3 0.49215 0.12841 3.83255
(Intercept):4 1.85785 0.14527 12.78927
factor(dose)2 -0.11756 0.17843 -0.65885
factor(dose)3 -0.31740 0.17473 -1.81649
factor(dose)4 -0.52077 0.17795 -2.92657
data trauma;
input dose outcome count @@;
datalines;
1 1 59 1 2 25 1 3 46 1 4 48 1 5 32
2 1 48 2 2 21 2 3 44 2 4 47 2 5 30
3 1 44 3 2 14 3 3 54 3 4 64 3 5 31
4 1 43 4 2 4 4 3 49 4 4 58 4 5 41
;
proc logistic; freq count; class dose / param=ref; * treat dose as factor;
model outcome = dose / aggregate scale=none;
run;
----------------------------------------------------------------------
• At setting i of predictor with \( n_i = \sum_{j=1}^{c} n_{ij} \) multinomial
observations, expected frequency estimates equal
\[ \hat{\mu}_{ij} = n_i \hat{P}(y = j), \quad j = 1, \ldots, c. \]
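To make the expected frequencies concrete, the sketch below computes them for the trauma data under the proportional odds fit with linear dose effect, then forms the Pearson and deviance statistics from them. The estimates are the rounded vglm values, so the results agree with the reported statistics only to about the precision shown; the variable names are hypothetical.

```python
import math

ALPHAS = [-0.71917, -0.31860, 0.69165, 2.05701]
BETA = -0.17549
# Observed counts n_ij: dose -> counts for outcome categories 1..5.
COUNTS = {
    1: [59, 25, 46, 48, 32],
    2: [48, 21, 44, 47, 30],
    3: [44, 14, 54, 64, 31],
    4: [43, 4, 49, 58, 41],
}

def category_probs(x):
    """Fitted P(y = j | x) under logit[P(y <= j)] = alpha_j + beta*x."""
    cum = [0.0] + [1.0 / (1.0 + math.exp(-(a + BETA * x))) for a in ALPHAS] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]

pearson = 0.0
deviance = 0.0
for dose, row in COUNTS.items():
    n_i = sum(row)                                  # multinomial sample size at this setting
    mu = [n_i * p for p in category_probs(dose)]    # expected frequencies mu_ij
    for n_ij, mu_ij in zip(row, mu):
        pearson += (n_ij - mu_ij) ** 2 / mu_ij
        if n_ij > 0:
            deviance += 2.0 * n_ij * math.log(n_ij / mu_ij)

print(round(pearson, 1), round(deviance, 1))
```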
logit[P (y ≤ j)] = αj + β1 x1 + · · · + βk xk
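\[ y = j \quad \text{if} \quad \alpha_{j-1} < y^* \le \alpha_j \]
\[ P(y \le j) = P(y^* \le \alpha_j) = P(y^* - \beta' x \le \alpha_j - \beta' x) = P(\epsilon \le \alpha_j - \beta' x) = G(\alpha_j - \beta' x) \]
→ Model \( G^{-1}[P(y \le j \mid x)] = \alpha_j - \beta' x \)
logit[P(y ≤ j)] = αj − β′x.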
Then, βj > 0 has usual interpretation of ‘positive’ effect
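The latent variable construction can be checked by simulation. The sketch below draws y* = βx + ε with standard logistic ε, cuts it at the αj, and compares the empirical cumulative proportions with G(αj − βx). The cutpoints and slope are the rounded polr estimates for the trauma data; the seed, sample size and function name are arbitrary choices made for this illustration.

```python
import math
import random

ALPHAS = [-0.7192, -0.3186, 0.6917, 2.0570]  # cutpoints (polr output, rounded)
BETA = 0.1755                                # dose effect in the alpha_j - beta*x form

def simulate_ordinal(x, n, rng):
    """Counts of y = 1..c from cutting y* = beta*x + eps at the cutpoints."""
    counts = [0] * (len(ALPHAS) + 1)
    for _ in range(n):
        u = rng.random()
        while u == 0.0:                       # guard against log(0)
            u = rng.random()
        eps = math.log(u) - math.log1p(-u)    # inverse-cdf draw from the standard logistic
        y_star = BETA * x + eps
        below = sum(1 for a in ALPHAS if y_star > a)  # cutpoints strictly below y*
        counts[below] += 1                    # y = below + 1, i.e. alpha_{y-1} < y* <= alpha_y
    return counts

rng = random.Random(2024)
x, n = 4, 100000
counts = simulate_ordinal(x, n, rng)

empirical, running = [], 0
for c in counts[:-1]:
    running += c
    empirical.append(running / n)
model = [1.0 / (1.0 + math.exp(-(a - BETA * x))) for a in ALPHAS]
print([round(e, 3) for e in empirical])
print([round(m, 3) for m in model])
```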
This model and most others in this tutorial imply that conditional
distributions of y at different settings of explanatory variables are
stochastically ordered ; i.e., the cdf at one setting is always above
or always below the cdf at another level.
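\[
\prod_{i=1}^{n} \prod_{j=1}^{c} P(Y_i = j \mid x_i)^{y_{ij}}
= \prod_{i=1}^{n} \prod_{j=1}^{c} \left[ P(Y_i \le j \mid x_i) - P(Y_i \le j-1 \mid x_i) \right]^{y_{ij}}
\]
\[
= \prod_{i=1}^{n} \prod_{j=1}^{c} \left[ \frac{\exp(\alpha_j + \beta' x_i)}{1 + \exp(\alpha_j + \beta' x_i)} - \frac{\exp(\alpha_{j-1} + \beta' x_i)}{1 + \exp(\alpha_{j-1} + \beta' x_i)} \right]^{y_{ij}}
\]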
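In log form and for grouped data, this likelihood is a sum of counts times log cell probabilities. The sketch below evaluates that log-likelihood kernel for the trauma data (multinomial coefficients omitted) and checks that the reported estimates give a larger value than nearby values of β with the intercepts held fixed. It is an illustration under those assumptions, not the fitting algorithm used by any of the packages, and the function name is hypothetical.

```python
import math

COUNTS = {
    1: [59, 25, 46, 48, 32],
    2: [48, 21, 44, 47, 30],
    3: [44, 14, 54, 64, 31],
    4: [43, 4, 49, 58, 41],
}
ALPHAS_HAT = [-0.71917, -0.31860, 0.69165, 2.05701]
BETA_HAT = -0.17549

def log_likelihood(alphas, beta, counts=COUNTS):
    """Sum over settings and categories of n_ij * log[P(y <= j) - P(y <= j-1)]."""
    total = 0.0
    for x, row in counts.items():
        cum = [0.0] + [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in alphas] + [1.0]
        for j, n_ij in enumerate(row):
            total += n_ij * math.log(cum[j + 1] - cum[j])
    return total

ll_hat = log_likelihood(ALPHAS_HAT, BETA_HAT)
ll_lower = log_likelihood(ALPHAS_HAT, -0.25)   # beta moved away from the estimate
ll_upper = log_likelihood(ALPHAS_HAT, -0.10)
print(round(ll_hat, 3), round(ll_lower, 3), round(ll_upper, 3))
```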
Wald: \( z = (\hat{\beta}_j - \beta_{j0})/SE \), or \( z^2 \sim \chi^2 \); poorest method for small n or
extremely large estimates (infinite being a special case)
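As a small worked example of the Wald statistic, the sketch below applies it to the dose effect from the vglm fit to the trauma data (estimate −0.17549, SE 0.05632) with null value 0. The two-sided P-value uses the standard normal tail through the complementary error function; wald_test is a hypothetical helper name.

```python
import math

def wald_test(estimate, se, null_value=0.0):
    """Return (z, z squared, two-sided normal P-value) for H0: beta = null_value."""
    z = (estimate - null_value) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, z * z, p_two_sided

z, z_sq, p = wald_test(-0.17549, 0.05632)
print(round(z, 4), round(z_sq, 3), round(p, 4))
```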
• Effect estimators using simple model are biased but may have
smaller MSE than estimators from more complex model, and
tests may have greater power, especially when more complex
model has many more parameters.
Coefficients:
Value Std. Error t value
(Intercept):1 -0.864585 0.194230 -4.45133
(Intercept):2 -0.093747 0.178494 -0.52521
(Intercept):3 0.706251 0.175576 4.02248
(Intercept):4 1.908668 0.238380 8.00684
dose:1 -0.112912 0.072881 -1.54926
dose:2 -0.268895 0.068319 -3.93585
dose:3 -0.182341 0.063855 -2.85555
dose:4 -0.119255 0.084702 -1.40793
> 1 - pchisq(deviance(fit)-deviance(fit2),
df=df.residual(fit)-df.residual(fit2))
[1] 0.002487748
The improvement in fit is statistically significant, but perhaps not substantively significant;
effect of dose is moderately negative for each cumulative probability.
y = Religious Beliefs
logit[P (y ≤ j)] = αj + β1 r1 + β2 r2 + β3 r3
GENMOD output:
Analysis Of Parameter Estimates
Likelihood Ratio
Standard 95% Confidence Chi-
Parameter DF Estimate Error Limits Square
Intercept1 1 -1.2618 0.0632 -1.3863 -1.1383 398.10
Intercept2 1 0.4729 0.0603 0.3548 0.5910 61.56
region 1 1 -0.0698 0.0901 -0.2466 0.1068 0.60
region 2 1 0.2688 0.0830 0.1061 0.4316 10.48
region 3 1 0.8897 0.0758 0.7414 1.0385 137.89
region 4 0 0.0000 0.0000 0.0000 0.0000 .
R for religion and region data, using vglm() for cumulative logit
modeling with and without proportional odds structure
> religion <- read.table("religion_region.dat",header=TRUE)
> religion
region y1 y2 y3
1 1 92 352 234
2 2 274 399 326
3 3 739 536 412
4 4 192 423 355
> r1 <- ifelse(region==1,1,0); r2 <- ifelse(region==2,1,0); r3 <- ifelse(region==3,1,0)
> cbind(r1,r2,r3)
r1 r2 r3
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
[4,] 0 0 0
> library(VGAM)
> fit.po <- vglm(cbind(y1,y2,y3) ~ r1+r2+r3,
family=cumulative(parallel=TRUE),data=religion)
> summary(fit.po)
Coefficients:
Value Std. Error t value
(Intercept):1 -1.261818 0.064033 -19.70584
(Intercept):2 0.472851 0.061096 7.73948
r1 -0.069842 0.093035 -0.75071
r2 0.268777 0.083536 3.21750
r3 0.889677 0.075704 11.75211
Residual Deviance: 98.0238 on 3 degrees of freedom
Log-likelihood: -77.1583 on 3 degrees of freedom
> fit.npo <- vglm(cbind(y1,y2,y3) ~ r1+r2+r3, family=cumulative,religion)
> summary(fit.npo)
Coefficients:
Value Std. Error t value
(Intercept):1 -1.399231 0.080583 -17.36377
(Intercept):2 0.549504 0.066655 8.24398
r1:1 -0.452300 0.138093 -3.27532
r1:2 0.090999 0.104731 0.86888
r2:1 0.426188 0.107343 3.97032
r2:2 0.175343 0.094849 1.84866
r3:1 1.150175 0.094349 12.19065
r3:2 0.580174 0.087490 6.63135
Residual Deviance: -5.1681e-13 on 0 degrees of freedom
Log-likelihood: -28.1464 on 0 degrees of freedom
> 1 - pchisq(deviance(fit.po)-deviance(fit.npo),
df=df.residual(fit.po)-df.residual(fit.npo))
[1] 4.134028e-21
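The test above refers the deviance difference 98.0238 − 0 to a chi-squared distribution with 3 − 0 = 3 degrees of freedom. For 3 degrees of freedom the upper tail has a closed form, which allows a quick hand check of the reported P-value (here erfc denotes the complementary error function, and the two numerical terms are rounded):

```latex
P(\chi^2_3 > x) = \operatorname{erfc}\!\left(\sqrt{x/2}\right) + \sqrt{\frac{2x}{\pi}}\; e^{-x/2},
\qquad
x = 98.0238:\quad 4.1 \times 10^{-23} + 4.09 \times 10^{-21} \approx 4.13 \times 10^{-21}.
```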
Stata for modeling religion and region data, for cumulative logit
modeling with and without proportional odds
------------------------------------------------------------------------------
. infile region y1 y2 y3 using region.txt in 2/5, clear
(eof not at end of obs)
(4 observations read)
. list
+--------------------------+
| region y1 y2 y3 |
|--------------------------|
1. | 1 92 352 234 |
2. | 2 274 399 326 |
3. | 3 739 536 412 |
4. | 4 192 423 355 |
+--------------------------+
. gen groupid=_n
. reshape long y, i(groupid)
(note: j = 1 2 3)
. rename y count
. rename _j y
. list
+------------------------------+
| groupid y region count |
|------------------------------|
1. | 1 1 1 92 |
2. | 1 2 1 352 |
3. | 1 3 1 234 |
4. | 2 1 2 274 |
5. | 2 2 2 399 |
|------------------------------|
6. | 2 3 2 326 |
7. | 3 1 3 739 |
8. | 3 2 3 536 |
9. | 3 3 3 412 |
10. | 4 1 4 192 |
|------------------------------|
11. | 4 2 4 423 |
12. | 4 3 4 355 |
+------------------------------+
.
. tab region, gen(reg) // create dummy indicators for region
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
reg1 | .0698393 .0901259 0.77 0.438 -.1068042 .2464828
reg2 | -.2687773 .0830439 -3.24 0.001 -.4315402 -.1060143
reg3 | -.8896776 .0757644 -11.74 0.000 -1.038173 -.741182
-------------+----------------------------------------------------------------
_cut1 | -1.261818 .0632411 (Ancillary parameters)
_cut2 | .4728514 .0602666
------------------------------------------------------------------------------
Approximate likelihood-ratio test of proportionality of odds
across response categories:
chi2(3) = 98.78
Prob > chi2 = 0.0000
\[ (x \mid y = j) \sim N(\mu_j, \Sigma) \]
then
\[ \log \frac{P(y = j \mid x)}{P(y = j+1 \mid x)} = \alpha_j + \beta_j' x \]
with
\[ \beta_j = \Sigma^{-1}(\mu_j - \mu_{j+1}) \]
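A one-dimensional numerical check of this result is sketched below: with normal conditional distributions sharing a common variance, Bayes' theorem gives adjacent-categories logits that are linear in x with slope (μj − μj+1)/σ². The means, variance and marginal probabilities are hypothetical values chosen only for the illustration.

```python
import math

MUS = [0.0, 1.0, 2.5]      # hypothetical means mu_1, mu_2, mu_3 of x given y = j
SIGMA2 = 2.0               # hypothetical common variance
PRIORS = [0.3, 0.5, 0.2]   # hypothetical marginal probabilities P(y = j)

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posterior(x):
    """P(y = j | x) by Bayes' theorem."""
    joint = [p * normal_pdf(x, mu, SIGMA2) for p, mu in zip(PRIORS, MUS)]
    total = sum(joint)
    return [v / total for v in joint]

def adjacent_logit(x, j):
    """log[P(y = j | x) / P(y = j+1 | x)] for 0-based category index j."""
    post = posterior(x)
    return math.log(post[j] / post[j + 1])

# Slope of each adjacent-categories logit, from two arbitrary x values.
slopes = [(adjacent_logit(3.0, j) - adjacent_logit(1.0, j)) / 2.0 for j in range(2)]
theory = [(MUS[j] - MUS[j + 1]) / SIGMA2 for j in range(2)]
print(slopes, theory)
```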
Since for j < k,
\[ \log \frac{\pi_j}{\pi_k} = \log \frac{\pi_j}{\pi_{j+1}} + \log \frac{\pi_{j+1}}{\pi_{j+2}} + \cdots + \log \frac{\pi_{k-1}}{\pi_k}, \]
ACL model \( \log \frac{\pi_j(x)}{\pi_{j+1}(x)} = \alpha_j + \beta' x \)
log(πj /πj+1 ) = αj + β1 x + β2 g
Coefficients:
Value Std. Error t value
(Intercept):1 -0.95090 0.142589 -6.66880
(Intercept):2 0.55734 0.145084 3.84147
(Intercept):3 -0.10656 0.164748 -0.64680
religion 0.26681 0.047866 5.57410
gender -0.01412 0.076706 -0.18408
> fitted(fit.adj)
y1 y2 y3 y4
1 0.2177773 0.4316255 0.1893146 0.16128261
2 0.2138134 0.4297953 0.1911925 0.16519872
3 0.2975956 0.4516958 0.1517219 0.09898673
4 0.2931825 0.4513256 0.1537533 0.10173853
5 0.3830297 0.4452227 0.1145262 0.05722143
6 0.3784551 0.4461609 0.1163995 0.05898444
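The fitted probabilities above can be recovered from the coefficients by summing adjacent-categories logits: log(πj/πc) is the sum of the logits from j to c − 1. The sketch below does this with the rounded estimates from the output, so agreement with fitted(fit.adj) is only to about four decimals; acl_probs is a hypothetical helper name.

```python
import math

ALPHAS = [-0.95090, 0.55734, -0.10656]   # (Intercept):1..3 from the output above
B_RELIGION = 0.26681
B_GENDER = -0.01412

def acl_probs(religion, gender):
    """Category probabilities under log(pi_j / pi_{j+1}) = alpha_j + b1*religion + b2*gender."""
    etas = [a + B_RELIGION * religion + B_GENDER * gender for a in ALPHAS]
    # log(pi_j / pi_c) = eta_j + eta_{j+1} + ... + eta_{c-1}
    log_ratio_to_last = []
    running = 0.0
    for eta in reversed(etas):
        running += eta
        log_ratio_to_last.append(running)
    log_ratio_to_last.reverse()
    weights = [math.exp(v) for v in log_ratio_to_last] + [1.0]   # last category is the reference
    total = sum(weights)
    return [w / total for w in weights]

row1 = acl_probs(religion=1, gender=0)
row6 = acl_probs(religion=3, gender=1)
print([round(p, 4) for p in row1])
print([round(p, 4) for p in row6])
```

These are the same running sums of linear predictors that the eta1, eta2 and eta3 lines form in the PROC NLMIXED code for this model.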
SAS: Can fit with PROC NLMIXED, which permits specifying the
log-likelihood to be maximized (here through the ll statement), expressing
the model as a baseline-category logit model.
data stemcell;
input religion gender y1 y2 y3 y4;
datalines;
1 0 21 52 24 15
1 1 34 67 30 25
2 0 30 52 18 11
2 1 41 83 23 14
3 0 64 50 16 11
3 1 58 63 15 12
;
/* Adjacent-categories logit model with proportional odds */
proc nlmixed data=stemcell;
eta1 = alpha1 + alpha2 + alpha3 + 3*beta1*religion + 3*beta2*gender;
eta2 = alpha2 + alpha3 + 2*beta1*religion + 2*beta2*gender;
eta3 = alpha3 + beta1*religion + beta2*gender;
p4 = 1 / (1 + exp(eta1) + exp(eta2) + exp(eta3));
p1 = exp(eta1)*p4;
p2 = exp(eta2)*p4;
p3 = exp(eta3)*p4;
ll = y1*log(p1) + y2*log(p2) + y3*log(p3) + y4*log(p4);
model y1 ~ general(ll);
run;
----------------------------------------------------------------------------
Parameter Estimates
Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper
alpha1 -0.9509 0.1426 6 -6.67 0.0006 0.05 -1.2998 -0.6020
alpha2 0.5573 0.1451 6 3.84 0.0085 0.05 0.2023 0.9123
alpha3 -0.1066 0.1648 6 -0.65 0.5417 0.05 -0.5097 0.2966
beta1 0.2668 0.04787 6 5.57 0.0014 0.05 0.1497 0.3839
beta2 -0.01412 0.07671 6 -0.18 0.8600 0.05 -0.2018 0.1736
• Can fit general ACL model with software for BCL model,
converting its \( \{\hat{\beta}_j^*\} \) estimates to \( \hat{\beta}_j = \hat{\beta}_j^* - \hat{\beta}_{j+1}^* \), since
\[ \log \frac{\pi_j}{\pi_{j+1}} = \log \frac{\pi_j}{\pi_c} - \log \frac{\pi_{j+1}}{\pi_c}, \]
or using specialized software such as vglm function in R
without the "parallel = TRUE" option.
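The conversion between the two parameterizations is a differencing step, sketched below. It assumes the usual convention that the baseline category c has effect 0 in the BCL model; the ACL slopes are the religion effects from the general ACL fit to the stem cell data reported in these notes, and the function names are hypothetical.

```python
def bcl_to_acl(bcl_slopes):
    """beta_j = beta*_j - beta*_{j+1}, with beta*_c = 0 for the baseline category."""
    extended = list(bcl_slopes) + [0.0]
    return [extended[j] - extended[j + 1] for j in range(len(bcl_slopes))]

def acl_to_bcl(acl_slopes):
    """Inverse map: beta*_j = beta_j + beta_{j+1} + ... + beta_{c-1}."""
    out, running = [], 0.0
    for b in reversed(acl_slopes):
        running += b
        out.append(running)
    out.reverse()
    return out

acl_religion = [0.43819661, 0.25962043, 0.01192302]   # religion:1..3, general ACL fit
bcl_religion = acl_to_bcl(acl_religion)               # implied baseline-category effects
round_trip = bcl_to_acl(bcl_religion)
print(bcl_religion, round_trip)
```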
Example: Data on stemcell research that had been fitted with ACL
model of proportional odds form
y1 y2 y3 y4
1 0.1875000 0.4642857 0.2142857 0.13392857
2 0.2179487 0.4294872 0.1923077 0.16025641
3 0.2702703 0.4684685 0.1621622 0.09909910
4 0.2546584 0.5155280 0.1428571 0.08695652
5 0.4539007 0.3546099 0.1134752 0.07801418
6 0.3918919 0.4256757 0.1013514 0.08108108
Call:
vglm(formula = cbind(y1, y2, y3, y4) ~ religion + gender,
family = acat(reverse = TRUE, parallel = FALSE), data = stemcell)
Coefficients:
(Intercept):1 (Intercept):2 (Intercept):3 religion:1 religion:2
-1.24835878 0.47098433 0.42740812 0.43819661 0.25962043
religion:3 gender:1 gender:2 gender:3
0.01192302 -0.13683357 0.18706754 -0.16093003
Then
\[ \log \frac{\pi_j}{\pi_{j+1} + \cdots + \pi_c} = \log[\omega_j/(1 - \omega_j)], \]
\[ \text{bin}[1, y_{i1}; \omega_1(x_i)] \cdots \text{bin}[1 - y_{i1} - \cdots - y_{i,c-2}, y_{i,c-1}; \omega_{c-1}(x_i)]. \]
Tonsil Size
has β̂ = −0.528 (SE = 0.196). Model estimates an assumed
common value exp(−0.528) = 0.59 for cumulative odds ratio
from first part of model and for local odds ratio from second part.
e.g., given that tonsils were enlarged, for carriers, estimated odds
of response enlarged rather than greatly enlarged were 0.59 times
estimated odds for non-carriers.
Coefficients:
Value Std. Error t value
(Intercept):1 0.51102 0.056141 9.1025
(Intercept):2 -0.73218 0.072864 -10.0486
carrier 0.52846 0.197747 2.6724
> fitted(fit.cratio)
y1 y2 y3
1 0.2612503 0.4068696 0.3318801
2 0.3749547 0.4220828 0.2029625
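The fitted values above can be rebuilt from the sequential (binomial) factorization of the multinomial: π1 = ω1, π2 = (1 − ω1)ω2, and so on, where ωj = P(y = j | y ≥ j). The sketch below assumes that the vglm output parameterizes the logit of P(y > j | y ≥ j), so its signs are reversed to obtain logit(ωj) = αj + βx with β̂ = −0.528. That reading is an inference from the fitted values, not something stated in the output, and cratio_probs is a hypothetical helper name.

```python
import math

A = [-0.51102, 0.73218]   # intercepts for logit(omega_j): vglm intercepts with signs reversed
B = -0.52846              # carrier effect on logit(omega_j)

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def cratio_probs(carrier):
    """Category probabilities from omega_j = P(y = j | y >= j)."""
    probs = []
    remaining = 1.0                        # P(y >= j), starting at j = 1
    for a in A:
        omega = expit(a + B * carrier)
        probs.append(remaining * omega)    # P(y = j) = P(y >= j) * omega_j
        remaining *= 1.0 - omega           # P(y >= j + 1)
    probs.append(remaining)                # last category takes what is left
    return probs

carriers = cratio_probs(1)
noncarriers = cratio_probs(0)

def odds(p):
    return p / (1.0 - p)

# Odds of 'enlarged' rather than 'greatly enlarged', given enlarged, carriers vs non-carriers.
omega2_c = expit(A[1] + B)
omega2_n = expit(A[1])
odds_ratio = odds(omega2_c) / odds(omega2_n)
print([round(p, 4) for p in carriers], [round(p, 4) for p in noncarriers], round(odds_ratio, 2))
```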
. list
+---------------------------+
| carrier y1 y2 y3 |
|---------------------------|
1. | 1 19 29 24 |
2. | 0 497 560 269 |
+---------------------------+
. gen groupid=_n
.
. reshape long y, i(groupid)
(note: j = 1 2 3)
. rename y count
. rename _j y
.
. list
+-------------------------------+
| groupid y carrier count |
|-------------------------------|
1. | 1 1 1 19 |
2. | 1 2 1 29 |
3. | 1 3 1 24 |
4. | 2 1 0 497 |
5. | 2 2 0 560 |
|-------------------------------|
6. | 2 3 0 269 |
+-------------------------------+
| y
carrier | 1 2 3 | Total
-----------+---------------------------------+----------
0 | 497 560 269 | 1,326
| 37.48 42.23 20.29 | 100.00
-----------+---------------------------------+----------
1 | 19 29 24 | 72
| 26.39 40.28 33.33 | 100.00
-----------+---------------------------------+----------
Total | 516 589 293 | 1,398
| 36.91 42.13 20.96 | 100.00
.
. ologit y carrier [fw=count] // ordered logit model
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
carrier | .6026492 .2274158 2.65 0.008 .1569224 1.048376
-------------+----------------------------------------------------------------
/cut1 | -.5085091 .0563953 -.6190418 -.3979763
/cut2 | 1.36272 .0673406 1.230735 1.494705
------------------------------------------------------------------------------
.
. ocratio y carrier [fw=count] // continuation ratio model
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
carrier | .5284613 .197904 2.67 0.008 .1405766 .916346
------------------------------------------------------------------------------
_cut1 | -.5110188 .0561416 (Ancillary parameters)
_cut2 | .7321801 .0728583
------------------------------------------------------------------------------
Φ−1 [P (y ≤ j)] = αj + β ′ x, j = 1, . . . , c − 1
Properties
Religious Beliefs
Less than high school 4913 (43%) 4684 (41%) 1905 (17%)
> summary(fit.cprobit)
Call:
vglm(formula = cbind(y1, y2, y3) ~ degree, family = cumulative(link = probit,
parallel = TRUE), data=fundamentalism)
Coefficients:
Value Std. Error t value
(Intercept):1 -0.22398 0.0079908 -28.030
(Intercept):2 0.94001 0.0086768 108.336
degree -0.20594 0.0044727 -46.044
Coefficients:
(Intercept):1 (Intercept):2 degree
-0.3520540 1.5498053 -0.3446603
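To compare the two links on a common scale, the sketch below converts the cumulative probit and cumulative logit estimates shown above into category probabilities for degree scores 0 to 4. It assumes both coefficient sets refer to the same data and the same αj + βx form, as the surrounding output suggests; the helper names are hypothetical.

```python
import math

PROBIT = ([-0.22398, 0.94001], -0.20594)        # (intercepts, degree effect), probit link
LOGIT = ([-0.3520540, 1.5498053], -0.3446603)   # same quantities, logit link

def norm_cdf(z):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def category_probs(x, params, inverse_link):
    alphas, beta = params
    cum = [0.0] + [inverse_link(a + beta * x) for a in alphas] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]

max_diff = 0.0
for degree in range(5):
    p_probit = category_probs(degree, PROBIT, norm_cdf)
    p_logit = category_probs(degree, LOGIT, expit)
    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(p_probit, p_logit)))

slope_ratio = LOGIT[1] / PROBIT[1]   # logit effect relative to probit effect
print(round(max_diff, 4), round(slope_ratio, 2))
```

The two fits give nearly the same probabilities even though the coefficients differ, because the logit and probit scales differ by a roughly constant factor.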
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 1 -0.2240 0.00799 785.6659 <.0001
Intercept 2 1 0.9400 0.00868 11736.5822 <.0001
degree 1 -0.2059 0.00447 2120.0908 <.0001
-----------------------------------------------------------------------
Criterion Value DF Value/DF Pr > ChiSq
Deviance 5.1606 4 1.2902 0.2712
Pearson 5.1616 4 1.2904 0.2711
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 1 -1.0169 0.0210 2355.5732 <.0001
Intercept 2 1 0.1478 0.0206 51.3520 <.0001
degree 0 1 0.8298 0.0231 1289.2450 <.0001
degree 1 1 0.5599 0.0217 666.9138 <.0001
degree 2 1 0.4639 0.0303 234.1537 <.0001
degree 3 1 0.1695 0.0247 47.0787 <.0001
. gen groupid=_n
.
. reshape long y, i(groupid)
(note: j = 1 2 3)
. rename y count
. rename _j y
.
. list
+------------------------------+
| groupid y degree count |
|------------------------------|
1. | 1 1 0 4913 |
2. | 1 2 0 4684 |
3. | 1 3 0 1905 |
4. | 2 1 1 8189 |
5. | 2 2 1 11196 |
|------------------------------|
6. | 2 3 1 6045 |
7. | 3 1 2 728 |
8. | 3 2 2 1072 |
9. | 3 3 2 679 |
10. | 4 1 3 1304 |
|------------------------------|
11. | 4 2 3 2800 |
12. | 4 3 3 2468 |
13. | 5 1 4 495 |
14. | 5 2 4 1193 |
15. | 5 3 4 1369 |
+------------------------------+
.
. ologit y degree [fw=count] // ordered logit
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
degree | .3446603 .0075309 45.77 0.000 .3299 .3594205
-------------+----------------------------------------------------------------
/cut1 | -.352054 .0130676 -.3776659 -.3264421
/cut2 | 1.549805 .0149954 1.520415 1.579196
------------------------------------------------------------------------------
.
. oprobit y degree [fw=count] // ordered probit
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
degree | .2059429 .0044745 46.03 0.000 .1971731 .2147128
-------------+----------------------------------------------------------------
/cut1 | -.2239807 .0079956 -.2396517 -.2083096
/cut2 | .9400106 .0086789 .9230003 .957021
------------------------------------------------------------------------------
Goodness of fit?
Males Females
> library(VGAM)
> fit.cloglog <- vglm(cbind(y1,y2,y3,y4,y5) ~ gender+race,
family=cumulative(link=cloglog, parallel=TRUE),data=life)
> summary(fit.cloglog)
Call:
vglm(formula = cbind(y1, y2, y3, y4, y5) ~ gender + race,
family = cumulative(link = cloglog, parallel = TRUE), data = life)
Coefficients:
Value Std. Error t value
(Intercept):1 -4.21274 0.133834 -31.4773
(Intercept):2 -3.19223 0.091148 -35.0225
(Intercept):3 -2.58210 0.076360 -33.8147
(Intercept):4 -1.52163 0.062317 -24.4176
gender -0.53827 0.070332 -7.6533
race 0.61071 0.070898 8.6139
data lifetab;
input sex $ race $ age count;
datalines;
m w 20 13
f w 20 9
m b 20 26
f b 20 18
...
m w 100 805
f w 100 879
m b 100 668
f b 100 792
;
proc logistic; freq count; class sex race / param=ref;
model age = sex race / link=cloglog aggregate scale=none;
run;
proc genmod; freq count; class sex race;
model age = sex race / dist=multinomial link=CCLL lrci type3 obstats;
run;
-----------------------------------------------------------------------------
The GENMOD Procedure
.
. gen groupid=_n
.
. reshape long y, i(groupid)
(note: j = 1 2 3 4 5)
. rename y percent
. rename _j y
.
. gen count=percent*10
.
. tab gender y if race==0 [fw=count], row
| y
gender | 1 2 3 4 5 | Total
-----------+-------------------------------------------------------+----------
0 | 13 28 32 122 805 | 1,000
| 1.30 2.80 3.20 12.20 80.50 | 100.00
-----------+-------------------------------------------------------+----------
1 | 9 13 19 80 879 | 1,000
| 0.90 1.30 1.90 8.00 87.90 | 100.00
-----------+-------------------------------------------------------+----------
Total | 22 41 51 202 1,684 | 2,000
| 1.10 2.05 2.55 10.10 84.20 | 100.00
+----------------+
| Key |
|----------------|
| frequency |
| row percentage |
+----------------+
| y
gender | 1 2 3 4 5 | Total
-----------+-------------------------------------------------------+----------
0 | 26 49 56 201 668 | 1,000
| 2.60 4.90 5.60 20.10 66.80 | 100.00
-----------+-------------------------------------------------------+----------
1 | 18 24 37 129 792 | 1,000
| 1.80 2.40 3.70 12.90 79.20 | 100.00
-----------+-------------------------------------------------------+----------
Total | 44 73 93 330 1,460 | 2,000
| 2.20 3.65 4.65 16.50 73.00 | 100.00
.
. *cumulative complementary log log model
. ocratio y gender race [fw=count], link(cloglog) cumulative
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gender | .5382685 .0703365 7.65 0.000 .4004115 .6761254
race | -.6107102 .0708956 -8.61 0.000 -.749663 -.4717574
------------------------------------------------------------------------------
_cut1 | -4.21274 .1338366 (Ancillary parameters)
_cut2 | -3.19223 .1228357
_cut3 | -2.582102 .1161383
_cut4 | -1.521633 .0906746
------------------------------------------------------------------------------
Given race, proportion of men living longer than a fixed time equals
proportion for women raised to exp(0.538) = 1.71 power.
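The power relationship follows directly from the complementary log-log link: if log{−log[1 − P(y ≤ j)]} = αj + β′x, then P(y > j) = exp{−exp(αj + β′x)}. The sketch below checks it with the vglm estimates, assuming the coding gender = 0 for men and 1 for women and race = 0 for white and 1 for black, which is how the Stata tabulations appear to line up with the SAS data; survival_beyond is a hypothetical helper name.

```python
import math

ALPHAS = [-4.21274, -3.19223, -2.58210, -1.52163]   # (Intercept):1..4 from vglm
B_GENDER = -0.53827
B_RACE = 0.61071

def survival_beyond(j, gender, race):
    """P(y > j) under the cumulative cloglog model, for 0-based cutpoint index j."""
    eta = ALPHAS[j] + B_GENDER * gender + B_RACE * race
    return math.exp(-math.exp(eta))

power = math.exp(-B_GENDER)   # exp(0.538), about 1.71
max_gap = 0.0
for race in (0, 1):
    for j in range(len(ALPHAS)):
        men = survival_beyond(j, gender=0, race=race)
        women = survival_beyond(j, gender=1, race=race)
        max_gap = max(max_gap, abs(men - women ** power))
print(round(power, 2), max_gap)
```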
If Ω denotes odds of living longer than some fixed time for white
women, then estimated odds of living longer than that time are
R (and S-Plus)
Stata
SPSS