Agresti Ordinal Tutorial
Alan Agresti
How $P(y = j)$, $j = 1, 2, \ldots, c$, depends on explanatory variables $x$ (categorical and/or quantitative).
The models treat observations on $y$ at fixed $x$ as multinomial.
Outline
Section 1: Logistic Regression Models Using Cumulative Logits
(Proportional odds and extensions)
Section 2: Other Ordinal Response Models
(adjacent-categories and continuation-ratio logits, stereotype
model, cumulative probit, log-log links, count data responses)
Focus of tutorial
Survey of approaches to modeling ordinal categorical responses
Emphasis on concepts, examples of use, complicating issues,
rather than theory, derivations, or technical details
Examples of how to conduct methods using SAS, but output
provided to enhance interpretation of methods, not to teach
SAS. For R (and S-Plus) and Stata, we list functions and give
references for details in Section 3; e.g., detailed tutorial by
Laura Thompson shows how to use R for nearly all models in
this tutorial (link at www.stat.ufl.edu/aa/cda/software.html).
But first: why not just assign scores to the ordered categories
and use ordinary regression methods?
mean may fall below lowest score or above highest score and
fitting fails.
For a binary response, this approach simplifies to the linear
probability model, $P(y = 1) = \alpha + \beta' x$, (i.e., response
= 100, suppose
When x
= 0 (and interaction is
[Figure: scatterplots of y* against x (axis ticks from 20 to 100), with plotting symbol o for z = 0 and 1 for z = 1.]
Notation:
$\pi_{j|i} = P(y = j \mid x = i)$, $j = 1, 2, \ldots, c$,
or $(\pi_1, \pi_2, \ldots, \pi_c)$ when we suppress the explanatory variables.
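Sample odds ratio for a 2 x 2 table:

$$\hat\theta = \frac{p_{1|1}/p_{2|1}}{p_{1|2}/p_{2|2}} = \frac{p_{11}\, p_{22}}{p_{12}\, p_{21}} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$$

For r x c tables, local odds ratios

$$\hat\theta^{L}_{ij} = \frac{n_{ij}\, n_{i+1,j+1}}{n_{i,j+1}\, n_{i+1,j}}$$

Global odds ratios

$$\hat\theta^{G}_{ij} = \frac{\left(\sum_{a \le i} \sum_{b \le j} n_{ab}\right)\left(\sum_{a > i} \sum_{b > j} n_{ab}\right)}{\left(\sum_{a \le i} \sum_{b > j} n_{ab}\right)\left(\sum_{a > i} \sum_{b \le j} n_{ab}\right)}$$

Cumulative odds ratios, with $\hat F_{j|i}$ the sample cumulative proportion at or below category $j$ in row $i$:

$$\hat\theta^{C}_{ij} = \frac{\left(\sum_{b \le j} n_{ib}\right)\left(\sum_{b > j} n_{i+1,b}\right)}{\left(\sum_{b > j} n_{ib}\right)\left(\sum_{b \le j} n_{i+1,b}\right)} = \frac{\hat F_{j|i}/(1 - \hat F_{j|i})}{\hat F_{j|i+1}/(1 - \hat F_{j|i+1})}$$

Population versions:

$$\theta^{L}_{ij} = \frac{P(x = i, y = j)\, P(x = i+1, y = j+1)}{P(x = i, y = j+1)\, P(x = i+1, y = j)}$$

$$\theta^{C}_{ij} = \frac{P(y \le j \mid x = i)/P(y > j \mid x = i)}{P(y \le j \mid x = i+1)/P(y > j \mid x = i+1)}$$

$$\theta^{CO}_{ij} = \frac{P(y = j \mid x = i)/P(y > j \mid x = i)}{P(y = j \mid x = i+1)/P(y > j \mid x = i+1)}$$

With $C$ concordant and $D$ discordant pairs of observations,

$$\hat\gamma = (C - D)/(C + D)$$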
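As a hedged illustration of the sample ordinal odds ratios and of the measure (C - D)/(C + D), the following Python sketch computes them from a table of counts. The function names are hypothetical, indices are 0-based, and the counts are the tonsil-size table (carriers versus non-carriers) that appears later in this tutorial.

```python
def local_odds_ratio(n, i, j):
    """Local odds ratio using rows i, i+1 and columns j, j+1 (0-based)."""
    return (n[i][j] * n[i + 1][j + 1]) / (n[i][j + 1] * n[i + 1][j])


def cumulative_odds_ratio(n, i, j):
    """Rows i, i+1; response collapsed into (<= column j, > column j)."""
    low_i, high_i = sum(n[i][: j + 1]), sum(n[i][j + 1:])
    low_next, high_next = sum(n[i + 1][: j + 1]), sum(n[i + 1][j + 1:])
    return (low_i * high_next) / (high_i * low_next)


def global_odds_ratio(n, i, j):
    """Rows collapsed at row i and columns collapsed at column j."""
    def block(rows, cols):
        return sum(n[a][b] for a in rows for b in cols)

    n_rows, n_cols = len(n), len(n[0])
    low_r, high_r = range(i + 1), range(i + 1, n_rows)
    low_c, high_c = range(j + 1), range(j + 1, n_cols)
    return (block(low_r, low_c) * block(high_r, high_c)) / (
        block(low_r, high_c) * block(high_r, low_c)
    )


def concordance_measure(n):
    """(C - D) / (C + D), counting concordant (C) and discordant (D) pairs."""
    n_rows, n_cols = len(n), len(n[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for a in range(i + 1, n_rows):
                for b in range(n_cols):
                    if b > j:
                        concordant += n[i][j] * n[a][b]
                    elif b < j:
                        discordant += n[i][j] * n[a][b]
    return (concordant - discordant) / (concordant + discordant)


# Rows: carriers, non-carriers; columns: not enlarged, enlarged, greatly enlarged.
tonsils = [[19, 29, 24], [497, 560, 269]]
print(local_odds_ratio(tonsils, 0, 0))
print(cumulative_odds_ratio(tonsils, 0, 0))
print(concordance_measure(tonsils))
```

With only two rows, the global and cumulative odds ratios coincide, which gives a quick sanity check on the two functions.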
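Cumulative logit model with $k$ explanatory variables:

$$\mathrm{logit}[P(y \le j)] = \alpha_j + \beta_1 x_1 + \cdots + \beta_k x_k = \alpha_j + \beta' x, \quad j = 1, \ldots, c-1.$$

Model satisfies, for two settings $\mathbf{x}_1$ and $\mathbf{x}_2$ of the explanatory variables,

$$\log \frac{P(y \le j \mid \mathbf{x}_1)/P(y > j \mid \mathbf{x}_1)}{P(y \le j \mid \mathbf{x}_2)/P(y > j \mid \mathbf{x}_2)} = \beta'(\mathbf{x}_1 - \mathbf{x}_2)$$

For subject $i$,

$$P(y_i \le j) = \frac{\exp(\alpha_j + \beta' x_i)}{1 + \exp(\alpha_j + \beta' x_i)}$$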
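Suppose an underlying continuous response $y^*$ has cdf $G(y^* - \eta)$ with $\eta = \eta(x) = \beta' x$, and
thresholds (cutpoints) $-\infty = \alpha_0 < \alpha_1 < \ldots < \alpha_c = \infty$ such that

$$y = j \quad \text{if} \quad \alpha_{j-1} < y^* \le \alpha_j$$

Ex. earlier in notes, p. 6. Then (Figure 3.4, p. 54 of OrdCDA)

$$P(y \le j \mid x) = P(y^* \le \alpha_j \mid x) = G(\alpha_j - \beta' x)$$

Model: $G^{-1}[P(y \le j \mid x)] = \alpha_j - \beta' x$

Get cumulative logit model when $G$ = logistic cdf ($G^{-1}$ = logit).
So, the cumulative logit model fits well when a regression model holds for
an underlying logistic response.
Note: Model often expressed as

$$\mathrm{logit}[P(y \le j)] = \alpha_j - \beta' x.$$

(Software may use either parameterization. The fit is the same, and the estimates agree except for sign.)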
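The latent-variable construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the numerical cutpoints and effects are hypothetical, and the cdf G is passed in so that the logistic choice (cumulative logit) and the standard normal choice (cumulative probit) use the same code.

```python
import math


def logistic_cdf(z):
    """Standard logistic cdf; G^{-1} is then the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal cdf; G^{-1} is then the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cumulative_probs(cutpoints, beta, x, cdf=logistic_cdf):
    """P(y <= j | x) = G(alpha_j - beta'x) for j = 1, ..., c-1."""
    eta = sum(b * v for b, v in zip(beta, x))
    return [cdf(alpha - eta) for alpha in cutpoints]


def category_probs(cutpoints, beta, x, cdf=logistic_cdf):
    """P(y = j | x) obtained by differencing the cumulative probabilities."""
    cum = [0.0] + cumulative_probs(cutpoints, beta, x, cdf) + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]


# Hypothetical cutpoints and a single hypothetical effect beta = 0.8.
cutpoints = [-1.0, 0.5, 2.0]
beta = [0.8]
print(category_probs(cutpoints, beta, [0.0]))
print(category_probs(cutpoints, beta, [1.5]))
print(category_probs(cutpoints, beta, [1.5], cdf=normal_cdf))
```

In this parameterization a positive effect shifts probability toward the higher categories as x increases, because every cumulative probability P(y <= j | x) decreases.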
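For the model $\mathrm{logit}[P(Y_i \le j)] = \alpha_j + \beta' x_i$, with $y_{ij} = 1$ when subject $i$ falls in category $j$ and $y_{ij} = 0$ otherwise, the multinomial likelihood function is

$$\prod_{i=1}^{n} \prod_{j=1}^{c} P(Y_i = j \mid x_i)^{y_{ij}} = \prod_{i=1}^{n} \prod_{j=1}^{c} \left[ P(Y_i \le j \mid x_i) - P(Y_i \le j-1 \mid x_i) \right]^{y_{ij}}$$

$$= \prod_{i=1}^{n} \prod_{j=1}^{c} \left[ \frac{\exp(\alpha_j + \beta' x_i)}{1 + \exp(\alpha_j + \beta' x_i)} - \frac{\exp(\alpha_{j-1} + \beta' x_i)}{1 + \exp(\alpha_{j-1} + \beta' x_i)} \right]^{y_{ij}}$$

where $P(Y_i \le 0 \mid x_i) = 0$ and $P(Y_i \le c \mid x_i) = 1$.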
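A minimal sketch of the log of this multinomial likelihood, assuming the parameterization logit[P(Y <= j)] = alpha_j + beta'x and responses coded 1, ..., c. The function names and the tiny data set are hypothetical; in practice the maximization is left to statistical software.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_prob(j, alphas, beta, x):
    """P(Y = j | x), j = 1..c, as a difference of cumulative probabilities."""
    eta = sum(b * v for b, v in zip(beta, x))
    c = len(alphas) + 1
    upper = 1.0 if j == c else expit(alphas[j - 1] + eta)  # P(Y <= j)
    lower = 0.0 if j == 1 else expit(alphas[j - 2] + eta)  # P(Y <= j-1)
    return upper - lower


def log_likelihood(alphas, beta, xs, ys):
    """Sum over subjects of log P(Y_i = y_i | x_i)."""
    return sum(
        math.log(category_prob(y, alphas, beta, x)) for x, y in zip(xs, ys)
    )


# Hypothetical data: one predictor, three response categories.
xs = [[1.0], [2.0], [3.0], [4.0]]
ys = [1, 2, 3, 2]
print(log_likelihood([math.log(0.5), math.log(2.0)], [0.0], xs, ys))
```

With a zero effect and these cutpoints each category has probability 1/3, so the printed value equals 4 log(1/3), which is an easy hand check.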
$(\hat\beta_j - \beta_{j0})/SE$, or
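With $n_i = \sum_{j=1}^{c} n_{ij}$ multinomial observations at setting $i$ of the explanatory variables, the fitted values are

$$\hat\mu_{ij} = n_i \hat P(y = j), \quad j = 1, \ldots, c.$$

Pearson test statistic is

$$X^2 = \sum_{i,j} \frac{(n_{ij} - \hat\mu_{ij})^2}{\hat\mu_{ij}}.$$

The likelihood-ratio statistic is

$$G^2 = 2 \sum_{i,j} n_{ij} \log \frac{n_{ij}}{\hat\mu_{ij}}.$$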
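The two goodness-of-fit statistics are simple to compute once observed counts and fitted values are available. The sketch below is a minimal illustration: the function names are hypothetical and the counts are made up for the example, not taken from the tutorial.

```python
import math


def pearson_x2(observed, fitted):
    """X^2 = sum over cells of (n - mu)^2 / mu."""
    return sum(
        (n - mu) ** 2 / mu
        for row_n, row_mu in zip(observed, fitted)
        for n, mu in zip(row_n, row_mu)
    )


def deviance_g2(observed, fitted):
    """G^2 = 2 * sum over cells of n * log(n / mu); empty cells contribute 0."""
    return 2.0 * sum(
        n * math.log(n / mu)
        for row_n, row_mu in zip(observed, fitted)
        for n, mu in zip(row_n, row_mu)
        if n > 0
    )


# Hypothetical observed counts and fitted values with matching row totals.
observed = [[10, 20], [30, 40]]
fitted = [[12.0, 18.0], [28.0, 42.0]]
print(pearson_x2(observed, fitted), deviance_g2(observed, fitted))
```

Both statistics are zero when the fitted values reproduce the counts exactly, and they are usually close to each other when the fit is reasonable and the cell counts are not small.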
Group (x)    Death   Veget. State   Major Disab.   Minor Disab.   Good Recov.
Placebo        59         25             46             48             32
Low dose       48         21             44             47             30
Med dose       44         14             54             64             31
High dose      43      [missing]         49             58             41
The model $\mathrm{logit}[P(y \le j)] = \alpha_j + \beta x$ is fitted by ML.
Likelihood-ratio test of $H_0\colon \beta = 0$.
Parameter        Pr > ChiSq
Intercept 1        <.0001
Intercept 2        0.0417
Intercept 3        <.0001
Intercept 4        <.0001
dose               0.0018
3, P = 0.02).
Using ordinality often increases power (focused on df = 1).
y = mental impairment
(1=well, 2=mild impairment, 3=moderate impairment, 4=impaired)

Subj  Mental    SES  Life     Subj  Mental    SES  Life
  1   Well       1    1        21   Mild       1    9
  2   Well       1    9        22   Mild       0    3
  3   Well       1    4        23   Mild       1    3
  4   Well       1    3        24   Mild       1    1
  5   Well       0    2        25   Moderate   0    0
  6   Well       1    0        26   Moderate   1    4
  7   Well       0    1        27   Moderate   0    3
  8   Well       1    3        28   Moderate   0    9
  9   Well       1    3        29   Moderate   1    6
 10   Well       1    7        30   Moderate   0    4
 11   Well       0    1        31   Moderate   0    3
 12   Well       0    2        32   Impaired   1    8
 13   Mild       1    5        33   Impaired   1    2
 14   Mild       0    6        34   Impaired   1    7
 15   Mild       1    3        35   Impaired   0    5
 16   Mild       0    1        36   Impaired   0    4
 17   Mild       1    8        37   Impaired   0    4
 18   Mild       1    2        38   Impaired   1    8
 19   Mild       0    5        39   Impaired   0    8
 20   Mild       1    5        40   Impaired   0    9
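data impair;
input mental ses life;
datalines;
1 1 1
1 1 9
1 1 4
...
4 0 8
4 0 9
;
/* Cumulative logit model with PROC LOGISTIC */
proc logistic data=impair;
model mental = life ses ;
run;
/* Same model with PROC GENMOD: likelihood-ratio CIs and Type 3 tests */
proc genmod data=impair;
model mental = life ses / dist=multinomial link=clogit lrci type3;
run;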
OUTPUT FROM PROC GENMOD
Analysis Of Parameter Estimates

                                 Standard    Likelihood Ratio 95%
Parameter     DF    Estimate      Error       Confidence Limits      ChiSquare
Intercept1     1    -0.2819      0.6423      -1.5615     0.9839        0.19
Intercept2     1     1.2128      0.6607      -0.0507     2.5656        3.37
Intercept3     1     2.2094      0.7210       0.8590     3.7123        9.39
life           1    -0.3189      0.1210      -0.5718    -0.0920        6.95
ses            1     1.1112      0.6109      -0.0641     2.3471        3.31
e.g., the 95% likelihood-ratio confidence interval for the cumulative odds ratio for SES is $(e^{-0.064}, e^{2.347}) = (0.94, 10.45)$:
the odds of mental impairment below any particular point could be as much as 10.45 times as high for those with high SES compared
to those with low SES, for a given level of life events.
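The reported PROC GENMOD estimates can be turned into fitted cumulative probabilities by hand. The sketch below assumes the parameterization logit[P(y <= j)] = alpha_j + beta_life * life + beta_ses * ses and that ses = 1 denotes high SES; the helper name is hypothetical and the estimates are copied from the output above.

```python
import math

# Estimates copied from the PROC GENMOD output.
alphas = [-0.2819, 1.2128, 2.2094]
beta_life, beta_ses = -0.3189, 1.1112


def cumulative_probs(life, ses):
    """Fitted P(y <= j) for j = 1, 2, 3 at the given predictor values."""
    eta = beta_life * life + beta_ses * ses
    return [1.0 / (1.0 + math.exp(-(alpha + eta))) for alpha in alphas]


# At the mean life events score of 4.3:
low_ses = cumulative_probs(4.3, 0)
high_ses = cumulative_probs(4.3, 1)
print("P(y = well), low SES: ", round(low_ses[0], 2))
print("P(y = well), high SES:", round(high_ses[0], 2))

# Likelihood-ratio confidence limits for the SES effect, on the odds ratio scale.
ses_interval = (math.exp(-0.0641), math.exp(2.3471))
print(ses_interval)
```

Because the effects are common to all three cumulative logits, the same SES odds ratio applies at each cutpoint; only the intercept changes.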
(y
ex.: At mean life events of 4.3, P
(y
and P
(y
For high SES, P
$$r_{ij} = \frac{n_{ij} - \hat\mu_{ij}}{SE} \quad \text{or} \quad \frac{\sum_{k=1}^{j} n_{ik} - \sum_{k=1}^{j} \hat\mu_{ik}}{SE}$$
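$$X^2 = \sum_{i,j} \frac{(n_{ij} - \hat\mu_{ij})^2}{\hat\mu_{ij}}$$
with
$H_0\colon \beta = 0$ against $H_a\colon \beta \ne 0$ in cumulative logit model, $\mathrm{logit}[P(y \le j)] = \alpha_j + \beta x_i$,
with scores $\{x_i = i\}$ for rows (i.e., using orderings, df = 1)?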
r x c       Pearson X^2    Ordinal test
2 x 3          0.23           0.31
4 x 4          0.16           0.42
6 x 6          0.12           0.48
10 x 10        0.09           0.51
$$n = \frac{12\,(z_{\alpha/2} + z_{\beta})^2}{\beta_0^2 \left(1 - \sum_j \pi_j^3\right)},$$
Setting {j
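The sample size formula n = 12 (z_{alpha/2} + z_beta)^2 / [beta_0^2 (1 - sum_j pi_j^3)] is easy to evaluate numerically. The sketch below is a minimal illustration: the function name is hypothetical, z_beta is taken to be the standard normal quantile for the desired power (that is, beta in z_beta is the type II error probability), and the inputs in the example are made-up values.

```python
from statistics import NormalDist


def ordinal_sample_size(beta0, marginal_probs, alpha=0.05, power=0.90):
    """Evaluate n = 12 (z_{alpha/2} + z_beta)^2 / [beta0^2 (1 - sum pi_j^3)]."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    spread = 1.0 - sum(p ** 3 for p in marginal_probs)
    return 12.0 * (z_alpha + z_beta) ** 2 / (beta0 ** 2 * spread)


# Hypothetical effect size (log cumulative odds ratio) and marginal probabilities.
n_four = ordinal_sample_size(0.5, [0.25, 0.25, 0.25, 0.25])
n_two = ordinal_sample_size(0.5, [0.5, 0.5])
print(round(n_four), round(n_two))
```

The term 1 - sum_j pi_j^3 grows as the response spreads over more categories, so collapsing an ordinal scale to two categories raises the required n for the same effect size.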
$$\mathrm{logit}[P(y_i \le j)] = \alpha_j + \beta_j x_i, \quad j = 1, \ldots, c-1.$$
Effect estimators using simple model are biased but may have
smaller MSE than estimators from more complex model, and
y = Religious Beliefs

x = Region        Fundamentalist    Moderate      Liberal
Northeast            92 (14%)       352 (52%)    234 (34%)
Midwest             274 (27%)       399 (40%)    326 (33%)
South               739 (44%)       536 (32%)    412 (24%)
West/Mountain       192 (20%)       423 (44%)    355 (37%)

$$\mathrm{logit}[P(y \le j)] = \alpha_j + \beta_1 r_1 + \beta_2 r_2 + \beta_3 r_3$$
                                  Standard    Likelihood Ratio 95%
Parameter      DF    Estimate      Error       Confidence Limits      ChiSquare
Intercept1      1    -1.2618      0.0632      -1.3863    -1.1383       398.10
Intercept2      1     0.4729      0.0603       0.3548     0.5910        61.56
region    1     1    -0.0698      0.0901      -0.2466     0.1068         0.60
region    2     1     0.2688      0.0830       0.1061     0.4316        10.48
region    3     1     0.8897      0.0758       0.7414     1.0385       137.89
region    4     0     0.0000      0.0000       0.0000     0.0000          .
= 0)
1
be fundamentalist (see
1
less likely to be liberal (see
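$$\mathrm{logit}[P(y_i \le j)] = \alpha_j + \beta' x_i + \gamma_j' u_i, \quad j = 1, \ldots, c-1.$$

$$\mathrm{logit}[P(y \le j)] = \frac{\alpha_j - \beta' x}{\exp(\gamma' x)}.$$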
status
Smoker         350 (23%)    307 (20%)    345 (22%)    481 (31%)    67 (4%)
Non-smoker     334 (45%)     99 (13%)    117 (16%)    159 (22%)    30 (4%)

$$\mathrm{logit}[P(y \le j)] = \alpha_j + \beta_1 x + (j - 1)\beta_2 x.$$
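$$\log \theta^{C}_{12} = \beta_1 + \beta_2, \quad \log \theta^{C}_{13} = \beta_1 + 2\beta_2, \quad \log \theta^{C}_{14} = \beta_1 + 3\beta_2.$$

$$\log \hat\theta^{C}_{12} = 0.72, \quad \log \hat\theta^{C}_{13} = 0.42, \quad \log \hat\theta^{C}_{14} = 0.12.$$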
$L(\pi) = X\beta$
for probabilities or expected frequencies in a contingency
table, where L is a general link function (Lang 2005).
$$P(y_i = j) = \frac{\exp(\alpha_j + \beta_j' x_i)}{1 + \sum_{k=1}^{c-1} \exp(\alpha_k + \beta_k' x_i)}, \quad j = 1, \ldots, c-1$$
$$\log \frac{P(y = j \mid x)}{P(y = j+1 \mid x)} = \alpha_j + \beta_j' x$$
with
$$\beta_j = \Sigma^{-1}(\mu_j - \mu_{j+1})$$
Equally-spaced means imply ACL model holds with same
effects for each logit.
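The baseline-category logits are

$$\log \frac{\pi_1}{\pi_c}, \; \log \frac{\pi_2}{\pi_c}, \; \ldots, \; \log \frac{\pi_{c-1}}{\pi_c}.$$

Since

$$\log \frac{\pi_j}{\pi_c} = \log \frac{\pi_j}{\pi_{j+1}} + \log \frac{\pi_{j+1}}{\pi_{j+2}} + \cdots + \log \frac{\pi_{c-1}}{\pi_c},$$

the ACL model

$$\log \frac{\pi_j(x)}{\pi_{j+1}(x)} = \alpha_j + \beta' x$$

is equivalent to

$$\log \frac{\pi_j(x)}{\pi_c(x)} = \sum_{k=j}^{c-1} \alpha_k + \beta'(c - j)x = \alpha_j^* + \beta' u_j$$

with adjusted predictor $u_j = (c - j)x$.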
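The equivalence between the adjacent-categories logit (ACL) model and a baseline-category logit model with adjusted predictor u_j = (c - j)x can be checked numerically. The sketch below assumes a single predictor; the function name and parameter values are hypothetical.

```python
import math


def acl_category_probs(alphas, beta, x):
    """Category probabilities implied by log(pi_j / pi_{j+1}) = alpha_j + beta*x.

    Built through the baseline-category form
    log(pi_j / pi_c) = (alpha_j + ... + alpha_{c-1}) + beta * (c - j) * x.
    """
    c = len(alphas) + 1
    baseline_logits = []
    for j in range(1, c):                    # categories 1, ..., c-1
        alpha_star = sum(alphas[j - 1:])     # alpha_j + ... + alpha_{c-1}
        u_j = (c - j) * x                    # adjusted predictor
        baseline_logits.append(alpha_star + beta * u_j)
    denom = 1.0 + sum(math.exp(v) for v in baseline_logits)
    return [math.exp(v) / denom for v in baseline_logits] + [1.0 / denom]


# Hypothetical parameter values for a four-category response.
alphas, beta, x = [0.3, -0.2, 0.5], 0.4, 1.5
probs = acl_category_probs(alphas, beta, x)
for j in range(len(alphas)):
    # Each adjacent-categories logit should equal alpha_j + beta * x.
    print(math.log(probs[j] / probs[j + 1]), alphas[j] + beta * x)
```

This is why software for baseline-category logit models can fit the ACL model once the predictors are recoded to u_j.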
                               Definitely   Probably    Probably    Definitely
Gender    Beliefs                 Fund        Fund      Not Fund     Not Fund
Female    Fundamentalist        34 (22%)    67 (43%)    30 (19%)     25 (16%)
          Moderate              41 (25%)    83 (52%)    23 (14%)     14 (9%)
          Liberal               58 (39%)    63 (43%)    15 (10%)     12 (8%)
Male      Fundamentalist        21 (19%)    52 (46%)    24 (21%)     15 (13%)
          Moderate              30 (27%)    52 (47%)    18 (16%)     11 (10%)
          Liberal               64 (45%)    50 (36%)    16 (11%)     11 (8%)

$$\log(\pi_j/\pi_{j+1}) = \alpha_j + \beta_1 x + \beta_2 g$$
is equivalent to BCL model
= 0.488, SE = 0.080),
1 /SE)
significance similar, with (
table with ordered rows and columns, ACL model for row scores u i ,
= E(nij )},
$\log \mu_{ij} = \lambda + \lambda_i^x + \lambda_j^y + \beta u_i v_j$,
with {vj
ij (indep)]
log ij (indep)]
(Goodman 1986)
$$\omega_j = P(y = j \mid y \ge j) = \frac{\pi_j}{\pi_j + \cdots + \pi_c}$$

Then

$$\log \frac{\pi_j}{\pi_{j+1} + \cdots + \pi_c} = \alpha_j + \beta' x, \quad j = 1, \ldots, c-1,$$
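Continuation-ratio logits determine the category probabilities sequentially: the conditional probability of stopping at category j, given y >= j, is the inverse logit of alpha_j + beta*x. A minimal sketch with one predictor follows; the function name and the parameter values are hypothetical.

```python
import math


def continuation_ratio_probs(alphas, beta, x):
    """Category probabilities from log[pi_j / (pi_{j+1}+...+pi_c)] = alpha_j + beta*x."""
    probs = []
    remaining = 1.0                           # P(y >= j), starts at P(y >= 1) = 1
    for alpha in alphas:
        odds = math.exp(alpha + beta * x)     # pi_j / (pi_{j+1} + ... + pi_c)
        omega = odds / (1.0 + odds)           # P(y = j | y >= j)
        probs.append(remaining * omega)
        remaining *= 1.0 - omega
    probs.append(remaining)                   # last category takes what is left
    return probs


# Hypothetical parameter values for a three-category response.
cr_probs = continuation_ratio_probs([-0.5, 0.7], -0.5, 1.0)
print(cr_probs)
print(math.log(cr_probs[0] / (cr_probs[1] + cr_probs[2])))  # alpha_1 + beta*x
print(math.log(cr_probs[1] / cr_probs[2]))                  # alpha_2 + beta*x
```

Each logit involves a separate binary split of the data, which is why such models can be fitted with ordinary binary logistic regression software applied to stacked strata.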
Carrier    Not enlarged    Enlarged     Greatly Enlarged
Yes          19 (26%)      29 (40%)        24 (33%)
No          497 (37%)     560 (42%)       269 (20%)

$$\log \frac{\pi_1}{\pi_2 + \pi_3} = \alpha_1 + \beta x, \quad \log \frac{\pi_2}{\pi_3} = \alpha_2 + \beta x$$
has
e.g., given that tonsils were enlarged, for carriers, estimated odds
of response enlarged rather than greatly enlarged were 0.59 times
estimated odds for non-carriers.
By contrast, cumulative logit model estimates
                                  Standard    Likelihood Ratio 95%
Parameter      DF    Estimate      Error       Confidence Limits      ChiSquare
Intercept       1     0.7322      0.0729       0.5905     0.8762       100.99
stratum   1     1    -1.2432      0.0907      -1.4220    -1.0662       187.69
stratum   2     0     0.0000      0.0000       0.0000     0.0000          .
carrier         1    -0.5285      0.1979      -0.9218    -0.1444         7.13
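The carrier estimate in this output translates directly into the odds ratio quoted in the text. The sketch below exponentiates the estimate and its likelihood-ratio limits, and also rebuilds the fitted distribution for non-carriers. It assumes, as an interpretation of the output rather than something stated in the source, that the first continuation-ratio logit has intercept Intercept + stratum 1 and the second has intercept Intercept; variable names are hypothetical.

```python
import math

# Estimates copied from the output above.
intercept, stratum1, carrier = 0.7322, -1.2432, -0.5285
lower, upper = -0.9218, -0.1444

odds_ratio = math.exp(carrier)
odds_ratio_interval = (math.exp(lower), math.exp(upper))
print(round(odds_ratio, 2), odds_ratio_interval)

# Assumed stratum coding: alpha_1 = intercept + stratum1, alpha_2 = intercept.
alpha1, alpha2 = intercept + stratum1, intercept


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


# Non-carriers (x = 0): sequential probabilities from the two logits.
omega1, omega2 = expit(alpha1), expit(alpha2)
noncarrier_probs = [omega1, (1 - omega1) * omega2, (1 - omega1) * (1 - omega2)]
print([round(p, 3) for p in noncarrier_probs])
```

Under that assumed coding the fitted non-carrier distribution is close to the sample percentages of 37%, 42% and 20%, which supports the reading of the stratum parameters.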
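Can fit general ACL model with software for BCL model,
converting its $\{\hat\beta_j^*\}$ estimates to $\hat\beta_j = \hat\beta_j^* - \hat\beta_{j+1}^*$, since

$$\log \frac{\pi_j}{\pi_{j+1}} = \log \frac{\pi_j}{\pi_c} - \log \frac{\pi_{j+1}}{\pi_c}.$$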
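Baseline-category logit model:

$$\log \frac{\pi_j}{\pi_c} = \alpha_j + \beta_j' x, \quad j = 1, \ldots, c-1.$$

Stereotype model:

$$\log \frac{\pi_j}{\pi_c} = \alpha_j + \phi_j \beta' x, \quad j = 1, \ldots, c-1.$$

$$P(y_i = j) = \frac{\exp(\alpha_j + \phi_j \beta' x_i)}{1 + \sum_{k=1}^{c-1} \exp(\alpha_k + \phi_k \beta' x_i)}$$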
$$\Phi^{-1}[P(y \le j)] = \alpha_j + \beta' x, \quad j = 1, \ldots, c-1$$

e.g., $P(y \le j) = 1/2$ when $\alpha_j + \beta' x = 0$,
since $\Phi(0) = 1/2 = P(\text{standard normal r.v.} < 0)$
Properties
Highest degree           Fundamentalist     Moderate        Liberal
Less than high school     4913 (43%)       4684 (41%)     1905 (17%)
High school               8189 (32%)      11196 (44%)     6045 (24%)
Junior college             728 (29%)       1072 (43%)      679 (27%)
Bachelor                  1304 (20%)       2800 (43%)     2464 (38%)
Graduate                   495 (16%)       1193 (39%)     1369 (45%)
j)] = j + xi
----------------------------------------------------------------------
data religion;
input degree religion count;
datalines;
0 1 4913
0 2 4684
0 3 1905
1 1 8189
1 2 11196
1 3 6045
...
4 3 1369
;
proc logistic; weight count;
model religion = degree / link=probit aggregate scale=none;
run;
----------------------------------------------------------------------
Score Test for the Equal Slopes Assumption
Chi-Square     DF     Pr > ChiSq
  0.2452        1       0.6205

Deviance and Pearson Goodness-of-Fit Statistics
Criterion      Value      DF    Value/DF    Pr > ChiSq
Deviance      48.7072      7     6.9582       <.0001
Pearson       48.9704      7     6.9958       <.0001

Model Fit Statistics
               Intercept      Intercept and
Criterion        Only          Covariates
AIC            105532.77       103395.09
SC             105534.19       103397.21
-2 Log L       105528.77       103389.09

                              Standard        Wald
Parameter     DF   Estimate     Error      Chi-Square    Pr > ChiSq
Intercept 1    1   -0.2240     0.00799       785.6659      <.0001
Intercept 2    1    0.9400     0.00868     11736.5822      <.0001
degree         1   -0.2059     0.00447      2120.0908      <.0001
-----------------------------------------------------------------------
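The cumulative probit estimates can be converted into fitted category probabilities with the standard normal cdf. The sketch below assumes the parameterization Phi^{-1}[P(y <= j)] = alpha_j + beta * degree, with degree scored 0 to 4 as in the data step; the function name is hypothetical and the estimates are copied from the output above.

```python
from statistics import NormalDist

# Estimates copied from the PROC LOGISTIC output (link=probit).
alphas = [-0.2240, 0.9400]
beta_degree = -0.2059


def fitted_probs(degree):
    """Fitted P(y = 1), P(y = 2), P(y = 3) at a given degree score."""
    phi = NormalDist().cdf
    cum = [0.0] + [phi(alpha + beta_degree * degree) for alpha in alphas] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(3)]


for degree in range(5):
    print(degree, [round(p, 3) for p in fitted_probs(degree)])
```

Comparing these fitted values with the sample percentages in the table is an informal way to see where the lack of fit signalled by the deviance arises.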
Goodness of fit?
Cumulative probit: Deviance = 48.7 (df = 7)
Cumulative logit: Deviance = 45.4 (df = 7)
1
Probit
= 4)
                    Males               Females
Life Length     White    Black      White    Black
0-20              1.3      2.6        0.9      1.8
20-40             2.8      4.9        1.3      2.4
40-50             3.2      5.6        1.9      3.7
50-65            12.2     20.1        8.0     12.9
Over 65          80.5     66.8       87.9     79.2
data lifetab;
input sex $ race $ age count;
datalines;
m w 20 13
f w 20 9
m b 20 26
f b 20 18
m w 40 28
f w 40 13
m b 40 49
f b 40 24
m w 50 32
f w 50 19
m b 50 56
f b 50 37
m w 65 122
f w 65 80
m b 65 201
f b 65 129
m w 100 805
f w 100 879
m b 100 668
f b 100 792
;
proc logistic; weight count; class sex race / param=ref;
model age = sex race / link=cloglog aggregate scale=none;
run;
proc genmod; weight count; class sex race;
model age = sex race / dist=multinomial link=CCLL lrci type3 obstats;
run;
----------------------------------------------------------------------------
The GENMOD Procedure
Parameter          ChiSquare
Intercept1           991.04
Intercept2          1226.85
Intercept3          1143.60
Intercept4           596.43
sex        f          58.57
sex        m            .
race       b          74.20
race       w            .
1 = 0.538, 2 = 0.611
Gender effect:
= 1.71 power.
= 1.84 power.
If denotes odds of living longer than some fixed time for white
women, then estimated odds of living longer than that time are
model.
Models (b) and (c) require separate parameters for the effects
of explanatory variables on the mixture probability and in the
loglinear model.
The first category is the zero outcome and each other count
outcome is a separate category
312
30
11
278
39
20
Total
590
69
31
effects ui
Carrier    Not enlarged    Enlarged     Greatly Enlarged
Yes          19 (26%)      29 (40%)        24 (33%)
No          497 (37%)     560 (42%)       269 (20%)

$$\log \frac{\pi_1}{\pi_2 + \pi_3} = \alpha_1 + \beta x, \quad \log \frac{\pi_2}{\pi_3} = \alpha_2 + \beta x$$
the logit has the same prior variability for each logit.
Prior Distribution          Mean       Std Dev     Interval
Normal ($\sigma$ = 1000)   -0.533       0.199      (-0.926, -0.146)
Normal ($\sigma$ = 1.0)    -0.518       0.194      (-0.902, -0.141)
ML                         -0.5285      0.198      (-0.922, -0.144)
Note: HPD credible interval ok for model parameters, but not sensible for
odds ratios $e^{\beta}$.
Summary
methods to allow for the ties that occur with ordinal data.
SAS
R (and S-Plus)
www2.warwick.ac.uk/fac/sci/statistics/staff/research/turner/gnm/
Stata
Modelling Agreement
Latent trait models e.g., Uebersax and Grove (1993), Yang and Becker
(1997)
Loglinear and association models Agresti (1988), Becker (1989, 1990),
Becker and Agresti (1992), Schuster and von Eye (2001), Valet,
Guinot, and Mary (2007)
Random effects Williamson and Manatunga (1997)
ROC and related methods Toledano and Gatsonis (1996), Ishwaran
and Gatsonis (2000)
Measures of agreement Banerjee et al. (1999), Roberts and McNamee
(2005)
Multivariate Models
Marginal models Heagerty and Zeger (1996), Lipsitz, Kim, and Zhao
(1994), Molenberghs and Lesaffre (1994, 1999), Lang, McDonald, and
Smith (1999), Lumley (1996), Stram, Wei, and Ware (1988), Ten Have,
Landis, and Hartzel (1998)
Random effects models Ezzet and Whitehead (1991), Hartzel, Agresti,
and Caffo (2001), Hedeker and Gibbons (1994), Tutz and Hennevogl
(1996), Liu and Hedeker (2006), Ten Have et al. (2000), Xie, Simpson,
and Carroll (2000)
Multilevel models Fielding and Lang (2005), Gibbons and Hedeker
(1997), Grilli and Rampichini (2003, 2007), Steele and Goldstein
(2006), Zaslavsky and Bradlow (1999)
Brunner (2003), Brunner and Puri (2001, 2002), Rayner and Best
(2001)
Nonparametric random effects Hartzel, Agresti, and Caffo (2001)
Bayesian Inference
Modelling an ordinal response Lang (1999), Johnson and Albert (1999),
Hoff (2009, Ch. 12)
Multivariate ordinal responses, hierarchical models Albert and Chib
(1993, 2001), Bradlow and Zaslavsky (1999), Cowles et al. (1996),
Kaciroti et al. (2006), Qu and Tan (1998)
Association models Iliopoulos, Kateri, and Ntzoufras (2007), Kateri,
Nicolaou, and Ntzoufras (2005)
Case-control analyses with an ordinal response Mukherjee et al. (2007,
2008), Mukherjee and Liu (2008)
Small-Sample Inference
Exact tests of independence and conditional independence for ordinal
variables Agresti, Mehta and Patel (1990 and linear-by-linear option
in StatXact), Kim and Agresti (1997), Agresti and Coull (1998)
Improved tests from a decision-theoretic perspective Cohen and
Sackrowitz (1991), Berger and Sackrowitz (1997)
Higher-order approximations such as the saddlepoint essentially exact for
single-parameter inference Pierce and Peters (1992), Agresti, Lang,
Goodness of Fit
Chi-squared statistics inappropriate with continuous predictors or highly
sparse data
Generalization of Hosmer-Lemeshow statistic Lipsitz, Fitzmaurice,
Molenberghs (1996)
For cumulative logit models, can test proportional odds assumption
Brant (1990), Peterson and Harrell (1990)
Goodness-of-link testing Genter and Farewell (1985)
Missing Data
Accounting for drop out Molenberghs, Kenward, and Lesaffre (1997),
Ten Have et al. (2000)
Score test of independence in two-way tables with ordered categories
and extensions for stratified data Lipsitz and Fitzmaurice (1996)
Comparison of likelihood-based and GEE methods for repeated ordinal
responses Mark and Gail (1994), Kenward, Lesaffre, and
Molenberghs (1994)
Order-Restricted Inference
Estimate cell proportions (and conduct tests) assuming solely that a type
of ordinal log odds ratio is uniformly nonnegative
Local odds ratios Patefield (1982), Dykstra et al. (1995), Agresti and
Coull (1998)
Cumulative odds ratios Grove (1980), Robertson and Wright (1981),
Cohen and Sackrowitz (1996), Evans et al. (1997)
Order restrictions on parameters in association models Agresti, Chuang
and Kezouh (1987), Galindo-Garre and Vermunt (2004), Iliopoulos,
Kateri, and Ntzoufras (2007), Ritov and Gilula (1991, 1993)
Marginal modeling Bartolucci et al. (2001), Bartolucci and Forcina
(2002), Colombi and Forcina (2001)
Some Books
Agresti, A. 2010. Analysis of Ordinal Categorical Data, 2nd ed. Wiley.
Clogg, C. C., and E. S. Shihadeh. 1994. Statistical Models for Ordinal Variables. Sage.
Fahrmeir, L., and G. Tutz. 2001. Multivariate Statistical Modelling Based on Generalized Linear Models, 2nd ed. Springer-Verlag.
Johnson, V. E., and J. H. Albert. 1999. Ordinal Data Modeling. Springer.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models, 2nd ed. (1st ed. 1983). London: Chapman and Hall.