Conditional Logistic Regression Models
Barbara McKnight
BIOST/EPI 536
December 1, 2009
OUTLINE
Two kinds of matching
Data where there is bias in ordinary logistic regression
Relationship between matching and the need for conditional logistic regression
How CLR is different from ordinary LR
Relationship of CLR to "matched analysis"
Advantages of CLR
Example
Efficiency of CLR
BIOST/EPI 536 Autumn 2009
B. McKnight
CONDITIONAL LOGISTIC REGRESSION
LOGISTIC REGRESSION WITH A CONDITIONAL LIKELIHOOD
For matched or finely stratified data where there are many "nuisance" parameters.
Conditional logistic regression should not be thought of as a "matched analysis":
- Not all matched data require conditional logistic regression.
- Some data that are not matched require conditional logistic regression.
CONDITIONAL LOGISTIC REGRESSION: WHY?
Two kinds of matching:
1. Matching on the values of a few measured variables
Example: age, county of residence
Nothing is inherently similar about cases and controls in the same matched set.
Several matched sets might have identical values of the matching variables. These similar matched sets should be pooled in the analysis.
Adjustment could in some cases be made using continuous variables rather than stratification.
(Matching on a few measured variables, continued...)
$$\text{logit}(p) = \alpha + \alpha_2 x_{s2} + \cdots + \alpha_m x_{sm} + \beta_1 x_1 + \cdots + \beta_k x_k$$
m = number of confounder adjustment variables (matching-stratum indicators, continuous covariate adjustments, or some of each)
m is small relative to the overall sample size n.
2. Matching based on unmeasurable similarities:
Examples: twin controls, sibling controls, friend controls, same-medical-practice controls.
- Case(s) and control(s) in each matched set are inherently similar.
- Each matched set is different, so the sets cannot be pooled.
- The only way to adjust for the matching factors is by stratification on matched set.
m is not small relative to the overall sample size n:
$$\text{logit}(p) = \alpha + \alpha_2 x_{s2} + \cdots + \alpha_m x_{sm} + \beta_1 x_1 + \cdots + \beta_k x_k$$
m = number of matched sets, each with, say, M controls and 1 case, so
$$m = \frac{n}{M+1} \to \infty \quad \text{as } n \to \infty.$$
Other possibilities:
1. Matching based on many measured variables
2. Matched or unmatched data where stratification adjustment is necessary for many variables
Logistic regression (unconditional) breaks down when the number of nuisance parameters is large.
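For the ordinary (unconditional) logistic regression estimate $\hat\beta$,
$$\hat\beta \not\to \beta \quad \text{as } n \to \infty$$
when the number of nuisance parameters grows big with n.
For matched pairs with a dichotomous exposure (M = 1):
$$\hat\beta \to 2\beta \quad \text{and} \quad \widehat{OR} \to OR^2 \quad \text{as } n \to \infty.$$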
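The doubling can be checked numerically. The following is a minimal Python sketch, assuming 1:1 matched pairs and a single binary exposure; the function names and the counts `n10` (pairs in which only the case is exposed) and `n01` (pairs in which only the control is exposed) are hypothetical. It fits the ordinary (unconditional) model with one intercept per pair, maximizing each intercept out by golden-section search, and compares the result with the conditional-likelihood estimate.

```python
import math

def log_expit(u):
    """log(1 / (1 + exp(-u))), computed with log1p for accuracy."""
    return -math.log1p(math.exp(-u))

def golden_max(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2

def pair_profile_loglik(beta, case_exposed):
    """Unconditional log-likelihood of one discordant pair, maximized over
    the pair's own intercept alpha (a nuisance parameter)."""
    if case_exposed:   # case x = 1, control x = 0
        f = lambda a: log_expit(a + beta) + log_expit(-a)
    else:              # case x = 0, control x = 1
        f = lambda a: log_expit(a) + log_expit(-(a + beta))
    return f(golden_max(f, -20.0, 20.0))

def fit_matched_pairs(n10, n01):
    """Return (conditional MLE, unconditional MLE) of beta.
    Concordant pairs are omitted: once their own intercept is maximized out,
    their contribution does not depend on beta in either likelihood."""
    cond = lambda b: n10 * log_expit(b) + n01 * log_expit(-b)
    uncond = lambda b: (n10 * pair_profile_loglik(b, True)
                        + n01 * pair_profile_loglik(b, False))
    return golden_max(cond, -5.0, 5.0), golden_max(uncond, -5.0, 5.0, tol=1e-7)

b_cond, b_uncond = fit_matched_pairs(n10=30, n01=10)
print(round(b_cond, 4), round(b_uncond, 4))   # about log(3) and 2 * log(3)
```

With 30 and 10 discordant pairs, the conditional estimate is about log 3 and the unconditional one is about twice that, illustrating why the unconditional fit is biased here.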
CONDITIONAL LOGISTIC REGRESSION: BIAS
                   Many nuisance parameters          Few nuisance parameters
Matched data       Ordinary LR estimates biased*     Ordinary LR estimates OK
Unmatched data     Ordinary LR estimates biased*     Ordinary LR estimates OK

* Can obtain unbiased estimates of coefficients of interest using conditional logistic regression (a different estimation procedure, which maximizes the conditional likelihood).
PERFORMING CONDITIONAL LOGISTIC REGRESSION
Divide the data into small strata based on combinations of the values of the nuisance variables (in Stata, for example, with egen and its group() function).
Create a variable to indicate the stratum to which each observation belongs.
Use the clogit command in Stata to fit the model:
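The stratum variable just numbers the distinct combinations of the nuisance variables. Below is a minimal Python sketch of the same idea, assuming records are dictionaries; `group_ids`, `agegrp` and `county` are hypothetical names, and numbering the combinations in sorted order is our choice (intended to resemble egen group(), but check the Stata documentation before relying on identical numbering).

```python
def group_ids(rows, keys):
    """Number each distinct combination of the nuisance variables in `keys`
    1, 2, ... (in sorted order of the combinations) and return each row's number."""
    combos = sorted({tuple(row[k] for k in keys) for row in rows})
    number = {combo: i + 1 for i, combo in enumerate(combos)}
    return [number[tuple(row[k] for k in keys)] for row in rows]

# Hypothetical records: stratify on age group and county of residence.
rows = [
    {"agegrp": 2, "county": "A"},
    {"agegrp": 1, "county": "B"},
    {"agegrp": 2, "county": "A"},
    {"agegrp": 1, "county": "A"},
]
print(group_ids(rows, ["agegrp", "county"]))   # [3, 2, 3, 1]
```

Rows 1 and 3 share a combination and therefore a stratum number; every stratum gets its own nuisance parameter in the model.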
$$\text{logit}(p) = \underbrace{\alpha + \alpha_2 x_{s2} + \cdots + \alpha_m x_{sm}}_{\text{indicators for many small strata; not of interest per se}} + \underbrace{\beta_1 x_1 + \cdots + \beta_k x_k}_{\text{variables of interest and any additional adjustment variables}}$$
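PERFORMING CONDITIONAL LOGISTIC REGRESSION: PRACTICAL DIFFERENCES
Specify the stratum variable separately from the covariates in Stata (the group() option of clogit).
The usual large-sample results hold for the maximum conditional likelihood estimates:
$$\hat\beta_k \to \beta_k \quad \text{as } n \to \infty$$
$$\text{distribution of } \frac{\hat\beta_k - \beta_k}{\widehat{se}(\hat\beta_k)} \to N(0, 1) \quad \text{as } n \to \infty$$
$$\text{distribution of } [2 \log L(\text{full model})] - [2 \log L(\text{reduced model})] \to \chi^2_{K-p} \quad \text{as } n \to \infty$$
under $H_0$: reduced model (K and p are the numbers of coefficients in the full and reduced models).
The nuisance parameters $\alpha, \alpha_2, \ldots, \alpha_m$ are not estimated, but they are in the model.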
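As an illustration of the normal approximation, the sketch below rebuilds the z statistic and 95% confidence interval printed for `gall` in the first clogit fit of the Leisure World example. It assumes that with the `or` option Stata reports se(OR) = OR × se(β̂) (delta method) and forms the interval on the β scale before exponentiating; `wald_from_or` is a hypothetical helper, not a Stata function.

```python
import math

Z975 = 1.959963984540054  # 97.5th percentile of N(0, 1)

def wald_from_or(or_hat, se_or):
    """Rebuild the Wald z statistic and 95% CI from a reported odds ratio and
    its standard error, assuming se(OR) = OR * se(beta-hat)."""
    beta = math.log(or_hat)
    se_beta = se_or / or_hat
    z = beta / se_beta                     # approximately N(0, 1) under H0: beta = 0
    ci = (math.exp(beta - Z975 * se_beta), math.exp(beta + Z975 * se_beta))
    return z, ci

# 'gall' row of the first clogit fit in the Leisure World example
z, (lo, hi) = wald_from_or(3.548028, 1.345318)
print(round(z, 2), round(lo, 3), round(hi, 3))   # 3.34 1.687 7.46
```

The reproduced values agree with the printed z = 3.34 and interval [1.687463, 7.46002] up to rounding of the reported OR and standard error.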
DIGRESSION* (HEURISTIC EXPLANATION)
Consider the first stratum. Say it has one case and two controls, and there is a single, dichotomous exposure variable. Label the subjects $i = 1, 2, 3$.
Let
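$$X_{Di} = \begin{cases} 1 & \text{subject } i \text{ is the case} \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, 2, 3$$
$$X_{Ei} = \begin{cases} 1 & \text{subject } i \text{ is exposed} \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, 2, 3$$
* Not on the final.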
DIGRESSION (continued)
If the data consisted only of this stratum, we would maximize the (unconditional) likelihood to fit
$$\text{logit}\, p = \alpha + \beta x_E,$$
where $\alpha$ is the stratum-specific intercept.
Pretend the data are as follows:
i   x_Di   x_Ei
1   1      1
2   0      1
3   0      0
The (unconditional) likelihood takes
$$\Pr(X_{D1}=1, X_{D2}=0, X_{D3}=0 \mid X_{E1}=1, X_{E2}=1, X_{E3}=0;\ \alpha, \beta;\ \text{all in stratum } 1)$$
and views it as a function of the parameters:
$$L(\alpha, \beta) = \Pr(X_{D1}=1, X_{D2}=0, X_{D3}=0 \mid X_{E1}=1, X_{E2}=1, X_{E3}=0;\ \text{stratum } 1).$$
That is, it equals
$$L(\alpha, \beta) = \frac{e^{\alpha+\beta}}{1+e^{\alpha+\beta}} \cdot \frac{1}{1+e^{\alpha+\beta}} \cdot \frac{1}{1+e^{\alpha}}.$$
The (unconditional) MLEs are the values of $\alpha$ and $\beta$ that maximize this function.
Note that the probability above depends on:
- the underlying disease risk (case probability) in the stratum ($\alpha$)
- the association between disease incidence and exposure ($\beta$)

DIGRESSION (continued)
The conditional likelihood takes the probability that it is the first subject who is diseased, given that there is exactly one diseased subject in stratum 1:
$$\Pr(X_{D1}=1, X_{D2}=0, X_{D3}=0 \mid X_{E1}=1, X_{E2}=1, X_{E3}=0;\ \alpha, \beta;\ \text{stratum } 1;\ X_{D1}+X_{D2}+X_{D3}=1)$$
With $p_i = e^{\alpha+\beta x_{Ei}}/(1+e^{\alpha+\beta x_{Ei}})$, each outcome with exactly one case has probability $\prod_i (1-p_i)$ times the odds $e^{\alpha+\beta x_{Ej}}$ of the subject $j$ who is the case, so that product cancels and the probability equals
$$\frac{e^{\alpha+\beta}}{e^{\alpha+\beta} + e^{\alpha+\beta} + e^{\alpha}} = \frac{e^{\beta}}{2e^{\beta} + 1}.$$
This depends on the association between risk and exposure ($\beta$) but not on the underlying disease risk (case probability) in stratum 1 ($\alpha$). The conditional likelihood views it as a function of the parameters:
$$L_c(\beta \mid \sim) = \frac{e^{\beta}}{2e^{\beta} + 1}.$$
Maximizing $L_c(\beta \mid \sim)$ with respect to $\beta$ gives the MCLE (maximum conditional likelihood estimate) of $\beta$.
Since $L_c(\beta \mid \sim)$ does not depend on $\alpha$, maximizing it cannot provide an estimate of $\alpha$.
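A numerical check of the digression, as a minimal sketch using the pretend data (subject 1 is the case; exposures 1, 1, 0); the function names are ours. The unconditional likelihood changes with α, while the conditional likelihood is the same for every α and equals e^β / (2e^β + 1).

```python
import math

def expit(u):
    return 1.0 / (1.0 + math.exp(-u))

X_E = [1, 1, 0]   # exposures of subjects 1, 2, 3 (the pretend data)
CASE = 0          # subject 1 is the case

def prob_outcome(alpha, beta, case):
    """Pr(only subject `case` is diseased), independent Bernoulli outcomes
    with logit p_i = alpha + beta * x_Ei."""
    prob = 1.0
    for i, x in enumerate(X_E):
        p = expit(alpha + beta * x)
        prob *= p if i == case else 1.0 - p
    return prob

def uncond_lik(alpha, beta):
    return prob_outcome(alpha, beta, CASE)

def cond_lik(alpha, beta):
    """Same outcome, conditional on exactly one case in the stratum."""
    total = sum(prob_outcome(alpha, beta, j) for j in range(len(X_E)))
    return prob_outcome(alpha, beta, CASE) / total

beta = 0.7
for alpha in (-3.0, 0.0, 2.0):
    print(alpha, uncond_lik(alpha, beta), cond_lik(alpha, beta))
# uncond_lik changes with alpha; cond_lik stays at exp(beta) / (2 * exp(beta) + 1)
```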
DIGRESSION (continued)
$$\Pr(X_{D1}=1, X_{D2}=0, X_{D3}=0,\ X_{E1}=1, X_{E2}=0, X_{E3}=0;\ \text{stratum } 1)$$
END OF DIGRESSION
From the digression, we can glean the following facts†:
Conditional logistic regression provides estimates of the coefficient of any covariate x that varies within at least one stratum.
Examples:
- exposure
- confounders that vary within one or more strata
- interactions of the above two with "stratum determinants" (matching variables)
Conditional logistic regression cannot provide estimates for the coefficient of any variable that does not vary within at least one stratum (for matched pairs, a variable with no discordant pairs).
Examples:
- "stratum determinants" (matching variables)
- interactions between "stratum determinants"
† NOTE: This is true for any model that contains dummy-variable stratification adjustment (conditional or unconditional). These facts are on the final.
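The reason is visible in the conditional likelihood of a single matched set: a term that is the same for everyone in the set adds the same constant to every linear predictor and cancels. A minimal sketch with one hypothetical matched set (one case, three controls); `cond_lik_one_case` and the covariate values are ours.

```python
import math

def cond_lik_one_case(X, coefs, case):
    """Conditional likelihood of one matched set with a single case:
    exp(eta_case) / sum_j exp(eta_j), with eta_j = coefs . X[j]."""
    eta = [sum(b * v for b, v in zip(coefs, row)) for row in X]
    return math.exp(eta[case]) / sum(math.exp(e) for e in eta)

# Columns are (exposure x, matching variable z, interaction x*z).
z = 3                                    # z is the same for everyone in the set
X = [[x, z, x * z] for x in (1, 0, 0, 1)]

base = cond_lik_one_case(X, [0.5, 0.0, 0.2], case=0)
print(base, cond_lik_one_case(X, [0.5, 4.0, 0.2], case=0))   # z coefficient: no effect
print(base, cond_lik_one_case(X, [0.5, 0.0, 1.0], case=0))   # x*z coefficient: changes it
```

Changing the coefficient of z leaves the likelihood unchanged, so it cannot be estimated; the x*z interaction varies with exposure inside the set, so its coefficient does affect the likelihood.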
IMPORTANT POINT:
Even though they don't appear on our computer output in conditional logistic regression, the stratum indicators are still "in the model":
$$\text{logit}(p) = \underbrace{\alpha + \alpha_2 x_{s2} + \cdots + \alpha_m x_{sm}}_{\text{these parameters are not estimated}} + \underbrace{\beta_1 x_1 + \cdots + \beta_k x_k}_{\text{these parameters are estimated}}$$
This means:
- We impose no constraints on how the underlying logit p's vary from stratum to stratum when we perform a CLR.
- Including interactions between some of $x_1, \ldots, x_k$ and stratum determinants does not violate the rule that main effects must be in any model that contains their interactions: the stratum indicators already carry the main effects of the stratum determinants.
RELATIONSHIP OF CONDITIONAL LOGISTIC REGRESSION TO 'MATCHED ANALYSIS'
For a single dichotomous exposure
FOR MATCHED PAIRS
See B & D, Chs. 5 and 7, for example.
McNemar's test (without continuity correction) = conditional likelihood score test of $H_0: \beta = 0$
M-H matched-pair OR estimate (ratio of discordant pairs) = maximum conditional likelihood estimate (MCLE)
With $n_{10}$ = number of pairs in which only the case is exposed and $n_{01}$ = number in which only the control is exposed:
$$\chi^2_{\text{McNemar}} = \frac{(n_{10} - n_{01})^2}{n_{10} + n_{01}}, \qquad \widehat{OR} = e^{\hat\beta} = \frac{n_{10}}{n_{01}}$$
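A sketch checking both correspondences numerically, assuming hypothetical pair counts and a single binary exposure; the function names are ours. The conditional score statistic at β = 0 reproduces McNemar's statistic, and Newton-Raphson on the conditional likelihood gives exp(β̂) = n10/n01. Concordant pairs contribute nothing to either the score or the information.

```python
import math

def cl_score_and_info(sets, beta):
    """Score and information of the conditional log-likelihood at `beta` for
    matched sets with one case each; a set is (exposures, index of the case)."""
    u = info = 0.0
    for xs, case in sets:
        w = [math.exp(beta * x) for x in xs]
        tot = sum(w)
        mean = sum(wi * x for wi, x in zip(w, xs)) / tot
        u += xs[case] - mean
        info += sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) / tot
    return u, info

def cl_mle(sets, iters=50):
    """Newton-Raphson for the maximum conditional likelihood estimate of beta."""
    beta = 0.0
    for _ in range(iters):
        u, info = cl_score_and_info(sets, beta)
        beta += u / info
    return beta

def pairs(n11, n10, n01, n00):
    """Matched pairs as (case exposure, control exposure); case listed first.
    n10 = only the case exposed, n01 = only the control exposed."""
    return ([([1, 1], 0)] * n11 + [([1, 0], 0)] * n10
            + [([0, 1], 0)] * n01 + [([0, 0], 0)] * n00)

data = pairs(n11=12, n10=25, n01=10, n00=40)      # hypothetical counts
u0, i0 = cl_score_and_info(data, 0.0)
print(u0 ** 2 / i0, (25 - 10) ** 2 / (25 + 10))    # score test vs McNemar
print(math.exp(cl_mle(data)), 25 / 10)              # MCLE of OR vs n10 / n01
```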
FOR M:1 MATCHING
Usual 'matched' $\chi^2$ test for M:1 matched data (Eq. (8.23) Fleiss; Eq. (5.19) B&D) = conditional likelihood score test of $H_0: \beta = 0$
M-H M:1 matching OR estimate (Eq. (8.26) Fleiss; Eq. (5.18) B&D) and maximum conditional likelihood estimate (MCLE), $\widehat{OR} = e^{\hat\beta}$: both estimate the same OR, but unlike the matched-pairs case they need not be numerically identical when M > 1.
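A small sketch comparing the two M:1 estimates on three hypothetical 1:2 matched sets (case listed first); the data and function names are ours. The M-H estimate is the usual sum(a·d/n) / sum(b·c/n) over matched sets, and the MCLE is found by Newton-Raphson. Here they come out at about 3.0 and about 1 + √3 ≈ 2.732.

```python
import math

def mh_or(sets):
    """Mantel-Haenszel OR over matched sets with one case each:
    sum(a*d/n) / sum(b*c/n), where a/b = case exposed/unexposed and
    c/d = numbers of exposed/unexposed controls in the set."""
    num = den = 0.0
    for xs, case in sets:
        n = len(xs)
        controls = [x for j, x in enumerate(xs) if j != case]
        c = sum(controls)
        d = len(controls) - c
        if xs[case] == 1:
            num += d / n
        else:
            den += c / n
    return num / den

def mcle_or(sets, iters=50):
    """exp(beta-hat), beta-hat maximizing the conditional likelihood (Newton-Raphson)."""
    beta = 0.0
    for _ in range(iters):
        u = info = 0.0
        for xs, case in sets:
            w = [math.exp(beta * x) for x in xs]
            tot = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / tot
            u += xs[case] - mean
            info += sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) / tot
        beta += u / info
    return math.exp(beta)

sets = [([1, 0, 0], 0), ([1, 1, 0], 0), ([0, 1, 0], 0)]
print(mh_or(sets), mcle_or(sets))
```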
MORE GENERALLY
Conditional logistic regression extends our capabilities with matched data because:
- It easily accounts for matched sets with differing numbers of cases and controls (e.g., from missing data).
- You can adjust for confounders even if you didn't match on them (or matched only loosely).
- You can look at interactions between any of the matching variables and variables, like exposure level, that vary within strata.
See the example.
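Sets with more than one case fit into the same framework. In the standard form of the conditional likelihood, a stratum with d cases contributes exp(β × sum of the cases' x) divided by the same quantity summed over every way of choosing d subjects from the stratum. A minimal sketch with naive enumeration (fine for small strata); `cond_lik_set` and the data are hypothetical.

```python
import math
from itertools import combinations

def cond_lik_set(xs, cases, beta):
    """Conditional likelihood contribution of one stratum with len(cases) cases:
    exp(beta * sum of the cases' x) divided by the same quantity summed over
    every subset of the stratum with that many members."""
    d = len(cases)
    num = math.exp(beta * sum(xs[i] for i in cases))
    den = sum(math.exp(beta * sum(xs[i] for i in subset))
              for subset in combinations(range(len(xs)), d))
    return num / den

# Hypothetical stratum with 2 cases (subjects 0 and 1) and 2 controls.
xs = [1, 0, 0, 1]
print(cond_lik_set(xs, [0, 1], beta=0.7))
```

With a single case this reduces to the one-case formula from the digression, e.g. e^β / (2e^β + 1) for exposures (1, 1, 0).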
EXAMPLE
Leisure World study of endometrial cancer and menopausal estrogens, with possible interactions due to hypertension and gall bladder disease. Mack et al. (1976), pp. 1262-7. Extensively analyzed in Breslow and Day.
63 cases.
Four controls matched to each case:
- living in the community at time of diagnosis of case
- year of birth within one year of case's
- same marital status
- similar time of entry into the community
. *** Use conditional logistic regression to fit a model to the matched data
. *** from the Leisure World study of endometrial cancer
. ***
. xi: clogit case i.ob gall age, group(set) or
i.ob _Iob_0-9 (naturally coded; _Iob_0 omitted)
Iteration 0: log likelihood = -92.845539
Iteration 1: log likelihood = -92.335412
Iteration 2: log likelihood = -92.335003
Iteration 3: log likelihood = -92.335003
Conditional (fixed-effects) logistic regression Number of obs = 315
LR chi2(4) = 18.12
Prob > chi2 = 0.0012
Log likelihood = -92.335003 Pseudo R2 = 0.0893
------------------------------------------------------------------------------
case | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iob_1 | 1.949185 .7137611 1.82 0.068 .9509467 3.995303
_Iob_9 | .8076838 .472402 -0.37 0.715 .2566767 2.541536
gall | 3.548028 1.345318 3.34 0.001 1.687463 7.46002
age | .8072255 .191249 -0.90 0.366 .5073726 1.284289
------------------------------------------------------------------------------
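The LR chi2(4) line above can be reproduced from the log likelihood, assuming (as the study description says) that the 315 observations form 63 matched sets of 1 case and 4 controls. Under β = 0 every set contributes log(1/5) to the conditional log likelihood; the printed pseudo R2 matches 1 − ll_full/ll_null. A minimal sketch; `chi2_sf_even_df` is our own helper (a closed form valid only for even degrees of freedom), not a Stata function.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square with an even number of df:
    P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

n_sets, set_size = 63, 5                      # 63 cases, each matched to 4 controls
ll_null = n_sets * math.log(1.0 / set_size)   # all coefficients = 0
ll_full = -92.335003                          # from the clogit output above

lr = 2 * (ll_full - ll_null)
p_value = chi2_sf_even_df(lr, df=4)
pseudo_r2 = 1 - ll_full / ll_null
print(round(lr, 2), round(p_value, 4), round(pseudo_r2, 4))   # 18.12 0.0012 0.0893
```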
. xi: clogit case i.ob gall age, group(set) or robust
i.ob _Iob_0-9 (naturally coded; _Iob_0 omitted)
Iteration 0: log pseudolikelihood = -92.845539
Iteration 1: log pseudolikelihood = -92.335412
Iteration 2: log pseudolikelihood = -92.335003
Iteration 3: log pseudolikelihood = -92.335003
Conditional (fixed-effects) logistic regression Number of obs = 315
Wald chi2(4) = 16.75
Prob > chi2 = 0.0022
Log pseudolikelihood = -92.335003 Pseudo R2 = 0.0893
(Std. Err. adjusted for clustering on set)
------------------------------------------------------------------------------
| Robust
case | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iob_1 | 1.949185 .6797683 1.91 0.056 .9840127 3.861048
_Iob_9 | .8076838 .4576747 -0.38 0.706 .2660157 2.452311
gall | 3.548028 1.308153 3.43 0.001 1.722465 7.308425
age | .8072255 .158935 -1.09 0.277 .548784 1.187376
------------------------------------------------------------------------------
. *** If the data were matched on age, how can we fit a model with
. *** an additional age term?
. *** What happens if we use the age group variable instead?
. ***
. xi: clogit case i.ob gall est ageg, group(set) or robust
i.ob _Iob_0-9 (naturally coded; _Iob_0 omitted)
note: ageg omitted because of no within-group variance.
Iteration 0: log pseudolikelihood = -77.341092
Iteration 1: log pseudolikelihood = -77.190946
Iteration 2: log pseudolikelihood = -77.190763
Iteration 3: log pseudolikelihood = -77.190763
Conditional (fixed-effects) logistic regression Number of obs = 315
Wald chi2(4) = 37.20
Prob > chi2 = 0.0000
Log pseudolikelihood = -77.190763 Pseudo R2 = 0.2387
(Std. Err. adjusted for clustering on set)
------------------------------------------------------------------------------
| Robust
case | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iob_1 | 1.941858 .7032707 1.83 0.067 .9548675 3.949043
_Iob_9 | 2.143222 1.437656 1.14 0.256 .5755568 7.980794
gall | 3.709897 1.702867 2.86 0.004 1.50888 9.121557
est | 9.07388 4.271298 4.69 0.000 3.606713 22.82835
------------------------------------------------------------------------------
. *** Now try an interaction term with the age group variable:
. ***
. gen agegest = ageg*est
. xi:clogit case i.ob gall est age agegest, group(set) or robust
i.ob _Iob_0-9 (naturally coded; _Iob_0 omitted)
Iteration 0: log pseudolikelihood = -77.404147
Iteration 1: log pseudolikelihood = -76.111727
Iteration 2: log pseudolikelihood = -76.104973
Iteration 3: log pseudolikelihood = -76.10497
Conditional (fixed-effects) logistic regression Number of obs = 315
Wald chi2(6) = 35.13
Prob > chi2 = 0.0000
Log pseudolikelihood = -76.10497 Pseudo R2 = 0.2494
(Std. Err. adjusted for clustering on set)
------------------------------------------------------------------------------
| Robust
case | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iob_1 | 2.08832 .7271266 2.11 0.034 1.055406 4.132136
_Iob_9 | 2.555483 1.814062 1.32 0.186 .6356678 10.27344
gall | 3.71187 1.702688 2.86 0.004 1.510548 9.121183
est | 11.04112 8.301101 3.19 0.001 2.529587 48.19215
age | .6945569 .1497555 -1.69 0.091 .4551742 1.059834
agegest | .863018 .8057113 -0.16 0.875 .1384651 5.378972
------------------------------------------------------------------------------
. *** Why can we fit an interaction term but not the main effect of ageg?
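The answer follows from the digression: clogit can use a variable only if it varies within at least one matched set. Below is a minimal sketch of that check with made-up data (names mirror the Stata variables, values are hypothetical); we do not claim this is how Stata performs its "no within-group variance" check.

```python
from collections import defaultdict

def varies_within_some_stratum(strata, values):
    """True if `values` takes at least two distinct values inside at least one
    stratum, i.e. the variable carries information in the conditional likelihood."""
    seen = defaultdict(set)
    for s, v in zip(strata, values):
        seen[s].add(v)
    return any(len(vals) > 1 for vals in seen.values())

# Two hypothetical matched sets of 3: set id, age group, estrogen use.
sets = [1, 1, 1, 2, 2, 2]
ageg = [1, 1, 1, 0, 0, 0]   # matching variable: constant within each set
est = [1, 0, 1, 1, 0, 0]
agegest = [a * e for a, e in zip(ageg, est)]

for name, col in [("ageg", ageg), ("est", est), ("agegest", agegest)]:
    print(name, varies_within_some_stratum(sets, col))
# ageg False / est True / agegest True
```

The interaction varies inside set 1 because estrogen use does, so its coefficient is estimable even though the main effect of ageg is absorbed by the set indicators.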
. *** Compare to ordinary logistic regression, ignoring matching
. xi: logistic case i.ob gall est age ageg agegest, robust
i.ob _Iob_0-9 (naturally coded; _Iob_0 omitted)
Logistic regression Number of obs = 315
Wald chi2(7) = 43.56
Prob > chi2 = 0.0000
Log pseudolikelihood = -133.88861 Pseudo R2 = 0.1506
------------------------------------------------------------------------------
| Robust
case | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iob_1 | 1.693862 .5952596 1.50 0.134 .8506436 3.37294
_Iob_9 | 2.107707 1.273524 1.23 0.217 .6449038 6.888514
gall | 3.162227 1.330689 2.74 0.006 1.386121 7.214148
est | 10.4021 7.996219 3.05 0.002 2.305644 46.92993
age | .9924756 .0424825 -0.18 0.860 .9126086 1.079332
ageg | 1.666909 1.558169 0.55 0.585 .2668322 10.41324
agegest | .7735706 .7078048 -0.28 0.779 .1287254 4.648745
------------------------------------------------------------------------------
EFFICIENCY?