DEPARTMENT OF ACTUARIAL SCIENCE
SURVIVAL ANALYSIS: EVALUATING THE COX MODEL
ASSUMPTIONS
2
Review: Cox PH Model
The Cox (PH) model
$h(t, X) = h_0(t) \exp(\beta^T X)$
Assumptions of this model:
1. the regression effect β is constant over time (PH
assumption)
2. linear combination of the covariates (including
possibly higher order terms, interactions)
3. the link function is exponential
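To see why assumption 1 is called the proportional hazards assumption, it may help to write out the hazard ratio for two covariate vectors. The notation $X^*$ for the second individual is introduced here only for illustration:

```latex
\[
\frac{h(t, X^*)}{h(t, X)}
  = \frac{h_0(t)\exp(\beta^T X^*)}{h_0(t)\exp(\beta^T X)}
  = \exp\!\bigl(\beta^T (X^* - X)\bigr)
\]
```

The baseline hazard $h_0(t)$ cancels, so the ratio does not depend on $t$ as long as $\beta$ is constant over time.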
3
Residuals (1/2)
The PH assumption in (1) has received most attention in
both research and application.
In order to check these model assumptions, we often
make use of residuals.
Residuals for survival data are somewhat different from
those for other types of models, mainly due to censoring.
4
Residuals (2/2)
What are the residuals for the Cox model?
a. generalized (Cox-Snell) residual (to identify extreme
observations that need additional investigation)
b. Schoenfeld residual plus Grambsch and Therneau test
(to check the proportional hazards assumption)
c. Martingale residual (to assess nonlinearity)
Some residuals, in particular the martingale residuals,
can be used in more sophisticated (and more powerful)
ways.
5
Graphical Techniques (1/2)
The most popular of these involves comparing
estimated –ln(–ln) survival curves over different
(combinations of) categories of variables being
investigated.
–ln(–ln) S curves parallel?
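A minimal sketch of this comparison, assuming the `ovarian` data from the R `survival` package and using treatment group `rx` as the variable under investigation (that choice is an illustration, not taken from the slides). The curves are computed by hand from Kaplan-Meier estimates:

```r
library(survival)

# Kaplan-Meier estimates for each treatment group
km <- survfit(Surv(futime, fustat) ~ rx, data = ovarian)

# Event-time summary; keep only points where -ln(-ln S) is finite
s <- summary(km)
keep <- s$surv > 0 & s$surv < 1
time <- s$time[keep]
grp <- s$strata[keep]
loglog <- -log(-log(s$surv[keep]))

# One step curve per group on the same axes
plot(time, loglog, type = "n", xlab = "Days", ylab = "-ln(-ln S(t))")
for (k in seq_along(levels(grp))) {
  sel <- grp == levels(grp)[k]
  lines(time[sel], loglog[sel], type = "s", lty = k)
}
legend("topright", legend = levels(grp), lty = seq_along(levels(grp)))
```

Roughly parallel curves are consistent with the PH assumption for `rx`; curves that cross or clearly diverge suggest a violation.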
6
Problem (1/3)
Problems with the log–log survival curve approach:
How parallel is parallel?
Recommend:
subjective decision
conservative strategy: assume PH is OK unless there is
strong evidence of nonparallelism
Categorizing a continuous variable:
many categories: data “thins out”
different categorizations may give different graphical
pictures
Recommend:
small number of categories (2 or 3)
meaningful choice
reasonable balance
7
Problem (2/3)
How to evaluate several variables simultaneously?
Strategy:
categorize variables separately
form combinations of categories
compare log–log curves on same graph
Drawback:
data “thins out”
difficult to identify variables responsible for
nonparallelism
8
Problem (3/3)
Alternative Strategy: Adjust for predictors
already satisfying PH assumption, i.e., use
adjusted log–log survival curves.
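One way to obtain adjusted curves is sketched below, under the assumption that `age` already satisfies PH and `rx` is the variable being assessed (both choices are illustrative). The model is stratified on `rx`, and the fitted curves are evaluated at the mean age:

```r
library(survival)

# Stratify on the variable under assessment; adjust for age
fit_strat <- coxph(Surv(futime, fustat) ~ age + strata(rx), data = ovarian)

# One adjusted survival curve per stratum, evaluated at the mean age
adj <- survfit(fit_strat, newdata = data.frame(age = mean(ovarian$age)))

# fun = "cloglog" plots ln(-ln S(t)) against time on a log scale
plot(adj, fun = "cloglog", lty = 1:2,
     xlab = "Days (log scale)", ylab = "ln(-ln S(t))")
legend("topleft", legend = c("rx = 1", "rx = 2"), lty = 1:2)
```

Note that `fun = "cloglog"` draws ln(–ln S) rather than –ln(–ln S); the sign flip mirrors the picture but does not change whether the curves look parallel.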
9
Data Example
> library(survival)
> data(ovarian)   # ovarian cancer data
The data consist of the following variables:
futime: survival or censoring time (in days)
fustat: censoring status (0 = censored, 1 = dead)
age: age in years
resid.ds: residual disease present (1 = no, 2 = yes)
rx: treatment group (1 = treatment A, 2 = treatment B)
ecog.ps: ECOG performance status (1 is better)
10
Cox PH Model (1/2)
Hypothesis for overall test.
H0: β1 = β2 = β3 = β4 = 0
H1: at least one of these β’s is nonzero
From the coxph output, the likelihood ratio test (LRT)
gives a p-value of 0.001896, which is lower than α = 0.05
(all three test statistics lead to the same conclusion).
H0 is therefore rejected: at least one of these β’s is
nonzero.
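The overall tests can be read from the model summary. The sketch below assumes the four covariates are `age`, `resid.ds`, `rx` and `ecog.ps`; the slides do not show the exact model formula, so the numbers it prints need not match the p-value quoted above:

```r
library(survival)

# Cox PH model with four covariates (assumed formula)
fit <- coxph(Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps,
             data = ovarian)
sm <- summary(fit)

# Likelihood ratio, Wald and score tests of H0: all coefficients are zero
overall <- rbind(LRT = sm$logtest, Wald = sm$waldtest, Score = sm$sctest)
print(overall)
```

Each row holds the test statistic, its degrees of freedom and the p-value.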
11
Cox PH Model (2/2)
12
–ln(–ln) Survival Curves (1/2)
log #1:
$\ln S(t, X) = \exp\left(\sum_{i=1}^{p} \beta_i X_i\right) \times \ln S_0(t)$
where $0 \le S(t, X) \le 1$
log #2:
$\ln[-\ln S(t, X)] = \sum_{i=1}^{p} \beta_i X_i + \ln[-\ln S_0(t)]$
or
$-\ln[-\ln S(t, X)] = -\sum_{i=1}^{p} \beta_i X_i - \ln[-\ln S_0(t)]$
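The reason parallel curves indicate proportional hazards can be made explicit by subtracting the –ln(–ln) curves of two individuals. The covariate vectors $X_1 = (X_{11}, \dots, X_{1p})$ and $X_2 = (X_{21}, \dots, X_{2p})$ are notation introduced here for illustration:

```latex
\[
-\ln[-\ln S(t, X_1)] - \bigl(-\ln[-\ln S(t, X_2)]\bigr)
  = \sum_{i=1}^{p} \beta_i \,(X_{2i} - X_{1i})
\]
```

The baseline term $\ln[-\ln S_0(t)]$ cancels, so under the PH model the vertical distance between the two curves is the same at every $t$.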
13
–ln(–ln) Survival Curves (2/2)
Figure 1. KM survival curve vs -ln(-ln) survival curve
14
Log Rank Test
Hypothesis:
H0: S1(t) = S2(t) for all t
H1: S1(t) ≠ S2(t) for at least one t
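A sketch of the log rank test in R, assuming the two groups being compared are the treatment arms `rx` of the `ovarian` data (the slide does not name the grouping variable):

```r
library(survival)

# Log rank test comparing the survival curves of the two treatment groups
lr <- survdiff(Surv(futime, fustat) ~ rx, data = ovarian)
print(lr)

# p-value from the chi-square statistic, with (number of groups - 1) df
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
p_value
```

A small p-value would lead to rejecting H0 of equal survival functions.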
15
Graphical Techniques (2/2)
Other graphical techniques that are also commonly used
are plots of the Cox-Snell and martingale residuals.
16
Cox-Snell Residual (1/3)
The ith Cox-Snell residual is defined as
$r_{Ci} = \hat{H}_0(t_i) \exp(x_i^T \hat{\beta})$
where $\hat{H}_0(t_i)$ and $\hat{\beta}$ are the estimates of the
baseline cumulative hazard function and the coefficient
vector, respectively.
The Cox-Snell residuals are most useful for
examining the overall fit of a model.
The plot of the estimated cumulative hazard of the
residuals against the residuals themselves is generally
used only as a rough diagnostic.
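A minimal sketch of the Cox-Snell plot, assuming the four-covariate model on the `ovarian` data (the formula is an assumption). The residuals are obtained as the event indicator minus the martingale residual, and their cumulative hazard is estimated by treating them as censored survival times:

```r
library(survival)

fit <- coxph(Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps,
             data = ovarian)

# Cox-Snell residual = event indicator - martingale residual
cox_snell <- ovarian$fustat - residuals(fit, type = "martingale")

# Nelson-Aalen cumulative hazard of the residuals
km_cs <- survfit(Surv(cox_snell, ovarian$fustat) ~ 1)

plot(km_cs$time, km_cs$cumhaz, type = "s",
     xlab = "Cox-Snell residual", ylab = "Estimated cumulative hazard")
abline(0, 1, lty = 2)  # reference line: intercept 0, slope 1
```

If the model fits well, the step function should stay close to the dashed reference line.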
17
Cox-Snell Residual (2/3)
Figure 2. Cox-Snell residual of ovarian data
18
Cox-Snell Residual (3/3)
The final model gives a reasonable fit to the data.
Overall, the residuals fall on a straight line with
intercept zero and slope one.
Further, there are no large departures from the
straight line and no large variation in the right-hand
tail.
19
Martingale Residual (1/3)
The ith martingale residual is defined as
$\hat{M}_i = \delta_i - r_{Ci}$
where $\delta_i$ is the event indicator (1 = event, 0 = censored).
The $\hat{M}_i$ take values in $(-\infty, 1]$ and are never
positive for censored observations.
They are used to check the linearity assumption for a
covariate.
It is common practice in many medical studies to
discretize continuous covariates. The martingale
residuals are useful for determining possible cut
points for such variables.
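One common way to use these residuals is sketched below: fit the model without the covariate of interest (here `age`), then plot the martingale residuals against that covariate with a smoother. The choice of the remaining covariates is an assumption:

```r
library(survival)

# Model without age; its martingale residuals carry the unexplained part
fit_noage <- coxph(Surv(futime, fustat) ~ resid.ds + rx + ecog.ps,
                   data = ovarian)
mart <- residuals(fit_noage, type = "martingale")

# Residuals against age, with a lowess smooth to suggest the functional form
plot(ovarian$age, mart, xlab = "Age (years)", ylab = "Martingale residual")
lines(lowess(ovarian$age, mart), lty = 2)
```

A roughly straight smooth supports a linear term for `age`; a clear bend or step hints at a transformation or a cut point.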
20
Martingale Residual (2/3)
Figure 3. Martingale residual of ovarian data
21
Martingale Residual (3/3)
In the plot of the martingale residuals (Figure 3),
there appears to be only a slight bump for age,
between 52 and 65. Moreover, the lines before
and after the bump nearly coincide. Therefore, a
linear form seems appropriate for age.
22
The GOF Testing Approach (1/2)
Grambsch and Therneau’s test for PH assumption
Variation of test of Schoenfeld
Uses Schoenfeld residuals
Statistical test appealing
Provides p-value
More objective decision than when using graphical
approach
Schoenfeld residuals defined for
Each predictor in model
Every subject who has event
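For reference, the Schoenfeld residual for predictor $k$ and a subject $i$ who has an event at time $t_i$ can be written as follows. The risk-set notation $R(t_i)$ (subjects still at risk just before $t_i$) is introduced here and does not appear in the slides:

```latex
\[
r_{ik} = x_{ik}
  - \frac{\sum_{j \in R(t_i)} x_{jk} \exp(x_j^T \hat{\beta})}
         {\sum_{j \in R(t_i)} \exp(x_j^T \hat{\beta})}
\]
```

It is the observed covariate value minus its weighted average over the risk set. Under PH these residuals should show no trend over time.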
23
The GOF Testing Approach (2/2)
p-value large ⇒ PH satisfied
(e.g. P > 0.10)
p-value small ⇒ PH not satisfied
(e.g. P < 0.05)
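In R the test is available as `cox.zph` in the `survival` package. A sketch, again assuming the four-covariate model on the `ovarian` data:

```r
library(survival)

fit <- coxph(Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps,
             data = ovarian)

# Grambsch and Therneau test: one row per covariate plus a GLOBAL row
zph <- cox.zph(fit)
print(zph)

# Smoothed scaled Schoenfeld residuals against time, one panel per covariate
par(mfrow = c(2, 2))
plot(zph)
```

A flat smooth in each panel, together with large p-values, is consistent with the PH assumption.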
24
GOF (Grambsch and Therneau’s test)
25
Schoenfeld Residuals
Figure 4. Schoenfeld residual of ovarian data
26
Backward Method
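The slide does not show which selection criterion was used. The sketch below is one possibility, assuming AIC-based backward elimination with `step` starting from the four-covariate model; a p-value-based procedure could give a different result:

```r
library(survival)

# Full model as the starting point (assumed formula)
full <- coxph(Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps,
              data = ovarian)

# Backward elimination: drop terms while doing so lowers the AIC
best <- step(full, direction = "backward", trace = 0)
print(formula(best))
```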
27
Cox PH (Best Model)
28
Test PH (Best Model)
Figure 5. Schoenfeld residual of best model