Biostatistics 203. Survival Analysis: Yhchan
Biostatistics 203.
Survival analysis
Y H Chan
Clinical Trials and Epidemiology Research Unit
226 Outram Road
Blk B #02-02
Singapore 169039
Y H Chan, PhD
Head of Biostatistics
Correspondence to: Dr Y H Chan
Tel: (65) 6325 7070
Fax: (65) 6324 2700
Email: chanyh@cteru.com.sg

Table I. Summary of the common univariate/multivariate biostatistical techniques to analyse quantitative and qualitative data types.

Univariate tests, quantitative data(1)
(Normality/homogeneity of variance assumptions satisfied?)
  YES: Parametric tests      NO: Non-parametric tests
  1 Sample T                 Sign test
  Paired T                   Wilcoxon Signed Rank
  2 Sample T                 Wilcoxon Rank Sum/Mann Whitney U

Univariate tests, qualitative data(2)
  Independent sample         Matched case-control
  Chi Square/                McNemar
  Fisher Exact test

Multivariate tests
  Quantitative data:                        Multiple linear regression(3)
  Qualitative data, independent sample:     Logistic regression(4)
  Qualitative data, matched case-control:   Conditional logistic regression

In this article, we shall discuss the use of survival analysis on a quantitative type of data corresponding to the time from a well-defined time origin until the occurrence of some particular event of interest or end-point.

Medical examples are:
• Duration – time from randomisation to relapse
• Pressure sore – time to development
• Survival – time from randomisation until death

Non-medical examples are:
• Banking – time from making a loan to full repayment
• Economy – time from graduation to getting the first job
• Social – time from being single to getting married

Since survival time is a quantitative variable, why can’t we just use the usual techniques from Table I? Before we explain the main reason why we use survival analysis, let us consider a simple example on the survival times (in months) for 25 lung cancer patients who all died; the timings are: 1, 5, 6, 6, 9, 10, 10, 10, 12, 12, 12, 12, 12, 13, 15, 16, 20, 24, 24, 27, 32, 34, 36, 36, 44 months. Performing a simple descriptive analysis, we have n = 25, mean (sd) = 17.52 (11.48) months and median = 12 months.

Fig. 1 The distribution of the survival times. [Histogram: Frequency against Time (in months); Mean = 17.52, Std. dev. = 11.482, N = 25]

It is obvious that the distribution is not normal (Fig. 1), as expected from survival-time data.

Kaplan Meier is the usual technique performed to analyse survival-time data. Table II shows the Kaplan Meier analysis for the above 25 subjects (all died of lung cancer):

Table II. Kaplan Meier analysis (no censoring).
Kaplan Meier technique (All subjects died)
          Survival time   Standard error   95% CI
Mean      17.52           2.30             13.02, 22.02
Median    12.00           1.25             9.55, 14.45

What do we observe? The Kaplan Meier results in Table II are exactly the same as the descriptive results above. So why do we need to do a survival analysis? To quote a Chinese saying, we have used “a bull knife to kill a chicken”: an “overkill in analysis”! The reason here is: since all the subjects died (presumably of lung cancer), we have no extra information to require us to perform a survival analysis – no censored data.
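The agreement between the simple descriptive summary and the Kaplan Meier figures in Table II can be checked with a short script. This is only a minimal sketch in Python (the article itself uses SPSS): the function name `kaplan_meier` and the variable names are our own, and the confidence interval assumes the usual normal approximation, mean ± 1.96 × sd/√n.

```python
import math
import statistics

# Survival times (in months) of the 25 lung cancer patients, all of whom died.
times = [1, 5, 6, 6, 9, 10, 10, 10, 12, 12, 12, 12, 12,
         13, 15, 16, 20, 24, 24, 27, 32, 34, 36, 36, 44]
events = [1] * len(times)  # 1 = died, 0 = censored (nobody is censored here)


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    curve, surv = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Simple descriptive statistics.
n = len(times)
mean = statistics.mean(times)
sd = statistics.stdev(times)
median = statistics.median(times)

# Kaplan Meier median: first time at which S(t) falls to 0.5 or below.
curve = kaplan_meier(times, events)
km_median = next(t for t, s in curve if s <= 0.5)

# Kaplan Meier mean: area under the step-shaped survival curve.
km_mean, prev_t, prev_s = 0.0, 0, 1.0
for t, s in curve:
    km_mean += prev_s * (t - prev_t)
    prev_t, prev_s = t, s

# Normal-approximation 95% CI for the mean (an assumption of this sketch).
se = sd / math.sqrt(n)
ci = (mean - 1.96 * se, mean + 1.96 * se)

print(f"descriptive: mean={mean:.2f} sd={sd:.2f} median={median}")
print(f"Kaplan Meier: mean={km_mean:.2f} median={km_median}")
print(f"se={se:.2f} 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

With no censoring, the product-limit estimate at time t is simply the proportion of subjects still alive after t, which is why the two sets of results coincide.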
Singapore Med J 2004 Vol 45(6) : 250
What are censored observations? Censored observations arise in cases for which
• the critical event has not yet occurred
• the subject is lost to follow-up
• other interventions are offered
• the event occurred but from an unrelated cause

This means that with the factoring in of the “extra” information, we are being “realistic” about the survival time of, in this case, lung cancer or being “fair” to the treatment under study with the intent of extending the survival time of these subjects. Fig. 2 shows the survival plots for both censored and no-censored scenarios.

Fig. 2 Survival plots – lung cancer example. [Panels: No censoring; With censoring]

Put the variables “time” and “status” at their appropriate options, then click on the ‘Define Event’ button to get Template II.

Template II. Defining the event.

[...] the two groups (Template V).

Template V. The log-rank test

Table V shows the mean/median survival times for the control and active groups with log-rank test p = 0.1835 – no difference between the active and control groups in having a shorter time to event, with the survival plot given in Fig. 3. One common misconception about survival analysis is that some researchers interpret the result as one group being more likely to have deaths (this should be given by logistic regression!). It is the time to event which is the primary response here.

Table V. Kaplan Meier analysis for comparison between two groups.

[Survival plot: Cum Survival against Time (in months); Group: Active, Control, Active-censored, Control-censored]
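To make the log-rank comparison of two groups concrete, here is a minimal sketch of the standard two-group log-rank statistic in Python. The function name `logrank_test` and the small data set are hypothetical; the article's own control/active data are not listed here, so this does not reproduce p = 0.1835. The p-value uses the chi-square distribution on 1 degree of freedom, for which the upper tail equals erfc(√(χ²/2)).

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square, p-value) on 1 df."""
    first = sorted(set(groups))[0]      # group whose events are tallied
    observed = expected = variance = 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        # Numbers at risk just before time t (overall and in the first group).
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == first)
        # Events occurring exactly at time t (overall and in the first group).
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e and g == first)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: time in months, status 1 = event, 0 = censored.
times = [3, 5, 8, 12, 15, 20, 6, 10, 14, 24, 30, 36]
events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
groups = ["control"] * 6 + ["active"] * 6

chi2, p_value = logrank_test(times, events, groups)
print(f"log-rank chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

Note that the test compares the whole time-to-event experience of the two groups, not the proportion of deaths, in line with the caution above.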
COX REGRESSION

For the above lung cancer example, we have collected information on race, age and gender, and want to look at a confounder model to determine whether the two groups differ after adjusting for demographics.

To perform a Cox regression, go to Analyse, Survival, Cox regression to get Template VI.

In Template VI, click on “Options” to invoke the 95% CI for the hazard ratio (HR), given by the expression exp(B) – which is also the same expression for odds ratios in logistic regression. This is another common mistake – researchers at times refer to odds ratios in survival analysis (misled by the same symbol). The interpretation of the hazard ratio is similar to that of the odds ratio. A value of one means there is no difference between the two groups in having a “shorter time to event”. A HR > 1 means that the group of interest, compared to the reference group (to be observed from the categorical declaration), is likely to have a shorter time to event. A HR < 1 means that the group of interest is less likely to have a shorter time to event.

Template VIII. Invoking the 95% CI for the hazard ratio.

Variable   Category       Frequency   Coding (1)   (2)   (3)
Group      1.00=control   12          1
           2.00=active    13          0
Race       1=chinese      15          1            0     0
           2=indian       5           0            1     0
           3=malay        2           0            0     1
           4=other        3           0            0     0
Sex        1=male         17          1
           2=female       8           0

The reference category for group is active, for race is “other race” and for sex is female.

Table VIb gives the p-values (Sig) and the hazard ratios (Exp(B)) of the variables. Firstly, we have to check for multicollinearity by observing whether the SEs of all the variables are small (see logistic regression(4) for a detailed discussion on this checking).
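For reference, the exp(B) expression for the hazard ratio comes from the form of the Cox model. The notation below is ours rather than the article's: h(t | x) is the hazard at time t, h_0(t) the unspecified baseline hazard, and x a 0/1 indicator for the group of interest, with 0 as the reference category. The confidence interval is the usual Wald interval and is stated here as an assumption about how the software computes it.

```latex
h(t \mid x) = h_0(t)\,\exp(Bx), \qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp(B), \qquad
95\%\ \mathrm{CI} = \exp\bigl(B \pm 1.96\,\mathrm{SE}(B)\bigr)
```

Because the baseline hazard cancels in the ratio, HR = 1 corresponds to B = 0, HR > 1 to B > 0 and HR < 1 to B < 0.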
Since this is an adjusting-for-confounder model, our interest is only in the variable group. ‘Thankfully’ the p-value is 0.043 (statistically significant!) compared to the Kaplan Meier analysis (well, we do not always get this happy ending). The HR is 6.302 (95% CI 1.058 - 37.55), comparing the control with the active (obtained from the categorical definition Table VIa), the control [...] difference? Table VIb also showed that there are statistical differences for gender and also age – the men and older people were doing worse. Performing a cross-tabulation shows that there are more men and fewer women in the control group (p = 0.673) and mean age is higher in the active group. See Tables VIc and VId.

Thus, taking this information into account, a treatment difference is found, as observed from the survival plot in Fig. 4.

Fig. 4 Survival plot for the lung cancer example. [Survival functions for patterns 1 - 2: Cum Survival against Time (in months); Group: Active, Control]
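The hazard ratio, confidence interval and p-value quoted for the variable group are tied together through the coefficient B and its standard error. As a minimal sketch, and assuming a Wald test on the normal approximation (our assumption about the software output, not something stated in the article), we can work backwards from the reported HR and 95% CI to recover B, SE and the p-value:

```python
import math

# Values reported in the text for group (control compared with active).
hr, lower, upper = 6.302, 1.058, 37.55

# B is the log of the hazard ratio; the CI is exp(B ± 1.96 SE),
# so the SE can be recovered from the width of the CI on the log scale.
b = math.log(hr)
se = (math.log(upper) - math.log(lower)) / (2 * 1.96)

# Wald z-statistic and two-sided p-value from the standard normal.
z = b / se
p_value = math.erfc(abs(z) / math.sqrt(2))

# Rebuild the CI from B and SE as a consistency check.
ci = (math.exp(b - 1.96 * se), math.exp(b + 1.96 * se))

print(f"B = {b:.3f}, SE = {se:.3f}, z = {z:.3f}, p = {p_value:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.2f})")
```

The recovered p-value rounds to the reported 0.043. The wide interval reflects the large SE that comes with only 25 subjects.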
Those with a positive lymph node are more likely to have a shorter time to death (HR = 2.06, 95% CI 1.07 - 4.0, p = 0.032). Tumour size is “just off statistical significance”. Should we conclude that only women with a positive lymph node are at a higher risk? Chotto matte (wait a minute) – what happens if we include a lymph node * tumour size interaction (see Table VIIc)?

Here we can see that lymph node status is no longer statistically significant, but tumour size and their interaction are! The results are telling us that, regardless of the lymph node status, subjects with tumour size >5cm are at risk (HR = 22.19, 95% CI 2.56 - 192.57, p = 0.005), and subjects with tumour size 2 - 5cm are at a higher risk if they have a positive lymph node (HR = 5.31, 95% CI 1.33 - 21.25, p = 0.018).

One last assumption to check: the proportional hazard model. From the lung cancer example, in Template IX, click on the “log-minus-log” plot option to get Fig. 5; we do not want the lines to cross each other. When the proportional hazard assumption is not satisfied, we will have to use Cox regression with a time-dependent covariate to analyse the data.
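Why should the log-minus-log lines not cross? Under proportional hazards the survival curve of one group is the other group's curve raised to the power HR, so ln(-ln S(t)) for the two groups differs by the constant ln(HR) at every time point and the lines run parallel. The sketch below illustrates this with made-up numbers: the baseline curve, the hazard ratio of 2 and the names `log_minus_log`, `baseline` and `comparison` are all hypothetical.

```python
import math


def log_minus_log(surv):
    """ln(-ln S(t)) for a survival probability strictly between 0 and 1."""
    return math.log(-math.log(surv))


hazard_ratio = 2.0                      # hypothetical HR
times = [5, 10, 15, 20, 25, 30, 35]     # months

# Hypothetical baseline survival curve S0(t).
baseline = [math.exp(-((t / 30) ** 1.5)) for t in times]

# Under proportional hazards, S1(t) = S0(t) ** HR.
comparison = [s ** hazard_ratio for s in baseline]

# Vertical gap between the two log-minus-log curves at each time point.
gaps = [log_minus_log(s1) - log_minus_log(s0)
        for s0, s1 in zip(baseline, comparison)]

for t, gap in zip(times, gaps):
    print(f"t = {t:2d} months: gap = {gap:.4f}")
```

Every gap equals ln(2), so the two curves never meet. Lines that converge or cross in a real log-minus-log plot therefore suggest that the hazard ratio is changing over time.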
Fig. 5 Log-minus-log plot for proportional hazard checking. [LML function for patterns 1 - 2: Log minus log against Time (in months)]

Our next article will be “Biostatistics 301. Repeated measurement analysis”.

REFERENCES
1. Chan YH. Biostatistics 102. Quantitative data – parametric and non-parametric tests. Singapore Med J 2003; 44:391-6.