Survival Analysis With R
This tutorial shows some basic tools for survival analysis using R: how to obtain the Kaplan-Meier graph and how to fit a
univariate and a multiple Cox regression model. It also shows how to export the results in a publishable table format.
For the purpose of illustration we use the German breast cancer data, which are shipped, for example, with the R package pec:
library(data.table)
library(pec)
data(GBSG2,package="pec")
setDT(GBSG2)
GBSG2
horTh age menostat tsize tgrade pnodes progrec estrec time cens
1: no 70 Post 21 II 3 48 66 1814 1
2: yes 56 Post 12 II 7 61 77 2018 1
3: yes 58 Post 35 II 9 52 271 712 1
4: yes 59 Post 17 II 4 60 29 1807 1
5: no 73 Post 35 II 1 26 65 772 1
---
682: no 49 Pre 30 III 3 1 84 721 0
683: yes 53 Post 25 III 17 0 0 186 0
684: no 51 Pre 25 III 5 43 0 769 1
685: no 52 Post 23 II 3 15 34 727 1
686: no 55 Post 23 II 9 116 15 1701 1
library(prodlim)
library(survival)
library(Publish)
When reporting the length of follow-up, the aim is usually to describe how long the study was able to observe the enrolled
subjects. Naive estimates based on, e.g., the observed times of all subjects tend to underestimate follow-up because of early events.
Schemper and Smith (Control Clin Trials, 1996, 17:343–346) suggested estimating potential follow-up with the Kaplan-Meier
method applied to the censored times. An approximation of this so-called reverse Kaplan-Meier estimate can be obtained by reversing the event
indicator so that the outcome of interest becomes being censored. However, when there are ties in the data, in the sense that some
subjects have exactly the same event times, this is not exactly correct. Here is how to calculate the reverse Kaplan-Meier
estimate, which is always exactly correct, and how to obtain the median follow-up with 95% confidence limits and the interquartile
range (IQR):
quantile(prodlim(Hist(time,cens)~1,data=GBSG2,reverse=TRUE))
Quantiles of the potential follow up time distribution based on the Kaplan-Meier method
applied to the censored times reversing the roles of event status and censored.
The median potential follow-up time of the GBSG2 study was 1645 days (IQR: [1100 days;1714 days]). This means that 50% of
the patients would have been observed for at least 1645 days had there been no events.
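The approximation mentioned above, reversing the event indicator, can be sketched with the survival package alone. The toy data below are hypothetical (they are not GBSG2), and because they contain a tied event and censoring time (at time 12) the result may differ slightly from what prodlim with reverse=TRUE would give:

```r
library(survival)

# Hypothetical toy data: follow-up time in days, cens = 1 event, 0 censored
d <- data.frame(time = c(5, 8, 12, 12, 20, 25, 30, 34, 40, 45),
                cens = c(1, 0, 1, 0, 0, 1, 0, 0, 1, 0))

# Reverse the indicator so that being censored becomes the "event"
rkm <- survfit(Surv(time, 1 - cens) ~ 1, data = d)

# Quartiles of the potential follow-up time with 95% confidence limits
fu <- quantile(rkm, probs = c(0.25, 0.5, 0.75))
print(fu)
```

Quantiles that cannot be estimated from the data are returned as NA.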
Kaplan-Meier
Kaplan-Meier graph
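km0 <- prodlim(Hist(time,cens)~1,data=GBSG2) # overall Kaplan-Meier fit (definition assumed, missing in this copy)
par(mar=c(7,7,5,5), # figure margins (assumed, as in the stratified plot)
mgp=c(4,1,0)) # move axis label away from figure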
plot(km0,
xlab="Years", # label for x-axis
ylab="Absolute risk of death", # label for y-axis
type="cuminc", # plot increasing risks (1-survival)
axis1.at=seq(0,2900,365.25), # time grid for x-axis
axis1.labels=0:7, # time labels for x-axis
axis2.las=2, # rotate labels of y-axis
atrisk.dist=1, # adjust numbers below the figure
atrisk.labels="Number of \npatients: ") # labels for numbers below figure
Stratified Kaplan-Meier
km1 <- prodlim(Hist(time,cens)~tgrade,data=GBSG2)
par(mar=c(7,7,5,5), mgp=c(3,1,0))
plot(km1,
atrisk.labels=paste("Tumor grade: ",c("I","II","III"),": "),
atrisk.title="",
xlab="Years", # label for x-axis
axis1.at=seq(0,2900,365.25), # time grid for x-axis
axis1.labels=0:7, # time labels for x-axis
legend.x="bottomleft", # position of legend
legend.cex=0.8, # font size of legend
legend.title="Tumor Grade\n", # title of legend
logrank=TRUE) # show log-rank p-value
Log-rank test
survdiff(Surv(time,cens)~horTh,data=GBSG2)
Call:
survdiff(formula = Surv(time, cens) ~ horTh, data = GBSG2)
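The p-value of a log-rank test can be recovered from the chi-square statistic stored in the survdiff object. The following is a minimal sketch that uses the lung data shipped with the survival package instead of GBSG2, so that it runs without further packages:

```r
library(survival)

# lung ships with the survival package (status: 1 = censored, 2 = dead)
lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# The statistic is compared with a chi-square distribution
# with (number of groups - 1) degrees of freedom
df <- length(lr$n) - 1
p  <- pchisq(lr$chisq, df = df, lower.tail = FALSE)
print(c(chisq = lr$chisq, df = df, p = p))
```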
Median survival
Overall Kaplan-Meier
Stratified Kaplan-Meier
tgrade=I
q quantile lower upper
1 0.00 NA NA NA
2 0.25 NA NA NA
3 0.50 NA 1990 NA
4 0.75 1459 991 NA
5 1.00 476 476 662
Median time (IQR):– (1459.00;–)
tgrade=II
q quantile lower upper
1 0.00 NA NA NA
2 0.25 NA 2456 NA
3 0.50 1730 1481 2018
4 0.75 745 646 842
5 1.00 72 72 171
Median time (IQR):1730.00 (745.00;–)
tgrade=III
q quantile lower upper
1 0.00 NA NA NA
2 0.25 NA 2034 NA
3 0.50 1337 956 2034
4 0.75 476 403 571
5 1.00 98 98 177
Median time (IQR):1337.00 (476.00;–)
tgrade=I
tgrade=II
Median time (lower.95; upper.95): 1730 (1481;2018) IQR (time): (745;–)
tgrade=III
Tabulate results
Overall Kaplan-Meier
Interval No. at risk No. of events No. lost to follow-up Survival probability CI.95
0.0– 0.0 686 0 0 100.0 0.0–100.0
0.0– 365.2 686 56 28 91.6 89.4– 93.7
365.2– 730.5 602 109 35 74.6 71.3– 78.0
730.5–1095.8 458 59 68 64.3 60.5– 68.1
1095.8–1461.0 331 39 64 55.9 51.8– 60.0
1461.0–1826.2 228 22 85 49.2 44.7– 53.7
1826.2–2191.5 121 11 74 41.9 36.3– 47.4
2191.5–2556.8 36 3 30 34.3 24.7– 43.8
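The code that produced this table is not shown here. A comparable yearly table can be sketched with summary() applied to a survfit object; the example below is an assumption-laden illustration that uses the lung data from the survival package rather than GBSG2, and its column names are chosen for this sketch only:

```r
library(survival)

# Overall Kaplan-Meier fit (lung: time in days, status 2 = dead)
km <- survfit(Surv(time, status) ~ 1, data = lung)

# Evaluate the fit on a yearly grid
grid <- c(0, 365.25, 730.5)
s <- summary(km, times = grid)

# Collect numbers at risk, events and survival probabilities (in percent)
tab <- data.frame(time     = s$time,
                  n.risk   = s$n.risk,
                  n.event  = s$n.event,
                  survival = round(100 * s$surv, 1),
                  lower    = round(100 * s$lower, 1),
                  upper    = round(100 * s$upper, 1))
print(tab)
```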
Stratified Kaplan-Meier
tgrade=I
Interval No. at risk No. of events No. lost to follow-up Survival probability CI.95
0.0– 0.0 81 0 0 100.0 0.0–100.0
0.0– 365.2 81 0 4 100.0 0.0–100.0
365.2– 730.5 77 5 7 93.1 87.2– 98.9
730.5–1095.8 65 6 8 83.8 74.9– 92.6
1095.8–1461.0 51 5 10 74.4 63.4– 85.5
1461.0–1826.2 36 0 15 74.4 63.4– 85.5
1826.2–2191.5 21 2 14 62.0 43.8– 80.2
2191.5–2556.8 5 0 5 NA NA– NA
tgrade=II
Interval No. at risk No. of events No. lost to follow-up Survival probability CI.95
0.0– 0.0 444 0 0 100.0 0.0–100.0
0.0– 365.2 444 33 19 92.3 89.8– 94.8
365.2– 730.5 392 69 19 75.8 71.7– 79.9
730.5–1095.8 304 44 42 64.1 59.4– 68.8
1095.8–1461.0 218 26 37 55.7 50.6– 60.8
1461.0–1826.2 155 20 54 47.0 41.4– 52.6
1826.2–2191.5 81 7 46 40.5 33.9– 47.1
2191.5–2556.8 28 3 22 30.7 18.9– 42.5
tgrade=III
Interval No. at risk No. of events No. lost to follow-up Survival probability CI.95
0.0– 0.0 161 0 0 100.0 0.0–100.0
0.0– 365.2 161 23 5 85.3 79.7– 90.8
365.2– 730.5 133 35 9 62.5 54.9– 70.2
730.5–1095.8 89 9 18 55.3 47.2– 63.4
1095.8–1461.0 62 8 17 47.2 38.6– 55.9
1461.0–1826.2 37 2 16 43.5 34.0– 53.0
1826.2–2191.5 19 2 14 35.4 22.6– 48.2
2191.5–2556.8 3 0 3 NA NA– NA
Cox regression
Call:
coxph(formula = Surv(time, cens) ~ tgrade, data = GBSG2)
Wald test = 19.75 on 2 df, p=0.00005
Score (logrank) test = 21.1 on 2 df, p=0.00003
library(pec)
km.grade <- prodlim(Surv(time,cens)~tgrade,data=GBSG2)
cox.grade <- coxph(Surv(time,cens)~tgrade,data=GBSG2,x=TRUE,y=TRUE)
newdata <- data.frame(tgrade=c("I","II","III"))
## first show Kaplan-Meier without confidence limits
plot(km.grade, lty=1, lwd=3,
col=c("darkgreen","darkorange","red"), confint=FALSE)
## now add survival estimates based on Cox regression
plotPredictSurvProb(cox.grade, lty=2,
col=c("darkgreen","darkorange","red"),
add=TRUE, sort(unique(GBSG2$time)),
newdata=newdata)
mtext("Comparison of univariate Cox regression and stratified Kaplan-Meier")
cox2 <- coxph(Surv(time,cens)~tgrade+age+tsize+pnodes+progrec+estrec,data=GBSG2)
summary(cox2)
Call:
coxph(formula = Surv(time, cens) ~ tgrade + age + tsize + pnodes +
progrec + estrec, data = GBSG2)
The following code extracts the survival probabilities for specific combinations of the risk factors: all three categories of
tumor grade at fixed age, tumor size and number of positive lymph nodes.
library(pec)
data(GBSG2)
cox2 <- coxph(Surv(time,cens)~tgrade+age+tsize+pnodes,data=GBSG2,x=TRUE,y=TRUE)
newdata <- data.frame(tgrade=c("I","II","III"),age=50,tsize=30,pnodes=8)
plotPredictSurvProb(cox2,
sort(unique(GBSG2$time)),
newdata=newdata,
col=c("darkgreen","darkorange","red"),
legend.title="Tumor grade",
legend.legend=c("I","II","III"))
mtext("Individualized survival curves from multiple Cox regression
age=50, tumor size= 30, no. positive lymph nodes=8",line=1.5,cex=1.3)
Figure: multiple-cox-survival-curves.jpg
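Predicted survival probabilities for chosen covariate combinations can also be read off numerically. The sketch below uses survfit() on a coxph fit with a newdata argument; it relies on the lung data from the survival package, and the two patient profiles are hypothetical:

```r
library(survival)

cox <- coxph(Surv(time, status) ~ sex + age, data = lung)

# Two hypothetical patients: same age, different sex (1 = male, 2 = female)
nd <- data.frame(sex = c(1, 2), age = 60)

# Predicted survival curves, one curve per row of nd
pred <- survfit(cox, newdata = nd)

# Survival probabilities at 180 and 365 days (rows = times, columns = patients)
s <- summary(pred, times = c(180, 365))
print(s$surv)
```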
Stratified
Call:
coxph(formula = Surv(time, cens) ~ tgrade + age + strata(menostat) +
tsize + pnodes + progrec + estrec, data = GBSG2, x = TRUE)
n= 686, number of events= 299
Analysis of Deviance Table
Cox model: response is Surv(time, cens)
Model 1: ~ tgrade + Age + strata(menostat) + tsize + pnodes + progrec + estrec
Model 2: ~ tgrade * Age + strata(menostat) + tsize + pnodes + progrec + estrec
loglik Chisq Df P(>|Chi|)
1 -1535.1
2 -1531.0 8.2495 6 0.2204
The tumor grade effect on the hazard rate is not significantly different between age groups (p=0.22).
Within the limitations of this study we were not able to identify a significant difference in the tumor grade effect on the
hazard rate between age groups (p=0.22).
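A likelihood ratio test of this kind compares two nested Cox models, one without and one with the interaction. As a minimal sketch, assuming the lung data from the survival package and a sex by age interaction (not the models fitted above):

```r
library(survival)

fit1 <- coxph(Surv(time, status) ~ sex + age, data = lung)  # main effects only
fit2 <- coxph(Surv(time, status) ~ sex * age, data = lung)  # adds interaction

# Analysis of deviance table for the nested fits
print(anova(fit1, fit2))

# The same test by hand: twice the difference in partial log-likelihoods
lr <- 2 * (fit2$loglik[2] - fit1$loglik[2])
df <- length(coef(fit2)) - length(coef(fit1))
p  <- pchisq(lr, df = df, lower.tail = FALSE)
print(c(Chisq = lr, Df = df, p = p))
```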
Tabulate results
Variable Units HazardRatio CI.95 p-value
pnodes 1.05 [1.04;1.07] < 0.001
progrec 1.00 [1.00;1.00] < 0.001
estrec 1.00 [1.00;1.00] 0.70522
Note: the main effects usually do not have an interpretation in the presence of interaction (effect modification).
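Hazard ratios with 95% confidence limits and p-values, as in the table above, can be assembled by hand from a coxph fit. This is only a sketch on the lung data from the survival package; the column names are chosen to resemble the table and are not produced by any package function:

```r
library(survival)

cox <- coxph(Surv(time, status) ~ sex + age, data = lung)

# Exponentiate coefficients and Wald confidence limits to the hazard ratio scale
ci <- exp(confint(cox))
hr <- data.frame(HazardRatio = exp(coef(cox)),
                 Lower       = ci[, 1],
                 Upper       = ci[, 2],
                 p.value     = summary(cox)$coefficients[, "Pr(>|z|)"])
print(round(hr, 3))
```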
par(cex=4)
cox2 <- coxph(Surv(time,cens)~tgrade+age+tsize+pnodes,data=GBSG2)
rt2 <- do.call("rbind",regressionTable(cox2))
plotConfidence(x=rt2$HazardRatio,
lower=rt2$Lower,
upper=rt2$Upper,
xlim=c(0,5),
factor.reference.pos=1,
labels=c("Tumor grade I",
" II",
" III",
"Age (years)",
"Tumor size (mm)",
"No. positive\nlymph nodes"),
title.labels="Factor",
cex=2,
xlab.cex=1.3,
xlab="Hazard ratio")
Figure: forest-cox2.jpg
Competing risks
library(riskRegression)
data(Melanoma)
levels(Melanoma$event) <- gsub("\\.","\n\n",levels(Melanoma$event))
with(Melanoma,plot(Hist(time,event,cens.code="censored"),arrowLabelStyle="count"))
Aalen-Johansen graph
The graph shows the cumulative incidence of cancer-related death since surgery.
library(riskRegression)
data(Melanoma)
aj <- prodlim(Hist(time,status)~1,data=Melanoma)
plot(aj,cause=1)
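For comparison, the survival package also computes Aalen-Johansen estimates when the status variable is a factor whose first level marks censoring. The toy data below are hypothetical and only illustrate the shape of the result:

```r
library(survival)

# Hypothetical toy data; the first factor level ("censored") marks censoring
d <- data.frame(
  time   = c(2, 3, 5, 6, 8, 9, 11, 12, 15, 18),
  status = factor(c("cancer", "censored", "other", "cancer", "censored",
                    "cancer", "other", "censored", "cancer", "censored"),
                  levels = c("censored", "cancer", "other")))

# With a factor status, survfit returns Aalen-Johansen state probabilities
aj <- survfit(Surv(time, status) ~ 1, data = d)

# One column per state: event-free, cancer death, other-cause death
cif <- aj$pstate
colnames(cif) <- aj$states
print(cbind(time = aj$time, round(cif, 3)))
```

At every time point the state probabilities add up to one, so the cumulative incidences of the two causes never exceed one in total.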
Stratified Aalen-Johansen
library(riskRegression)
data(Melanoma)
aj2 <- prodlim(Hist(time,status)~sex,data=Melanoma)
plot(aj2,cause=1)
Gray's test
library(cmprsk)
with(Melanoma,cuminc(ftime=time,fstatus=status,group=sex))$Tests
stat pv df
1 5.8140209 0.0158989 1
2 0.8543656 0.3553203 1
The Gray test shows a significantly higher risk of cancer-related death for males compared to females (p-value=0.0159).
The Gray test shows no significant difference in the risk of other-cause mortality between males and females (p-value=0.3553).
Cox regression
The function CSC fits cause-specific Cox regression models, one for each competing risk.
library(riskRegression)
CSC(Hist(time,status)~sex+age+invasion+logthick+ulcer,data=Melanoma)
Pattern:
----------> Cause: 1
Call:
survival::coxph(formula = survival::Surv(time, status) ~ sex +
age + invasion + logthick + ulcer, x = TRUE, y = TRUE)
----------> Cause: 2
Call:
survival::coxph(formula = survival::Surv(time, status) ~ sex +
age + invasion + logthick + ulcer, x = TRUE, y = TRUE)
library(riskRegression)
CSC(list(Hist(time,status)~sex+age+invasion+logthick+ulcer,Hist(time,status)~sex+ag
No.Observations: 205
Pattern:
----------> Cause: 1
Call:
survival::coxph(formula = survival::Surv(time, status) ~ sex +
age + invasion + logthick + ulcer, x = TRUE, y = TRUE)
invasionlevel.2 1.491 0.6705 0.5214 4.266
logthick 1.526 0.6555 0.9342 2.492
ulcerpresent 2.590 0.3861 1.3581 4.940
----------> Cause: 2
Call:
survival::coxph(formula = survival::Surv(time, status) ~ sex +
age, x = TRUE, y = TRUE)
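The idea behind cause-specific Cox regression can be sketched with the survival package alone: for each cause, fit an ordinary Cox model in which events of the other cause are treated as censored. The data below are simulated and all variable names are hypothetical; this illustrates the principle and is not a reproduction of what CSC does internally:

```r
library(survival)

set.seed(1)
n  <- 200
x  <- rbinom(n, 1, 0.5)                     # hypothetical binary covariate
t1 <- rexp(n, rate = 0.10 * exp(0.5 * x))   # latent time to cause 1
t2 <- rexp(n, rate = 0.05)                  # latent time to cause 2
cc <- runif(n, 0, 30)                       # censoring time
time   <- pmin(t1, t2, cc)
status <- ifelse(time == cc, 0, ifelse(time == t1, 1, 2))
d <- data.frame(time, status, x)

# One Cox model per cause; events of the other cause count as censored
csc1 <- coxph(Surv(time, status == 1) ~ x, data = d)
csc2 <- coxph(Surv(time, status == 2) ~ x, data = d)

# Cause-specific hazard ratios for x
print(c(cause1 = unname(exp(coef(csc1))), cause2 = unname(exp(coef(csc2)))))
```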