Solved Problems - Survival

The document provides solutions to practice problems related to survival analysis using R, specifically focusing on a study comparing survival rates between smokers and non-smokers after root canal treatment. It includes steps for creating a dataset, obtaining Kaplan-Meier estimates, and performing a log-rank test to compare survival curves. The analysis is supported by R code and manual calculations for clarity.

Uploaded by

Paul Muthoka

BIOSTATS 640 Spring 2022 Homework #13 Unit 8.

Survival Analysis R

Unit 8 – Introduction to Survival Analysis


Practice Problems
SOLUTIONS
R Users

Before you begin:


Be sure to have installed the package {survival}.

The following are some hypothetical data on two groups, smokers and non-smokers, in a study
that investigated survival (days) following a root canal.
Group Days(X) Status at Last Follow-up (C)
group days status

smoker 4 alive
smoker 7 dead
smoker 8 alive
nonsmoker 29 alive
smoker 29 dead
smoker 31 alive
nonsmoker 40 dead
smoker 65 dead
nonsmoker 69 dead
nonsmoker 78 alive
nonsmoker 79 alive
nonsmoker 106 dead
smoker 107 alive
nonsmoker 129 dead
smoker 130 alive
smoker 140 alive
smoker 142 alive
smoker 149 dead
smoker 158 alive
smoker 160 dead
nonsmoker 161 dead
smoker 162 alive
smoker 187 dead
smoker 188 alive
nonsmoker 197 dead
nonsmoker 204 alive
nonsmoker 208 alive
smoker 221 dead
nonsmoker 228 dead
nonsmoker 231 alive


#1.
Create a data set of the observations in the table above. Tips: (1) create a 0/1 variable called group
(1 = smoker, 0 = nonsmoker); and (2) create a 0/1 variable called status (1 = dead, 0 = alive).

Solution:
I created an Excel data set named hw_survival.xlsx and then imported it into R.

hw_survival.xlsx (partial listing)

group days status


1.00 4.00 0.00
1.00 7.00 1.00
1.00 8.00 0.00
0.00 29.00 0.00
1.00 29.00 1.00
1.00 31.00 0.00
0.00 40.00 1.00
1.00 65.00 1.00
0.00 69.00 1.00
0.00 78.00 0.00
0.00 79.00 0.00
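
As an alternative to the Excel import, the same data set can be typed directly into R. The following is a minimal sketch, assuming the 30 rows of the table of hypothetical data in their original order; the object name hw_survival is chosen to match the object used in the later steps, and the 0/1 coding follows the tips in exercise #1.

```r
# Build the data set directly in R instead of importing it from Excel.
# Rows are in the same order as the table of hypothetical data.
hw_survival <- data.frame(
  group = c(1, 1, 1, 0, 1, 1, 0, 1, 0, 0,
            0, 0, 1, 0, 1, 1, 1, 1, 1, 1,
            0, 1, 1, 1, 0, 0, 0, 1, 0, 0),   # 1 = smoker, 0 = nonsmoker
  days = c(4, 7, 8, 29, 29, 31, 40, 65, 69, 78,
           79, 106, 107, 129, 130, 140, 142, 149, 158, 160,
           161, 162, 187, 188, 197, 204, 208, 221, 228, 231),
  status = c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0,
             0, 1, 0, 1, 0, 0, 0, 1, 0, 1,
             1, 0, 1, 0, 1, 0, 0, 1, 1, 0)   # 1 = dead, 0 = alive (censored)
)

# Quick checks: group sizes and number of deaths per group
table(hw_survival$group)
tapply(hw_survival$status, hw_survival$group, sum)
```

The two checks should report 13 nonsmokers and 17 smokers, with 7 deaths in each group.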

#2.
Obtain the Kaplan-Meier estimates of survival, separately for smokers and non-smokers.

library(survival)

dat <- hw_survival # I decided to rename my data to something shorter


# standalone factor copies of group and status (not used by Surv( ) below);
# dat$status itself stays numeric 0/1, which is what Surv( ) expects
group <- as.factor(dat$group)
status <- as.factor(dat$status)
dat # show

## group days status


## 1 1 4 0
## 2 1 7 1
## 3 1 8 0
## 4 0 29 0

--- rows not shown ----


## 28 1 221 1
## 29 0 228 1
## 30 0 231 0

# create survival analysis object using Surv( ) in package {survival}


surv.object <- with(dat, Surv(days,status))


By hand: Smokers
Columns: t = actual time of death; At risk = # at risk at t- (the instant before t); Surviving = # surviving t; Cond. = conditional % surviving; Carry = # at risk to carry forward after the death or censoring.

ID  x    c  t     At risk  Surviving  Cond.   Carry
Start at "0"      17       17         17/17   17
1   4    0  Drop                              16
2   7    1  7     16       15         15/16   15
3   8    0  Drop                              14
4   29   1  29    14       13         13/14   13
5   31   0  Drop                              12
6   65   1  65    12       11         11/12   11
7   107  0  Drop                              10
8   130  0  Drop                              9
9   140  0  Drop                              8
10  142  0  Drop                              7
11  149  1  149   7        6          6/7     6
12  158  0  Drop                              5
13  160  1  160   5        4          4/5     4
14  162  0  Drop                              3
15  187  1  187   3        2          2/3     2
16  188  0  Drop                              1
17  221  1  221   1        0          0/1     0
Key: ID - Subject Identifier, X – Time on Study, C – Censoring Indicator (C=1 if Event of Death, C=0 if Censored)

By hand: NON-Smokers:
Columns: t = actual time of death; At risk = # at risk at t- (the instant before t); Surviving = # surviving t; Cond. = conditional % surviving; Carry = # at risk to carry forward after the death or censoring.

ID  x    c  t     At risk  Surviving  Cond.   Carry
Start at "0"      13       13         13/13   13
1   29   0  Drop                              12
2   40   1  40    12       11         11/12   11
3   69   1  69    11       10         10/11   10
4   78   0  Drop                              9
5   79   0  Drop                              8
6   106  1  106   8        7          7/8     7
7   129  1  129   7        6          6/7     6
8   161  1  161   6        5          5/6     5
9   197  1  197   5        4          4/5     4
10  204  0  Drop                              3
11  208  0  Drop                              2
12  228  1  228   2        1          1/2     1
13  231  0  Drop                              0
Key: ID - Subject Identifier, X – Time on Study, C – Censoring Indicator (C=1 if Event of Death, C=0 if Censored)
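
The counting in the by-hand worksheets can be sketched in base R. This is only an illustration for the smokers: the vector names x and c_ind and the data frame worksheet are hypothetical, and the rule used is that a subject is at risk at t- if their time on study is at least t.

```r
# Smokers: time on study (x) and censoring indicator (c_ind; 1 = death, 0 = censored)
x     <- c(4, 7, 8, 29, 31, 65, 107, 130, 140, 142, 149, 158, 160, 162, 187, 188, 221)
c_ind <- c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1)

# Only actual death times get a row with a conditional proportion
death_times <- sort(unique(x[c_ind == 1]))

# Number at risk the instant before t: everyone with time on study >= t
n_risk <- sapply(death_times, function(t) sum(x >= t))
# Number of deaths at t
n_death <- sapply(death_times, function(t) sum(x == t & c_ind == 1))

worksheet <- data.frame(t = death_times,
                        at_risk = n_risk,
                        surviving = n_risk - n_death,
                        conditional = (n_risk - n_death) / n_risk)
print(worksheet)
```

Replacing x and c_ind with the nonsmoker values gives the second worksheet in the same way.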


By hand: Kaplan-Meier Estimates S[t] for Smokers:

t Formula for Estimating S[t] S[t]


0 S[0]=Pr[T>0]=17/17 1.0
7 S[7]=Pr[T>0]P[T>7| T>0] = (17/17)(15/16) .9375
29 S[29]=Pr[T>0]P[T>7| T>0] Pr[T>29|T>7] = (17/17)(15/16)(13/14) .8705
65 … = (17/17)(15/16)(13/14)(11/12) .7980
149 … = (17/17)(15/16)(13/14)(11/12)(6/7) .6840
160 … = (17/17)(15/16)(13/14)(11/12)(6/7)(4/5) .5472
187 … = (17/17)(15/16)(13/14)(11/12)(6/7)(4/5)(2/3) .3648
221 … = (17/17)(15/16)(13/14)(11/12)(6/7)(4/5)(2/3)(0/1) 0

By hand: Kaplan-Meier Estimates S[t] for NON-Smokers:

t Formula for Estimating S[t] S[t]


0 S[0]=Pr[T>0]=13/13 1.0
40 S[40]=Pr[T>0]P[T>40| T>0] = (13/13)(11/12) .9167
69 S[69]=Pr[T>0]P[T>40| T>0] Pr[T>69|T>40] = (13/13)(11/12)(10/11) .8333
106 … = (13/13)(11/12)(10/11)(7/8) .7292
129 … = (13/13)(11/12)(10/11)(7/8)(6/7) .6250
161 … = (13/13)(11/12)(10/11)(7/8)(6/7)(5/6) .5208
197 … = (13/13)(11/12)(10/11)(7/8)(6/7)(5/6)(4/5) .4167
228 … = (13/13)(11/12)(10/11)(7/8)(6/7)(5/6)(4/5)(1/2) .2083
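
The running products in the two tables above can be checked with cumprod() in base R. This is a small sketch that takes the conditional proportions directly from the by-hand worksheets; the vector names are hypothetical.

```r
# Conditional proportions surviving each death time, from the by-hand worksheets
cond_smokers    <- c(15/16, 13/14, 11/12, 6/7, 4/5, 2/3, 0/1)
cond_nonsmokers <- c(11/12, 10/11, 7/8, 6/7, 5/6, 4/5, 1/2)

# Kaplan-Meier estimate = running product of the conditional proportions
S_smokers    <- cumprod(cond_smokers)
S_nonsmokers <- cumprod(cond_nonsmokers)

print(data.frame(t = c(7, 29, 65, 149, 160, 187, 221), S = round(S_smokers, 4)))
print(data.frame(t = c(40, 69, 106, 129, 161, 197, 228), S = round(S_nonsmokers, 4)))
```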

R
# survfit ( ) in package {survival}
library(survival)
q2fit <- survival::survfit(surv.object~group,data=dat)
summary(q2fit)

## Call: survfit(formula = surv.object ~ group, data = dat)


##
## group=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 40 12 1 0.917 0.0798 0.7729 1.000
## 69 11 1 0.833 0.1076 0.6470 1.000
## 106 8 1 0.729 0.1355 0.5066 1.000
## 129 7 1 0.625 0.1510 0.3893 1.000
## 161 6 1 0.521 0.1577 0.2877 0.943
## 197 5 1 0.417 0.1568 0.1993 0.871
## 228 2 1 0.208 0.1669 0.0433 1.000
##
## group=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 7 16 1 0.938 0.0605 0.826 1.000
## 29 14 1 0.871 0.0856 0.718 1.000
## 65 12 1 0.798 0.1048 0.617 1.000
## 149 7 1 0.684 0.1386 0.460 1.000
## 160 5 1 0.547 0.1651 0.303 0.989
## 187 3 1 0.365 0.1852 0.135 0.987
## 221 1 1 0.000 NaN NA NA
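
The std.err and confidence limit columns can also be reproduced by hand. The sketch below is for the smokers and assumes Greenwood's formula for the standard error and a 95% interval computed on the log scale (which, as far as I know, is the survfit( ) default); the variable names are hypothetical.

```r
# Smokers: number at risk and number of deaths at each death time.
# The last death time (t = 221) is left out: S = 0 there, so the
# standard error is undefined (NaN in the survfit output).
n_risk  <- c(16, 14, 12, 7, 5, 3)
n_event <- c(1, 1, 1, 1, 1, 1)
S <- cumprod((n_risk - n_event) / n_risk)

# Greenwood's formula for the standard error of S(t)
se <- S * sqrt(cumsum(n_event / (n_risk * (n_risk - n_event))))

# 95% limits on the log scale, with the upper limit capped at 1
z <- qnorm(0.975)
lower <- exp(log(S) - z * se / S)
upper <- pmin(1, exp(log(S) + z * se / S))

print(round(data.frame(time = c(7, 29, 65, 149, 160, 187), S, se, lower, upper), 4))
```

Under these assumptions the values agree with the group=1 rows of the survfit( ) summary to the printed precision.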

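# plot( ) dispatches to plot.survfit( ) in package {survival}


plot(q2fit,
xlab="Survival Time in Days",
ylab="% Surviving", yscale=100, col=c("red","blue"),
main="Survival Distributions by Group")
# curves are drawn in strata order: group=0 (non-smokers) is red, group=1 (smokers) is blue
legend("topright", title="Group", c("Non-smokers", "Smokers"),
fill=c("red", "blue"))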


#3.
By hand, perform a log rank test of the null hypothesis of equal survival curves for smokers and non-smokers.
Worksheet.
t     O1t   n1t   dt   (Nt - dt)   n2t   Nt
7     1     16    1    28          13    29
29    1     14    1    26          13    27
40    0     12    1    23          12    24
65    1     12    1    22          11    23
69    0     11    1    21          11    22
106   0     11    1    18          8     19
129   0     10    1    16          7     17
149   1     7     1    12          6     13
160   1     5     1    10          6     11
161   0     4     1    9           6     10
187   1     3     1    7           5     8
197   0     1     1    5           5     6
221   1     1     1    2           2     3
228   0     0     1    1           2     2
Key: O1t = 1 if death in smoker, 0 if death in nonsmoker; n1t = # at risk among smokers; dt = # deaths;
(Nt - dt) = # surviving; n2t = # at risk among nonsmokers; Nt = total # at risk


Worksheet - continued.
$$E[O_{1t}] = n_{1t}\left[\frac{d_t}{N_t}\right], \qquad V[O_{1t}] = E[O_{1t}]\left[\frac{n_{2t}\,(N_t - d_t)}{N_t\,(N_t - 1)}\right]$$

t        O1t   E[O1t]   V[O1t]
7        1     0.5517   0.2473
29       1     0.5185   0.2497
40       0     0.5000   0.2500
65       1     0.5217   0.2495
69       0     0.5000   0.2500
106      0     0.5789   0.2438
129      0     0.5882   0.2422
149      1     0.5385   0.2485
160      1     0.4545   0.2479
161      0     0.4000   0.2400
187      1     0.3750   0.2344
197      0     0.1667   0.1389
221      1     0.3333   0.2222
228      0     0        0
Totals   7     6.0272   3.0644
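
The worksheet totals can be recomputed from the raw data in base R. The sketch below is an illustration only: it retypes the 30 observations, applies the E and V formulas from the worksheet header at each death time, and uses hypothetical names (days, status, group, worksheet, totals, chisq).

```r
# The 30 observations (1 = smoker / dead, 0 = nonsmoker / alive)
days   <- c(4, 7, 8, 29, 29, 31, 40, 65, 69, 78,
            79, 106, 107, 129, 130, 140, 142, 149, 158, 160,
            161, 162, 187, 188, 197, 204, 208, 221, 228, 231)
status <- c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0,
            0, 1, 0, 1, 0, 0, 0, 1, 0, 1,
            1, 0, 1, 0, 1, 0, 0, 1, 1, 0)
group  <- c(1, 1, 1, 0, 1, 1, 0, 1, 0, 0,
            0, 0, 1, 0, 1, 1, 1, 1, 1, 1,
            0, 1, 1, 1, 0, 0, 0, 1, 0, 0)

death_times <- sort(unique(days[status == 1]))

# One worksheet row per death time
worksheet <- t(sapply(death_times, function(tt) {
  n1 <- sum(days >= tt & group == 1)                 # smokers at risk
  n2 <- sum(days >= tt & group == 0)                 # nonsmokers at risk
  N  <- n1 + n2                                      # total at risk
  d  <- sum(days == tt & status == 1)                # deaths at tt, both groups
  O1 <- sum(days == tt & status == 1 & group == 1)   # deaths among smokers
  E  <- n1 * d / N                                   # expected smoker deaths
  V  <- E * n2 * (N - d) / (N * (N - 1))             # variance
  c(t = tt, O1 = O1, n1 = n1, n2 = n2, N = N, E = E, V = V)
}))
print(round(worksheet, 4))

totals <- colSums(worksheet[, c("O1", "E", "V")])
chisq  <- unname((totals["O1"] - totals["E"])^2 / totals["V"])
print(round(c(totals, chisq = chisq), 4))
```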

$$\chi^2_{\text{log rank};\,1\,df} = \frac{\left(\sum_{t=1}^{\#\text{deaths}} O_{1t} - \sum_{t=1}^{\#\text{deaths}} E[O_{1t}]\right)^2}{\sum_{t=1}^{\#\text{deaths}} V[O_{1t}]} = \frac{(7 - 6.02717)^2}{3.0644} = 0.3088$$
Interpretation. Do not reject the null hypothesis of equality of survival distributions (chi-square = 0.31 on 1 df, p-value = .58). These data do not
provide statistically significant evidence of a difference in survival by group. Note. This could be a reflection of the small
number of events in each group (7 in each group).
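
The p-value for the log rank statistic is the upper-tail area of a chi-square distribution with 1 degree of freedom. A minimal sketch in base R, using the worksheet totals (the variable names are hypothetical):

```r
# Log rank statistic from the worksheet totals
chisq <- (7 - 6.02717)^2 / 3.0644

# Upper-tail probability of a chi-square distribution with 1 df
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

print(round(c(chisq = chisq, p = p_value), 4))
```

The statistic is about 0.31 and the p-value is about .58; the two numbers should not be confused.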

#4.
Using any software you like, reproduce the log rank test that you did by hand in exercise #3.

# survival curves (logrank test) using command survdiff(). Package: survival


library(survival)
survival::survdiff(surv.object~group, data=dat)

## Call:
## survdiff(formula = surv.object ~ group, data = dat)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## group=0 13 7 7.97 0.119 0.309
## group=1 17 7 6.03 0.157 0.309
##
## Chisq= 0.3 on 1 degrees of freedom, p= 0.578

Interpretation. The answers match! And so the conclusion is the same: do not reject the null hypothesis of equality of
survival distributions (chi-square = 0.309 on 1 df, p-value = .58).


#5.
Write an expression for a Cox Proportional Hazards Model that could be explored to investigate the
association of survival time following root canal with smoking status. Define all terms.

Solution:

A Cox PH model for the hazard of death following root canal and its association with smoking status is

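h(t; Z) = h_0(t) exp[βZ], where

h(t; Z) = instantaneous hazard of death at time "t" given survival to "t-" for a person with
covariate Z

h_0(t) = baseline hazard of death at time "t" given survival to "t-", that is, the hazard for a person with Z=0 (nonsmoker)

β = regression coefficient for Z; it is the log of the hazard ratio comparing smokers to nonsmokers

Z = group = indicator of smoking status with Z=1 for smokers, 0 for nonsmokers.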
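
To see what the coefficient β measures, compare the model at Z=1 and at Z=0. The short derivation below uses only the model as stated; the baseline hazard cancels in the ratio.

```latex
\frac{h(t; Z=1)}{h(t; Z=0)}
  = \frac{h_0(t)\,\exp[\beta \cdot 1]}{h_0(t)\,\exp[\beta \cdot 0]}
  = \exp[\beta]
```

So exp[β] is the hazard ratio for smokers relative to nonsmokers, and under this model it does not depend on t.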

#6.
What assumptions must hold in order for this model to be valid?

Solution:

(1) Model:
A Cox PH model for the hazard of death following root canal and its association with smoking status is

h(t; Z) = h_0(t) exp[βZ], where Z = group and

h(t; Z) = instantaneous hazard of death at time "t" given survival to "t-" for a person with
covariate Z

h_0(t) = baseline hazard of death at time "t" given survival to "t-"

Z = group and Z=1 for smokers, 0 for nonsmokers.

(2) Proportional Hazards: The hazard of death for smokers is a constant multiple (called the hazard
ratio) of the hazard of death for non-smokers at all times.

(3) Independence: The observations are independent.


#7.
Using any software you like, fit the model you stated in exercise #5. Report your output and provide
annotations that explain the output.

# coxph() in package {survival}


library(survival)
q7cox <- survival::coxph(surv.object~group, data=dat)
q7cox

## Call:
## coxph(formula = surv.object ~ group, data = dat)
##
## coef exp(coef) se(coef) z p
## group 0.317 1.373 0.573 0.55 0.58
##
## Likelihood ratio test=0.31 on 1 df, p=0.579
## n= 30, number of events= 14

Interpretation. In this sample, the estimated hazard of death following root canal is 37% greater for smokers than for
nonsmokers (HR = 1.37 with 95% CI limits 0.45 to 4.22), but the difference is not statistically significant (p=.58).
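
The hazard ratio, its 95% confidence limits, and the Wald p-value can be recovered from the coef and se(coef) columns of the output. A minimal sketch in base R, assuming a normal-approximation (Wald) interval on the log hazard scale; the variable names are hypothetical.

```r
coef_hat <- 0.317   # estimated coefficient for group, from the coxph( ) output
se_hat   <- 0.573   # its standard error

hr <- exp(coef_hat)                                       # hazard ratio
ci <- exp(coef_hat + c(-1, 1) * qnorm(0.975) * se_hat)    # 95% Wald limits
z  <- coef_hat / se_hat                                   # Wald z statistic
p  <- 2 * pnorm(-abs(z))                                  # two-sided p-value

print(round(c(HR = hr, lower = ci[1], upper = ci[2], z = z, p = p), 2))
```

Because the inputs are the rounded values printed by coxph( ), the results agree with the output only to about two decimal places.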

#8.
Compare the fit of the model you obtained for exercise #7 to the results of the log- rank test that you got for
exercises #3 and #4.

Solution:
It’s a match!

With a single 0/1 predictor and no tied death times, the score test from a Cox PH model is equivalent to the log rank test
for the comparison of two groups.

Log Rank Test Chi Square = 0.3088 on df=1 has p-value = .5784

Cox PH Model Score Test for significance of 0/1 GROUP = 0.3088 on df=1 has p-value = .5784
