2 Right Censoring and Kaplan-Meier Estimator: ST 745, Daowen Zhang
In biomedical applications, especially in clinical trials, two important issues arise when studying
“time to event” data (we will assume the event to be “death”, although it can be any event of interest):
1. Some individuals are still alive at the end of the study or analysis so the event of interest,
namely death, has not occurred. Therefore we have right censored data.
2. Length of follow-up varies due to staggered entry. So we cannot observe the event for those
In addition to censoring because of insufficient follow-up (i.e., end of study censoring due to
Censoring from these types of causes may be inherently different from censoring due to
CHAPTER 2 ST 745, Daowen Zhang
Censoring and differential follow-up create certain difficulties in the analysis for such data
as is illustrated by the following example taken from a clinical trial of 146 patients treated after
The data have been grouped into one year intervals and all time is measured in terms of
patient time.
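duration [t_{i-1}, t_i)   n(x)   d(x)   w(x)
[0, 1)                    146    27     3
[1, 2)                    116    18     10
[2, 3)                    88     21     10
[3, 4)                    57     9      3
[4, 5)                    45     1      3
[5, 6)                    41     2      11
[6, 7)                    28     3      5
[7, 8)                    20     1      8
[8, 9)                    11     2      1
[9, 10)                   8      2      6
(n(x): number at risk at the start of the interval; d(x): number of deaths in the interval; w(x): number censored in the interval.)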
1. The first estimate would be correct if all censoring occurred after 5 years. Of course, this
was not the case, leading to an overly optimistic estimate (i.e., it overestimates S(5)).
2. The second estimate would be correct if all individuals censored in the 5 years were censored
immediately upon entering the study. This was not the case either, leading to overly
Our clinical colleagues have suggested eliminating all individuals who are censored and using
the remaining “complete” data. This would lead to the following estimate:
$$\hat F(5) = P[T \le 5] = \frac{76 \text{ deaths in 5 years}}{146 - 60 \text{ (censored)}} = 88.4\%, \qquad \hat S(5) = 1 - \hat F(5) = 11.6\%.$$
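The three crude estimates discussed above can be checked with a few lines of R. This is only a sketch: the vectors `n`, `d` and `w` are typed in from the grouped table, and the variable names are ours, not the notes'.

```r
# Grouped data from the table above (one-year intervals)
n <- c(146, 116, 88, 57, 45, 41, 28, 20, 11, 8)  # at risk at start of interval
d <- c(27, 18, 21, 9, 1, 2, 3, 1, 2, 2)          # deaths in interval
w <- c(3, 10, 10, 3, 3, 11, 5, 8, 1, 6)          # censored in interval

deaths5  <- sum(d[1:5])  # deaths in the first 5 years
cens5    <- sum(w[1:5])  # censored in the first 5 years
cens_all <- sum(w)       # censored at any time

S_naive_1  <- 1 - deaths5 / n[1]               # pretend nobody was censored before year 5
S_naive_2  <- 1 - deaths5 / (n[1] - cens5)     # pretend censored subjects never entered
S_complete <- 1 - deaths5 / (n[1] - cens_all)  # "complete case" estimate

round(c(S_naive_1, S_naive_2, S_complete), 3)
```

The first two values bracket the truth from above and below, while the complete-case value is far lower than either.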
Life-table Estimate
More appropriate methods use life-table or actuarial method. The problem with the above
two estimates is that they both ignore the fact that each one-year interval experienced censoring
(or withdrawing). Obviously we need to take this information into account in order to reduce
bias. If we can express S(5) as a function of quantities related to each interval and get a very
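good estimate for each quantity, then intuitively, we will get a very good estimate of S(5). By
the multiplication rule for conditional probabilities,
$$S(5) = P[T \ge 5] = P[T \ge 4] \cdot q_5 = P[T \ge 3] \cdot q_4 \cdot q_5 = \cdots = q_1 \cdot q_2 \cdot q_3 \cdot q_4 \cdot q_5,$$
where $q_i = P[T \ge i \mid T \ge i-1]$. So if we can estimate each $q_i$ well,
we will get a very good estimate of S(5). Note that $1 - q_i$ is the mortality rate m(x) at year
x = i − 1 by our definition.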
Table 2.2: Life-table estimate of S(5) assuming censoring occurred at the end of interval
Columns: duration $[t_{i-1}, t_i)$; $n(x)$; $d(x)$; $w(x)$; $\hat m(x) = \frac{d(x)}{n(x)}$; $1 - \hat m(x)$; $\hat S_R(t_i) = \prod (1 - \hat m(x))$
Case 1: Let us first assume that anyone censored in an interval of time is censored at the
end of that interval. Then we can estimate each qi = 1 − m(i − 1) in the following way:
$$d(0) \sim \mathrm{Bin}(n(0), m(0)) \implies \hat m(0) = \frac{d(0)}{n(0)} = \frac{27}{146} = 0.185, \quad \hat q_1 = 1 - \hat m(0) = 0.815$$
$$d(1) \mid H \sim \mathrm{Bin}(n(1), m(1)) \implies \hat m(1) = \frac{d(1)}{n(1)} = \frac{18}{116} = 0.155, \quad \hat q_2 = 1 - \hat m(1) = 0.845$$
$$\cdots$$
where H means data history (i.e., data before the second interval).
The life table estimate would be computed as shown in Table 2.2. So the 5 year survival
probability estimate $\hat S_R(5) = 0.432$. (If the assumption that anyone censored in an
interval of time is censored at the end of that interval is true, then the estimator $\hat S_R(5)$ is
Of course, this estimate $\hat S_R(5)$ will have variation since it was calculated from a sample. We
need to estimate its variation in order to make inference on S(5) (for example, construct a
However, $\hat S_R(5)$ is a product of 5 estimates ($\hat q_1, \ldots, \hat q_5$), whose variance is not easy to find.
But we have
$$\log(\hat S_R(5)) = \log(\hat q_1) + \log(\hat q_2) + \log(\hat q_3) + \log(\hat q_4) + \log(\hat q_5).$$
So if we can find out the variance of each $\log(\hat q_i)$, we might be able to find out the variance
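For this purpose, let us first introduce a very popular method in statistics, the delta method.
Delta Method: If
$$\hat\theta \overset{a}{\sim} N(\theta, \sigma^2),$$
then
$$f(\hat\theta) \overset{a}{\sim} N\big(f(\theta), [f'(\theta)]^2 \sigma^2\big).$$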
Proof of delta method: If $\sigma^2$ is small, $\hat\theta$ will be close to $\theta$ with high probability. We hence
can expand $f(\hat\theta)$ about $\theta$ using Taylor expansion:
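The displayed expansion is missing from this copy of the notes. A standard first-order version of the argument, given here as a sketch rather than as the author's exact display, reads:

```latex
\begin{aligned}
f(\hat\theta) &\approx f(\theta) + f'(\theta)\,(\hat\theta - \theta), \\
E[f(\hat\theta)] &\approx f(\theta) + f'(\theta)\,E(\hat\theta - \theta) = f(\theta), \\
\mathrm{var}[f(\hat\theta)] &\approx [f'(\theta)]^2\,\mathrm{var}(\hat\theta) = [f'(\theta)]^2 \sigma^2 .
\end{aligned}
```

Since a linear function of an (asymptotically) normal variable is again normal, $f(\hat\theta)$ is approximately normal with this mean and variance.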
Returning to our problem. Let $\hat\phi_i = \log(\hat q_i)$. Using the delta method, the variance of $\hat\phi_i$
is approximately equal to
$$\mathrm{var}(\hat\phi_i) = \left(\frac{1}{q_i}\right)^2 \mathrm{var}(\hat q_i).$$
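Therefore we need to find out and estimate $\mathrm{var}(\hat q_i)$. Of course, we also need to find out the
covariances among $\hat\phi_i$ and $\hat\phi_j$ ($i \ne j$). For this purpose, we need the following theorem:
Double expectation theorem (Law of iterated conditional expectation and variance): If X and
Y are any two random variables, then
$$E(X) = E[E(X|Y)], \qquad \mathrm{Var}(X) = E[\mathrm{Var}(X|Y)] + \mathrm{Var}[E(X|Y)].$$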
Since $\hat q_i = 1 - \hat m(i-1)$, we have
$$\mathrm{var}(\hat q_i) = \mathrm{var}(\hat m(i-1)).$$
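The remaining steps of this calculation are not present in this copy. Under the Case 1 assumption that $d(i-1) \mid H \sim \mathrm{Bin}(n(i-1), m(i-1))$, one way to finish it with the law of iterated variance is sketched below; the final expression is the familiar Greenwood-type term, but the layout is ours and may differ from the original notes.

```latex
\begin{aligned}
\mathrm{var}(\hat m(i-1))
  &= E[\mathrm{var}(\hat m(i-1) \mid H)] + \mathrm{var}[E(\hat m(i-1) \mid H)] \\
  &= E\!\left[\frac{m(i-1)\{1 - m(i-1)\}}{n(i-1)}\right] + \mathrm{var}[m(i-1)]
   = m(i-1)\{1 - m(i-1)\}\, E\!\left[\frac{1}{n(i-1)}\right], \\
\widehat{\mathrm{var}}(\hat q_i) &= \frac{\hat m(i-1)\{1 - \hat m(i-1)\}}{n(i-1)}, \\
\widehat{\mathrm{var}}(\hat\phi_i) &\approx \frac{1}{\hat q_i^{\,2}} \cdot \frac{\hat q_i (1 - \hat q_i)}{n(i-1)}
   = \frac{d(i-1)}{n(i-1)\,\{n(i-1) - d(i-1)\}} .
\end{aligned}
```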
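Now let us look at the covariances among $\hat\phi_i$ and $\hat\phi_j$ ($i \ne j$). It is very amazing that they
are all approximately equal to zero.
For example, let us consider the covariance between $\hat\phi_1$ and $\hat\phi_2$. Since $\hat\phi_1 = \log(\hat q_1)$ and
$\hat\phi_2 = \log(\hat q_2)$, using the same argument for the delta method, we know that we only need
to find out the covariance between $\hat q_1$ and $\hat q_2$, or equivalently, the covariance between $\hat m(0)$
and $\hat m(1)$:
$$\begin{aligned}
E[\hat m(0)\hat m(1)] &= E[E[\hat m(0)\hat m(1) \mid n(0), d(0), w(0)]] \\
&= E[\hat m(0) E[\hat m(1) \mid n(0), d(0), w(0)]] \\
&= E[\hat m(0) m(1)] \\
&= m(1) E[\hat m(0)] \\
&= m(1) m(0) = E[\hat m(0)] E[\hat m(1)].
\end{aligned}$$
Therefore, the covariance between $\hat m(0)$ and $\hat m(1)$ is zero. Similarly, we can show that the other
covariances are zero. Hence, writing $\theta = \log S(5)$ and applying the delta method to $e^\theta$,
$$\mathrm{var}(\hat S_R(5)) = (e^\theta)^2 \mathrm{var}(\log(\hat S_R(5))) = (S(5))^2 [\mathrm{var}(\hat\phi_1) + \mathrm{var}(\hat\phi_2) + \mathrm{var}(\hat\phi_3) + \mathrm{var}(\hat\phi_4) + \mathrm{var}(\hat\phi_5)].$$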
Case 2: Let us assume that anyone censored in an interval of time is censored right at the
beginning of that interval. Then the life table estimate would be computed as shown in
Table 2.3. So the 5 year survival probability estimate = 0.400. (In this case, the estimator
The variance estimate of $\hat S_L(5)$ is similar to that of $\hat S_R(5)$ except that we need to change
Table 2.3: Life-table estimate of S(5) assuming censoring occurred at the beginning of interval
Columns: duration $[t_{i-1}, t_i)$; $n(x)$; $d(x)$; $w(x)$; $\hat m(x) = \frac{d(x)}{n(x) - w(x)}$; $1 - \hat m(x)$; $\hat S_L(t_i) = \prod (1 - \hat m(x))$
The naive estimates range from 35% to 47.9% for the five year survival probability with the
“complete case” (i.e., eliminating anyone censored) estimator giving an estimate of 11.6%.
The life-table estimate ranged from 40% to 43.2% depending on whether we assume censoring
occurred at the left (i.e., beginning) or right (i.e., end) of each interval.
More than likely censoring occurs during the interval. Thus $\hat S_L$ and $\hat S_R$ are not correct. A
Table 2.4: Life-table estimate of S(5) assuming censoring occurred during the interval
Columns: duration $[t_{i-1}, t_i)$; $n(x)$; $d(x)$; $w(x)$; $\hat m(x) = \frac{d(x)}{n(x) - w(x)/2}$; $1 - \hat m(x)$; $\hat S_{LT}(t_i) = \prod (1 - \hat m(x))$
That is, when calculating the mortality estimate in each interval, we use (n(x) − w(x)/2) as
the “sample size”. This number is often referred to as the effective sample size.
So the 5 year survival probability estimate $\hat S_{LT}(5) = 0.417$, which is between $\hat S_L = 0.400$ and
$\hat S_R = 0.432$.
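The three life-table variants differ only in how many of the censored subjects are removed from the denominator. A small base-R sketch makes this explicit; the helper `life_table_S` and its argument `frac` are our own illustrative names, not part of the notes.

```r
# First five one-year intervals of the grouped data
n <- c(146, 116, 88, 57, 45)  # at risk at start of interval
d <- c(27, 18, 21, 9, 1)      # deaths in interval
w <- c(3, 10, 10, 3, 3)       # censored in interval

# frac = fraction of the interval's censored subjects removed from the risk set:
#   0   -> censoring at the end of the interval       (S_R)
#   1   -> censoring at the beginning of the interval (S_L)
#   0.5 -> censoring during the interval              (S_LT, effective sample size)
life_table_S <- function(n, d, w, frac) {
  m_hat <- d / (n - frac * w)  # estimated mortality in each interval
  cumprod(1 - m_hat)           # survival estimate at the end of each interval
}

S_R  <- life_table_S(n, d, w, 0)
S_L  <- life_table_S(n, d, w, 1)
S_LT <- life_table_S(n, d, w, 0.5)

round(rbind(S_R, S_L, S_LT), 3)  # last column gives the three estimates of S(5)
```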
[Figure 2.2: life-table estimate of the survival probability plotted against time (years).]
Figure 2.2 shows the life-table estimate of the survival probability assuming censoring
occurred during the interval. Here the estimates were connected using straight lines. No special
significance should be given to this. From this figure, the median survival time is estimated to
be about 3 years.
The variance estimate of the life-table estimate $\hat S_{LT}(5)$ is similar to equation (2.1) except
Of course, we can also use the above formula to calculate the variance of $\hat S_{LT}(t)$ at other
The calculation presented in Table 2.4 can be implemented using Proc Lifetest in SAS:
Data mi;
input survtime number status;
cards;
0 27 1
0 3 0
1 18 1
1 10 0
2 21 1
2 10 0
3 9 1
3 3 0
4 1 1
4 3 0
5 2 1
5 11 0
6 3 1
6 5 0
7 1 1
7 8 0
8 2 1
8 1 0
9 2 1
9 6 0
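;
/* The procedure call is missing from this copy of the notes. The statements
   below are an assumed reconstruction of a life-table analysis with one-year
   intervals, using number as the frequency variable. */
Proc lifetest method=life intervals=(0 to 10 by 1);
time survtime*status(0);
freq number;
run;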
Note that the number of observed events and withdrawals in [ti−1 , ti ) were entered after ti−1
Effective Conditional
Interval Number Number Sample Probability
[Lower, Upper) Failed Censored Size of Failure
0 1 27 3 144.5 0.1869
1 2 18 10 111.0 0.1622
2 3 21 10 83.0 0.2530
3 4 9 3 55.5 0.1622
4 5 1 3 43.5 0.0230
5 6 2 11 35.5 0.0563
6 7 3 5 25.5 0.1176
7 8 1 8 16.0 0.0625
8 9 2 1 10.5 0.1905
9 10 2 6 5.0 0.4000
Conditional
Probability Survival Median
Interval Standard Standard Residual
[Lower, Upper) Error Survival Failure Error Lifetime
0 1 0.0324 1.0000 0 0 3.1080
1 2 0.0350 0.8131 0.1869 0.0324 4.4265
2 3 0.0477 0.6813 0.3187 0.0393 5.2870
3 4 0.0495 0.5089 0.4911 0.0438 .
4 5 0.0227 0.4264 0.5736 0.0445 .
5 6 0.0387 0.4166 0.5834 0.0446 .
6 7 0.0638 0.3931 0.6069 0.0450 .
7 8 0.0605 0.3469 0.6531 0.0470 .
8 9 0.1212 0.3252 0.6748 0.0488 .
9 10 0.2191 0.2632 0.7368 0.0558 .
Here the numbers in the column under Conditional Probability of Failure are the
estimated mortality $\hat m(x) = d(x)/(n(x) - w(x)/2)$.
The above lifetable estimation can also be implemented using R. Here is the R code:
se.pdf se.hazard
0-1 0.032426423 0.03945410
1-2 0.028930638 0.04143228
2-3 0.033999501 0.06254153
3-4 0.026163333 0.05859410
4-5 0.009742575 0.02325424
5-6 0.016315545 0.04097447
6-7 0.025635472 0.07202769
7-8 0.021195209 0.06448255
8-9 0.040488466 0.14803755
9-10 NA NA
Note: Here the numbers in the column of hazard are the estimated hazard rates at the
midpoint of each interval by assuming the true survival function S(t) is a straight line in each
interval. You can find an explicit expression for this estimator using the relation
$$\lambda(t) = \frac{f(t)}{S(t)},$$
and the assumption that the true survival function S(t) is a straight line in $[t_{i-1}, t_i)$:
$$S(t) = S(t_{i-1}) + \frac{S(t_i) - S(t_{i-1})}{t_i - t_{i-1}} (t - t_{i-1}), \quad \text{for } t \in [t_{i-1}, t_i).$$
These estimates are very close to the mortality estimates we obtained before (the column under
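For readers who want the explicit expression mentioned in the note, the following sketch carries out the calculation under the stated linearity assumption. Here $t_{mi} = (t_{i-1} + t_i)/2$ is our notation for the interval midpoint, and $\hat m$ is the estimated mortality for the interval.

```latex
\begin{aligned}
f(t) &= -S'(t) = \frac{S(t_{i-1}) - S(t_i)}{t_i - t_{i-1}}, \qquad
S(t_{mi}) = \frac{S(t_{i-1}) + S(t_i)}{2}, \\
\lambda(t_{mi}) &= \frac{f(t_{mi})}{S(t_{mi})}
 = \frac{2\,\{S(t_{i-1}) - S(t_i)\}}{(t_i - t_{i-1})\,\{S(t_{i-1}) + S(t_i)\}}, \\
\hat\lambda(t_{mi}) &= \frac{2\,\hat m}{(t_i - t_{i-1})\,(2 - \hat m)}
 \quad \text{using } \hat S(t_i) = \hat S(t_{i-1})\,(1 - \hat m).
\end{aligned}
```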
Kaplan-Meier Estimator
The Kaplan-Meier or product limit estimator is the limit of the life-table estimator when
intervals are taken so small that at most one distinct observation occurs within an interval.
Kaplan and Meier demonstrated in a paper in JASA (1958) that this estimator is the “maximum
likelihood estimate”.
[Figure 2.3: construction of the Kaplan-Meier estimate for 10 patients ordered along the time axis; x = death, o = censored; vertical axis from 0.0 to 1.0.]
Event:             x      x      o    x       o    x         x         o    x         o
$1 - \hat m(x)$:   9/10   8/9    1    6/7     1    4/5       3/4       1    1/2       1
$\hat S(t)$:       9/10   8/10   .    48/70   .    192/350   144/350   .    144/700   .
We will illustrate through a simple example shown in Figure 2.3 how the Kaplan-Meier
estimator is constructed.
By convention, the Kaplan-Meier estimate is a right continuous step function which takes
The calculation of the above KM estimate can be implemented using Proc Lifetest in SAS
as follows:
Data example;
input survtime censcode;
cards;
4.5 1
7.5 1
8.5 0
11.5 1
13.5 0
15.5 1
16.5 1
17.5 0
19.5 1
21.5 0
;
Proc lifetest;
time survtime*censcode(0);
run;
Survival
Standard Number Number
SURVTIME Survival Failure Error Failed Left
0.0000 1.0000 0 0 0 10
4.5000 0.9000 0.1000 0.0949 1 9
7.5000 0.8000 0.2000 0.1265 2 8
8.5000* . . . 2 7
11.5000 0.6857 0.3143 0.1515 3 6
13.5000* . . . 3 5
15.5000 0.5486 0.4514 0.1724 4 4
16.5000 0.4114 0.5886 0.1756 5 3
17.5000* . . . 5 2
19.5000 0.2057 0.7943 0.1699 6 1
21.5000* . . . 6 0
* Censored Observation
The above Kaplan-Meier estimate can also be obtained using R function survfit(). The
code is given in the following:
> library(survival)
> survtime <- c(4.5, 7.5, 8.5, 11.5, 13.5, 15.5, 16.5, 17.5, 19.5, 21.5)
> status <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)
> fit <- survfit(Surv(survtime, status) ~ 1, conf.type="plain")
> summary(fit)
Call: survfit(formula = Surv(survtime, status) ~ 1, conf.type = "plain")
time n.risk n.event survival std.err lower 95% CI upper 95% CI
4.5 10 1 0.900 0.0949 0.7141 1.000
7.5 9 1 0.800 0.1265 0.5521 1.000
11.5 7 1 0.686 0.1515 0.3888 0.983
15.5 5 1 0.549 0.1724 0.2106 0.887
16.5 4 1 0.411 0.1756 0.0673 0.756
19.5 2 1 0.206 0.1699 0.0000 0.539
Let d(x) denote the number of deaths at time x. Generally d(x) is either zero or one, but we
allow the possibility of tied survival times, in which case d(x) may be greater than one. Let n(x)
denote the number of individuals at risk just prior to time x; i.e., the number of individuals in the
sample who neither died nor were censored prior to time x. Then the Kaplan-Meier estimate can be
expressed as
$$KM(t) = \prod_{x \le t} \left(1 - \frac{d(x)}{n(x)}\right).$$
Note: In the notation above, the product changes only at times x where d(x) ≥ 1, i.e.,
only at times where we observed deaths.
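The product formula can be evaluated directly in base R without any survival package. The sketch below uses the ten observations from the example above; the variable names are ours.

```r
# Ten-patient example: 1 = death, 0 = censored
survtime <- c(4.5, 7.5, 8.5, 11.5, 13.5, 15.5, 16.5, 17.5, 19.5, 21.5)
status   <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)

# The product only changes at distinct death times
death_times <- sort(unique(survtime[status == 1]))

# n(x): number still at risk just prior to x; d(x): number of deaths at x
n_risk <- sapply(death_times, function(x) sum(survtime >= x))
n_dead <- sapply(death_times, function(x) sum(survtime == x & status == 1))

# Kaplan-Meier estimate just after each death time
km <- cumprod(1 - n_dead / n_risk)

data.frame(time = death_times, n.risk = n_risk, n.event = n_dead,
           survival = round(km, 4))
```

The resulting survival column should agree with the SAS and R listings for the same data.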
Non-informative Censoring
In order that the life-table estimates give unbiased results, there is an important assumption:
individuals who are censored are at the same risk of subsequent failure as those who are still
alive and uncensored. The risk set at any time point (the individuals still alive and uncensored)
should be representative of the entire population alive at the same time. If this is the case, the
censoring is said to be non-informative. If the censoring time is independent
of the survival time, then we will automatically have non-informative censoring. Actually,
If censoring only occurs because of staggered entry, then the assumption of non-informative
censoring seems plausible. However, when censoring results from loss to follow-up or death from
a competing risk, then this assumption is more suspect. If at all possible censoring from these
The derivation given below is heuristic in nature but will try to capture some of the salient
features of the more rigorous treatments given in the theoretical literature on survival analysis.
For this reason, we will use some of the notation that is associated with the “counting process”
approach to survival analysis. In fact we have seen it when we discussed the life-table estimator.
It is useful when considering the product limit estimator to partition time into many small
intervals, say, of length $\Delta x$:
[Figure: grid of time points on the patient time axis, with a grid point x marked.]
Let “x” denote some arbitrary time point on the grid above and define
• Y (x) = number of individuals at risk (i.e., alive and uncensored) at time point x.
Recall: Previously, Y (x) was denoted by n(x) and dN (x) was denoted by d(x).
It should be straightforward to see that “w(x)”, the number of censored individuals in [x, x +
Note: In theory, we should be able to choose ∆x small enough so that {dN (x) > 0 and
w(x) > 0} should never occur. In practice, however, data may not be collected in that fashion,
in which case, approximations such as those given with life-table estimators may be necessary.
If the sample size is large and $\Delta x$ is small, then $\frac{dN(x)}{Y(x)}$ is a small number (i.e., close to zero)
and as long as x is not close to the right hand tail of the survival distribution (where Y(x) may
Here we used the approximation $e^x \approx 1 + x$ when x is close to zero. This approximation is exact
when $\frac{dN(x)}{Y(x)} = 0$.
here and thereafter, {x < t} means {all grid points x such that x + ∆x ≤ t}.
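The display to which this approximation is applied is missing from this copy. Presumably it is the step that links the product-limit form to a sum in the exponent, which can be sketched as:

```latex
KM(t) = \prod_{x < t} \left\{ 1 - \frac{dN(x)}{Y(x)} \right\}
\;\approx\; \prod_{x < t} \exp\left\{ -\frac{dN(x)}{Y(x)} \right\}
= \exp\left\{ -\sum_{x < t} \frac{dN(x)}{Y(x)} \right\}.
```

Each factor with $dN(x) = 0$ equals 1 on both sides, so only grid points with a death contribute to the approximation error.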
If ∆x is taken to be small enough so that all distinct times (either death times or withdrawal
times) are represented at most once in any time interval, then the estimator $\sum_{x<t} \frac{dN(x)}{Y(x)}$ will be
uniquely defined and will not be altered by choosing a finer partition for the grid of time points.
In such a case the quantity $\sum_{x<t} \frac{dN(x)}{Y(x)}$ is sometimes represented as
$$\int_0^t \frac{dN(x)}{Y(x)}.$$
1. Basically, this estimator takes the sum over all the distinct death times before time t of the
number of deaths divided by the number at risk at each of those distinct death times.
2. The estimator $\sum_{x<t} \frac{dN(x)}{Y(x)}$ is referred to as the Nelson-Aalen estimator for the cumulative
hazard function $\Lambda(t) = \int_0^t \lambda(x)dx$. That is,
$$\hat\Lambda(t) = \sum_{x<t} \frac{dN(x)}{Y(x)}.$$
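As a quick numerical illustration of the two points above, the Nelson-Aalen sum can be computed in base R for the ten-patient example. The variable names are ours, and t = 17 is chosen only because it is the time point used in the notes' worked example.

```r
# Ten-patient example: 1 = death, 0 = censored
survtime <- c(4.5, 7.5, 8.5, 11.5, 13.5, 15.5, 16.5, 17.5, 19.5, 21.5)
status   <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)

death_times <- sort(unique(survtime[status == 1]))
Y  <- sapply(death_times, function(x) sum(survtime >= x))                # at risk
dN <- sapply(death_times, function(x) sum(survtime == x & status == 1))  # deaths

# Nelson-Aalen estimate just after each distinct death time
Lambda_hat <- cumsum(dN / Y)

# Value at t = 17: sum over death times before t
Lambda_17 <- sum((dN / Y)[death_times < 17])
S_17      <- exp(-Lambda_17)  # corresponding survival estimate

round(c(Lambda_17 = Lambda_17, S_17 = S_17), 3)
```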
With independent censoring, it would seem reasonable to estimate $\lambda(x)\Delta x$, i.e., “the
conditional probability of dying in [x, x + ∆x) given being alive at time x”, by $\frac{dN(x)}{Y(x)}$. Therefore we
estimate $\Lambda(t)$ by
$$\hat\Lambda(t) = \sum_{x<t} \frac{dN(x)}{Y(x)}.$$
We will now show how to estimate the variance of the Nelson-Aalen estimator and then show
how this will be used to estimate the variance of the Kaplan-Meier estimator.
For a grid point x, let H(x) denote the history of all deaths and censoring occurring up to
time x.
H(x) = {dN (u), w(u); for all values u on our grid of points for u < x}.
1. Conditional on H(x), we would know the value of Y(x) (i.e., the number at risk at time x), and
$$dN(x) \mid H(x) \sim \mathrm{Bin}(Y(x), \pi(x)),$$
where $\pi(x)$ is the conditional probability of an individual dying in [x, x + ∆x) given that
the individual was at risk at time x (i.e., π(x) = P[x ≤ T < x + ∆x | T ≥ x]). Recall that
$\pi(x) \approx \lambda(x)\Delta x$.
2. The following are standard results for a binomially distributed random variable.
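The list of results itself is missing from this copy. The following are the standard binomial facts that the later argument relies on, written out as a sketch. The original labels are not recoverable; the last identity appears to be the one the text later cites as (2.d).

```latex
\begin{aligned}
E[dN(x) \mid H(x)] &= Y(x)\,\pi(x), &
\mathrm{Var}[dN(x) \mid H(x)] &= Y(x)\,\pi(x)\{1 - \pi(x)\}, \\
E\!\left[\frac{dN(x)}{Y(x)} \,\Big|\, H(x)\right] &= \pi(x), &
\mathrm{Var}\!\left[\frac{dN(x)}{Y(x)} \,\Big|\, H(x)\right] &= \frac{\pi(x)\{1 - \pi(x)\}}{Y(x)}, \\
E\!\left[\frac{\frac{dN(x)}{Y(x)}\left\{\frac{Y(x) - dN(x)}{Y(x)}\right\}}{Y(x) - 1} \,\Big|\, H(x)\right]
  &= \frac{\pi(x)\{1 - \pi(x)\}}{Y(x)} .
\end{aligned}
```

The last line follows from $E[X(n - X)] = n(n-1)p(1-p)$ for $X \sim \mathrm{Bin}(n, p)$.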
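Consider the Nelson-Aalen estimator $\hat\Lambda(t) = \sum_{x<t} \frac{dN(x)}{Y(x)}$. We have
$$\begin{aligned}
E[\hat\Lambda(t)] &= E\left[\sum_{x<t} \frac{dN(x)}{Y(x)}\right] = \sum_{x<t} E\left[\frac{dN(x)}{Y(x)}\right] \\
&= \sum_{x<t} E\left[E\left[\frac{dN(x)}{Y(x)} \,\Big|\, H(x)\right]\right] = \sum_{x<t} \pi(x) \\
&\approx \sum_{x<t} \lambda(x)\Delta x \approx \int_0^t \lambda(x)dx = \Lambda(t).
\end{aligned}$$
Hence
• $E[\hat\Lambda(t)] = \sum_{x<t} \pi(x)$.
• If we take ∆x smaller and smaller, then in the limit $\sum_{x<t} \pi(x)$ goes to $\Lambda(t)$. Namely, $\hat\Lambda(t)$
is approximately unbiased for $\Lambda(t)$.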
How to Estimate the Variance of Λ(t)
b
Var(Λ(t)) b
= E[Λ(t) b
− E(Λ(t))]2
" #2
X dN (x) X
= E − π(x)
x<t Y (x) x<t
" ( )#2
X dN (x)
= E − π(x) .
x<t Y (x)
Note: The square of a sum of terms is equal to the sum of the squares plus the sum of all
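We will first demonstrate that the cross product terms have expectation equal to zero. Let
us take one such term and let us say, without loss of generality, that $x < x'$.
$$\begin{aligned}
& E\left[\left\{\frac{dN(x)}{Y(x)} - \pi(x)\right\}\left\{\frac{dN(x')}{Y(x')} - \pi(x')\right\}\right] \\
&= E\left[E\left[\left\{\frac{dN(x)}{Y(x)} - \pi(x)\right\}\left\{\frac{dN(x')}{Y(x')} - \pi(x')\right\} \,\Big|\, H(x')\right]\right]
\end{aligned}$$
Note: Conditional on $H(x')$, $dN(x)$, $Y(x)$ and $\pi(x)$ are constant since $x < x'$. Therefore the
first factor can be taken outside the inner conditional expectation, and the remaining inner
expectation is $E\left[\frac{dN(x')}{Y(x')} \,\Big|\, H(x')\right] - \pi(x') = 0$, so the cross product term has expectation zero.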
Since the cross product terms have expectation equal to zero, this implies that
$$\mathrm{Var}(\hat\Lambda(t)) = \sum_{x<t} E\left[\frac{dN(x)}{Y(x)} - \pi(x)\right]^2.$$
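The step from this expression to the binomial variance is not shown in this copy. Using the double expectation theorem once more, together with the conditional binomial distribution of $dN(x)$, a sketch of that step is:

```latex
\begin{aligned}
E\left[\frac{dN(x)}{Y(x)} - \pi(x)\right]^2
 &= E\left[ E\left[ \left\{\frac{dN(x)}{Y(x)} - \pi(x)\right\}^2 \,\Big|\, H(x) \right] \right]
  = E\left[ \mathrm{Var}\left\{ \frac{dN(x)}{Y(x)} \,\Big|\, H(x) \right\} \right]
  = E\left[ \frac{\pi(x)\{1 - \pi(x)\}}{Y(x)} \right], \\
\mathrm{Var}(\hat\Lambda(t)) &= \sum_{x<t} E\left[ \frac{\pi(x)\{1 - \pi(x)\}}{Y(x)} \right].
\end{aligned}
```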
If we wanted to estimate $\frac{\pi(x)[1-\pi(x)]}{Y(x)}$, then using (2.d) we might think that
$$\frac{\frac{dN(x)}{Y(x)}\left[\frac{Y(x)-dN(x)}{Y(x)}\right]}{Y(x) - 1}$$
may be reasonable. In fact, we would then use as an estimate for $\mathrm{Var}(\hat\Lambda(t))$ the following
estimator, obtained by summing the above estimator over all grid points x such that x + ∆x ≤ t:
$$\widehat{\mathrm{Var}}(\hat\Lambda(t)) = \sum_{x<t} \frac{\frac{dN(x)}{Y(x)}\left[\frac{Y(x)-dN(x)}{Y(x)}\right]}{Y(x) - 1}.$$
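In fact, the above variance estimator is unbiased for $\mathrm{Var}(\hat\Lambda(t))$, which can be seen using the
following argument:
$$\begin{aligned}
E\left[\sum_{x<t} \frac{\frac{dN(x)}{Y(x)}\left[\frac{Y(x)-dN(x)}{Y(x)}\right]}{Y(x) - 1}\right]
&= \sum_{x<t} E\left[\frac{\frac{dN(x)}{Y(x)}\left[\frac{Y(x)-dN(x)}{Y(x)}\right]}{Y(x) - 1}\right] \\
&= \sum_{x<t} E\left[E\left[\frac{\frac{dN(x)}{Y(x)}\left\{\frac{Y(x)-dN(x)}{Y(x)}\right\}}{Y(x) - 1} \,\Big|\, H(x)\right]\right] \quad \text{(double expectation again)} \\
&= \sum_{x<t} E\left[\frac{\pi(x)[1 - \pi(x)]}{Y(x)}\right] \quad \text{(by (2.d))} \\
&= \mathrm{Var}[\hat\Lambda(t)].
\end{aligned}$$
What this last argument shows is that an unbiased estimator for $\mathrm{Var}[\hat\Lambda(t)]$ is given by
$$\sum_{x<t} \frac{\frac{dN(x)}{Y(x)}\left[\frac{Y(x)-dN(x)}{Y(x)}\right]}{Y(x) - 1}.$$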
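Note: If the survival data are continuous (i.e., no ties) and ∆x is taken small enough, then
dN(x) takes only the values 0 or 1, so that each term reduces to
$$\frac{\frac{dN(x)}{Y(x)}\left[\frac{Y(x)-dN(x)}{Y(x)}\right]}{Y(x) - 1} = \frac{dN(x)}{Y^2(x)}$$
and
$$\widehat{\mathrm{Var}}(\hat\Lambda(t)) = \sum_{x<t} \frac{dN(x)}{Y^2(x)}.$$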
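To see that the two variance formulas coincide when there are no ties, the sketch below evaluates both for the ten-patient example at t = 17. The variable names are ours; the general formula is only evaluated where Y(x) > 1.

```r
# Ten-patient example: 1 = death, 0 = censored
survtime <- c(4.5, 7.5, 8.5, 11.5, 13.5, 15.5, 16.5, 17.5, 19.5, 21.5)
status   <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)

death_times <- sort(unique(survtime[status == 1]))
Y  <- sapply(death_times, function(x) sum(survtime >= x))                # at risk
dN <- sapply(death_times, function(x) sum(survtime == x & status == 1))  # deaths

idx <- death_times < 17  # death times contributing to the sum at t = 17

# General (ties allowed) variance estimator of the Nelson-Aalen estimate
var_general <- sum(((dN / Y) * ((Y - dN) / Y) / (Y - 1))[idx])

# Simplified form, valid when every dN(x) is 0 or 1
var_noties <- sum((dN / Y^2)[idx])

se_Lambda <- sqrt(var_general)
round(c(var_general = var_general, var_noties = var_noties, se = se_Lambda), 4)
```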
Remark:
• We proved that the Nelson-Aalen estimator $\sum_{x<t} \frac{dN(x)}{Y(x)}$ is an unbiased estimator for
$\sum_{x<t} \pi(x)$. We argued before that in the limit as ∆x goes to zero,
$$\sum_{x<t} \frac{dN(x)}{Y(x)} \text{ becomes } \int_0^t \frac{dN(x)}{Y(x)}, \qquad \sum_{x<t} \pi(x) \text{ goes to } \int_0^t \lambda(x)dx,$$
namely,
$$E\left[\int_0^t \frac{dN(x)}{Y(x)}\right] = \Lambda(t).$$
• Since $\hat\Lambda(t) = \sum_{x<t} \frac{dN(x)}{Y(x)}$ is made up of a sum of random variables that are conditionally
uncorrelated, they have a “martingale” structure for which there exists a body of theory
that enables us to show that
$\hat\Lambda(t)$ is asymptotically normal with mean $\Lambda(t)$ and variance $\mathrm{Var}[\hat\Lambda(t)]$, which can be estimated
unbiasedly by
$$\widehat{\mathrm{Var}}(\hat\Lambda(t)) = \sum_{x<t} \frac{\frac{dN(x)}{Y(x)}\left[\frac{Y(x)-dN(x)}{Y(x)}\right]}{Y(x) - 1};$$
or, in the case of no ties, by
$$\widehat{\mathrm{Var}}(\hat\Lambda(t)) = \sum_{x<t} \frac{dN(x)}{Y^2(x)}.$$
Let us refer to the estimated standard error of $\hat\Lambda(t)$ by
$$se[\hat\Lambda(t)] = \left[\sum_{x<t} \frac{\frac{dN(x)}{Y(x)}\left[\frac{Y(x)-dN(x)}{Y(x)}\right]}{Y(x) - 1}\right]^{1/2}.$$
The unbiasedness and asymptotic normality of $\hat\Lambda(t)$ about $\Lambda(t)$ allow us to form confidence
intervals for Λ(t) (at time t). Specifically, the (1 − α) confidence interval for Λ(t) is given by
$$\hat\Lambda(t) \pm z_{\alpha/2} \cdot se(\hat\Lambda(t)),$$
where $z_{\alpha/2}$ is the (1 − α/2)th quantile of a standard normal distribution. That is, the random
interval
$$[\hat\Lambda(t) - z_{\alpha/2} \cdot se(\hat\Lambda(t)),\; \hat\Lambda(t) + z_{\alpha/2} \cdot se(\hat\Lambda(t))]$$
will cover the true value Λ(t) with probability 1 − α.
This result could also be used to construct confidence intervals for the survival function S(t).
Since
$$S(t) = e^{-\Lambda(t)},$$
a (1 − α) confidence interval for S(t) is
$$\left[e^{-\hat\Lambda(t) - z_{\alpha/2} \cdot se(\hat\Lambda(t))},\; e^{-\hat\Lambda(t) + z_{\alpha/2} \cdot se(\hat\Lambda(t))}\right],$$
meaning that this random interval will cover the true value S(t) with probability 1 − α.
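An example: We will use the hypothetical data shown in Figure 2.3 to illustrate the
calculation of $\hat\Lambda(t)$, $\widehat{\mathrm{Var}}(\hat\Lambda(t))$, and confidence intervals for Λ(t) and S(t). For illustration, let us take
t = 17. With $\hat\Lambda(t) = 0.804$, the corresponding estimate of the survival probability is
$$\hat S(t) = e^{-\hat\Lambda(t)} = e^{-0.804} = 0.448.$$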
Note: The above Nelson-Aalen estimate $\hat S(t) = 0.448$ is different from (but close to) the
Kaplan-Meier estimate KM(t) = 0.411. It should also be noted that the above confidence interval
for the survival probability S(t) is not symmetric about the estimator $\hat S(t)$. Another way of
getting approximate confidence intervals for $S(t) = e^{-\Lambda(t)}$ is by using the delta method. This
gives a confidence interval for $f(\theta)$ of the form
$$f(\hat\theta) \pm z_{\alpha/2} |f'(\hat\theta)| \hat\sigma.$$
In our case, $\Lambda(t)$ takes on the role of $\theta$, $\hat\Lambda(t)$ takes on the role of $\hat\theta$, and $f(\theta) = e^{-\theta}$, so that
$|f'(\theta)| = |-e^{-\theta}| = e^{-\theta}$, and $S(t) = e^{-\Lambda(t)}$.
Consequently,
$$\hat S(t) \overset{a}{\sim} N\big(S(t), [S(t)]^2 \mathrm{Var}[\hat\Lambda(t)]\big),$$
so an approximate (1 − α) confidence interval for S(t) is
$$\hat S(t) \pm z_{\alpha/2} \{\hat S(t) \cdot se[\hat\Lambda(t)]\}.$$
Remark: Note that $[\hat S(t)]^2 \mathrm{Var}[\hat\Lambda(t)]$ is an estimate of $\mathrm{Var}[\hat S(t)]$, where $\hat S(t) = \exp[-\hat\Lambda(t)]$.
Thus a reasonable estimator of Var(KM(t)) would be to use the estimator of $\mathrm{Var}[\exp(-\hat\Lambda(t))]$,
$$[\hat S(t)]^2\, \widehat{\mathrm{Var}}[\hat\Lambda(t)] = [\hat S(t)]^2 \sum_{x<t} \frac{dN(x)}{Y^2(x)}.$$
This is very close to (asymptotically the same as) the estimator for the variance of the Kaplan-Meier estimator.
Note: SAS uses the above formula to calculate the estimated variance for the life-table estimate
Note: The summation in the above equation can be viewed as the variance estimate for the
cumulative hazard estimator defined by $\hat\Lambda_{KM}(t) = -\log[KM(t)]$. Namely,
$$\widehat{\mathrm{Var}}\{\hat\Lambda_{KM}(t)\} = \sum_{x<t} \frac{dN(x)}{[Y(x) - w(x)/2][Y(x) - dN(x) - w(x)/2]}.$$
In the example shown in Figure 2.3, using the delta-method approximation for getting a
confidence interval with the Nelson-Aalen estimator, we get that a 95% CI for S(t) (where t=17)
is
$$e^{-\hat\Lambda(t)} \pm 1.96 \cdot e^{-\hat\Lambda(t)} se[\hat\Lambda(t)] = e^{-0.804} \pm 1.96 \cdot e^{-0.804} \cdot 0.381 = [0.113, 0.782].$$
The estimated $se[\hat S(t)] = 0.171$.
If we use the Kaplan-Meier estimator, together with Greenwood’s formula for estimating the
variance, a 95% CI for S(t) is [0.068, 0.754],
which is close to the confidence interval using the delta method, considering the sample size is only 10.
In fact the estimated standard errors for $\hat S(t)$ and KM(t) using the delta method and Greenwood’s
formula are 0.171 and 0.175 respectively, which agree with each other very well.
Note: If we want to use R function survfit() to construct a confidence interval for S(t) with
the symmetric form $KM(t) \pm z_{\alpha/2} \cdot se[KM(t)]$, we need to specify conf.type=c("plain")
in survfit(). The default constructs the confidence interval for S(t) by exponentiating the
confidence interval for the cumulative hazard using the Kaplan-Meier estimator. For example,
a 95% CI for S(t) is
$$KM(t) \cdot \left[e^{-1.96 \cdot se[\hat\Lambda_{KM}(t)]},\; e^{1.96 \cdot se[\hat\Lambda_{KM}(t)]}\right] = 0.411 \cdot \left[e^{-1.96 \cdot 0.427},\; e^{1.96 \cdot 0.427}\right] = [0.178, 0.949].$$
1. exponentiating the 95% CI for cumulative hazard using Nelson-Aalen estimator: [0.212, 0.944].
2. delta-method approximation using Nelson-Aalen estimator: the symmetric interval computed above.
3. exponentiating the 95% CI for cumulative hazard using Kaplan-Meier estimator: [0.178, 0.949].
4. Kaplan-Meier estimator together with Greenwood’s formula for variance: [0.068, 0.754].
These are relatively close and the approximations become better with larger sample sizes.
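The intervals compared above can be recomputed side by side in base R. This is a sketch for the ten-patient example at t = 17 with our own variable names. It uses the exact normal quantile `qnorm(0.975)` rather than 1.96, so the last digit may differ slightly from the values quoted in the text.

```r
# Ten-patient example: 1 = death, 0 = censored
survtime <- c(4.5, 7.5, 8.5, 11.5, 13.5, 15.5, 16.5, 17.5, 19.5, 21.5)
status   <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)

dt <- sort(unique(survtime[status == 1]))
dt <- dt[dt < 17]                                               # death times before t = 17
Y  <- sapply(dt, function(x) sum(survtime >= x))                # at risk
dN <- sapply(dt, function(x) sum(survtime == x & status == 1))  # deaths
z  <- qnorm(0.975)

# Nelson-Aalen based quantities
Lambda <- sum(dN / Y)
se_L   <- sqrt(sum(dN / Y^2))  # no-ties variance formula
S_na   <- exp(-Lambda)

# Kaplan-Meier based quantities (Greenwood-type sum for -log KM)
km      <- prod(1 - dN / Y)
se_L_km <- sqrt(sum(dN / (Y * (Y - dN))))

ci_exp_na   <- exp(-(Lambda + c(1, -1) * z * se_L))  # exponentiated CI, Nelson-Aalen
ci_delta_na <- S_na + c(-1, 1) * z * S_na * se_L     # delta method, Nelson-Aalen
ci_exp_km   <- km * exp(c(-1, 1) * z * se_L_km)      # exponentiated CI, Kaplan-Meier
ci_plain_km <- km + c(-1, 1) * z * km * se_L_km      # Greenwood, symmetric

round(rbind(ci_exp_na, ci_delta_na, ci_exp_km, ci_plain_km), 3)
```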
Of the different methods for constructing confidence intervals, “usually” the most accurate
is based on exponentiating the confidence intervals for the cumulative hazard function based on
Nelson-Aalen estimator. We don’t feel that symmetry is necessarily an important feature that
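Summary
1. We first estimate S(t) by $KM(t) = \prod_{x<t} \left(1 - \frac{d(x)}{n(x)}\right)$, then estimate Λ(t) by $\hat\Lambda_{KM}(t) = -\log[KM(t)]$. Their variance estimates are given by
$$\widehat{\mathrm{Var}}\{\hat\Lambda_{KM}(t)\} = \sum_{x<t} \frac{dN(x)}{[Y(x) - w(x)/2][Y(x) - dN(x) - w(x)/2]}$$
$$\widehat{\mathrm{Var}}\{KM(t)\} = \{KM(t)\}^2 \cdot \widehat{\mathrm{Var}}\{\hat\Lambda_{KM}(t)\}.$$
The confidence intervals for S(t) can be constructed in two ways:
$$KM(t) \pm z_{\alpha/2} \cdot se[KM(t)], \quad \text{or} \quad e^{-\hat\Lambda_{KM}(t) \pm z_{\alpha/2} \cdot se[\hat\Lambda_{KM}(t)]} = KM(t) \cdot e^{\pm z_{\alpha/2} \cdot se[\hat\Lambda_{KM}(t)]}.$$
2. We first estimate Λ(t) by the Nelson-Aalen estimator $\hat\Lambda(t) = \sum_{x<t} \frac{dN(x)}{Y(x)}$, then estimate S(t) by
$\hat S(t) = e^{-\hat\Lambda(t)}$. Their variance estimates are given by
$$\widehat{\mathrm{Var}}\{\hat\Lambda(t)\} = \sum_{x<t} \frac{\frac{dN(x)}{Y(x)}\left[\frac{Y(x)-dN(x)}{Y(x)}\right]}{Y(x) - 1}$$
$$\widehat{\mathrm{Var}}\{\hat S(t)\} = \{\hat S(t)\}^2 \cdot \widehat{\mathrm{Var}}\{\hat\Lambda(t)\}.$$
The confidence intervals for S(t) can also be constructed in two ways:
$$\hat S(t) \pm z_{\alpha/2} \cdot se[\hat S(t)], \quad \text{or} \quad e^{-\hat\Lambda(t) \pm z_{\alpha/2} \cdot se[\hat\Lambda(t)]} = \hat S(t) \cdot e^{\pm z_{\alpha/2} \cdot se[\hat\Lambda(t)]}.$$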
Estimators of quantiles (such as median, first and third quartiles) of a distribution can be
Suppose we want to estimate the median S −1 (0.5) or any other quantile ϕ = S −1 (θ); 0 <
θ < 1. Then the point estimate of ϕ is obtained (using the Kaplan-Meier estimator of S(t))
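An approximate (1 − α) confidence interval for ϕ is given by $[\hat\varphi_L, \hat\varphi_U]$, where $\hat\varphi_L$ satisfies
$$\hat S(\hat\varphi_L) - z_{\alpha/2} \cdot se[\hat S(\hat\varphi_L)] = \theta$$
and $\hat\varphi_U$ satisfies
$$\hat S(\hat\varphi_U) + z_{\alpha/2} \cdot se[\hat S(\hat\varphi_U)] = \theta.$$
Proof: We prove this argument for a general estimator $\hat S(t)$. So if we use the Kaplan-Meier
estimator, then $\hat S(t)$ is KM(t). It can also be the Nelson-Aalen estimator. Then
$$\begin{aligned}
P[\hat\varphi_L < \varphi < \hat\varphi_U] &= P[S(\hat\varphi_U) < \theta < S(\hat\varphi_L)] \quad \text{(note that S(t) is decreasing and } S(\varphi) = \theta) \\
&= 1 - \left(P[S(\hat\varphi_U) > \theta] + P[S(\hat\varphi_L) < \theta]\right).
\end{aligned}$$
Since $\hat S(\hat\varphi_U) + z_{\alpha/2} \cdot se[\hat S(\hat\varphi_U)] = \theta$,
$$\begin{aligned}
P[S(\hat\varphi_U) > \theta] &= P\left[S(\hat\varphi_U) > \hat S(\hat\varphi_U) + z_{\alpha/2} \cdot se[\hat S(\hat\varphi_U)]\right] \\
&= P\left[\frac{\hat S(\hat\varphi_U) - S(\hat\varphi_U)}{se[\hat S(\hat\varphi_U)]} < -z_{\alpha/2}\right] \\
&\approx P\left[\frac{\hat S(\varphi_U) - S(\varphi_U)}{se[\hat S(\varphi_U)]} < -z_{\alpha/2}\right] \approx \frac{\alpha}{2},
\end{aligned}$$
where in the last line $\hat\varphi_U$ is replaced by a fixed time point $\varphi_U$ and the asymptotic normality of $\hat S$ is used. Similarly,
$$P[S(\hat\varphi_L) < \theta] \approx \frac{\alpha}{2}.$$
Therefore,
$$P[\hat\varphi_L < \varphi < \hat\varphi_U] \approx 1 - \left(\frac{\alpha}{2} + \frac{\alpha}{2}\right) = 1 - \alpha.$$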
We illustrate this practice using a simulated data set generated using the following R commands
The true survival time has an exponential distribution with λ = 0.2/year (so the true mean
is 5 years and median is 5 ∗ log(2) ≈ 3.5 years). The (potential) censoring time is independent
of the survival time and has an exponential distribution with λ = 0.1/year (so it is
stochastically larger than the survival time). The Kaplan-Meier estimate (solid line) and its 95% confidence
intervals (dotted lines) are shown in Figure 2.5, which is generated using R function plot(fit,
xlab="Patient time (years)", ylab="survival probability"). Note that these CIs are
constructed by exponentiating the CIs for Λ(t). From this figure, the median survival time is
estimated to be 3.56 years, with its 95% confidence interval [2.51, 6.20].
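The original simulation commands are not present in this copy, so the following is only a sketch of how such a data set and the median interval could be produced in base R. The sample size `n = 100` and the seed are assumptions, which means the numbers will not match the 3.56 and [2.51, 6.20] quoted above. The helper `first_below` is a hypothetical name.

```r
set.seed(1)                       # assumed seed, for reproducibility only
n <- 100                          # assumed sample size (not given in the text)
survtime <- rexp(n, rate = 0.2)   # true survival time, lambda = 0.2 per year
censtime <- rexp(n, rate = 0.1)   # potential censoring time, lambda = 0.1 per year
obstime  <- pmin(survtime, censtime)
status   <- as.numeric(survtime <= censtime)  # 1 = death observed, 0 = censored

# Kaplan-Meier estimate at the ordered observation times (no ties expected)
ord      <- order(obstime)
t_sorted <- obstime[ord]
s_sorted <- status[ord]
n_risk   <- n:1
km       <- cumprod(1 - s_sorted / n_risk)

# Greenwood-type standard error of -log KM; the term is set to 0 when the
# last subject at risk dies, to avoid dividing by zero
gw_term <- ifelse(n_risk > s_sorted, s_sorted / (n_risk * (n_risk - s_sorted)), 0)
se_logkm <- sqrt(cumsum(gw_term))

# Pointwise 95% limits obtained by exponentiating the CI for the cumulative hazard
z     <- qnorm(0.975)
lower <- km * exp(-z * se_logkm)
upper <- pmin(1, km * exp(z * se_logkm))

# First time a curve drops to 0.5 or below (NA if it never does)
first_below <- function(curve) {
  i <- which(curve <= 0.5)[1]
  if (is.na(i)) NA else t_sorted[i]
}
med_hat <- first_below(km)                               # estimated median
ci_med  <- c(first_below(lower), first_below(upper))     # CI for the median
c(median = med_hat, lower = ci_med[1], upper = ci_med[2])
```

The interval for the median is read off where the lower and upper confidence curves cross 0.5, which mirrors the defining equations for the two endpoints given earlier.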
Figure 2.5: Illustration for constructing 95% CI for median survival time
[Plot omitted; vertical axis: survival probability, from 0.0 to 1.0.]
If we use symmetric confidence intervals of S(t) to construct the confidence interval for the
median of the true survival time, then we need to specify conf.type=c("plain") in survfit()
as follows
> summary(fit)
Call: survfit(formula = Surv(obstime, status), conf.type = c("plain"))
The Kaplan-Meier estimate (solid line) and its symmetric 95% confidence intervals (dotted lines) are
shown in Figure 2.6. Note that the Kaplan-Meier estimate is the same as before. From this figure, the
median survival time is estimated to be 3.56 years, with its 95% confidence interval [2.51, 6.12].
Note: If we treat the censored data obstime as uncensored and fit an exponential model
to it, then the “best” estimate of the median survival time is 2.5, with 95% confidence interval
[1.8, 3.2] (using the methodology to be presented in next chapter). These estimates severely
Figure 2.6: Illustration for constructing 95% CI for median survival time using symmetric CIs
of S(t)
[Plot omitted; vertical axis: survival probability, from 0.0 to 1.0.]
Note:
If we want a CI for the quantile such as the median survival time with a different confidence
level, say, 90%, then we need to construct 90% confidence intervals for S(t). This can be done
If we use Proc Lifetest in SAS to compute the Kaplan-Meier estimate, it will produce 95%
confidence intervals for 25%, 50% (median) and 75% quantiles of the true survival time.
• Left censoring: This kind of censoring occurs when the event of interest is only known to
happen before a specific time point. For example, in a study of time to first marijuana use
(example 1.17, page 17 of Klein & Moeschberger) 191 high school boys were asked “when
did you first use marijuana?”. Some answers were “I have used it but cannot recall when
the first time was”. For these boys, their time to first marijuana use is left censored at
their current age. For the boys who never used marijuana, their time to first marijuana use
is right censored at their current age. Of course, we have the exact time to first marijuana
use for those boys who remembered when they first used it.
• Interval censoring occurs when the event of interest is only known to take place in an
for breast cancer patients treated with radiotherapy and radiotherapy + chemotherapy,
patients were examined at each clinical visit for breast retraction and the breast retraction
is only known to take place between two clinical visits or right censored at the end of the
• Left truncation occurs when the time to event of interest in the study sample is greater
than a (left) truncation variable. For example, in a study of life expectancy (survival time
measured from birth to death) using elderly residents in a retirement community (example
1.16, page 15 of Klein & Moeschberger), the individuals must survive to a sufficient age to
enter the retirement community. Therefore, their survival time is left truncated by their
age entering the community. Ignoring the truncation will lead to a biased sample and the
survival time from the sample will overestimate the underlying life expectancy.
• Right truncation occurs when the time to event of interest in the study sample is less
than a (right) truncation variable. A special case is when the study sample consists of
only those individuals who have already experienced the event. For example, to study the
induction period (also called latency period or incubation period) between infection with
AIDS virus and the onset of clinical AIDS, the ideal approach will be to collect a sample
of patients infected with AIDS virus and then follow them for some period of time until
some of them develop clinical AIDS. However, this approach may be too lengthy and costly.
An alternative approach is to study those patients who were infected with AIDS from a
contaminated blood transfusion and later developed clinical AIDS. In this case, the total
number of patients infected with AIDS is unknown. A similar approach can be used to
study the induction time for pediatric AIDS. Children were infected with AIDS in utero or
at birth and later developed clinical AIDS. But the study sample consists of children only
known to develop AIDS. This sampling scheme is similar to the case-control design. See
example 1.19 on page 19 of Klein & Moeschberger for more description and the data.
Note: The K-M survival estimation approach cannot be directly applied to data with the
above types of censoring and truncation. A modified K-M approach or other methods have to be used. Similar to
the right censoring case, the censoring time and truncation time are often assumed to be independent
of the time to event of interest (survival time). Since right censoring is the most common
censoring scheme, we will focus on this special case most of the time in this course. Nonparametric
estimation of the survival function (or the cumulative distribution function) for data with
other censoring or truncation schemes can be found in Chapters 4 and 5 of Klein & Moeschberger.