Lec5 Survival
Note: in this lecture, we will use T1 , · · · , Tn to denote the response variables, and all these random
variables are positive. These random variables will be called event times or death times. They often refer to
a certain ‘time’ characteristic of each individual, e.g., the time at which the individual dies or gets a disease.
We assume that our data consists of IID random variables T1 , · · · , Tn ∼ F . The survival function S(t) of
this population is defined as
S(t) = P (T1 > t) = 1 − F (t).
Namely, it is just one minus the corresponding CDF. Although this definition is extremely simple and seems
to follow trivially from the CDF, later we will see that it turns out to be an elegant tool for modeling and
interpreting the data.
In medical research, the quantity Ti often refers to a certain time characteristic of individual i. For instance,
Ti may refer to the age at which individual i passes away. Then the survival function S(t) can be
interpreted as the chance that an individual is still alive after age t. If S(60) = 0.8, then 80% of the
individuals in the population will still be alive at age 60. Namely, S(t) is the probability
that an individual will survive past time t.
Here are some basic properties about S(t):
A quantity that is often used along with the survival function is the hazard function. The hazard function
is
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < T_1 \le t + \Delta t \mid T_1 > t)}{\Delta t} = \frac{p(t)}{S(t)},$$
where $p(t) = \frac{d}{dt} F(t)$ is the PDF of random variable $T_1$. Note that you can also write the hazard function as
$$h(t) = -\frac{\partial \log S(t)}{\partial t}.$$
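The two expressions for the hazard can be connected in a few lines. The following worked derivation is a sketch that assumes F is differentiable with density p and that S(t) > 0; it uses only the definitions given so far, together with S'(t) = −p(t).

```latex
\begin{align*}
P(t < T_1 \le t + \Delta t \mid T_1 > t)
  &= \frac{P(t < T_1 \le t + \Delta t)}{P(T_1 > t)}
   = \frac{F(t + \Delta t) - F(t)}{S(t)}, \\
h(t)
  &= \frac{1}{S(t)} \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t}
   = \frac{p(t)}{S(t)}, \\
-\frac{\partial \log S(t)}{\partial t}
  &= -\frac{S'(t)}{S(t)}
   = \frac{p(t)}{S(t)}
   = h(t).
\end{align*}
```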
How can we interpret the hazard function? The hazard function describes the ‘intensity of death’ at the
time t given that the individual has already survived past time t.
There is another quantity that is also common in survival analysis, the cumulative hazard function. The
cumulative hazard function is
$$H(t) = \int_0^t h(s)\,ds.$$
5-2 Lecture 5: Survival Analysis
You can interpret H(t) as the cumulative amount of hazard up to time t. The cumulative hazard function
and survival function are linked as follows:
$$H(t) = -\log S(t), \qquad S(t) = e^{-H(t)} = e^{-\int_0^t h(s)\,ds}.$$
Example 1. What is the survival function and hazard function of an exponential R.V.? Let T1 ∼ Exp(λ).
Then
$$p(t) = \lambda e^{-\lambda t}, \qquad F(t) = 1 - e^{-\lambda t} \quad \text{for } t \ge 0.$$
Thus,
$$S(t) = e^{-\lambda t}$$
and
$$h(t) = \lambda, \qquad H(t) = \lambda t.$$
Namely, in an exponential distribution, the hazard function is a constant and the cumulative hazard is just
a linear function of time.
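As a quick numerical illustration of Example 1, the sketch below evaluates the hazard as p(t)/S(t) and the cumulative hazard as −log S(t) for an exponential distribution. The function names and the rate 0.5 are illustrative choices, not part of the lecture.

```python
import math


def exp_pdf(t, lam):
    """Density p(t) = lam * exp(-lam * t) for t >= 0."""
    return lam * math.exp(-lam * t)


def exp_survival(t, lam):
    """Survival function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def exp_hazard(t, lam):
    """Hazard computed from its definition h(t) = p(t) / S(t)."""
    return exp_pdf(t, lam) / exp_survival(t, lam)


def exp_cum_hazard(t, lam):
    """Cumulative hazard computed from H(t) = -log S(t)."""
    return -math.log(exp_survival(t, lam))


lam = 0.5  # illustrative rate
for t in (0.1, 1.0, 4.0):
    # The hazard column stays at lam; the cumulative hazard grows like lam * t.
    print(t, exp_hazard(t, lam), exp_cum_hazard(t, lam))
```

The hazard does not depend on t, which is the "memoryless" behaviour of the exponential distribution expressed in hazard terms.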
Example 2 (Weibull distribution). The Weibull distribution is a distribution with two parameters, λ
and k, and it is a distribution for positive random variables. Its PDF is
$$p(t) = \lambda k \cdot (\lambda t)^{k-1} \cdot e^{-(\lambda t)^k}, \qquad t \ge 0.$$
When k = 1, it reduces to the exponential distribution. Its CDF and survival function are
$$F(t) = 1 - e^{-(\lambda t)^k}, \qquad S(t) = e^{-(\lambda t)^k}.$$
And the hazard function and cumulative hazard function are
$$h(t) = \frac{p(t)}{S(t)} = \lambda k \cdot (\lambda t)^{k-1}, \qquad H(t) = (\lambda t)^k.$$
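The Weibull hazard can be checked numerically. The sketch below compares three quantities: the closed form λk(λt)^(k−1), the ratio p(t)/S(t), and a finite-difference approximation of −∂ log S(t)/∂t. The parameter values and the step size `eps` are arbitrary illustrative choices.

```python
import math


def weibull_survival(t, lam, k):
    """S(t) = exp(-(lam * t) ** k)."""
    return math.exp(-((lam * t) ** k))


def weibull_pdf(t, lam, k):
    """p(t) = lam * k * (lam * t) ** (k - 1) * exp(-(lam * t) ** k)."""
    return lam * k * (lam * t) ** (k - 1) * math.exp(-((lam * t) ** k))


def weibull_hazard(t, lam, k):
    """Closed-form hazard, the derivative of H(t) = (lam * t) ** k."""
    return lam * k * (lam * t) ** (k - 1)


def numeric_hazard(t, lam, k, eps=1e-6):
    """Central finite difference of -log S(t)."""
    upper = math.log(weibull_survival(t + eps, lam, k))
    lower = math.log(weibull_survival(t - eps, lam, k))
    return -(upper - lower) / (2 * eps)


lam, k, t = 0.5, 2.0, 1.5  # illustrative values
closed_form = weibull_hazard(t, lam, k)
from_ratio = weibull_pdf(t, lam, k) / weibull_survival(t, lam, k)
from_derivative = numeric_hazard(t, lam, k)
print(closed_form, from_ratio, from_derivative)
```

With k > 1 the hazard increases over time, with k < 1 it decreases, and k = 1 recovers the constant hazard of the exponential distribution.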
How do we estimate the survival function? There are three methods. The first method is a parametric
approach. This method assumes a parametric model (e.g., exponential distribution) for the data; we
estimate the parameter first and then form the estimator of the survival function. A second approach is to
compute the EDF first and then convert it to an estimator of the survival function. The last approach
is a powerful nonparametric method called the Kaplan-Meier estimator, and we will discuss it in the next
section.
Parametric Approach. Assume that we model the distribution as an exponential distribution with unknown
parameter λ. An estimator of λ is (you can check HW01 to see why this is an estimator)
$$\hat\lambda = \frac{1}{\bar T_n} = \frac{n}{\sum_{i=1}^n T_i}.$$
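The first two approaches can be sketched on simulated data. The code below assumes the plug-in form exp(−λ̂t) for the parametric estimator (the survival function of Example 1 with λ replaced by its estimate) and 1 − EDF(t) for the second approach. The sample size, seed, and true rate are arbitrary choices for illustration.

```python
import math
import random


def fit_exponential_rate(times):
    """Estimate lambda by n / sum(T_i), i.e. one over the sample mean."""
    return len(times) / sum(times)


def parametric_survival(t, rate):
    """Plug-in estimate exp(-rate * t) under the exponential model."""
    return math.exp(-rate * t)


def edf_survival(t, times):
    """1 - EDF(t): the fraction of observations strictly greater than t."""
    return sum(1 for x in times if x > t) / len(times)


random.seed(0)
true_rate = 0.5  # used only to simulate data
times = [random.expovariate(true_rate) for _ in range(2000)]
rate_hat = fit_exponential_rate(times)

for t in (0.5, 1.0, 2.0, 4.0):
    # Columns: t, parametric estimate, EDF-based estimate, true S(t).
    print(t, parametric_survival(t, rate_hat), edf_survival(t, times),
          math.exp(-true_rate * t))
```

When the exponential model is correct the two estimates should be close; the EDF-based estimate does not rely on the model, at the price of being a step function.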
Let t1 < t2 < · · · < tm be the distinct time points at which the observations T1 , · · · , Tn actually take values.
To see how the estimator is constructed, we do the following analysis. We partition the time axis into disjoint
segments:
B0 = [0, t1 ), B1 = [t1 , t2 ), · · · , Bm−1 = [tm−1 , tm ), Bm = [tm , ∞).
Then we define
$$N_\ell = \text{number of individuals alive at (event happens after) the beginning of } B_\ell = \sum_{i=1}^n I(T_i \ge t_\ell)$$
and
$$D_\ell = \text{number of individuals who die (event happens) in } B_\ell = \sum_{i=1}^n I(T_i \in B_\ell).$$
Now we have converted $T_1, \cdots, T_n$ to $(N_0, D_0), \cdots, (N_m, D_m)$. Formally, $N_\ell$ should be defined as the number
of individuals at risk at the beginning of $B_\ell$. Later we will explain what 'at risk' means.
The Kaplan-Meier (KM) estimator estimates S(t) using
$$\hat S_{KM}(t) = \prod_{\ell:\, t_\ell \le t} \left(1 - \frac{D_\ell}{N_\ell}\right).$$
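The product formula can be evaluated directly from the counts. The sketch below handles the fully observed case described so far (every event time is seen); the function names and the toy data are illustrative. Since every observation equals one of the distinct time points, the number of events in a segment is the number of observations equal to its left endpoint.

```python
def kaplan_meier(times):
    """Return the distinct event times and the KM estimate at each of them."""
    event_times = sorted(set(times))                  # t_1 < ... < t_m
    surv = 1.0
    values = []
    for t_l in event_times:
        n_l = sum(1 for x in times if x >= t_l)       # N_l: alive at start of B_l
        d_l = sum(1 for x in times if x == t_l)       # D_l: events in B_l
        surv *= 1.0 - d_l / n_l
        values.append(surv)
    return event_times, values


def km_at(t, event_times, values):
    """Evaluate the step function: product over all t_l <= t (1 if none)."""
    current = 1.0
    for t_l, value in zip(event_times, values):
        if t_l > t:
            break
        current = value
    return current


toy_times = [2, 3, 3, 5, 8]
event_times, values = kaplan_meier(toy_times)
for t_l, value in zip(event_times, values):
    print(t_l, value)
```

For the toy data the factors are 4/5, 2/4, 1/2 and 0/1, so by hand the estimate steps through 0.8, 0.4, 0.2 and 0.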
What is the intuition behind the KM estimator? We now consider t in different time segments and see if we can
gain some intuition. Recall that the survival function is S(t) = P (T > t).
For t ∈ B0 = [0, t1 ), no event happens within this interval, so $\hat S_{KM}(t) = 1$.
For t ∈ B1 = [t1 , t2 ), the survival function
S(t) = P (T > t) = P (survives past time t) = P (survives in [0, t1 ) and in [t1 , t)) = P (survives in B0 and in B1 ).
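Now recall that for two events A and B, P (A and B) = P (A)P (B|A). Thus,
$$S(t) = P(\text{survives in } B_0)\, P(\text{survives in } B_1 \mid \text{survives in } B_0).$$
The first factor is estimated by 1. The second factor is estimated by the fraction of the $N_1$ individuals at risk at the beginning of $B_1$ who do not die in $B_1$, namely $1 - D_1/N_1$, so that $\hat S_{KM}(t) = 1 - \frac{D_1}{N_1}$ for $t \in B_1$.
Now for the next time segment B2 , we apply the same intuition. Namely, for t ∈ B2 ,
$$\hat S_{KM}(t) = \left(1 - \frac{D_1}{N_1}\right)\left(1 - \frac{D_2}{N_2}\right).$$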
For the other segments, we can apply the same procedure to obtain the estimator. This gives you the intuition
behind how the KM estimator is constructed. This derivation can also be seen in
http://pages.stat.wisc.edu/~ifischer/Intro_Stat/Lecture_Notes/8_-_Survival_Analysis/8.2_-_Kaplan-Meier_Formula.pdf.
Note that when we observe every individual’s event time (namely, there is no censoring – a mechanism we
will discuss later), the KM estimator and the EDF approach are the same.
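The claim that the two estimators coincide without censoring can be checked numerically. The sketch below computes the KM product and the EDF-based estimate 1 − EDF(t) on a small data set with a tie and compares them on a grid. The data and grid are arbitrary illustrative choices.

```python
def km_survival(t, times):
    """KM estimate at t for fully observed data (no censoring)."""
    surv = 1.0
    for t_l in sorted(set(times)):
        if t_l > t:
            break
        n_l = sum(1 for x in times if x >= t_l)   # at risk at t_l
        d_l = sum(1 for x in times if x == t_l)   # events at t_l
        surv *= 1.0 - d_l / n_l
    return surv


def edf_survival(t, times):
    """1 - EDF(t): the fraction of observations strictly greater than t."""
    return sum(1 for x in times if x > t) / len(times)


times = [2.0, 3.0, 3.0, 5.0, 8.0, 8.5, 11.0]
grid = [0.0, 1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 8.0, 8.2, 9.0, 11.0, 12.0]
max_gap = max(abs(km_survival(t, times) - edf_survival(t, times)) for t in grid)
print(max_gap)  # zero up to floating-point rounding
```

The agreement comes from a telescoping product: without censoring, the number at risk at one time point equals the number at risk at the previous one minus the events there, so the product collapses to the fraction of observations larger than t.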
The Nelson-Aalen (NA) estimator is another powerful estimator of the survival function. It not only estimates
the survival function but also provides an estimate of the cumulative hazard. In fact, the NA estimator first
estimates the cumulative hazard function and then converts it into an estimate of the survival function using
the relation $S(t) = e^{-H(t)}$. Here is some intuition about how this estimator is constructed.
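Recall that the KM estimator uses
$$\hat S_{KM}(t) = \prod_{\ell:\, t_\ell \le t} \left(1 - \frac{D_\ell}{N_\ell}\right).$$
Since $H(t) = -\log S(t)$, taking the negative logarithm of the KM estimator and using $\log(1 - x) \approx -x$ for small $x$ gives
$$-\log \hat S_{KM}(t) = -\sum_{\ell:\, t_\ell \le t} \log\left(1 - \frac{D_\ell}{N_\ell}\right) \approx \sum_{\ell:\, t_\ell \le t} \frac{D_\ell}{N_\ell}.$$
Using the above derivation, the NA estimator estimates the cumulative hazard function by
$$\hat H_{NA}(t) = \sum_{\ell:\, t_\ell \le t} \frac{D_\ell}{N_\ell}.$$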
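A minimal sketch of the NA estimator for fully observed data follows. It accumulates D/N over the distinct event times and converts the result to a survival estimate through exp(−H). The function name and toy data are illustrative.

```python
import math


def nelson_aalen(times):
    """Return distinct event times and the NA cumulative hazard at each."""
    event_times = sorted(set(times))
    cum_hazard = 0.0
    values = []
    for t_l in event_times:
        n_l = sum(1 for x in times if x >= t_l)   # N_l: at risk
        d_l = sum(1 for x in times if x == t_l)   # D_l: events at t_l
        cum_hazard += d_l / n_l
        values.append(cum_hazard)
    return event_times, values


toy_times = [2, 3, 3, 5, 8]
event_times, cum_hazards = nelson_aalen(toy_times)
na_survival = [math.exp(-h) for h in cum_hazards]   # S(t) = exp(-H(t))
for t_l, h, s in zip(event_times, cum_hazards, na_survival):
    print(t_l, h, s)
```

For the toy data the increments are 1/5, 2/4, 1/2 and 1/1, so the cumulative hazard steps through 0.2, 0.7, 1.2 and 2.2. Because exp(−x) ≥ 1 − x, the survival estimate obtained this way is never below the KM estimate on the same data.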
The theoretical analysis of the KM and NA estimators (such as the expectation and variance) involves
some non-trivial algebra. If you are interested, I would recommend the following lecture note:
http://www4.stat.ncsu.edu/~dzhang2/st745/chap2.pdf.
5.2 Censoring
However, in reality, our data may not be so nice. We may not be able to observe the actual event time Ti
because of many complications. For instance, in medical research, individuals may leave the study (called
dropout), so we only observe their leaving time instead of the actual death time. The phenomenon that we
sometimes cannot observe the actual time but only a ‘censoring time’ is called censoring in Statistics.
To model this process, we often need to introduce two other variables: Y and C. Here T is the actual event
time of interest, C is the censoring time that competes with T , and Y is the time we actually observe.
In most cases, we will consider the right-censoring problem where the three variables are related by
Y = min{T, C}.
We will assume that T and C are independent. Note that if what we observe is Y = max{T, C}, this problem
is called a left-censoring problem. Moreover, we not only observe Y , we also know if this Y comes from the
event time or censoring time. Namely, we have one extra variable δ such that δ = I(T < C).
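The right-censoring mechanism is easy to simulate. The sketch below draws independent T and C and records only Y = min{T, C} and δ = I(T < C). The exponential distributions, rates, and seed are assumptions made purely for illustration; the lecture does not specify any distribution for C.

```python
import random


def simulate_right_censored(n, event_rate, censor_rate, seed=0):
    """Draw n pairs (Y_i, delta_i) with independent exponential T_i and C_i."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(event_rate)     # true event time T (not observed)
        c = rng.expovariate(censor_rate)    # censoring time C (not observed)
        y = min(t, c)                       # what we actually observe
        delta = 1 if t < c else 0           # 1: event observed, 0: censored
        data.append((y, delta))
    return data


data = simulate_right_censored(1000, event_rate=1.0, censor_rate=0.5)
observed_fraction = sum(delta for _, delta in data) / len(data)
print(observed_fraction)
```

For two independent exponential variables, P(T < C) equals the event rate divided by the sum of the two rates, so with these illustrative rates roughly two thirds of the observations should be uncensored.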
When we only observe (Y1 , δ1 ), · · · , (Yn , δn ) instead of T1 , · · · , Tn , how can we infer the survival function of T1 ?
This is a central question in much of biostatistical research.
Because we have several R.V.s now, we will add subscripts to denote the functions associated with each random
variable. Namely, $F_T, S_T, h_T, H_T$ are the CDF, survival function, hazard function, and cumulative hazard
function of random variable T , and $F_C, S_C, h_C, H_C$ are those of random variable C, and $F_Y, S_Y, h_Y, H_Y$ are
those of random variable Y .
Here are some relations among these functions.
Note that δ is just a Bernoulli random variable with P (δ = 1) = P (T < C).
When there is censoring, the EDF approach no longer works. However, the KM and NA estimators are still
valid. Essentially, the estimators are the same, but we need to slightly modify $N_\ell$ and $D_\ell$. As we have
mentioned, formally, $N_\ell$ should be defined as the number of individuals at risk at the beginning of $B_\ell$.
What does the phrase 'at risk' mean? It refers to being alive and not censored, so the definition can be adapted by
replacing $T_i$ with $Y_i$. Thus,
$$N_\ell = \sum_{i=1}^n I(Y_i \ge t_\ell).$$
For the quantity $D_\ell$, it is still the number of events in the interval $B_\ell$, but now we can only count the
events that are actually observed in the interval. Therefore,
$$D_\ell = \sum_{i=1}^n I(Y_i \in B_\ell,\ \delta_i = 1).$$
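With the modified counts, both estimators can be computed from the pairs (Y, δ). The sketch below takes the time points to be the distinct observed values of Y, which is an assumption of this example. At a time point where only censoring occurs, D is 0, so the KM factor is 1 and the NA increment is 0. The toy data are illustrative.

```python
import math


def km_na_censored(observations):
    """observations: list of (y_i, delta_i), delta_i = 1 for an observed event.

    Returns rows (t_l, N_l, D_l, KM survival, NA cumulative hazard).
    """
    grid = sorted({y for y, _ in observations})     # distinct observed times
    surv, cum_hazard = 1.0, 0.0
    rows = []
    for t_l in grid:
        # At risk: neither dead nor censored before t_l.
        n_l = sum(1 for y, _ in observations if y >= t_l)
        # Observed events at t_l; censored observations are not counted.
        d_l = sum(1 for y, d in observations if y == t_l and d == 1)
        surv *= 1.0 - d_l / n_l
        cum_hazard += d_l / n_l
        rows.append((t_l, n_l, d_l, surv, cum_hazard))
    return rows


toy = [(2, 1), (3, 0), (4, 1), (5, 1), (6, 0)]
rows = km_na_censored(toy)
for t_l, n_l, d_l, surv, cum_hazard in rows:
    print(t_l, n_l, d_l, round(surv, 4), round(cum_hazard, 4),
          round(math.exp(-cum_hazard), 4))
```

In the toy data the individual censored at time 3 still counts as at risk at times 2 and 3 but contributes no event, which is exactly how censored observations enter the estimators.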
Note that parametric models may still be applicable in the censored case, and estimation is often
done using a maximum likelihood approach, which is beyond the scope of this course, so we will not cover it
here. Here is a lecture note about this topic: http://www4.stat.ncsu.edu/~dzhang2/st745/chap3.pdf
In reality, we often not only observe the event time for an individual but also have access to other covariates
of this individual. We often are interested in understanding how these covariates affect the survival function
of the event.
For instance, in a cancer study, we may have each individual’s age when they got cancer (the event time T )
and this individual’s gender, BMI, smoking habit, and education level. The other variables are the covariates
in this study. Health scientists are often interested in how these covariates change the survival function. Let
X denote the covariates. A parameter of interest will be the survival function of T given X. Namely, it is
the conditional survival function
S(t|x) = P (T > t|X = x).
For instance, we may be interested in
We can then define the conditional hazard function and conditional cumulative hazard function as
$$h(t \mid x) = -\frac{\partial \log S(t \mid x)}{\partial t}, \qquad H(t \mid x) = -\log S(t \mid x).$$
The Cox (proportional hazards) model is one of the most popular models combining the covariates and
the survival function. It starts with modeling the hazard function h(t|X = x):
$$h(t \mid X = x) = h_0(t) \exp(x^T \beta),$$
where β is the vector of coefficients of the covariates. The function $h_0(t)$ is called the baseline hazard
function. Namely, the Cox model assumes that the covariates have a multiplicative effect on the
hazard function, through the linear combination $x^T \beta$, and that this effect stays the same across time.
This implies that the conditional cumulative hazard function is
$$H(t \mid x) = \exp(x^T \beta) \int_0^t h_0(s)\,ds = \exp(x^T \beta) H_0(t),$$
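where $H_0(t)$ is the baseline cumulative hazard function. This further yields the conditional survival function
$$S(t \mid x) = \exp(-H(t \mid x)) = \exp\left(-\exp(x^T \beta) H_0(t)\right) = \left[\exp(-H_0(t))\right]^{\exp(x^T \beta)} = S_0(t)^{\exp(x^T \beta)},$$
where $S_0(t) = \exp(-H_0(t))$ is the baseline survival function.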
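The partial likelihood contribution of individual i is
$$L_i(\beta) = \frac{h(T_i \mid X_i)}{\sum_{j:\, T_j \ge T_i} h(T_i \mid X_j)} = \frac{\exp(X_i^T \beta)}{\sum_{j:\, T_j \ge T_i} \exp(X_j^T \beta)}.$$
Here every hazard is evaluated at the same time $T_i$, so the baseline hazard $h_0(T_i)$ cancels between the numerator and the denominator.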
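The Cox formulas can be sketched in a few lines. The code below evaluates the conditional survival function exp(−exp(xᵀβ)H0(t)) for a user-supplied baseline cumulative hazard. It also sums the logarithms of the terms L_i(β), assuming fully observed data without tied event times. The linear baseline H0(t) = 0.1t, the coefficient 0.7, and the toy data are hypothetical values chosen only for illustration.

```python
import math


def linear_predictor(x, beta):
    """Inner product x^T beta for plain Python lists."""
    return sum(x_k * b_k for x_k, b_k in zip(x, beta))


def cox_survival(t, x, beta, baseline_cum_hazard):
    """S(t|x) = exp(-exp(x^T beta) * H0(t)) under the Cox model."""
    return math.exp(-math.exp(linear_predictor(x, beta)) * baseline_cum_hazard(t))


def log_partial_likelihood(times, covariates, beta):
    """Sum over i of log L_i(beta); assumes no censoring and no ties."""
    total = 0.0
    for t_i, x_i in zip(times, covariates):
        # Risk set of individual i: everyone with T_j >= T_i.
        risk_sum = sum(math.exp(linear_predictor(x_j, beta))
                       for t_j, x_j in zip(times, covariates) if t_j >= t_i)
        total += linear_predictor(x_i, beta) - math.log(risk_sum)
    return total


def baseline(t):
    """Hypothetical baseline cumulative hazard H0(t) = 0.1 * t."""
    return 0.1 * t


beta = [0.7]                      # hypothetical coefficient
s_reference = cox_survival(5.0, [0.0], beta, baseline)   # equals S0(5)
s_exposed = cox_survival(5.0, [1.0], beta, baseline)     # S0(5) ** exp(0.7)
print(s_reference, s_exposed)

toy_times = [1.0, 2.0, 3.0, 4.0]
toy_covariates = [[0.0], [1.0], [0.0], [1.0]]
print(log_partial_likelihood(toy_times, toy_covariates, beta))
```

The baseline hazard never appears in `log_partial_likelihood`, which is why β can be estimated without specifying h0. At β = 0 every individual in a risk set is equally likely to be the one who fails, so each term is minus the log of the risk-set size.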
• http://www4.stat.ncsu.edu/~dzhang2/st745/chap6.pdf
• http://www.public.iastate.edu/~kkoehler/stat565/coxph.4page.pdf
1 https://en.wikipedia.org/wiki/Semiparametric_model