Survival Analysis Notes
Survival Analysis:
Background:
Survival analysis is appropriate for analyzing data consisting of the length of time to some event.
It was first developed in, and is still commonly used in, medical research, where the event in
question is death. For example, two cancer treatments might be given to two cohorts of cancer
patients, who have volunteered to participate in medical research, with the measured outcome
being time to death (or, putting it more positively, survival time). The likely objective of the
research would be to determine if one treatment or the other results in longer survival. For
ethical reasons, one of the treatments would normally be the best available current treatment, and
the other would be a new treatment for which there is a reasonable basis for hope that it will
improve on the best current treatment. Alternatively, the two treatments could be current
treatments with uncertainty as to which is better.
Mathematically, it makes no difference what event is being considered, so survival analysis can
be used in any situation involving time to an event. For example, it could be used to analyze
data on the time to commute on the subway, where the event is reaching one's destination. In
fact, the outcome variable doesn't have to be time. It just needs to be some positive-going
quantity that is terminated by some occurrence. It could, for example, be the distance drilled
before striking oil. (Positive-going means the variable only increases, and never reverses
direction and decreases. Time is an obvious case, but other quantities also have this property.)
A frequent characteristic of data of this type is censoring. Censoring most commonly occurs in
the form called right censoring. This means that for some reason, one might not observe all
subjects until the event in question occurs. One reason is that there is often a finite time
available for observation. Some subjects may not reach the event within the allotted observation
time. Another is that subjects may be lost to observation. A patient may drop out of a study, or
die for a reason unrelated to the illness under investigation. In this case, what we know is that
the subject survived for at least some period of time, but we do not know how much longer.
Thus, survival data commonly consist of two types of observations: actual survival times, and
censored times.
In cancer studies, for example, five years is a common period of observation. That is,
observation of the subjects is terminated at five years, and if any survive longer, the only data
available are that they survived at least five years. It is also common in medical studies that
some subjects will drop out before the observation period completes. They may lose their
enthusiasm for the study, move to a new location, have to terminate treatment due to excessive
side effects, etc. This also leads to censored values. For example, if a patient drops out after
three years, we know the survival time was at least three years, but not how much longer.
Two other forms of censoring are left censoring and interval censoring. Left censoring occurs
when the event takes place prior to the commencement of observation. For example, suppose we
are monitoring a marathon, and we arrive at the finish line after the first runners have already
crossed it. Say we arrived three hours after the race started. In that case, we know that the
fastest runners finished in less than three hours, but we don't know how much less. Interval
censoring occurs when we monitor a process periodically rather than continuously. We then
know that the event occurred between times A and B, but not exactly when. One way to handle
interval censored data is to take the average of the two observation times, but other approaches
are possible.
Right-censored data are the most common, and in this treatment, they are the only form we will
discuss.
Censoring makes more conventional analyses, such as those based on mean or median survival
time, less useful. The mean has to be calculated on non-censored data, since no specific value is
available for censored cases. But ignoring censored cases is inappropriate: the more successful a
treatment is, the more censored cases there will be, so discarding them biases the mean downward
most for the best treatments. The median is defined if fewer than 50% of cases are censored, but
it is undefined if more than 50% are censored, again reducing its usefulness. Survival analysis
was devised to avoid these issues.
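A small numerical sketch of the problem with the mean, using made-up event times (hypothetical, not from any study) and an observation limit of 300 time units. Because the true times are known here, we can see that both naive approaches, dropping the censored cases or treating the limit as if it were the event time, underestimate the true mean.

```python
# Hypothetical true event times; in practice values beyond the limit are unknown
true_times = [40, 85, 120, 210, 350, 420, 500, 610]
LIMIT = 300                                     # observation stops here

observed = [min(t, LIMIT) for t in true_times]  # what actually gets recorded
event = [t <= LIMIT for t in true_times]        # False = right-censored

def mean(values):
    return sum(values) / len(values)

true_mean = mean(true_times)                                     # 291.875
drop_censored = mean([t for t, e in zip(observed, event) if e])  # 113.75
censored_as_events = mean(observed)                              # 206.875
print(true_mean, drop_censored, censored_as_events)
```

Half of these hypothetical subjects are censored, which is exactly the situation in which the naive summaries break down.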
The Kaplan-Meier Plot:
So-called Kaplan-Meier plots are commonly used to provide a graphical representation of
survival data. One begins with a cohort of n individuals. When the first individual reaches the
event in question at time t, n-1 individuals remain. The proportion of surviving individuals is
now:
$$S_t = \frac{n-1}{n}$$
More generally, several individuals could reach the event at the same time. Say $x_t$ individuals
do. Then the proportion that survive is:
$$S_t = \frac{n - x_t}{n}$$
To form a Kaplan-Meier plot, we start with a horizontal line at $S_0 = 1$ (everyone surviving at time
0). When the first individual or individuals experience the event, we plot a downward step at time t to the
new proportion:
$$\frac{n - x_t}{n}$$
Each individual that reaches the event results in another downward step of the plot line. It is also
common to place a vertical line whenever an individual reaches a censored time (that is, the
event has not occurred for this individual, but no further observation can be made of the
individual). Here is an example of a Kaplan-Meier plot:
In this plot, there are two groups of individuals, designated A and B. The Kaplan-Meier plot
provides a convenient visual comparison of survivorship in the two groups. For example, A and
B could be two groups of cancer patients on different treatments. Visually, it looks like survival
is better in group A. (A higher proportion remain alive at any given time.) However, naturally
we need to account for the possibility that the difference is just due to chance.
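The step rule above gives the first drop exactly. Once censoring has occurred, the Kaplan-Meier (product-limit) estimate works with the number still at risk rather than the original n: at each event time t_i it multiplies the current survival proportion by (n_i - d_i)/n_i, where n_i is the number of individuals still under observation (neither failed nor censored) just before t_i and d_i is the number of events at t_i. Censored individuals produce no downward step but leave the risk set, which makes later steps proportionally larger. A minimal sketch, with a hypothetical function name and hypothetical data:

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- observed time for each individual
    events -- 1 if the event was observed, 0 if the time is censored
    Returns a list of (event_time, survival_proportion) pairs, one per step.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)        # everyone leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    steps = []
    for t in sorted(leaving):
        d = deaths[t]               # Counter returns 0 for times with no events
        if d > 0:
            surv *= (at_risk - d) / at_risk
            steps.append((t, surv))
        at_risk -= leaving[t]       # events and censored cases at t leave afterwards
    return steps

# Hypothetical data: 1 = event, 0 = censored
times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))           # 3 0.833, 5 0.667, 8 0.444, 12 0.0
```

Individuals censored at the same time as an event are counted as still at risk at that time, which is the usual convention. The censored case at time 10 produces no step, but it shrinks the risk set, so the final event removes all remaining survivors.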
Comparing Two or More Groups:
While the Kaplan-Meier plot provides a useful visual presentation of survivorship results, we, of
course, would like a statistical test of whether two or more groups differ to a degree greater than
is likely through the operation of chance. There are various approaches to this. They can be
divided into three main categories:
A. Parametric approaches
B. Non-parametric
C. Semi-parametric (proportional hazards)
Parametric methods presume that survival follows a specific mathematical function. An example
is the negative exponential, defined by the function:
$$S_t = S_0 e^{-rt}$$
Since we are dealing with proportions, and the proportion alive at time 0, $S_0$, is 1, in the context
of survival analysis, the equation simplifies to:
$$S_t = e^{-rt}$$
Here is the negative exponential plotted for several values of r:
The statistical task here would be to estimate r for the survival curves, and to calculate a
probability that all the samples came from the same population (have the same true value for r).
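For the negative exponential model, the estimate of r has a simple closed form even when some times are right-censored: the maximum-likelihood estimate is the number of observed events divided by the total time observed, counting each censored subject's time up to censoring. A likelihood-ratio test then asks whether one common r fits the groups as well as separate values. This is an illustrative sketch only: the function names and data are hypothetical, and it handles two groups (1 degree of freedom).

```python
import math

def exp_rate(times, events):
    """MLE of r in S(t) = exp(-r t) with right censoring: events / total time observed."""
    return sum(events) / sum(times)

def loglik_at_mle(times, events):
    """Exponential log-likelihood d*log(r) - r*T evaluated at r = d/T."""
    d, total = sum(events), sum(times)
    if d == 0:
        return 0.0                # limit of d*log(d/T) - d as d -> 0
    return d * math.log(d / total) - d

def lr_test_two_groups(g1, g2):
    """Likelihood-ratio test of H0: both groups share one r. Returns (stat, p), 1 df."""
    pooled = (list(g1[0]) + list(g2[0]), list(g1[1]) + list(g2[1]))
    stat = 2 * (loglik_at_mle(*g1) + loglik_at_mle(*g2) - loglik_at_mle(*pooled))
    stat = max(stat, 0.0)         # guard against tiny negative rounding error
    return stat, math.erfc(math.sqrt(stat / 2))   # chi-square upper tail, 1 df

# Hypothetical (times, events) for two groups; 0 marks a censored time
group_a = ([2, 4, 6, 8, 10], [1, 1, 1, 1, 0])
group_b = ([5, 10, 15, 20, 25], [1, 1, 1, 0, 0])
print(exp_rate(*group_a), exp_rate(*group_b))
print(lr_test_two_groups(group_a, group_b))
```

The erfc expression is the exact upper tail of a chi-square distribution with 1 degree of freedom, so no statistics library is needed for the two-group case.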
Another approach is non-parametric. It tests the null hypothesis that two or more groups have the
same survival function, without specifying what that function is. The detailed mathematics are
beyond this discussion, but the essence is that if two groups are following the same survivorship
curve, then at any given time, the difference between the proportion experiencing the event in
group A and the proportion experiencing it in group B should not be too large. That is, the curves
should not be too far apart. There are several approaches to assessing this. The test we will use
is called the log rank, also called the Mantel-Cox. It is based on determining the expected number of
events in each group under the assumption of no true difference between the groups, and
comparing this with the observed numbers, using a chi-square test.
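To make the log-rank idea concrete, here is a minimal two-group sketch. At each distinct event time it compares the observed number of events in group 0 with the number expected if both groups shared one survival curve (events split in proportion to the numbers at risk), then divides the squared total difference by its variance, the form commonly used by statistical packages, giving a chi-square statistic with 1 degree of freedom. The function name and example data are hypothetical.

```python
import math

def logrank_two_groups(times, events, groups):
    """Log-rank (Mantel-Cox) test comparing group 0 with group 1.

    times  -- observed times
    events -- 1 = event observed, 0 = censored
    groups -- 0 or 1 for each individual
    Returns (observed_0, expected_0, chi_square, p_value).
    """
    data = list(zip(times, events, groups))
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e == 1}):
        n0 = sum(1 for ti, _, g in data if ti >= t and g == 0)   # at risk, group 0
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)   # at risk, group 1
        d0 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)    # events, both groups
        n = n0 + n1
        observed += d0
        expected += d * n0 / n        # events split in proportion to numbers at risk
        if n > 1:
            variance += d * (n0 / n) * (n1 / n) * (n - d) / (n - 1)
    chi_square = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))   # chi-square upper tail, 1 df
    return observed, expected, chi_square, p_value

# Hypothetical data: 1 = event, 0 = censored
times  = [2, 3, 4, 6, 9, 10, 5, 8, 11, 12, 15, 15]
events = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
groups = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(logrank_two_groups(times, events, groups))
```

Censored subjects never add to the observed counts, but they stay in the risk sets up to their censoring time, which is how the test uses the partial information they carry.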
The third approach is the semi-parametric, or proportional hazards model. It assumes no
particular mathematical curve for survival, but does assume that two groups being compared will
exhibit a property called proportional hazards. The hazard is the risk of experiencing the event at
any given point in time. This risk can change over time. For example, in humans, the risk of
death is relatively high during the first year after birth (infant mortality), drops to a low value
after infancy, and then gradually rises with advancing age (V-shaped hazard function):
If two groups are being compared, say men versus women, or cancer patients under one
treatment versus cancer patients under another treatment, the proportional hazards model
assumes that at all times, the ratio between the hazard of one group and the hazard of the
other is constant. So, if at time $t_1$ group A experiences, say, a 10% greater hazard than group B,
it will also have a 10% greater hazard at times $t_2$, $t_3$, $t_4$, etc. An advantage of the proportional
hazards model is that it allows a regression-like approach to modeling independent variables that
predict or affect the hazard ratio, while not requiring the strict assumptions of a fully
parametric model. The proportional hazards model is also called Cox regression. It is
expressed mathematically as:
$$\log_e\left(\frac{h_1(t)}{h_0(t)}\right) = b_1 x_1 + b_2 x_2 + \cdots$$
The $\log_e$ transform of the hazard ratio $\frac{h_1(t)}{h_0(t)}$ is needed to make the relationship
linear. The denominator, $h_0(t)$, is the hazard as a function of t (time) under some baseline
condition. The quantities $x_1$, $x_2$, etc. are explanatory, or predictor, variables in the same sense as
for multiple regression. In a medical context, they might be called risk factors. The baseline
hazard can be taken as the hazard when all x's have value 0, while $h_1(t)$ is the hazard for some
situation in which at least one x is non-zero. If a predictor is categorical, it will be recoded as a
series of binary x's, in the same manner as we have seen previously. You may have noticed that
while this is a multiple regression model, there is no intercept term, $b_0$. This is because when all
x's are zero, $h_1(t) = h_0(t)$, so $\frac{h_1(t)}{h_0(t)} = 1$, and its log is 0, so $b_0 = 0$ and can be omitted. Also,
t represents time, and under the proportional hazards model, the ratio $\frac{h_1(t)}{h_0(t)}$ is the
same at all times, so specific values for t do not need to be specified. (In other words, while the
hazards are functions of t, the ratio between them is constant.)
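The constant ratio can be checked numerically. In this sketch the V-shaped baseline hazard `baseline_hazard` and the coefficients in `b` are invented for illustration. Whatever shape $h_0(t)$ has, multiplying it by $e^{b_1 x_1 + b_2 x_2}$ gives a ratio $h_1(t)/h_0(t)$ that does not depend on t, and the log of that ratio is the linear predictor. Each $e^{b_j}$ is the hazard ratio for a one-unit increase in $x_j$; for example, the "10% greater hazard" above corresponds to $b = \ln(1.10) \approx 0.095$.

```python
import math

def baseline_hazard(t):
    """Hypothetical V-shaped baseline hazard h0(t): high early, lowest at t = 5, then rising."""
    return 0.05 + 0.02 * abs(t - 5)

b = [0.4, -0.7]   # hypothetical coefficients b1, b2
x = [1, 1]        # predictor values x1, x2 for the comparison group

linear_predictor = sum(bj * xj for bj, xj in zip(b, x))   # b1*x1 + b2*x2

def hazard(t):
    """Proportional hazards: h1(t) = h0(t) * exp(b1*x1 + b2*x2)."""
    return baseline_hazard(t) * math.exp(linear_predictor)

for t in [0, 2.5, 5, 10, 20]:
    ratio = hazard(t) / baseline_hazard(t)
    # The ratio (and its log) is the same at every t
    print(t, round(ratio, 4), round(math.log(ratio), 4))

# Per-unit hazard ratios: exp(0.4) is about 1.49, i.e. roughly 49% higher hazard
print([round(math.exp(bj), 3) for bj in b])
```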
Performing a Survival Analysis with SPSS:
It does not appear that SPSS can perform a fully parametric survival analysis. It will perform
several non-parametric tests (we will use the log rank), and the proportional hazards Cox
regression.
We will illustrate with a data set collected by a physiology class at Fordham University. This
came from an experiment with the model organism C. elegans. This is a minute, almost
microscopic, nematode worm that has been used for numerous studies in genetics, development,
and neurophysiology. In the wild, C. elegans lives in rotting fruit, feeding on the bacteria that
develop in this situation. In the lab, it is grown on Petri plates seeded with bacteria. When a
hungry C. elegans encounters bacteria, it exhibits an immobilization response: that is, it stops
moving, presumably so as not to lose contact with the bacteria. It is known that this response is
mediated by the neurotransmitter serotonin. (A neurotransmitter is a chemical signaling
molecule released by one nerve cell, or neuron, to communicate with an adjacent neuron.)
Serotonin is also an important neurotransmitter in humans. The receiving neuron detects the
neurotransmitter by having a protein, called a receptor, in its cell membrane. Such a protein that
detects serotonin would be called a serotonin receptor. Once a neurotransmitter has done its job,
it needs to be cleared from the synaptic cleft (the space between adjacent neurons). In the case
of serotonin, this is accomplished by a protein in the cell membrane of the neuron that released
the serotonin. This protein retrieves the serotonin back into the neuron. The protein is called the
serotonin reuptake transporter. To summarize, the serotonin system has three components with
which we need to be concerned:
1. serotonin itself
2. the serotonin receptor
3. the serotonin reuptake transporter
There is a class of drugs called SSRIs (selective serotonin reuptake inhibitors) which, as their
name implies, interfere with the operation of the serotonin reuptake transporter. The original,
and still most famous, example is fluoxetine, marketed under the brand name Prozac. A couple
of other well-known ones are Paxil and Zoloft. In humans, they are used to treat several
psychological ailments, including major depression, anxiety disorders, and OCD (obsessive-
compulsive disorder), which seem to share in common an association with insufficient serotonin
activity. By slowing reuptake, SSRIs prolong the action of serotonin in the synaptic cleft.
(There are also non-pharmacologic treatments for these conditions, and therapy can involve
trying several treatments or combinations of treatments.)
The experiment involved three different genetic strains of C. elegans:
1. N2: Wild-type C. elegans
2. MT9668: Mutation in the gene coding for the serotonin receptor, resulting in a less
effective receptor
3. MT9772: Mutation in the gene coding for the serotonin reuptake transporter, resulting in
a less effective transporter.
There were also three pharmacologic treatments:
1. Buffer: A buffer solution compounded to maintain the pH conditions, osmotic
concentration, and ionic needs of C. elegans. (Control condition.)
2. S10: Buffer with 10 mg/ml of serotonin
3. F15: Buffer with 1.5 mg/ml of fluoxetine (Prozac)
For each trial, a C. elegans of one genetic strain was deposited in a drop of one of the solutions
on a slide. In liquid medium, C. elegans normally swims vigorously. The worm was observed
under the microscope for up to five minutes. If the worm immobilized (stopped swimming)
within the five-minute period, the exact time in seconds was recorded with a stopwatch. If the
worm was still swimming after five minutes, the time was recorded as 300 seconds (five
minutes) and designated as censored. The class's data were pooled, so there are several trials
under each combination of genetic strain and pharmacologic treatment. The data were entered
into an SPSS data file, named C_elegans_s09.
Here is a portion of the file:
Each case (row) is the record for one worm. Strain is the genetic strain. Treatment is the
pharmacologic condition, TIME_Sec is the time to immobilization in seconds. Censor is the
censor status: 1=immobilization was observed, and the time is the actual time; 0=censored time
(worm still moving at five minutes).
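The recording rule can be written as a tiny function. This is an illustrative sketch: `record_trial` and its input (seconds to immobilization, or None for a worm still swimming) are hypothetical and not part of the class's procedure; the output pair matches the TIME_Sec and Censor columns described above.

```python
LIMIT_SEC = 300  # five-minute observation window

def record_trial(immobilized_at):
    """Return (TIME_Sec, Censor) for one worm.

    immobilized_at -- seconds until the worm stopped swimming, or None if it
                      was still swimming when observation ended (hypothetical input)
    Censor is 1 when immobilization was observed and 0 when the time is censored.
    """
    if immobilized_at is None or immobilized_at > LIMIT_SEC:
        return LIMIT_SEC, 0       # still moving at five minutes: censored at 300 s
    return immobilized_at, 1      # event observed at the recorded time

print(record_trial(87.4))         # (87.4, 1)
print(record_trial(None))         # (300, 0)
```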
We could analyze these data by strains within treatments, or by treatments within strains. We
will illustrate with the first of these.
We start with the non-parametric analysis, done by log-rank:
- On the menu bar near the top, click Analyze > Survival > Kaplan-Meier
- Note that available variables are listed to the left. Click the variable name TIME_Sec,
and then the right arrow next to the Time: box.
- Click the variable name censor, and then the right arrow next to the Status: box.
- Click Define Event under the Status: box.
o Under Values indicating event has occurred, make sure Single value is chosen (it
is the default). In the box to its right, type the number 1. Remember, 1 is the
code for an observed immobilization (the event).
o Click Continue.
- Click the variable name Strain, and then the right arrow next to the Factor: box. This
indicates that we want to compare the strains.
- Click the variable name Treatment, and then the right arrow next to the Strata box. This
will produce a separate analysis for each pharmacologic treatment.
- Click the Compare Factor . . . button.
o Click Log Rank.
o Click For each stratum. This will compare the genetic strains within each stratum
defined by pharmacologic treatment.
o Click Continue
- Click the Options . . . button.
o Under Plots, click Survival (This will produce Kaplan-Meier survival plots for our
data.)
o Under Statistics, uncheck Survival table(s). (Leaving this checked will produce
some lengthy tables, not very informative for our purposes, that will clutter the
output.)
o Click Continue
- Click OK.
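Readers without SPSS can reproduce the logic of this stratified comparison in plain Python. The sketch below is an illustration under stated assumptions, not SPSS's algorithm: it runs a k-group log-rank test on Strain separately within each Treatment, using the usual observed-minus-expected vector and its covariance matrix, and its chi-square tail function only covers 1 or an even number of degrees of freedom (enough for two or three strains). The row layout mirrors the columns of C_elegans_s09 as described above; the function names are hypothetical.

```python
import math
from collections import defaultdict

def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and even df only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    raise NotImplementedError("odd df > 1 is not handled in this sketch")

def quad_form_inverse(u, v):
    """Return u' V^-1 u by solving V z = u with Gauss-Jordan elimination."""
    m = len(u)
    a = [list(row) + [u[i]] for i, row in enumerate(v)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, m + 1):
                    a[r][c] -= f * a[col][c]
    return sum(u[i] * a[i][m] / a[i][i] for i in range(m))

def logrank_k_groups(times, events, groups):
    """k-sample log-rank test of equal survival. Returns (chi_square, df, p_value)."""
    labels = sorted(set(groups))
    k = len(labels)
    data = list(zip(times, events, groups))
    o_minus_e = [0.0] * k
    cov = [[0.0] * k for _ in range(k)]
    for t in sorted({ti for ti, e, _ in data if e == 1}):
        n_i = [sum(1 for ti, _, g in data if ti >= t and g == lab) for lab in labels]
        d_i = [sum(1 for ti, e, g in data if ti == t and e == 1 and g == lab) for lab in labels]
        n, d = sum(n_i), sum(d_i)
        for i in range(k):
            o_minus_e[i] += d_i[i] - d * n_i[i] / n
            if n > 1:
                for j in range(k):
                    delta = n if i == j else 0
                    cov[i][j] += d * (n - d) * n_i[i] * (delta - n_i[j]) / (n * n * (n - 1))
    # The k values of O - E sum to zero, so drop the last group before inverting
    stat = quad_form_inverse(o_minus_e[:-1], [row[:-1] for row in cov[:-1]])
    return stat, k - 1, chi2_sf(stat, k - 1)

def stratified_logrank(rows):
    """Compare Strain within each Treatment stratum (column names as in the text)."""
    strata = defaultdict(list)
    for r in rows:
        strata[r["Treatment"]].append(r)
    return {
        trt: logrank_k_groups([r["TIME_Sec"] for r in rs],
                              [r["Censor"] for r in rs],
                              [r["Strain"] for r in rs])
        for trt, rs in sorted(strata.items())
    }
```

Each row is a dict such as {"Strain": "N2", "Treatment": "S10", "TIME_Sec": 42.0, "Censor": 1}; the result maps each treatment to its (chi-square, df, P) triple, the analogue of one line of the omnibus table.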
SPSS provides tables of means and medians by default. However, as previously discussed, these
are of limited usefulness when censored data are involved, and will not be considered further.
Let us start with the Kaplan-Meier plots. The most interesting one is the plot for S10:
The gold line is the survival curve for strain N2, considered to be wild type. Strain MT9668
(blue line), with a defective serotonin receptor, is less affected by serotonin, and shows longer
times to immobilization, with some worms not immobilized at five minutes, as indicated by the
vertical lines near the end marking censored values. (It is not clear why there is a line at just
under 300 seconds: it may be an error, or an observation that was terminated just short of
five minutes for some reason.) Strain MT9772 (green line), with a defective reuptake
transporter, shows faster immobilization than either N2 or MT9668. This is as expected, since it
is less able to clear any serotonin that diffuses into the synaptic clefts.
The results in buffer are also not too surprising:
Most of the worms remain active for the entire 300 seconds, and differences among strains
appear to be minimal. Possibly the few that did immobilize got tired from the vigorous
swimming that occurs in liquid media.
Finally, here is the result in F15:
Fluoxetine seems to have resulted in a considerable immobilization response (compared with
buffer) and there does not appear to be much difference among the strains. The result for N2 is
reasonable: with its reuptake transporter disabled by fluoxetine, it should show an increased
immobilization response to its endogenous (self-produced) serotonin. The results from the other
strains were not expected, based on the literature used to design this exercise. One would have
expected fluoxetine to have a smaller effect on MT9772, with its defective transporter, and also a
smaller effect on MT9668, with its defective receptor, than on N2. Reasons for this deviation
from expectation are not known, and we have plans to investigate it further.
Naturally, we would like to have statistical confirmation of our impressions from the plots. This
can be obtained by looking at the omnibus tests from the log rank analysis:
This confirms that there is no evidence for a difference among strains in buffer (P = 0.407 >
0.05), also no evidence for a difference in F15 (P = 0.899 > 0.05), and strong evidence for a
difference in S10 (P = 0.000 < 0.05; SPSS displays three decimal places, so this means P < 0.0005).
As in ANOVA, one would like to complete the analysis by looking at specific differences.
Survival analysis does not provide the possibility of constructing specific contrasts that are of
interest to the researcher. It does, however, allow pairwise comparisons. As in any multiple
comparison situation, it is necessary to adjust the alpha level if we desire to maintain an
experiment-wide type I error risk below 0.05. Nothing comparable to the Tukey is available in
survival analysis, so we are restricted to the Bonferroni adjustment. There are three pairs of
comparisons possible (N2 to MT9772, N2 to MT9668, and MT9668 to MT9772), so we need to
divide our desired alpha (presumably 0.05) by three:
$$\alpha_B = \frac{\alpha}{k} = \frac{0.05}{3} = 0.0167$$
We therefore can declare significance only for P <0.0167. To obtain pairwise comparisons:
- Return to the Kaplan-Meier survival analysis menu
- Click Compare Factor
- Select Pairwise for each stratum
- Click Continue
- Click OK
Here is the result:
Since only the S10 results were significant under the omnibus test, these are the only results of
interest for pairwise comparisons. We see that the comparison of N2 to MT9772 shows a
significant difference (P =0.000 <0.0167), the comparison of MT9772 to MT9668 shows a
significant difference (P = 0.000 <0.0167), but the comparison of N2 to MT9668 misses
significance by a small amount (P = 0.029 >0.0167).
The Bonferroni multiple comparison correction is very conservative, and in the absence of a less
conservative correction like the Tukey, some statisticians would consider it acceptable to skip the
correction as long as the omnibus test is significant. Since this is the case for the S10 results,
skipping the correction would result in the N2 to MT9668 comparison being significant (P =
0.029 <0.05).
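The Bonferroni arithmetic and the two decision rules discussed above can be written out directly. The P-values are the S10 pairwise results reported in the text, with SPSS's displayed 0.000 entered as 0.0 (the true values are below 0.0005); the variable names are illustrative.

```python
from itertools import combinations

strains = ["N2", "MT9668", "MT9772"]
pairs = list(combinations(strains, 2))   # the k = 3 pairwise comparisons
alpha = 0.05
alpha_b = alpha / len(pairs)             # Bonferroni threshold, about 0.0167

# S10 pairwise P-values as reported in the text; SPSS's "0.000" is entered as 0.0
p_values = {
    ("N2", "MT9772"): 0.0,
    ("MT9668", "MT9772"): 0.0,
    ("N2", "MT9668"): 0.029,
}

decisions = {}
for pair, p in p_values.items():
    # (significant with Bonferroni, significant with no correction)
    decisions[pair] = (p < alpha_b, p < alpha)
    print(pair, p, decisions[pair])
```

Only the N2 to MT9668 comparison changes status between the two rules, matching the discussion above.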
Next, we undertake a Cox regression (proportional hazards) model. We include strain and
treatment as two x-variables for the regression:
- Click Analyze > Survival > Cox Regression
- Put TIME_Sec in the Time box
- Put Censor in the Status box
- Click Define event
o Select Single value and type 1 in the white box to the right
o Click Continue
- Put Strain and Treatment in the Covariates box. (As with the SPSS logistic regression
procedure, there is no need to separate quantitative and categorical variables.)
- Click Options
o Select CI for exp(B)
o Click Continue
- Click OK
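SPSS does the estimation internally. As an illustration of what is being estimated, here is a sketch for a single predictor that maximizes the Cox partial likelihood by Newton-Raphson, treating tied event times with the Breslow approximation (an assumption; packages differ in how they handle ties, so results can differ slightly). The function name and data are hypothetical. The returned exp(b) is the per-unit hazard ratio, the quantity SPSS reports as Exp(B).

```python
import math

def cox_one_covariate(times, events, x, iterations=50):
    """Fit log(h1(t)/h0(t)) = b*x by maximizing the Cox partial likelihood.

    Uses Newton-Raphson and the Breslow approximation for tied event times.
    Returns (b, exp(b)); exp(b) is the hazard ratio per unit of x.
    """
    b = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for _ in range(iterations):
        score = info = 0.0
        for t in event_times:
            risk = [xi for ti, xi in zip(times, x) if ti >= t]        # still at risk
            dead = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei == 1]
            w = [math.exp(b * xi) for xi in risk]
            s0 = sum(w)
            mean_x = sum(wi * xi for wi, xi in zip(w, risk)) / s0
            mean_x2 = sum(wi * xi * xi for wi, xi in zip(w, risk)) / s0
            score += sum(dead) - len(dead) * mean_x       # first derivative
            info += len(dead) * (mean_x2 - mean_x ** 2)   # minus second derivative
        step = score / info
        b += step
        if abs(step) < 1e-10:
            break
    return b, math.exp(b)

# Hypothetical data: x = 1 for a treated group, 0 for controls; 0 = censored
times  = [1, 3, 4, 6, 9, 2, 5, 7, 8, 9]
events = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
x      = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
b, hr = cox_one_covariate(times, events, x)
print(round(b, 3), round(hr, 3))
```

Note that the baseline hazard never appears: it cancels out of the partial likelihood, which is why the model needs no assumption about its shape.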
We start by looking at the coding of the categorical variables:
For strain, N2 (coded 0,0) is the reference group, and since it is "wild type", this seems to make
sense. For treatment, S10 is the reference group, and this makes less sense. It would be more
logical for plain buffer to be the reference treatment. As with logistic regression this can be
accomplished by recoding BUF, using the data transformation function of SPSS, so that it is
alphabetically last. We recode it to ZBU, and run the analysis again.
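The reference-category behavior described above (the alphabetically last category serves as reference) can be mimicked in a few lines. The `indicator_coding` helper is hypothetical and only mirrors the coding tables discussed in the text; it is not SPSS's code. It shows why N2 is the strain reference and why recoding BUF to ZBU makes buffer the treatment reference, with F15 becoming Treatment(1) and S10 Treatment(2).

```python
def indicator_coding(values):
    """Indicator (0/1) coding with the alphabetically last category as reference.

    Returns (reference, columns); columns maps each non-reference category,
    in alphabetical order (1), (2), ..., to its 0/1 indicator values.
    """
    levels = sorted(set(values))
    reference, others = levels[-1], levels[:-1]
    columns = {lev: [1 if v == lev else 0 for v in values] for lev in others}
    return reference, columns

strain = ["N2", "MT9668", "MT9772"]
print(indicator_coding(strain)[0])        # N2

treatment = ["BUF", "F15", "S10"]
print(indicator_coding(treatment)[0])     # S10 (not what we want)

recoded = ["ZBU" if v == "BUF" else v for v in treatment]
ref, cols = indicator_coding(recoded)
print(ref, list(cols))                    # ZBU ['F15', 'S10']
```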
Here is the new categorical variable coding:
ZBU, the former BUF, is now the reference group for treatment.
Next, we look at the omnibus test:
The Cox regression routine can do stepwise regression, but we did not request this option, so
there is just one step, and the change from previous step and change from previous block parts of
the output are meaningless. We examine the probability under Overall, which is 0.000, clearly
less than 0.05. We therefore reject the null hypothesis of no effect of treatment and strain.
Next, we examine the breakdown for the effects of strain and treatment:
The overall effects of strain and treatment are significant (P =0.000 <0.05 in both cases).
Within strain, we see that both mutants differ significantly from N2. (For strain(1), which, from
the coding table, we determine is MT9668, P = 0.032 < 0.05. For strain(2), MT9772, P = 0.006
< 0.05.)
Within Treatment, we see that both agents differ from plain buffer (P =0.000 <0.05).
Note that the non-parametric log rank analysis did not actually provide a significance test of the
differences among treatments. All the tests were of strains within treatments. We inferred from
the Kaplan-Meier plots that fluoxetine resulted in more rapid immobilization compared to buffer,
but we didn't rigorously test this impression. However, the Cox regression does give us such a
test. From the coding table, Treatment(1) is fluoxetine, and its hazard is indeed significantly
different from the hazard in buffer (P = 0.000 < 0.05).