Regression Lecture Survival Analysis

Uploaded by

Alem Ayahu

More Statistics tutorial at www.dumblittledoctor.com

Survival Analysis
A Brief Introduction

1. Survival Function, Hazard Function
• In many medical studies, the primary endpoint is the time until an event occurs (e.g. death, remission).
• Data are typically subject to censoring (e.g. when a study ends before the event occurs).
• Survival function: a function describing the proportion of individuals surviving to or beyond a given time.
Notation:
◦ T: survival time of a randomly selected individual
◦ t: a specific point in time
$$S(t) = P(T \ge t) = \exp\left(-\int_0^t \lambda(u)\,du\right)$$

Hazard Function/Rate
• Hazard function λ(t): the instantaneous failure rate at time t, given that the subject has survived up to time t.
$$\lambda(t) = \lim_{\delta \to 0^+} \frac{P(t \le T < t + \delta \mid T \ge t)}{\delta} = \lim_{\delta \to 0^+} \frac{P(t \le T < t + \delta)}{P(T \ge t) \times \delta} = \lim_{\delta \to 0^+} \frac{S(t) - S(t + \delta)}{\delta} \times \frac{1}{S(t)} = \frac{f(t)}{S(t)}$$
• Here f(t) is the probability density function of the survival time T. That is,
$$f(t) = \frac{d}{dt} F(t)$$
where F(t) = 1 − S(t) = P(T ≤ t) is the cumulative distribution function.
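The hazard definition and the exponential form of S(t) given in the notation are linked by a short derivation. A sketch, assuming T is continuous with S(0) = 1 (these assumptions are not stated on the slides):

```latex
\begin{aligned}
\lambda(t) &= \frac{f(t)}{S(t)} = \frac{-S'(t)}{S(t)} = -\frac{d}{dt}\ln S(t), \\
\int_0^t \lambda(u)\,du &= -\ln S(t) + \ln S(0) = -\ln S(t), \\
S(t) &= \exp\left(-\int_0^t \lambda(u)\,du\right).
\end{aligned}
```

The first line uses f(t) = F'(t) = −S'(t), which follows from F(t) = 1 − S(t).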

2. The Key Word is ‘Censoring’
• Because of censoring, many common data analysis procedures cannot be adopted directly.
• For example, one could use the logistic regression model to model the relationship between survival probability and some relevant covariates.
◦ However, one should use the customized logistic regression procedures designed to account for censoring.

Key Assumption:
Independent Censoring
• Those still at risk at time t in the study are a random sample of the population at risk at time t, for all t.
• This assumption means that the hazard function, λ(t), can be estimated in a fair/unbiased/valid way.
3A. Kaplan-Meier (Product-Limit) Estimator of the Survival Curve
• The Kaplan–Meier estimator is the nonparametric maximum likelihood estimate of S(t). It is a product of the form
$$\hat{S}(t) = \frac{r_1 - d_1}{r_1} \times \frac{r_2 - d_2}{r_2} \times \dots \times \frac{r_i - d_i}{r_i}$$
• $r_k$ is the number of subjects alive just before time $t_k$
• $d_k$ denotes the number who died at time $t_k$

Kaplan-Meier Curve Example

Time t_i   # at risk   # events   Ŝ
0          20          0          1.00
5          20          2          [1-(2/20)]*1.00 = 0.90
6          18          0          [1-(0/18)]*0.90 = 0.90
10         15          1          [1-(1/15)]*0.90 = 0.84
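The arithmetic in the example table can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original lecture: the function name `kaplan_meier` and the row layout (time, number at risk, number of events) are choices made here for illustration.

```python
def kaplan_meier(rows):
    """Return the product-limit estimate after each row.

    rows: iterable of (time, number_at_risk, number_of_events) tuples,
    sorted by time, matching the columns of the example table.
    """
    survival = 1.0
    estimates = []
    for time, at_risk, events in rows:
        # Multiply by the conditional probability of surviving this time point.
        survival *= (at_risk - events) / at_risk
        estimates.append((time, survival))
    return estimates


# Rows copied from the example table: (time, # at risk, # events).
table = [(0, 20, 0), (5, 20, 2), (6, 18, 0), (10, 15, 1)]
km = kaplan_meier(table)
for time, s_hat in km:
    print(f"t = {time:>2}: S_hat = {s_hat:.2f}")
```

Note that the estimate only drops at times with at least one event; the row at time 6 (no events, but fewer subjects at risk afterwards because of censoring) leaves it unchanged.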

Kaplan-Meier Curve
[Figure: Kaplan-Meier curve. X-axis: Survival Time (0 to 20). Y-axis: Proportion Surviving (95% Confidence).]

Figure 1. Plot of survival distribution functions for the NCI and the SCI Groups. The Y-axis is the probability of not declining to GDS 3 or above. The X-axis is the time (in years) to decline. (Barry Reisberg et al., 2010; Alzheimer's & Dementia; in press.)

3B. Comparing Survival Functions
[Figure: survival distribution functions for three groups labeled High, Medium, and Low. X-axis: Time (0 to 60). Y-axis: Survival Distribution Function (0.00 to 1.00).]

Log-Rank Test

The log-rank test
• tests whether the survival functions are statistically equivalent
• is a large-sample chi-square test that uses the observed and expected cell counts across the event times
• has maximum power when the ratio of hazards is constant over time.

Wilcoxon Test

The Wilcoxon test
• weights the observed number of events minus the expected number of events by the number at risk across the event times
• can be biased if the pattern of censoring is different between the groups.

Log-rank versus Wilcoxon Test

Log-rank test
• is more sensitive than the Wilcoxon test to differences between groups in later points in time.
Wilcoxon test
• is more sensitive than the log-rank test to differences between groups that occur in early points in time.
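The two tests differ only in how each event time is weighted, which a small two-group implementation makes concrete. The sketch below is an illustration written for this summary, not the lecture's code: it assumes the standard hypergeometric variance at each event time, uses weight 1 for the log-rank test and the total number at risk for the Wilcoxon (Gehan-type) test, and the function name `weighted_logrank` is hypothetical.

```python
def weighted_logrank(times_a, events_a, times_b, events_b, weight="logrank"):
    """Two-group chi-square statistic (1 degree of freedom).

    times_*: follow-up times; events_*: 1 = event, 0 = censored.
    weight="logrank" weights every event time equally;
    weight="wilcoxon" weights each event time by the total number at risk.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    score = 0.0     # weighted sum of (observed - expected) events in group A
    variance = 0.0  # weighted sum of hypergeometric variances
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        if n < 2:
            continue  # a single subject at risk carries no information
        w = 1.0 if weight == "logrank" else float(n)
        expected_a = d * n_a / n
        var_t = n_a * n_b * d * (n - d) / (n ** 2 * (n - 1))
        score += w * (d_a - expected_a)
        variance += w ** 2 * var_t
    if variance == 0:
        raise ValueError("no event time with both groups at risk")
    return score ** 2 / variance


# Hypothetical data: group A tends to fail earlier than group B.
a_times, a_events = [2, 3, 4, 5, 6], [1, 1, 1, 1, 0]
b_times, b_events = [5, 7, 8, 9, 10], [1, 0, 1, 1, 0]
print(weighted_logrank(a_times, a_events, b_times, b_events, "logrank"))
print(weighted_logrank(a_times, a_events, b_times, b_events, "wilcoxon"))
```

Because the number at risk shrinks over time, the Wilcoxon weighting emphasizes early event times, which matches the sensitivity contrast described above. The statistic would be compared with a chi-square distribution with one degree of freedom.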

4. Two Parametric Distributions
• Here we present the two most notable models for the distribution of T.
• Exponential distribution: $\lambda(t) = \lambda$
• Weibull distribution: $\lambda(t) = \lambda p (\lambda t)^{p-1} = p\lambda^p \times t^{p-1}$
◦ Its survival function:
$$S(t) = \exp\left(-\int_0^t p\lambda^p u^{p-1}\,du\right) = \exp\left(-(\lambda t)^p\right)$$
◦ Thus:
$$\ln(-\ln(S(t))) = p(\ln(t) + \ln(\lambda))$$
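Both Weibull identities can be checked numerically. The sketch below is an illustration, not part of the lecture: the parameter values are hypothetical, the helper names are made up here, and the integral of the hazard is approximated with a simple midpoint rule.

```python
import math


def weibull_hazard(t, lam, p):
    """Weibull hazard: p * lam**p * t**(p - 1)."""
    return p * lam ** p * t ** (p - 1)


def weibull_survival(t, lam, p):
    """Closed-form Weibull survival function: exp(-(lam * t)**p)."""
    return math.exp(-((lam * t) ** p))


def survival_from_hazard(t, lam, p, steps=100_000):
    """Approximate exp(-integral of the hazard from 0 to t) with the midpoint rule."""
    width = t / steps
    cumulative = sum(
        weibull_hazard((i + 0.5) * width, lam, p) for i in range(steps)
    ) * width
    return math.exp(-cumulative)


lam, p, t = 0.1, 1.5, 8.0  # hypothetical parameter values
closed_form = weibull_survival(t, lam, p)
numeric = survival_from_hazard(t, lam, p)

# Linearized form: ln(-ln S(t)) = p * (ln t + ln lam).
lhs = math.log(-math.log(closed_form))
rhs = p * (math.log(t) + math.log(lam))
print(closed_form, numeric, lhs, rhs)
```

Setting p = 1 gives the exponential case with constant hazard lam. The linearized form suggests a simple diagnostic: plotting ln(−ln Ŝ(t)) against ln(t) should give roughly a straight line with slope p if the Weibull model fits.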

Weibull Hazard Function Plot
[Figure: plot of the Weibull hazard function.]

5. Regression Models
• The Exponential and the Weibull distributions inspired two parametric regression approaches:
1. Parametric proportional hazard model – this model can be generalized to a semi-parametric model: the Cox proportional hazard model
2. Accelerated failure time model

Proportional Hazard Model
• In a regression model for survival analysis one can try to model the dependence on the explanatory variables by taking the (new) hazard rate to be:
$$\lambda = \lambda_0 \times c(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$
• Hazard rates being positive, it is natural to choose the function c such that c(β, x) is positive irrespective of the values of x.
Proportional Hazard Model
• Thus a good choice is: c(.) = exp(.)
• The resulting proportional hazard model is:
$$\lambda = \lambda_0 \times \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$
• For the Weibull distribution we have:
$$\lambda = p\lambda_0^p \times t^{p-1} \times \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$
• For the Exponential distribution we have:
$$\lambda = \lambda_0 \times \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$

Accelerated Failure Time Model
• For the Weibull distribution (including the Exponential distribution), the proportional hazard model is equivalent to a log-linear model in survival time T:
$$\ln(T) = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \dots + \alpha_k x_{ik} + \sigma\varepsilon$$
• Here the error term ε can be shown to follow the 2-parameter Extreme Value distribution.
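One way to see the equivalence is sketched below for the Weibull proportional hazard parametrization λ = pλ0^p t^(p−1) exp(β0 + β1 x_i1 + ... + βk x_ik). The auxiliary variable W, the shorthand η_i, and the resulting mapping between coefficients are worked out here as an illustration and do not appear on the slides.

```latex
\begin{aligned}
S(t \mid x_i) &= \exp\left(-(\lambda_0 t)^p\, e^{\eta_i}\right),
  \qquad \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}, \\
W &= (\lambda_0 T)^p\, e^{\eta_i} \sim \text{Exponential}(1), \\
\ln W &= p \ln \lambda_0 + p \ln T + \eta_i, \\
\ln T &= -\ln \lambda_0 - \frac{\eta_i}{p} + \frac{1}{p} \ln W .
\end{aligned}
```

Matching terms with the log-linear model gives σ = 1/p, α_j = −β_j/p for j ≥ 1, and α_0 = −ln λ0 − β0/p, with ε = ln W following a standard extreme value (Gumbel minimum) distribution under this parametrization. In particular, a covariate that raises the hazard (β_j > 0) shortens the log survival time (α_j < 0).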

Apply Both Models Simultaneously
• If the underlying distribution for T is Weibull or Exponential, one can apply both regression models simultaneously to reflect different aspects of the survival process. That is:
• Prediction of degree of decline using the Weibull proportional hazard model
• Prediction of time of decline using the accelerated failure time model

An Example
• In a recent paper (Reisberg et al., 2010), we applied both regression models to a dementia study conducted at NYU:
$$\Lambda(T) = \Lambda_0(T)\exp(\alpha_1 \cdot \text{Group} + \alpha_2 \cdot \text{Age} + \alpha_3 \cdot \text{Gender} + \alpha_4 \cdot \text{Education} + \alpha_5 \cdot \text{FollowUp})$$
$$\log T = \beta_0 + \beta_1 \cdot \text{Group} + \beta_2 \cdot \text{Age} + \beta_3 \cdot \text{Gender} + \beta_4 \cdot \text{Education} + \beta_5 \cdot \text{FollowUp} + \sigma\varepsilon$$
• The results are shown next.


6. Cox Proportional Hazards Model

Parametric versus
Nonparametric Models

Parametric models require that
• the distribution of survival time is known
• the hazard function is completely specified except for the values of the unknown parameters.
Examples include the Weibull model, the exponential model, and the log-normal model.

Parametric versus
Nonparametric Models

Properties of nonparametric models are
• the distribution of survival time is unknown
• the hazard function is unspecified.
An example is the Cox proportional hazards model (strictly a semiparametric model: the baseline hazard is left unspecified, while the covariate effects are parametric).
Cox Proportional Hazards Model
$$h_i(t) = h_0(t)\, e^{\beta_1 X_{i1} + \dots + \beta_k X_{ik}}$$
• $h_0(t)$: baseline hazard function; involves time but not the predictor variables
• $\beta_1 X_{i1} + \dots + \beta_k X_{ik}$: linear function of a set of predictor variables; does not involve time

β = 0 → hazard ratio = 1: the two groups have the same survival experience.
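Because the baseline hazard is shared by all subjects, it cancels when two hazards are divided, so a hazard ratio depends only on the coefficients and the covariate difference. A minimal sketch of that calculation follows; the coefficient values, the covariates (a group indicator and age), and the function name `hazard_ratio` are hypothetical.

```python
import math


def hazard_ratio(beta, x_a, x_b):
    """Hazard ratio h_a(t) / h_b(t) under the Cox model.

    The baseline hazard h_0(t) appears in both numerator and denominator,
    so it cancels and the ratio does not depend on t.
    """
    return math.exp(sum(b * (xa - xb) for b, xa, xb in zip(beta, x_a, x_b)))


# Hypothetical coefficients and covariates: (group indicator, age in years).
beta = [0.7, 0.02]
hr_groups = hazard_ratio(beta, [1, 60], [0, 60])        # groups differ, same age
hr_null = hazard_ratio([0.0, 0.02], [1, 60], [0, 60])   # group coefficient is zero
print(hr_groups, hr_null)
```

With the group coefficient set to zero the ratio is 1, which is the "same survival experience" case stated above.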

Popularity of the Cox Model
The Cox proportional hazards model
• provides the primary information desired from a survival analysis, hazard ratios and adjusted survival curves, with a minimum number of assumptions
• is a robust model where the regression coefficients closely approximate the results from the correct parametric model.
Partial Likelihood

Partial likelihood differs from maximum likelihood because
• it does not use the likelihoods for all subjects
• it only considers likelihoods for subjects that experience the event
• it considers subjects as part of the risk set until they are censored.

Partial Likelihood

Subject   Survival Time   Status
C         2.0             1
B         3.0             1
A         4.0             0
D         5.0             1
E         6.0             0

(Status: 1 = event observed, 0 = censored.)

Partial Likelihood
$$L_c = \frac{h_c(2)}{h_c(2) + h_b(2) + h_a(2) + h_d(2) + h_e(2)}$$
$$L_b = \frac{h_b(3)}{h_b(3) + h_a(3) + h_d(3) + h_e(3)}$$
$$L_d = \frac{h_d(5)}{h_d(5) + h_e(5)}$$

Partial Likelihood
$$L_d = \frac{h_d(5)}{h_d(5) + h_e(5)}$$
$$L_d = \frac{h_0(5)\, e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \dots + \beta_k X_{dk}}}{h_0(5)\, e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \dots + \beta_k X_{dk}} + h_0(5)\, e^{\beta_1 X_{e1} + \beta_2 X_{e2} + \dots + \beta_k X_{ek}}}$$
$$L_d = \frac{e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \dots + \beta_k X_{dk}}}{e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \dots + \beta_k X_{dk}} + e^{\beta_1 X_{e1} + \beta_2 X_{e2} + \dots + \beta_k X_{ek}}}$$

Partial Likelihood
• The overall likelihood is the product of the individual likelihoods. That is:
$$L = L_c \times L_b \times L_d$$
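The partial likelihood for the five-subject example can be computed directly. In the sketch below the survival times and statuses come from the example table, while the single covariate value per subject is hypothetical (the slides do not give covariates); the code also assumes there are no tied event times.

```python
import math

# (subject, survival time, status) from the example; 1 = event, 0 = censored.
subjects = [("C", 2.0, 1), ("B", 3.0, 1), ("A", 4.0, 0), ("D", 5.0, 1), ("E", 6.0, 0)]

# Hypothetical single covariate per subject (not given in the slides).
x = {"A": 0.0, "B": 1.0, "C": 1.0, "D": 0.0, "E": 0.0}


def partial_likelihood(beta, subjects, x):
    """Product over event times of exp(beta*x_i) / sum over the risk set of exp(beta*x_j)."""
    likelihood = 1.0
    for name, time, status in subjects:
        if status != 1:
            continue  # censored subjects contribute no term of their own
        # Risk set: everyone whose survival time is at least this event time.
        risk_set = [other for other, t, _ in subjects if t >= time]
        numerator = math.exp(beta * x[name])
        denominator = sum(math.exp(beta * x[other]) for other in risk_set)
        likelihood *= numerator / denominator
    return likelihood


pl_null = partial_likelihood(0.0, subjects, x)  # 1/5 * 1/4 * 1/2
pl_one = partial_likelihood(1.0, subjects, x)
print(pl_null, pl_one)
```

Subject A is censored at time 4, so A appears in the risk sets for C and B but contributes no factor of its own, which is why only L_c, L_b, and L_d enter the product. In practice β is chosen to maximize this quantity (usually its logarithm).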

7. SAS Programs for Survival Analysis
• There are three SAS procedures for analyzing survival data: LIFETEST, PHREG, and LIFEREG.
• PROC LIFETEST is a nonparametric procedure for estimating the survivor function, comparing the underlying survival curves of two or more samples, and testing the association of survival time with other variables.
• PROC PHREG is a semiparametric procedure that fits the Cox proportional hazards model and its extensions.
• PROC LIFEREG is a parametric regression procedure for modeling the distribution of survival time with a set of concomitant variables.

Proc LIFETEST
• The Kaplan-Meier (K-M) survival curves and related tests (Log-Rank, Wilcoxon) can be generated using SAS PROC LIFETEST

PROC LIFETEST DATA=SAS-data-set <options>;
   TIME variable <*censor(list)>;
   STRATA variable <(list)> <...variable <(list)>>;
   TEST variables;

Proc PHREG
• The Cox (proportional hazards) regression is performed using SAS PROC PHREG

proc phreg data=rsmodel.colon;
   /* status values 0, 2 and 4 are treated as censored */
   /* risklimits requests confidence limits for the hazard ratios */
   model surv_mm*status(0,2,4) = sex yydx / risklimits;
run;

Proc LIFEREG
• The accelerated failure time regression is performed using SAS PROC LIFEREG

proc lifereg data=subset outest=OUTEST(keep=_scale_);
   /* (lower, hours) gives the lower and upper bounds of the censored response */
   /* d=normal requests a normal error distribution */
   model (lower, hours) = yrs_ed yrs_exp / d=normal;
   output out=OUT xbeta=Xbeta;
run;

