Event History and Survival Analysis - v3
Silvia Avram
University of Essex
Event history methods model the survivor and hazard functions. These are
measures specific to time to event data and are introduced in Section 4.
Section 5 covers descriptive methods: the non-parametric estimators of the
survivor and cumulative hazard functions. Social scientists often deal with
data where time is measured discretely. Section 6 reviews two of the most
common models used in this case: the logistic and the complementary log-
log. Section 7 describes the most common modelling strategy adopted in
the case of continuous time data: the semi-parametric Cox model, as well as
two classes of parametric models. Finally, Section 8 briefly introduces more
advanced topics such as repeated events, competing risks and unobserved
heterogeneity.
We wish to model the time it takes for an event to occur, also called
survival time, failure time or duration. Duration captures the length of time
a unit is at risk of experiencing the event: the unit could experience the event
but has not yet done so. A single person is at risk of getting married,
but a married person is not. A poor family is at risk of escaping poverty,
whereas a rich family is not. The period of time a unit is at risk is also
called risk period or spell. The risk period / spell ends when the event takes
place. But when does it begin? It begins when the unit is first capable of
experiencing the event. Sometimes this is a clear time point. An individual
starts being at risk of divorce the moment she gets married. But when does
a person start being at risk of marriage? When the start of the risk period is
not clear cut, the researcher must use her judgement and knowledge of the
data and subject to set an appropriate starting point (see Singer and Willett
[2003] for some examples).
Survival time (or duration) can be continuous or discrete. The distinction
determines which class of methods are suitable. Events happening in contin-
uous time can occur at any moment. Events happening in discrete time occur
at pre specified intervals. An employee can become unemployed at any mo-
ment. In contrast, a student can usually progress from one year to the next
only at the end of an academic term. In practice, distinguishing between con-
tinuous and discrete time is not always straightforward. The measurement
of time is intrinsically discrete. This is because time is recorded using some
unit (year, month, week, day) that can always be subdivided. Still, time can
be considered approximately continuous if the unit of measurement is small
relative to the time scale at which the event takes place. If we measure time
until marriage in days, an average person will typically be at risk for many
days before the event. There will be few people in the data who get mar-
ried on exactly the same day. When the unit used to measure time is large
relative to the scale of the event, time should be considered to be discrete.
If we measure time until recovering from flu in days, the average person will
typically be observed only a few days before recovery and there will be many
individuals who recover on the same day.
Often, events occur in continuous time but are observed only at pre speci-
fied intervals. For example, a yearly panel study that recorded marital status
might have information about the year a person got married but not the exact
day. In this case, time should be modelled as discrete.
3 Features of time-to-event data
Time-to-event data have a longitudinal dimension and exhibit one or more of
the following features: censoring, time-varying covariates and non-normality
of residuals.
3.1 Censoring
Censoring refers to exact durations (or survival times) being unknown for
some units in the data. The event is not observed because the study came
to an end, the unit dropped out of the study, or transitioned to another
state. For example, assume we wish to model the effect of training on the
probability of an employee being promoted and have longitudinal data on
employee pay levels. Figure 1 shows some of the possible situations that
might arise. Some of the employees do not receive a promotion by the time
our study ends (case B). Some employees drop out of the study, perhaps
because they move to a different area for reasons unrelated to their job (case
C). Finally, some employees temporarily leave the labour force for health,
family or other reasons. For all these employees, the exact survival time,
i.e. the time at which a promotion would have occurred, is unknown. These
observations are right-censored.
Right-censoring is a form of missing data. Researchers sometimes deal
with this problem by dropping cases with missing values. In the case of time-
to-event data this can be problematic. Often, the share of censored cases is
high, perhaps higher than 50 percent. More importantly, because longer
spells are more likely to be right-censored, by excluding right-censored ob-
servations we are likely altering the distribution of survival times and poten-
tially biasing results. A different option would be to substitute the last known
time at which the unit is observed for the survival time. While not discarding
observations, this approach introduces significant non-random measurement
error and can also lead to biased estimates. We do not know the exact sur-
vival times for censored observations but we do know that they are later than
the censoring time. Ideally, we want to incorporate this information into our
models.
The process behind censoring is assumed to be independent of the event-
generating process. Unfortunately, this assumption is not testable empirically
but see Allison [2014] for a simple way to test the sensitivity of results. More often,
the researcher needs to rely on theory to determine whether the assumption
is reasonable. For example, we know that people who become divorced are
more likely to move home and drop out from a panel study. In this case,
censoring is related to the event of interest because experiencing the event
increases the likelihood of right-censoring. In this chapter, we will assume
censoring is non-informative.

Figure 1: Different types of censoring and truncation
[Figure: spells plotted against years since the study began]
Notes: A-completed spell; B-right censored spell (end of study); C-right censored
spell (attrition); D-left censored spell; E-right and left censored spell
Left-censoring refers to the situation when the start of the spell is un-
known. Whereas event history analysis can easily deal with right-censoring,
left-censoring poses much more serious challenges and typically requires mak-
ing untestable assumptions about the data generating process.
sections 6.2 and 7.1).
The residuals in time-to-event data are unlikely to be well approximated
by the normal distribution. For a detailed discussion of this problem see
Cleves et al. [2008].
3.3 Truncation
Truncation describes the situation where short or long spells are under sam-
pled due to the design of the study. Left truncation (also called delayed
entry) occurs when we only observe spells that have survived a minimum
amount of time, typically because the spells were already ongoing when we
began observing them (cases D and E in Figure 1). Spells that are ongoing
at the start of the study are more likely to have longer durations. To see
why, look at the example in Figure 2. Jack, Mary and Alice start a spell
of unemployment at exactly the same time before we start observing them.
Jack and Mary have shorter spells and are able to find new jobs before our
study begins. As a result, their spells never enter our sample: they had already
found a job by the time we start observing them, so they simply appear as
employed in our data. In contrast, Alice
has a longer unemployment spell that is observed. Of all the unemployment
spells that started at the same time as those of Jack, Mary and Alice, only
spells that last at least t3 are observable to us. If we were to treat ongoing
spells the same as fresh ones, we would be artificially shifting the distribu-
tion of survival times to the right, towards longer durations. Fortunately,
left truncated spells can be used in our analyses once we condition on them
lasting long enough to be included in our study. These spells cannot tell us
anything about the risk of experiencing an event at shorter durations, but
they give us information about the risk at longer durations. In our example,
Alice’s spell cannot tell us anything about the risk of experiencing the event
at durations shorter than t3 because only spells longer than t3 are observed.
However, the same spell does contain information about the risk of finding a
job at durations larger than t3 and smaller than t4 . Note that left truncation
only poses a problem because we do not have information about the entire
universe of spells. Were we to collect the complete history of all unemploy-
ment spells for our study participants as opposed to only ongoing or newly
started spells, we would not have to concern ourselves with any adjustments.
Right truncation occurs when we observe only spells that ended with an
event by some specified date, usually the date of the study. Because an event
Figure 2: Left truncated spells
[Figure: unemployment spells of Jack, Mary and Alice plotted against time since becoming unemployed]
is more likely to be observed the shorter the duration of the spell, long spells
will be under-represented. We can correct for this bias by conditioning on
the event occurring at the observed times.
Many of our respondents will have married by the time we begin the
study. Their marriage spell is ongoing at the time we sample them. We
might ask them about their marriage date but this date will fall outside the
window of observation. These spells are left-truncated and give rise to a stock
sample.
Classical event history methods have been developed for inflow samples.
Whenever the sample contains ongoing (or left-truncated) spells, the mod-
elling strategy needs to be adjusted to account for ongoing spells system-
atically over-representing longer spells and under-representing shorter ones.
In practice, this is done by adjusting the maximum likelihood estimator by
conditioning on survival up to the point of study entry. See Jenkins [2005],
Box-Steffensmeier and Jones [2004], Cleves et al. [2008] for details.
In practice, survey data is likely to contain mostly ongoing spells. Unless
complete histories are collected, fresh spells do not appear in time-to-event
data based on cross-sectional designs and they are likely to be a small pro-
portion of all spells observed in panel data.
f(t) = lim_{Δt→0} Pr(t ≤ T ≤ t + Δt) / Δt    (1)
The cumulative density function of T , denoted by F (t), captures the
probability that an event will occur before or at time t: survival time T is
less than t. F (t) is an increasing function of t that lies between 0 and 1.
f (t) and F (t) are related. f (t) represents the derivative of F (t) with
respect to t and F (t) is the integral of f (t) between 0 and t.
The survivor function is defined as 1 − F (t). It captures the probability
that a spell that began at t0 = 0 is still ongoing at time t.
θ(t) = lim_{Δt→0} Pr(t ≤ T ≤ t + Δt | T ≥ t) / Δt    (4)
It can also be expressed as the ratio between the probability density
function f (t) and the survivor function S(t).
θ(t) = f(t) / S(t)    (5)
The hazard rate captures the rate at which units surviving up to time t
experience the event. It represents a measure of the intensity of risk at time
t. The hazard rate is positive and can vary over time in any arbitrary way. It
can be increasing, decreasing, U-shaped or take any other form. For example,
the risk of dying falls in the first years of life, reaches a long plateau, and in
later life starts increasing again.
Notice the difference between the hazard rate and the probability density
function. The former is a conditional probability whereas the latter repre-
sents an unconditional probability. The probability density function tells us
something about the risk of experiencing the event at time t among all our
units. The hazard rate describes the risk of experiencing the event at time
t only among those units who have survived up to time t. For example, the
probability of dying at age 60 for someone aged 60 (the hazard) is different
from the probability of dying at age 60 for a newborn baby (the probability
density).
There is a one-to-one relationship between f (t), F (t), S(t) and θ(t). If we
know one function, we can derive the other three. An important link between
the hazard rate and the survivor function is the cumulative integrated hazard
rate, H(t). The cumulative integrated hazard at time t is the integral of (i.e.
the area under) the hazard rate function between 0 and t.
H(t) = ∫_0^t θ(u) du    (6)
Combining equations (5) and (6), we have
H(t) = ∫_0^t f(u)/S(u) du = −ln(S(t))    (7)
The cumulative hazard at time t measures the amount of risk that has
accumulated between the start of the process at t0 = 0 and time t. The
cumulative hazard, just like the hazard, is a rate and varies between 0 and
∞. It can be interpreted as the number of events we would expect to observe
by time t if the event could occur repeatedly. The cumulative hazard is
increasing with survival time and is inversely
related to the probability of surviving. The more risk accumulates over time,
the lower the chances of surviving.
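As a quick numerical illustration of the relationships in equations (5)-(7), the sketch below assumes a Weibull hazard with an arbitrary shape parameter and checks that the cumulative hazard equals −ln(S(t)); all values are illustrative only.

```python
# A minimal numerical check of equations (5)-(7), assuming a Weibull hazard
# theta(t) = p * t**(p - 1) with an arbitrary shape p = 1.5 (unit scale).
import numpy as np

p = 1.5
t = np.linspace(0.01, 5, 500)

theta = p * t**(p - 1)     # hazard rate theta(t)
H = t**p                   # cumulative hazard: integral of theta(u) from 0 to t
S = np.exp(-H)             # survivor function, S(t) = exp(-H(t))
f = theta * S              # density via equation (5): f(t) = theta(t) * S(t)

# Equation (7): H(t) = -ln(S(t)) holds by construction
assert np.allclose(H, -np.log(S))
```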
Figure 3: The density, survivor and hazard functions
[Figure: (a) the hazard rate θ(t), (b) the survivor function S(t) and (c) the probability density function f(t), each plotted against survival time t]
We start with the Kaplan-Meier estimator and, at first, assume no left-
truncation. Let t1 < t2 < ... < tn be the survival times observed in the data.
Then, for each time ti , i = 1...n, we can compute the following quantities:
• n(ti ): the number of units who are at risk of experiencing the event
between ti−1 and ti. This is the number of units who have neither failed nor
been right censored before ti−1.
• d(ti ) : the number of units who failed (experienced the event) between
ti−1 and ti
• c(ti ) : the number of units who are right censored between ti−1 and ti
The estimated hazard rate at ti is

ĥ(ti) = d(ti) / n(ti)
The Kaplan-Meier estimator of the survivor function is

Ŝ(ti) = ∏_{tj ≤ ti} (1 − d(tj)/n(tj))    (8)
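A minimal sketch of equation (8) follows. The durations and censoring indicators are invented purely to show the mechanics; in practice a dedicated survival package would normally be used.

```python
# A sketch of the Kaplan-Meier estimator in equation (8). `time` holds observed
# durations and `event` is 1 for a failure, 0 for right censoring (made-up data).
import numpy as np

time = np.array([2, 3, 3, 5, 6, 6, 7, 8, 8, 9])
event = np.array([1, 1, 0, 1, 1, 0, 1, 0, 0, 0])

S = 1.0
survivor = {}
for t in np.unique(time[event == 1]):          # distinct failure times
    n_at_risk = np.sum(time >= t)              # units still at risk just before t
    d = np.sum((time == t) & (event == 1))     # failures at t
    S *= 1 - d / n_at_risk                     # one term of the product in (8)
    survivor[t] = S

print(survivor)   # step-function values of S(t) at each observed failure time
```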
Table 1: The estimation of the empirical survival function
people are censored in the first day. Because they are lost to observation,
they cannot be observed experiencing the event in day 2 (or any subsequent
days). They must be removed from the pool of people at risk. In the presence
of right censoring, we cannot compute the survivor function directly except
for the first interval (between 0 and the first observed survival time t1 ). For
the remaining intervals, the survivor function is computed as the product
between the survivor function in the previous interval and the proportion of
units surviving in the current interval. The survivor function in day 2 is the
product of the survivor function in day 1 and the proportion of people who did
not develop an infection in day 2 calculated relative to the number of people
who were at risk at the beginning of day 2: Ŝ(t2) = 0.92 ∗ (1 − 21/220) = 0.84.
The survivor function can only be computed for those times at which
events are observed. As a result, the survivor function is a step function, as
shown in Figure 4. It changes sharply at the observed survival times and
remains constant between those times.
What happens if we have left-truncated spells in our sample? Suppose
that 10 patients arrive at our unit 3 days after they have been operated on
elsewhere. None of them were showing signs of infection on arrival and we
observed them during day 4 and day 5. Two persons developed an infection
in day 4 and the remainder are right censored at the end of day 5. Table 2
shows how our Kaplan Meier estimator can be modified to take account of
these new spells.
m(ti ) now represents the number of patients whom we start observing at
ti. m(t1) is 250 and m(t4) is 10. We cannot use the 10 patients who entered
the study at the start of day 4 to compute the survivor function for durations
shorter than 4. This is because those patients who have been operated on
at the same time but developed an infection in the first three days are not in
our sample.
Figure 4: The Kaplan Meier estimator of the survivor function
[Figure: the estimated survivor function S(t), a step function, plotted against time until infection in days]
Table 2: The Kaplan-Meier estimator with delayed entry (left-truncated) spells

Time    At risk n(ti)   Failed d(ti)   Censored c(ti)   Entered m(ti)   Interval hazard rate h(ti)   Survivor function S(ti)
Day 1   250             20             10               250             0.08 [20/250]                0.92 [1-0.08]
Day 2   220             21             14               0               0.09 [21/220]                0.84 [(1-0.09)*0.92]
Day 3   185             12             9                0               0.06 [12/185]                0.79 [(1-0.06)*0.84]
Day 4   178             10             14               10              0.06 [10/178]                0.74 [(1-0.06)*0.79]
Day 5   154             5              149              0               0.03 [5/154]                 0.72 [(1-0.03)*0.74]
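The delayed-entry adjustment in Table 2 amounts to redefining the risk set: a unit counts at time t only if it entered observation before t and has not yet failed or been censored. A small sketch, with illustrative entry and exit times (not the Table 2 data):

```python
# Kaplan-Meier with delayed entry: the risk set at t includes only units that
# entered observation before t and have not yet failed or been censored.
# `entry`, `time` and `event` are illustrative arrays, not the Table 2 data.
import numpy as np

entry = np.array([0, 0, 0, 3, 3])     # time observation starts (3 = delayed entry)
time  = np.array([2, 4, 5, 4, 5])     # time of failure or censoring
event = np.array([1, 1, 0, 1, 0])     # 1 = failure, 0 = censored

S = 1.0
for t in np.unique(time[event == 1]):
    at_risk = np.sum((entry < t) & (time >= t))    # delayed-entry risk set
    d = np.sum((time == t) & (event == 1))
    S *= 1 - d / at_risk
    print(t, at_risk, round(S, 3))
```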
methods such as the log-rank and the Wilcoxon tests to check that they are
statistically different. See Cleves et al. [2008] for a description and STATA
implementation.
We can use the estimated survivor function to derive various quantiles of
survival time. For example, median survival time tm is the time at which
the survivor function is 0.5: S(tm )=0.5. The first quartile is the time tq1
at which the survivor function equals 0.75: S(tq1 ) = 0.75. More generally,
the p-th percentile of survival time tp is the smallest observable time beyond
which the proportion of units expected to survive falls below 1 − p/100:
S(tp ) ≤ (1 − p/100). In Figure 4, we can determine the first quartile of
survival time by drawing a horizontal line through 0.75. At the point at
which this line intersects the survivor curve we draw a vertical line. The
point at which this line intersects with the x axis is tq1 . In this case, tq1 = 4.
We can only determine the value of a percentile p if at least p% of the sample
experience the event. In Figure 4, we cannot determine median survival
time because less than 50% of the sample failed before the study concluded.
Median survival time is in this case unknown or missing.
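The percentile rule above is easy to apply to a fitted step function; the sketch below uses the survivor values from Table 2 and reproduces the first quartile of 4 and the undefined median.

```python
# Reading percentiles off an estimated survivor function: the p-th percentile is
# the smallest observed time at which S(t) <= 1 - p/100. Values from Table 2.
import numpy as np

times = np.array([1, 2, 3, 4, 5])
surv = np.array([0.92, 0.84, 0.79, 0.74, 0.72])

def percentile(p):
    below = surv <= 1 - p / 100
    return times[below][0] if below.any() else np.nan   # undefined if too few events

print(percentile(25))   # 4, the first quartile found in Figure 4
print(percentile(50))   # nan: fewer than 50% of the sample failed
```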
Having estimated the survivor function, we can derive the cumulative
integrated hazard function using equation 7. Alternatively, the cumulative
hazard function can be computed directly using the Nelson-Aalen estimator.
Ĥ(ti) = Σ_{j ≤ i} ĥ(tj)    (9)
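A sketch of the Nelson-Aalen estimator in equation (9), reusing the made-up data from the Kaplan-Meier sketch above: the cumulative hazard is the running sum of the interval hazards d(tj)/n(tj).

```python
# Nelson-Aalen estimator of the cumulative hazard (equation 9) on made-up data.
import numpy as np

time = np.array([2, 3, 3, 5, 6, 6, 7, 8, 8, 9])
event = np.array([1, 1, 0, 1, 1, 0, 1, 0, 0, 0])

H = 0.0
cum_hazard = {}
for t in np.unique(time[event == 1]):
    n_at_risk = np.sum(time >= t)
    d = np.sum((time == t) & (event == 1))
    H += d / n_at_risk                 # add the interval hazard d(t_j)/n(t_j)
    cum_hazard[t] = H

print(cum_hazard)   # step-function values of H(t)
```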
not well defined. To reconstruct the hazard rate, we need to smooth H(t)
(for example, by connecting the dots directly rather than through a step).
To do so requires assumptions about how the hazard rate changes between
observed survival times.
[Figure: the estimated cumulative hazard H(t) plotted against time until infection (t)]
Life table estimators are very similar to Kaplan-Meier but have been
designed specifically for the case where the underlying survival times are
continuous but are only observed at discrete intervals. See Jenkins [2005],
Singer and Willett [2003] and Blossfeld et al. [1989] for more details.
Figure 6: Discrete time
[Figure: survival time divided into discrete intervals marked t0, t1, ..., t5]
In discrete time, both the survivor and the failure functions will be step
functions (similar to the Kaplan Meier estimator).
The hazard function in discrete time is defined as the probability of ex-
periencing the event in the i-th interval conditional on surviving up to the
end of interval i − 1. The hazard function in this case is a true probability,
meaning it ranges between 0 and 1.
S(i) = S(i−1)(1−h(i)) = S(i−2)(1−h(i−1))(1−h(i)) = ∏_{j=1}^{i} (1−h(j))    (15)
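Equation (15) can be checked directly: taking the interval hazards from Table 2, the running product of (1 − h(j)) reproduces the survivor column.

```python
# Discrete-time survivor function as a running product of (1 - h(j)),
# using the interval hazards from Table 2.
import numpy as np

h = np.array([0.08, 0.09, 0.06, 0.06, 0.03])   # h(1), ..., h(5)
S = np.cumprod(1 - h)                           # S(i) = prod_{j<=i} (1 - h(j))
print(np.round(S, 2))                           # [0.92 0.84 0.79 0.74 0.72]
```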
Table 3: Data format for discrete time analysis
For a detailed review of how to set up the data in the correct format using
Stata see Longhi and Nandi [2015]. For a similar review using R see Mills
[2011].
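As a rough illustration of the person-period layout behind Table 3 (the column names below are assumptions made for the example), the sketch expands one row per spell into one row per interval at risk, with the event indicator switched on only in the last interval of a completed spell.

```python
# Expanding spell-level data into person-period format for discrete-time models:
# one row per unit and interval at risk; y = 1 only where the event occurs.
import pandas as pd

spells = pd.DataFrame({"id": [1, 2], "duration": [3, 2], "event": [1, 0]})

rows = []
for _, s in spells.iterrows():
    for i in range(1, int(s["duration"]) + 1):
        rows.append({"id": s["id"],
                     "interval": i,
                     "y": int(s["event"] == 1 and i == s["duration"])})

person_period = pd.DataFrame(rows)
print(person_period)
```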
event.
log[h(i)/(1 − h(i))] = α + βX1 + γX2(i)    (16)
[Figure: the hazard rate implied by the logistic and the cloglog link functions]
In equation 16, the hazard rate only depends on a time constant covariate
X1 and a time-varying covariate X2 (i). This implies that the hazard does
not vary with survival time. Because survival time does not appear on the
right-hand side, its coefficient is implicitly zero. Usually, this assumption is
not realistic. We can improve the model by adding survival time on the right
hand side, as in equation 17. This model is still relatively restrictive because
it assumes that the hazard changes linearly with time. Flexibility could be
increased by adding higher-order terms such as i2 or i3.
log[h(i)/(1 − h(i))] = α + βX1 + γX2(i) + δi    (17)
We can make the hazard rate fully flexible with respect to time by re-
placing δi with a full set of dummy variables corresponding to the observed
durations in the data, as in equation 18.
log[h(i)/(1 − h(i))] = α1 D1 + α2 D2 + ... + αn Dn + βX1 + γX2(i)    (18)
D1 , D2 ...Dn represent dummy variables for each interval. For example, D1
will be 1 in rows corresponding to i = 1 and zero otherwise. The advantage
of this specification is that the hazard rate can change in an unrestricted way
from one interval to the next. The disadvantage is that as the number of
intervals increases, we are left with a large number of coefficients to interpret.
Moreover, if no event is observed in interval i then αi cannot be estimated.
If this situation occurs, intervals with no observed events need to be merged
to adjacent ones. A different approach that is particularly suitable when the
number of intervals is relatively large is to use a local smoothing function.
See Box-Steffensmeier and Jones [2004] for an application.
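A hedged sketch of estimating equation (18) on person-period data with statsmodels follows. The data are simulated only so the call runs; C(interval) generates the duration dummies (statsmodels keeps a common intercept and drops one dummy as the reference category, a harmless reparametrization of equation 18).

```python
# Discrete-time logistic hazard model with duration dummies (equation 18),
# fitted on simulated person-period data with statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
pp = pd.DataFrame({
    "interval": rng.integers(1, 6, n),   # duration interval i
    "x1": rng.normal(size=n),            # time-constant covariate
    "x2": rng.normal(size=n),            # stand-in for a time-varying covariate
})
eta = -2 + 0.5 * pp["x1"] - 0.3 * pp["x2"] + 0.1 * pp["interval"]
pp["y"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

model = smf.logit("y ~ C(interval) + x1 + x2", data=pp).fit()
print(model.summary())
print(np.exp(model.params))   # odds ratios exp(beta)
```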
The coefficients of the logistic model represent the change in the logit
(log[h(i)/(1 − h(i))]) corresponding to a 1 unit change in X. Exponentiating both
sides of equation 17, we obtain a model in which the odds of experiencing
the event depend multiplicatively on the covariates.

exp(log[h(i)/(1 − h(i))]) = exp(α + βX1 + γX2(i) + δi)
h(i)/(1 − h(i)) = exp(α) ∗ exp(βX1) ∗ exp(γX2(i)) ∗ exp(δi)    (19)
Notice that when the dependent variable is the log of the odds, the terms
on the right hand side are added. When the dependent variable is the odds,
the terms are multiplied. A one unit change in X1 increases the odds of
experiencing the event exp(β) times. A different way to express the same idea
is to say that the odds of experiencing the event change by (exp(β) − 1) ∗ 100 percent.
Because the odds vary multiplicatively with the covariates, the logistic model
is a proportional odds model.
Obtaining the effect on the hazard itself, called the marginal effect, is
more complicated and requires calculating predicted hazard rates at various
values of X and the other covariates. The marginal effect is not constant but
depends on the value of X and the other covariates. Most statistical software
packages include routines for the estimation of marginal effects.
log(−log(1 − h(i))) = α1 D1 + α2 D2 + ... + αn Dn + βX1 + γX2(i)    (20)
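The link in equation (20) can be inverted to recover the implied interval hazard, h(i) = 1 − exp(−exp(η)). The sketch below shows the transformation and compares it with the logistic hazard for the same linear predictor; fitting the model itself is a binomial GLM with a cloglog link, and the exact call is package-specific.

```python
# Inverting the complementary log-log link of equation (20): given a linear
# predictor eta, the interval hazard is h = 1 - exp(-exp(eta)).
import numpy as np

eta = np.array([-3.0, -2.0, -1.0, 0.0])
h_cloglog = 1 - np.exp(-np.exp(eta))    # cloglog hazard
h_logit = 1 / (1 + np.exp(-eta))        # logistic hazard, for comparison

print(np.round(h_cloglog, 3))   # the two links are close when the hazard is small
print(np.round(h_logit, 3))
```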
are needed. The first (Duration in Table 4) records the length of time the
unit has been observed to be at risk. If in our study of divorce we measured
time in days rather than years, Duration would record the number of days
a person has been married before divorcing or being right-censored. The
second variable (Event in Table 4) records whether the spell is completed
(Event=1) or censored (Event=0).
Table 4: Data format for continuous time analysis with time-constant covari-
ates
Table 5: Data format for continuous time analysis with time-varying covari-
ates
parameters that need to be estimated (Allison [2014]). There are two classes
of parametric models: proportional hazard (PH) and accelerated failure time
(AFT) models.
Figure 8: The PH property
[Figure: left panel, the hazard rate θ(t) for men and women plotted against survival time; right panel, the log hazard rates plotted against the log of survival time, giving parallel lines]
rate of women will be the same. This ratio is called the hazard ratio and is
constant for all survival times. Figure 8 shows an example when the hazard
is increasing over time. The graph on the left shows the hazard vs. survival
time. The hazard of divorce for women is three times higher than that of men
at all survival times. The hazard ratio in this case is 3 and does not change
with t. The graph on the right shows the log(hazard) vs. the log(survival
time). The difference between the logs of the hazard rates for women and men is now
constant (and equal to log(3)). The two lines are parallel. Notice how the
relationship is multiplicative when expressed in terms of the hazard rate
(equation 21) and additive when expressed in terms of the log(hazard rate)
(equation 22).
The function used to define λ(x) is most often the exponential function.
This ensures that the right hand side is always positive (as is the hazard
rate). Assuming there is only one covariate, the model becomes
constant. The hazard rate will vary with changes in X but it does not
depend on survival time (notice how t does not appear on the right-hand
side; its implied coefficient is zero).
where u is a disturbance term independent of X. Models in the AFT
class differ depending on the distribution used to model u. For example, if u
is assumed to have a normal distribution, we obtain the log-normal model. If
u is assumed to have a logistic distribution, we obtain the log-logistic model
and if u is assumed to have an extreme value distribution, we obtain the
Weibull model. Unlike the Weibull model, the log-logistic and log-normal
specifications allow the hazard to change direction, for example to increase
and then decrease. For a review of their properties and parametrizations see
Box-Steffensmeier and Jones [2004], Cleves et al. [2008], and Jenkins [2005].
The Weibull model is the only class that satisfies both the PH and AFT
properties.
Only if all spells in the data are completed (i.e. there is no right-censoring)
can AFT models be estimated using standard OLS regression (see Allison
[2014]). In the presence of censoring, the estimation can be done using
maximum likelihood.
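A small sketch of that point: with no censoring, regressing log survival time on the covariates by OLS recovers the AFT coefficients. The data are simulated from a log-normal AFT specification with arbitrary parameter values.

```python
# With no censoring, an AFT model is an OLS regression of log(T) on X.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
log_t = 1.0 + 0.5 * x + 0.8 * rng.normal(size=n)   # log T = alpha + beta*X + sigma*u
t = np.exp(log_t)                                  # fully observed survival times

ols = sm.OLS(np.log(t), sm.add_constant(x)).fit()
print(ols.params)              # roughly [1.0, 0.5]
print(np.exp(ols.params[1]))   # time ratio exp(beta)
```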
Equation (30) can be re-written (once we exponentiate) as

ti = exp(βXi) ∗ exp(σui)  or  exp(σui) = exp(−βXi) ∗ ti    (31)
τi = exp(−βXi) ∗ ti
The effect of the covariates is to change the scale of the survivor function
by a constant factor Ψ = exp(−βXi ). One common analogy used to explain
the AFT property is the comparison of human-years and dog-years (see Mills
[2011]).Each dog-year is the equivalent of seven human years. In this case,
time passes seven times faster for dogs than for humans (ψ = 7). Figure 9
illustrates the AFT property graphically. The proportion of dogs expected
to survive beyond 10 years is around 20%, the same as the proportion of
humans expected to survive beyond 70 years.
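The analogy can be written down directly: the sketch below assumes an illustrative exponential survivor curve for humans and obtains the dog curve by rescaling time with ψ = 7, reproducing the 20% figure quoted above.

```python
# The AFT scaling in the dog-year analogy: S_dog(t) = S_human(psi * t), psi = 7.
# The human survivor curve is an assumed exponential, chosen only for illustration.
import numpy as np

psi = 7.0
t = np.array([10, 30, 50, 70])

def S_human(age):
    return np.exp(-0.023 * age)      # illustrative survivor function for humans

S_dog = S_human(psi * t)             # dogs age psi times faster

print(np.round(S_human(t), 2))   # humans surviving beyond t years
print(np.round(S_dog, 2))        # dogs: S_dog(10) equals S_human(70), about 0.20
```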
The estimated coefficients in an AFT model can be interpreted as the
proportionate change in survival time associated with a 1 unit increase in
X. The exponentiated coefficient exp(β) is also called the time ratio. The
effect of a 1 unit change in X is to multiply the expected survival time by
exp(β). Notice how the AFT and PH parametrizations imply coefficients
with different signs. A higher hazard rate is associated with shorter survival
times and vice-versa.

Figure 9: The AFT property
[Figure: the survivor functions of humans and of dogs, S(t) plotted against survival time from 10 to 90 years]
Choosing a parametric specification relies on theory and initial inspec-
tion based on non-parametric methods. Unfortunately, often there is little
theory to guide the choice [Box-Steffensmeier and Jones, 2004, Blossfeld and
Rohwer, 2002]. Choosing the wrong parametrization can bias both the
estimated duration dependence and the estimated effects of the other covari-
ates. Box-Steffensmeier and Jones [2004] and Cleves et al. [2008] provide a
description of methods to choose a parametrization based on nested mod-
els. Blossfeld and Rohwer [2002] describe methods to check distributional
assumptions.
model [Cox, 1972], probably the most popular model for analysing continuous
time data in the social sciences. The attractiveness of the Cox model is that
it does not impose any functional form on duration dependence. θ0 (t) is
unspecified and thus can take any form. The downside is that the Cox model
has nothing to say about duration dependence (although an approximation
of the baseline hazard can be obtained). As a result, the model is most
useful when the primary interest lies in the relationship between the hazard
rate and the covariate X and duration dependence is viewed as a nuisance.
The Cox model is also called a semiparametric model because it makes no
assumption about how the hazard rate varies with survival time but does
specify the relationship between the hazard and X (the log of the hazard
varies linearly with X).
We can estimate this model using a method called partial likelihood (see
Jenkins [2005], Allison [2014] and Cleves et al. [2008]). This method allows
us to estimate the parameter β without estimating the parameters of dura-
tion dependence, θ0 (t). It requires the PH assumption to hold and no (or
few) identical survival times (called 'ties') in the data. The latter condition
is needed because partial likelihood uses information about the ordering of
events as opposed to survival times themselves. When events occur at the
same time, a clear ordering of events cannot be generated. Approximation
methods have been developed to deal with 'ties' (see Cleves et al. [2008], Box-
Steffensmeier and Jones [2004]), but these approximations become increasingly
poor as the number of ’ties’ increases. In this case, it is advisable to use
discrete time methods.
An expanded representation of the Cox model is given in equation 33
where X1 , ..., Xk are the covariates to be included in the model. The covariates
can be time-constant or time-varying.
the hazard rate corresponding to an individual with all covariates equal to
zero: X1 = X2 = ... = Xk = 0. In a model of the hazard rate of divorce in
which X1 = 1 for females and X2 (t) represents earnings, the baseline hazard
corresponds to a male with zero earnings. Most statistical software packages
report the quantity exp(β) called the hazard ratio rather than β. The hazard
ratio is always positive (as it is the result of an exponential transformation).
A value larger than 1 implies that individuals with higher values on X have
an increased hazard and thus experience the event sooner. A value equal
to 1 implies that there is no relationship between X and the hazard rate
and a value between (0, 1) implies that individuals with higher values on X
have lower values on the hazard and thus experience the event later. An
alternative interpretation is that (exp(β) − 1) ∗ 100 represents the percentage
change in the hazard rate associated with a 1 unit change in X.
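In Python, a hedged sketch of a Cox fit using the third-party lifelines package (with its bundled Rossi recidivism data, used only so the example runs) looks roughly as follows; the exp(coef) column of the output contains the hazard ratios discussed above.

```python
# Fitting a Cox proportional hazards model with lifelines (assumed installed).
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()     # columns: week (duration), arrest (event), covariates
cph = CoxPHFitter()
cph.fit(rossi, duration_col="week", event_col="arrest")
cph.print_summary()      # coefficients beta and hazard ratios exp(beta)
```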
The baseline hazard is left unspecified and is not estimated. How-
ever, approximations of the survivor function and the cumulative hazard
function can be recovered based on β1 X1i + β2 X2i + ... + βk Xki , also called
the risk score. The exact estimation procedure is not straightforward but it
closely resembles the Kaplan-Meier estimator. In fact, a Cox model with-
out any covariates produces the same survivor function as the Kaplan-Meier
estimator and the same cumulative hazard function as a Nelson-Aalen es-
timator. When covariates are included, the survivor function / cumulative
hazard produced by the Cox model can be thought of as covariate-adjusted
Kaplan-Meier/ Nelson-Aalen estimates [Cleves et al., 2008]. Both functions
will be step functions. The baseline hazard can subsequently be recovered
by smoothing the cumulative hazard function.
We can test the PH assumption in two ways: graphically and numeri-
cally. There are many graphical methods available but the most widely used
is the log minus log survivor plot. It relies on testing the relationship between
the cumulative hazards of two (or more) groups. Recall equation 33
describes how the hazard can be decomposed into the baseline hazard and
the shifter exp(βX). We can integrate this equation to obtain
log H(t, X) = log(H0 (t)) + βX (36)
will need to specify θ0 (t). The exact specification is not important: any func-
tion of t can be used. In practice, the most common functional forms used
are the linear (θ0 (t) = t) and the logarithmic (θ0 (t) = log(t)).
The Cox model assumes proportionality of hazards for all the covariates
included. The previous two methods can be used to check one covariate
at a time. If there are many covariates in the model, a more convenient
method is a test based on Schoenfeld residuals. The test works in two
steps. First, a Cox model is fit to the data and a set of residuals ηij is
calculated for each covariate Xi and each individual j. In the second step,
a regression of these residuals on survival time is fitted. The coefficients
are then tested for statistical significance. A significant coefficient indicates
that the PH assumption is violated. The Schoenfeld residuals can be tested
simultaneously for all the covariates included in the model. See [Cleves et al.,
2008] for a detailed explanation of how the residuals are constructed and how
the test can be carried out in STATA.
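A hedged sketch of the same test in Python with lifelines (the function name and options follow recent lifelines versions and are an assumption):

```python
# Schoenfeld-residual based test of the PH assumption with lifelines.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi
from lifelines.statistics import proportional_hazard_test

rossi = load_rossi()
cph = CoxPHFitter().fit(rossi, duration_col="week", event_col="arrest")

# One test per covariate; a small p-value flags a violation of the PH assumption.
results = proportional_hazard_test(cph, rossi, time_transform="rank")
results.print_summary()
```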
divorce. A retired person is no longer at risk of finding a job. Competing risks
models are a class of event history methods designed to deal with multiple
types of events.
The methods discussed in this chapter can easily be extended to accom-
modate competing risks if the processes generating the different types of
events are independent. In continuous time, each event type can be mod-
elled separately treating the occurrence of the other event type as censoring.
See Jenkins [2005], Box-Steffensmeier and Jones [2004] and Mills [2011] for
detailed discussions. In discrete time, the logistic regression is replaced by
a multinomial logistic regression (see Jenkins [2005]). Andersen et al. [2012]
provide a good introduction to the specific problems posed by competing
risks.
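A hedged sketch of the discrete-time route: on person-period data with the outcome coded 0 (no event), 1 (event of type A) and 2 (event of type B), a multinomial logit can be fitted with statsmodels. The data below are simulated purely so the call runs.

```python
# Discrete-time competing risks as a multinomial logit on person-period data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 3000
pp = pd.DataFrame({"interval": rng.integers(1, 6, n), "x": rng.normal(size=n)})

# simulate interval outcomes with small probabilities for each event type
u = rng.random(n)
p1 = 0.05 * np.exp(0.4 * pp["x"])     # hazard of event type A
p2 = 0.08 * np.exp(-0.2 * pp["x"])    # hazard of event type B
pp["outcome"] = np.select([u < p1, u < p1 + p2], [1, 2], default=0)

mlogit = smf.mnlogit("outcome ~ C(interval) + x", data=pp).fit()
print(mlogit.summary())
```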
assume unobserved heterogeneity has a particular distribution (usually the
gamma) and integrate it out [Lancaster, 1990, Abbring and Van den Berg,
2007]. Heckman and Singer [1984] have proposed a non-parametric model
where unobserved heterogeneity takes an arbitrary number of discrete val-
ues called mass points. Estimation remains non-trivial and in some cases
model convergence may not be achieved. All methods assume unobserved
heterogeneity to be time invariant.
References
Jaap H. Abbring and Gerard J. Van den Berg. The unobserved heterogeneity
distribution in duration analysis. Biometrika, 94(1):87–99, 2007.
Leila Amorim and Jianwen Cai. Modelling recurrent events: A tutorial for
analysis in epidemiology. International Journal of Epidemiology, 44(1):
324–333, 2015.
Hans-Peter Blossfeld, Alfred Hamerle, and Karl Ulrich Mayer. Event History
Analysis. Statistical Theory and the Application in the Social Sciences.
Lawrence Erlbaum Associates, New Jersey, 1989.
David Cox. Regression models and life-tables. Journal of the Royal Statistical
Society, Series B, 34(2):187–220, 1972.
James Heckman and Burton Singer. A method for minimizing the impact of
distributional assumptions in econometric models of duration data. Econo-
metrica, 52(2):271–320, 1984.
Simonetta Longhi and Alita Nandi. A Practical Guide to Using Panel Data.
Sage, London, 2015.
Melinda Mills. Introducing Survival and Event History Analysis. Sage, Lon-
don, 2011.