
Event history and survival analysis

Silvia Avram
University of Essex

This version: 02 August 2019

1 Why event history analysis?


Understanding processes of change lies at the heart of many social sciences.
Why do some people marry early on in life while others remain single? Why
do some students drop out from university while others successfully graduate?
Why are some political coalitions lasting while others rapidly disintegrate?
How long does it take for an employee to receive a promotion? Why are
some families stuck in poverty while others escape it? These are all ques-
tions concerned with change over time and the occurrence of specific events.
Event history analysis is a set of methods that is concerned with explain-
ing the occurrence and timing of events. An event is a sharp, discontinuous
change. We say that our units of interest (individuals, households, firms,
political parties) transition from one state to another. Individuals transition
from being single to being married. Young people transition from student to
graduate. An employee transitions from a junior to a senior role. Families
transition between being in and out of poverty. In all of these cases, we are
dealing with entities that experience change with the passage of time. The
data describing these phenomena is called time-to-event data. Other terms
that have been used are transition data, survival data, or duration data.
This chapter provides an introduction to methods designed for time to
event data. Section 2 reviews the basic concepts used to describe this type
of data. Mean-modelling methods (including OLS) are inappropriate due to
censoring and truncation. These features are discussed in Section 3, as well
as the implications of different sampling frameworks. Instead of the mean,
event history methods model the survivor and hazard functions. These are
measures specific to time to event data and are introduced in Section 4.
Section 5 covers descriptive methods: the non-parametric estimators of the
survivor and cumulative hazard functions. Social scientists often deal with
data where time is measured discretely. Section 6 reviews two of the most
common models used in this case: the logistic and the complementary log-
log. Section 7 describes the most common modelling strategy adopted in
the case of continuous time data: the semi-parametric Cox model, as well as
two classes of parametric models. Finally, Section 8 briefly introduces more
advanced topics such as repeated events, competing risks and unobserved
heterogeneity.

2 Key concepts in event history analysis


To master event history methods, it is important to fully understand the
associated jargon. States are the categories of the outcome of interest. For
example, a person can be married or single. A family can be poor or non-
poor. The possible states constitute a set of discrete values, measured by
a categorical variable. The number of states is not restricted to two. A
person could be single, married, cohabiting, divorced or widowed and tran-
sitioning between these states over time. States can be physical (obese),
emotional (depressed) or social (married). What is important is that they
are clearly defined, non-overlapping (exclusive) and cover all units in the
data (exhaustive) [Singer and Willet, 2003]. Sometimes states are easily de-
fined: married vs. divorced. But often the researcher must establish defining
criteria: for example, when is someone considered to be depressed?
An event is a transition between two states. Many textbooks refer to
events as failures. The term originates in the biomedical and engineering
sciences where the event of interest was death or mechanical failure. De-
spite the negative connotations, event history methods do not care about
the desirability of an event. Events could be negative (illness, death), posi-
tive (graduation, promotion) or neutral (buying a house). Throughout this
chapter, we will assume that there is only one type of event, i.e. there are
only two possible states, and events are non-repeated, i.e. we observe a single
spell for each unit in our data. Extensions of event history methods that can
accommodate more than one type of event and repeated events are briefly
discussed in the last section.

We wish to model the time it takes for an event to occur, also called
survival time, failure time or duration. Duration captures the length of time
a unit is at risk of experiencing the event: the unit could experience the event
but has not yet done so. A single person is at risk of getting married,
but a married person is not. A poor family is at risk of escaping poverty,
whereas a rich family is not. The period of time a unit is at risk is also
called risk period or spell. The risk period / spell ends when the event takes
place. But when does it begin? It begins when the unit is first capable of
experiencing the event. Sometimes this is a clear time point. An individual
starts being at risk of divorce the moment she gets married. But when does
a person start being at risk of marriage? When the start of the risk period is
not clear cut, the researcher must use her judgement and knowledge of the
data and subject to set an appropriate starting point (see Singer and Willet
[2003] for some examples).
Survival time (or duration) can be continuous or discrete. The distinction
determines which class of methods are suitable. Events happening in contin-
uous time can occur at any moment. Events happening in discrete time occur
at pre-specified intervals. An employee can become unemployed at any mo-
ment. In contrast, a student can usually progress from one year to the next
only at the end of an academic term. In practice, distinguishing between con-
tinuous and discrete time is not always straightforward. The measurement
of time is intrinsically discrete. This is because time is recorded using some
unit (year, month, week, day) that can always be subdivided. Still, time can
be considered approximately continuous if the unit of measurement is small
relative to the time scale at which the event takes place. If we measure time
until marriage in days, an average person will typically be at risk for many
days before the event. There will be few people in the data who get mar-
ried on exactly the same day. When the unit used to measure time is large
relative to the scale of the event, time should be considered to be discrete.
If we measure time until recovering from flu in days, the average person will
typically be observed only a few days before recovery and there will be many
individuals who recover on the same day.
Often, events occur in continuous time but are observed only at pre-speci-
fied intervals. For example, a yearly panel study that recorded marital status
might have information about the year a person got married but not the exact
day. In this case, time should be modelled as discrete.

3 Features of time-to-event data
Time-to-event data have a longitudinal dimension and exhibit one or more of
the following features: censoring, time-varying covariates and non-normality
of residuals.

3.1 Censoring
Censoring refers to exact durations (or survival times) being unknown for
some units in the data. The event is not observed because the study came
to an end, the unit dropped out of the study, or transitioned to another
state. For example, assume we wish to model the effect of training on the
probability of an employee being promoted and have longitudinal data on
employee pay levels. Figure 1 shows some of the possible situations that
might arise. Some of the employees do not receive a promotion by the time
our study ends (case B). Some employees drop out of the study, perhaps
because they move to a different area for reasons unrelated to their job (case
C). Finally, some employees temporarily leave the labour force for health,
family or other reasons. For all these employees, the exact survival time,
i.e. the time at which a promotion would have occurred, is unknown. These
observations are right-censored.
Right-censoring is a form of missing data. Researchers sometimes deal
with this problem by dropping cases with missing values. In the case of time-
to-event data this can be problematic. Often, the share of censored cases is
high, perhaps higher than 50 percent. More importantly, because longer
spells are more likely to be right-censored, by excluding right-censored ob-
servations we are likely altering the distribution of survival times and poten-
tially biasing results. A different option would be to substitute the last known
time at which the unit is observed for the survival time. While not discarding
observations, this approach introduces significant non-random measurement
error and can also lead to biased estimates. We do not know the exact sur-
vival times for censored observations but we do know that they are later than
the censoring time. Ideally, we want to incorporate this information into our
models.
The process behind censoring is assumed to be independent of the event-
generating process. Unfortunately, this assumption is not testable empirically
but see Allison [2014] for a simple way to test the sensitivity of results. Most often,
the researcher needs to rely on theory to determine whether the assumption
is reasonable. For example, we know that people who become divorced are
more likely to move home and drop out from a panel study. In this case,
censoring is related to the event of interest because experiencing the event
increases the likelihood of right-censoring. In this chapter, we will assume
censoring is non-informative.

Figure 1: Different types of censoring and truncation. Notes: A-completed spell;
B-right censored spell (end of study); C-right censored spell (attrition); D-left
censored spell; E-right and left censored spell (horizontal axis: years since the
study began).
Left-censoring refers to the situation when the start of the spell is un-
known. Whereas event history analysis can easily deal with right-censoring,
left-censoring poses much more serious challenges and typically requires mak-
ing untestable assumptions about the data generating process.

3.2 Time-varying covariates and residuals


Another feature of time-to-event data is the presence of time-varying covari-
ates. These are predictor variables that vary within unit over time, for ex-
ample the number of children during a marriage spell, the number of hours
spent searching during an unemployment spell, or the approval rating of a
prime-minister during her time in office. Time-varying covariates can be ac-
commodated by setting up the data in a (quasi) longitudinal format (see
sections 6.2 and 7.1).
The residuals in time-to-event data are unlikely to be well approximated
by the normal distribution. For a detailed discussion of this problem see
Cleves et al. [2008].

3.3 Truncation
Truncation describes the situation where short or long spells are under-sam-
pled due to the design of the study. Left truncation (also called delayed
entry) occurs when we only observe spells that have survived a minimum
amount of time, typically because the spells were already ongoing when we
began observing them (cases D and E in Figure 1). Spells that are ongoing
at the start of the study are more likely to have longer durations. To see
why, look at the example in Figure 2. Jack, Mary and Alice start a spell
of unemployment at exactly the same time before we start observing them.
Jack and Mary have shorter spells and are able to find new jobs before our
study begins. As a result, their spells are not observed at all. We do not know about their
unemployment spells because they already found a job by the time we start
observing them and they appear as employed in our data. In contrast, Alice
has a longer unemployment spell that is observed. Of all the unemployment
spells that started at the same time as those of Jack, Mary and Alice, only
spells that last at least t3 are observable to us. If we were to treat ongoing
spells the same as fresh ones, we would be artificially shifting the distribu-
tion of survival times to the right, towards longer durations. Fortunately,
left truncated spells can be used in our analyses once we condition on them
lasting long enough to be included in our study. These spells cannot tell us
anything about the risk of experiencing an event at shorter durations, but
they give us information about the risk at longer durations. In our example,
Alice’s spell cannot tell us anything about the risk of experiencing the event
at durations shorter than t3 because only spells longer than t3 are observed.
However, the same spell does contain information about the risk of finding a
job at durations larger than t3 and smaller than t4 . Note that left truncation
only poses a problem because we do not have information about the entire
universe of spells. Were we to collect the complete history of all unemploy-
ment spells for our study participants as opposed to only ongoing or newly
started spells, we would not have to concern ourselves with any adjustments.
Figure 2: Left truncated spells (the unemployment spells of Jack, Mary and
Alice, plotted against time since becoming unemployed, with marks at t1 to t4).

Right truncation occurs when we observe only spells that ended with an
event by some specified date, usually the date of the study. Because an event
is more likely to be observed the shorter the duration of the spell, long spells
will be under-represented. We can correct for this bias by conditioning on
the event occurring at the observed times.

3.4 Sampling frameworks behind time-to-event data


Time to event data can be obtained from many designs including cross-
sectional surveys with retrospective questions, administrative records, panel
and cohort studies with prospective or retrospective questions, or a combi-
nation of these. In all cases, special attention needs to be paid to the types
of spells present in the data(see also Jenkins [2005]).
In the simplest case, units are observed from the moment they become
at risk until they experience the event or are right-censored. This situation
is called an inflow sample or a fresh spells sample. The important thing
to note is that all spells are observed from the start, i.e. there is no left
censoring or left-truncation (although right-censoring might still be present).
For example, assume we are interested in divorce and have data from a panel
study where respondents are interviewed every year. The fresh spells are
those where we observe individuals marry during the study, i.e. after the
first wave. These spells are then subsequently followed until they end in
divorce or are right-censored.

Many of our respondents will have married by the time we begin the
study. Their marriage spell is ongoing at the time we sample them. We
might ask them about their marriage date but this date will fall outside the
window of observation. These spells are left-truncated and give rise to a stock
sample.
Classical event history methods have been developed for inflow samples.
Whenever the sample contains ongoing (or left-truncated) spells, the mod-
elling strategy needs to be adjusted to account for ongoing spells system-
atically over-representing longer spells and under-representing shorter ones.
In practice, this is done by adjusting the maximum likelihood estimator by
conditioning on survival up to the point of study entry. See Jenkins [2005],
Box-Steffensmeier and Jones [2004], Cleves et al. [2008] for details.
In practice, survey data is likely to contain mostly ongoing spells. Unless
complete histories are collected, fresh spells do not appear in time-to-event
data based on cross-sectional designs and they are likely to be a small pro-
portion of all spells observed in panel data.

4 The survivor and hazard rate functions
Modelling the expected mean is infeasible with time-to-event data due to
censoring. Computing a mean requires knowledge of actual survival times
and this information is missing for right-censored spells. Instead, we use two
concepts specific to event history analysis: the survivor and the hazard rate
functions.
We begin by defining T as a positive continuous random variable repre-
senting survival times. The distribution of the possible values that T can
take is captured by its probability density function, f(t). f(t) maps the
probability that an event will occur at each possible duration t. Formally,

f(t) = lim_{∆t→0} Pr(t ≤ T ≤ t + ∆t) / ∆t                                   (1)
The cumulative distribution function of T, denoted by F(t), captures the
probability that an event will occur before or at time t: survival time T is
less than or equal to t. F(t) is an increasing function of t that lies between 0 and 1.

F (t) = P r(T ≤ t) (2)

f (t) and F (t) are related. f (t) represents the derivative of F (t) with
respect to t and F (t) is the integral of f (t) between 0 and t.
The survivor function is defined as 1 − F(t). It captures the probability
that a spell that began at t0 = 0 is still ongoing at time t.

S(t) = 1 − F (t) = P r(T ≥ t) (3)


S(t) is a decreasing function that lies between 1 and 0. At t = 0, all units
survive with certainty: S(0) = 1. As time passes, the proportion of units
who experience the event rises and the proportion of those who survive (i.e.
do not experience the event) falls. After enough time passes, all units will
experience the event: S(∞) = 0 and F (∞) = 1.
The hazard rate θ(t) is the probability per unit of time that a spell that has
lasted up to time t will end between t and t + ∆t.

θ(t) = lim_{∆t→0} Pr(t ≤ T ≤ t + ∆t | T ≥ t) / ∆t                           (4)
It can also be expressed as the ratio between the probability density
function f (t) and the survivor function S(t).

θ(t) = f(t) / S(t)                                                          (5)
The hazard rate captures the rate at which units surviving up to time t
experience the event. It represents a measure of the intensity of risk at time
t. The hazard rate is positive and can vary over time in any arbitrary way. It
can be increasing, decreasing, U-shaped or take any other form. For example,
the risk of dying falls in the first years of life, reaches a long plateau, and in
later life starts increasing again.
Notice the difference between the hazard rate and the probability density
function. The former is a conditional probability whereas the latter repre-
sents an unconditional probability. The probability density function tells us
something about the risk of experiencing the event at time t among all our
units. The hazard rate describes the risk of experiencing the event at time
t only among those units who have survived up to time t. For example, the
probability of dying at age 60 for someone aged 60 (the hazard) is different
from the probability of dying at age 60 for a newborn baby (the probability
density).

There is a one-to-one relationship between f (t), F (t), S(t) and θ(t). If we
know one function, we can derive the other three. An important link between
the hazard rate and the survivor function is the cumulative integrated hazard
rate, H(t). The cumulative integrated hazard at time t is the integral of (i.e.
the area under) the hazard rate function between 0 and t.

H(t) = ∫_0^t θ(u) du                                                        (6)
Combining equations (5) and (6), we have

H(t) = ∫_0^t f(u)/S(u) du = −ln(S(t))                                       (7)
The cumulative hazard at time t measures the amount of risk that has
accumulated between the start of the process at t0 = 0 and time t. The
cumulative hazard, like the hazard, varies between 0 and ∞. It can be
interpreted as the number of events we would expect to observe by time t
if the event were repeatable. The cumulative hazard increases with survival
time and is inversely related to the probability of surviving. The more risk
accumulates over time, the lower the chances of surviving.
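To make the link in equation (7) concrete, here is a minimal numerical check, a sketch in Python (an assumption; the chapter itself uses no code), for the simple case of a constant hazard of 0.5 per unit of time (the exponential case discussed later in Section 7.2). All names and values are illustrative.

```python
import numpy as np

lam = 0.5                      # a constant hazard rate (illustrative value)
t = np.linspace(0.0, 5.0, 101)

H = lam * t                    # cumulative hazard: integral of a constant hazard from 0 to t
S = np.exp(-lam * t)           # survivor function of the exponential model

# equation (7): H(t) = -ln(S(t)) holds at every survival time
assert np.allclose(H, -np.log(S))
```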

5 Descriptive methods: the empirical survivor and cumulative hazard functions
Researchers usually start their analysis with descriptive measures of their
outcome of interest such as frequency tables, means and variances. In event
history analysis, the counterparts are the empirical estimators of the survivor
function S(t) and the cumulative hazard function H(t). These estimators are
non-parametric. They make no assumption about the relationship between
survival time and S(t)/H(t) and are constructed directly from the data. They
provide information about how the risk of experiencing the event varies over
time and can be applied either to the entire sample or to subgroups defined
by observable characteristics (see Jenkins [2005]). The estimators are the
Kaplan-Meier (also called the product limit) estimator, the Nelson-Aalen
estimator and the life table. The first two are used with continuous time
data whereas the last is for discrete time. The logic behind them is very
similar.

Figure 3: The density, survivor and hazard functions. (a) Hazard rate θ(t);
(b) survivor function S(t); (c) probability density function f(t), each plotted
against survival time t.
We start with the Kaplan-Meier estimator and, at first, assume no left-
truncation. Let t1 < t2 < ... < tn be the survival times observed in the data.
Then, for each time ti , i = 1...n, we can compute the following quantities:

• n(ti): the number of units who are at risk of experiencing the event
between ti−1 and ti. This is the number of units who have not failed and
have not been right censored by ti−1.

• d(ti ) : the number of units who failed (experienced the event) between
ti−1 and ti

• c(ti ) : the number of units who are right censored between ti−1 and ti

An approximation of the interval hazard rate between ti−1 and ti can be
obtained by dividing the number of those who failed in the interval, d(ti), by
the number who were at risk at the beginning of the interval, n(ti).

ĥ(ti) = d(ti) / n(ti)

The Kaplan-Meier estimator of the survivor function is

Ŝ(ti) = ∏_{tj ≤ ti} (1 − d(tj)/n(tj))                                       (8)

Table 1 shows an example. A sample of 250 patients in a hospital unit


have been followed for five days after surgery. Each day, the number of
patients who developed an infection ( d(ti )) and the number who were dis-
charged or moved and thus right censored (c(ti )) is recorded. In the first day,
20 people developed an infection and 10 were censored. In the second day,
21 developed an infection and 14 were censored and so on. At the end of day
5 (or the beginning of day 6), 132 patients have not developed an infection
and are all right censored.
Table 1: The estimation of the empirical survival function

Time    At risk   Failed   Censored   Interval hazard     Survivor
        n(ti)     d(ti)    c(ti)      rate h(ti)          function S(ti)
Day 1   250       20       10         0.08 [20/250]       0.92 [1-0.08]
Day 2   220       21       14         0.09 [21/220]       0.84 [(1-0.09)*0.92]
Day 3   185       12       9          0.06 [12/185]       0.79 [(1-0.06)*0.84]
Day 4   168       8        14         0.05 [8/168]        0.75 [(1-0.05)*0.79]
Day 5   146       5        141        0.03 [5/146]        0.73 [(1-0.03)*0.75]

At the end of day 1, the proportion of patients surviving is 1 minus the
fraction of people who experienced the event: Ŝ(1) = 1 − 20/250 = 0.92. The
number of people who are at risk of being observed developing an infection
at the beginning of day 2 is n(t2) = 250 − 20 − 10 = 220. If there was
no right censoring in our sample, we could compute the survivor function
at the end of day 2 as one minus the proportion of people who experienced
the event in day 1 or in day 2. We cannot use this technique because 10
people are censored in the first day. Because they are lost to observation,
they cannot be observed experiencing the event in day 2 (or any subsequent
days). They must be removed from the pool of people at risk. In the presence
of right censoring, we cannot compute the survivor function directly except
for the first interval (between 0 and the first observed survival time t1 ). For
the remaining intervals, the survivor function is computed as the product
between the survivor function in the previous interval and the proportion of
units surviving in the current interval. The survivor function in day 2 is the
product of the survivor function in day 1 and the proportion of people who did
not develop an infection in day 2, calculated relative to the number of people
who were at risk at the beginning of day 2: Ŝ(t2) = 0.92 ∗ (1 − 21/220) = 0.84.
The survivor function can only be computed for those times at which
events are observed. As a result, the survivor function is a step function, as
shown in Figure 4. It changes sharply at the observed survival times and
remains constant between those times.
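The Kaplan-Meier calculation in Table 1 can be reproduced in a few lines of code. The sketch below, in Python with numpy (an assumption; the chapter's cited implementations are in Stata and R), takes the at-risk and failure counts from Table 1. Values differ slightly from the table, which compounds rounded figures.

```python
import numpy as np

# at-risk and failure counts for days 1-5, taken from Table 1
n_at_risk = np.array([250, 220, 185, 168, 146])
d_failed = np.array([20, 21, 12, 8, 5])

h = d_failed / n_at_risk          # interval hazard rates, d(ti) / n(ti)
S = np.cumprod(1 - h)             # Kaplan-Meier survivor function, eq. (8)

for day, (h_i, s_i) in enumerate(zip(h, S), start=1):
    print(f"Day {day}: hazard = {h_i:.2f}, survivor = {s_i:.2f}")

# first quartile of survival time: earliest observed time with S(t) <= 0.75 (day 4 here)
tq1 = int(np.argmax(S <= 0.75)) + 1
print("First quartile of survival time: day", tq1)
```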
What happens if we have left-truncated spells in our sample? Suppose
that 10 patients arrive at our unit 3 days after they have been operated on
elsewhere. None of them were showing signs of infection on arrival and we
observed them during day 4 and day 5. Two persons developed an infection
in day 4 and the remainder are right censored at the end of day 5. Table 2
shows how our Kaplan Meier estimator can be modified to take account of
these new spells.
m(ti) now represents the number of patients whom we start observing at
ti. m(t1) is 250 and m(t4) is 10. We cannot use the 10 patients who entered
the study at the start of day 4 to compute the survivor function for durations
shorter than 4. This is because those patients who have been operated on
at the same time but developed an infection in the first three days are not

observable to us. The estimation of S(ti), i = 1, 2, 3 remains unchanged.

Figure 4: The Kaplan Meier estimator of the survivor function (a step function
of S(t) plotted against time until infection).

Table 2: The estimation of the empirical survival function with left-truncation

Time    At risk   Failed   Censored   Entered   Interval hazard   Survivor
        n(ti)     d(ti)    c(ti)      m(ti)     rate h(ti)        function S(ti)
Day 1   250       20       10         250       0.08 [20/250]     0.92 [1-0.08]
Day 2   220       21       14         0         0.09 [21/220]     0.84 [(1-0.09)*0.92]
Day 3   185       12       9          0         0.06 [12/185]     0.79 [(1-0.06)*0.84]
Day 4   178       10       14         10        0.06 [10/178]     0.74 [(1-0.06)*0.79]
Day 5   154       5        149        0         0.03 [5/154]      0.72 [(1-0.03)*0.74]


However, at the beginning of day 4, our sample at risk changes. To the
original 168 cases we add the 10 left truncated cases. The number of failed
cases in day 4 d(t4 ) becomes 10 (the original 8 plus 2 left-truncated patients).
h(t4) = 0.06 and S(t4) = 0.74. The sample at risk at the beginning of day 5,
n(t5), increases to 154 cases and S(t5) = 0.72. Notice how the presence of
the left truncated spells affected the calculation of the survivor functions at
t4 and t5 but not at earlier times.
The survivor curves of two groups can be compared using non-parametric
methods such as the log-rank and the Wilcoxon tests to assess whether they are
statistically different. See Cleves et al. [2008] for a description and STATA
implementation.
We can use the estimated survivor function to derive various quantiles of
survival time. For example, median survival time tm is the time at which
the survivor function is 0.5: S(tm )=0.5. The first quartile is the time tq1
at which the survivor function equals 0.75: S(tq1 ) = 0.75. More generally,
the p-th percentile of survival time tp is the smallest observable time beyond
which the proportion of units expected to survive falls below 1 − p/100:
S(tp ) ≤ (1 − p/100). In Figure 4, we can determine the first quartile of
survival time by drawing a horizontal line through 0.75. At the point at
which this line intersects the survivor curve we draw a vertical line. The
point at which this line intersects with the x axis is tq1 . In this case, tq1 = 4.
We can only determine the value of a percentile p if at least p% of the sample
experience the event. In Figure 4, we cannot determine median survival
time because less than 50% of the sample failed before the study concluded.
Median survival time is in this case unknown or missing.
Having estimated the survivor function, we can derive the cumulative
integrated hazard function using equation 7. Alternatively, the cumulative
hazard function can be computed directly using the Nelson-Aalen estimator.
Ĥ(ti) = Σ_{j≤i} ĥ(tj)                                                      (9)

The Nelson-Aalen estimate of the cumulative integrated hazard at time
ti is the sum of the interval hazard rates between t0 = 0 and ti. Based on the
data in Table 1, we have H(t1 ) = 0.08, H(t2 ) = 0.17, H(t3 ) = 0.23 and so on.
The estimated cumulative hazard is a step function as well (see Figure 5).
It increases at the observed survival times and is constant in between these
times. In large samples, the Kaplan-Meier and Nelson-Aalen estimators are
asymptotically equivalent. In small samples, the survivor function is best
estimated using the Kaplan-Meier estimator and the cumulative hazard using
the Nelson-Aalen estimator. See Cleves et al. [2008] for how to implement the
Kaplan Meier estimator in STATA, and Mills [2011] for how to implement it
in R.
From the cumulative hazard, we could in theory derive the hazard rate
which is the slope (or the derivative with respect to t) of the cumulative
hazard. But because the cumulative hazard is a step function, the slope is
not well defined. To reconstruct the hazard rate, we need to smooth H(t)
(for example by connecting the points directly rather than through steps).
To do so requires assumptions about how the hazard rate changes between
observed survival times.
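The Nelson-Aalen estimate for the same data is a one-line cumulative sum. The short Python/numpy sketch below (again an assumption about the software) also recovers a survivor function via S(t) = exp(−H(t)), which follows from equation (7); small differences from the rounded figures quoted in the text arise from rounding.

```python
import numpy as np

# interval hazard rates implied by Table 1
h = np.array([20 / 250, 21 / 220, 12 / 185, 8 / 168, 5 / 146])

H = np.cumsum(h)        # Nelson-Aalen cumulative hazard, eq. (9)
S_from_H = np.exp(-H)   # survivor function implied by eq. (7)

print(np.round(H, 2))
print(np.round(S_from_H, 2))
```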

Figure 5: The Nelson-Aalen estimator of the cumulative hazard function (a step
function of H(t) plotted against time until infection).

Life table estimators are very similar to Kaplan-Meier but have been
designed specifically for the case where the underlying survival times are
continuous but are only observed at discrete intervals. See Jenkins [2005],
Singer and Willet [2003] and Blossfeld et al. [1989] for more details.

6 Discrete time methods


6.1 The hazard and the survivor functions in discrete time
In discrete time models, duration is recorded in intervals. These need not
be of the same length. Figure 6 provides an illustration. At the end of each
interval ti , we observe the state of each unit and determine how many failed
during the interval.
Figure 6: Discrete time (a survival time axis divided into intervals at t0, t1,
t2, t3, t4, t5).

Indexing intervals by the time at the end ti and assuming each interval
is 1 unit long, the probability density function is the probability of an event
being observed in interval i.

f (ti ) = f (i) = P r(T = i) (10)


The failure function at time ti is the cumulative probability of experienc-
ing the event before the end of interval i.

F(ti) = F(i) = Pr(T ≤ i) = Σ_{j=1}^{i} Pr(T = j) = Σ_{j=1}^{i} f(j)        (11)

In a discrete time setting, the integral becomes a sum. The survivor
function becomes

S(ti) = S(i) = 1 − F(i) = 1 − Σ_{j=1}^{i} f(j)                             (12)

In discrete time, both the survivor and the failure functions will be step
functions (similar to the Kaplan Meier estimator).
The hazard function in discrete time is defined as the probability of ex-
periencing the event in the i-th interval conditional on surviving up to the
end of interval i − 1. The hazard function in this case is a true probability,
meaning it ranges between 0 and 1.

h(ti ) = h(i) = P r(T = i|T > i − 1) (13)


Using equations (5) and (12) we have

h(i) = f(i)/S(i − 1) = (S(i − 1) − S(i))/S(i − 1) = 1 − S(i)/S(i − 1)      (14)
Equation (14) sums up the relationship between the hazard function and
the survivor function. We can re-express it as follows

S(i) = S(i−1)(1−h(i)) = S(i−2)(1−h(i−1))(1−h(i)) = ∏_{j=1}^{i} (1−h(j))    (15)

The probability of surviving up to the end of interval i is the probability of
surviving up to the end of interval i − 1, multiplied by the probability of
surviving interval i conditional on having survived up to the end of interval i − 1.

6.2 Data format


Discrete time models can be easily estimated using standard methods for
binary dependent variables if the data is set up in the long format shown in
Table 3. Each unit has as many rows in the data as the number of intervals
she is observed to be at risk. In a panel study of divorce where time is
measured in years, a person that has been married for 5 years before divorcing
will have 5 rows associated with her. If a spell is right-censored, it will
contribute as many rows as the number of intervals observed to be at risk
before censoring. A person observed for 3 years after marriage who is not
divorced at the end of the observation period contributes 3 rows. Note that
different censoring times for different units are easily accommodated.
Three variables capture information about survival times. First, a unit
identifier links the rows belonging to the same unit. In our divorce example,
Person ID in Table 3 links the rows belonging to one person. Second, a
duration variable (Duration in Table 3) indexes the time interval to which
the row refers. If the spell is observed from the start (i.e. it is not left-
truncated), this variable will range from 1 to either the interval when the
event occurs or the last interval in which the unit is observed if the spell is
right-censored. A third variable (Event in Table 3) records whether an event
is observed to occur during the interval or not. If the spell is right-censored,
this variable will be 0 for all the rows comprising the spell. For a completed
spell, the variable will be 0 in all rows except the last one, corresponding to
the interval in which the event is observed, where it takes the value of 1.
Because each unit-interval has a corresponding row, time-varying covari-
ates are easily incorporated as shown in the Covariates column in Table 3.
The covariate x can take a different value in each interval the unit is observed
to be at risk. For example, x11 might be the number of children person 1
has in the first year after marriage, x12 her number of children in the second
year and so on.

Table 3: Data format for discrete time analysis

Person ID Duration Event Covariates


1 1 0 x11
1 2 0 x12
1 3 0 x13
1 4 0 x14
1 5 1 x15
2 1 0 x21
2 2 0 x22
2 3 0 x23

For a detailed review of how to set up the data in the correct format using
Stata see Longhi and Nandi [2015]. For a similar review using R see Mills
[2011].
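As an illustration of this set-up, the short Python sketch below (pandas assumed; the data and variable names are hypothetical) expands one completed and one censored spell into the person-period rows of Table 3, without covariates.

```python
import pandas as pd

# one row per spell: person 1 divorces in year 5, person 2 is censored after year 3
spells = pd.DataFrame({"person_id": [1, 2], "duration": [5, 3], "event": [1, 0]})

rows = []
for _, s in spells.iterrows():
    for i in range(1, int(s["duration"]) + 1):
        rows.append({
            "person_id": s["person_id"],
            "duration": i,
            # the event indicator is 1 only in the last interval of a completed spell
            "event": int(s["event"] == 1 and i == s["duration"]),
        })

person_period = pd.DataFrame(rows)
print(person_period)
```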

6.3 Models for discrete time


Models for discrete time express the hazard rate as a function of covariates
and survival time. Because the hazard rate in discrete time is a probability
ranging between 0 and 1, techniques developed for binary dependent variables
can be used. Logit and probit are the most common. Here, we review the
estimation of a logit model. Estimating a probit model follows the same logic
(see Box-Steffensmeier and Jones [2004]).

6.3.1 The logistic model


Modelling the hazard rate directly as a function of covariates is inappropriate
because the hazard is restricted to be between 0 and 1, whereas the combi-
nation of the covariates and coefficients (the right-hand side) can take any
value. We can solve this problem by transforming the dependent variable.
Instead of modelling the hazard, we model the log of the ratio between the
hazard and one minus the hazard, as in equation 16. This expression is called
the logit of the hazard and ranges between minus and plus infinity. Figure 7
shows how the logit varies with the hazard. The logit function is symmetrical
around zero and changes faster when the hazard is close to 0 or close to 1.
The expression inside the log is referred to as the odds of experiencing the
event.

log( h(i) / (1 − h(i)) ) = α + βX1 + γX2(i)                                (16)

Figure 7: Logit and cloglog transformations of the hazard (the logistic and
cloglog functions plotted against the hazard rate).

In equation 16, the hazard rate only depends on a time constant covariate
X1 and a time-varying covariate X2 (i). This implies that the hazard does
not vary with survival time. Because survival time does not appear on the
right-hand side, its coefficient is implicitly zero. Usually, this assumption is
not realistic. We can improve the model by adding survival time on the right
hand side, as in equation 17. This model is still relatively restrictive because
it assumes that the logit of the hazard changes linearly with time. Flexibility
could be increased by adding higher-order terms such as i^2 or i^3.
log( h(i) / (1 − h(i)) ) = α + βX1 + γX2(i) + δi                           (17)
We can make the hazard rate fully flexible with respect to time by re-
placing δi with a full set of dummy variables corresponding to the observed
durations in the data, as in equation 18.
log( h(i) / (1 − h(i)) ) = α1 D1 + α2 D2 + ... + αn Dn + βX1 + γX2(i)      (18)
D1 , D2 ...Dn represent dummy variables for each interval. For example, D1
will be 1 in rows corresponding to i = 1 and zero otherwise. The advantage

of this specification is that the hazard rate can change in an unrestricted way
from one interval to the next. The disadvantage is that as the number of
intervals increases, we are left with a large number of coefficients to interpret.
Moreover, if no event is observed in interval i then αi cannot be estimated.
If this situation occurs, intervals with no observed events need to be merged
to adjacent ones. A different approach that is particularly suitable when the
number of intervals is relatively large is to use a local smoothing function.
See Box-Steffensmeier and Jones [2004] for an application.
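A discrete-time logistic hazard model of the form in equation 18 can be estimated with any routine for binary outcomes once the data are in person-period format. The sketch below uses Python's statsmodels (an assumption; the chapter's references cover Stata and R instead) on simulated data with one time-constant and one time-varying covariate; all parameter values are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for pid in range(500):
    x1 = rng.normal()                              # time-constant covariate
    for i in range(1, 11):                         # at most 10 intervals per spell
        x2 = rng.normal()                          # time-varying covariate
        logit = -2.5 + 0.1 * i + 0.6 * x1 + 0.3 * x2
        event = rng.random() < 1.0 / (1.0 + np.exp(-logit))
        rows.append(dict(pid=pid, interval=i, x1=x1, x2=x2, event=int(event)))
        if event:
            break                                  # the spell ends at the first event
pp = pd.DataFrame(rows)

# eq. (18): interval dummies give fully flexible duration dependence (no common intercept)
fit = smf.logit("event ~ C(interval) - 1 + x1 + x2", data=pp).fit(disp=0)
print(np.exp(fit.params[["x1", "x2"]]))            # odds ratios for the covariates
```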
The coefficients of the logistic model represent the change in the logit
(log(h(i)/(1 − h(i)))) corresponding to a 1 unit change in X. Exponentiating both
sides of equation 17, we obtain a model in which the odds of experiencing
the event depend multiplicatively on the covariates.

exp(log( h(i) / (1 − h(i)) )) = exp(α + βX1 + γX2(i) + δi)
                                                                           (19)
h(i) / (1 − h(i)) = exp(α) ∗ exp(βX1) ∗ exp(γX2(i)) ∗ exp(δi)
Notice that when the dependent variable is the log of the odds, the terms
on the right hand side are added. When the dependent variable is the odds,
the terms are multiplied. A one unit change in X1 increases the odds of
experiencing the event exp(β) times. A different way to express the same idea
is to say that the odds of experiencing the event change by (exp(β) − 1) ∗ 100 percent.
Because the odds vary multiplicatively with the covariates, the logistic model
is a proportional odds model.
Obtaining the effect on the hazard itself, called the marginal effect, is
more complicated and requires calculating predicted hazard rates at various
values of X and the other covariates. The marginal effect is not constant but
depends on the value of X and the other covariates. Most statistical software
packages include routines for the estimation of marginal effects.

6.3.2 The complementary log-log model


A transformation with somewhat different properties is the complementary
log-log or cloglog. Figure 7 shows how the cloglog transformation varies with
the hazard. When the hazard is small, the logistic and cloglog transforma-
tions are very similar. For large values of the hazard the logistic transforma-
tion increases faster than the cloglog.

log(−log(1 − h(i))) = α1 D1 + α2 D2 + ... + αn Dn + βX1 + γX2(i)           (20)

Duration dependence can be modelled using the same techniques as before:
either by including a parametric representation of time, such as δ1 i (linear),
δ1 i + δ2 i^2 (quadratic), δ1 i + δ2 i^2 + δ3 i^3 (cubic), or δ log(i) (logarithmic),
or by including time dummies, as in equation 20.
The cloglog model is an attractive specification when the underlying pro-
cess occurs in continuous time but the data only record information at dis-
crete intervals. In this case, exact survival times are not known, only the
interval they fall in. For example, divorce events can occur on any day but
information is only available about the year, not the exact date. Such data
are called interval censored. The advantage of the cloglog model is that the
coefficients of the X variables (β and γ in equation 20) are the same as if
the model had been estimated using continuous time data. These coefficients
express the effect of a 1 unit change in X on the log of the continuous time
hazard rate, log(θ(t)). Exponentiating the coefficients, we obtain the propor-
tional effect on the continuous hazard rate itself. In equation 20, a one unit
change in X1 multiplies θ by exp(β), i.e. changes it by (exp(β) − 1) ∗ 100
percent. Because the coefficients capture the effect on the continuous
time hazard, they are unaffected by the interval length. Were we to measure
time until divorce in months rather than years, we would obtain the exact
same coefficients. The set of α coefficients however would change. Because
a change in X leads to a proportional change in θ(t), the cloglog model is a
proportional hazard model. We will discuss the proportional hazard property
in more depth in the next section. The set of α coefficients capture the dura-
tion dependence of the discrete time hazard. The duration dependence of the
continuous time hazard cannot be estimated without further assumptions.
Jenkins [1995] provides a practical list of steps for estimating discrete
time models.
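Estimation of the cloglog model only requires swapping the link function. A hedged sketch using statsmodels' GLM follows; the CLogLog link name is an assumption about the installed version (older releases expose it as links.cloglog), and the simulated person-period data are purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
pp = pd.DataFrame({
    "interval": np.tile(np.arange(1, 6), 200),     # 5 intervals, 200 rows each
    "x": rng.normal(size=1000),
})
# simulate events from a true cloglog hazard: h = 1 - exp(-exp(-2 + 0.4 x))
pp["event"] = (rng.random(1000) < 1 - np.exp(-np.exp(-2 + 0.4 * pp["x"]))).astype(int)

link = sm.families.links.CLogLog()                 # older statsmodels: links.cloglog()
fit = smf.glm("event ~ C(interval) - 1 + x",
              data=pp, family=sm.families.Binomial(link=link)).fit()
print(fit.params["x"], np.exp(fit.params["x"]))    # effect on log hazard; hazard ratio
```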

7 Continuous time methods


7.1 Data format
When time is continuous, models are estimated on data in wide format as
shown in Table 4. Each spell/unit is represented by one row. Two variables
are needed. The first (Duration in Table 4) records the length of time the
unit has been observed to be at risk. If in our study of divorce we measured
time in days rather than years, Duration would record the number of days
a person has been married before divorcing or being right-censored. The
second variable (Event in Table 4) records whether the spell is completed
(Event=1) or censored (Event=0).

Table 4: Data format for continuous time analysis with time-constant covari-
ates

Person ID Duration Event Covariates


1 1850 1 x1
2 1020 0 x2

Covariates are accommodated in this setting if they do not vary within


a spell. To include time-varying covariates, the data needs to be adjusted.
Suppose we again want to use the number of children to predict the prob-
ability of divorce. We will need to split each marriage spell when a change
in the number of children is observed, as shown in Table 5. The first person
is now represented by three rows rather than one. She spent 720 days being
married without having children, 790 days being married with one child and
finally, she divorced 240 days after having her second child. Notice how the
data set-up resembles the long format required for discrete time analysis.
But instead of having one row per interval, we have as many rows as there
are changes in the X covariate plus one. If a person does not experience a
change in X (for example she does not have any children), she continues to
contribute only one row in the data. If more than one time-varying covariate
is to be included, the spell needs to be split every time there is a change
in any of the covariates. A row thus corresponds to a period of time in the
spell during which the combination of all time varying covariates does not
change.
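The episode-splitting step can be scripted. The sketch below (plain Python/pandas; the change dates are hypothetical values chosen so that the output reproduces the three episodes recorded for person 1 in Table 5) splits one marriage spell at each change in the number of children.

```python
import pandas as pd

spell = {"person_id": 1, "duration": 1750, "event": 1}   # total spell length in days
child_changes = [720, 1510]   # days at which the first and second child arrive (hypothetical)

cuts = [0] + child_changes + [spell["duration"]]
episodes = []
for k in range(len(cuts) - 1):
    episodes.append({
        "person_id": spell["person_id"],
        "duration": cuts[k + 1] - cuts[k],                # length of this episode
        # the event is recorded only in the last episode of a completed spell
        "event": int(spell["event"] == 1 and k == len(cuts) - 2),
        "children": k,                                    # time-varying covariate
    })
print(pd.DataFrame(episodes))     # 720 / 790 / 240 days, event only in the final row
```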

7.2 Parametric specifications


Parametric models specify a distribution from which survival times are drawn.
Different models correspond to different distributions. Choosing a distribu-
tion for survival times implies that the model is fully specified except for the
parameters that need to be estimated (Allison [2014]). There are two classes
of parametric models: proportional hazard (PH) and accelerated failure time
(AFT) models.

Table 5: Data format for continuous time analysis with time-varying covariates

Person ID   Duration   Event   Covariates
1           720        0       x11 = 0
1           790        0       x12 = 1
1           240        1       x13 = 2
2           442        0       x21 = 0
2           578        0       x22 = 1

7.2.1 Proportional hazard models


All proportional hazard models take the form

θ(t, X) = θ0 (t) ∗ λ(X) (21)


The hazard rate can be decomposed into two parts: a part (θ0(t)) that
depends only on duration and not on the Xs, and a part (λ(X)) that
depends on the Xs but not on duration. This form amounts to assuming
there is no interaction between duration and the other covariates. We can
clearly see this implication if we take the log of equation (21)

log(θ(t, X)) = log(θ0 (t)) + log(λ(X)) (22)


The function θ0 (t) is called the baseline hazard and captures the duration
dependence (i.e. the way the hazard rate changes over time) present in the
data. Because θ0 does not depend on X, duration dependence is assumed
to be the same for all units in the sample. The function λ(X) shifts the
baseline hazard for groups with different values of X. For example, suppose
we want to predict the hazard of experiencing a divorce as a function of the
length of marriage and gender. In a PH model, the length of marriage has
the same effect on the hazard rate of both men and women. A PH model
could not accommodate the hazard falling for women and increasing for men.
At any time point, the ratio between the hazard rate of men and the hazard

rate of women will be the same.

Figure 8: The PH property (left panel: hazard rates for men and women plotted
against survival time; right panel: log hazard rates plotted against the log of
survival time).

This ratio is called the hazard ratio and is
constant for all survival times. Figure 8 shows an example when the hazard
is increasing over time. The graph on the left shows the hazard vs. survival
time. The hazard of divorce for women is three times higher than that of men
at all survival times. The hazard ratio in this case is 3 and does not change
with t. The graph on the right shows the log(hazard) vs. the log(survival
time). The difference between the logs of the hazard rates for women and men
is now constant (and equal to log(3)). The two lines are parallel. Notice how the
relationship is multiplicative when expressed in terms of the hazard rate
(equation 21) and additive when expressed in terms of the log(hazard rate)
(equation 22).
The function used to define λ(X) is most often the exponential function.
This ensures that the right hand side is always positive (as is the hazard
rate). Assuming there is only one covariate, the model becomes

θ(t, X) = θ0 (t) exp(βX) (23)


In this case, the coefficient β can be interpreted as the proportionate
change in the hazard rate associated with a 1 unit change in X. exp(β) is
the hazard ratio associated with a 1 unit increase in X. See Jenkins [2005] for
the mathematical implications of the PH property for the survival, density
and cumulative hazard functions.
The choice of θ0 defines the model. If we assume θ0 is equal to a
constant c, we obtain the exponential model in which the hazard rate is
constant. The hazard rate will vary with changes in X but it does not
depend on survival time (notice how t does not appear on the right hand
side; its implied coefficient is zero).

θ(t, X) = c ∗ exp(βX)                                                      (24)

log(θ(t, X)) = log(c) + βX = α + βX                                        (25)

The exponential model is usually unrealistic. A better representation is
the Weibull model, which defines the baseline hazard as θ0(t) = p ∗ t^(p−1). The
model becomes

θ(t, X) = p ∗ t^(p−1) ∗ exp(βX)                                            (26)
The parameter p determines the shape of the hazard. If p < 1, the hazard
rate will be falling over time. If p > 1, the hazard is increasing over time and
if p = 1 we revert back to the exponential model. Taking the log of equation
26, we obtain

log(θ(t, X)) = log(p) + (p − 1) ∗ log(t) + βX (27)


The log of the hazard rate varies in a linear way with the log of survival
time.
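A quick way to see how p governs the shape of the Weibull hazard is to evaluate equation 26 for a few values of p. The Python sketch below is illustrative only; the values of β and X are arbitrary.

```python
import numpy as np

def weibull_hazard(t, p, beta, x):
    """Weibull PH hazard, eq. (26): theta(t, X) = p * t**(p - 1) * exp(beta * X)."""
    return p * t ** (p - 1) * np.exp(beta * x)

t = np.linspace(0.1, 5.0, 5)
for p in (0.5, 1.0, 1.5):     # falling, constant (exponential) and rising hazard
    print(p, np.round(weibull_hazard(t, p, beta=0.3, x=1.0), 2))
```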
A choice popular among demographers is to set θ0 (t) = exp(γt). This
gives rise to the Gompertz model. This specification assumes that the log of
the hazard rate varies in a linear way with survival time.

θ(t, X) = exp(γt) ∗ exp(βX) (28)

log(θ(t, X)) = γt + βX (29)

7.2.2 Accelerated failure time models


Accelerated failure time (AFT) models describe survival time directly instead
of the hazard rate. They are of the form

log(ti ) = βXi + σui (30)

where u is a disturbance term independent of X. Models in the AFT
class differ depending on the distribution used to model u. For example, if u
is assumed to have a normal distribution, we obtain the log-normal model. If
u is assumed to have a logistic distribution, we obtain the log-logistic model
and if u is assumed to have an extreme value distribution, we obtain the
Weibull model. Unlike the Weibull model, the log-logistic and log-normal
specifications allow the hazard to change direction, for example to increase
and then decrease. For a review of their properties and parametrizations see
Box-Steffensmeier and Jones [2004], Cleves et al. [2008], and Jenkins [2005].
The Weibull model is the only class that satisfies both the PH and AFT
properties.
Only if all spells in the data are completed (i.e. there is no right-
censoring) can AFT models be estimated using standard OLS regression
(see [Allison, 2014]). In the presence of censoring, the estimation can be
done using maximum likelihood.
Equation (30) can be re-written (once we exponentiate) as

ti = exp(βXi ) ∗ exp(σui ) or
exp(σui ) = exp(−βXi ) ∗ ti (31)
τi = exp(−βXi ) ∗ ti

The quantity Ψ = exp(−βXi) is called the acceleration parameter. If
Ψ = 1, time passes at the normal rate. If Ψ > 1, time is accelerated and
the failure is likely to occur sooner. If Ψ < 1, time is decelerated and the
failure is expected to occur later. All AFT models have the property

S(t) = S0 (Ψt) (32)

The effect of the covariates is to change the scale of the survivor function
by a constant factor Ψ = exp(−βXi ). One common analogy used to explain
the AFT property is the comparison of human-years and dog-years (see Mills
[2011]). Each dog-year is the equivalent of seven human years. In this case,
time passes seven times faster for dogs than for humans (Ψ = 7). Figure 9
illustrates the AFT property graphically. The proportion of dogs expected
to survive beyond 10 years is around 20%, the same as the proportion of
humans expected to survive beyond 70 years.
Figure 9: The AFT property (survivor functions for humans and dogs plotted
against survival time).

The estimated coefficients in an AFT model can be interpreted as the
proportionate change in survival time associated with a 1 unit increase in
X. The exponentiated coefficient exp(β) is also called the time ratio. The
effect of a 1 unit change in X is to multiply the expected survival time by
exp(β). Notice how the AFT and PH parametrizations imply coefficients
with different signs. A higher hazard rate is associated with shorter survival
times and vice-versa.
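The scaling in equation 32 is easy to verify numerically. In the Python sketch below the baseline (human) survivor function is a hypothetical one, chosen so that roughly 20% survive beyond 70 years to match the figure quoted above; with Ψ = 7 the dog survivor function at 10 years then equals the human one at 70.

```python
import numpy as np

psi = 7.0                                     # acceleration parameter for dogs

def S0(u):
    """Hypothetical baseline (human) survivor function."""
    return np.exp(-(u / 60.0) ** 3)

def S_dog(t):
    """AFT property, eq. (32): S(t) = S0(psi * t)."""
    return S0(psi * t)

print(round(S_dog(10.0), 2), round(S0(70.0), 2))   # both about 0.20
```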
Choosing a parametric specification relies on theory and initial inspec-
tion based on non-parametric methods. Unfortunately, often there is little
theory to guide the choice [Box-Steffensmeier and Jones, 2004, Blossfeld and
Rohwehr, 2002]. Choosing the wrong parametrization can bias both the
estimated duration dependence and the estimated effects of the other covari-
ates. Box-Steffensmeier and Jones [2004] and Cleves et al. [2008] provide a
description of methods to choose a parametrization based on nested mod-
els. Blossfeld and Rohwehr [2002] describe methods to check distributional
assumptions.

7.3 Semi-parametric specifications: The Cox model


In the previous section we discussed PH models of the form θ(t, X) = θ0 (t) ∗
exp(βX). One possibility would be to leave the function θ0 (t) unspecified and
concentrate on estimating β, the regression coefficient of X. This is the Cox

model [Cox, 1972], probably the most popular model for analysing continuous
time data in the social sciences. The attractiveness of the Cox model is that
it does not impose any functional form on duration dependence. θ0 (t) is
unspecified and thus can take any form. The downside is that the Cox model
has nothing to say about duration dependence (although an approximation
of the baseline hazard can be obtained). As a result, the model is most
useful when the primary interest lies in the relationship between the hazard
rate and the covariate X and duration dependence is viewed as a nuisance.
The Cox model is also called a semiparametric model because it makes no
assumption about how the hazard rate varies with survival time but does
specify the relationship between the hazard and X (the log of the hazard
varies linearly with X).
We can estimate this model using a method called partial likelihood (see
Jenkins [2005], Allison [2014] and Cleves et al. [2008]). This method allows
us to estimate the parameter β without estimating the parameters of dura-
tion dependence, θ0(t). It requires the PH assumption to hold and no (or
few) identical survival times (called 'ties') in the data. The latter condition
is needed because partial likelihood uses information about the ordering of
events as opposed to survival times themselves. When events occur at the
same time, a clear ordering of events cannot be generated. Approximation
methods have been developed to deal with 'ties' (see Cleves et al. [2008], Box-
Steffensmeier and Jones [2004]), but these approximations become increasingly
poor as the number of 'ties' increases. In this case, it is advisable to use
discrete time methods.
An expanded representation of the Cox model is given in equation 33
where X1 −Xk are the covariates to be included in the model. The covariates
can be time-constant or time-varying.

θi (t, X) = θ0 (t) exp(β1 X1i + β2 X2i + ... + βk Xki ) (33)


There is no intercept (β0) in the expression because the baseline hazard is
left unspecified. The intercept is effectively absorbed in the baseline hazard.
We can re-parametrize the model as
 
log( θi(t) / θ0(t) ) = β1 X1i + β2 X2i + ... + βk Xki                      (34)
The β coefficients express the effect of a 1 unit change in X on the log
of the hazard rate relative to the baseline hazard. The baseline hazard is
the hazard rate corresponding to an individual with all covariates equal to
zero X1 = X2 = ... = Xk = 0. In a model of the hazard rate of divorce in
which X1 = 1 for females and X2 (t) represents earnings, the baseline hazard
corresponds to a male with zero earnings. Most statistical software packages
report the quantity exp(β) called the hazard ratio rather than β. The hazard
ratio is always positive (as it is the result of an exponential transformation).
A value larger than 1 implies that individuals with higher values on X have
an increased hazard and thus experience the event sooner. A value equal
to 1 implies that there is no relationship between X and the hazard rate
and a value between (0, 1) implies that individuals with higher values on X
have lower values on the hazard and thus experience the event later. An
alternative interpretation is that (exp(β) − 1) ∗ 100 represents the percentage
change in the hazard rate associated with a 1 unit change in X.
The baseline hazard is left unspecified and is not estimated. How-
ever, approximations of the survivor function and the cumulative hazard
function can be recovered based on β1 X1i + β2 X2i + ... + βk Xki , also called
the risk score. The exact estimation procedure is not straightforward but it
closely resembles the Kaplan-Meier estimator. In fact, a Cox model with-
out any covariates produces the same survivor function as the Kaplan-Meier
estimator and the same cumulative hazard function as a Nelson-Aalen es-
timator. When covariates are included, the survivor function / cumulative
hazard produced by the Cox model can be thought of as covariate-adjusted
Kaplan-Meier/ Nelson-Aalen estimates [Cleves et al., 2008]. Both functions
will be step functions. The baseline hazard can subsequently be recovered
by smoothing the cumulative hazard function.
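In Python, a Cox model can be fitted with the third-party lifelines package (an assumption; the chapter's cited implementations are in Stata and R). A minimal sketch on simulated divorce-style data, with illustrative effect sizes:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 500
female = rng.integers(0, 2, n)
earnings = rng.normal(25.0, 5.0, n)
# simulate durations whose hazard rises with `female` and falls with earnings (illustrative)
rate = 0.001 * np.exp(0.5 * female - 0.02 * earnings)
t = rng.exponential(1.0 / rate)
c = rng.exponential(1500.0, n)                     # independent right-censoring times

df = pd.DataFrame({
    "duration": np.minimum(t, c),
    "event": (t <= c).astype(int),
    "female": female,
    "earnings": earnings,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()                                # the exp(coef) column reports hazard ratios
```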
We can test the PH assumption in two ways: graphically and numeri-
cally. There are many graphical methods available but the most widely used
is the log minus log survivor plot. It relies on testing the relationship between
the cumulative hazards of two (or more) groups. Recall that equation 33
describes how the hazard can be decomposed into the baseline hazard and
the shifter exp(βX). We can integrate this equation to obtain

H(t, X) = H0 (t) exp(βX) (35)

The cumulative hazard function can be decomposed into a baseline cu-


mulative hazard and the term exp(βX). Taking the log of equation 35, we
obtain

log H(t, X) = log(H0 (t)) + βX (36)

We can replace H(t) by −log(S(t)) (see equation 7).

log[−log(S(t, X))] = log[−log(S0(t))] + βX                                 (37)

Figure 10: The log minus log plot

The log minus log survivor plot displays an estimate of −log[−log(S(t))]


versus log(t) for two groups i and j with different values on the X covariate.
If the PH assumption holds, the lines should be approximately parallel (and
the distance between them will be equal to β(Xi − Xj )). Figure 10 shows
an example drawn from the British Household Panel Survey where the two
groups are men and women and the event of interest is beginning to smoke.
Since PH essentially amounts to assuming no interaction between dura-
tion and X, it can be tested by explicitly incorporating an interaction term
and seeing if it is statistically significant. To build the interaction term we

will need to specify θ0(t). The exact specification is not important: any func-
tion of t can be used. In practice, the most common functional forms used
are the linear (θ0(t) = t) and the logarithmic (θ0(t) = log(t)).
The Cox model assumes proportionality of hazards for all the covariates
included. The previous two methods can be used to check one covariate
at a time. If there are many covariates in the model, a more convenient
method is a test based on Schoenfeld residuals. The test works in two
steps. First, a Cox model is fit to the data and a set of residuals ηij is
calculated for each covariate Xi and each individual j. In the second step,
a regression of these residuals on survival time is fitted. The coefficients
are then tested for statistical significance. A significant coefficient indicates
that the PH assumption is violated. The Schoenfeld residuals can be tested
simultaneously for all the covariates included in the model. See [Cleves et al.,
2008] for a detailed explanation of how the residuals are constructed and how
the test can be carried out in Stata.
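Comparable tests are available outside Stata. The sketch below assumes the lifelines Python package and the same hypothetical column names as before; it runs a Schoenfeld-residual-based test per covariate.

```python
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import proportional_hazard_test

df = pd.read_csv("spells.csv")  # hypothetical spell file

cph = CoxPHFitter()
cph.fit(df[["duration", "event", "age", "education"]],
        duration_col="duration", event_col="event")

# Test of the PH assumption based on (scaled) Schoenfeld residuals,
# one test per covariate; a small p-value signals a violation.
results = proportional_hazard_test(cph, df, time_transform="rank")
results.print_summary()

# Convenience wrapper that runs similar checks and prints advice.
cph.check_assumptions(df, p_value_threshold=0.05)
```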

8 More advanced topics


8.1 Repeated events
So far, we have assumed there is one spell per unit in the data. Many
events are repeatable: marriage, giving birth, finding a job, buying a house,
developing an illness. A unit (in this case, an individual) can have more than
one spell in the data. If the process behind each event is thought to be
the same, the main consideration is to account for the non-independence of
spells coming from the same individual. See Amorim and Cai [2015] and
Twisk et al. [2005] for a review of methods specifically adapted to repeated
events.
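One minimal adjustment is to compute standard errors that are robust to the clustering of spells within individuals. The sketch below assumes the lifelines Python package and a hypothetical spell file with an id column identifying individuals; the reviews cited above cover more specialised recurrent-event models.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical long file: one row per spell, several spells per person,
# with a person identifier in the 'id' column.
spells = pd.read_csv("repeated_spells.csv")

cph = CoxPHFitter()
# cluster_col requests standard errors robust to the non-independence
# of spells belonging to the same individual.
cph.fit(spells[["id", "duration", "event", "age", "education"]],
        duration_col="duration", event_col="event", cluster_col="id")
cph.print_summary()
```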

8.2 Competing risks


Often, social processes can lead to more than one type of event. For example,
a marriage can end because of divorce or death of a spouse. A spell of
unemployment can end in finding a job or retirement. A government can
fall because the prime minister resigned or because parliament passed a no
confidence motion. In each case, the occurrence of one event removes the unit
from the risk set of the other event. A widowed person is no longer at risk of

divorce. A retired person is no longer at risk of finding a job. Competing risks
models are a class of event history methods designed to deal with multiple
types of events.
The methods discussed in this chapter can easily be extended to accom-
modate competing risks if the processes generating the different types of
events are independent. In continuous time, each event type can be mod-
elled separately treating the occurrence of the other event type as censoring.
See Jenkins [2005], Box-Steffensmeier and Jones [2004] and Mills [2011] for
detailed discussions. In discrete time, the logistic regression is replaced by
a multinomial logistic regression (see Jenkins [2005]). Andersen et al. [2012]
provide a good introduction to the specific problems posed by competing
risks.
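As a sketch of this cause-specific approach under the independence assumption (package and variable names are hypothetical), each event type is modelled in turn with the competing event recoded as censoring:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical unemployment spell file: 'exit' codes
# 0 = still unemployed (censored), 1 = found a job, 2 = retired.
spells = pd.read_csv("unemployment_spells.csv")

# Cause-specific hazard of finding a job: spells ending in retirement
# are treated as censored at the time they end.
spells["event_job"] = (spells["exit"] == 1).astype(int)

cph_job = CoxPHFitter()
cph_job.fit(spells[["duration", "event_job", "age", "education"]],
            duration_col="duration", event_col="event_job")
cph_job.print_summary()

# The hazard of retiring is modelled symmetrically, using exit == 2 as
# the event and treating all other outcomes as censoring.
```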

8.3 Unobserved heterogeneity or frailty


Sometimes important differences relevant to the risk of experiencing the event
are unobserved or unobservable. The hazard of finding a job may be influ-
enced by ability or motivation but we may lack suitable measures in the data.
The presence of unobserved characteristics (called unobserved heterogeneity
or frailty) can seriously bias both duration dependence and the coefficients
of the other covariates if ignored. Duration dependence and the response of
the hazard to a change in a covariate Xk will be biased downwards (see Jenkins
[2005] for a mathematical proof). The bias arises because the pool of units at
risk changes over time. Assume we are interested in estimating time until an
individual leaves a welfare program and that the hazard rate of leaving wel-
fare depends on mental health which is unobserved. Individuals with better
mental health have a higher hazard of leaving and shorter survival times. As
time passes, there are fewer and fewer individuals with good mental health
at risk of leaving the program. The share of those with poor mental health
in the risk pool (i.e. people who have not yet left the program) increases
over time. Since a higher fraction of program participants have poor mental
health, the observed hazard rate will be declining over time. This will not be
because spending more time receiving welfare makes one less likely to leave
the program (’welfare-dependence’) but rather because the composition of
the risk pool with respect to mental health changes.
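A small simulation may make this mechanism concrete. The sketch below (all numbers purely illustrative) mixes two groups with flat individual hazards and shows the aggregate hazard declining as the high-hazard group exits the risk pool:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Two latent groups with constant individual hazards: people with good
# mental health leave welfare quickly, those with poor mental health slowly.
good = rng.random(n) < 0.5
hazard = np.where(good, 0.30, 0.05)   # per-period exit probability, constant over time

# Exit times under a discrete-time constant hazard (geometric distribution).
exit_time = rng.geometric(hazard)

# Observed aggregate hazard at each duration: exits at t divided by those
# still at risk at t. It declines even though every individual hazard is flat.
for t in range(1, 11):
    at_risk = (exit_time >= t).sum()
    exits = (exit_time == t).sum()
    share_good = good[exit_time >= t].mean()
    print(f"t={t:2d}  observed hazard={exits / at_risk:.3f}  "
          f"share with good mental health still at risk={share_good:.2f}")
```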
Dealing with unobserved heterogeneity is mathematically and compu-
tationally challenging although statistical packages designed to address the
problem have become increasingly available. The most common method is to

assume unobserved heterogeneity has a particular distribution (usually the
gamma) and integrate it out [Lancaster, 1990, Abbring and Van den Berg,
2007]. Heckman and Singer [1984] have proposed a non-parametric model
where unobserved heterogeneity takes an arbitrary number of discrete val-
ues called mass points. Estimation remains non-trivial and in some cases
model convergence may not be achieved. All methods assume unobserved
heterogeneity to be time invariant.

References

Jaap H. Abbring and Gerard J. Van den Berg. The unobserved heterogeneity
distribution in duration analysis. Biometrika, 94(1):87–99, 2007.

Paul D. Allison. Event History and Survival Analysis. Sage Publications,
Thousand Oaks, 2014.

Leila Amorim and Jianwen Cai. Modelling recurrent events: A tutorial for
analysis in epidemiology. International Journal of Epidemiology, 44(1):
324–333, 2015.

P. K. Andersen, R. B. Geskus, T. de Witte, and H. Putter. Competing risks in
epidemiology: possibilities and pitfalls. International Journal of Epidemiology,
41(3):861–870, 2012.

Hans-Peter Blossfeld and Götz Rohwer. Techniques of Event History Modelling:
New Approaches to Causal Analysis. Lawrence Erlbaum Associates,
Mahwah, New Jersey, 2002.

Hans-Peter Blossfeld, Alfred Hamerle, and Karl Ulrich Mayer. Event History
Analysis: Statistical Theory and the Application in the Social Sciences.
Lawrence Erlbaum Associates, New Jersey, 1989.

Janet M. Box-Steffensmeier and Bradford S. Jones. Event History Modelling:
A Guide for Social Scientists. Cambridge University Press, Cambridge,
2004.

Mario A. Cleves, William W. Gould, Roberto G. Gutierrez, and Yulia V.
Marchenko. An Introduction to Survival Analysis Using Stata. Stata
Press, College Station, Texas, 2008.

David Cox. Regression models and life-tables. Journal of the Royal Statistical
Society, Series B, 34(2):187–220, 1972.

James Heckman and Burton Singer. A method for minimizing the impact of
distributional assumptions in econometric models of duration data.
Econometrica, 52(2):271–320, 1984.

Stephen P. Jenkins. Easy methods for discrete-time duration models. Oxford
Bulletin of Economics and Statistics, 57(1):129–136, 1995.

Stephen P. Jenkins. Survival analysis. 2005.

Tony Lancaster. The Econometric Analysis of Transition Data. Cambridge
University Press, Cambridge, 1990.

Simonetta Longhi and Alita Nandi. A Practical Guide to Using Panel Data.
Sage, London, 2015.

Melinda Mills. Introducing Survival and Event History Analysis. Sage, London,
2011.

Judith D. Singer and John B. Willet. Applied Longitudinal Data Analysis:
Modelling Change and Event Occurrence. Oxford University Press, New
York, 2003.

J. Twisk, N. Smidt, and W. de Vente. Applied analysis of recurrent events: a
practical overview. Journal of Epidemiology & Community Health, 59(8):
706–710, 2005.
