Modeling Discrete
Time-to-Event Data
Gerhard Tutz, LMU Munich, Munich, Germany
Matthias Schmid, University of Bonn, Bonn, Germany
In recent years, a large variety of textbooks dealing with time-to-event analysis has
been published. Most of these books focus on the statistical analysis of observations
in continuous time. In practice, however, one often observes discrete event times—
either because of grouping effects or because event times are intrinsically measured
on a discrete scale. Statistical methodology for discrete event times has been mainly
presented in journal articles and a few book chapters. In this book we introduce
basic concepts and give several extensions that allow discrete time data to be
modeled adequately. In particular, modeling discrete time-to-event data profits
strongly from the smoothing and regularization methods that have been developed
in recent decades. The presented approaches include methods that yield much more
flexible models than were available in the early days of survival modeling.
The book is aimed at applied statisticians, students of statistics and researchers
from areas like biometrics, social sciences and econometrics. The mathematical
level is moderate; instead we focus on basic concepts and data analysis.
Objectives
Special Topics
• All numerical results presented in this book were obtained by using the R
System for Statistical Computing (R Core Team 2015). Hence readers are able to
reproduce all the results by using freely available software.
• Various functions and tools for the analysis of discrete time-to-event data are
collected in the R package discSurv (Welchowski and Schmid 2015).
We are grateful to many colleagues for valuable discussions and suggestions, in
particular to Kaveh Bashiri, Moritz Berger, Jutta Gampe, Andreas Groll, Wolfgang
Hess, Stephanie Möst, Vito M. R. Mugeo, Margret Oelker, Hein Putter, Micha
Schneider and Steffen Unkel. Silke Janitza carefully read preliminary versions of
the book and helped to reduce the number of mistakes. We also thank Helmut
Küchenhoff for late but substantial suggestions.
Special thanks go to Thomas Welchowski for his excellent programming work
and to Pia Oberschmidt for assisting us in compiling the subject index.
Contents
1 Introduction
1.1 Survival and Time-to-Event Data
1.2 Continuous Versus Discrete Survival
1.3 Overview
1.4 Examples
2 The Life Table
2.1 Life Table Estimates
2.1.1 Distributional Aspects
2.1.2 Smooth Life Table Estimators
2.1.3 Heterogeneous Intervals
2.2 Kaplan–Meier Estimator
2.3 Life Tables in Demography
2.4 Literature and Further Reading
2.5 Software
2.6 Exercises
3 Basic Regression Models
3.1 The Discrete Hazard Function
3.2 Parametric Regression Models
3.2.1 Logistic Discrete Hazards: The Proportional Continuation Ratio Model
3.2.2 Alternative Models
3.3 Discrete and Continuous Hazards
3.3.1 Concepts for Continuous Time
3.3.2 The Proportional Hazards Model
3.4 Estimation
3.4.1 Standard Errors
3.5 Time-Varying Covariates
3.6 Continuous Versus Discrete Proportional Hazards
3.7 Subject-Specific Interval Censoring
3.8 Literature and Further Reading
3.9 Software
3.10 Exercises
4 Evaluation and Model Choice
4.1 Relevance of Predictors: Tests
4.2 Residuals and Goodness-of-Fit
4.2.1 No Censoring
4.2.2 Deviance in the Case of Censoring
4.2.3 Martingale Residuals
4.3 Measuring Predictive Performance
4.3.1 Predictive Deviance and R² Coefficients
4.3.2 Prediction Error Curves
4.3.3 Discrimination Measures
4.4 Choice of Link Function and Flexible Links
4.4.1 Families of Response Functions
4.4.2 Nonparametric Estimation of Link Functions
4.5 Literature and Further Reading
4.6 Software
4.7 Exercises
5 Nonparametric Modeling and Smooth Effects
5.1 Smooth Baseline Hazard
5.1.1 Estimation
5.1.2 Smooth Life Table Estimates
5.2 Additive Models
5.3 Time-Varying Coefficients
5.3.1 Penalty for Smooth Time-Varying Effects and Selection
5.3.2 Time-Varying Effects and Additive Models
5.4 Inclusion of Calendar Time
5.5 Literature and Further Reading
5.6 Software
5.7 Exercises
6 Tree-Based Approaches
6.1 Recursive Partitioning
6.2 Recursive Partitioning Based on Covariate-Free Discrete Hazard Models
6.3 Recursive Partitioning with Binary Outcome
6.4 Ensemble Methods
6.4.1 Bagging
6.4.2 Random Forests
6.5 Literature and Further Reading
6.6 Software
6.7 Exercises
References
Survival analysis consists of a body of methods that are known under different
names. In biostatistics, where one often examines the time to death, survival analysis
is the most commonly used name. In the social sciences one often speaks of event history
data, and in technical applications of reliability methods. In all of these areas one wants
to model time-to-event data. The focus is on the modeling of the time it takes until a
specific event occurs. More generally, one has mutually exclusive states that can be
taken over time. For example, in the analysis of unemployment the states can refer
to unemployment, part-time employment, full-time employment or retirement. One
wants to model the course of an individual between these states over time. An event
occurs if an individual moves from one state to another. In a single spell analysis,
which is the most extensively treated case in this book, one considers just one time
period between two events, for example, how long it takes until unemployment ends.
Since one models the transition between states, one also uses the name transition
models. More general names for the type of data to be modeled, which do not refer to
a specific area of applications, are duration data, sojourn data or failure time data,
and the corresponding models are called duration models or failure time models. We
will most often use the term survival data and survival models but, depending on the
context, also use alternative names.
What makes survival data special? In a regression model, if one wants to
investigate how predictors determine a specific survival time $T$, time takes the
role of the response variable. Thus one has a response variable with a restricted
support because $T \ge 0$ has to hold. Nevertheless, by using some transformation, for
example, $\log(T)$, such that all values can occur, one might be tempted to consider it
as a common regression problem of the form $\log(T) = x^\top \gamma + \varepsilon$, where $x$ denotes the
predictors, $\gamma$ is a vector of coefficients, and $\varepsilon$ is a noise variable. Although models
like that can be used in simple cases, they do not work in more general settings.
There are in particular two issues that are important in the modeling of survival
data, namely the modeling of the underlying dynamics of the process, which can be
captured in the form of a risk or hazard function, and censoring, which means that
in some cases the exact time is not available. In the following these two aspects are
briefly sketched.
In time-to-event data one often considers the so-called hazard function. In the case
of discrete time (e.g., if time is given in months), it has the simple form of a
conditional probability. The hazard or intensity function for a given vector of
predictors $x$ is defined by
$$\lambda(t \mid x) = P(T = t \mid T \ge t, x), \quad t = 1, 2, \ldots$$
It represents the conditional probability that the time period $T$ ends at time $t$, given
$T \ge t$ (and $x$). In survival analysis, for example, the hazard is the current risk
of dying at time t given the individual has survived until then. When considering
duration of unemployment it can be the probability that unemployment ends in
month t given the person was unemployed until then. In the latter case a positive
event ends the spell, and the hazard represents an opportunity rather than a risk.
But the important point is that the hazard function is a current (local on the time
scale) measure for the strength of the tendency to move from one state to the
other. It measures at each time point the tendency that a transition takes place.
In this sense it measures the underlying dynamics of survival. Typically one is
interested in studying the dynamics behind the time under investigation. The hazard
rate becomes even more important when covariates vary over time, for example, if
treatment in a clinical study is modified over time. Then a simple regression model
with (transformed) time as response will not work, because in a regression model
one has to consider covariates that are fixed at the beginning of the time under
investigation. However, time-varying covariates can be considered within the hazard
function framework by specifying
$$\lambda(t \mid x_t) = P(T = t \mid T \ge t, x_t), \quad t = 1, 2, \ldots,$$
where $x_t$ can include the available information on covariates up until time $t$. Then the
hazard function measures the current risk given the covariate values up until time $t$,
so that $\lambda(t \mid x_t)$ represents the dynamics of the underlying process given any value
of the covariates at or before $t$. More formally the phenomenon to be modeled is
a stochastic process, that is, a collection of random variables indexed by time with
values that correspond to the states. The modeling as a stochastic process (more
concisely, a counting process) is extensively treated in Andersen et al. (1993) and
Fleming and Harrington (2011).
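As a rough illustration of the hazard definition above, the following Python sketch estimates the discrete hazard from fully observed (uncensored) event times as the share of subjects still at risk at t whose spell ends at t. The function name `empirical_hazard` and the toy durations are assumptions made for this example; the book itself works in R, so this is only a sketch in a different language.

```python
from collections import Counter


def empirical_hazard(times):
    """Estimate P(T = t | T >= t) for t = 1, ..., max(times).

    `times` holds discrete, fully observed event times (no censoring).
    """
    counts = Counter(times)
    at_risk = len(times)          # every subject satisfies T >= 1
    hazard = {}
    for t in range(1, max(times) + 1):
        events = counts.get(t, 0)
        # share of those with T >= t whose spell ends exactly at t
        hazard[t] = events / at_risk if at_risk > 0 else float("nan")
        at_risk -= events         # subjects with an event leave the risk set
    return hazard


# Hypothetical durations in months for six individuals
toy_times = [1, 2, 2, 3, 3, 3]
hazard = empirical_hazard(toy_times)
```

With these toy data the estimated hazard is 1/6 in month 1, 2/5 in month 2 and 1 in month 3, because every spell that reaches month 3 ends there.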
Even without using the counting process framework one way to handle survival
data is to specify a parametric or non-parametric model for the hazard function. If
the hazard given $x$ or $x_t$ is specified, the behavior of the survival time is implicitly
defined. In fact, most of the models considered in this book are hazard rate models.
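To make "implicitly defined" concrete: in discrete time a hazard sequence λ(1), λ(2), ... determines the distribution of the survival time through P(T = t) = λ(t) ∏_{s<t} (1 − λ(s)) and P(T > t) = ∏_{s≤t} (1 − λ(s)). The Python sketch below carries out this calculation; the function name and the hazard values are hypothetical and serve only as an illustration.

```python
def hazard_to_distribution(hazard):
    """Turn hazards lambda(1), ..., lambda(q) into P(T = t) and P(T > t)."""
    pmf, survival = [], []
    surv_prev = 1.0                       # P(T > 0) = 1
    for lam in hazard:
        pmf.append(lam * surv_prev)       # reach period t, then event in t
        surv_prev *= 1.0 - lam            # survive period t as well
        survival.append(surv_prev)
    return pmf, survival


# Hypothetical hazards for periods t = 1, 2, 3
pmf, survival = hazard_to_distribution([0.5, 0.5, 1.0])
```

Because the last hazard equals 1, the probabilities in `pmf` sum to 1 and the survival function reaches 0 in period 3.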
The second issue that makes survival data special is censoring. An observation
is called “censored” if its survival time has not been fully observed, that is, the
exact time within one state is not known. Censoring may, for example, occur if
observations in a study drop out early (so that they are lost before the event of
interest occurs), or if the event of an observation occurs after the study has been
finished. These situations are illustrated in Fig. 1.1. For observation 1 a spell started
at $t = 0$, and the observation remained in the study until its event occurred. The
black dot indicates that the event of observation 1 has actually been observed.
In contrast, for observation 2 the spell started later than for observation 1, but it
was right censored because it dropped out of the study before its event occurred.
Similarly, for observation 4 the exact survival time is not known because its event
had not occurred by the end of the study (indicated by the dashed line). It should
be noted that in Fig. 1.1 time refers to calendar time. What is actually modeled in
survival analysis is the spell length, that is, the time from entry until transition to
another state.
[Figure 1.1: Observations 1 to 4 drawn as horizontal spells against calendar time (axis ticks 0 to 4); legend: observed event, censored.]
Fig. 1.1 Four observations for which spells start at different times. Exact survival time is observed
for observations 1 and 3 (black dots); for observations 2 and 4 the end of the spell is not observed
(circles), since observation 2 drops out early and observation 4 is still alive at the end of the study,
which is shown as a dashed line
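The distinction between calendar time and spell length can be sketched in a few lines of Python. The entry times, event times and study end below are invented for illustration (they are not read off Fig. 1.1), and the record layout `Spell` is an assumption of this example rather than anything prescribed by the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spell:
    entry: float                      # calendar time at which the spell starts
    event: Optional[float]            # calendar time of the event, None if unknown
    dropout: Optional[float] = None   # calendar time of early dropout, if any


def observed_duration(spell, study_end):
    """Return (spell length, event indicator) under right censoring."""
    # last calendar time at which the subject is still under observation
    last_seen = study_end if spell.dropout is None else min(spell.dropout, study_end)
    if spell.event is not None and spell.event <= last_seen:
        return spell.event - spell.entry, 1      # event observed
    return last_seen - spell.entry, 0            # right censored


STUDY_END = 4.0   # hypothetical end of the study
spells = [
    Spell(entry=0.0, event=2.5),                 # event observed
    Spell(entry=1.0, event=None, dropout=2.0),   # drops out early
    Spell(entry=0.5, event=3.0),                 # event observed
    Spell(entry=2.0, event=5.0),                 # event falls after the study end
]
results = [observed_duration(s, STUDY_END) for s in spells]
```

Each result pairs the observed spell length (measured from entry, not from calendar time 0) with an indicator that is 1 for an observed event and 0 for a right-censored spell.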
1.2 Continuous Versus Discrete Survival
Most textbooks on survival analysis assume that the survival time is continuous
and the event to be modeled may occur at any particular time point. Several
books are available that treat continuous survival data extensively, for example,
Lawless (1982), Lancaster (1992), Kalbfleisch and Prentice (2002) and Klein and
Moeschberger (2003). What in these books is often considered very briefly, if at all,
is the case of discrete survival, which is the topic of the present book.
Although we imagine time as a continuum, in practice measurement of time is
always discrete. In particular in the medical sciences, economics and the social
sciences duration is usually measured, for example, in days, years or months.
Thus, even though the transition between states takes place at a specific time point,
the exact time points are usually not known. What is available are the data that
summarize what was happening during a specific interval. One can use positive
integers, $1, 2, 3, \ldots$, to denote time. More formally, continuous time is divided into
intervals
$$[0, a_1), [a_1, a_2), \ldots, [a_{q-1}, a_q), [a_q, \infty).$$
In this book discrete event times are denoted by $T$, where $T = t$ means that the event
has occurred in the interval $[a_{t-1}, a_t)$, also called “time period $t$”. One also speaks
of grouped survival data or interval censoring. In grouped survival data there are
typically some observations that have the same survival time. This phenomenon is
usually referred to as “ties”. In continuous time, ties ideally should not occur. In
fact, some models and estimation methods for continuous time even assume that
there are no ties in the data. Nevertheless, in practical applications with continuous
event times ties occur, which might be taken as a hint for underlying grouping. In
some areas, for example in demography, discrete data are quite natural. For example,
life tables traditionally use years as a measure for life span.
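The grouping of continuous times into periods, and the ties it produces, can be sketched as follows. The helper name `to_period`, the interval boundaries and the example times are assumptions for illustration; the sketch uses only Python's standard library.

```python
from bisect import bisect_right
from collections import Counter


def to_period(time, boundaries):
    """Map a continuous time to the period t with a_{t-1} <= time < a_t.

    `boundaries` is the increasing list [a_1, ..., a_q]; a_0 = 0 is implicit,
    and times >= a_q fall into the last, open-ended period q + 1.
    """
    if time < 0:
        raise ValueError("time must be non-negative")
    return bisect_right(boundaries, time) + 1


boundaries = [1.0, 2.0, 3.0]                  # hypothetical a_1, a_2, a_3
continuous = [0.2, 0.9, 1.0, 1.7, 2.4, 3.5]   # hypothetical exact times
periods = [to_period(x, boundaries) for x in continuous]
# periods shared by more than one observation, i.e. ties
ties = {t: n for t, n in Counter(periods).items() if n > 1}
```

Although the six hypothetical exact times are all distinct, grouping places two observations each in periods 1 and 2, which is how ties arise in grouped data.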
In some cases the underlying transition process is what Jenkins (2004) calls
intrinsically discrete. Consider, for example, time to pregnancy. A natural measure
for the time it takes a couple to conceive is the number of menstrual cycles, which
is a truly discrete response, see also Scheike and Jensen (1997) for an application of
discrete survival models to model fertility. Genuinely discrete measurements may
also result from surveys that are taken every month or year. For example, the IFO
Business Climate for Germany (http://www.cesifo-group.de/ifoHome) is based on
a survey in which firms from Germany are asked each month if “Production has
decreased”, “Production has remained unchanged”, or “Production has increased”.
Consequently, when investigating the factors that determine how long the answer
of a firm remains the same, one obtains time-discrete measurements. Also, when
considering the important problem of panel mortality, where one investigates for
how long a firm or an individual is in a panel, the response is the number of times
the questionnaire was sent back and is therefore genuinely discrete.
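How time-discrete durations arise from repeated survey answers can be sketched in Python by counting the number of consecutive months for which an answer stays the same. The function name `answer_spells` and the sequence of answers are hypothetical; only the three answer categories follow the survey wording quoted above.

```python
from itertools import groupby


def answer_spells(answers):
    """Collapse a monthly answer sequence into (answer, duration in months) runs."""
    return [(answer, sum(1 for _ in run)) for answer, run in groupby(answers)]


# Hypothetical answers of one firm over six consecutive months
monthly = ["increased", "increased", "unchanged",
           "unchanged", "unchanged", "decreased"]
spells = answer_spells(monthly)
```

In this sketch the final run is still ongoing when the series ends, so its duration would be treated as right censored rather than as a completed spell.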
In summary, discrete time-to-event data occur as
• intrinsically discrete measurements, where the measurements represent natural
numbers, or
• grouped data, which represent events in underlying time intervals, and the
response refers to an interval.
The basic modeling approaches are the same for both types of data, in particular
when the intervals in grouped data have the same length (e.g., if they represent
1.3 Overview