Applied Survival Analysis Using R
Dirk F. Moore
Department of Biostatistics
Rutgers School of Public Health
Piscataway, NJ, USA
This book serves as an introductory guide for students and analysts who need
to work with survival time data. The minimum prerequisites are basic applied
courses in linear regression and categorical data analysis. Students who also have
taken a master’s level course in statistical theory will be well prepared to work
through this book, since frequent reference is made to maximum likelihood theory.
Students lacking this training may still be able to understand most of the material,
provided they have an understanding of the basic concepts of differential and
integral calculus. Specifically, students should understand the concept of the limit,
and they should know what derivatives and integrals are and be able to evaluate them
in some basic cases.
The material for this book has come from two sources. The first source is
an introductory class in survival analysis for graduate students in epidemiology
and biostatistics at the Rutgers School of Public Health. Biostatistics students, as
one would expect, have a much firmer grasp of the more mathematical aspects of
statistics than do epidemiology students. Still, I have found that those epidemiology
students with strong quantitative backgrounds have been able to understand some
mathematical statistical procedures such as score and likelihood ratio tests, provided
that they are not expected to symbolically differentiate or integrate complex
formulas. In this book I have, when possible, used the numerical capabilities of the
R system to substitute for symbolic manipulation. The second source of material
is derived from collaborations with physicians and epidemiologists at the Rutgers
Cancer Institute of New Jersey and at the Rutgers Robert Wood Johnson Medical
School. A number of the data sets in this text are derived from these collaborations.
Also, the experience of training statistical analysts to work on these data sets
provided additional inspiration for the book.
The first chapter introduces the concept of survival times, explains how right
censoring occurs, and describes several of the data sets that will be used throughout
the book. Chapter 2 presents fundamentals of survival theory. This includes hazard,
probability density, and survival functions, and how they are related. The hazard
function is illustrated using both life table data and some common parametric
distributions. The chapter ends with a brief introduction to properties of maximum
I would like to thank Rebecca Moss for permission to use the “pancreatic” data
and Michael Steinberg for permission to use the “pharmacoSmoking” data. Both of
these data sets are used repeatedly throughout the text. I would also like to thank
Grace Lu-Yao, Weichung Joe Shih, and Yong Lin for years-long collaborations
on using the SEER-Medicare data for studying the survival trajectories of prostate
cancer patients. These collaborations led to the development of the “prostateSurvival”
data set discussed in Chapter 9. I thank the Division of Cancer
Epidemiology and Genetics of the US National Cancer Institute for providing the
“ashkenazi” data. I also thank Wan Yee Lau for making the “hepatoCellular” data
publicly available in the online Dryad data repository and for allowing me to
include it in the “asaur” R package.
1 Introduction
1.1 What Is Survival Analysis?
1.2 What You Need to Know to Use This Book
1.3 Survival Data and Censoring
1.4 Some Examples of Survival Data Sets
1.5 Additional Notes
2 Basic Principles of Survival Analysis
2.1 The Hazard and Survival Functions
2.2 Other Representations of a Survival Distribution
2.3 Mean and Median Survival Time
2.4 Parametric Survival Distributions
2.5 Computing the Survival Function from the Hazard Function
2.6 A Brief Introduction to Maximum Likelihood Estimation
2.7 Additional Notes
3 Nonparametric Survival Curve Estimation
3.1 Nonparametric Estimation of the Survival Function
3.2 Finding the Median Survival and a Confidence Interval for the Median
3.3 Median Follow-Up Time
3.4 Obtaining a Smoothed Hazard and Survival Function Estimate
3.5 Left Truncation
3.6 Additional Notes
4 Nonparametric Comparison of Survival Distributions
4.1 Comparing Two Groups of Survival Times
4.2 Stratified Tests
4.3 Additional Note
Index
Survival analysis is the study of survival times and of the factors that influence
them. Types of studies with survival outcomes include clinical trials, prospective and
retrospective observational studies, and animal experiments. Examples of survival
times include time from birth until death, time from entry into a clinical trial until
death or disease progression, or time from birth to development of breast cancer
(that is, age of onset). The survival endpoint can also refer to a positive event. For
example, one might be interested in the time from entry into a clinical trial until
tumor response. Survival studies can involve estimation of the survival distribution,
comparisons of the survival distributions of various treatments or interventions, or
elucidation of the factors that influence survival times. As we shall see, many of the
techniques we study have analogues in generalized linear models such as linear or
logistic regression.
Survival analysis is a difficult subject, and a full exposition of its principles would
require readers to have a background not only in basic statistical theory but also in
advanced topics in the theory of probability. Fortunately, many of the most important
concepts in survival analysis can be presented at a more elementary level. The aim
of this book is to provide the reader with an understanding of these principles and
also to serve as a guide to implementing these ideas in a practical setting. We shall
use the R statistical system extensively throughout the book because (1) it is a
high-quality system for doing statistics, (2) it includes a wealth of enhancements
and packages for doing survival analysis, (3) its interactive design will allow us
to illustrate survival concepts, and (4) it is an open source package available for
download to anyone at no cost from the main R website, www.R-project.org. This
book is meant to be used as well as read, and the reader is encouraged to use R
to try out the examples discussed in the text and to do the exercises at the end of
each chapter. It is expected that readers are already familiar with the R language; for