Hazard Rate Theory and Inference

This document is an introduction to the hazard rate and its applications in reliability theory and survival analysis. It discusses both theoretical concepts and practical inferential methods. Specifically, it defines key functions related to the hazard rate, like the survival function and mean residual life, explores properties of the hazard rate for various distributions and systems, and introduces classes of distributions based on aging properties. It also covers nonparametric methods for estimating hazard rates from lifetime data, including the Kaplan-Meier and Nelson-Aalen approaches as well as life table methods. The goal is to provide flexible, model-free techniques for inference about hazard rates from data.


The Hazard Rate

— Theory and Inference —


(With supplementary MATLAB–Programs)

Horst Rinne

© Prof. em. Dr. Horst Rinne
Department of Economics and Management Science
Chair of Econometrics and Statistics
Justus–Liebig–University, D 35394 Giessen, Germany
Preface
When we look at biological organisms like human beings, animals, plants or at technical devices
like motorcars, aircraft, television sets and parts thereof or at economic and socio–economic
units like enterprises, corporations, labor unions or at social units like families, parties or even
states, we observe that in every moment of their existence we will find them in a well–defined
state. A patient, after some medical treatment, may be alive, an adult person may be out of work,
a piece of machinery may either be down or functioning, a labor union may be on strike or a
state may be at war with some other state. The sojourn–time in a given state for such a unit ends
by the occurrence of some random event. In this book, the time–to–event since entering into a
given state will generally be called lifetime and the terminating event will be called failure. Since
the terminating event is random the lifetime is a random variable with realizations which — in
general — will be non–negative.
The main body of this monograph is given by Parts I and II. The final Part III, entitled ‘Appen-
dices’, gives the usual ingredients of a scientific opus: the bibliography and an author index as
well as a subject index. Here one will also find information about the MATLAB–programs which
have been written to facilitate practical working with the methods and procedures described in
the monograph. Part I is descriptive in the sense that it gives definitions for the various functions
of the variate ‘lifetime’ and explores how these functions are related to one another. This is done
in Chapter 1 where we also differentiate between the univariate and the multivariate cases and
between the continuous and the discrete cases. Chapter 2 introduces several classes of lifetime
distributions with respect to aging. Chapter 3 is devoted to univariate parametric distributions and
enumerates important continuous and discrete distributions known in probability theory. Here the
reader will find the formulas for the four most important representatives of a lifetime variate: its
probability density in the continuous case or its probability mass function in the discrete case, its
survival function, its hazard rate function and its mean residual life function. In order to have a
graph of these four functions for any set of the function parameters the reader can revert to the
MATLAB–programs stored in ‘Distributions.zip’ and described in Part III.
Part II is devoted to the inference of the hazard rate. The focus is on non–BAYESIAN and non–
parametric inferential procedures of — unless stated otherwise — univariate continuous lifetime
distributions. The non–parametric approach to estimate hazard rates from lifetime data is flexible,
model–free and data–driven. No shape assumption is imposed other than that the hazard rate is a
smooth function, or occasionally in Chapter 7, a monotone function. Such an approach typically
involves smoothing of an initial and discrete hazard rate estimate, with arbitrary choice of the
smoother.
In Chapters 5 and 6 we present estimation techniques to find such initial estimates for non–
grouped and grouped data after having introduced — in Chapter 4 — sampling techniques for
lifetime data with the pertaining denotation of the quantities coming up. The core chapter of
Part II is Chapter 8 presenting smoothing techniques. Emphasis here is on smoothing with kernels,
a technique that is most elaborated, explored and used in practice, but we also look at some other
techniques.
Chapter 9 on hazard plotting is an exception from the non–parametric nature of this second part
when we plot on special graph paper. For each location–scale distribution we can design an
especially scaled grid so that the data points will lie on a straight line when coming from that
distribution. So the graph is a means of testing for a special distribution. Estimates of the pertain-
ing parameters of this distribution can be found as special hazard quantiles. Hazard plotting thus
serves as an instrument for estimating and testing and is the bridge to the testing procedures of
Chapter 10.

Testing procedures in Chapter 10 are not devoted to hypotheses on parameters of some parametric
lifetime distribution, but they are concerned with the presence or absence of certain aging prop-
erties. These properties have been described in Chapter 2 and will be tested here by means of
non–parametric methods. The testing of the no–aging property (= constant hazard rate) may be
seen and taken as testing for exponentially distributed lifetime and thus is parametric. Most of
the testing procedures are of numerical type, but with the total–time–on–test plot we also have
a graphical approach. After presentation of the prerequisites like order statistics, spacings and
TTT–transform we test for properties of the hazard rate, i.e., its shape and behavior, and we test
for aging classes.
Contents

Preface

List of Figures

List of Tables

I Theoretical and Probabilistic Concepts

1 The Hazard Rate and its Relatives
1.1 The Univariate Continuous Case
1.1.1 Functions Describing Lifetime
1.1.1.1 The Failure Density Function
1.1.1.2 The Lifetime Distribution Function
1.1.1.3 The Survival Function
1.1.1.4 The Hazard Rate Function
1.1.1.5 The Cumulative Hazard Rate Function
1.1.1.6 Mean Residual Life Function
1.1.2 The Hazard Rate for Special Cases
1.1.2.1 Transformation of Random Variables
1.1.2.2 Mixing and Compounding
1.1.2.3 Formation of Systems
1.1.2.4 Acceleration and Proportional Hazards
1.1.2.5 Truncated Distributions
1.1.2.6 Life Potential
1.2 The Univariate Discrete Case
1.2.1 General Results
1.2.2 Special Results
1.3 The Multivariate Cases
1.3.1 Continuous Distributions
1.3.2 Discrete Distributions

2 Aging Criteria and Classes of Univariate Lifetime Distributions
2.1 Monotone Hazard Rate Distributions
2.1.1 Continuous IHR and DHR Distributions
2.1.2 Discrete IHR and DHR Distributions
2.1.3 IHRA and DHRA Distributions
2.2 Non–monotone Hazard Rate Distributions
2.3 MRL classes of Distributions
2.4 Classification According to other Aging Criteria

3 Presentation of Univariate Parametric Distributions
3.1 Continuous Distributions
3.2 Discrete Distributions

II Inferential Aspects

4 Sampling Lifetime Data

5 Hazard Rate Estimation and the KAPLAN/MEIER and NELSON/AALEN Approaches
5.1 Estimating the Hazard Rate and the Survival Function
5.2 Estimating the Cumulative Hazard rate

6 Estimating the Hazard Rate from Life Tables
6.1 Life Table Functions
6.2 Estimators for Life Table Functions Including the Hazard Rate
6.3 Related Hazard Rate estimators

7 Maximum Likelihood Estimation of Monotone Hazard Rates
7.1 The Case of Complete Samples
7.2 The case of Randomly Censored Samples

8 Smooth Hazard Rate Estimators
8.1 Kernel Smoothing
8.1.1 Motivation and Basic Concepts
8.1.1.1 The Convolution Formula
8.1.1.2 Performance of Kernel Smoothing
8.1.1.3 Kernel Selection
8.1.1.4 Bandwidth Selection
8.1.2 Indirect Smoothing: The Ratio–type Estimator
8.1.3 Direct Smoothing
8.1.3.1 Boundary Kernels
8.1.3.2 Kernel Estimators with Globally Constant (Fixed) Bandwidth
8.1.3.3 Kernel Estimators with Varying Bandwidth
8.2 Further Smoothing Techniques

9 Hazard Plotting
9.1 Introduction and Motivation
9.2 Hazard Plots and Hazard Paper
9.3 Hazard Papers for Location–scale Distributions

10 Testing Hypotheses on Life Distributions
10.1 Prerequisites: Order Statistics, Spacings, TTT–statistics
10.2 Testing Hazard Rate Properties
10.2.1 Constancy of the Hazard Rate
10.2.2 Monotonicity of the Hazard Rate
10.2.3 Bathtub Shape of the Hazard Rate
10.3 Testing for Aging Classes
10.3.1 IHRA (DHRA) Tests
10.3.2 NBU (NWU) Tests
10.3.3 DMRL (IMRL) Tests
10.3.4 NBUE (NWUE) Tests
10.3.5 HNBUE (HNWUE) Tests

III Appendices

Bibliography

Author Index

Subject Index

Included MATLAB–Programs


List of Figures
1/1 PDF, CDF, and CCDF of the linear hazard rate distribution with a = 0 and b = 1
1/2 HR, CHR, and MRL of the linear hazard rate distribution with a = 0 and b = 1
1/3 Series system block–diagram
1/4 Parallel system block–diagram
1/5 Block–diagram of a 2 × 2 series–parallel system with system–level redundancy
1/6 Block–diagram of a 2 × 2 series–parallel system with component–level redundancy
1/7 Block–diagram of a 2–out–of–3 system
1/8 Block–diagram of a bridge system
1/9 Hazard rates of the reduced RAYLEIGH distribution, non–truncated and truncated at $x_l = 1$ and/or $x_u = 3$
1/10 Survival functions $S(x)$, $\tilde{S}(x)$ and hazard rates $h(x)$, $\tilde{h}(x)$ of the reduced RAYLEIGH distribution
1/11 PMF, HR and MRL of the POISSON distribution with λ = 5
1/12 GUMBEL’s bivariate exponential distribution with θ = 1

2/1 Hazard rates of several log–normal distributions
2/2 Hazard rates of several inverse GAUSSIAN distributions
2/3 Hazard rates of several HJORTH distributions
2/4 HR and corresponding MRL of two reduced WEIBULL distributions
2/5 HR function and MRL function of a DHILLON–I distribution (a = 0, b = 1, c = 0.5)
2/6 HR function and MRL function of a log–normal distribution (a = 0, b = 1)
2/7 Chains of implications for several aging criteria

3/1 PDF-formula display of the DHILLON–II distribution by the program ContDist
3/2 Display of the functions of a DHILLON–II distribution by the program ContDist
3/3 PMF-formula display of the WEIBULL type I distribution by the program DiscDist
3/4 Display of the functions of a WEIBULL type I distribution by the program DiscDist

4/1 Illustration of the numbers $c_i$, $d_i$, $n_i$ and the failure times $x_i$ on the time axis (non–grouped data)
4/2 Illustration of the numbers $c_j$, $d_j$, $n_j$ for a divided time axis (grouped data)

5/1 Estimated hazard rate and survival function with pointwise 95%–confidence intervals for the 21 leukaemia–patients’ data
5/2 Estimated CHR with pointwise 95%–confidence intervals; left part: indirect estimates; right part: direct (NELSON/AALEN) estimates

6/1 Plot of life table quantities

7/1 ML–estimates for a continuous distribution with increasing hazard rate
7/2 ML–estimates for a continuous distribution with decreasing hazard rate
7/3 ML–estimates of an increasing discrete hazard rate

8/1 Biweight–kernel smoothing with different bandwidths
8/2 Common kernels
8/3 Effect of kernel choice
8/4 Fourth–order kernels based on GAUSS and triweight kernels, respectively
8/5 Ratio–type estimates of the hazard rate for the leukaemia patients’ data
8/6 Triweight boundary kernels (left: linear multiple, center: MÜLLER–91, right: MÜLLER/WANG–94)
8/7 Hazard rate estimates for the survival time of 43 patients having granulocytic leukemia: different kernels and common bandwidth $b_n = 250$
8/8 Hazard rate estimates for the survival time of 43 patients having granulocytic leukemia: different bandwidths and common EPANECHNIKOV kernel
8/9 Hazard rate estimates for the survival time of 43 patients having granulocytic leukemia using a local and a variable biweight kernel with k = 13 and 100 gridpoints

9/1 Hazard paper of the log–WEIBULL distribution
9/2 Hazard plot for the logistic distribution

10/1 Two ways of expressing the total time spent on test
10/2 TTT–plots based on simulated exponential data (b = 5; n = 10, 50, 100) and the scaled TTT–transform of the exponential distribution
10/3 Graphs for judging exponentiality
10/4 Scaled TTT–transforms of three WEIBULL distributions
10/5 Scaled TTT–transforms of lognormal and power function distributions
10/6 TTT–plot for a data set coming from a DIHR distribution
10/7 TTT–plot of a data set coming from an IHR distribution
10/8 TTT–plot for the 43 granulocytic leukemia patients
List of Tables
1/1 Relations among the six functions describing a continuously distributed stochastic lifetime
1/2 Relations among the six functions describing a discrete stochastic lifetime

2/1 Closure and inheritance of classes of lifetime distributions under reliability operations

5/1 Estimates of $h_i$ and $S(x_i)$ for the 21 leukaemia patients’ data
5/2 Estimates of $H(x_i)$ and its variances for the 21 leukaemia patients’ data

6/1 Extraction from the German life table 2000 – 2002 for males
6/2 Lay–out of a non–demographic life table
6/3 Data for life table estimation
6/4 Estimates (with variances) of life table quantities

7/1 ML–estimates and naive estimates of an increasing hazard rate
7/2 ML–estimates and naive estimates of a decreasing hazard rate
7/3 ML–estimates and naive estimates of the increasing hazard rate of a discrete distribution
7/4 Hazard rate estimate for KAPLAN/MEIER’s data set

8/1 Common kernels

10/1 Critical values of $A_2 \sqrt{7560/n^7}$
10/2 Percentage points $k_{r,\gamma}$ of EPSTEIN’s cttot–test
10/3 $\Pr(G_n^* \ge n - k)$
10/4 Critical values of $B^* = B \sqrt{210/n^5}$
10/5 Critical values $j_{n,\gamma}$ of $J$
10/6 Critical values $w_{n,\gamma}$ for $W$
q 
10/7 Critical values qn,γ of Q1 45 n 4 . . . . . . . . . . . . . . . . . . . . . . . . . 256
Part I

Theoretical and Probabilistic Concepts


1 The Hazard Rate and its Relatives
In the course of time several functions have evolved in probability theory that define and describe
a random variable. Here, we will focus on those functions and concepts which are related to
lifetime as a random variable.

1.1 The Univariate Continuous Case


The emphasis in this section is on those functions which are of special interest in describing
the evolution of the risks to which a given unit is subjected over time. Thus, the variate under
study is the lifetime X or, more generally, the duration or sojourn–time of the unit spent
in a given state. Lifetimes or survival times are data that measure the time between two events,
namely that of entering into the state and that of escaping from that state. The latter event will
be called failure. Generally, this time between events is a one–dimensional and continuous
variate, defined on [0, ∞) unless stated otherwise. Multidimensional variates will be discussed in
Sect. 1.3 and discrete variates in Sect. 1.2. There are situations where lifetime can be thought of
as negative, e.g., shelf–aging, meaning that a unit might fail before its real usage starts, with
this start taken as the origin of time.

1.1.1 Functions Describing Lifetime¹


Six representatives of a lifetime distribution will be discussed, each of them bearing different
names depending on their field of application:

1. the probability density function f (x), abbreviated by PDF, density function for short, also
known as failure density, or as failure rate;

2. the cumulative distribution function F (x), abbreviated by CDF, distribution function for
short, also known as failure function or as lifetime distribution function;

3. the complementary cumulative distribution function S(x), abbreviated by CCDF, also
known as survival function or as reliability function;

4. the hazard rate function h(x), abbreviated by HR, hazard rate for short, also known as
instantaneous failure rate or as force of mortality;

5. the cumulative hazard rate H(x), abbreviated by CHR, also known as integrated hazard
rate;

6. the mean residual life function µ(x), abbreviated by MRL, also known as life expectancy
of an x–survivor or as mean future life of an x–survivor.

The origins of some of these functions date back to the 17th century when the first life tables came
into existence, whereas their application in the engineering sciences and in the life sciences only
started in the 1950’s. Each of these six functions completely describes the distribution of lifetime,
and any of these functions determines the other five, see Tab. 1/1 at the end of Sect. 1.1.1.6. The
six functions answer different questions with respect to the lifetime variable. The choice
among them further depends on whether

• the mathematical representation has a tractable form or

• intuition is gained concerning the distribution by seeing a plot of the representative.

¹ Suggested reading for this section: BAIN/ENGELHARDT (1991, Chapter 1), LEEMIS (1986; 1995, Chapter 3),
RINNE (2009, Chapter 2), SMITH (2002, Chapters 1 and 2).

The six representatives are not the only ways to define the distribution of a random variable X.
Other concepts include, e.g.:

• the moment generating function $E\left(e^{sX}\right)$ with $E$ as expectation operator,

• the probability generating function $E\left(z^{X}\right)$ for discrete $X$,

• the characteristic function $E\left(e^{isX}\right)$, $i := \sqrt{-1}$,

• the MELLIN transform $E\left(X^{s-1}\right)$, R. H. MELLIN (1854 – 1933),

• the LAPLACE transform $E\left(e^{-sX}\right)$, P. S. DE LAPLACE (1749 – 1827),

• the exponential FOURIER transform $E\left(e^{-isX}\right)$, J. B. J. FOURIER (1768 – 1830),

• the density quantile function $f\left(F^{-1}(P)\right)$, $0 \le P \le 1$;

• the total–time–on–test transform $\int_0^{F^{-1}(P)} S(x)\,dx$, $0 \le P \le 1$.

The six representatives used here have been chosen because of their special meaning for lifetime
data, for their intuitive appeal, for their usefulness in lifetime data analysis, and — last but not
least — for their popularity in probability theory and in statistics.

1.1.1.1 The Failure Density Function

The first lifetime distribution representative to be described is the failure density (PDF) or, in a
more general context, the probability density function, defined as

\[ f(x) := \lim_{\Delta \to 0} \frac{\Pr\left(x - \frac{\Delta}{2} < X \le x + \frac{\Delta}{2}\right)}{\Delta}. \tag{1.1a} \]

Thus, for small $\Delta$ the product $f(x)\,\Delta$ approximates the probability of failure in the time interval
$(x - \Delta/2,\; x + \Delta/2]$ or, roughly speaking, the probability of failure around an age of $x$. The
probability of reaching an age between $x_l$ and $x_u$, $x_l < x_u$, is

\[ \Pr(x_l \le X \le x_u) = \int_{x_l}^{x_u} f(x)\,dx. \tag{1.1b} \]

In particular, for a newborn organism or a newly produced unit, i.e., for a unit starting
at age $x = 0$, the probability of failing up to an age $x > 0$ is given by

\[ \Pr(X \le x) = \int_0^x f(u)\,du. \tag{1.1c} \]

For varying $x$ formula (1.1c) gives the lifetime distribution function, see (1.2a).
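
The limit in (1.1a) can be illustrated numerically. The following Python sketch assumes an exponentially distributed lifetime with rate `LAM = 0.5`; this distribution and all function names are illustrative choices, not taken from the text. It compares the difference quotient of (1.1a) with the exact density for shrinking Δ.

```python
import math

LAM = 0.5  # assumed rate of an exponential lifetime (illustrative only)

def cdf(x):
    """CDF of the assumed exponential lifetime, F(x) = 1 - exp(-LAM*x)."""
    return 1.0 - math.exp(-LAM * x) if x > 0 else 0.0

def pdf(x):
    """Exact density f(x) = LAM * exp(-LAM*x) for x >= 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def density_quotient(x, delta):
    """Pr(x - delta/2 < X <= x + delta/2) / delta, cf. (1.1a)."""
    return (cdf(x + delta / 2) - cdf(x - delta / 2)) / delta

x = 2.0
for delta in (1.0, 0.1, 0.001):
    # the quotient approaches the exact density as delta shrinks
    print(delta, density_quotient(x, delta), pdf(x))
```

For small Δ the two printed values should agree to several digits, which is the sense in which f(x)Δ approximates the probability of failure around age x.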

Theorem 1: All probability density functions for the variate ‘lifetime’ must satisfy two conditions:
1. $f(x) \ge 0 \;\; \forall\, x \ge 0$, (1.1d)
2. $\int_0^\infty f(x)\,dx = 1$. (1.1e) □

Remarks:

1. When X has a parametric distribution with a shift parameter $a \in \mathbb{R}$, (1.1d,e) turn into
\[ f(x) \ge 0 \;\; \forall\, x \ge a \quad \text{and} \quad \int_a^\infty f(x)\,dx = 1. \]
$a$ is called safe life when $a > 0$, i.e., failing before the age $a$ is impossible. When $a < 0$
we have shelf–aging.
2. Usually, when describing a particular PDF only its non–zero part will be explicitly stated,
and it should be understood that the PDF is zero over any unspecified region of $\mathbb{R}$.
3. Many characteristics such as age, length, weight etc. are truly continuous variables, at least
conceptually, although it could be said that due to the physical limitations of measuring
devices, the characteristic can be observed only as a discrete variable. However, the
measurement restrictions are usually insignificant compared to other sources of error, and the
continuous model is mathematically and conceptually much more convenient.
4. For continuous variables we have the following numerical equivalences:
\[ \Pr(x_l \le X \le x_u) = \Pr(x_l < X \le x_u) = \Pr(x_l \le X < x_u) = \Pr(x_l < X < x_u), \]
i.e., the probability of failing within a given interval is the same whether we include
none, one or both of its end–points.

Generally, for lifetime X the density function is positively skewed (skewed to the right or steep
on the left–hand side), see Fig. 1/1 below. Thus, f (x) has a flat and relatively long right–hand
tail, meaning that longer lifetimes are less probable than shorter lifetimes and that the mean life
(life expectancy) is greater than the median life, see Sect. 1.1.1.2.

1.1.1.2 The Lifetime Distribution Function

The second lifetime distribution representative is the failure function or lifetime distribution function (CDF),
defined as
\[ F(x) := \Pr(X \le x), \quad x \ge 0, \tag{1.2a} \]
giving the probability of failing up to age x or of having a life span of at most length x.
Theorem 2: Any function F(x) may be the CDF of a lifetime variable if it satisfies the following
properties:
1. $\lim_{x \to 0} F(x) = 0$, (1.2b)
2. $\lim_{x \to \infty} F(x) = 1$, (1.2c)
3. $F(x_u) \ge F(x_l) \;\; \forall\, x_u > x_l$, i.e., F(x) is a non–decreasing function of x, (1.2d)
4. F(x) is continuous, i.e., $\lim_{\Delta \to 0} F(x + \Delta) = \lim_{\Delta \to 0} F(x - \Delta) = F(x)$, $\Delta > 0$. (1.2e) □

Remarks:

1. Because F(x) is a probability, see (1.2a), we have
\[ 0 \le F(x) \le 1. \tag{1.2f} \]
2. CDF and PDF are related as
\[ f(x) = \frac{dF(x)}{dx} \quad \text{and} \tag{1.2g} \]
\[ F(x) = \int_0^x f(u)\,du. \tag{1.2h} \]

Because F(x) is a monotone and increasing² function, see Fig. 1/1 below, the inverse function
$F^{-1}(\cdot)$ exists and is called percentile function or quantile function:

\[ F(x_P) = P \implies x_P = F^{-1}(P), \quad 0 \le P \le 1. \tag{1.3} \]

$x_P$ is called the percentile or quantile of order $P$. The special percentile $x_{0.5}$ is called median
life, i.e., there are equal chances of failing before or surviving beyond the age $x_{0.5}$. Because of
the positive skewness of most lifetime densities the median life is more popular than the mean
life $\mu := E(X)$ in measuring the central tendency by a single number. For a positively skewed
PDF we find $x_{0.5} < \mu$.
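
When the CDF cannot be inverted in closed form, the percentile of order P can be found numerically. The Python sketch below solves F(x_P) = P by bisection; it assumes an exponential lifetime with rate `LAM = 0.5` (an illustrative choice, not from the text), for which the median can be compared with the known mean 1/LAM. The helper name `quantile` is hypothetical.

```python
import math

LAM = 0.5  # assumed rate of an exponential lifetime (illustrative only)

def cdf(x):
    """F(x) = 1 - exp(-LAM*x) for x >= 0."""
    return 1.0 - math.exp(-LAM * x)

def quantile(p, dist_fn, tol=1e-12):
    """Solve F(x_P) = P by bisection; assumes F is continuous and increasing on [0, inf)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    lo, hi = 0.0, 1.0
    while dist_fn(hi) < p:       # widen the bracket until it contains x_P
        hi *= 2.0
    for _ in range(200):         # iteration cap guards against stalling at machine precision
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if dist_fn(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

median = quantile(0.5, cdf)
mean = 1.0 / LAM                 # known mean of the exponential distribution
print(median, mean)              # the median lies below the mean for this right-skewed density
```

Bisection is used here only because it needs nothing beyond monotonicity of F; any root finder would serve the same purpose.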

1.1.1.3 The Survival Function

Another lifetime distribution representative is the CCDF, the survival function or reliability
function, defined as
\[ S(x) := \Pr(X > x), \quad x \ge 0, \tag{1.4a} \]
indicating the probability of surviving an age of x or becoming older than x. From (1.2a) and
(1.4a) we see that the lifetime distribution function and the survival function are complementary functions:

\[ S(x) = 1 - F(x) \quad \text{and} \quad F(x) = 1 - S(x). \tag{1.4b} \]

Thus, S(x) is the probability of exceeding x and F(x) is the probability of not exceeding x, or,
stated for a technical unit, S(x) gives the probability of its functioning at time x and F(x) is
the probability of its being down at time x. PDF and CCDF are related as

\[ f(x) = -\frac{d}{dx} S(x) \quad \text{and} \tag{1.4c} \]
\[ S(x) = \int_x^\infty f(u)\,du. \tag{1.4d} \]

The study of S(x) is at the heart of survival analysis and reliability theory. The survival function
is important in describing systems of components, i.e., in calculating systems’ reliability, see
Sect. 1.1.2.3.
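
The relations (1.4c) and (1.4d) can be checked numerically. The Python sketch below assumes the reduced RAYLEIGH distribution with S(x) = exp(−x²/2) as a test case, uses a composite Simpson rule for the integral and a central difference for the derivative; the helper names and the truncation point `upper = 12.0` are illustrative assumptions.

```python
import math

def pdf(x):
    """Density of the reduced Rayleigh distribution, f(x) = x * exp(-x**2 / 2)."""
    return x * math.exp(-0.5 * x * x)

def survival(x):
    """Survival function S(x) = exp(-x**2 / 2)."""
    return math.exp(-0.5 * x * x)

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

x = 1.5
upper = 12.0  # assumed cut-off; the tail beyond it is negligible for this distribution
tail_integral = simpson(pdf, x, upper)                        # right-hand side of (1.4d)
h = 1e-5
minus_slope = -(survival(x + h) - survival(x - h)) / (2 * h)  # right-hand side of (1.4c)
print(survival(x), tail_integral)   # both approximate S(1.5)
print(pdf(x), minus_slope)          # both approximate f(1.5)
```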
² In this text we always use increasing in the sense of non–decreasing, and decreasing in the sense of
non–increasing.

From (1.4d) we can establish the following relations, simply because PDFs integrate to one:
\[ S(0) = \int_0^\infty f(u)\,du = 1. \tag{1.4e} \]

Furthermore,
\[ S(\infty) = \lim_{x \to \infty} S(x) = \lim_{x \to \infty} \int_x^\infty f(u)\,du = 0. \tag{1.4f} \]
Finally, for $x_u \ge x_l$:
\[ S(x_l) - S(x_u) = \int_{x_l}^{x_u} f(u)\,du \ge 0. \tag{1.4g} \]

These properties establish the following

Theorem 3: The survival function S(x) is monotone and decreasing over its support [0, ∞).
Furthermore, S(x) satisfies S(0) = 1, S(∞) = 0. □

In fact, any monotone decreasing function S(x) with support [0, ∞) and S(0) = 1, S(∞) = 0
is the survival function of some lifetime variate, for an example see Fig. 1/1 further down. The
matching random variable is the one having the PDF $f(x) = -dS(x)/dx$.

The raw moments $\mu'_k = E\left(X^k\right)$; $k = 0, 1, 2, \ldots$; of random lifetime X may be expressed in
terms of its survival function S(·) as stated in the following


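Theorem 4: Let X be a variate with $F(x)$, $S(x) = 1 - F(x)$ and $f(x) = F'(x) = -S'(x)$, all
functions defined on $[a, \infty)$. Then

\[ \mu'_k := E\left(X^k\right) := \int_a^\infty x^k f(x)\,dx = a^k + k \int_a^\infty x^{k-1} S(x)\,dx; \quad k = 0, 1, 2, \ldots; \tag{1.5a} \]

if and only if

\[ \lim_{x \to \infty} \left[ x^{k+1} f(x) \right] = 0 \quad \text{for } k < \infty. \tag{1.5b} \]
□

Proof of Theorem 4: Integrating by parts the last term on the right–hand side of (1.5a) gives

\begin{align*}
\int_a^\infty S(x)\, k\, x^{k-1}\,dx &= \Big[ S(x)\, x^k \Big]_a^\infty - \int_a^\infty \big(-f(x)\big)\, x^k\,dx \\
&= \lim_{x \to \infty} \Big[ S(x)\, x^k \Big] - S(a)\, a^k + \int_a^\infty x^k f(x)\,dx \\
&= \lim_{x \to \infty} \Big[ S(x)\, x^k \Big] - a^k + \mu'_k, \quad \text{because } S(a) = 1.
\end{align*}

Applying L’HOSPITAL’s rule once to the indeterminate form $\lim_{x \to \infty} \left[ S(x)\, x^k \right]$ gives

\begin{align*}
\lim_{x \to \infty} \frac{S(x)}{x^{-k}} &= \lim_{x \to \infty} \frac{-f(x)}{-k\, x^{-k-1}} \\
&= \frac{1}{k} \lim_{x \to \infty} \left[ x^{k+1} f(x) \right] \\
&= 0, \quad \text{because of (1.5b).}
\end{align*}
□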

Two special cases of (1.5a) for a lifetime variate with a = 0 are:

1. the mean, often called mean time to failure and abbreviated by MTTF,
\[ \mu := \mu'_1 := E(X) = \int_0^\infty S(x)\,dx; \tag{1.5c} \]
(Therefore, to find the average lifespan, we integrate the survival function over its support,
i.e., the mean life is equal to the area beneath the survival function.) and

2. the variance
\[ \sigma^2 := \mathrm{Var}(X) = E(X^2) - \mu^2 = 2\int_0^\infty x\,S(x)\,dx - \left[\int_0^\infty S(x)\,dx\right]^2. \tag{1.5d} \]

Example 1/1: PDF, CDF, and CCDF of the linear hazard rate distribution
A distribution with the linear hazard rate h(x) = a + b x; x ≥ 0, a ≥ 0, b > 0; has:
\begin{align*}
f(x) &= (a + b\,x)\,\exp\left(-a\,x - \frac{b}{2}\,x^2\right), \\
F(x) &= 1 - \exp\left(-a\,x - \frac{b}{2}\,x^2\right), \\
S(x) &= \exp\left(-a\,x - \frac{b}{2}\,x^2\right).
\end{align*}

Fig. 1/1 shows the functions f(x), F(x), and S(x) for a = 0 and b = 1, which is nothing but the reduced
Rayleigh distribution, a special case of the Weibull distribution.

Figure 1/1: PDF, CDF, and CCDF of the linear hazard rate distribution with a = 0 and b = 1

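\begin{align*}
x_{\text{mode}} &= -\frac{a}{b} + \sqrt{\frac{1}{b}}, \\
x_P &= -\frac{a}{b} + \sqrt{\left(\frac{a}{b}\right)^2 - \frac{2}{b}\,\ln(1-P)}, \quad 0 \le P \le 1, \\
\mu_r = E(X^r) &= \sum_{i=0}^{\infty} \frac{(-b/2)^i}{i!}\left[\frac{\Gamma(2\,i + r + 1)}{a^{2i+r}} + b\,\frac{\Gamma(2 + r + 2\,i)}{a^{2i+r+2}}\right].
\end{align*}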


In the special case a = 0 we have $\mu_r = \Gamma(1 + r/2)\,(2/b)^{r/2}$; for b = 1 and r = 1 this gives the Rayleigh mean $\sqrt{\pi/2}$.
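The moment formulas (1.5c) and (1.5d) can be checked numerically. The following is a minimal sketch, assuming the reduced Rayleigh case a = 0, b = 1 of Example 1/1; the helper names `survival` and `integrate` and the truncation point `UPPER` are illustrative choices, not part of the text.

```python
import math

def survival(x, a=0.0, b=1.0):
    """S(x) = exp(-a x - b x^2 / 2) for the linear hazard rate h(x) = a + b x."""
    return math.exp(-a * x - 0.5 * b * x * x)

def integrate(func, lower, upper, steps=100_000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width

UPPER = 12.0  # assumed truncation point; S(12) is negligible for a = 0, b = 1

mean = integrate(survival, 0.0, UPPER)                                   # (1.5c)
second_moment = 2.0 * integrate(lambda x: x * survival(x), 0.0, UPPER)
variance = second_moment - mean ** 2                                     # (1.5d)

print(mean, math.sqrt(math.pi / 2.0))    # numerical vs. known Rayleigh mean
print(variance, 2.0 - math.pi / 2.0)     # numerical vs. known Rayleigh variance
```

Both printed pairs should agree to several decimals, which illustrates that the mean is the area beneath the survival function.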

1.1.1.4 The Hazard Rate Function

The reliability (survival) function examines the chance that breakdowns of organisms, of technical
units etc. occur beyond a given point in time. To monitor the lifetime of a unit across the support
of its lifetime distribution, the hazard rate h(x) is used.
In fact, the hazard rate usually is more informative about the underlying mechanism of failure than
the other representatives of a lifetime distribution. For this reason, consideration of the hazard rate
may be the dominant method for summarizing survival data. Cox/Oakes (1984, p. 16) give the
following reasons why consideration of the hazard rate may be a good idea:

“(i) it may be physically enlightening to consider the immediate ‘risk’ attaching to
an individual known to be alive at age t,
(ii) comparison of groups of individuals are sometimes intensively made via the
hazard,
(iii) hazard–based models are often convenient when there is censoring or there are
several types of failure,
(iv) comparison with an exponential distribution is particular simple in terms of the
hazard,
(v) the hazard is the special form for the ‘single failure’ system of the complete
intensity function for more elaborate point processes, i.e., systems in which
several point events can occur for each individual.”

The hazard rate is perhaps the most popular of the six representatives used in modeling and analyzing
lifetime data. This is due to its intuitive interpretation as the amount of risk of failure associated with
a unit at age x. Another reason for its popularity is that it is a special case of the intensity function
for a non–homogeneous Poisson process. A hazard rate function models the occurrence of only
one, namely the first event (= failure), whereas the intensity function models the occurrence of a
sequence of events over time.
The hazard rate goes by several aliases.

• In the engineering sciences it is known as the failure rate.3

• In actuarial science it is known as the force of mortality or force of decrement.

• In vital statistics and in the life sciences it is known as the age–specific death rate.
3
This name gives reason for confusion with the failure density.

• In economics its reciprocal is known as Mills' ratio, see Mills (1926).

• In point process and extreme value theory it is known as the rate function or intensity
function.

The hazard rate can be derived using the concept of conditional probability. Let A and B be two
random events with Pr(A) > 0, then the probability of the conditional event B | A (= event B
happens, given event A has happened) is defined as
\[ \Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}, \tag{1.6a} \]
where A ∩ B means that events A and B happen simultaneously. Now let A := ‘X > x’
(= lifetime is greater than x) and B := ‘X > x + y’ with y ≥ 0, then, evidently, A ∩ B = ‘X > x + y’.
As Pr(A) = Pr(X > x) = S(x) and Pr(A ∩ B) = Pr(B) = Pr(X > x + y) = S(x + y) the
conditional survival probability according to (1.6a) results as
\[ \Pr(X > x + y \mid X > x) = \frac{S(x + y)}{S(x)}. \tag{1.6b} \]

The conditional event ‘X > x + y | X > x’ can be transformed to ‘X − x > y | X > x’ and the
corresponding conditional variate

Y | X > x := X − x | X > x

is called future lifetime or remaining lifetime of an x–survivor.4 We may write (1.6b) as

\[ S(y \mid X > x) = \frac{S(x + y)}{S(x)} \tag{1.6c} \]
which is called conditional survival function. Its complement is the conditional distribution
function
\begin{align}
F(y \mid X > x) &= 1 - \frac{S(x + y)}{S(x)} \notag\\
&= \frac{S(x) - S(x + y)}{S(x)} \notag\\
&= \frac{F(x + y) - F(x)}{1 - F(x)}. \tag{1.6d}
\end{align}
Differentiating (1.6d) with respect to y gives the conditional failure density:
\begin{align}
f(y \mid X > x) &= \frac{dF(y \mid X > x)}{dy} \notag\\
&= \frac{d}{dy}\left[\frac{F(x + y) - F(x)}{1 - F(x)}\right] \notag\\
&= \frac{f(x + y)}{1 - F(x)} = \frac{f(x + y)}{S(x)}. \tag{1.6e}
\end{align}
4
This variate plays a role in Sect. 1.1.1.6 and is discussed in more detail under the heading of truncated lifetime
distributions in Sect. 1.1.2.5.

f(y | X > x) really is a density function, as the two conditions (1.1d,e) are fulfilled:

1. f(y | X > x) ≥ 0, because f(x + y) ≥ 0 and S(x) > 0.

2. \begin{align*}
\int_0^\infty f(y \mid X > x)\,dy &= \frac{1}{S(x)}\int_0^\infty f(x + y)\,dy \\
&= \frac{1}{S(x)}\int_x^\infty f(u)\,du, \quad u = x + y \\
&= \frac{1}{S(x)}\,S(x) = 1.
\end{align*}

For small ∆ we have
\[ \Pr(x < X \le x + \Delta \mid X > x) \approx f(\Delta \mid X > x)\,\Delta = \frac{f(x + \Delta)}{S(x)}\,\Delta. \tag{1.7a} \]
This is an approximation of an x–survivor’s chance to fail within the small time span ∆ adjacent
to x. Now, the hazard rate follows from (1.6e) and (1.7a) with ∆ → 0:
\begin{align}
h(x) &= \lim_{\Delta\to 0} f(\Delta \mid X > x) \notag\\
&= \lim_{\Delta\to 0} \frac{f(x + \Delta)}{S(x)} \notag\\
&= \frac{f(x)}{S(x)}, \quad S(x) > 0. \tag{1.7b}
\end{align}

In other words, for a small increment in time, ∆, the conditional probability that an x–survivor
fails in the time interval (x, x + ∆] is roughly equal to the product h(x) ∆. Another possible
interpretation of h(x) is the rate at which failures occur per unit of time relative to the portion of
the population which has not yet failed.
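The approximation Pr(x < X ≤ x + ∆ | X > x) ≈ h(x) ∆ can be illustrated numerically. The sketch below assumes a linear hazard rate h(x) = a + b x with the illustrative values a = 0.5, b = 1 and the age x = 2; these numbers and the function names are assumptions made only for this example.

```python
import math

A, B = 0.5, 1.0  # assumed parameters of the linear hazard rate h(x) = a + b x

def survival(x):
    """S(x) = exp(-a x - b x^2 / 2)."""
    return math.exp(-A * x - 0.5 * B * x * x)

def hazard(x):
    """h(x) = a + b x."""
    return A + B * x

def conditional_failure_prob(x, delta):
    """Pr(x < X <= x + delta | X > x) = 1 - S(x + delta) / S(x), from (1.6b)."""
    return 1.0 - survival(x + delta) / survival(x)

age = 2.0
for delta in (0.1, 0.01, 0.001):
    exact = conditional_failure_prob(age, delta)
    approx = hazard(age) * delta
    print(f"delta={delta:<6} exact={exact:.6f}  h(x)*delta={approx:.6f}")
```

The two columns approach each other as ∆ shrinks, in line with (1.7a) and (1.7b).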
When we want to predict the chance of failure at age x for a newly born or produced unit having
F (x) as its CDF we have to use f (x), i.e., f (x) is an unconditional predictor for risk to fail at
x. When we know that a unit has survived up to x, we have to use h(x) which is a conditional
predictor. Comparing numerically f (x) to h(x) we notice:

• f(0) = h(0), because S(0) = 1,

• f(x) ≤ h(x) ∀ x > 0, because h(x) = f(x)/S(x) and S(x) ≤ 1 ∀ x > 0.

There is a fundamental difference between the hazard rate function h(x) and the conditional
failure density f (y | X > x).

1. h(x) is a function of x, the age reached, whereas f (y | X > x) is a function of the future
lifetime y following a given age x.

2. Both h(x) and f(y | X > x) are non–negative, but h(x) is not a density function as it is
not normalized; instead we have $\int_0^\infty h(x)\,dx = \infty$, see Theorem 5.

Summarizing we can state the following


Theorem 5: A function h(x) is a HR if and only if it satisfies the following properties:
\[ 1.\quad h(x) \ge 0 \quad \forall\; x \ge 0, \tag{1.7c} \]
\[ 2.\quad \int_0^\infty h(x)\,dx = \infty. \tag{1.7d} \]

Proof of Theorem 5: The properties necessarily hold since

1. f(x) ≥ 0 and S(x) > 0, thus h(x) = f(x)/S(x) ≥ 0,

2. \begin{align*}
\int_0^\infty h(x)\,dx &= -\int_0^\infty d\,\ln S(x), \quad\text{see (1.8c)} \\
&= -\ln S(x)\Big|_0^\infty \\
&= \ln S(0) - \ln S(\infty) \\
&= \ln 1 - \ln 0 \\
&= \infty.
\end{align*}

The hazard rate measures the propensity to fail or to die depending on the age reached and it thus
plays a key role in characterizing the process of aging and in classifying lifetime distributions, see
Sect. 2. Generally, HR more precisely describes the stochastic regularity of the variate ‘lifetime’
than the positively skewed course of PDF or the monotone courses of CDF or CCDF. We will
distinguish between

• monotone hazard rates, either increasing, when the unit is wearing out with age, or decreas-
ing, when the unit is improving with age, and
• non–monotone hazard rates either U–shaped (= bathtub–shaped) as, e.g., is the case with
the age–specific death rate in human life tables, or having any other non–monotone course,
e.g., an inverted bathtub–shape.

It is easily possible to express the hazard rate of a population by its PDF, CDF, and CCDF.5
\begin{align}
h(x) &= \frac{f(x)}{\int_x^\infty f(u)\,du}, \tag{1.8a}\\
&= \frac{F'(x)}{1 - F(x)}, \tag{1.8b}\\
&= -\frac{S'(x)}{S(x)} = -\frac{d\,\ln S(x)}{dx}. \tag{1.8c}
\end{align}
Conversely, we may write the PDF, CDF, and CCDF of a population in terms of its HR. Integrating
in (1.8c) yields
\begin{align}
\int_0^x h(u)\,du &= -\ln S(u)\Big|_0^x \notag\\
&= -\ln S(x) + \ln S(0) \notag\\
&= -\ln S(x), \quad\text{because } S(0) = 1. \tag{1.9a}
\end{align}
5
It is also possible to write the moments $E(X^k)$ in terms of the hazard rate, see Muth (1974).

Upon exponentiating (1.9a) turns into
\[ S(x) = \exp\left\{-\int_0^x h(u)\,du\right\}, \tag{1.9b} \]
so that
\[ F(x) = 1 - \exp\left\{-\int_0^x h(u)\,du\right\}. \tag{1.9c} \]
Finally, differentiating (1.9c) yields f(x) in terms of h(x):
\begin{align}
f(x) = \frac{dF(x)}{dx} &= \frac{d}{dx}\left[1 - \exp\left\{-\int_0^x h(u)\,du\right\}\right] \notag\\
&= h(x)\,\exp\left\{-\int_0^x h(u)\,du\right\}. \tag{1.9d}
\end{align}

Excursus: Defining distributions by their hazard rate function


Applying (1.9b–d) we want to see what distribution results from four different models of the hazard rate.

1. The constant hazard rate model
From
\[ h(x) = \lambda \quad \forall\; x \ge 0,\ \lambda > 0, \]
we find
\begin{align*}
f(x) &= \lambda\,\exp\left\{-\int_0^x \lambda\,du\right\} = \lambda\,e^{-\lambda x}, \\
F(x) &= 1 - \exp\left\{-\int_0^x \lambda\,du\right\} = 1 - e^{-\lambda x}, \\
S(x) &= \exp\left\{-\int_0^x \lambda\,du\right\} = e^{-\lambda x}.
\end{align*}
Thus, the constant hazard rate model gives the exponential distribution.

2. The linear hazard rate model
From
\[ h(x) = a + b\,x \quad \forall\; x \ge 0,\ a \ge 0,\ b > 0, \]
we find, see Example 1/1:
\begin{align*}
f(x) &= (a + b\,x)\,\exp\left(-a\,x - \frac{b}{2}\,x^2\right), \\
F(x) &= 1 - \exp\left(-a\,x - \frac{b}{2}\,x^2\right), \\
S(x) &= \exp\left(-a\,x - \frac{b}{2}\,x^2\right).
\end{align*}
For a = 0 this is a Rayleigh distribution.



3. The power hazard rate model
The HR
\[ h(x) = c\,x^{c-1} \quad \forall\; x \ge 0,\ c > 0, \]
leads to
\begin{align*}
f(x) &= c\,x^{c-1}\,\exp(-x^c), \\
F(x) &= 1 - \exp(-x^c), \\
S(x) &= \exp(-x^c).
\end{align*}
This is the reduced Weibull distribution, see Rinne (2009).

4. The exponential hazard rate model
Setting
\[ h(x) = e^x, \quad x \ge 0, \]
gives, because $\int_0^x e^u\,du = e^x - 1$,
\begin{align*}
f(x) &= e^x\,\exp\{-e^x + 1\}, \\
F(x) &= 1 - \exp\{-e^x + 1\}, \\
S(x) &= \exp\{-e^x + 1\}.
\end{align*}
This is recognized as a Gompertz distribution.
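The construction of a distribution from its hazard rate via (1.9b) and (1.9d) can be sketched numerically. The code below is only an illustration: it integrates h(u) with a midpoint rule and compares the result with the closed-form survival functions of the constant, linear, and power hazard rate models; the parameter values and all function names are assumptions chosen for the example.

```python
import math

def cumulative_hazard(hazard, x, steps=20_000):
    """Integral of h(u) over [0, x], approximated with the midpoint rule."""
    width = x / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width

def survival_from_hazard(hazard, x):
    """S(x) = exp{-integral of h}, see (1.9b)."""
    return math.exp(-cumulative_hazard(hazard, x))

def density_from_hazard(hazard, x):
    """f(x) = h(x) exp{-integral of h}, see (1.9d)."""
    return hazard(x) * survival_from_hazard(hazard, x)

LAM, A, B, C = 0.7, 0.5, 1.0, 2.5  # assumed illustrative parameter values

# Each entry pairs a hazard rate with the closed-form survival function.
models = {
    "constant": (lambda u: LAM, lambda x: math.exp(-LAM * x)),
    "linear": (lambda u: A + B * u, lambda x: math.exp(-A * x - 0.5 * B * x * x)),
    "power": (lambda u: C * u ** (C - 1.0), lambda x: math.exp(-x ** C)),
}

for name, (hazard, closed_form) in models.items():
    x = 1.3
    print(name, survival_from_hazard(hazard, x), closed_form(x))
```

For each model the numerically integrated survival probability should match the closed form up to the discretization error of the midpoint rule.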


Based on h(x) = f(x)/S(x) it appears that the approximate unconditional probability of failure
in (x, x + dx], Pr(x < X ≤ x + dx) ≈ f(x) dx, is equal to the product of the probability of
surviving beyond x and the approximate conditional probability of failure in (x, x + dx]:
\[ f(x)\,dx = S(x)\,h(x)\,dx, \]
from which we can write the survival probability as
\[ S(x) = \int_x^\infty f(u)\,du = \int_x^\infty S(u)\,h(u)\,du, \]
expressing the survival probability in terms of the future lifetime. Looking at
$S(x) = \exp\left\{-\int_0^x h(u)\,du\right\}$ in (1.9b) we see the survival probability expressed in terms of the past
lifetime.

Excursus: Reversed hazard rate


The reversed (reverse) hazard rate, also named retro hazard, was first mentioned by the name ‘dual of
the hazard rate’ in Barlow et al. (1963). The name ‘reversed hazard rate’ was first used by Lagakos et
al. (1988).6 It extends the concept of hazard rate to a reverse time direction and is defined as:
\[ rh(x) := \lim_{\Delta\to 0}\frac{\Pr(x - \Delta < X \le x \mid X \le x)}{\Delta} = \frac{f(x)}{F(x)}, \quad F(x) > 0. \tag{1.10a} \]

(1.10a) can be derived along the lines of (1.6a) – (1.7b) with A := ‘X ≤ x’ and B := ‘X ≤ x − y’, so that
A ∩ B = ‘X ≤ x − y’. From (1.10a) it is seen that rh(x) describes the probability of an immediate past
failure, given that the unit has already failed at time x, as opposed to the immediate future failure, given
that the unit has not failed at time x, described by h(x).
6
Newer papers on the topic are: Chandra/Roy (2001, 2005), Gupta/Nanda (2001), Kundu et al. (2009),
Nanda/Gupta (2001, 2004), and Sankaran et al. (2007).
h(x) and rh(x) are related as
\[ rh(x) = h(x)\,\frac{S(x)}{1 - S(x)}, \quad S(x) < 1, \tag{1.10b} \]
\[ h(x) = rh(x)\,\frac{F(x)}{1 - F(x)}, \quad F(x) < 1. \tag{1.10c} \]
Both rates are equal to one another for $x = x_{0.5}$, otherwise we have
\[ h(x) \begin{cases} < rh(x) & \text{for } x < x_{0.5}, \\ > rh(x) & \text{for } x > x_{0.5}. \end{cases} \tag{1.10d} \]
The reversed hazard rate may be expressed by the PDF, CDF, and CCDF of X as
\[ rh(x) = \frac{f(x)}{\int_0^x f(u)\,du} = \frac{F'(x)}{F(x)} = \frac{-S'(x)}{1 - S(x)}. \tag{1.10e} \]

It is also possible to express the PDF, CDF, and CCDF in terms of the reversed hazard rate. From (1.10a)
we have
\[ rh(x) = \frac{d\,\ln F(x)}{dx}. \tag{1.10f} \]
Integrating (1.10f) yields
\begin{align}
\int_x^\infty rh(u)\,du &= \ln F(u)\Big|_x^\infty \notag\\
&= -\ln F(x), \quad\text{because } F(\infty) = 1. \tag{1.10g}
\end{align}
Upon exponentiating (1.10g) turns into
\[ F(x) = \exp\left\{-\int_x^\infty rh(u)\,du\right\}, \tag{1.10h} \]
so that
\[ S(x) = 1 - \exp\left\{-\int_x^\infty rh(u)\,du\right\} \tag{1.10i} \]
and
\begin{align}
f(x) = \frac{dF(x)}{dx} &= \frac{d}{dx}\exp\left\{-\int_x^\infty rh(u)\,du\right\} \notag\\
&= rh(x)\,\exp\left\{-\int_x^\infty rh(u)\,du\right\}. \tag{1.10j}
\end{align}
In the previous excursus we have seen that the exponential distribution has a constant hazard rate, but the
reversed hazard of that distribution is decreasing. From (1.10h–j) we see that we cannot find a distribution
defined on [0, ∞) having a constant reversed hazard rate, but the reflected exponential distribution
defined on (−∞, 0] with
\begin{align*}
f(x) &= \lambda\,e^{\lambda x}; \quad x \le 0,\ \lambda > 0; \\
F(x) &= e^{\lambda x}
\end{align*}
has a constant reversed hazard rate: rh(x) = λ.
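The relation between h(x) and rh(x) can be made concrete for the exponential distribution. The following sketch assumes a rate of 1.5 (an arbitrary illustrative value) and checks that the hazard rate is constant, that the reversed hazard rate is decreasing, and that both coincide at the median.

```python
import math

RATE = 1.5  # assumed rate of the exponential distribution

def pdf(x):
    return RATE * math.exp(-RATE * x)

def cdf(x):
    return 1.0 - math.exp(-RATE * x)

def hazard(x):
    """h(x) = f(x) / [1 - F(x)], see (1.8b)."""
    return pdf(x) / (1.0 - cdf(x))

def reversed_hazard(x):
    """rh(x) = f(x) / F(x), see (1.10a)."""
    return pdf(x) / cdf(x)

median = math.log(2.0) / RATE

for x in (0.5 * median, median, 2.0 * median):
    print(f"x={x:.4f}  h={hazard(x):.4f}  rh={reversed_hazard(x):.4f}")
```

Below the median the printed rh exceeds h, above the median it is smaller.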



1.1.1.5 The Cumulative Hazard Rate Function

The cumulative hazard rate or integrated hazard rate CHR is defined as
\[ H(x) := \int_0^x h(u)\,du, \tag{1.11a} \]

and satisfies three conditions:

1. H(0) = 0,

2. limx→∞ H(x) = ∞,

3. H(x) is increasing (= non–decreasing).

From (1.9b) and (1.11a) we easily find
\[ S(x) = \exp\{-H(x)\} \tag{1.11b} \]
and furthermore
\[ F(x) = 1 - \exp\{-H(x)\}, \tag{1.11c} \]
\[ f(x) = -\frac{d\,\exp\{-H(x)\}}{dx}. \tag{1.11d} \]
Vice versa we have the following relations between PDF, CDF, and CCDF on the one hand and
CHR on the other hand:
\begin{align}
H(x) &= -\ln\left\{\int_x^\infty f(u)\,du\right\}, \tag{1.11e}\\
&= -\ln S(x), \tag{1.11f}\\
&= -\ln[1 - F(x)]. \tag{1.11g}
\end{align}

But what is the meaning of H(x)? — Whereas h(x) ∆ can be given an intuitive interpretation as
Pr(x < X ≤ x + ∆ | X > x), H(x) cannot. H(x) is not the sum or the integral of conditional
probabilities because the conditioning event changes with x, and there is no law of probability
leading to H(x). Thus, H(x) does not have a probabilistic connotation. Yet H(x) plays a key
role in reliability and survival analysis, because of the exponentiation formula (1.11b) which says
that with H(x) specified we have

\[ \Pr(X > x) = e^{-H(x)}, \quad x \ge 0. \tag{1.12} \]

Excursus: Hazard potential


Based on (1.12) Singpurwalla (2006) introduced a new notion, the hazard potential. Turning to the
right–hand side of (1.12), we note that $e^{-H(x)}$ is the survival function of an exponentially distributed
variate, say Z, with scale parameter equal to one, evaluated at H(x), i.e.:
\[ \Pr(X > x) = e^{-H(x)} = \Pr\big[Z > H(x)\big]. \tag{1.13} \]
To appreciate the physical connotation of (1.13), we note that because of
\[ \Pr(X \le x) = \Pr\big[Z \le H(x)\big], \]
we may claim that the time to failure, X, of a unit coincides with the time at which its cumulative hazard
H(x) crosses a random threshold Z, where Z has an exponential distribution with scale parameter equal to
one, i.e., $X = H^{-1}(Z)$. The random threshold Z, where Z = H(X), is defined as the hazard potential
of the unit. We may interpret Z as an unknown resource with which the unit is endowed at the time of its
inception. With Z considered a resource, H(x) can be interpreted as the amount of resource consumed by
time x and the HR, h(x) = dH(x)/dx, can be considered the rate at which this resource is consumed.
The unit fails when this resource becomes depleted. The term ‘potential’ refers to a feature parallel to
that of life potential, see Sect. 1.1.2.6. The difference here is that we are alluding to a unit’s resistance to
failure rather than its capacity for work.
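The representation $X = H^{-1}(Z)$ with a unit exponential Z also gives a simple way to simulate lifetimes. The sketch below assumes the linear hazard rate with a = 0 and b = 1, so that $H(x) = x^2/2$ and $H^{-1}(z) = \sqrt{2z}$; the sample size, the seed, and the function names are arbitrary illustrative choices.

```python
import math
import random

def inverse_cumulative_hazard(z, b=1.0):
    """H^{-1}(z) for H(x) = b x^2 / 2 (linear hazard rate with a = 0)."""
    return math.sqrt(2.0 * z / b)

def draw_lifetimes(n, seed=12345):
    """Draw hazard potentials Z ~ Exp(1) and map them to lifetimes X = H^{-1}(Z)."""
    rng = random.Random(seed)
    return [inverse_cumulative_hazard(rng.expovariate(1.0)) for _ in range(n)]

lifetimes = draw_lifetimes(200_000)
sample_mean = sum(lifetimes) / len(lifetimes)
share_beyond_one = sum(x > 1.0 for x in lifetimes) / len(lifetimes)

print(sample_mean, math.sqrt(math.pi / 2.0))   # sample mean vs. E(X)
print(share_beyond_one, math.exp(-0.5))        # empirical vs. exact S(1)
```

Up to sampling error the simulated lifetimes reproduce the mean and the survival probability of the reduced Rayleigh distribution.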

Another possibility giving insight into (1.12) is the provision of an indifference principle for
reliability and survival analysis. Corresponding to every non–negative variate X having an
absolutely continuous survival function S(x) = Pr(X > x), there exists a variate Z taking values
H(x), 0 ≤ H(x) < ∞, whose survival function is an exponential with scale parameter equal
to one. The survival function of X is indexed by x, x ≥ 0, whereas that of Z is indexed by
$H(x) = -\int_0^x dS(u)\big/S(u)$.

We finally introduce the notion hazard quantile, denoted $x^H_\Lambda$ and defined by
\[ H(x^H_\Lambda) = \Lambda, \quad \Lambda \ge 0, \quad\text{or} \tag{1.14a} \]
\[ x^H_\Lambda = H^{-1}(\Lambda), \quad \Lambda \ge 0. \tag{1.14b} \]
The hazard quantile plays a role in hazard plotting and in designing hazard papers, see Sections 9.2
and 9.3. Based on (1.11c,g) we see that the ordinary quantile or percentile $x_P$, 0 ≤ P ≤ 1, and the
hazard quantile $x^H_\Lambda$, Λ ≥ 0, are linked as
\[ x_P = x^H_{-\ln(1-P)}, \quad\text{and} \tag{1.14c} \]
\[ x^H_\Lambda = x_{1-\exp(-\Lambda)}. \tag{1.14d} \]
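The link between percentiles and hazard quantiles can be sketched for the linear hazard rate distribution, for which $H(x) = a\,x + b\,x^2/2$ can be inverted in closed form. The parameter values and function names below are assumptions made for illustration only.

```python
import math

def cumulative_hazard(x, a, b):
    """H(x) = a x + b x^2 / 2 for the linear hazard rate h(x) = a + b x."""
    return a * x + 0.5 * b * x * x

def hazard_quantile(level, a, b):
    """Positive root of H(x) = level, i.e. the hazard quantile of (1.14b)."""
    return (-a + math.sqrt(a * a + 2.0 * b * level)) / b

def percentile(p, a, b):
    """Ordinary percentile obtained from the hazard quantile as in (1.14c)."""
    return hazard_quantile(-math.log(1.0 - p), a, b)

def cdf(x, a, b):
    """F(x) = 1 - exp{-H(x)}, see (1.11c)."""
    return 1.0 - math.exp(-cumulative_hazard(x, a, b))

a, b = 0.5, 1.0  # assumed illustrative parameters
for p in (0.1, 0.5, 0.9):
    x_p = percentile(p, a, b)
    print(p, x_p, cdf(x_p, a, b))  # the last column should reproduce p
```

Evaluating the CDF at the computed percentile returns the probability level, which confirms that the hazard level Λ = −ln(1 − P) and the probability level P address the same point of the distribution.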

1.1.1.6 Mean Residual Life Function7

In Sect. 1.1.1.4 we have introduced the conditional lifetime variate Y | X > x := X − x | X > x,
called future lifetime or remaining lifetime of an x–survivor. The pertaining PDF reads

\[ f(y \mid X > x) = \frac{f(x + y)}{S(x)}, \quad y \ge 0. \]
The mean of this variate, denoted µ(x),8 as it depends on the age reached, is called mean residual
life (MRL):
\begin{align}
\mu(x) &= E(Y \mid X > x) \notag\\
&= \frac{1}{S(x)}\int_0^\infty y\,f(x + y)\,dy \notag\\
&= \frac{1}{S(x)}\int_x^\infty (u - x)\,f(u)\,du, \quad x + y = u, \notag\\
&= \frac{1}{S(x)}\left[\int_x^\infty u\,f(u)\,du - x\int_x^\infty f(u)\,du\right] \notag\\
&= \frac{1}{S(x)}\int_x^\infty u\,f(u)\,du - x. \tag{1.15a}
\end{align}
7
Suggested reading for the section: Guess/Proschan (1988), Muth (1980), Oakes/Dasu (1990), Swartz
(1973).
8
In actuarial science and life tables it is denoted $\mathring{e}_x$, see Sect. 6.1.

Upon application of Theorem 4 to (1.15a) we find
\[ \mu(x) = \frac{1}{S(x)}\int_x^\infty S(u)\,du. \tag{1.15b} \]

For x = 0 we have the unconditional mean life (MTTF):

µ(0) = µ = E(X). (1.15c)

Looking at (1.15b) we see that µ(x) is the area beneath the survival function to the right of x
divided by the ordinate S(x) at x, corresponding to the fraction surviving x.
The mean residual life µ(x) must not be confused with the mean age of an x–survivor:
\[ E(X \mid X > x) = \frac{1}{S(x)}\int_x^\infty u\,f(u)\,du. \tag{1.16a} \]

From (1.15a) we see that both means are related as

µ(x) = E(X | X > x) − x. (1.16b)

Example 1/2: HR, CHR, and MRL of the linear hazard rate distribution

Figure 1/2: HR, CHR, and MRL of the linear hazard rate distribution with a = 0 and b = 1

For the linear hazard rate distribution defined in Example 1/1 we have
\begin{align*}
h(x) &= a + b\,x, \\
H(x) &= a\,x + \frac{b}{2}\,x^2, \\
\mu(x) &= \frac{\exp\left(\dfrac{a^2}{2b}\right)\sqrt{\dfrac{\pi}{2b}}\left[1 - \operatorname{erf}\left(\sqrt{\dfrac{(a + b\,x)^2}{2b}}\right)\right]}{\exp\left(-a\,x - \dfrac{b}{2}\,x^2\right)}.
\end{align*}
The following theorem gives the properties of MRL, see Swartz (1973).
Theorem 6: If µ(x) is the MRL of a survival function S(x) with finite mean E(X) = µ then:
\begin{align}
&1)\quad \mu(x) \ge 0 \quad \forall\; x \ge 0, \tag{1.17a}\\
&2)\quad \mu(0) = E(X), \tag{1.17b}\\
&3)\quad \text{if } S(x) \text{ is absolutely continuous, then } \mu'(x) \ge -1, \tag{1.17c}\\
&4)\quad \int_0^\infty \frac{1}{\mu(x)}\,dx \ \text{diverges}, \tag{1.17d}\\
&5)\quad S(x) = \frac{\mu(0)}{\mu(x)}\,\exp\left\{-\int_0^x \frac{1}{\mu(u)}\,du\right\}. \tag{1.17e}
\end{align}

Proof of Theorem 6: It is fairly obvious why property 1) would be necessary since MRL is a
conditional expectation of a non–negative variate. Part 1) further follows from (1.15b) because
S(x) ≥ 0 ∀ x ≥ 0.
2) follows from (1.15b) with (1.5c) observing S(0) = 1.
To prove 3) we take a closer look at the derivative of µ(x), starting with (1.15b):9
\begin{align}
\mu'(x) &= \frac{-S^2(x) + f(x)\int_x^\infty S(u)\,du}{S^2(x)} \notag\\
&= h(x)\,\mu(x) - 1. \tag{1.18}
\end{align}
As h(x) and µ(x) are non–negative we have µ'(x) ≥ −1.


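For showing 4) and 5) we once more begin with (1.15b), then by simplifying the expression for
−1/µ(x) we find that
\begin{align}
-\frac{1}{\mu(x)} &= \frac{-S(x)}{\int_x^\infty S(u)\,du} \notag\\
&= \frac{\dfrac{d}{dx}\displaystyle\int_x^\infty S(u)\,du}{\displaystyle\int_x^\infty S(u)\,du} \notag\\
&= \frac{d}{dx}\,\ln\left[\int_x^\infty S(u)\,du\right]. \tag{1.19a}
\end{align}
Integrate each side of (1.19a) between 0 and z to obtain
\begin{align}
-\int_0^z \frac{1}{\mu(x)}\,dx &= \int_0^z d\,\ln\left[\int_x^\infty S(u)\,du\right] \tag{1.19b}\\
&= \ln\int_z^\infty S(u)\,du - \ln\int_0^\infty S(u)\,du \tag{1.19c}\\
&= \ln\int_z^\infty S(u)\,du - \ln\mu(0). \tag{1.19d}
\end{align}
9
By differentiating the numerator and denominator of $\mu(x) = \int_x^\infty S(u)\,du \big/ S(x)$ it can be shown that
$\lim_{x\to\infty}\mu(x) = \lim_{x\to\infty}\left[-\frac{d}{dx}\ln f(x)\right]^{-1}$.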

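The limit of (1.19d) for z → ∞, after multiplication by (−1), is
\begin{align*}
\lim_{z\to\infty}\int_0^z \frac{1}{\mu(x)}\,dx &= \lim_{z\to\infty}\left[\ln\mu(0) - \ln\int_z^\infty S(u)\,du\right] \\
&= \ln\mu(0) - \ln\left[\lim_{z\to\infty}\int_z^\infty S(u)\,du\right] \\
&= \ln\mu(0) - \ln 0, \quad\text{as } S(\infty) = 0 \\
&= \ln\mu(0) + \infty.
\end{align*}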

Thus 4) has been proven.


Exponentiating each side of (1.19d), with z replaced by x, and using (1.15b) gives
\[ \exp\left\{-\int_0^x \frac{1}{\mu(u)}\,du\right\} = \frac{\int_x^\infty S(u)\,du}{\mu(0)} = \frac{\mu(x)}{\mu(0)}\,S(x). \tag{1.19e} \]
Finally, 5) follows from multiplying each side of (1.19e) by µ(0)/µ(x).
(1.17e) is known as the inversion formula10
which serves as a starting point in expressing the
other four representatives of a lifetime distribution in terms of µ(x):
\begin{align}
f(x) &= \frac{1 + \mu'(x)}{\mu^2(x)}\,\mu(0)\,\exp\left\{-\int_0^x \frac{1}{\mu(u)}\,du\right\}, \tag{1.20a}\\
&= \frac{1 + \mu'(x)}{\mu(x)}\,\exp\left\{-\int_0^x \frac{1 + \mu'(u)}{\mu(u)}\,du\right\}, \tag{1.20b}\\
F(x) &= 1 - \frac{\mu(0)}{\mu(x)}\,\exp\left\{-\int_0^x \frac{1}{\mu(u)}\,du\right\}, \tag{1.20c}\\
h(x) &= \frac{1 + \mu'(x)}{\mu(x)}, \tag{1.20d}\\
H(x) &= \ln\left[\frac{\mu(x)}{\mu(0)}\right] + \int_0^x \frac{1}{\mu(u)}\,du, \tag{1.20e}\\
&= \int_0^x \frac{1 + \mu'(u)}{\mu(u)}\,du. \tag{1.20f}
\end{align}
10
Meilijson (1972) gives another proof of this formula based on the Laplace transform.

Example 1/3: Finding PDF, CDF, CCDF, HR, and CHR from a given MRL

What are the five representatives of a lifetime distribution when its MRL is given by
µ(x) = a + b x; x ≥ 0, a > 0, b > 0?
The resulting distribution may be called linear mean residual lifetime distribution.11 Applying (1.17e)
and (1.20a–f) we find after some manipulation:
\begin{align*}
h(x) &= \frac{1 + b}{a + b\,x}, \\
H(x) &= \frac{1 + b}{b}\,\ln\left(\frac{a + b\,x}{a}\right), \\
S(x) &= \frac{a}{a + b\,x}\left(\frac{a}{a + b\,x}\right)^{1/b}, \\
F(x) &= 1 - \frac{a}{a + b\,x}\left(\frac{a}{a + b\,x}\right)^{1/b}, \\
f(x) &= \frac{a\,(1 + b)\left(\dfrac{a}{a + b\,x}\right)^{1/b}}{(a + b\,x)^2}.
\end{align*}
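The results of Example 1/3 can be checked numerically by recovering the hazard rate from the survival function and comparing it with (1.20d). The sketch below assumes a = 2 and b = 0.5 as illustrative values; the finite-difference step and all function names are likewise assumptions.

```python
import math

A, B = 2.0, 0.5  # assumed parameters of the linear MRL mu(x) = a + b x

def mrl(x):
    return A + B * x

def survival(x):
    """S(x) of Example 1/3, written as a single power: [a / (a + b x)]^(1 + 1/b)."""
    return (A / (A + B * x)) ** (1.0 + 1.0 / B)

def hazard_from_mrl(x):
    """h(x) = [1 + mu'(x)] / mu(x) with mu'(x) = b, see (1.20d)."""
    return (1.0 + B) / mrl(x)

def hazard_from_survival(x, eps=1e-6):
    """h(x) = -d ln S(x) / dx, see (1.8c), via a central difference."""
    return -(math.log(survival(x + eps)) - math.log(survival(x - eps))) / (2.0 * eps)

def tail_area(x):
    """Integral of S over [x, infinity), which is available in closed form here."""
    return A ** (1.0 + 1.0 / B) * (A + B * x) ** (-1.0 / B)

for x in (0.5, 1.0, 4.0):
    print(x, hazard_from_mrl(x), hazard_from_survival(x), tail_area(x) / survival(x), mrl(x))
```

The second and third columns agree, and the last two columns agree as well, i.e., dividing the tail area by S(x) returns the linear MRL as required by (1.15b).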

MRL may also be written in terms of PDF, CDF, CCDF, HR, and CHR:12
\begin{align}
\mu(x) &= \frac{\int_0^\infty u\,f(x + u)\,du}{\int_x^\infty f(u)\,du}, \tag{1.21a}\\
&= \frac{\int_x^\infty u\,f(u)\,du}{\int_x^\infty f(u)\,du} - x, \tag{1.21b}\\
\mu(x) &= \frac{\int_x^\infty \big[1 - F(u)\big]\,du}{1 - F(x)}, \tag{1.21c}\\
\mu(x) &= \frac{\int_x^\infty S(u)\,du}{S(x)}, \tag{1.21d}\\
\mu(x) &= \frac{\int_x^\infty \exp\left\{-\int_0^z h(u)\,du\right\}dz}{\exp\left\{-\int_0^x h(u)\,du\right\}}, \tag{1.21e}\\
\mu(x) &= \frac{\int_x^\infty \exp\{-H(u)\}\,du}{\exp\{-H(x)\}} = \int_0^\infty \exp\big\{H(x) - H(u + x)\big\}\,du. \tag{1.21f}
\end{align}
11
Barlow/Proschan (1975) have shown that any mixture of exponential distributions yields a distribution with
decreasing HR which (see Sect. 2.3) is equivalent to an increasing MRL. Based on this result Morrison
(1978) proved that when taking the gamma as the mixing distribution the result is a distribution with a linearly
increasing MRL which can be identified as the Pareto distribution of the second kind.
12
Note that it is possible for the MRL to exist but for the hazard rate function not to exist and vice versa. If, e.g.,
we modify the Cauchy distribution to a half–Cauchy distribution having $f(x) = 2\big/\big[\pi\,(1 + x^2)\big]$, x ≥ 0,
the MRL does not exist whereas $h(x) = 2\big/\big[(1 + x^2)\,(\pi - 2\arctan x)\big]$.

Guess/Proschan (1978) stated several bounds for MRL depending on the moments, the CDF
and the percentile function of X. Gupta (1981) showed how to express the moments of X in
terms of the mean residual lifetime. He also stated that MRL is the reciprocal of the hazard rate
of the asymptotic forward and backward recurrence times of a renewal process. Both recurrence
times have the same asymptotic distribution with PDF
\[ f^*(x) = \frac{1 - F(x)}{E(X)} \tag{1.22a} \]
and CCDF
\[ S^*(x) = \frac{1}{E(X)}\int_x^\infty S(u)\,du. \tag{1.22b} \]
The corresponding HR is
\[ h^*(x) = \frac{f^*(x)}{S^*(x)} \]
which upon inserting (1.22a,b) and taking the reciprocal gives
\[ \frac{S^*(x)}{f^*(x)} = \frac{1}{S(x)}\int_x^\infty S(u)\,du = \mu(x). \tag{1.22c} \]

Two survival functions $S_0(x)$ and $S_1(x)$ are said to have proportional mean residual life if
\[ \mu_1(x) = \Theta\,\mu_0(x) \quad \forall\; x \ge 0 \text{ and } \Theta > 0, \tag{1.23a} \]
where $\mu_0(x)$ and $\mu_1(x)$ are the respective mean residual lives at time x. It can be shown that if
$S_0(x)$ and $S_1(x)$ have proportional mean residual life, then
\[ S_1(x) = S_0(x)\left[\frac{\int_x^\infty S_0(u)\,du}{\mu_0(0)}\right]^{1/\Theta - 1}. \tag{1.23b} \]

The hazard rate and the mean residual life are conditional concepts, both are conditioned on
survival to time x. An essential difference between HR and MRL is that the former accounts only
for the immediate future in assessing the event ‘unit failure’, whereas the latter accounts for the
whole future. This is readily seen if we multiply both h(x) and µ(x) by S(x):
\[ h(x)\,S(x) = -\frac{dS(x)}{dx}, \tag{1.24a} \]
\[ \mu(x)\,S(x) = \int_x^\infty S(u)\,du. \tag{1.24b} \]

The right–hand side of (1.24a) depends on the probability law at the point x only, whereas the
right–hand side of (1.24b) depends on the probability law of X at all points in (x, ∞). This
explains the difference between the two. Both MRL and HR are needed in practice. In
theory we define classes of distributions depending on the behavior of MRL and HR, see
Chapter 2. The MRL function has a wide range of applications. For example, Watson/Wells
(1961) use MRL in studying burn–in. Actuaries apply MRL to setting rates of benefits for life
insurance. Distributions with increasing MRL have been found useful as models in the social
sciences for the duration of wars and strikes or of jobs, a phenomenon called ‘inertia’.
In Tab. 1/1 we have summarized the most important relationships between the six representatives
of the lifetime distribution scattered in this and the preceding sections. The table has been ar-
ranged as an input–output table showing how to switch over from one representative to another
one.

1.1.2 The Hazard Rate for Special Cases


Assuming a continuous variate we will give results on the behavior of the hazard rate when we
apply special operations to the variate and its distribution. The results of this section may be
generalized to the discrete and the multivariate cases, see Sect. 1.2 and 1.3.

1.1.2.1 Transformation of Random Variables

Suppose we have a continuous random variable X with known representatives of its distribution,
and we consider a new random variable Y which is some function of X, i.e., let

y = g(x) (1.25a)

be a function of x such that its inverse

x = g −1 (y) (1.25b)

exists. When seeking the representatives of the Y –distribution in terms of those of the X–
distribution we have to distinguish between two cases.
In the first case let y = g(x) be a strictly increasing function. Then, if X is less than or equal
to x it follows that Y is less than or equal to the unique value of y that corresponds to the given
value of x. Thus, if X ≤ x, then Y ≤ g(x). Conversely, if Y ≤ y, then $X \le g^{-1}(y)$, and the
probabilities of these events are equal, i.e.,
\[ \Pr(Y \le y) = \Pr\big[X \le g^{-1}(y)\big] \]
\[ F(y) = F\big[g^{-1}(y)\big]. \tag{1.26a} \]
(1.26a) can be confusing since the CDFs on opposite sides of the equation are not the same
functions. The one on the left–hand side is the CDF of Y, whereas the one on the right–hand side
is for the random variable X. To clarify this, we write (1.26a) as
\[ F_Y(y) = F_X\big[g^{-1}(y)\big]. \tag{1.26b} \]
From (1.26b), which relates the CDFs of X and Y, we can derive relationships for the PDFs,
CCDFs, HRs, and CHRs as well.13 Since the CCDF is the complement of the CDF it follows
from (1.26b) that $1 - S_Y(y) = 1 - S_X\big[g^{-1}(y)\big]$ or that
\[ S_Y(y) = S_X\big[g^{-1}(y)\big]. \tag{1.26c} \]
13
Generally, the MRL of Y cannot be given easily and as an exact function of the X–MRL, see the excursus at the
end of this section.
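Table 1/1: Relations among the six functions describing a continuously distributed stochastic lifetime
(each block starts from the given function and lists the other five functions expressed by it)

From f(x):
  $F(x) = \int_0^x f(u)\,du$
  $S(x) = \int_x^\infty f(u)\,du$
  $h(x) = f(x) \big/ \int_x^\infty f(u)\,du$
  $H(x) = -\ln\left\{\int_x^\infty f(u)\,du\right\}$
  $\mu(x) = \int_0^\infty u\,f(x + u)\,du \big/ \int_x^\infty f(u)\,du$

From F(x):
  $f(x) = F'(x)$
  $S(x) = 1 - F(x)$
  $h(x) = F'(x) \big/ [1 - F(x)]$
  $H(x) = -\ln\{1 - F(x)\}$
  $\mu(x) = \int_x^\infty [1 - F(u)]\,du \big/ [1 - F(x)]$

From S(x):
  $f(x) = -S'(x)$
  $F(x) = 1 - S(x)$
  $h(x) = -S'(x) \big/ S(x)$
  $H(x) = -\ln[S(x)]$
  $\mu(x) = \int_x^\infty S(u)\,du \big/ S(x)$

From h(x):
  $f(x) = h(x)\,\exp\left\{-\int_0^x h(u)\,du\right\}$
  $F(x) = 1 - \exp\left\{-\int_0^x h(u)\,du\right\}$
  $S(x) = \exp\left\{-\int_0^x h(u)\,du\right\}$
  $H(x) = \int_0^x h(u)\,du$
  $\mu(x) = \int_x^\infty \exp\left\{-\int_0^u h(v)\,dv\right\}du \big/ \exp\left\{-\int_0^x h(u)\,du\right\}$

From H(x):
  $f(x) = -\,d\{\exp[-H(x)]\} \big/ dx$
  $F(x) = 1 - \exp\{-H(x)\}$
  $S(x) = \exp\{-H(x)\}$
  $h(x) = H'(x)$
  $\mu(x) = \int_x^\infty \exp\{-H(u)\}\,du \big/ \exp\{-H(x)\}$

From µ(x):
  $f(x) = \dfrac{1 + \mu'(x)}{\mu^2(x)} \times \mu(0) \times \exp\left\{-\int_0^x \dfrac{1}{\mu(u)}\,du\right\}$
  $F(x) = 1 - \dfrac{\mu(0)}{\mu(x)} \times \exp\left\{-\int_0^x \dfrac{1}{\mu(u)}\,du\right\}$
  $S(x) = \dfrac{\mu(0)}{\mu(x)} \times \exp\left\{-\int_0^x \dfrac{1}{\mu(u)}\,du\right\}$
  $h(x) = \{1 + \mu'(x)\} \big/ \mu(x)$
  $H(x) = \ln\left[\dfrac{\mu(x)}{\mu(0)}\right] + \int_0^x \dfrac{1}{\mu(u)}\,du$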

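Next, the PDF is the derivative of the CDF, so we differentiate both sides of (1.26b) with respect
to y, obtaining
\begin{align}
f_Y(y) &= \frac{d}{dy}\,F_Y(y) \notag\\
&= \frac{d}{dy}\,F_X\big[g^{-1}(y)\big] \notag\\
&= f_X\big[g^{-1}(y)\big]\,\frac{d}{dy}\,g^{-1}(y), \tag{1.26d}
\end{align}
using the chain rule of differentiation. Since $g^{-1}(y)$ is simply x we can write
\[ f_Y(y) = f_X\big[g^{-1}(y)\big]\,\frac{dx}{dy}. \tag{1.26e} \]
Furthermore, HR is the ratio of the PDF and the CCDF. Thus,
\begin{align}
h_Y(y) &= \frac{f_Y(y)}{S_Y(y)} \notag\\
&= \frac{f_X\big[g^{-1}(y)\big]\,\dfrac{dx}{dy}}{S_X\big[g^{-1}(y)\big]} \notag\\
&= h_X\big[g^{-1}(y)\big]\,\frac{dx}{dy}. \tag{1.26f}
\end{align}
Finally, the CHR is the negative of the ln–transformed CCDF, see (1.11f). So we have
\begin{align}
H_Y(y) &= -\ln S_Y(y) \notag\\
&= -\ln S_X\big[g^{-1}(y)\big] \notag\\
&= H_X\big[g^{-1}(y)\big]. \tag{1.26g}
\end{align}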

Example 1/4: Increasing transformation of the reduced exponentially distributed variate

The reduced exponential distribution has
\[ f_X(x) = e^{-x},\quad F_X(x) = 1 - e^{-x},\quad S_X(x) = e^{-x},\quad h_X(x) = 1,\quad H_X(x) = x;\quad x \ge 0. \]
Let $y = g(x) = x^2$. We first have
\[ x = g^{-1}(y) = y^{1/2} \quad\text{and}\quad \frac{dx}{dy} = \frac{1}{2}\,y^{-1/2}. \]
The resulting representatives of the Y–distribution follow as
\begin{align*}
F_Y(y) &= 1 - e^{-y^{1/2}}, \\
S_Y(y) &= e^{-y^{1/2}}, \\
f_Y(y) &= \frac{1}{2}\,y^{-1/2}\,e^{-y^{1/2}}, \\
h_Y(y) &= \frac{1}{2}\,y^{-1/2}, \\
H_Y(y) &= y^{1/2}.
\end{align*}
For MRL we find $\mu_X(x) = 1$, $\mu_Y(y) = 2\,\big(1 + y^{1/2}\big)$, so $\mu_X(x)$ is constant, whereas $\mu_Y(y)$ is increasing.
We see that Y has a reduced Weibull distribution with shape parameter equal to 1/2.
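The transformation rules (1.26c) and (1.26f) can be illustrated with Example 1/4. The sketch below evaluates both rules for $Y = X^2$ and compares the survival function with a simulated sample; the seed, the sample size, and the function names are arbitrary assumptions.

```python
import math
import random

def survival_x(x):
    """CCDF of the reduced exponential distribution."""
    return math.exp(-x)

def hazard_x(x):
    """HR of the reduced exponential distribution."""
    return 1.0

def g_inverse(y):
    """Inverse of y = g(x) = x^2 on x >= 0."""
    return math.sqrt(y)

def dx_dy(y):
    """Derivative of g^{-1}(y) = y^(1/2)."""
    return 0.5 / math.sqrt(y)

def survival_y(y):
    return survival_x(g_inverse(y))            # (1.26c)

def hazard_y(y):
    return hazard_x(g_inverse(y)) * dx_dy(y)   # (1.26f)

rng = random.Random(7)
squares = [rng.expovariate(1.0) ** 2 for _ in range(100_000)]
empirical_survival = sum(value > 4.0 for value in squares) / len(squares)

print(survival_y(4.0), empirical_survival)  # exp(-2) vs. simulated share of Y > 4
print(hazard_y(4.0))                         # equals (1/2) * 4^(-1/2)
```

The hazard rate of Y decreases in y although X has a constant hazard rate, in line with the Weibull shape parameter 1/2.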

In the second case y = g(x) is a strictly decreasing function and the reasoning and the results
change a bit. In this case we see that if X is less than x, then Y will be greater than the value of
y which corresponds to the given value of x; conversely, if Y > y, then $X < g^{-1}(y)$. In terms of
probabilities we have
\[ \Pr(Y > y) = \Pr\big[X < g^{-1}(y)\big] = \Pr\big[X \le g^{-1}(y)\big] \quad\text{or} \]
\[ S_Y(y) = F_X\big[g^{-1}(y)\big] = 1 - S_X\big[g^{-1}(y)\big]. \tag{1.27a} \]
Then
\[ F_Y(y) = 1 - S_Y(y) = 1 - F_X\big[g^{-1}(y)\big] = S_X\big[g^{-1}(y)\big] \tag{1.27b} \]
and
\begin{align}
f_Y(y) = \frac{d}{dy}\,F_Y(y) &= -\frac{d}{dy}\,F_X\big[g^{-1}(y)\big] \notag\\
&= -f_X\big[g^{-1}(y)\big]\,\frac{dx}{dy} \tag{1.27c}
\end{align}
by the chain rule. Since $x = g^{-1}(y)$ is a decreasing function the derivative dx/dy in (1.27c) will
be negative and the PDF $f_Y(y)$ will be positive, as required. The HR of Y is
\begin{align}
h_Y(y) = \frac{f_Y(y)}{S_Y(y)} &= -\frac{f_X\big[g^{-1}(y)\big]\,\dfrac{dx}{dy}}{1 - S_X\big[g^{-1}(y)\big]} \notag\\
&= -h_X\big[g^{-1}(y)\big]\,\frac{S_X\big[g^{-1}(y)\big]}{1 - S_X\big[g^{-1}(y)\big]}\,\frac{dx}{dy}. \tag{1.27d}
\end{align}
In general, the CHR of Y cannot be written in terms of the X–CHR, but it can be expressed in
terms of the X–CDF as
\[ H_Y(y) = -\ln S_Y(y) = -\ln F_X\big[g^{-1}(y)\big]. \tag{1.27e} \]

Example 1/5: Decreasing transformation of the reduced exponentially distributed variate

We take the transformation $y = g(x) = x^{-1}$ and have
\[ x = g^{-1}(y) = y^{-1}, \quad \frac{dx}{dy} = -y^{-2}. \]
From the representatives of the X–distribution in Example 1/4 and using (1.27a–e) we find
\begin{align*}
S_Y(y) &= 1 - e^{-y^{-1}}, \\
F_Y(y) &= e^{-y^{-1}}, \\
f_Y(y) &= y^{-2}\,e^{-y^{-1}}, \\
h_Y(y) &= y^{-2}\,\frac{e^{-y^{-1}}}{1 - e^{-y^{-1}}} = \frac{y^{-2}}{e^{y^{-1}} - 1}, \\
H_Y(y) &= -\ln\left[1 - e^{-y^{-1}}\right].
\end{align*}
The distribution of Y is recognized as the type–II maximum extreme value distribution, also known as
inverse Weibull distribution.

We now explore two special transformations. The first one is the linear transformation:
\[ y = g(x) = a + b\,x, \quad b \ne 0. \tag{1.28a} \]
If the transformation is increasing (b > 0) we have from (1.26f,g) with $x = g^{-1}(y) = (y - a)/b$
and $dx/dy = b^{-1}$:
\[ h_Y(y) = \frac{1}{b}\,h_X\left(\frac{y - a}{b}\right), \tag{1.28b} \]
i.e., the HR at a given value of the new variable is $b^{-1}$ times the hazard at the value of the original
variable corresponding to the given value of the new variable, and
\[ H_Y(y) = H_X\left(\frac{y - a}{b}\right), \tag{1.28c} \]
i.e., the CHR of Y is simply the CHR of X evaluated at the retransformed y–value. For a
decreasing linear transformation, i.e., for y = a + b x, b < 0, we have from (1.27d,e) with $dx/dy = b^{-1} < 0$:
\[ h_Y(y) = -\frac{1}{b}\,h_X\left(\frac{y - a}{b}\right)\frac{S_X\left(\dfrac{y - a}{b}\right)}{F_X\left(\dfrac{y - a}{b}\right)}, \tag{1.28d} \]
\[ H_Y(y) = -\ln F_X\left(\frac{y - a}{b}\right). \tag{1.28e} \]

The second special case is the probability integral transformation:
\[ y = g(x) = F_X(x), \tag{1.29a} \]
where $F_X(x)$ is the increasing CDF of X. We note that $x = g^{-1}(y) = F_X^{-1}(y)$, 0 ≤ y ≤ 1, i.e.,
x is given by the percentile function of X. From (1.26b) we have
\[ F_Y(y) = F_X\big[g^{-1}(y)\big] = F_X\big[F_X^{-1}(y)\big] = y, \tag{1.29b} \]
therefore
\[ f_Y(y) = \frac{d}{dy}\,F_Y(y) = 1. \tag{1.29c} \]
Thus, Y has the reduced uniform distribution with
\[ f_Y(y) = 1 \quad\text{for } 0 \le y \le 1. \tag{1.29d} \]
The other representatives of the Y–distribution are:
\begin{align}
S_Y(y) &= 1 - y, \tag{1.29e}\\
h_Y(y) &= \frac{1}{1 - y}, \tag{1.29f}\\
H_Y(y) &= -\ln(1 - y). \tag{1.29g}
\end{align}

Excursus: Moments of transformed variates


With the sole exception of the linear transformation, the moments of a transformed variate cannot be given as exact functions of the moments of the original variate. The following approximations are based on the delta method (method of statistical differentials). For the mean of Y = g(X) we have

$$\mu_Y = E(Y) \approx g\big[E(X)\big] + \frac{\mathrm{Var}(X)}{2}\,\frac{d^2 g(x)}{dx^2}\bigg|_{x=E(X)} \tag{1.30a}$$

and for the variance of Y

$$\sigma_Y^2 = \mathrm{Var}(Y) \approx \mathrm{Var}(X)\left[\frac{dg(x)}{dx}\bigg|_{x=E(X)}\right]^2. \tag{1.30b}$$

For the linear transformation $Y = a + b\,X$ we have exact relationships:

\begin{align}
E(a + b\,X) &= a + b\,E(X), \tag{1.30c}\\
\mathrm{Var}(a + b\,X) &= b^2\,\mathrm{Var}(X). \tag{1.30d}
\end{align}

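With $b > 0$ in $Y = a + b\,X$ we have $Y - y = b\,\big[X - (y-a)/b\big]$, and the event $Y > y$ coincides with $X > (y-a)/b$, so the MRLs of X and Y are related as

$$\mu_Y(y) = b\,\mu_X\!\left(\frac{y-a}{b}\right). \tag{1.30e}$$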

1.1.2.2 Mixing and Compounding14

In some situations, units may not come from a homogeneous population. A demographer who is
to construct a nation’s life table might encounter several ethnic groups having different patterns of
mortality. A reliability engineer, for instance, might have a component that has been manufactured
in one of two facilities, but is not certain which one the unit comes from. In finite mixture
models, a unit is assumed to be from one of m populations. The case m = ∞ is called countable
mixture. When there is a single population that is mixed by a continuous parameter Θ (for
example, the amount of impurities present in a raw material or the temperature of solder applied
in a circuit board), a stochastic parameter model (= continuous mixture model) is appropriate.
Suppose that F (x | θ) represents the lifetime CDF given that Θ = θ and that G(θ) represents the
CDF of the random parameter Θ. The function F(x), defined by

$$F(x) = \int_{\text{all } \theta} F(x \mid \theta)\,dG(\theta), \tag{1.31a}$$

which is the marginal CDF of X, is called compound distribution of F (·) and G(·). F (x | θ) is
known as the kernel and G(·) is the mixing (or compounding) distribution. If the entire mass
of the corresponding measure of G(·) is confined to a countable number of points θ1 , θ2 , . . . and
the masses at $\theta_j$; j = 1, 2, . . . ; are $G(\theta_j)$, then (1.31a) takes the form

$$F(x) = \sum_{j=1}^{\infty} F(x \mid \theta_j)\,G(\theta_j), \tag{1.31b}$$

which is a countable mixture CDF.15 If the entire mass of the corresponding measure G(·) is confined to a finite number of points $\theta_1, \theta_2, \ldots, \theta_m$, then (1.31a) becomes a finite mixture of m components whose CDF is given by

$$F(x) = \sum_{j=1}^{m} F(x \mid \theta_j)\,G(\theta_j). \tag{1.31c}$$

To simplify notation in (1.31b,c), we write

$$p_j := G(\theta_j) \quad \text{and} \quad F_j(x) := F(x \mid \theta_j),$$

so that (1.31b) turns into

$$F(x) = \sum_{j=1}^{\infty} F_j(x)\,p_j, \tag{1.32a}$$

where $p_j \ge 0\ \forall\, j$ and $\sum_{j=1}^{\infty} p_j = 1$. Also, (1.31c) becomes

$$F(x) = \sum_{j=1}^{m} F_j(x)\,p_j, \tag{1.32b}$$

where $p_j \ge 0\ \forall\, j$ and $\sum_{j=1}^{m} p_j = 1$. In (1.32a,b) $p_j$ is known as the j–th mixing proportion and $F_j(x)$ as the j–th component in the mixture. It may be noticed that the choice $F_j(x) := F(x \mid \theta_j)$ restricts the CDF $F(x \mid \theta_j)$ for all values of j to belong to the same family of distributions. However, formulas (1.32a,b) are written in the most general form, in which each of the CDFs $F_j(x)$ could belong to a distinct family. The only requirement here is that, for any j, $F_j(x)$ is a CDF.

14 Suggested reading for this section: AL–HUSSAINI/SULTAN (2001).
15 For example, the non–central χ²–distribution is a countable mixture of POISSON and χ²–distributions.
If, in (1.31a), G(θ) is absolutely continuous, a PDF g(θ) exists such that $g(\theta) = G'(\theta)$, and if f(x) and f(x | θ) are the PDFs corresponding to the CDFs F(x) and F(x | θ), then from (1.31a) we have

$$f(x) = \int_{\text{all } \theta} f(x \mid \theta)\,g(\theta)\,d\theta. \tag{1.33a}$$
Similarly, the PDFs corresponding to (1.32a,b) are given by

\begin{align}
f(x) &= \sum_{j=1}^{m} f_j(x)\,p_j \quad \text{and} \tag{1.33b}\\
f(x) &= \sum_{j=1}^{\infty} f_j(x)\,p_j, \tag{1.33c}
\end{align}

where fj (x) is the j–th component density function corresponding to Fj (x).


Having found the mixed CDF and PDF by one of the foregoing formulas, we can find the HR,
CHR, and MRL, respectively, of the mixed population by applying (1.7b), (1.11f), and (1.15c),
respectively.

Example 1/6: Continuous mixture of exponential distributions16

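Let the lifetime X have an exponential distribution with a positive parameter (= scaling factor) θ:

$$f(x \mid \theta) = \theta\,e^{-\theta x}, \quad x \ge 0.$$

Now suppose that θ > 0 is a realization of a random variable Θ which also has an exponential distribution, but with parameter λ:

$$g(\theta) = \lambda\,e^{-\lambda\theta}, \quad \lambda > 0.$$

In this case the compound distribution of X is found using integration by parts:

$$\begin{aligned}
f(x) &= \int_0^\infty \theta\,e^{-\theta x}\,\lambda\,e^{-\lambda\theta}\,d\theta\\
&= \lambda \int_0^\infty \theta\,e^{-\theta(x+\lambda)}\,d\theta\\
&= \left[-\frac{\lambda\,\theta\,e^{-\theta(x+\lambda)}}{x+\lambda}\right]_0^\infty + \frac{\lambda}{x+\lambda}\int_0^\infty e^{-\theta(x+\lambda)}\,d\theta\\
&= \left[-\frac{\lambda\,\theta\,e^{-\theta(x+\lambda)}}{x+\lambda} - \frac{\lambda\,e^{-\theta(x+\lambda)}}{(x+\lambda)^2}\right]_0^\infty\\
&= \frac{\lambda}{(x+\lambda)^2}, \quad x \ge 0.
\end{aligned}$$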
This is recognized as a special case of the log–logistic distribution, see Sect. 3.1. The corresponding CDF and CCDF are

$$F(x) = 1 - \frac{\lambda}{x+\lambda}, \qquad S(x) = \frac{\lambda}{x+\lambda}.$$

So the HR and CHR follow as

$$h(x) = \frac{1}{x+\lambda}, \qquad H(x) = \ln\!\left(\frac{x+\lambda}{\lambda}\right).$$

An MRL µ(x) does not exist because the integral of S(x) over $[0, \infty)$ diverges, i.e., the mean is infinite. We notice that, while the HR of each f(x | θ) is a constant, namely h(x | θ) = θ, the HR of the mixture is decreasing. This result also holds for finite mixtures of exponential distributions, see below.
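The result of Example 1/6 can be confirmed by integrating the compound density numerically. The sketch below is only an illustration: the function names, the midpoint rule, and the truncation of the integration range at 50/(x + λ) are assumptions made here, not part of the example.

```python
import math

def mixture_pdf_numeric(x, lam, n=20000):
    """Midpoint-rule approximation of the integral of
    theta*exp(-theta*x) * lam*exp(-lam*theta) over theta in (0, inf)."""
    upper = 50.0 / (x + lam)  # the integrand is negligible beyond this point
    step = upper / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * step
        total += theta * math.exp(-theta * (x + lam))
    return lam * total * step

def mixture_pdf_closed(x, lam):
    """Closed form from Example 1/6: lam / (x + lam)^2."""
    return lam / (x + lam) ** 2

def mixture_hazard(x, lam):
    """h(x) = f(x) / S(x) with S(x) = lam / (x + lam)."""
    return mixture_pdf_closed(x, lam) / (lam / (x + lam))
```

For any x ≥ 0 and λ > 0 the numeric and closed-form densities should agree closely, and `mixture_hazard` should reduce to 1/(x + λ).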

It can be shown that in the case of a finite mixture the HR and the MRL of the compound distribution may be written in terms of the HRs $h_j(x)$ and of the MRLs $\mu_j(x)$ of the m mixed distributions:

\begin{align}
h(x) &= \frac{\sum_{j=1}^{m} p_j\,S_j(x)\,h_j(x)}{\sum_{j=1}^{m} p_j\,S_j(x)}, \tag{1.34a}\\
\mu(x) &= \frac{\sum_{j=1}^{m} p_j\,S_j(x)\,\mu_j(x)}{\sum_{j=1}^{m} p_j\,S_j(x)}. \tag{1.34b}
\end{align}

Thus, the HR and MRL of the mixed model may be considered as a weighted average of the HRs and MRLs of the individual populations, the weights being $p_j S_j(x)$. One interesting property of a mixed exponential model is that it has a decreasing HR. Suppose $X_j$; j = 1, 2, . . . , m; are exponentially distributed with scale parameter $b_j$, $b_j > 0$, respectively, then

$$h(x) = \frac{f(x)}{S(x)} = \frac{\sum_{j=1}^{m} p_j\,\dfrac{1}{b_j}\exp\!\left(-\dfrac{x}{b_j}\right)}{\sum_{j=1}^{m} p_j \exp\!\left(-\dfrac{x}{b_j}\right)}, \quad x \ge 0.$$

It can be shown that this HR is a decreasing function, decreasing from the average of the failure rates, $\sum_{j=1}^{m} p_j/b_j$, at x = 0, to the minimum of the failure rates, $1/\max(b_j)$, as x → ∞. This suggests one possible justification for a decreasing HR model.
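The behaviour of the mixed exponential HR can be illustrated numerically. The following sketch evaluates the formula above for a two-component mixture; the function name, the mixing proportions, the scale parameters, and the grid are arbitrary choices made for this illustration only.

```python
import math

def mixture_hazard(x, p, b):
    """HR of a finite mixture of exponentials with weights p_j and scales b_j."""
    num = sum(pj / bj * math.exp(-x / bj) for pj, bj in zip(p, b))
    den = sum(pj * math.exp(-x / bj) for pj, bj in zip(p, b))
    return num / den

p = [0.3, 0.7]   # illustrative mixing proportions
b = [1.0, 5.0]   # illustrative scale parameters
grid = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
hazards = [mixture_hazard(x, p, b) for x in grid]

start_value = sum(pj / bj for pj, bj in zip(p, b))  # average failure rate at x = 0
limit_value = 1.0 / max(b)                          # smallest failure rate
```

On this grid `hazards` starts at `start_value`, decreases throughout, and approaches `limit_value` for large x, in line with the statement in the text.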

1.1.2.3 Formation of Systems17

Up to now we have considered modeling the lifetime of single units, components, people etc. by using the hazard rate and its relatives. However, it is especially true in the engineering sciences that pieces of equipment consist of many, possibly different, interacting components. The term ‘reliability’ is commonly used to describe the ‘survival’ of such components and of such a system. Essentially, the reliability of a component is the probability that it is operational. The primary concern of engineers when looking at a system of components is its reliability and how the reliabilities of individual components affect the reliability of the entire system. Once the system reliability has been found we can calculate the system hazard applying (1.8c). We will only give a short introduction to the theory of reliability of systems; more details may be found in the suggested readings.

17 Suggested reading for this section: BARLOW/PROSCHAN (1975), CROWDER et al. (1991, Chapter 9), LEEMIS (1995, Chapter 2), MEEKER/ESCOBAR (1998, Chapter 15), SMITH (2002, Chapter 3).
It is certainly true that the reliability of components may change with time. However, initially we
make the assumption that at some instant in time we are able to observe the components and know
whether they are functioning or not. Let Ci ; i = 1, 2, . . . , m; denote component i and suppose
that each component has one of two operational states: ‘functioning’ and ‘not functioning’. For
each i the indicator $z_i$ associated with $C_i$ is defined by

$$z_i = \begin{cases} 1 & \text{if } C_i \text{ is functioning}\\ 0 & \text{if } C_i \text{ is not functioning} \end{cases}; \quad i = 1, 2, \ldots, m. \tag{1.35a}$$

A structure function is a useful tool in describing the way m components are related to form a
system. The structure function defines the system state as a function of the component states and
is given by

$$\phi(z_1, \ldots, z_m) = \begin{cases} 1 & \text{if the system is functioning}\\ 0 & \text{if the system is not functioning.} \end{cases} \tag{1.35b}$$

Since there are m components there are $2^m$ different values that the system state vector

$$z = (z_1, z_2, \ldots, z_m)$$

can assume, and $\binom{m}{j}$ of these vectors correspond to exactly j functioning components; j = 0, 1, . . . , m. The structure function φ(z) maps the system state vector z to 0 or 1, yielding the state of the system. The most common system structures are the series and parallel systems, and most other complicated structures can be reduced to these two types.
A series system functions when all its components function. Thus φ(z) assumes the value 1
when $z_1 = z_2 = \ldots = z_m = 1$, and 0 otherwise. Therefore, its structure function $\phi_S(z)$ is given by

\begin{align}
\phi_S(z) &= \begin{cases} 0 & \text{if there exists an } i \text{ such that } z_i = 0\\ 1 & \text{if } z_i = 1\ \forall\, i, \end{cases} \tag{1.36a}\\
&= \min(z_1, z_2, \ldots, z_m), \tag{1.36b}\\
&= \prod_{i=1}^{m} z_i. \tag{1.36c}
\end{align}

These three different ways of expressing the value of the structure function are equivalent, al-
though (1.36c) is preferred because of its compactness. The block–diagram in Fig. 1/3 visual-
izes a series system of m components.18 Systems that function only when all their components
function should be modeled as series systems.
18
A block–diagram is a graphic device for expressing the arrangement of the components to form a system. If
a path can be traced through functioning components from left to right on a block–diagram, then the system
functions. The boxes represent the components, and either component numbers i or probabilities Pi are placed
inside the boxes.

Figure 1/3: Series system block–diagram

A parallel system functions when one or more of its components function. Its structure function
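$\phi_P(z)$ assumes the value 0 when $z_1 = z_2 = \ldots = z_m = 0$, and 1 otherwise. Therefore,

\begin{align}
\phi_P(z) &= \begin{cases} 0 & \text{if } z_i = 0\ \forall\, i,\\ 1 & \text{if there exists an } i \text{ such that } z_i = 1 \end{cases} \tag{1.37a}\\
&= \max(z_1, z_2, \ldots, z_m), \tag{1.37b}\\
&= 1 - \prod_{i=1}^{m} (1 - z_i). \tag{1.37c}
\end{align}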

See Fig. 1/4 for a block–diagram of a parallel arrangement of m components. Such an arrange-
ment is appropriate when all components must fail for the system to fail.
Figure 1/4: Parallel system block–diagram

To avoid studying structure functions that are unreasonable, a subset of all possible systems of m components, namely the coherent systems, has been defined. A system is coherent if

1. its structure function is non–decreasing in z, i.e.,

φ(z1 , . . . , zi−1 , 0, zi+1 , . . . , zm ) ≤ φ(z1 , . . . , zi−1 , 1, zi+1 , . . . , zm ) (1.38a)

and

2. there are no irrelevant components, i.e., component $C_i$ is irrelevant if, for all states of the other components in the system (that is, for all values of $z_j$ for $j \neq i$)

φ(z1 , . . . , zi−1 , 1, zi+1 , . . . , zm ) = φ(z1 , . . . , zi−1 , 0, zi+1 , . . . , zm ). (1.38b)

The structure function of a coherent system may be quite difficult to describe in simple terms.19 However, it can be shown that the structure function of any coherent system is bounded above and below by the structure functions of parallel and series systems, which leads to bounds on the reliability of coherent systems, see (1.44c).

19 Some techniques in this context are the formation of path vectors, minimal path vectors, cut vectors, and minimal cut vectors.
Theorem 7: If $\phi_C(z)$ is the structure function of a coherent system of m components in the state vector $z = (z_1, z_2, \ldots, z_m)$, then

$$\phi_S(z) = \prod_{i=1}^{m} z_i \le \phi_C(z) \le \phi_P(z) = 1 - \prod_{i=1}^{m} (1 - z_i). \tag{1.39}$$

Series and parallel systems are coherent systems. Before showing how the structure function is
related to the system reliability we present some other types of coherent systems.

1. Systems with components in series–parallel


Methods for evaluating the reliability of structures with components in both series and
parallel provide the basis for evaluating more complicated structures. There are two types
of simple (rectangular) series–parallel structures.
1.1 Series–parallel system structure with system–level redundancy
In some applications it is more cost effective to achieve higher reliability by using two
or more copies of a series system rather than having to improve the reliability of the
single system itself. This idea leads to a r×k series–parallel system–level redundancy
structure having r parallel sets, each of k components in series. The structure function
reads

$$\phi(z) = 1 - \prod_{i=1}^{r}\left(1 - \prod_{j=1}^{k} z_{ij}\right), \tag{1.40a}$$

if the components are independent. Fig. 1/5 shows such a 2 × 2 structure.


Figure 1/5: Block–diagram of a 2 × 2 series–parallel system with
system–level redundancy

For the system depicted in Fig. 1/5 the structure function is

$$\phi(z) = 1 - (1 - z_{11}\,z_{12})\,(1 - z_{21}\,z_{22}),$$
and there are — out of 16 possible vectors — seven system state vectors leading to
φ(z) = 1.
1.2 Series–parallel system structure with component–level redundancy
Component redundancy is an important method for improving system reliability. A
r × k component–level redundant structure has k series structures, each one made of
r components in parallel. If it is necessary to have only one path through the system
such a structure is, for a given number of identical components, more reliable than the
series–parallel system with system–level redundancy. The structure function reads

$$\phi(z) = \prod_{i=1}^{k}\left[1 - \prod_{j=1}^{r} (1 - z_{ij})\right], \tag{1.40b}$$

if the components are independent. The structure function is the product of the structure functions of the k parallel subsystems, each consisting of r components. Fig. 1/6 shows such a 2 × 2 structure where (1.40b) results in

$$\phi(z) = \big[1 - (1 - z_{11})\,(1 - z_{12})\big]\,\big[1 - (1 - z_{21})\,(1 - z_{22})\big],$$
and there are nine system state vectors — out of 16 possible vectors — leading to
φ(z) = 1.
Figure 1/6: Block–diagram of a 2 × 2 series–parallel system with
component–level redundancy

2. k–out–of–m system
Another way of increasing system reliability consists in supplying more components than
are necessary for functioning. By a k–out–of–m system (k ≤ m) we mean a system of m
components which will function provided at least k of its components are functioning. This
means that the structure function is

$$\phi(z) = \begin{cases} 1 & \text{if } \sum\limits_{i=1}^{m} z_i \ge k,\\[4pt] 0 & \text{if } \sum\limits_{i=1}^{m} z_i < k. \end{cases} \tag{1.41a}$$

Fig. 1/7 shows the block–diagram of a 2–out–of–3 system which looks like that of a series–
parallel system with system–level redundancy. Note that this diagram does not reflect the
physical layout, but rather the paths through the system that will allow operation of the
system.
Figure 1/7: Block–diagram of a 2–out–of–3 system

The structure function of the 2–out–of–3 system is

$$\phi(z) = 1 - (1 - z_1 z_2)\,(1 - z_1 z_3)\,(1 - z_2 z_3). \tag{1.41b}$$
There are two border cases of the k–out–of–m system:

1) the series system is an m–out–of–m system,

2) the parallel system is a 1–out–of–m system.

3. Bridge system
Bridge–structure systems provide another useful way of improving the reliability of certain
systems. Fig. 1/8 illustrates a simple bridge system where component 3 is the bridge.
If component 3 is working (not working), this system has the same structure as Fig. 1/6
(Fig. 1/5).
Figure 1/8: Block–diagram of a bridge system

A moment’s reflection on the diagram in Fig. 1/8 reveals that the system functions if any
one of the following sets of components functions:

{1, 4}, {1, 3, 5}, {2, 5}, {2, 3, 4}.

These sets are referred to as minimal path sets. Since one or more of these sets of com-
ponents must function for the system to function, the block–diagram may be written as
a parallel arrangement of these sets, each set being a series arrangement of its members.
Thus, the structure function corresponding to Fig. 1/8 is

$$\phi(z) = 1 - (1 - z_1 z_4)\,(1 - z_1 z_3 z_5)\,(1 - z_2 z_5)\,(1 - z_2 z_3 z_4). \tag{1.42}$$
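The structure functions of the 2 × 2 series–parallel systems and of the bridge system can be checked by enumerating all state vectors. The following is a minimal sketch; the function names and the argument ordering are choices made for this illustration, and the bridge is built from the minimal path sets {1, 4}, {1, 3, 5}, {2, 5}, {2, 3, 4} listed in the text.

```python
from itertools import product
from math import prod

def phi_system_level(z11, z12, z21, z22):
    """(1.40a) for r = k = 2: two series pairs arranged in parallel."""
    return 1 - (1 - z11 * z12) * (1 - z21 * z22)

def phi_component_level(z11, z12, z21, z22):
    """(1.40b) for r = k = 2: two parallel pairs arranged in series."""
    return (1 - (1 - z11) * (1 - z12)) * (1 - (1 - z21) * (1 - z22))

# Minimal path sets of the bridge system, as listed in the text.
MIN_PATHS = [(1, 4), (1, 3, 5), (2, 5), (2, 3, 4)]

def phi_bridge(z1, z2, z3, z4, z5):
    """Parallel arrangement of the minimal path sets, each set in series."""
    z = {1: z1, 2: z2, 3: z3, 4: z4, 5: z5}
    return 1 - prod(1 - prod(z[i] for i in path) for path in MIN_PATHS)

def count_functioning(phi, m):
    """Number of the 2**m state vectors for which the system functions."""
    return sum(phi(*z) for z in product((0, 1), repeat=m))

n_system_level = count_functioning(phi_system_level, 4)
n_component_level = count_functioning(phi_component_level, 4)
n_bridge = count_functioning(phi_bridge, 5)
```

`n_system_level` and `n_component_level` should reproduce the counts of seven and nine functioning state vectors given for Fig. 1/5 and Fig. 1/6. Fixing the state of component 3 in `phi_bridge` allows the remark about the bridge reducing to these two structures to be verified in the same way.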

We now turn to a technique of finding the reliability20 of a coherent system of m independent components. We introduce the notation $Z_i$ to denote the random state of $C_i$ at a given point in time:

$$Z_i = \begin{cases} 0 & \text{if } C_i \text{ has failed}\\ 1 & \text{if } C_i \text{ is functioning} \end{cases}; \quad i = 1, 2, \ldots, m.$$

These m variates can be written as a random state vector Z. The probability that Ci is functioning
at a certain time is given by
Pi = Pr(Zi = 1). (1.43a)
These m values can be written as a reliability vector:

P = (P1 , P2 , . . . , Pm ). (1.43b)

The system reliability at a certain time is defined as

$$R(P) = \Pr\big[\phi(Z) = 1\big], \tag{1.43c}$$

because R is a quantity that can be calculated from the vector P. The method of calculation used here is based on the fact that $\Pr[\phi(Z) = 1]$ is equal to $E[\phi(Z)]$, since φ(Z) is a BERNOULLI random variable. Consequently, the expected value of φ(Z) is the system reliability:

$$R(P) = E\big[\phi(Z)\big]. \tag{1.43d}$$

20 There exist several techniques, see LEEMIS (1995, pp. 28ff.), each having its special advantages and disadvantages.

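For instance, applying (1.43d) to the structure function $\phi_S(\cdot)$ of a series system, see (1.36c), we find

$$\begin{aligned}
R_S(P) &= E\big[\phi_S(Z)\big]\\
&= E\left[\prod_{i=1}^{m} Z_i\right]\\
&= \prod_{i=1}^{m} E(Z_i), \quad \text{because of independence}\\
&= \prod_{i=1}^{m} P_i.
\end{aligned} \tag{1.44a}$$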

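Likewise we find the reliability of a parallel system using $\phi_P(\cdot)$ of (1.37c) as

$$\begin{aligned}
R_P(P) &= E\big[\phi_P(Z)\big]\\
&= E\left[1 - \prod_{i=1}^{m} (1 - Z_i)\right]\\
&= 1 - E\left[\prod_{i=1}^{m} (1 - Z_i)\right]\\
&= 1 - \prod_{i=1}^{m} E(1 - Z_i), \quad \text{because of independence}\\
&= 1 - \prod_{i=1}^{m} (1 - P_i).
\end{aligned} \tag{1.44b}$$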

We can state the following rule to find the reliability of a coherent system with independent
components:

1. Determine the system structure function φ(Z).

2. Replace each Zi by Pi = Pr(Zi = 1).

3. The result is R(P ).

Applying this rule to (1.39) we can state that for any coherent system with structure function $\phi_C(Z)$, $Z = (Z_1, Z_2, \ldots, Z_m)$, its reliability R(P) is bounded below by the reliability of a series system and above by the reliability of a parallel system, each having the same components:

$$R_S(P) = \prod_{i=1}^{m} P_i \le R(P) \le R_P(P) = 1 - \prod_{i=1}^{m} (1 - P_i). \tag{1.44c}$$

This inequality is not especially sharp. For instance, having m = 5 identical components acting
independently with P1 = P2 = . . . = P5 = 0.9 we find

RS (P ) = 0.59049 ≤ R(P ) ≤ RP (P ) = 0.99999.
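The relation $R(P) = E[\phi(Z)]$ can also be evaluated directly by summing over all state vectors, which avoids any algebra on the structure function. This matters because substituting $P_i$ for $Z_i$ is only valid once φ is written so that every $Z_i$ appears to the first power (using $Z_i^2 = Z_i$). The sketch below is an illustration under the independence assumption of the text; the function names and the 3–out–of–5 example are hypothetical choices, not taken from the source.

```python
from itertools import product
from math import prod

def reliability(phi, P):
    """R(P) = E[phi(Z)] for independent components with Pr(Z_i = 1) = P[i]."""
    total = 0.0
    for z in product((0, 1), repeat=len(P)):
        # probability of observing this particular state vector
        prob = prod(p if zi else 1.0 - p for zi, p in zip(z, P))
        total += prob * phi(z)
    return total

def phi_series(z):
    return min(z)

def phi_parallel(z):
    return max(z)

def phi_3_of_5(z):
    """Illustrative coherent system: functions if at least 3 components do."""
    return int(sum(z) >= 3)

P = [0.9] * 5
r_series = reliability(phi_series, P)
r_parallel = reliability(phi_parallel, P)
r_3_of_5 = reliability(phi_3_of_5, P)
```

With five identical components of reliability 0.9, `r_series` and `r_parallel` reproduce the bounds 0.59049 and 0.99999 quoted above, and `r_3_of_5` lies between them.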

In order to introduce time dependency into the reliability function we have to replace $P_i$ by the survival function $S_i(x)$. Let

$$S(x) = \big[S_1(x), S_2(x), \ldots, S_m(x)\big],$$

then the time dependent system reliability function is denoted $R[S(x)]$ and the system hazard rate will be

$$h(x) = -\frac{dR\big[S(x)\big]/dx}{R\big[S(x)\big]}. \tag{1.45}$$
In general, (1.45) will not result in a simple formula, even when we assume identical components so that $S_i(x) = S(x)\ \forall\, i$. In most cases $dR[S(x)]/dx$ has to be determined by numerical differentiation. Therefore, we only take a look at the hazard rate functions of the two simplest systems, i.e., the series and the parallel systems.
The reliability function of a series system is

$$R_S\big[S(x)\big] = \prod_{i=1}^{m} S_i(x). \tag{1.46a}$$

Now, by

$$S_i(x) = \exp\left\{-\int_0^x h_i(u)\,du\right\}, \tag{1.46b}$$

see (1.9b), where $h_i(\cdot)$ is the HR of component $C_i$, we first have

$$R_S\big[S(x)\big] = \exp\left\{-\sum_{i=1}^{m} \int_0^x h_i(u)\,du\right\} \tag{1.46c}$$

and upon interchanging summation and integration we find

$$R_S\big[S(x)\big] = \exp\left\{-\int_0^x \sum_{i=1}^{m} h_i(u)\,du\right\}. \tag{1.46d}$$

Because $R_S[S(x)]$ is the survival function of the series system, (1.9b) holds and

$$h_S(x) = \sum_{i=1}^{m} h_i(x) \tag{1.46e}$$

must be the hazard rate of the series system which is the sum of the component hazard rates.
Thus, the series system hazard rate is the higher the more components are linked together. This is
in accordance with the fact that the series system reliability is a decreasing function of the number
of components. When the components have identical lifetime distributions we have hi (x) =
h(x) ∀ i and (1.46e) turns into
hS (x) = m h(x). (1.46f)
Because of (1.36b) this is the hazard rate of the minimum order statistic.
The reliability function of a parallel system is

$$R_P\big[S(x)\big] = 1 - \prod_{i=1}^{m} F_i(x). \tag{1.47a}$$

Assuming identical components, i.e., $F_i(x) = F(x)\ \forall\, i$, (1.47a) turns into

$$R_P\big[S(x)\big] = 1 - \big[F(x)\big]^m \tag{1.47b}$$

so that (1.45) gives the hazard rate of the parallel system

$$h_P(x) = \frac{m\,f(x)\,\big[F(x)\big]^{m-1}}{1 - \big[F(x)\big]^m}, \tag{1.47c}$$

which, because of (1.37b), is nothing but the hazard rate of the maximum order statistic. Let

$$h(x) = \frac{f(x)}{1 - F(x)}$$

be the hazard rate of a component, then (1.47c) can be transformed into

\begin{align}
h_P(x) &= \frac{m\,\big[F(x)\big]^{m-1}}{\sum_{i=0}^{m-1} \big[F(x)\big]^{i}}\,h(x), \quad \big[F(x)\big]^0 = 1, \tag{1.47d}\\
&= \frac{m}{\sum_{i=0}^{m-1} \big[F(x)\big]^{-i}}\,h(x). \tag{1.47e}
\end{align}

(1.47e) follows from (1.47d) when dividing the numerator and the denominator on the right–hand side by $[F(x)]^{m-1}$. The factor $m \big/ \sum_{i=0}^{m-1} [F(x)]^{-i}$ goes to 0 as x → 0, and it goes to 1 as x → ∞, thus the hazard rate of the parallel system is always less than the hazard rate of an individual component. This is in accordance with the fact that the reliability of a parallel system is always higher than that of an individual component.
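The series and parallel hazard rates for identical components can be compared numerically. The sketch below assumes, purely for illustration, WEIBULL distributed component lifetimes with scale b and shape c; the function names and the parameter values are hypothetical. It evaluates (1.46f), (1.47c), and (1.47e).

```python
import math

def weibull_cdf(x, b, c):
    """F(x) = 1 - exp(-(x/b)^c)."""
    return -math.expm1(-((x / b) ** c))

def weibull_hazard(x, b, c):
    """h(x) = (c/b) (x/b)^(c-1)."""
    return (c / b) * (x / b) ** (c - 1)

def series_hazard(x, m, b, c):
    """(1.46f): m identical components in series."""
    return m * weibull_hazard(x, b, c)

def parallel_hazard(x, m, b, c):
    """(1.47e): m identical components in parallel."""
    F = weibull_cdf(x, b, c)
    factor = m / sum(F ** (-i) for i in range(m))
    return factor * weibull_hazard(x, b, c)

def parallel_hazard_direct(x, m, b, c):
    """(1.47c): m f(x) F(x)^(m-1) / (1 - F(x)^m), with f = h (1 - F)."""
    F = weibull_cdf(x, b, c)
    f = weibull_hazard(x, b, c) * (1.0 - F)
    return m * f * F ** (m - 1) / (1.0 - F ** m)
```

For m ≥ 2 and any x > 0 the two parallel formulas should coincide, and the component hazard should lie strictly between the parallel and the series hazard.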

1.1.2.4 Acceleration and Proportional Hazards21

The accelerated life model and the proportional hazards model are designed to include a vector
z of covariates (= explanatory variables) zi ; i = 1, . . . , k; in a lifetime model. zi influences
the lifetime X of the unit under study, and the zi are non–random. Covariates may account
for the fact that the population of units is not truly homogeneous. Other possibilities for the
elements of z include cumulative load applied, time–varying stress, and environmental factors.
The difference between accelerated life models and proportional hazards models is that in the
first case the covariates affect the rate at which the unit ages, and in the second case the covariates
increase or decrease the hazard rate. So, in accelerated life models the survival function has to be
modeled and in proportional hazards models the hazard rate has to be modified.
We first give a short introduction to accelerated life models. The question here is how to link the
covariates to the survival function. One approach is to define one lifetime model when z = 0, called the baseline model, and other models for $z \neq 0$. Analysis is simplified when there is only a single model appropriate for all values of z. The survivor function of X in the accelerated life model is

$$S(x) = S_0\big[x\,\psi(z)\big], \quad x \ge 0, \tag{1.48a}$$

where $S_0(\cdot)$ is a baseline survival function and ψ(z) is a link function. The covariates are linked to the lifetime by ψ(z), satisfying

$$\psi(0) = 1 \quad \text{and} \quad \psi(z) > 0\ \forall\, z \neq 0. \tag{1.48b}$$

With these attributes of ψ(z), z = 0 implies $S_0(x) = S(x)$. A very popular choice for ψ(z) is the log–linear link function

$$\psi(z) = \exp(\beta' z). \tag{1.48c}$$

The vector β represents regression coefficients and z is a vector of non–random regressors. With (1.48c) the covariates accelerate ($\beta' z > 0$) or decelerate ($\beta' z < 0$) the rate at which a unit moves through time with respect to the baseline case. Other, less popular choices for the link function are

$$\psi(z) = \beta' z \quad \text{and} \quad \psi(z) = (\beta' z)^{-1}.$$

With these two specifications it may happen that ψ(z) < 0 for some value of β, resulting in a negative lifetime. The other lifetime distribution representatives for an accelerated lifetime model with $S(x) = S_0[x\,\psi(z)]$ are

\begin{align}
F(x) &= 1 - S(x) = 1 - S_0\big[x\,\psi(z)\big], \tag{1.48d}\\
f(x) &= \psi(z)\,f_0\big[x\,\psi(z)\big], \tag{1.48e}\\
h(x) &= \frac{f(x)}{S(x)} = \psi(z)\,\frac{f_0\big[x\,\psi(z)\big]}{S_0\big[x\,\psi(z)\big]} = \psi(z)\,h_0\big[x\,\psi(z)\big], \tag{1.48f}\\
H(x) &= -\ln\big\{S_0\big[x\,\psi(z)\big]\big\} = H_0\big[x\,\psi(z)\big]. \tag{1.48g}
\end{align}

21 Suggested reading for ‘acceleration’: NELSON (1990), and for ‘proportional hazards’: COX/OAKES (1984).

We recognize that these formulas resemble those of the variable transformation of Sect. 1.1.2.1.
Whereas accelerated lifetime models modify the rate that the unit moves through time, propor-
tional hazard models modify the hazard rate by the factor ψ(z) :

h(x) = ψ(z) h0 (x), x ≥ 0. (1.49a)

$h_0(x)$ is called the baseline hazard, representing the hazard rate for a unit having ψ(z) = 1. As before, a popular choice for the link function here is the log–linear form (1.48c), and the hazard rate increases when $\beta' z > 0$ and decreases when $\beta' z < 0$. The ‘proportional’ terminology arises in a perfectly natural way. If two units 1 and 2 have lifetimes depending on respective vectors of covariate values $z_1$ and $z_2$, then

$$\frac{h_1(x)}{h_2(x)} = \frac{h_0(x)\,\psi(z_1)}{h_0(x)\,\psi(z_2)} = \frac{\psi(z_1)}{\psi(z_2)},$$
showing clearly how the baseline hazards cancel from this ratio, so that the hazard ratio for the
two units does not depend on lifetime x. The other lifetime distribution representatives can be determined from (1.49a):

\begin{align}
H(x) &= \int_0^x h(u)\,du = \int_0^x \psi(z)\,h_0(u)\,du = \psi(z)\,H_0(x), \tag{1.49b}\\
S(x) &= \exp\left\{-\int_0^x \psi(z)\,h_0(u)\,du\right\} = \exp\left\{-\psi(z)\int_0^x h_0(u)\,du\right\} \nonumber\\
&= \left[\exp\left\{-\int_0^x h_0(u)\,du\right\}\right]^{\psi(z)} = \big[S_0(x)\big]^{\psi(z)}, \tag{1.49c}\\
F(x) &= 1 - \big[S_0(x)\big]^{\psi(z)}, \tag{1.49d}\\
f(x) &= f_0(x)\,\psi(z)\,\big[S_0(x)\big]^{\psi(z)-1}. \tag{1.49e}
\end{align}

Example 1/7: Accelerated life model and proportional hazards model for W EIBULL baseline
The WEIBULL baseline survival function for the accelerated life model is

$$S_0(x) = \exp\left[-\left(\frac{x}{b}\right)^c\right]; \quad x \ge 0;\ b, c > 0,$$

where b is a scale parameter and c is a shape parameter. Introducing a link function ψ(z) into $S_0(\cdot)$ according to (1.48a) gives

$$S_A(x) = S_0\big[x\,\psi(z)\big] = \exp\left\{-\left[\frac{x\,\psi(z)}{b}\right]^c\right\}. \tag{1.50a}$$

Thus, the accelerated lifetime has a WEIBULL distribution as well, but the scale parameter has changed from b to $b_A = b/\psi(z)$. The hazard rate belonging to (1.50a) is

$$h_A(x) = \psi(z)\,h_0\big[x\,\psi(z)\big] = \psi(z)\,\frac{c}{b}\left[\frac{x\,\psi(z)}{b}\right]^{c-1} = \frac{c}{b/\psi(z)}\left[\frac{x}{b/\psi(z)}\right]^{c-1}. \tag{1.50b}$$

For the proportional hazards model the WEIBULL baseline hazard is

$$h_0(x) = \frac{c}{b}\left(\frac{x}{b}\right)^{c-1}.$$

So, according to (1.49a) the hazard rate for a unit with covariate vector z and link function ψ(z) is

$$h_P(x) = h_0(x)\,\psi(z) = \frac{c}{b}\left(\frac{x}{b}\right)^{c-1}\psi(z). \tag{1.51a}$$

Inserting the usual baseline $S_0(x) = \exp[-(x/b)^c]$ for the WEIBULL distribution into (1.49c), the survival function corresponding to (1.51a) is

$$S_P(x) = \left\{\exp\left[-\left(\frac{x}{b}\right)^c\right]\right\}^{\psi(z)} = \exp\left[-\left(\frac{x}{b}\right)^c \psi(z)\right] = \exp\left\{-\left[\frac{x\,\psi(z)^{1/c}}{b}\right]^c\right\}. \tag{1.51b}$$

(1.51b) can be recognized as a WEIBULL distribution as well, but contrary to (1.50a) the scale parameter is $b_P = b\big/\psi(z)^{1/c}$. The WEIBULL distribution is the only baseline distribution where the accelerated life and the proportional hazards models coincide in this fashion. (1.50a) and (1.51b) are identical for c = 1, which is an exponential distribution.
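The two rescalings in Example 1/7 can be verified numerically. The following is a small sketch in which the function names and the parameter values are arbitrary illustrative choices; it checks that the accelerated life model corresponds to the scale b/ψ and the proportional hazards model to the scale b/ψ^(1/c).

```python
import math

def weibull_surv(x, b, c):
    """Baseline WEIBULL survival function exp(-(x/b)^c)."""
    return math.exp(-((x / b) ** c))

def surv_accelerated(x, b, c, psi):
    """(1.50a): S_A(x) = S_0(x * psi)."""
    return weibull_surv(x * psi, b, c)

def surv_prop_hazards(x, b, c, psi):
    """(1.51b): S_P(x) = S_0(x) ** psi."""
    return weibull_surv(x, b, c) ** psi

def scale_accelerated(b, psi):
    """Scale parameter b_A = b / psi."""
    return b / psi

def scale_prop_hazards(b, c, psi):
    """Scale parameter b_P = b / psi^(1/c)."""
    return b / psi ** (1.0 / c)
```

Evaluating `weibull_surv` with the rescaled parameters should give the same values as the two model functions, and for c = 1 both models should agree with each other.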

1.1.2.5 Truncated Distributions22

22 Suggested reading for this section: COHEN (1991).

Truncation and censoring are two operations which are often confused in the statistical literature, but there is a clear distinction between these two concepts. Truncation is confined to the distribution or to the population, whereas censoring is related to samples. So we talk about truncated distributions and censored samples, and we may have a censored sample from a truncated distribution.
Truncation means that the original or natural support of a distribution has been shrunken so that
the portion of the population in the truncated part can never be observed and a certain part of the
original probability mass will be cut off. Thus, truncation modifies the distribution and leads to
conditional distributions. Censoring, either from an unmodified distribution or from a truncated
distribution, modifies the selection of the random variables and is thus related to the sampling
process. Censoring means that — for one reason or the other — the statistician refrains from
measuring the exact value of a unit’s characteristic when this value falls inside a certain area,
e.g., is greater or smaller than a given threshold. Censored samples produce two types of obser-
vations, those with known and complete value of the characteristic under study, and those which
fall into a special region of the characteristic and are thus only known by their frequency and not
by their value. A censored observation is distinct from a missing observation in that the order of
the censored observation relative to some of the uncensored observations is known and conveys
information regarding the distribution being sampled.
There are three common types of truncation:

• left truncation, also known as lower truncation or truncation from below,

• right truncation, also known as upper truncation or truncation from above,

• double truncation, also known as truncation on both sides or truncation from below
and above.

Regarding censoring we have substantially more types, which are fully described and discussed in RINNE (2009, pp. 291–312) and in Sect. 4. Truncation of a lifetime distribution leads to different types of lifetime:

• Left truncation gives future lifetime or remaining lifetime, i.e., lifetime greater than xl , the
lower point of truncation. Left truncation may be realized by burn–in or by preselecting
of apparently weak looking units (freaks).

• Right truncation gives early lifetime or young lifetime, i.e., lifetime less than xu , the upper
point of truncation. Right truncation will be met with when for economic or safety reasons
items will be in operation for at most xu units of time.

• Double truncation gives interim lifetime, i.e., lifetime which is less than xu , but greater
than xl and thus is in the interval [xl , xu ]. We may think of interim lifetime as either future
lifetime truncated on the right or as early lifetime truncated on the left.

We first present results for the future lifetime which has already been introduced in Sect. 1.1.1.4
in conjunction with the introduction of the hazard rate. Future lifetime beyond $x_l$ will be denoted

$$X - x_l \mid X \ge x_l := Y \mid x_l$$

and its distributional representatives, expressed in terms of the original distribution, are

\begin{align}
f_l(y \mid x_l) &= \frac{f(x_l + y)}{1 - F(x_l)} = \frac{f(x_l + y)}{S(x_l)}, \quad x_l \ge 0,\ y \ge 0, \tag{1.52a}\\
F_l(y \mid x_l) &= \frac{F(x_l + y) - F(x_l)}{1 - F(x_l)} = \frac{S(x_l) - S(x_l + y)}{S(x_l)}, \tag{1.52b}\\
S_l(y \mid x_l) &= \frac{S(x_l + y)}{S(x_l)} = \frac{1 - F(x_l + y)}{1 - F(x_l)}, \tag{1.52c}\\
h_l(y \mid x_l) &= \frac{f_l(y \mid x_l)}{S_l(y \mid x_l)} = \frac{f(x_l + y)}{S(x_l + y)} = h(x_l + y). \tag{1.52d}
\end{align}
The hazard rate for the future lifetime Y of a distribution truncated on the left at xl is identical
to that of the original distribution at x = xl + y, i.e., the courses of these two hazard rates only
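differ by a translation. The corresponding CHR is

$$\begin{aligned}
H_l(y \mid x_l) &= \int_0^y h_l(u \mid x_l)\,du\\
&= \int_0^y h(x_l + u)\,du\\
&= \int_{x_l}^{x_l + y} h(v)\,dv, \quad v = x_l + u,\\
&= H(x_l + y) - H(x_l).
\end{aligned} \tag{1.52e}$$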

The raw moments of $Y \mid x_l$ are

$$E\big(Y^k \mid x_l\big) = \frac{k}{S(x_l)} \int_0^\infty y^{k-1}\,S(x_l + y)\,dy. \tag{1.52f}$$

Especially, for k = 1 we have

$$E(Y \mid x_l) = \frac{1}{S(x_l)} \int_0^\infty S(x_l + y)\,dy \tag{1.52g}$$

which, in Sect. 1.1.1.e, was denoted $\mu(x_l)$ and called the MRL of X. The MRL of the future lifetime is, after some manipulations,

$$\mu_l(y) = \frac{1}{S(x_l + y)} \int_y^\infty S(x_l + v)\,dv = \mu(x_l + y), \tag{1.52h}$$

i.e., the MRL of the left–truncated variable, truncated at xl , after y units of time is the same as
that of the original variable at age $x_l + y$. Like the HR, the MRL is translated. The percentile function of $Y \mid x_l$ follows from

$$F(y_P \mid x_l) = \frac{F(x_l + y_P) - F(x_l)}{1 - F(x_l)} = P, \quad 0 \le P \le 1,$$

as

$$y_P = F^{-1}\big[P + (1 - P)\,F(x_l)\big] - x_l, \tag{1.52i}$$
i.e., yP is equal to the percentile of order P + (1 − P ) F (xl ) of the original distribution, but
reduced by the value of the truncation point xl . Sometimes (particularly, when the distribution is
highly skewed), the median is preferred to the mean, in which case, the quantity ‘median residual
life’ at xl is preferred to the mean residual lifetime. The median residual lifetime at xl is the
length of the interval from xl to that time where one–half of the units alive at xl will still be alive.
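Formula (1.52i) translates directly into a routine for residual-life percentiles, including the median residual lifetime (P = 0.5). The sketch below assumes a WEIBULL original distribution purely for illustration; the function names and parameter values are hypothetical.

```python
import math

def weibull_cdf(x, b, c):
    """F(x) = 1 - exp(-(x/b)^c)."""
    return -math.expm1(-((x / b) ** c))

def weibull_quantile(p, b, c):
    """F^{-1}(p) = b * (-ln(1 - p))^(1/c), for 0 <= p < 1."""
    return b * (-math.log1p(-p)) ** (1.0 / c)

def residual_percentile(P, x_l, b, c):
    """(1.52i): y_P = F^{-1}[P + (1 - P) F(x_l)] - x_l."""
    order = P + (1.0 - P) * weibull_cdf(x_l, b, c)
    return weibull_quantile(order, b, c) - x_l

def median_residual_life(x_l, b, c):
    """Median residual lifetime at x_l, i.e. the case P = 0.5."""
    return residual_percentile(0.5, x_l, b, c)
```

For c = 1 the original distribution is exponential, and `median_residual_life` should return b·ln 2 for every truncation point, reflecting the lack of memory of that distribution.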
Now we give results for the early lifetime, denoted

xu − X | X ≤ xu =: Y | xu .
1.1 The Univariate Continuous Case 43

$$f_u(y \mid x_u) = \frac{f(y)}{F(x_u)} = \frac{f(y)}{1 - S(x_u)}, \quad x_u > 0,\ 0 \le y \le x_u, \tag{1.53a}$$
$$F_u(y \mid x_u) = \frac{F(y)}{F(x_u)} = \frac{1 - S(y)}{1 - S(x_u)}, \tag{1.53b}$$
$$S_u(y \mid x_u) = \frac{S(y) - S(x_u)}{1 - S(x_u)} = \frac{F(x_u) - F(y)}{F(x_u)}, \tag{1.53c}$$
$$h_u(y \mid x_u) = \frac{f_u(y \mid x_u)}{S_u(y \mid x_u)} = \frac{f(y)}{S(y) - S(x_u)} = h(y)\,\frac{S(y)}{S(y) - S(x_u)}. \tag{1.53d}$$
Since S(y) > S(y) − S(xu ) for y < xu the hazard rate of the right–truncated distribution is
greater than that of the original distribution. (1.53d) shows that truncation from above markedly
affects the course of the HR, whereas truncation from below only shifts the HR, see (1.52d) and
Fig. 1/9. Since the HR at y is conditional on survival to y, truncation below y is immaterial.
However, truncation above y has an effect on the HR at y, since the time interval remaining for
failing is shortened. It should be clear that as y → xu from below, hu (y | xu ) becomes indefinitely
large since the interval for failing approaches zero.
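
The behaviour of (1.53d) near the truncation point can be illustrated with a short computation. This is only a sketch, assuming the reduced RAYLEIGH distribution with h(x) = 2x and S(x) = exp(−x²) as the original distribution; `h_right` is a hypothetical helper name.

```python
import math

def h(x):
    """Hazard rate of the reduced Rayleigh distribution (assumed example)."""
    return 2.0 * x

def S(x):
    """Survival function of the reduced Rayleigh distribution."""
    return math.exp(-x * x)

def h_right(y, xu):
    """Hazard rate of the right-truncated distribution, eq. (1.53d)."""
    if not 0.0 <= y < xu:
        raise ValueError("y must satisfy 0 <= y < xu")
    return h(y) * S(y) / (S(y) - S(xu))

xu = 3.0
for y in (1.0, 2.0, 2.9, 2.999):
    # The truncated hazard exceeds h(y) and grows without bound as y -> xu.
    print(y, h(y), h_right(y, xu))
```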
   
$$H_u(y \mid x_u) = -\ln\big[S_u(y \mid x_u)\big] = \ln\big[F(x_u)\big] - \ln\big[S(y) - S(x_u)\big]. \tag{1.53e}$$
For the mean of Y | xu we find
$$\begin{aligned}
E(Y \mid x_u) &= \int_0^{x_u} S_u(v \mid x_u)\,dv \\
&= \frac{1}{1 - S(x_u)} \left\{ \int_0^{x_u} S(v)\,dv - x_u\, S(x_u) \right\},
\end{aligned} \tag{1.53f}$$

and for the MRL we have


$$\begin{aligned}
\mu_u(y) &= \frac{1}{S_u(y \mid x_u)} \int_y^{x_u} S_u(v \mid x_u)\,dv \\
&= \frac{1}{S(y) - S(x_u)} \left\{ \int_y^{x_u} S(v)\,dv - S(x_u)\,\big[x_u - y\big] \right\}.
\end{aligned} \tag{1.53g}$$

We notice that µu (y) approaches zero with y → xu . The percentile function of Y | xu follows
from
$$F_u(y_P \mid x_u) = \frac{F(y_P)}{F(x_u)} = P, \quad 0 \le P \le 1,$$
as
$$y_P = F^{-1}\big[P\, F(x_u)\big], \tag{1.53h}$$
i.e., yP is equal to the percentile of order P F (xu ) of the original distribution.
The results for the interim lifetime
   
$$\big[(x_u - x_l) - (x_u - X) \,\big|\, x_l \le X \le x_u\big] = \big[X - x_l \,\big|\, x_l \le X \le x_u\big] =: Y \mid x_l; x_u$$
are the following
$$\begin{aligned}
f_{l,u}(y \mid x_l; x_u) &= \frac{f(x_l + y)}{F(x_u) - F(x_l)} \\
&= \frac{f(x_l + y)}{S(x_l) - S(x_u)}, \quad 0 \le x_l < x_u,\ 0 \le y \le x_u - x_l,
\end{aligned} \tag{1.54a}$$
$$F_{l,u}(y \mid x_l; x_u) = \frac{F(x_l + y) - F(x_l)}{F(x_u) - F(x_l)} = \frac{S(x_l) - S(x_l + y)}{S(x_l) - S(x_u)}, \tag{1.54b}$$
$$S_{l,u}(y \mid x_l; x_u) = \frac{F(x_u) - F(x_l + y)}{F(x_u) - F(x_l)} = \frac{S(x_l + y) - S(x_u)}{S(x_l) - S(x_u)}, \tag{1.54c}$$
$$\begin{aligned}
h_{l,u}(y \mid x_l; x_u) &= \frac{f_{l,u}(y \mid x_l; x_u)}{S_{l,u}(y \mid x_l; x_u)} = \frac{f(x_l + y)}{S(x_l + y) - S(x_u)} \\
&= h(x_l + y)\,\frac{S(x_l + y)}{S(x_l + y) - S(x_u)},
\end{aligned} \tag{1.54d}$$
$$\begin{aligned}
H_{l,u}(y \mid x_l; x_u) &= -\ln\big[S_{l,u}(y \mid x_l; x_u)\big] \\
&= \ln\big[F(x_u) - F(x_l)\big] - \ln\big[S(x_l + y) - S(x_u)\big],
\end{aligned} \tag{1.54e}$$
$$\begin{aligned}
E(Y \mid x_l; x_u) &= \int_0^{x_u - x_l} S_{l,u}(v \mid x_l; x_u)\,dv \\
&= \frac{1}{S(x_l) - S(x_u)} \left\{ \int_0^{x_u - x_l} S(x_l + v)\,dv - [x_u - x_l]\, S(x_u) \right\},
\end{aligned} \tag{1.54f}$$
$$\begin{aligned}
\mu_{l,u}(y) &= \frac{1}{S_{l,u}(y \mid x_l; x_u)} \int_y^{x_u - x_l} S_{l,u}(v \mid x_l; x_u)\,dv \\
&= \frac{1}{S(x_l + y) - S(x_u)} \left\{ \int_y^{x_u - x_l} S(x_l + v)\,dv - [x_u - x_l - y]\, S(x_u) \right\}.
\end{aligned} \tag{1.54g}$$

The percentile function of Y | xl ; xu follows from

$$F_{l,u}(y_P \mid x_l; x_u) = \frac{F(x_l + y_P) - F(x_l)}{F(x_u) - F(x_l)} = P$$
as
$$y_P = F^{-1}\big\{P\,[F(x_u) - F(x_l)] + F(x_l)\big\} - x_l. \tag{1.54h}$$
We notice that

• by setting xu = ∞ the formulas (1.54a–h) turn into those for the case of the left–truncation,
i.e., into formulas (1.52a–i),

• by setting xl = 0 the formulas (1.54a–h) turn into those for the case of right–truncation i.e.,
into formulas (1.53a–h) and

• by setting xl = 0 and xu = ∞ (1.54a–h) give the results of Sections 1.1.1.3 to 1.1.1.6.
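
The three reductions listed above can be checked for the survival function (1.54c). The sketch below assumes the reduced RAYLEIGH survival function as the original S(x); `S_trunc` and its default arguments are illustrative choices, not notation from the text.

```python
import math

def S(x):
    """Survival function of the original distribution (reduced Rayleigh assumed)."""
    return math.exp(-x * x)

def S_trunc(y, xl=0.0, xu=math.inf):
    """Survival function of Y = X - xl given xl <= X <= xu, eq. (1.54c)."""
    S_xu = 0.0 if math.isinf(xu) else S(xu)   # S(infinity) = 0
    return (S(xl + y) - S_xu) / (S(xl) - S_xu)

y = 0.5
print(S_trunc(y, xl=1.0), S(1.0 + y) / S(1.0))               # xu = inf: (1.52c)
print(S_trunc(y, xu=3.0), (S(y) - S(3.0)) / (1.0 - S(3.0)))  # xl = 0:   (1.53c)
print(S_trunc(y), S(y))                                      # no truncation
```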

Example 1/8: Truncation of the reduced R AYLEIGH distribution

The reduced RAYLEIGH distribution, equivalent to the reduced WEIBULL distribution with shape parameter equal to 2, has

$$\begin{aligned}
f(x) &= 2\,x \exp(-x^2), \quad x \ge 0, \\
F(x) &= 1 - \exp(-x^2), \\
S(x) &= \exp(-x^2), \\
h(x) &= 2\,x, \\
H(x) &= x^2, \\
E(X^k) &= \Gamma\!\left(1 + \frac{k}{2}\right), \\
E(X) &= \Gamma\!\left(\frac{3}{2}\right) = \frac{\sqrt{\pi}}{2} \approx 0.88623, \\
\mu(x) &= \frac{1}{2}\sqrt{\pi}\,\exp(x^2)\,\operatorname{erfc}(x).\ {}^{23}
\end{aligned}$$
Truncation from below at xl gives
 
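$$\begin{aligned}
f_l(y \mid x_l) &= 2\,(x_l + y) \exp\big[-y\,(2\,x_l + y)\big]; \quad x_l > 0,\ y \ge 0; \\
F_l(y \mid x_l) &= 1 - \exp\big[-y\,(2\,x_l + y)\big], \\
S_l(y \mid x_l) &= \exp\big[-y\,(2\,x_l + y)\big], \\
h_l(y \mid x_l) &= 2\,(x_l + y), \\
H_l(y \mid x_l) &= y\,(2\,x_l + y), \\
E(Y \mid x_l) &= \frac{1}{2}\sqrt{\pi}\,\exp(x_l^2)\,\operatorname{erfc}(x_l), \\
\mu_l(y) &= \mu(x_l + y) = \frac{1}{2}\sqrt{\pi}\,\exp\big[(x_l + y)^2\big]\operatorname{erfc}(x_l + y), \\
y_P &= \sqrt{x_l^2 - \ln(1 - P)} - x_l.
\end{aligned}$$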

Truncation from above at xu gives



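$$\begin{aligned}
f_u(y \mid x_u) &= \frac{2\,y \exp(-y^2)}{1 - \exp(-x_u^2)}; \quad x_u > 0,\ 0 \le y \le x_u; \\
F_u(y \mid x_u) &= \frac{1 - \exp(-y^2)}{1 - \exp(-x_u^2)}, \\
S_u(y \mid x_u) &= \frac{\exp(-y^2) - \exp(-x_u^2)}{1 - \exp(-x_u^2)}, \\
h_u(y \mid x_u) &= \frac{2\,y}{1 - \exp(y^2 - x_u^2)}, \\
H_u(y \mid x_u) &= -\ln\left[\frac{\exp(-y^2) - \exp(-x_u^2)}{1 - \exp(-x_u^2)}\right], \\
E(Y \mid x_u) &= \frac{2\,x_u - \exp(x_u^2)\sqrt{\pi}\,\operatorname{erf}(x_u)}{2\,\big[1 - \exp(x_u^2)\big]}, \\
\mu_u(y) &= \frac{\exp(y^2)\,\Big\{2\,(x_u - y) + \exp(x_u^2)\sqrt{\pi}\,\big[\operatorname{erf}(y) - \operatorname{erf}(x_u)\big]\Big\}}{2\,\big[\exp(y^2) - \exp(x_u^2)\big]}, \\
y_P &= \sqrt{-\ln\Big\{1 - P\,\big[1 - \exp(-x_u^2)\big]\Big\}}.
\end{aligned}$$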

Truncation from both sides gives


 
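$$\begin{aligned}
f_{l,u}(y \mid x_l; x_u) &= \frac{2\,(x_l + y) \exp\big[-(x_l + y)^2\big]}{\exp(-x_l^2) - \exp(-x_u^2)}; \quad 0 < x_l < x_u,\ 0 \le y \le x_u - x_l; \\
F_{l,u}(y \mid x_l; x_u) &= \frac{\exp(-x_l^2) - \exp\big[-(x_l + y)^2\big]}{\exp(-x_l^2) - \exp(-x_u^2)}, \\
S_{l,u}(y \mid x_l; x_u) &= \frac{\exp\big[-(x_l + y)^2\big] - \exp(-x_u^2)}{\exp(-x_l^2) - \exp(-x_u^2)},
\end{aligned}$$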

23 $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-u^2)\,du$ is the error function and $\operatorname{erfc} = 1 - \operatorname{erf}$ the complementary error function.

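$$\begin{aligned}
h_{l,u}(y \mid x_l; x_u) &= \frac{2\,(x_l + y)}{1 - \exp\big[(x_l + y)^2 - x_u^2\big]}, \\
H_{l,u}(y \mid x_l; x_u) &= -\ln\left\{\frac{\exp\big[-(x_l + y)^2\big] - \exp(-x_u^2)}{\exp(-x_l^2) - \exp(-x_u^2)}\right\}, \\
E(Y \mid x_l; x_u) &= \frac{\exp(x_l^2)\,\Big\{2\,(x_u - x_l) + \exp(x_u^2)\sqrt{\pi}\,\big[\operatorname{erf}(x_l) - \operatorname{erf}(x_u)\big]\Big\}}{2\,\big[\exp(x_l^2) - \exp(x_u^2)\big]}, \\
\mu_{l,u}(y) &= \frac{\exp\big[(x_l + y)^2\big]\Big\{2\,(x_u - x_l - y) + \exp(x_u^2)\sqrt{\pi}\,\big[\operatorname{erf}(x_l + y) - \operatorname{erf}(x_u)\big]\Big\}}{2\,\Big\{\exp\big[(x_l + y)^2\big] - \exp(x_u^2)\Big\}}, \\
y_P &= \sqrt{-\ln\Big\{P \exp(-x_u^2) + (1 - P) \exp(-x_l^2)\Big\}} - x_l.
\end{aligned}$$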

Fig. 1/9 shows the hazard rates for the original distribution together with those of the three types of trun-
cation where the truncation points are xl = 1 and xu = 3, respectively.
Figure 1/9: Hazard rates of the reduced R AYLEIGH distribution, non–truncated and truncated at xl = 1
and/or xu = 3
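
As a plausibility check of Example 1/8, the closed form for E(Y | xl ; xu ) can be compared with a direct numerical evaluation of (1.54f). The sketch below uses a simple midpoint rule; the function names and the number of grid cells are assumptions made for illustration only.

```python
import math

SQRT_PI = math.sqrt(math.pi)

def mean_closed_form(xl, xu):
    """E(Y | xl; xu) of the doubly truncated reduced Rayleigh distribution (Example 1/8)."""
    a, b = math.exp(xl * xl), math.exp(xu * xu)
    num = a * (2.0 * (xu - xl) + b * SQRT_PI * (math.erf(xl) - math.erf(xu)))
    return num / (2.0 * (a - b))

def mean_numeric(xl, xu, n=20000):
    """Midpoint-rule evaluation of eq. (1.54f) with S(x) = exp(-x^2)."""
    S = lambda x: math.exp(-x * x)
    width = (xu - xl) / n
    integral = sum(S(xl + (k + 0.5) * width) for k in range(n)) * width
    return (integral - (xu - xl) * S(xu)) / (S(xl) - S(xu))

# Truncation points of Fig. 1/9: xl = 1 and xu = 3.
print(mean_closed_form(1.0, 3.0), mean_numeric(1.0, 3.0))
```

Setting `xl = 0.0` in both functions gives the mean under truncation from above only.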

1.1.2.6 Life Potential

The integral
$$\begin{aligned}
\Pi(x) &= \int_x^\infty S(u)\,du && (1.55a) \\
&= \mu(x)\, S(x) && (1.55b)
\end{aligned}$$

is the life potential, i.e., the total number of expected time units to be spent by the fraction of
units in the population which survive the age x. Let N be the number of persons at age x = 0,
i.e., N is the size of the population, then in life tables and in the actuarial sciences the product
N × Π(x) is known as ‘the expected total number of years lived beyond age x by persons alive
at age x’. There, this quantity is denoted Tx and is measured in ‘population units × time units’,
i.e., person–years. When the population is mechanical equipment the quantity N × Π(x) will be
measured in, e.g., machine–hours. In the engineering sciences the complement of Π(x) to µ = E(X):
$$\mu - \Pi(x) = \int_0^x S(u)\,du \tag{1.55c}$$

is known as total–time–on–test up to age x.24 We see that


$$\Pi(0) = \int_0^\infty S(u)\,du = \mu = E(X). \tag{1.55d}$$

The scaled life potential


$$\begin{aligned}
\tilde{S}(x) &= \frac{\Pi(x)}{\Pi(0)}, \quad 0 \le \tilde{S}(x) \le 1, \\
&= \frac{\mu(x)}{\mu}\, S(x), \\
&= \frac{1}{\mu} \int_x^\infty S(u)\,du
\end{aligned} \tag{1.56a}$$

tells what fraction of the original life potential Π(0) = µ is still available at age x. Whereas
S(x) says what fraction of the initial size of population units has survived the age x, $\tilde{S}(x)$ tells
what fraction of the initial number of lifetime units has not been ‘consumed’ up to x. $\tilde{S}(x)$ can
be regarded as a survival function, but not as one of population units, but as one of lifetime units.
Thus, we may call $\tilde{S}(x)$ the potential survival function. Looking at (1.56a) we see that

$$\tilde{S}(x) \ge S(x) \quad \text{for increasing MRL}$$

and

$$\tilde{S}(x) \le S(x) \quad \text{for decreasing MRL.}$$

$\tilde{S}(x)$ is, like S(x), a monotone and decreasing function with

$$\tilde{S}(0) = 1 \quad \text{and} \quad \tilde{S}(\infty) = 0.$$

Besides $\tilde{S}(x)$ we can define the following representatives of the life potential distribution:

• potential distribution function

$$\tilde{F}(x) := 1 - \tilde{S}(x) = \frac{1}{\mu} \int_0^x S(u)\,du, \tag{1.56b}$$

• potential density function

$$\tilde{f}(x) := \frac{d\tilde{F}(x)}{dx} = \frac{S(x)}{\mu}, \tag{1.56c}$$
24 We mention that the total–time–on–test transform of a lifetime distribution is a function of P, 0 ≤ P ≤ 1,
the portion failing, and the upper limit of integration is expressed in terms of the percentile $x_P = F^{-1}(P)$:
$$H_F^{-1}(P) = \int_0^{F^{-1}(P)} S(u)\,du.$$

• potential hazard rate

$$\tilde{h}(x) := \frac{\tilde{f}(x)}{\tilde{S}(x)} = \frac{S(x)}{\int_x^\infty S(u)\,du} = \frac{1}{\mu(x)}, \tag{1.56d}$$

• cumulative potential hazard rate

$$\tilde{H}(x) := \int_0^x \tilde{h}(u)\,du = \int_0^x \frac{1}{\mu(u)}\,du = \ln \mu - \ln \mu(x) - \ln S(x), \quad \text{see (1.19e).} \tag{1.56e}$$

The potential hazard rate $\tilde{h}(x)$ may be regarded as the natural rate of depreciation for a population
having lifetime distribution F (x), when this rate is applied to the stock of units having age x. The
only source of depreciation is ‘death’ or failure of population units.25 $\tilde{h}(x)$ is the velocity with
which the stock of lifetime units decreases at age x and $\tilde{h}(x)\,\Delta$, ∆ small, is the amount of lifetime
units vanishing in [x, x + ∆].

Example 1/9: Life potential of the exponential and the reduced R AYLEIGH distributions

The exponential distribution with $f(x) = \frac{1}{b} \exp\!\left(-\frac{x}{b}\right)$ is the only continuous distribution where $S(x) = \tilde{S}(x)\ \forall\ x \ge 0$. This is a consequence of its constant MRL: µ(x) = b ∀ x ≥ 0.
For the reduced RAYLEIGH distribution of Example 1/8 we have

$$\Pi(x) = \frac{1}{2}\sqrt{\pi}\,\operatorname{erfc}(x),\ x \ge 0, \qquad \tilde{S}(x) = \frac{\Pi(x)}{E(X)} = \operatorname{erfc}(x), \qquad \tilde{h}(x) = \frac{2 \exp(-x^2)}{\sqrt{\pi}\,\operatorname{erfc}(x)}.$$

From Fig. 1/10 we see that $\tilde{S}(x) < S(x)$, thus, the stock of lifetime units decreases faster than the stock of
population units. We further see that $\tilde{h}(x) > h(x)$, but both rates approach one another with x → ∞.

Figure 1/10: Survival functions $S(x)$, $\tilde{S}(x)$ and hazard rates $h(x)$, $\tilde{h}(x)$ of the reduced
RAYLEIGH distribution
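
The comparison shown in Fig. 1/10 can also be tabulated. The following is a minimal sketch for the reduced RAYLEIGH distribution, using the expressions of Example 1/9; the helper names `S_pot` and `h_pot` are illustrative and not part of the text.

```python
import math

def S(x):
    """Survival function of population units, reduced Rayleigh."""
    return math.exp(-x * x)

def h(x):
    """Hazard rate of the reduced Rayleigh distribution."""
    return 2.0 * x

def S_pot(x):
    """Potential survival function (1.56a); equals erfc(x) here."""
    return math.erfc(x)

def h_pot(x):
    """Potential hazard rate (1.56d), i.e. the reciprocal of the MRL."""
    return 2.0 * math.exp(-x * x) / (math.sqrt(math.pi) * math.erfc(x))

for x in (0.5, 1.0, 2.0, 3.0):
    # For x > 0 the lifetime stock is depleted faster than the population stock,
    # and the gap between the two hazard rates shrinks as x grows.
    print(x, S(x), S_pot(x), h(x), h_pot(x), h_pot(x) - h(x))
```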

25
In general, this natural rate of depreciation will be different from the rate of depreciation applied in accounting
because there we have to allow for economic and fiscal reasons.
1.2 The Univariate Discrete Case 49

1.2 The Univariate Discrete Case26


A large amount of research has been devoted to continuous lifetimes. Many of the concepts that
apply to continuous distributions also apply to discrete distributions, but discrete failure time dis-
tributions are applied less frequently than continuous distributions since there are fewer situations
for which failures can only occur at discrete points in time. However, discrete lifetimes have sev-
eral important applications. Actuaries and biostatisticians are interested in lifetimes of persons or
organisms, measured in years, months, weeks, or days, i.e., time is grouped and counted in some
units or time intervals. A reliability engineer will monitor a device only once per time period
(hour, day etc.) and will count the time periods successfully completed prior to failure of the
device. There are situations in reliability theory where clock time is not the best scale on which to
describe lifetime. ‘Time’ can also be the number of times or cycles that a piece of equipment is
operated successfully, or the number of miles that an airplane is flown or a car is run.

1.2.1 General Results


We assume that lifetime X can take a discrete set of values {xi }; i = 1, 2, . . . ; at which failure
may occur; generally, we take xi = i. When the xi ’s are equidistant, i.e., xi ∈ M = {a, a +
1, a + 2, . . .} for some a ∈ R, then the transformation xi − a leads to the set of non–negative
integers N0 . We will restrict lifetimes to the set N0 with probability mass function PMF {Pi },
where
Pi := Pr(X = i); i = 0, 1, 2, . . .

Often, we have P0 = 0, but, for generality, we assume that P0 is not necessarily equal to zero.
The case P0 ≠ 0 corresponds to a non–zero portion of dud units in a reliability context or to a
non–zero probability that a fetus dies at birth or to an egg failing to hatch in a biostatistics context.
Different representatives for such discrete lifetime distributions will be described next.
The most basic representative of a discrete lifetime distribution is its PMF

Pr(X = i) = Pi ; i = 0, 1, 2, . . . ; (1.57)

where X is the random time to failure or to death and i is the observed number of time units to
this event.
The survival or reliability function is

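$$\begin{aligned}
S_i &= 1 - \Pr(X < i); \quad i = 0, 1, 2, \dots; \\
&= \Pr(X \ge i), \\
&= \sum_{j \ge i} P_j.
\end{aligned} \tag{1.58a}$$

This is a non–increasing step function which is left–continuous (see footnote 27) since

$$\lim_{\varepsilon \to 0} \big[S_{i - \varepsilon} - S_i\big] = 0, \quad \varepsilon > 0,\ i \ge 0.$$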

The reliability function is a unique function of the probabilities Pi , and similarly the Pi are deter-
mined uniquely by the Si :
Pi = Si − Si+1 ; i = 0, 1, 2, . . . (1.58b)
26 Suggested reading for the section: KEMP (2004), LAI (2013), SALVIA/BOLLINGER (1982), SHAKED et al. (1995).
27 This definition is different from that of a continuous variate, see (1.4a).

The distribution function or failure function is the complement to the survival function

$$F_i = \Pr(X < i) = 1 - S_i; \quad i = 0, 1, 2, \dots \tag{1.59}$$

The hazard rate in the discrete case is defined as (see footnote 28)

$$h_i = \frac{\Pr(X = i)}{\Pr(X \ge i)}; \quad i = 0, 1, 2, \dots \tag{1.60a}$$
This is a conditional probability, i.e., the probability that a unit fails at time i, given that it has
survived to at least time i. Thus we have

0 ≤ hi ≤ 1,

whereas in the continuous case we have h(x) ≥ 0. We may write hi in terms of Pi and/or Si :
$$\begin{aligned}
h_i &= \frac{P_i}{\sum_{j \ge i} P_j}, && (1.60b) \\
&= \frac{P_i}{S_i}, && (1.60c) \\
&= 1 - \frac{S_{i+1}}{S_i}. && (1.60d)
\end{aligned}$$
Conversely, we may express Pi and Si in terms of hi . Because S0 = 1 we may write Si for
i = 1, 2, . . . as the telescope product
$$S_i = \frac{S_i}{S_{i-1}}\,\frac{S_{i-1}}{S_{i-2}} \cdots \frac{S_2}{S_1}\,\frac{S_1}{S_0}. \tag{1.61a}$$
From (1.60d) we have
$$1 - h_i = \frac{S_{i+1}}{S_i}; \quad i = 0, 1, 2, \dots \tag{1.61b}$$
Combining (1.61a) and (1.61b) we find (see footnote 29)
$$S_i = \prod_{j=0}^{i-1} (1 - h_j); \quad i = 0, 1, 2, \dots \tag{1.61c}$$

From (1.60c) we have

$$P_i = h_i\, S_i, \tag{1.61d}$$
and upon inserting (1.61c) into (1.61d) we find
$$P_i = h_i \prod_{j=0}^{i-1} (1 - h_j); \quad i = 0, 1, 2, \dots \tag{1.61e}$$
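
The conversions between hi , Si and Pi in (1.60b) and (1.61c–e) translate directly into a few lines of code. This is a minimal sketch for a distribution on a finite set {0, 1, ..., n−1}; the function names and the example PMF are assumptions chosen for illustration.

```python
def survival_from_hazard(h):
    """Return S_0, ..., S_n from h_0, ..., h_{n-1} via the product (1.61c)."""
    S = [1.0]
    for hj in h:
        S.append(S[-1] * (1.0 - hj))
    return S

def pmf_from_hazard(h):
    """Return P_i = h_i * S_i, eqs. (1.61d) and (1.61e)."""
    S = survival_from_hazard(h)
    return [hi * Si for hi, Si in zip(h, S)]

def hazard_from_pmf(P):
    """Return h_i = P_i / sum_{j >= i} P_j, eq. (1.60b)."""
    tails, tail = [], 0.0
    for p in reversed(P):          # accumulate the tail sums S_i from the right
        tail += p
        tails.append(tail)
    tails.reverse()
    return [p / s for p, s in zip(P, tails)]

P = [0.1, 0.2, 0.3, 0.4]           # hypothetical PMF on {0, 1, 2, 3}
h = hazard_from_pmf(P)
print(h)                           # the last value is 1: failure is certain at i = 3
print(pmf_from_hazard(h))          # recovers P up to rounding
```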

In the continuous case we have with respect to the cumulative hazard function
$$H(x) = \int_0^x h(u)\,du = \int_0^x \frac{f(u)}{S(u)}\,du = -\ln S(x).$$
28 Some authors define the discrete hazard rate as $\Pr(X = i) / \Pr(X > i)$.
29 Remember: $\prod_{i=j}^{k} a_i = 1$ for k < j.

In the discrete case we have a dilemma when defining the cumulative hazard function, because, in
general, summing the hazard rate values hi — as analogue to integrating h(x) in the continuous
case — is not equal to taking the negative of the logarithm of the survival function Si . Thus,
two possible, but different choices for the discrete case exist, which give rise to two estimators in
Sect. 6.2:
$${}_1H_i := -\ln S_i, \tag{1.62a}$$
called cumulative hazard function by KEMP (2004), and
$${}_2H_i := \sum_{j=0}^{i} h_j, \tag{1.62b}$$

called accumulated hazard function by KEMP.
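
The difference between the two definitions is easy to see numerically. The sketch below computes both functions from a given sequence of hazard values; the function names and the constant hazard 0.2 are hypothetical and serve only as an illustration.

```python
import math
from itertools import accumulate

def cumulative_hazard(h):
    """1H_i = -ln S_i = -sum_{j < i} ln(1 - h_j), eq. (1.62a) with (1.61c)."""
    out, total = [], 0.0
    for hj in h:
        out.append(total)            # value for index i uses h_0, ..., h_{i-1}
        total -= math.log1p(-hj)
    return out

def accumulated_hazard(h):
    """2H_i = sum_{j <= i} h_j, eq. (1.62b)."""
    return list(accumulate(h))

h = [0.2] * 5                        # hypothetical constant hazard rate
print(cumulative_hazard(h))          # 0, -ln(0.8), -2 ln(0.8), ...
print(accumulated_hazard(h))         # 0.2, 0.4, 0.6, ...
```

The two sequences differ both in their starting value and in their increments, since −ln(1 − h) exceeds h for 0 < h < 1.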

Excursus: The pseudo–hazard rate


By defining an alternative hazard rate $h_i^*$, called pseudo–hazard rate by ROY/GUPTA (1992), it is possible
to have
$$S_i = \exp(-H_i^*) = \exp\!\left(-\sum_{j=0}^{i} h_j^*\right).$$

This pseudo–hazard rate reads

$$h_i^* = \ln\!\left(\frac{S_{i-1}}{S_i}\right); \quad i = 0, 1, 2, \dots;$$
where $S_{-1} = 1$. Then
$$H_i^* = \sum_{j=0}^{i} h_j^* = \ln S_{-1} - \ln S_i = -\ln S_i.$$

The pseudo–hazard rate $h_i^*$ and the hazard rate $h_i$ are linked as

$$h_i^* = -\ln(1 - h_{i-1}).$$

COX/OAKES (1984, p. 15) prefer to define the cumulative hazard rate for discrete lifetimes as
$$H(x) = -\sum_{x_j < x} \ln\big[1 - h(x_j)\big],$$

because the relationship S(x) = exp[−H(x)] is then preserved for discrete lifetimes. If the h(xj ) are
small, we have for the COX/OAKES definition
$$H(x) \approx \sum_{x_j < x} h(x_j).$$

${}_1H_i$ and ${}_2H_i$ are linked as follows: From (1.62a) using (1.58a) and (1.61c) we find
$${}_1H_i = -\ln\!\left(\sum_{j \ge i} P_j\right) = -\ln\!\left[\prod_{j=0}^{i-1} (1 - h_j)\right] = -\sum_{j=0}^{i-1} \ln\big[1 - {}_2H_j + {}_2H_{j-1}\big]. \tag{1.63a}$$

From (1.62b) using (1.60b) together with (1.58a,b) we have

$${}_2H_i = \sum_{j=0}^{i} \frac{P_j}{\sum_{k \ge j} P_k} = \sum_{j=0}^{i} \frac{S_j - S_{j+1}}{S_j} = \sum_{j=0}^{i} \Big[1 - \exp\big({}_1H_j - {}_1H_{j+1}\big)\Big]. \tag{1.63b}$$

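Expressing $S_i$, $P_i$ and $h_i$ by ${}_1H_i$ we find

$$\begin{aligned}
S_i &= \exp(-{}_1H_i), && (1.64a) \\
P_i &= \exp(-{}_1H_i) - \exp(-{}_1H_{i+1}), && (1.64b) \\
h_i &= 1 - \exp({}_1H_i - {}_1H_{i+1}), && (1.64c)
\end{aligned}$$
and expressing $S_i$, $P_i$ and $h_i$ by ${}_2H_i$ we have
$$\begin{aligned}
S_i &= \prod_{j=0}^{i-1} \big(1 - {}_2H_j + {}_2H_{j-1}\big), \quad {}_2H_{-1} = 0, && (1.64d) \\
P_i &= \big({}_2H_i - {}_2H_{i-1}\big) \prod_{j=0}^{i-1} \big(1 - {}_2H_j + {}_2H_{j-1}\big), && (1.64e) \\
h_i &= {}_2H_i - {}_2H_{i-1}. && (1.64f)
\end{aligned}$$
For $h_i$ small, we have
$$1 - h_i \approx \exp(-h_i). \tag{1.65a}$$
From this (1.61e) and (1.61c) become
$$\begin{aligned}
P_i &\approx h_i \exp(-{}_2H_{i-1}), && (1.65b) \\
S_i &\approx \exp(-{}_2H_{i-1}). && (1.65c)
\end{aligned}$$
These approximations correspond to the well–known results for continuous variates:
$$f(x) = h(x) \exp\!\left(-\int_0^x h(u)\,du\right),$$
$$S(x) = \exp\!\left(-\int_0^x h(u)\,du\right).$$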

For a discrete distribution with increasing hazard rate the condition h0 ≤ h1 ≤ . . . applied to
(1.61c) yields the inequality
$$S_i \le (1 - h_0)^i = (1 - P_0)^i \approx \exp(-i\,h_0) = \exp(-i\,P_0). \tag{1.66}$$
This inequality is reversed for a decreasing hazard rate distribution.
The mean residual life function is defined by KALBFLEISCH/PRENTICE (1980, p. 7), LAWLESS
(1982, p. 44) and LEEMIS (1995, p. 57) as
$$L_i := E(X - i \mid X \ge i), \quad i \ge 0. \tag{1.67a}$$
Therefore, in the discrete case we have (see footnote 30)
$$\begin{aligned}
L_i &= \frac{\sum_{j \ge i} j\,P_j}{\sum_{j \ge i} P_j} - i, && (1.67b) \\
&= \frac{\sum_{j > i} S_j}{S_i}, && (1.67c) \\
&= \sum_{j \ge i} \prod_{k=i}^{j} (1 - h_k), && (1.67d) \\
&= \sum_{j > i} \exp\big({}_1H_i - {}_1H_j\big), && (1.67e) \\
&= \sum_{j \ge i} \prod_{k=i}^{j} \big(1 - {}_2H_k + {}_2H_{k-1}\big). && (1.67f)
\end{aligned}$$
30 In (1.67b) the term $\sum_{j \ge i} j\,P_j \big/ \sum_{j \ge i} P_j$ is the mean age at death of an i–survivor.

From (1.67b) we see that, since the MRL function is defined for all i ≥ 0, and everything in this
expression except i is constant between mass function values, MRL decreases with a slope of −1
at all time values for which there is no mass.
Reverting to (1.66), which holds for an increasing hazard rate distribution, we find from
$E(X) = \sum_{j > 0} S_j$, see (1.72a), the inequality

$$E(X) \le \frac{1 - h_0}{h_0} = \frac{1 - P_0}{P_0}. \tag{1.68}$$
This inequality is reversed for a decreasing hazard rate distribution.
What can be said about $L = \lim_{i \to \infty} L_i$? Let
$$h = \lim_{i \to \infty} h_i, \qquad P = \lim_{i \to \infty} \left(\frac{P_{i+1}}{P_i}\right), \qquad S = \lim_{i \to \infty} \left(\frac{S_i}{S_{i+1}}\right),$$

then SALVIA/BOLLINGER (1982) stated and proved the following

Theorem 8: If 0 < h < 1, then

• $L = (S - 1)^{-1}$,
• $P = S^{-1}$,
• $h = (L + 1)^{-1}$.

For h = 0 we find
$$S = P = 1 \quad \text{and} \quad L = \infty,$$
and for h = 1 we have
$$P = L = 0 \quad \text{and} \quad S = \infty.$$
These extremes do in fact occur. For example, the YULE distribution
$$P_i = \rho\, B(i, \rho + 1) = \frac{\rho\,\Gamma(\rho + 1)\,\Gamma(i)}{\Gamma(i + \rho + 1)}; \quad i = 1, 2, \dots;\ \rho \in \mathbb{R}^+;$$
has h = 0, and the POISSON distribution
$$P_i = \frac{\lambda^i}{i!}\, e^{-\lambda}; \quad i = 0, 1, \dots$$
has h = 1, see Example 1/10.
The discrete lifetime distribution representatives $P_i$, $S_i$, $h_i$, ${}_1H_i$, and ${}_2H_i$ may be expressed in
terms of $L_i$. From (1.67c) we have

$$L_i\, S_i = S_{i+1} + S_{i+2} + \dots$$

and furthermore
$$L_{i-1}\, S_{i-1} = S_i + S_{i+1} + \dots$$
with difference $L_{i-1} S_{i-1} - L_i S_i = S_i$. Solving for $S_i$ gives
$$S_i = S_{i-1}\,\frac{L_{i-1}}{1 + L_i}, \tag{1.69a}$$
which upon substituting $S_{i-1}$ by $S_{i-2}\,L_{i-2} / (1 + L_{i-1})$, and so forth until $S_1 = S_0\,L_0 / (1 + L_1) = L_0 / (1 + L_1)$, results in
$$S_i = \prod_{j=0}^{i-1} \frac{L_j}{1 + L_{j+1}}. \tag{1.69b}$$

From (1.69b) we have

• together with (1.61b):

$$h_i = 1 - \frac{L_i}{1 + L_{i+1}}, \tag{1.69c}$$

• together with (1.69c) and (1.61d):

$$P_i = \left(1 - \frac{L_i}{1 + L_{i+1}}\right) \prod_{j=0}^{i-1} \frac{L_j}{1 + L_{j+1}}, \tag{1.69d}$$

• together with (1.62a):

$${}_1H_i = -\sum_{j=0}^{i-1} \ln\!\left(\frac{L_j}{1 + L_{j+1}}\right), \tag{1.69e}$$

• and together with (1.69c) and (1.62b):

$${}_2H_i = (i + 1) - \sum_{j=0}^{i} \frac{L_j}{1 + L_{j+1}}. \tag{1.69f}$$

(1.69c) is the basis for a recursion formula

$$L_{i+1} = \frac{L_i}{1 - h_i} - 1. \tag{1.70}$$

In the continuous case it does not matter whether we define MRL by E(X − x | X > x) or by
E(X − x | X ≥ x), but in the discrete case there is a difference. MRL defined as

µ(i) = E(X − i | X > i), i > 0, (1.71a)

is different from (1.67a). We have

µ(i) = Li+1 + 1, i > 0. (1.71b)

With (1.67a) the MRL function at time i = 0 is the mean of the lifetime distribution:
$$L_0 = \sum_{j \ge 0} j\,P_j = \sum_{j > 0} S_j = E(X) \tag{1.72a}$$

whilst
$$\mu(0) = L_1 + 1 = \frac{E(X)}{1 - P_0}. \tag{1.72b}$$
For this reason Li should be preferred over µ(i).
In Tab. 1/2 we have collected all the relations between the six representatives of a discrete lifetime
distribution.
Table 1/2: Relations among the six functions describing a discrete stochastic lifetime

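(rows: from; columns: to)

$$\begin{array}{l|llllll}
 & P_i & S_i & h_i & {}_1H_i & {}_2H_i & L_i \\ \hline
P_i & - & \sum\limits_{j \ge i} P_j & \dfrac{P_i}{\sum\limits_{j \ge i} P_j} & -\ln \sum\limits_{j \ge i} P_j & \sum\limits_{j=0}^{i} \dfrac{P_j}{\sum\limits_{k \ge j} P_k} & \dfrac{\sum\limits_{j \ge i} j\,P_j}{\sum\limits_{j \ge i} P_j} - i \\[2ex]
S_i & S_i - S_{i+1} & - & 1 - \dfrac{S_{i+1}}{S_i} & -\ln S_i & \sum\limits_{j=0}^{i} \dfrac{S_j - S_{j+1}}{S_j} & \dfrac{\sum\limits_{j > i} S_j}{S_i} \\[2ex]
h_i & h_i \prod\limits_{j=0}^{i-1} (1 - h_j) & \prod\limits_{j=0}^{i-1} (1 - h_j) & - & -\sum\limits_{j=0}^{i-1} \ln(1 - h_j) & \sum\limits_{j=0}^{i} h_j & \sum\limits_{j \ge i} \prod\limits_{k=i}^{j} (1 - h_k) \\[2ex]
{}_1H_i & \exp(-{}_1H_i) - \exp(-{}_1H_{i+1}) & \exp(-{}_1H_i) & 1 - \exp({}_1H_i - {}_1H_{i+1}) & - & \sum\limits_{j=0}^{i} \big[1 - \exp({}_1H_j - {}_1H_{j+1})\big] & \sum\limits_{j > i} \exp({}_1H_i - {}_1H_j) \\[2ex]
{}_2H_i & ({}_2H_i - {}_2H_{i-1}) \prod\limits_{j=0}^{i-1} (1 - {}_2H_j + {}_2H_{j-1}) & \prod\limits_{j=0}^{i-1} (1 - {}_2H_j + {}_2H_{j-1}) & {}_2H_i - {}_2H_{i-1} & -\sum\limits_{j=0}^{i-1} \ln(1 - {}_2H_j + {}_2H_{j-1}) & - & \sum\limits_{j \ge i} \prod\limits_{k=i}^{j} (1 - {}_2H_k + {}_2H_{k-1}) \\[2ex]
L_i & \left(1 - \dfrac{L_i}{1 + L_{i+1}}\right) \prod\limits_{j=0}^{i-1} \dfrac{L_j}{1 + L_{j+1}} & \prod\limits_{j=0}^{i-1} \dfrac{L_j}{1 + L_{j+1}} & 1 - \dfrac{L_i}{1 + L_{i+1}} & -\sum\limits_{j=0}^{i-1} \ln \dfrac{L_j}{1 + L_{j+1}} & (i + 1) - \sum\limits_{j=0}^{i} \dfrac{L_j}{1 + L_{j+1}} & -
\end{array}$$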

Example 1/10: Hazard rate of the P OISSON distribution

The P OISSON distribution has PMF

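$$P_i = \frac{\lambda^i}{i!}\, e^{-\lambda}; \quad i = 0, 1, \dots;\ \lambda > 0. \tag{1.73a}$$
Applying (1.60b) to (1.73a) we find
$$h_i = \left[1 + \frac{\lambda}{i + 1} + \frac{\lambda^2}{(i + 1)(i + 2)} + \dots\right]^{-1}; \quad i = 0, 1, \dots, \tag{1.73b}$$
so

$$\lim_{i \to \infty} h_i = 1 \tag{1.73c}$$

and
$$e^{-\lambda} \le h_i \le 1. \tag{1.73d}$$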
Fig. 1/11 shows the probabilities Pi , the hazard rate values hi and the mean residual life values Li for
λ = 5. We have an increasing HR and a decreasing MRL, a result which does not depend on the value of
λ. For computing MRL we have used the recursion (1.70) starting with L0 = E(X) = λ.
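
The computation described in this example can be sketched as follows. The code obtains hi from (1.60b), with the tail sums truncated after a fixed number of terms, and Li from the recursion (1.70) started at L0 = λ. The function names and the truncation length `n_terms` are assumptions for illustration; the forward recursion amplifies rounding errors by roughly 1/Si , so it should only be run over a moderate range of i.

```python
import math

def poisson_pmf(lam, n):
    """P_0, ..., P_{n-1} of the Poisson distribution, eq. (1.73a)."""
    P = [math.exp(-lam)]
    for i in range(1, n):
        P.append(P[-1] * lam / i)
    return P

def hazard_and_mrl(lam, n=15, n_terms=100):
    """Hazard rates h_0..h_{n-1} and MRL values L_0..L_{n-1} of the Poisson distribution."""
    P = poisson_pmf(lam, n_terms)
    # Tail sums S_i = sum_{j >= i} P_j, accumulated from the far tail.
    S = [0.0] * (n_terms + 1)
    for i in range(n_terms - 1, -1, -1):
        S[i] = S[i + 1] + P[i]
    h = [P[i] / S[i] for i in range(n)]          # eq. (1.60b)
    L = [lam]                                    # L_0 = E(X) = lambda
    for i in range(n - 1):
        L.append(L[-1] / (1.0 - h[i]) - 1.0)     # recursion (1.70)
    return h, L

h, L = hazard_and_mrl(5.0)
for i, (hi, Li) in enumerate(zip(h, L)):
    print(i, round(hi, 5), round(Li, 5))
```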

Figure 1/11: PMF, HR and MRL of the P OISSON distribution with λ = 5

Excursus: Results for a distribution having both discrete and continuous components
A lifetime distribution may have both discrete and continuous components, e.g., a device has a non–zero
probability of failing on switching on and/or off whereas otherwise the chance of failing has a continuous
density. In the mixed case the density is a sum of discrete and continuous parts. Specifically, if hc (x)
denotes the hazard rate for the continuous part and mass points occur at xi ; i = 0, 1, . . . ; then the overall
survivor function can be written by the so–called ‘pseudo–integral’ as
$$S(x) = \exp\!\left(-\int_0^x h_c(u)\,du\right) \prod_{j=0}^{i-1} (1 - h_j). \tag{1.74a}$$

The corresponding HR is
$$h(x)\,dx = h_c(x)\,dx + \sum_j h_j\, \delta(x - x_j) \tag{1.74b}$$

where δ(x − xj ) is the DIRAC function

$$\delta(y)\,dy = \begin{cases} 1 & \text{for } y = 0 \\ 0 & \text{otherwise.} \end{cases} \tag{1.74c}$$

The cumulative hazard function is now defined as

$$H(x) = \int_0^x h(u)\,du = \int_0^x h_c(u)\,du + \sum_{x_i < x} h_i. \tag{1.74d}$$

1.2.2 Special Results31


As in the continuous case it is possible to specify a discrete distribution by defining a suitable
hazard rate. In this section we will present four such approaches.
The first approach with a constant discrete hazard rate takes
$$h_i = c\ \ \forall\ i, \quad 0 < c < 1. \tag{1.75a}$$
From (1.61e) we find
$$P_i = c\,(1 - c)^i; \quad i = 0, 1, \dots \tag{1.75b}$$
This is recognized as the geometric distribution, and it is, of course, the discrete analogue of the
exponential distribution, both having a constant hazard rate and the memoryless property. For
the geometric distribution we note
$$\begin{aligned}
S_i &= (1 - c)^i, && (1.75c) \\
{}_1H_i &= -i \ln(1 - c), && (1.75d) \\
{}_2H_i &= c\,(i + 1), && (1.75e) \\
L_i &= \frac{1 - c}{c}. && (1.75f)
\end{aligned}$$
SALVIA/BOLLINGER (1982) proposed the following decreasing discrete hazard rate
$$h_i = \frac{c}{i + 1}; \quad i = 0, 1, \dots;\ 0 < c < 1.$$
PADGETT/SPURRIER (1985) generalized this model by introducing a second parameter α, α ≥ 0:
$$h_i = \frac{c}{\alpha\,i + 1}; \quad i = 0, 1, \dots;\ 0 < c < 1;\ \alpha \ge 0, \tag{1.76a}$$
resulting in
$$P_i = c\,\frac{\prod_{j=0}^{i-1} (\alpha\,j - c + 1)}{\prod_{j=0}^{i} (\alpha\,j + 1)}, \tag{1.76b}$$
$$S_i = \prod_{j=0}^{i-1} \frac{\alpha\,j - c + 1}{\alpha\,j + 1}. \tag{1.76c}$$
31 Suggested reading for this section: PADGETT/SPURRIER (1985), RINNE (2009, pp. 119–125), SALVIA/BOLLINGER (1982).

As c → 1, the model approaches the degenerate distribution with P0 = 1. α is a shape parameter
and with respect to the parameter c we have: c = Pr(X = 0). For α = 1 we have the original
model of SALVIA/BOLLINGER, and for α = 0 we are back to the geometric distribution with
constant hazard rate hi = c ∀ i.
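
The model (1.76a–c) is easy to evaluate. The following sketch computes hazard rate, survival function and PMF; the function names and the parameter values used in the loop are hypothetical.

```python
def hazard_ps(i, c, alpha):
    """Decreasing discrete hazard rate, eq. (1.76a)."""
    return c / (alpha * i + 1.0)

def survival_ps(i, c, alpha):
    """Survival function, eq. (1.76c)."""
    prod = 1.0
    for j in range(i):
        prod *= (alpha * j - c + 1.0) / (alpha * j + 1.0)
    return prod

def pmf_ps(i, c, alpha):
    """PMF P_i = h_i * S_i, which is algebraically the same as (1.76b)."""
    return hazard_ps(i, c, alpha) * survival_ps(i, c, alpha)

c = 0.3
for i in range(5):
    # alpha = 0 gives the geometric PMF c (1 - c)^i, alpha = 1 the SALVIA/BOLLINGER model.
    print(i, pmf_ps(i, c, 0.0), c * (1.0 - c) ** i, pmf_ps(i, c, 1.0))
```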
The increasing discrete hazard rate distribution of SALVIA/BOLLINGER has
$$h_i = 1 - \frac{c}{i + 1}; \quad i = 0, 1, \dots;\ 0 < c \le 1,$$
which has been generalized by PADGETT/SPURRIER to
$$h_i = 1 - \frac{c}{\alpha\,i + 1}; \quad i = 0, 1, \dots;\ 0 < c < 1;\ \alpha \ge 0. \tag{1.77a}$$
Corresponding to (1.77a) we have
$$P_i = \frac{c^i\,(\alpha\,i - c + 1)}{\prod_{j=0}^{i} (\alpha\,j + 1)}, \tag{1.77b}$$
$$S_i = \frac{c^i}{\prod_{j=0}^{i-1} (\alpha\,j + 1)}. \tag{1.77c}$$

As α → ∞ this model approaches the BERNOULLI distribution with parameter P = 1 − c and
Q = 1 − P = c. As c → 0 the model approaches the degenerate distribution with P0 = 1. α is
a shape parameter and c corresponds to Pr(X > 0). {Pi } is a non–increasing series if

• c > 0.5 and

• $\alpha \le \dfrac{(c - 1)^2}{2\,c - 1}$.

Otherwise, the PMF first increases and then decreases. For α = 1 we have the original model of
SALVIA/BOLLINGER, and for α = 0 we are back to the geometric distribution, but with hazard
rate hi = 1 − c ∀ i.
There are several discrete models which, depending on the value of a certain parameter, have
a constant, an increasing or a decreasing hazard rate, respectively, like the continuous WEIBULL
distribution with hazard rate $h(x) = \frac{c}{b} \left(\frac{x}{b}\right)^{c-1}$; x ≥ 0, b > 0, c > 0; see RINNE (2009). The
type–I discrete WEIBULL model of NAKAGAWA/OSAKI (1975) has
$$h_i = 1 - q^{(i+1)^\beta - i^\beta}; \quad i = 0, 1, \dots;\ 0 < q < 1;\ \beta > 0; \tag{1.78a}$$
with corresponding
$$P_i = q^{i^\beta} - q^{(i+1)^\beta}, \tag{1.78b}$$
$$S_i = q^{i^\beta}. \tag{1.78c}$$

The parameter β plays the same role as the shape parameter c in the continuous WEIBULL distribution.

• hi is constant with value 1 − q for β = 1,

• hi is decreasing for 0 < β < 1,

• hi is increasing for β > 1.
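
The three cases can be reproduced with a few lines of code. This is a minimal sketch of (1.78a) and (1.78c); the function names and the value q = 0.9 are assumptions for illustration.

```python
def hazard_weibull1(i, q, beta):
    """Hazard rate of the type-I discrete Weibull model, eq. (1.78a)."""
    return 1.0 - q ** ((i + 1) ** beta - i ** beta)

def survival_weibull1(i, q, beta):
    """Survival function of the type-I discrete Weibull model, eq. (1.78c)."""
    return q ** (i ** beta)

q = 0.9
for beta in (0.5, 1.0, 2.0):
    # beta < 1: decreasing, beta = 1: constant 1 - q, beta > 1: increasing
    print(beta, [round(hazard_weibull1(i, q, beta), 4) for i in range(6)])
```

The hazard values agree with 1 − S(i+1)/S(i) computed from `survival_weibull1`, in line with (1.60d).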


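The type–II discrete WEIBULL distribution of STEIN/DATTERO (1984) directly mimics the
continuous hazard rate $\frac{c}{b} \left(\frac{x}{b}\right)^{c-1}$ by
$$h_i = \begin{cases} \alpha\, i^{\beta - 1} & \text{for } i = 1, 2, \dots, m \\ 0 & \text{for } i = 0 \text{ and } i > m \end{cases}, \quad \alpha > 0,\ \beta > 0. \tag{1.79a}$$

m is a truncation value given by

$$m = \begin{cases} \operatorname{int}\!\big[\alpha^{-1/(\beta - 1)}\big] & \text{for } \beta > 1 \\ \infty & \text{for } \beta \le 1 \end{cases}, \tag{1.79b}$$

which is necessary to ensure hi ≤ 1. The corresponding PMF and CCDF are

$$P_i = \alpha\, i^{\beta - 1} \prod_{j=1}^{i-1} \big(1 - \alpha\, j^{\beta - 1}\big); \quad i = 1, 2, \dots \tag{1.79c}$$
$$S_i = \prod_{j=1}^{i-1} \big(1 - \alpha\, j^{\beta - 1}\big). \tag{1.79d}$$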

The hazard rate is

• constant with value α for β = 1,

• decreasing for 0 < β < 1,

• increasing for β > 1.

The type–III discrete WEIBULL distribution of PADGETT/SPURRIER (1985) has

$$h_i = 1 - \exp\big[-d\,(i + 1)^\beta\big]; \quad i = 0, 1, \dots;\ d > 0,\ \beta \in \mathbb{R} \tag{1.80a}$$

with corresponding
$$P_i = \exp\!\left(-d \sum_{j=1}^{i} j^\beta\right) \Big\{1 - \exp\big[-d\,(i + 1)^\beta\big]\Big\}, \tag{1.80b}$$
$$S_i = \exp\!\left(-d \sum_{j=1}^{i} j^\beta\right). \tag{1.80c}$$

The hazard rate is

• constant with value 1 − exp(−d) for β = 0,

• decreasing for β < 0,

• increasing for β > 0.



1.3 The Multivariate Cases32


We will only give short comments on the case of multivariate lifetime distributions for several
reasons:

1. Realistic and tractable multivariate lifetime models are scarce.

2. The hazard rate concept and the mean residual life concept are somewhat difficult to extend
to the multivariate situation.

3. A third difficulty is that often the sample is not big enough in relation to the dimension of
the model in order to find ‘good’ estimates of the model and its parameters. Furthermore,
many data are censored in such a way that one cannot determine whether or not the variates
are independent.

Sometimes two or more lifetime variables X1 , . . . , Xm are of interest simultaneously and a mul-
tivariate model is required. For example, a device may have two or more integral parts and it may
be desired to model the joint lifetime distribution of these parts. Let

$$X = (X_1, \dots, X_m)'; \quad m = 2, 3, \dots$$

be the (column) vector of variates and

$$x = (x_1, \dots, x_m)'$$

a vector of its realizations. Then, a multivariate distribution can be specified either in terms of the
joint survival function

S(x) := Pr(X > x) = Pr(X1 > x1 , . . . , Xm > xm ) (1.81a)

or in terms of joint failure (distribution) function

F (x) := Pr(X ≤ x) = Pr(X1 ≤ x1 , . . . , Xm ≤ xm ) (1.81b)

or — in the continuous case — in terms of the joint failure density

$$f(x) := \frac{\partial^m F(x)}{\partial x_1 \dots \partial x_m} = -\frac{\partial^m S(x)}{\partial x_1 \dots \partial x_m} \tag{1.81c}$$

or, in the discrete case, in terms of the joint probability mass function

$$\Pr(X = x) = \Pr(X_1 = x_1, \dots, X_m = x_m); \quad (x_1, \dots, x_m) \in \mathbb{N}_0^m. \tag{1.81d}$$

In fortunate circumstances X1 , . . . , Xm can be assumed to be independent and the joint
functions in (1.81a–d) can be written as products of the one–dimensional marginal functions
Si (xi ), Fi (xi ), fi (xi ) or Pr(Xi = xi ), respectively. In this case one is effectively back in the
univariate framework.
32 Suggested reading for the section: ASADIA (1999), COX (1972); JOHNSON/KOTZ (1975), MCGILL (1992),
SHAKED et al. (1995).

1.3.1 Continuous Distributions


In the context of defining the concepts IHR and DHR (increasing and decreasing hazard rate), see
Sect. 2.1, for multivariate distributions we find several attempts in the pertaining literature. Some
authors like H ARRIS (1970) or B RINDLEY /T HOMPSON (1972) give no explicit definition of a
multivariate hazard rate. So B RINDLEY /T HOMPSON state that a multivariate distribution with
S(x) defined on the positive orthant is IHR (DHR) if
$$\Pr(X_1 > x_1 + \Delta, \dots, X_m > x_m + \Delta \mid X_1 > x_1, \dots, X_m > x_m) = \frac{S(x + \Delta)}{S(x)}$$
is decreasing (increasing) in x for each ∆ > 0 and all x ≥ 0 such that S(x) > 0. Other authors
like BASU (1971) or PURI/RUBIN (1974) define the multivariate hazard rate as a scalar quantity.
In the bivariate case BASU gives the hazard rate as
$$\begin{aligned}
h(x_1, x_2) &= \frac{f(x_1, x_2)}{\Pr(X_1 > x_1, X_2 > x_2)} \\
&= \frac{f(x_1, x_2)}{1 - F(x_1, \infty) - F(\infty, x_2) + F(x_1, x_2)}.
\end{aligned} \tag{1.82a}$$
For independent variates X1 , X2 (1.82a) turns into
$$\begin{aligned}
h(x_1, x_2) &= \frac{f(x_1, x_2)}{\Pr(X_1 > x_1) \Pr(X_2 > x_2)} \\
&= \frac{f(x_1)\, f(x_2)}{S(x_1)\, S(x_2)} \\
&= h(x_1)\, h(x_2)
\end{aligned} \tag{1.82b}$$
where f (·), S(·), h(·) are the marginal functions, respectively. (1.82a) may easily be extended
to the case of more than two variates.
Some authors like J OHNSON /KOTZ (1975) take the point of view that, for a concept such as
‘multivariate hazard rate’, it is unreasonable to expect a single value to represent this aspect of
a multivariate distribution. The basic idea underlying the univariate definition is that of rate of
decrease of ‘survivors’ with increase in value x of X as, e.g., in a life table where the hazard rate
is in fact the force of mortality. When there are two or more variates this rate depends on which
variate is changed and we need a different rate for each variate. So, J OHNSON /KOTZ defined the
joint multivariate hazard rate of m absolutely continuous variables X1 , . . . , Xm as the vector
$$\begin{aligned}
h_X(x) &:= \left(-\frac{\partial}{\partial x_1}, \dots, -\frac{\partial}{\partial x_m}\right) \ln S(x) \\
&= -\operatorname{grad} \ln S(x).
\end{aligned} \tag{1.83a}$$

Sometimes $h_X(x)$ is called the hazard gradient of X. For convenience we will write a component
of the vector $h_X(x)$ as
$$h_i(x) := -\frac{\partial}{\partial x_i} \ln S(x); \quad i = 1, \dots, m. \tag{1.83b}$$
(1.83a,b) are motivated by the fact that in the univariate case we have
$$h(x) = \frac{dH(x)}{dx} = -\frac{d \ln S(x)}{dx},$$
see (1.11a,f).
If the multivariate hazard rate (1.83a) is constant, i.e., does not vary with any of x1 , . . . , xm , so
that $h_X(x) = c$, this means that, whenever the hazard rate exists, we have $\partial \ln S(x) / \partial x_i = -c_i$
(i = 1, . . . , m). Hence, $S(x) = \exp(-c_i\,x_i)\, s_i(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_m)$; i = 1, . . . , m; whence
$$S(x) \propto \exp\!\left(-\sum_{i=1}^{m} c_i\,x_i\right).$$

Thus, the Xi are mutually independent exponential variables if and only if the multivariate hazard
rate is constant. We may distinguish between strictly constant vector hazard rates, $h_X(x) = c$
as defined above, and locally constant vector hazard rates for which hi (x) does not depend
on xi , though it may depend on other x’s.

Example 1/11: Bivariate exponential distributions33

A bivariate exponential distribution, BED for short, has both marginal distributions as exponential. As
KOTZ /BALAKRISHNAN /J OHNSON (2000) show, many of such BEDs exist. We will present these BEDs
in their standard form, but location and scale parameters can easily be introduced, if needed, through
appropriate linear transformations.
G UMBEL’s BED has the joint survival function

S(x) = exp(−x1 − x2 − θ x1 x2 ); x1 , x2 ≥ 0, θ > 0; (1.84a)

the joint density function


 
f (x) = exp(−x1 − x2 − θ x1 x2 ) (1 + θ x1 ) (1 + θ x2 ) − θ ; (1.84b)

and the conditional PDF of X2 , given X1 = x1 ,


 
$$f(x_2 \mid x_1) = \exp\big[-(1 + \theta x_1)\, x_2\big]\, \big[(1 + \theta x_1)(1 + \theta x_2) - \theta\big]. \tag{1.84c}$$

The latter is not exponential, whereas the marginal distributions of X1 and X2 are both standard exponential. If θ = 0, then X1 and X2 are mutually independent.
The joint multivariate hazard rate of GUMBEL's BED is
$$h_X(x) = \begin{pmatrix} h_1(x) \\ h_2(x) \end{pmatrix} = \begin{pmatrix} 1 + \theta x_2 \\ 1 + \theta x_1 \end{pmatrix}. \tag{1.84d}$$

The components in (1.84d) are constant with respect to variation in the corresponding variable, i.e., h1 (x)
does not depend on x1 nor does h2 (x) on x2 , but not with respect to variation in the other variable. So, the
distribution of X = (X1 , X2 ) has a locally, but not strictly constant bivariate hazard rate, see the graphs
on the right–hand side of Fig. 1/12.
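
The hazard gradient (1.84d) can be checked numerically. The following is a minimal sketch, assuming θ = 1 and a central finite difference for the partial derivatives; the function names and evaluation points are illustrative choices, not part of the text.

```python
import math

def gumbel_bed_survival(x1, x2, theta):
    """Joint survival function (1.84a) of Gumbel's BED."""
    return math.exp(-x1 - x2 - theta * x1 * x2)

def hazard_gradient(survival, x1, x2, eps=1e-6):
    """Central-difference approximation of -grad ln S(x), cf. (1.83a)."""
    def log_s(a, b):
        return math.log(survival(a, b))
    h1 = -(log_s(x1 + eps, x2) - log_s(x1 - eps, x2)) / (2 * eps)
    h2 = -(log_s(x1, x2 + eps) - log_s(x1, x2 - eps)) / (2 * eps)
    return h1, h2

THETA = 1.0  # assumed parameter value, as in Fig. 1/12

def survival(a, b):
    return gumbel_bed_survival(a, b, THETA)

for x1, x2 in [(0.5, 0.5), (0.5, 2.0), (3.0, 2.0)]:
    h1, h2 = hazard_gradient(survival, x1, x2)
    # Closed form (1.84d): h1 = 1 + theta*x2, h2 = 1 + theta*x1
    print(x1, x2, round(h1, 6), round(h2, 6))
```

Moving x1 while holding x2 fixed leaves the first component unchanged, which is the "locally constant" behavior described above.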
The scalar multivariate hazard rate (1.82a) for GUMBEL's BED, using

$$F(x_1, x_2) = 1 - \exp(-x_1) - \exp(-x_2) + \exp[-x_2 - x_1 (1 + \theta x_2)],$$
$$F(x_1, \infty) = 1 - \exp(-x_1), \qquad F(\infty, x_2) = 1 - \exp(-x_2),$$

results in
$$h(x_1, x_2) = (1 + \theta x_1)(1 + \theta x_2) - \theta. \tag{1.84e}$$

Fig. 1/12 displays four functions of GUMBEL's BED with parameter θ = 1. In the upper left corner we have the PDF of (1.84b) and in the lower left corner the scalar multivariate HR of (1.84e). On the right-hand side we see the first (upper graph) and the second (lower graph) component of the joint multivariate HR of (1.84d).
33
Results for the multivariate normal distribution can be found in GUPTA/GUPTA (1997), MA (2000), MCGILL (1992), NAVARRO/RUIZ (2004).

Figure 1/12: G UMBEL’s bivariate exponential distribution with θ = 1

Another BED is that of MARSHALL/OLKIN (2007). The physical model consists of two components, subjected to shocks that are always fatal. These shocks are assumed to be governed by independent POISSON processes with parameters λ1, λ2 and λ12, according as the shock applies to component 1 only, component 2 only, or both components, respectively. The joint survival function of the lifetimes X1 and X2 of the two components is
$$S(x) = \exp\big[-\lambda_1 x_1 - \lambda_2 x_2 - \lambda_{12} \max(x_1, x_2)\big]; \quad \lambda_1, \lambda_2, \lambda_{12} > 0; \; x_1, x_2 \ge 0 \tag{1.85a}$$
$$\phantom{S(x)} = \begin{cases} \exp\big[-\lambda_1 x_1 - (\lambda_2 + \lambda_{12})\, x_2\big] & \text{for } 0 \le x_1 \le x_2, \\ \exp\big[-(\lambda_1 + \lambda_{12})\, x_1 - \lambda_2 x_2\big] & \text{for } 0 \le x_2 \le x_1. \end{cases} \tag{1.85b}$$

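The marginal distributions are genuine one-dimensional exponential distributions. Setting the other argument of (1.85a) to zero gives the marginal survival functions
$$S(x_1, 0) = \exp\big[-(\lambda_1 + \lambda_{12})\, x_1\big], \qquad S(0, x_2) = \exp\big[-(\lambda_2 + \lambda_{12})\, x_2\big]. \tag{1.85c}$$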

The probability that a failure on component i occurs first is

$$\Pr(X_i < X_j) = \frac{\lambda_i}{\lambda_1 + \lambda_2 + \lambda_{12}}; \quad i, j = 1, 2; \; i \ne j; \tag{1.85d}$$

and we have a positive probability that both components fail simultaneously:

$$\Pr(X_1 = X_2) = \frac{\lambda_{12}}{\lambda_1 + \lambda_2 + \lambda_{12}}. \tag{1.85e}$$

(1.85e) also is the correlation coefficient between X1 and X2. The joint PDF of this BED, using S(x) of (1.85a), is
$$f(x) = \begin{cases} \lambda_2 (\lambda_1 + \lambda_{12})\, S(x) & \text{for } 0 < x_2 < x_1, \\ \lambda_1 (\lambda_2 + \lambda_{12})\, S(x) & \text{for } 0 < x_1 < x_2, \\ \lambda_{12}\, S(x) & \text{for } x_1 = x_2 > 0. \end{cases} \tag{1.85f}$$

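The joint multivariate hazard rate results as
$$h_X(x) = \begin{cases} \begin{pmatrix} \lambda_1 \\ \lambda_2 + \lambda_{12} \end{pmatrix} & \text{for } x_1 < x_2, \\[2ex] \begin{pmatrix} \lambda_1 + \lambda_{12} \\ \lambda_2 \end{pmatrix} & \text{for } x_1 > x_2. \end{cases} \tag{1.85g}$$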

These hazard rate components are piecewise constant and hence not strictly increasing, but for x2 (x1) fixed, the first (second) component is a non-decreasing function of x1 (x2).
RINNE (2009, p. 173 ff.) has extended the two models above and further BEDs by power transformation to bivariate WEIBULL distributions.

The joint multivariate mean residual life, defined by ARNOLD/ZAHEDI (1988), is the vector
$$\mu_X(x) = \begin{pmatrix} \mu_1(x) \\ \vdots \\ \mu_m(x) \end{pmatrix} = E(X - x \mid X \ge x) \tag{1.86a}$$

where

$$\mu_i(x) = E(X_i - x_i \mid X \ge x) = \frac{\int_0^\infty S(x_1, \ldots, x_{i-1}, x_i + u, x_{i+1}, \ldots, x_m)\, du}{S(x)}; \quad i = 1, 2, \ldots, m \tag{1.86b}$$
whenever S(x) > 0, see also (1.15b). It can be shown easily that the following relationship holds between the $h_i(x)$'s of $h_X(x)$, see (1.83b), and the $\mu_i(x)$'s of $\mu_X(x)$:

$$\frac{\partial \mu_i(x)}{\partial x_i} = \mu_i(x)\, h_i(x) - 1; \quad i = 1, 2, \ldots, m. \tag{1.86c}$$

Looking at GUMBEL's BED of Example 1/11 we find
$$\mu_X(x) = \begin{pmatrix} \dfrac{1}{1 + \theta x_2} \\[2ex] \dfrac{1}{1 + \theta x_1} \end{pmatrix},$$
showing that $\mu_i(x)$ is constant with respect to $x_i$, but decreases with respect to the other variable. For GUMBEL's BED we have

$$h_1(x)\, \mu_1(x) = h_2(x)\, \mu_2(x) = 1.$$
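
The relationship (1.86c), read as a statement about the partial derivative of $\mu_i(x)$ with respect to $x_i$, can be obtained in a few lines. The following is a sketch, assuming that S is differentiable in $x_i$ and that the integral in (1.86b) is finite. Substituting $t = x_i + u$ in (1.86b) gives

$$\mu_i(x)\, S(x) = \int_{x_i}^{\infty} S(x_1, \ldots, x_{i-1}, t, x_{i+1}, \ldots, x_m)\, dt .$$

Differentiating both sides with respect to $x_i$ yields

$$\frac{\partial \mu_i(x)}{\partial x_i}\, S(x) + \mu_i(x)\, \frac{\partial S(x)}{\partial x_i} = -S(x),$$

and dividing by $S(x)$, with $h_i(x) = -\partial \ln S(x) / \partial x_i$ from (1.83b), gives

$$\frac{\partial \mu_i(x)}{\partial x_i} - \mu_i(x)\, h_i(x) = -1 .$$

For GUMBEL's BED the left-hand derivative is zero, so this reduces to $\mu_i(x)\, h_i(x) = 1$, in agreement with the product stated above.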

Another multivariate hazard rate concept has been proposed by COX (1972), viewing the multivariate lifetime as a point process. In the bivariate case we have the following four components of the hazard rate vector:
$$\lambda_i(x) = \lim_{\Delta x \to 0} \frac{\Pr(x \le X_i < x + \Delta x \mid X_1 \ge x,\, X_2 \ge x)}{\Delta x}; \quad i = 1, 2; \tag{1.87a}$$
$$\lambda_{12}(x_1 \mid x_2) = \lim_{\Delta x \to 0} \frac{\Pr(x_1 \le X_1 < x_1 + \Delta x \mid X_1 \ge x_1,\, X_2 = x_2)}{\Delta x}; \quad x_1 > x_2; \tag{1.87b}$$
$$\lambda_{21}(x_2 \mid x_1) = \lim_{\Delta x \to 0} \frac{\Pr(x_2 \le X_2 < x_2 + \Delta x \mid X_1 = x_1,\, X_2 \ge x_2)}{\Delta x}; \quad x_1 < x_2. \tag{1.87c}$$

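In terms of the joint survivor function $S(x_1, x_2)$ for X1 and X2 it is readily seen that

$$\lambda_1(x) = -\left.\frac{\partial S(x_1, x_2) / \partial x_1}{S(x_1, x_2)}\right|_{x_1 = x_2 = x}, \tag{1.87d}$$

$$\lambda_{12}(x_1 \mid x_2) = -\frac{\partial^2 S(x_1, x_2) / \partial x_1 \partial x_2}{\partial S(x_1, x_2) / \partial x_2}, \quad x_1 > x_2, \tag{1.87e}$$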

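with similar expressions for $\lambda_2(x)$ and $\lambda_{21}(x_2 \mid x_1)$. The functions (1.87a–c) completely specify the joint distribution of X1 and X2. The joint PDF of X1 and X2 can be shown to be
$$f(x_1, x_2) = \begin{cases} \lambda_2(x_2)\, \lambda_{12}(x_1 \mid x_2) \exp\left\{-\displaystyle\int_0^{x_2} \big[\lambda_1(u) + \lambda_2(u)\big]\, du - \int_{x_2}^{x_1} \lambda_{12}(u \mid x_2)\, du\right\}, & x_1 \ge x_2, \\[3ex] \lambda_1(x_1)\, \lambda_{21}(x_2 \mid x_1) \exp\left\{-\displaystyle\int_0^{x_1} \big[\lambda_1(u) + \lambda_2(u)\big]\, du - \int_{x_1}^{x_2} \lambda_{21}(u \mid x_1)\, du\right\}, & x_1 \le x_2. \end{cases} \tag{1.87f}$$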

This can be verified by viewing the process as a point process. For example, with $x_1 \ge x_2$, the probability of having no failures in $[0, x_2)$ and then the event $X_2 \in [x_2, x_2 + \Delta x_2)$ is
$$\lambda_2(x_2)\, \Delta x_2 \exp\left\{-\int_0^{x_2} \big[\lambda_1(u) + \lambda_2(u)\big]\, du\right\}.$$

Conditional on this, the probability of no further failures in $[x_2, x_1)$ and the event $X_1 \in [x_1, x_1 + \Delta x_1)$ is
$$\lambda_{12}(x_1 \mid x_2)\, \Delta x_1 \exp\left\{-\int_{x_2}^{x_1} \lambda_{12}(u \mid x_2)\, du\right\}.$$

Multiplying these probabilities we get the first line of (1.87f).
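
As an illustration, the first line of (1.87f) can be evaluated for GUMBEL's BED of Example 1/11 and compared with the density (1.84b). This is a sketch under stated assumptions: θ = 1 is an arbitrary choice, the closed forms for the hazard components were derived here from (1.87d,e) and (1.84a), and the integrals use a simple midpoint rule.

```python
import math

THETA = 1.0  # assumed parameter value for illustration

def survival(x1, x2):
    # Joint survival function (1.84a)
    return math.exp(-x1 - x2 - THETA * x1 * x2)

def density(x1, x2):
    # Joint PDF (1.84b)
    return survival(x1, x2) * ((1 + THETA * x1) * (1 + THETA * x2) - THETA)

def lam_initial(x):
    # lambda_1(x) = lambda_2(x) by (1.87d): -(dS/dx1)/S at x1 = x2 = x
    return 1 + THETA * x

def lam12(x1, x2):
    # (1.87e): -(d^2 S / dx1 dx2) / (dS / dx2), with dS/dx2 = -(1 + theta*x1) S
    return density(x1, x2) / ((1 + THETA * x1) * survival(x1, x2))

def integrate(func, a, b, n=2000):
    # Composite midpoint rule on [a, b]
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

def cox_density(x1, x2):
    # First line of (1.87f), valid for x1 >= x2
    before = integrate(lambda u: 2 * lam_initial(u), 0.0, x2)
    after = integrate(lambda u: lam12(u, x2), x2, x1)
    return lam_initial(x2) * lam12(x1, x2) * math.exp(-before - after)

print(cox_density(1.5, 0.7), density(1.5, 0.7))
```

The two printed values should agree up to the error of the numerical integration.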

1.3.2 Discrete Distributions


We will briefly comment on bivariate distributions. Results for the case of more than two discrete variates can be found in SHAKED et al. (1995). Let X = (X1, X2) be a random vector with support in $\mathbb{N}_0^2$ and denote its joint PMF by

$$\Pr(x_1, x_2) = \Pr(X_1 = x_1, X_2 = x_2), \quad (x_1, x_2) \in \mathbb{N}_0^2. \tag{1.88a}$$

Remember that in the discrete case the hazard rate is a conditional probability. The discrete multivariate conditional hazard rate functions of (X1, X2) are defined as

$$\lambda_1(x) = \Pr(X_1 = x, X_2 > x \mid X_1 \ge x, X_2 \ge x), \quad x \in \mathbb{N}_0, \tag{1.88b}$$
$$\lambda_2(x) = \Pr(X_2 = x, X_1 > x \mid X_1 \ge x, X_2 \ge x), \quad x \in \mathbb{N}_0, \tag{1.88c}$$
$$\lambda_{12}(x) = \Pr(X_1 = x, X_2 = x \mid X_1 \ge x, X_2 \ge x), \quad x \in \mathbb{N}_0, \tag{1.88d}$$
$$\lambda_1(x \mid x_2) = \Pr(X_1 = x \mid X_1 \ge x, X_2 = x_2), \quad x > x_2, \; (x_1, x_2) \in \mathbb{N}_0^2, \tag{1.88e}$$
$$\lambda_2(x \mid x_1) = \Pr(X_2 = x \mid X_1 = x_1, X_2 \ge x), \quad x > x_1, \; (x_1, x_2) \in \mathbb{N}_0^2, \tag{1.88f}$$

provided the conditions in the above conditional probabilities have positive probabilities. Otherwise, these functions are set to 1.
The meaning of these functions is as follows. The functions λ1 (x), λ2 (x) and λ12 (x) describe
the initial hazard rates, i.e., the hazard rates before a failure of any component. Given that no
failure has occurred before time x, then, at time x one of the following four events must occur:

1. only component 1 fails, the probability being λ1 (x),

2. only component 2 fails, the probability being λ2 (x),

3. both components fail, the probability being λ12 (x),

4. no component fails, the probability being 1 − λ1 (x) − λ2 (x) − λ12 (x).

Now suppose that one component failed at x1 (or x2 ) and that the other component stayed alive
at that time. Then, conditional on X1 = x1 (or X2 = x2 ), the hazard rate of the live component
at time x > x1 (or x > x2 ) is given by λ2 (x | x1 ) [or λ1 (x | x2 )].
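
The initial hazard rates can be computed directly from a joint PMF. The sketch below assumes a small hypothetical PMF on {0, 1, 2}² (the weights are invented for illustration) and reads the conditioning event in (1.88b–d) as "both components are still alive at x", i.e. X1 ≥ x and X2 ≥ x. It confirms that the four events listed above exhaust all possibilities at time x.

```python
from fractions import Fraction

# Hypothetical joint PMF on {0,1,2}^2 with weights proportional to i + j + 1
pmf = {(i, j): Fraction(i + j + 1, 27) for i in range(3) for j in range(3)}

def prob(event):
    """Probability of the set of support points satisfying event(a, b)."""
    return sum(p for (a, b), p in pmf.items() if event(a, b))

def initial_hazards(x):
    """Return (lambda_1(x), lambda_2(x), lambda_12(x)), cf. (1.88b-d)."""
    alive = prob(lambda a, b: a >= x and b >= x)  # no failure before x
    lam1 = prob(lambda a, b: a == x and b > x) / alive
    lam2 = prob(lambda a, b: b == x and a > x) / alive
    lam12 = prob(lambda a, b: a == x and b == x) / alive
    return lam1, lam2, lam12

for x in range(3):
    lam1, lam2, lam12 = initial_hazards(x)
    no_failure = 1 - lam1 - lam2 - lam12
    print(x, lam1, lam2, lam12, no_failure)
```

Exact rational arithmetic is used so that the complement 1 − λ1(x) − λ2(x) − λ12(x) can be compared with the conditional probability that both components survive x without rounding issues.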
The hazard rates given in (1.88b–f) are the discrete analogs of the bivariate hazard rate functions described in COX (1972), see (1.87a–c) for the absolutely continuous case. But in COX there is no analog of (1.88d) since absolute continuity of the distribution of (X1, X2) is assumed there. Failure of both components at the same time has zero probability in the absolutely continuous case, but in the discrete case it may be positive and is given by λ12 (x).
From (1.88b–f) we see that the joint distribution of X1 and X2 determines the conditional hazard
rate functions. But also the converse is true, i.e., (1.88b–f) determine Pr(x1 , x2 ) of (1.88a). For
more details see SHAKED et al. (1995), who also give necessary and sufficient conditions on
the functions (1.88b–f) which ensure that they are hazard rate functions of some random vector
(X1 , X2 ).
2 Aging Criteria and Classes of
Univariate Lifetime Distributions1
It is quite natural to classify lifetime distributions by using so-called aging criteria. In the context of lifetime analysis, aging does not mean that a statistical unit becomes older in the sense of chronological (calendar) time; rather, aging is a notion pertaining to the behavior of residual life. Aging is thus the phenomenon that a chronologically older unit has a shorter residual life, in some statistical sense, than a newer or chronologically younger unit. We may distinguish between

• positive (true or adverse) aging indicating a decline — in some way or the other — of
residual life with growing age x.

• negative (inverse or beneficial) aging when residual life is increasing with x in some way
or the other.

Lifetime distributions are mostly characterized with respect to aging by the behavior of

• their hazard rate h(x) or

• their mean residual life µ(x).

Hazard rate classes will be discussed in Sections 2.1 and 2.2. Mean residual life classes are the
topic of Section 2.3. But there are more statistical concepts used in classifying lifetime distribu-
tions. These will be presented in Section 2.4 where we will also show how all the aging criteria
are linked.
Classes of lifetime distributions based on notions of aging afford statisticians an opportunity to consider problems of a character somewhat different from the usual. Instead of assuming that he knows nothing about the underlying lifetime distribution, the statistician assumes that he does not know the parametric form of the distribution, but that he does know, for example, that the hazard rate is increasing. More generally, he knows that some type of aging property holds for the lifetime distribution; this aging property gives rise to a corresponding geometric property of the distribution. Knowing that a lifetime distribution belongs to a certain class, it is possible, by using certain additional information, to give approximations and bounds for the percentiles, moments and survival probabilities of this distribution. Of course, it is possible to test whether certain hypotheses on aging hold or not, see Sect. 10.3.
This chapter only presents results for univariate distributions. Readers interested in aging criteria for multivariate distributions are referred to BLOCK/SAVITS (1982, 1988), HARRIS (1970) or SHAKED/SHANTHIKUMAR (1987, 1988).

2.1 Monotone Hazard Rate Distributions


Since most materials, structures and devices wear out with time, the class of failure distributions for which the hazard rate is increasing is one of special interest. The phenomenon of work hardening of certain materials and the debugging of complex systems make the class of failure distributions with decreasing hazard rate also of some interest. Here, the terms 'increasing' and 'decreasing' are not used in the strict sense, but increasing (decreasing) stands for non-decreasing (non-increasing). Note that with this convention the continuous exponential distribution and the discrete geometric distribution with constant hazard rates belong to both classes. There are, of course, examples such as dynamic loading of structures, where a non-monotonic hazard rate function would be appropriate. Structures undergoing adjustment and modification also tend to have a non-monotonic hazard rate.
1
Suggested reading for this chapter: HOLLANDER/PROSCHAN (1984), MARSHALL/OLKIN (2007).
The assumption that a lifetime distribution has a monotone hazard rate is quite strong, as we shall show, but such distributions possess many useful and interesting properties. Most results on monotone hazard rates hold for the continuous as well as for the discrete case, but there are some differences, especially in how to detect whether the distribution's hazard rate is increasing or decreasing. So we have decided to present the continuous and the discrete cases in two separate Sections 2.1.1 and 2.1.2. In Section 2.1.3 we will introduce the related concept of the hazard rate average and see when this is increasing or decreasing.

2.1.1 Continuous IHR and DHR Distributions2


We start by defining the properties IHR (increasing hazard rate) and DHR (decreasing hazard rate) without assuming that the distribution F(·) has a density and thus has a hazard rate h(x) = f(x)/S(x). This definition is quite general and has its origin in the conditional failure function (1.6d), defined as the probability of failure in a finite time interval of length y, given the age x. If F(·) denotes the failure function, then the failure rate by this definition would be
$$\frac{F(x + y) - F(x)}{1 - F(x)}. \tag{2.1}$$
We mention that when we divide this quantity by y and let y → 0 we will obtain the familiar hazard rate h(x).
Definition 1: A continuous distribution F(·) is IHR (DHR) if and only if
$$\frac{F(x + y) - F(x)}{1 - F(x)}$$
is increasing (decreasing) in x for y > 0, where x ≥ 0 such that F(x) < 1.
We could have defined IHR without restricting x to non-negative values; however, for DHR, we cannot extend x towards −∞. Note that if F(·) is DHR, then F(x) > 0 for x > 0. The following theorem shows, for distributions with support [0, ∞), the equivalence of IHR (DHR) distributions with a density and distributions for which h(x) is increasing (decreasing).
Theorem 9: Assume F(·) has a density f(·) with F(0−) = 0. Then F(·) is IHR (DHR) if and only if h(x) is increasing (decreasing).

Proof: Note that if in (2.1) we divide by y and let y approach zero, we obtain h(x) = f(x)/[1 − F(x)]. Hence, we need only show that h(x) increasing (decreasing) in x implies (2.1) increasing (decreasing) in x. For $x_1 \le x_2$,

$$h(x_1) \underset{(\ge)}{\le} h(x_2)$$

implies
$$\int_0^x h(x_1 + u)\, du \underset{(\ge)}{\le} \int_0^x h(x_2 + u)\, du.$$

That is,
$$\exp\left\{-\int_{x_2}^{x_2 + x} h(u)\, du\right\} \underset{(\ge)}{\le} \exp\left\{-\int_{x_1}^{x_1 + x} h(u)\, du\right\},$$

implying

$$\frac{F(x_2 + x) - F(x_2)}{1 - F(x_2)} \underset{(\le)}{\ge} \frac{F(x_1 + x) - F(x_1)}{1 - F(x_1)}.$$

2
Suggested reading for this section: BARLOW/MARSHALL (1964), BARLOW/MARSHALL/PROSCHAN (1963), BARLOW/PROSCHAN (1965, 1975).
We will now show how the IHR (DHR) property is related to the future lifetime of x-survivors. Let x be the age of a statistical unit, then its survival function of future life Y | x is given, see (1.52c), as
$$S(y \mid x) = \frac{S(x + y)}{S(x)} \tag{2.2a}$$
which can be written in terms of the hazard rate h(x):
$$S(y \mid x) = \frac{\exp\left\{-\int_0^{x+y} h(u)\, du\right\}}{\exp\left\{-\int_0^{x} h(u)\, du\right\}} = \exp\left\{-\int_x^{x+y} h(u)\, du\right\}. \tag{2.2b}$$
From (2.2b) we see that the conditional survival probability is an increasing (decreasing) function
of x, the age reached, if and only if the hazard rate is decreasing (increasing). Thus, complemen-
tary to Definition 1 of IHR and DHR given above, we can state the following
Definition 2: F (·) is IHR (DHR) if and only if S(y | x) is decreasing (increasing) in x for any
y > 0, x ≥ 0 such that S(x) > 0. 
Introducing the notions stochastically larger (smaller), we may characterize IHR (DHR) in still another way. A variate X1 with distribution $F_1(x)$ is called stochastically smaller (larger) than a variate X2 with $F_2(x)$, abbreviated
$$X_1 \underset{(\ge)}{\overset{st}{\le}} X_2 \tag{2.3a}$$

if
$$F_1(x) \underset{(\le)}{\ge} F_2(x) \quad \forall\, x. \tag{2.3b}$$

Evidently we have
$$Y \mid x_1 \underset{(\le)}{\overset{st}{\ge}} Y \mid x_2 \quad \text{for } x_1 \le x_2 \tag{2.3c}$$

if the underlying lifetime distribution F(·) is IHR (DHR), i.e., the future lifetimes Y | x become stochastically smaller (larger) with growing x.
The most important geometric properties of the IHR (DHR) lifetime distributions are stated in the following two Theorems 10 and 11.
Theorem 10: F(·) is IHR (DHR) if and only if its logarithmic survival function ln S(x) is concave (convex).
Because H(x) = −ln S(x), see (1.11f), we may express Theorem 10 in terms of the cumulative hazard rate as well:

F(·) is IHR (DHR) if and only if its CHR $H(x) = \int_0^x h(u)\, du$ is convex (concave).
We will first give a proof of Theorem 10 based on Definition 1, not assuming the existence of a density function.

Proof 1: Let $S(x) = 1 - F(x) = \exp[-H(x)]$. Then

$$\frac{F(x + y) - F(x)}{1 - F(x)} = 1 - \exp\big\{-[H(x + y) - H(x)]\big\},$$

and F(·) is IHR (DHR) if and only if H(x + y) − H(x) is increasing (decreasing) in x for all y > 0. Thus F(·) is IHR (DHR) if and only if H(x) is convex (concave).
Assuming a continuous distribution with existing density we have
Proof 2: F(·) is IHR (DHR) if its hazard rate has a non-negative (non-positive) first derivative, i.e., using $S'(x) = -f(x)$,
$$\frac{dh(x)}{dx} = \frac{d}{dx}\left[\frac{f(x)}{S(x)}\right] = \frac{S(x)\, f'(x) + [f(x)]^2}{[S(x)]^2} \underset{(\le)}{\ge} 0. \tag{2.4a}$$

For ln S(x) to be concave (convex) its second derivative has to be non-positive (non-negative), i.e.,
$$\frac{d^2 \ln S(x)}{dx^2} = -\frac{d}{dx}\left[\frac{f(x)}{S(x)}\right] = -\frac{S(x)\, f'(x) + [f(x)]^2}{[S(x)]^2} \underset{(\ge)}{\le} 0. \tag{2.4b}$$

Multiplying (2.4b) by −1 gives (2.4a).


The second theorem on geometrical properties of IHR (DHR) distributions refers to their density functions.3
Theorem 11: Let f(x) be a PDF defined on R+. The accompanying distribution is IHR (DHR) if ln f(x) is concave (convex). The converse does not hold in general: a log-concave density is unimodal, whereas the density of an IHR distribution need not be unimodal (see footnote 3).
BARLOW/PROSCHAN (1965) give another statement on IHR and DHR distributions:

1. F(·) is IHR if and only if S(x) = 1 − F(x) is a PÓLYA frequency function of order 2.

2. F(·) is DHR if and only if S(x + y) is totally positive of order 2 in x and y for x + y ≥ 0.

Example 2/1: IHR and DHR property of the W EIBULL distribution

The reduced4 WEIBULL distribution has

$$f(x) = c\, x^{c-1} \exp(-x^c), \quad x \ge 0, \; c > 0,$$
$$F(x) = 1 - \exp(-x^c),$$
$$S(x) = \exp(-x^c),$$
$$h(x) = c\, x^{c-1},$$
$$H(x) = x^c.$$
3
Two more geometrical properties of these classes are:
1. The density function of a DHR distribution is a decreasing function.
2. The density function of an IHR distribution need not be unimodal.

4
The reduced WEIBULL distribution has a location parameter set to 0 and a scale parameter set to 1.

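1. With respect to the hazard rate h(x) we have:

$$\frac{dh(x)}{dx} = c\,(c - 1)\, x^{c-2} \begin{cases} \le 0 & \text{for } 0 < c \le 1 \Rightarrow \text{DHR}, \\ \ge 0 & \text{for } c \ge 1 \Rightarrow \text{IHR}. \end{cases}$$

2. With respect to $\ln S(x) = -x^c$ we have:

$$\frac{d^2 \ln S(x)}{dx^2} = -c\,(c - 1)\, x^{c-2} \begin{cases} \ge 0 \text{ (convex)} & \text{for } 0 < c \le 1 \Rightarrow \text{DHR}, \\ \le 0 \text{ (concave)} & \text{for } c \ge 1 \Rightarrow \text{IHR}. \end{cases}$$

3. With respect to $\ln f(x) = \ln c + (c - 1) \ln x - x^c$ we have:

$$\frac{d^2 \ln f(x)}{dx^2} = -\frac{c - 1}{x^2} - c\,(c - 1)\, x^{c-2} \begin{cases} \ge 0 \text{ (convex)} & \text{for } 0 < c \le 1 \Rightarrow \text{DHR}, \\ \le 0 \text{ (concave)} & \text{for } c \ge 1 \Rightarrow \text{IHR}. \end{cases}$$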
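
The classification of the reduced WEIBULL distribution can also be checked on a grid. The sketch below is illustrative only: the grid, the tolerances and the function names are assumptions, and a finite grid can suggest but not prove monotonicity. It classifies by the hazard rate itself and, independently, by the convexity of H(x) = x^c (Theorem 10), using second differences on an equally spaced grid.

```python
def is_monotone(values, increasing=True, tol=1e-12):
    """True if the sequence is non-decreasing (or non-increasing) up to tol."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(b >= a - tol for a, b in pairs)
    return all(b <= a + tol for a, b in pairs)

def classify_by_hazard(c, grid):
    """Labels from the monotonicity of h(x) = c * x**(c - 1)."""
    h = [c * x ** (c - 1) for x in grid]
    labels = set()
    if is_monotone(h, increasing=True):
        labels.add("IHR")
    if is_monotone(h, increasing=False):
        labels.add("DHR")
    return labels

def classify_by_cumulative_hazard(c, grid, tol=1e-9):
    """Labels from second differences of H(x) = x**c (convex <=> IHR)."""
    H = [x ** c for x in grid]
    second = [H[k + 1] - 2 * H[k] + H[k - 1] for k in range(1, len(H) - 1)]
    labels = set()
    if all(d >= -tol for d in second):
        labels.add("IHR")
    if all(d <= tol for d in second):
        labels.add("DHR")
    return labels

grid = [0.1 * k for k in range(1, 51)]  # equally spaced points in (0, 5]
for c in (0.5, 1.0, 2.5):
    print(c, classify_by_hazard(c, grid), classify_by_cumulative_hazard(c, grid))
```

For c = 1 (the exponential case) both criteria return both labels, in line with the convention that a constant hazard rate belongs to both classes.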

An immediate consequence of Theorem 10 is the following

Lemma: If F(·) is IHR (DHR) then $[S(x)]^{1/x}$ is decreasing (increasing).

By this lemma, an IHR distribution satisfies $S(t) \le [S(x)]^{t/x}$ for t > x and
$$\int_x^\infty u^r\, S(u)\, du \le \int_x^\infty u^r\, [S(x)]^{u/x}\, du < \infty$$

when S(x) < 1 and r ≥ 0. Hence, IHR distributions have finite moments of all orders. DHR distributions need not have finite moments. For example, $F(x) = 1 - \dfrac{1}{1 + x}$, x ≥ 0, is DHR, but the mean does not exist.
BARLOW/PROSCHAN (1965) have stated and proved many theorems on IHR (DHR) distributions which rest on the fact that the exponential distribution is the boundary distribution between these two classes. We only cite four of their results giving bounds for the survival probability and for the moments.

1. If F(·) is IHR (DHR) with known percentile $x_P$ of order P, 0 < P < 1, i.e., $F(x_P) = P$, then
$$S(x) \begin{cases} \underset{(\le)}{\ge} \exp(-\alpha x) & \text{for } x \le x_P, \\[1ex] \underset{(\ge)}{\le} \exp(-\alpha x) & \text{for } x \ge x_P, \end{cases} \tag{2.5a}$$
where
$$\alpha = -\frac{\ln(1 - P)}{x_P}. \tag{2.5b}$$

2. If F(·) is IHR with known mean µ, a sharp lower bound for S(·) is
$$S(x) \ge \begin{cases} \exp(-x/\mu) & \text{for } x < \mu, \\ 0 & \text{for } x \ge \mu, \end{cases} \tag{2.6a}$$
and a sharp upper bound is
$$S(x) \le \begin{cases} 1 & \text{for } x \le \mu, \\ \exp(-\omega x) & \text{for } x > \mu, \end{cases} \tag{2.6b}$$
ω being the solution of $1 - \omega \mu = \exp(-\omega x)$.


3. If F(·) is DHR with known mean µ, a sharp upper bound for S(·) is
$$S(x) \le \begin{cases} \exp(-x/\mu) & \text{for } x \le \mu, \\[1ex] \dfrac{\mu}{x\, e} & \text{for } x \ge \mu. \end{cases} \tag{2.7}$$
The sharp lower bound for DHR distributions is zero.

4. For an IHR distribution with µ = E(X) we have the following inequality on moments:

$$\mu_r = E(X^r) \le r!\, \mu^r; \quad r = 1, 2, \ldots \tag{2.8a}$$

whereas for DHR distributions with existing moments we have

$$\mu_r = E(X^r) \ge r!\, \mu^r; \quad r = 1, 2, \ldots \tag{2.8b}$$

A consequence of (2.8a,b) is the following inequality on the coefficient of variation:

$$\frac{\sqrt{\operatorname{Var}(X)}}{\mu} \underset{(\ge)}{\le} 1 \tag{2.8c}$$

for IHR (DHR) distributions.
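
The moment inequalities (2.8a–c) can be illustrated with the reduced WEIBULL distribution of Example 2/1, whose raw moments are E(X^r) = Γ(1 + r/c). The sketch below is a numerical check for a few shape values chosen here for illustration; the function names are not from the text.

```python
import math

def weibull_raw_moment(r, c):
    """E(X^r) = Gamma(1 + r/c) for the reduced Weibull distribution."""
    return math.gamma(1 + r / c)

def moment_ratio(r, c):
    """E(X^r) / (r! * mu^r); at most 1 under IHR, at least 1 under DHR."""
    mean = weibull_raw_moment(1, c)
    return weibull_raw_moment(r, c) / (math.factorial(r) * mean ** r)

def coefficient_of_variation(c):
    """sqrt(Var(X)) / mu, cf. (2.8c)."""
    mean = weibull_raw_moment(1, c)
    variance = weibull_raw_moment(2, c) - mean ** 2
    return math.sqrt(variance) / mean

for c in (0.5, 1.0, 2.0):  # DHR, exponential boundary case, IHR
    ratios = [round(moment_ratio(r, c), 4) for r in range(1, 5)]
    print(c, ratios, round(coefficient_of_variation(c), 4))
```

For c = 1 all ratios equal 1 and the coefficient of variation is 1, which reflects the role of the exponential distribution as the boundary between the two classes.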

An interesting question is whether or not the monotone hazard rate is preserved under certain operations with variates. We assume that the involved random variables are statistically independent.

1. A mixture
m
X m
X
F (x) = αi Fi (x); αi ≥ 0, αi = 1; (2.9)
i=1 i=1
of m DHR distributions is a DHR distribution. Mixing of IHR distributions does not nec-
essarily result in an IHR distribution.

2. A convolution of IHR distributions also is IHR. In particular, if X1 and X2 are IHR with hazard rates $h_1(x)$ and $h_2(x)$, respectively, then Y = X1 + X2 has hazard rate $h_Y(x) \le \min\{h_1(x), h_2(x)\}$. The sum of DHR variates is not necessarily DHR.

3. A coherent structure of IHR (DHR) components does not necessarily have an IHR (DHR)
lifetime distribution, i.e., the IHR (DHR) class is not closed under formation of coherent
systems. But, parallel and series systems of identical IHR components are IHR. For series
systems the components do not have to be identical.

4. Order statistics from IHR distributions also have IHR distributions. However, this is not
true for spacings from an IHR distribution. Order statistics from a DHR distribution are
not necessarily DHR. However, spacings from a DHR distribution are DHR.

Tab. 2/1 in Sect. 2.4 summarizes preservation results for other classes of lifetime distributions.

2.1.2 Discrete IHR and DHR Distributions5


The results of the preceding section on continuous IHR and DHR distributions also hold for the discrete case unless given in terms of a PDF. A discrete distribution with PMF

$$P_i = \Pr(X = i); \quad i = 0, 1, \ldots$$

is said to be IHR (DHR) if its hazard rate

$$h_i = \frac{P_i}{\sum_{j=i}^{\infty} P_j}; \quad i = 0, 1, \ldots$$

is non-decreasing (non-increasing). Notice that $h_i \le 1$. As the survival function $\sum_{j \ge i} P_j$ rarely can be given in closed form, it is not easy to determine the monotonicity of the hazard rate by using the difference $h_{i+1} - h_i$. In Theorem 11 we have seen that the IHR (DHR) property of a continuous distribution can be determined by the curvature of ln f(x). Because the PMF $P_i$ plays the same role for the CDF (CCDF) as the PDF f(x) in the continuous case, i.e., giving the increment in CDF (decrement in CCDF), a simple criterion for determining the monotonicity can be based on the curvature of the PMF, see GUPTA et al. (1997).
5
Suggested reading for this section: GUPTA et al. (1997), KEMP (2004), LANGBERG et al. (1980), SHAKED et al. (1995).
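Define
$$\eta_i := \frac{P_i - P_{i+1}}{P_i} = 1 - \frac{P_{i+1}}{P_i} \tag{2.10a}$$
and
$$\Delta\eta_i = \eta_{i+1} - \eta_i = \frac{P_{i+1}}{P_i} - \frac{P_{i+2}}{P_{i+1}}. \tag{2.10b}$$
Recalling that the PMF is log convex if
$$P_i\, P_{i+2} > P_{i+1}^2 \quad \forall\, i$$

and log concave if

$$P_i\, P_{i+2} < P_{i+1}^2 \quad \forall\, i,$$
we can say that log convexity is equivalent to $\Delta\eta_i < 0$ and log concavity is equivalent to $\Delta\eta_i > 0$. Thus, analogous to Theorem 11 we can state:

1) If $\Delta\eta_i < 0$, then $h_i$ is non-increasing (DHR).

2) If $\Delta\eta_i > 0$, then $h_i$ is non-decreasing (IHR).

3) If $\Delta\eta_i = 0$, then $\dfrac{P_{i+1}}{P_i} = \dfrac{P_{i+2}}{P_{i+1}} \; \forall\, i$. (2.10c)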

The difference $\Delta\eta_i = 0$ in 3) implies

$$P_i = c^i\, P_0,$$
where c is a positive constant, and three distributions with this property are possible, see GUPTA et al. (1997):

a) $P_i = P_0\, (1 - P_0)^i$; i = 0, 1, . . .
This is the geometric distribution, which has a constant hazard rate and is IHR as well as DHR.

b) $P_i = P_0 = 1/(m + 1)$; i = 0, 1, . . . , m
This is the discrete uniform distribution, which is IHR.

c) $P_i = \dfrac{c^i}{1 + c + c^2 + \ldots + c^m}$; i = 0, 1, . . . , m
This distribution is IHR, too.

Thus, in order to find out whether a discrete distribution is IHR or DHR or not monotone we just
have to study the behavior of the ratio of two adjacent probabilities.

Example 2/2: Monotonicity of well–known discrete distributions6

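Binomial distribution

$$P_i = \binom{n}{i} P^i (1 - P)^{n-i}; \quad i = 0, 1, \ldots, n, \; n \in \mathbb{N}, \; 0 < P < 1;$$
$$\frac{P_{i+1}}{P_i} = \frac{n - i}{i + 1}\, \frac{P}{1 - P}$$
$$\Delta\eta_i = \frac{n + 1}{(i + 1)(i + 2)}\, \frac{P}{1 - P} > 0 \;\Rightarrow\; \text{IHR}$$

Logarithmic series distribution

$$P_i = \frac{a\, P^i}{i}; \quad i = 1, 2, \ldots, \; 0 < P < 1, \; a = \big[-\ln(1 - P)\big]^{-1};$$
$$\frac{P_{i+1}}{P_i} = \frac{i}{i + 1}\, P$$
$$\Delta\eta_i = -\frac{P}{(i + 1)(i + 2)} < 0 \;\Rightarrow\; \text{DHR}$$

Negative binomial distribution

$$P_i = \binom{k + i - 1}{i} q^k (1 - q)^i; \quad i = 0, 1, \ldots, \; 0 < q < 1, \; k > 0;$$
$$\frac{P_{i+1}}{P_i} = \frac{k + i}{i + 1}\, (1 - q)$$
$$\Delta\eta_i = \frac{k - 1}{(i + 1)(i + 2)}\, (1 - q) \begin{cases} < 0 & \text{for } 0 < k < 1 \Rightarrow \text{DHR}, \\ = 0 & \text{for } k = 1 \Rightarrow \text{geometric distribution}, \\ > 0 & \text{for } k > 1 \Rightarrow \text{IHR}. \end{cases}$$

POISSON distribution

$$P_i = \frac{\lambda^i}{i!}\, e^{-\lambda}; \quad i = 0, 1, \ldots, \; \lambda > 0;$$
$$\frac{P_{i+1}}{P_i} = \frac{\lambda}{i + 1}$$
$$\Delta\eta_i = \frac{\lambda}{(i + 1)(i + 2)} > 0 \;\Rightarrow\; \text{IHR}$$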

We have just seen that the ratio of two adjacent probabilities serves to investigate the behavior of the hazard rate. GUPTA et al. (1997) also show that it is possible to compute the hazard rate when these ratios are known. The fundamental equation, which has to be evaluated, gives the reciprocal of the hazard rate as follows:

$$\begin{aligned} \frac{1}{h_i} &= \frac{\Pr(X \ge i)}{\Pr(X = i)} \\ &= \frac{P_i + P_{i+1} + P_{i+2} + \ldots}{P_i} \\ &= 1 + \frac{P_{i+1}}{P_i} + \frac{P_{i+2}}{P_{i+1}}\, \frac{P_{i+1}}{P_i} + \frac{P_{i+3}}{P_{i+2}}\, \frac{P_{i+2}}{P_{i+1}}\, \frac{P_{i+1}}{P_i} + \ldots \\ &= 1 + \sum_{j=0}^{\infty} \prod_{k=i}^{i+j} \frac{P_{k+1}}{P_k}. \end{aligned} \tag{2.11}$$

6
For more discrete distributions see Sect. 3.2.
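
Both the sign criterion based on (2.10b) and the ratio formula (2.11) are easy to evaluate numerically. The following sketch uses the POISSON distribution with an assumed λ = 3; the infinite sum in (2.11) and the tail sum of the survival function are truncated after a fixed number of terms, which is an approximation chosen here for illustration.

```python
import math

LAM = 3.0  # assumed Poisson parameter for illustration

def poisson_pmf(i, lam=LAM):
    # Computed on the log scale to avoid overflow in i! for large i
    return math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))

def delta_eta(pmf, i):
    """(2.10b): P_{i+1}/P_i - P_{i+2}/P_{i+1}."""
    return pmf(i + 1) / pmf(i) - pmf(i + 2) / pmf(i + 1)

def hazard_from_ratios(ratio, i, terms=200):
    """(2.11): 1/h_i = 1 + sum_j prod_{k=i}^{i+j} P_{k+1}/P_k (truncated)."""
    total, running = 1.0, 1.0
    for k in range(i, i + terms):
        running *= ratio(k)  # product of adjacent ratios up to index k
        total += running
    return 1.0 / total

def hazard_direct(pmf, i, terms=200):
    """h_i = P_i / sum_{j >= i} P_j with a truncated tail sum."""
    return pmf(i) / sum(pmf(j) for j in range(i, i + terms))

def poisson_ratio(k, lam=LAM):
    return lam / (k + 1)  # P_{k+1} / P_k for the Poisson distribution

for i in range(6):
    print(i, round(delta_eta(poisson_pmf, i), 6),
          round(hazard_from_ratios(poisson_ratio, i), 6),
          round(hazard_direct(poisson_pmf, i), 6))
```

The differences Δη_i are positive and the hazard rates increase with i, consistent with the IHR classification of the POISSON distribution in Example 2/2.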

2.1.3 IHRA and DHRA Distributions7


Introducing the so-called hazard rate average (HRA) relaxes the requirement of a monotone hazard rate: monotonicity is only required of its average. Thus, the class of IHRA (DHRA) distributions, where IHRA (DHRA) stands for increasing (decreasing) hazard rate average, is wider than the class of IHR (DHR) distributions. In defining HRA we have to distinguish between continuous and discrete distributions.
Definition 1: A continuous distribution is IHRA (DHRA) if

$$\mathrm{HRA}(x) = -\frac{1}{x} \ln S(x) \tag{2.12a}$$
is increasing (decreasing) in x, x ≥ 0, or, equivalently, if
$$[S(x_1)]^{1/x_1} \underset{(\le)}{\ge} [S(x_2)]^{1/x_2} \quad \text{for } 0 \le x_1 \le x_2. \tag{2.12b}$$

Recall from (1.11f) that −ln S(x) represents the cumulative hazard rate $H(x) = \int_0^x h(u)\, du$ when the hazard rate exists, and we then have
$$\mathrm{HRA}(x) = \frac{1}{x} \int_0^x h(u)\, du = \frac{H(x)}{x}. \tag{2.12c}$$

Thus, we may give a second and equivalent


Definition 2: A continuous distribution is IHRA (DHRA) if

$$\frac{H(x_2)}{x_2} \underset{(\le)}{\ge} \frac{H(x_1)}{x_1} \quad \text{for } 0 \le x_1 \le x_2. \tag{2.12d}$$

Based on (2.12c) IHRA (DHRA) means that

$$\frac{d}{dx}\left[\frac{H(x)}{x}\right] \underset{(\le)}{\ge} 0 \quad \forall\, x \ge 0 \tag{2.13a}$$

or
$$h(x) \underset{(\le)}{\ge} \frac{H(x)}{x} \quad \forall\, x \ge 0, \tag{2.13b}$$

i.e., the increment h(x) of H(x) has to be greater (smaller) than the hazard rate average HRA(x) for all x. For example, looking at the reduced WEIBULL distribution we have
$$H(x) = x^c, \quad h(x) = c\, x^{c-1}, \quad \frac{H(x)}{x} = \mathrm{HRA}(x) = x^{c-1}$$
and
$$h(x) = c\, x^{c-1} \begin{cases} \le \dfrac{H(x)}{x} = x^{c-1} & \text{for } 0 < c \le 1 \Rightarrow \text{DHRA and DHR}, \\[2ex] \ge \dfrac{H(x)}{x} = x^{c-1} & \text{for } c \ge 1 \Rightarrow \text{IHRA and IHR}. \end{cases}$$

It is obvious that an IHRA distribution is characterized by decreasing $[S(x)]^{1/x}$ on [0, ∞), while a DHRA distribution is characterized by increasing $[S(x)]^{1/x}$ on [0, ∞). Hence, we can formulate
Theorem 12: A distribution is IHRA (DHRA) if and only if
$$[S(a\, x)]^{1/a} \underset{(\le)}{\ge} S(x), \quad 0 < a < 1, \; x \ge 0. \tag{2.14}$$

7
Suggested reading for this section: BARLOW/PROSCHAN (1975), LANGBERG et al. (1982).

Another theorem on IHRA (DHRA) is

Theorem 13: If a distribution F(·) is IHR (DHR) then F(·) is IHRA (DHRA).
The converse of Theorem 13 does not hold in general. For example,

$$F(x) = \big(1 - e^{-x}\big)\big(1 - e^{-c x}\big); \quad x \ge 0, \; c > 1,$$

is IHRA but not IHR.
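
The counterexample can be inspected numerically. The sketch below assumes c = 5 and a finite grid on (0, 5], both chosen here for illustration; a grid check can only suggest, not prove, the monotonicity of the hazard rate average.

```python
import math

C = 5.0  # assumed value of c > 1 for illustration

def survival(x):
    # S(x) = 1 - (1 - e^{-x}) (1 - e^{-c x})
    return 1.0 - (1.0 - math.exp(-x)) * (1.0 - math.exp(-C * x))

def density(x):
    # Derivative of F(x) = (1 - e^{-x}) (1 - e^{-c x}) by the product rule
    return (math.exp(-x) * (1.0 - math.exp(-C * x))
            + C * math.exp(-C * x) * (1.0 - math.exp(-x)))

def hazard(x):
    return density(x) / survival(x)

def hazard_rate_average(x):
    # HRA(x) = -ln S(x) / x, cf. (2.12a)
    return -math.log(survival(x)) / x

grid = [0.05 * k for k in range(1, 101)]  # points in (0, 5]
h = [hazard(x) for x in grid]
hra = [hazard_rate_average(x) for x in grid]

hazard_increasing = all(b >= a for a, b in zip(h, h[1:]))
hazard_decreasing = all(b <= a for a, b in zip(h, h[1:]))
hra_increasing = all(b >= a - 1e-12 for a, b in zip(hra, hra[1:]))

print(hazard_increasing, hazard_decreasing, hra_increasing)
print(round(max(h), 4), round(h[-1], 4))
```

On this grid the hazard rate first rises above 1 and then falls back towards 1, so it is not monotone, while the hazard rate average keeps increasing.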


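Furthermore we can state
Theorem 14: A distribution F(·) is IHRA (DHRA) if and only if, for every λ > 0, the difference S(x) − exp(−λ x) has at most one change in sign and, if a change occurs, it is from + to − (from − to +).
From Theorem 14 we can easily deduce the following bounds for the survival probability of an IHRA (DHRA) distribution with known percentile $x_P$ of order P, 0 < P < 1:
$$S(x) \begin{cases} \underset{(\le)}{\ge} \exp\left[\dfrac{x \ln(1 - P)}{x_P}\right] & \text{for } x \le x_P, \\[3ex] \underset{(\ge)}{\le} \exp\left[\dfrac{x \ln(1 - P)}{x_P}\right] & \text{for } x \ge x_P. \end{cases} \tag{2.15}$$
These are the same bounds as in (2.5a), since $\exp[x \ln(1 - P)/x_P] = \exp(-\alpha x)$ with α from (2.5b).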

BARLOW/PROSCHAN (1975, pp. 91 ff.) give the following stochastic model leading to an IHRA lifetime distribution. A device is subject to shocks occurring randomly in time according to a POISSON process; each shock independently causes random damage to the device. The damages accumulate until a critical threshold or capacity is exceeded, at which time the device fails. This time to failure is governed by an IHRA distribution. For closure and inheritance of IHRA (DHRA) distributions see Tab. 2/1 in Sect. 2.4.
In the discrete case we have, see (1.62a,b), two different CHRs: the cumulative hazard rate function
$${}^{1}H_i = -\ln S_i = -\ln\Big(\sum_{j \ge i} P_j\Big); \quad i = 0, 1, \ldots$$

and the accumulated hazard rate

$${}^{2}H_i = \sum_{j=0}^{i} h_j; \quad i = 0, 1, \ldots$$

Thus, we can define two discrete hazard rate averages:

$${}^{1}\mathrm{HRA}_i = -\frac{1}{i + 1} \ln S_i; \quad i = 0, 1, \ldots; \tag{2.16a}$$
$${}^{2}\mathrm{HRA}_i = \frac{1}{i + 1} \sum_{j=0}^{i} h_j; \quad i = 0, 1, \ldots \tag{2.16b}$$

Definition: A discrete distribution is IHRA (DHRA) in the sense of the cumulative hazard rate if

$${}^{1}\mathrm{HRA}_i \underset{(\le)}{\ge} {}^{1}\mathrm{HRA}_{i-1}; \quad i = 1, 2, \ldots \tag{2.17a}$$

or equivalently if
$$S_i^{1/(i+1)} \underset{(\ge)}{\le} S_{i-1}^{1/i}; \quad i = 1, 2, \ldots \tag{2.17b}$$

and it is IHRA (DHRA) in the sense of the accumulated hazard rate if

$${}^{2}\mathrm{HRA}_i \underset{(\le)}{\ge} {}^{2}\mathrm{HRA}_{i-1}; \quad i = 1, 2, \ldots \tag{2.17c}$$

or equivalently if
$$\frac{i}{i + 1} \underset{(\le)}{\ge} \frac{{}^{2}H_{i-1}}{{}^{2}H_i}; \quad i = 1, 2, \ldots \tag{2.17d}$$

Theorem 15: If a discrete lifetime distribution is IHR (DHR) then it is IHRA (DHRA) in the sense
of the cumulative hazard rate as well as in the sense of the accumulated hazard rate. 

Proof: We first look at ${}^{1}\mathrm{HRA}_i$. If

$$h_0 \le h_1 \le \ldots \le h_{i-1} \le h_i \le \ldots$$

then, because of $0 \le h_i \le 1 \; \forall\, i$,

$$\ln(1 - h_0) \ge \ln(1 - h_1) \ge \ldots \ge \ln(1 - h_{i-1}) \ge \ln(1 - h_i) \ge \ldots$$

and
$$\begin{aligned} {}^{1}\mathrm{HRA}_i - {}^{1}\mathrm{HRA}_{i-1} &= -\frac{1}{i + 1} \sum_{j=0}^{i-1} \ln(1 - h_j) + \frac{1}{i} \sum_{j=0}^{i-2} \ln(1 - h_j) \\ &= \frac{(i + 1) \sum_{j=0}^{i-2} \ln(1 - h_j) - i \sum_{j=0}^{i-1} \ln(1 - h_j)}{i\,(i + 1)} \\ &= \frac{\sum_{j=0}^{i-2} \big[\ln(1 - h_j) - \ln(1 - h_{i-1})\big] - \ln(1 - h_{i-1})}{i\,(i + 1)} \ge 0 \end{aligned}$$
and the distribution is IHRA. Similarly, if

$$h_0 \ge h_1 \ge \ldots \ge h_{i-1} \ge h_i \ge \ldots$$

then
$$\ln(1 - h_0) \le \ln(1 - h_1) \le \ldots \le \ln(1 - h_{i-1}) \le \ln(1 - h_i) \le \ldots$$

and
$${}^{1}\mathrm{HRA}_i - {}^{1}\mathrm{HRA}_{i-1} \le 0$$
and the distribution is DHRA.
We now look at ${}^{2}\mathrm{HRA}_i$. If

$$h_0 \le h_1 \le \ldots \le h_{i-1} \le h_i \le \ldots$$

then
$$i\; {}^{2}H_i - (i + 1)\; {}^{2}H_{i-1} = i \sum_{j=0}^{i} h_j - (i + 1) \sum_{j=0}^{i-1} h_j = \sum_{j=0}^{i-1} (h_i - h_j) \ge 0$$

and the distribution is IHRA. Similarly, if

$$h_0 \ge h_1 \ge \ldots \ge h_{i-1} \ge h_i \ge \ldots$$

then
$$i\; {}^{2}H_i - (i + 1)\; {}^{2}H_{i-1} \le 0$$
and the distribution is DHRA.

When the distribution is not IHR (DHR), the conclusion of Theorem 15 need not hold. We recommend expressing the behavior of the hazard rate average in the discrete case by means of 2 HRAi, because (2.16b) is the average by construction, i.e., a sum divided by the number of its summands.
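
Both discrete hazard rate averages (2.16a,b) can be computed from the hazard rates alone, since S_i is the product of the factors (1 − h_j) for j < i. The sketch below uses the POISSON distribution with an assumed λ = 2 and a truncated tail sum; the function names and the truncation length are illustrative assumptions.

```python
import math

def poisson_pmf(i, lam):
    # Computed on the log scale to stay within floating-point range
    return math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))

def discrete_hazards(pmf, n, tail_terms=200):
    """h_i = P_i / sum_{j >= i} P_j for i = 0..n-1 (tail sum truncated)."""
    probs = [pmf(j) for j in range(n + tail_terms)]
    hazards, tail = [], sum(probs)
    for i in range(n):
        hazards.append(probs[i] / tail)
        tail -= probs[i]  # remaining mass sum_{j >= i+1} P_j
    return hazards

def hazard_rate_averages(hazards):
    """Return the lists (1HRA_i) of (2.16a) and (2HRA_i) of (2.16b)."""
    hra1, hra2 = [], []
    log_survival = 0.0  # ln S_0 = 0 because S_0 = 1
    running_sum = 0.0
    for i, h in enumerate(hazards):
        running_sum += h
        hra1.append(-log_survival / (i + 1))
        hra2.append(running_sum / (i + 1))
        log_survival += math.log(1.0 - h)  # ln S_{i+1} = ln S_i + ln(1 - h_i)
    return hra1, hra2

hazards = discrete_hazards(lambda i: poisson_pmf(i, 2.0), n=10)
hra1, hra2 = hazard_rate_averages(hazards)
for i in range(10):
    print(i, round(hazards[i], 4), round(hra1[i], 4), round(hra2[i], 4))
```

Since the POISSON distribution is IHR, both averages should be non-decreasing in i, as Theorem 15 states.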

2.2 Non–monotone Hazard Rate Distributions8


The IHR (IHRA) property is characteristic for devices that consistently (on the average) deteriorate with age, whereas the DHR (DHRA) property is characteristic for devices that consistently (on the average) improve with age. But many physical phenomena exhibit hazard rates that are not monotone. Of special interest are hazard rates which first decrease and afterwards increase and look like a bathtub, or which first increase and then decrease and look like an inverted (upside-down) bathtub.
A common description of bathtub–shaped hazard rates which is appropriate for modeling human
lifetimes by means of the lifetable9 shows three phases: an initial phase during which the hazard
rate (here: the one–year age specific death rate) decreases, followed by a middle phase during
which the hazard rate is approximately constant, concluded by a final phase during which the
hazard rate increases. For human beings, the first phase (infant mortality) shows death due typi-
cally to hereditary defects, whose impact diminishes with age. The middle phase (chance failure)
shows death due typically to sudden jolts such as accidents. The final phase (wear–out) shows
death resulting from the natural accumulation of negative health effects. The logical counterpart
to bathtub–shaped hazard rates is the three phase situation in which the hazard rate initially in-
creases, then becomes essentially constant, and ultimately decreases. This upside–down shape
can be found in accelerated life testing, in which the units tested are subjected to abnormally high
stress levels. We will abbreviate the bathtub property by DIHR (decreasing–increasing hazard
rate) and the upside–down bathtub property by IDHR (increasing–decreasing hazard rate).
8
Suggested reading for this section: DHILLON (1979, 1981), GLASER (1980), GRIFFITH (1982), HJORTH (1980),
LAI et al. (2001), SILVA et al. (2010).
9
The lifetable is presented and discussed in Chapter 6.

Whether a hazard rate is DIHR or IDHR can be found out by investigating its first derivative in
the case of a continuous variate or its first difference in the case of a discrete variate. A quite
general definition which extends the idea of DIHR and IDHR to situations where the hazard rate
itself does not exist is
Definition: A lifetime distribution F (x) with x ∈ [0, ∞) is said to be DIHR (IDHR) if there
exists an x0 > 0 such that H(x) = − ln[1 − F (x)] is concave (convex) on [0, x0 ) and convex
(concave) on [x0 , ∞).
GLASER (1980) has given sufficient conditions to characterize a given lifetime distribution as
being IHR, DHR, IDHR, and DIHR, assuming that its PDF f (x) is continuous and twice
differentiable on [0, ∞). These conditions rest upon the reciprocal of the hazard rate

$$ g(x) := \frac{1}{h(x)} = \frac{S(x)}{f(x)} \tag{2.18a} $$

with first derivative

$$ g'(x) = g(x)\,\zeta(x) - 1 \tag{2.18b} $$

where

$$ \zeta(x) = -\frac{f'(x)}{f(x)} \tag{2.18c} $$

and

$$ \zeta'(x) = \bigl[\zeta(x)\bigr]^2 - \frac{f''(x)}{f(x)}. \tag{2.18d} $$
These conditions are stated in the following
Theorem 16:
a) If $\zeta'(x) > 0\ \ \forall\ x \ge 0$, then IHR.

b) If $\zeta'(x) < 0\ \ \forall\ x \ge 0$, then DHR.

c) Suppose there exists $x_0 > 0$ such that

$$ \zeta'(x) < 0\ \ \forall\ x \in [0, x_0),\quad \zeta'(x_0) = 0\quad \text{and}\quad \zeta'(x) > 0\ \ \forall\ x > x_0. \tag{2.19a} $$

ca) If there exists $y_0 > 0$ such that $g'(y_0) = 0$, then DIHR.

cb) If there does not exist $y_0 > 0$ such that $g'(y_0) = 0$, then IHR.

d) Suppose there exists $x_0 > 0$ such that

$$ \zeta'(x) > 0\ \ \forall\ x \in [0, x_0),\quad \zeta'(x_0) = 0\quad \text{and}\quad \zeta'(x) < 0\ \ \forall\ x > x_0. \tag{2.19b} $$

da) If there exists $y_0 > 0$ such that $g'(y_0) = 0$, then IDHR.

db) If there does not exist $y_0 > 0$ such that $g'(y_0) = 0$, then DHR.

GLASER (1980) has supplemented Theorem 16 by the following lemma that helps to avoid finding
a root y0 of g′(·).
Lemma: Suppose (2.19a) or (2.19b) holds in Theorem 16.

a) Suppose $\varepsilon = \lim_{x \to 0^+} f(x)$ exists, possibly equal to 0 or ∞.

(i) If ε = ∞ and (2.19a) holds, then DIHR.

(ii) If ε = 0 and (2.19a) holds, then IHR.

(iii) If ε = ∞ and (2.19b) holds, then DHR.

(iv) If ε = 0 and (2.19b) holds, then IDHR.

b) Suppose $\delta = \lim_{x \to 0^+} g(x)\,\zeta(x)$ exists, possibly equal to 0 or −∞.

(i) If δ > 1 and (2.19a) holds, then DIHR.

(ii) If δ < 1 and (2.19a) holds, then IHR.

(iii) If δ > 1 and (2.19b) holds, then DHR.

(iv) If δ < 1 and (2.19b) holds, then IDHR.

Among the popular and classical distributions we seldom find one which is DIHR or IDHR. Two
prominent exceptions are the log–normal distribution with10

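$$ f(x) = \frac{1}{b\,\sqrt{2\pi}\,x}\exp\left\{-\frac{(\ln x - a)^2}{2\,b^2}\right\} = \frac{1}{b\,x}\,\phi\!\left(\frac{\ln x - a}{b}\right);\quad x > 0,\ a = E(\ln X) \in \mathbb{R},\ b = \sqrt{\operatorname{Var}(\ln X)}, \tag{2.20a} $$

$$ F(x) = \Phi\!\left(\frac{\ln x - a}{b}\right), \tag{2.20b} $$

and the inverse GAUSSIAN distribution, also called WALD distribution, with

$$ f(x) = \sqrt{\frac{b}{2\pi x^3}}\,\exp\left\{-\frac{b\,(x-a)^2}{2\,a^2 x}\right\};\quad x > 0,\ a = E(X) > 0,\ b = \frac{a^3}{\operatorname{Var}(X)} > 0, \tag{2.21a} $$

$$ F(x) = \Phi\!\left[\sqrt{\frac{b}{x}}\left(\frac{x}{a} - 1\right)\right] + \exp\left(\frac{2b}{a}\right)\Phi\!\left[-\sqrt{\frac{b}{x}}\left(\frac{x}{a} + 1\right)\right]. \tag{2.21b} $$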

These two distributions are IDHR irrespective of their parameter values. Figures 2/1 and 2/2
depict the hazard rates of these distributions for several combinations of parameter values.
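
The IDHR shape of the log–normal hazard rate can be illustrated with a short sketch that evaluates h(x) = f(x)/[1 − F(x)] from (2.20a,b) on a grid. The parameter values a = 0, b = 1, the grid, and all function names are illustrative assumptions; the check covers only this one parameter combination and a finite range of x.

```python
import math

def std_normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def std_normal_sf(u):
    # 1 - Phi(u), computed with erfc to stay accurate in the upper tail
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def lognormal_hazard(x, a=0.0, b=1.0):
    """Hazard rate f(x) / (1 - F(x)) of the log-normal distribution (2.20a,b)."""
    u = (math.log(x) - a) / b
    return std_normal_pdf(u) / (b * x) / std_normal_sf(u)

grid = [0.05 * k for k in range(1, 401)]          # x = 0.05, ..., 20.0
h = [lognormal_hazard(x) for x in grid]
peak = max(range(len(h)), key=h.__getitem__)      # index of the largest hazard value
rises_then_falls = (
    all(h[k] < h[k + 1] for k in range(peak))
    and all(h[k] > h[k + 1] for k in range(peak, len(h) - 1))
)
```

On this grid the hazard rate increases up to an interior peak and decreases afterwards, which is the upside–down bathtub pattern.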
There are several approaches to construct DIHR and IDHR distributions. We only mention:11

1. directly specifying a hazard rate that has a bathtub or inverted bathtub shape and then re-
covering its CDF and PDF using (1.9c,d),

2. mixing or compounding of distributions,

3. generalizing a familiar distribution by introducing additional parameters.

We will give examples for each of these three approaches.


10
1. $\phi(u) = \frac{1}{\sqrt{2\pi}}\exp\left(-u^2/2\right)$ is the PDF of the standardized normal distribution.
2. $\Phi(u) = \int_{-\infty}^{u}\phi(z)\,dz$ is the CDF of the standardized normal distribution which cannot be given in closed form.

11
For more approaches see LAI et al. (2001).

Figure 2/1: Hazard rates of several log–normal distributions

Figure 2/2: Hazard rates of several inverse G AUSSIAN distributions

A simple bathtub–shaped hazard rate is realized by a polynomial of second degree:12

$$ h(x) = a + b\,x + c\,x^2;\quad x \ge 0,\ a > 0,\ b < 0,\ c > 0. \tag{2.22a} $$

The change point (minimum) is at $x = -b/(2c)$. PDF and CDF corresponding to (2.22a) are

$$ f(x) = \left(a + b\,x + c\,x^2\right)\exp\left\{-a\,x - \frac{b}{2}\,x^2 - \frac{c}{3}\,x^3\right\}, \tag{2.22b} $$

$$ F(x) = 1 - \exp\left\{-a\,x - \frac{b}{2}\,x^2 - \frac{c}{3}\,x^3\right\}. \tag{2.22c} $$
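
A minimal sketch of (2.22a–c) follows. The function names and the parameter values are illustrative; the values are chosen with b² < 4ac, an additional assumption made here so that h(x) stays positive for all x ≥ 0.

```python
import math

def quad_hazard(x, a, b, c):
    """Quadratic hazard rate h(x) = a + b x + c x^2."""
    return a + b * x + c * x * x

def quad_cum_hazard(x, a, b, c):
    # H(x) = integral of h over [0, x]; this is the negative exponent in F and f
    return a * x + b * x**2 / 2.0 + c * x**3 / 3.0

def quad_cdf(x, a, b, c):
    return 1.0 - math.exp(-quad_cum_hazard(x, a, b, c))

def quad_pdf(x, a, b, c):
    return quad_hazard(x, a, b, c) * math.exp(-quad_cum_hazard(x, a, b, c))

a, b, c = 1.0, -1.0, 0.5        # illustrative values with b*b < 4*a*c
x_min = -b / (2.0 * c)          # change point (minimum) of the bathtub
```

The hazard rate evaluated slightly to the left and to the right of `x_min` is larger than at `x_min`, and a finite difference of `quad_cdf` reproduces `quad_pdf`.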
12
GLASER (1980) shows that it is not possible to create an upside–down bathtub–shaped hazard rate distribution
on [0, ∞) by a polynomial.

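Another possibility of directly specifying a bathtub–shaped hazard rate is to add or superimpose
a decreasing and an increasing function. The approach of HJORTH (1980) rests upon this idea:

$$ h(x) = \delta\,x + \frac{\theta}{1 + \beta\,x};\quad x \ge 0;\ \delta, \theta, \beta \ge 0. \tag{2.23a} $$

$\delta x$ is an increasing term and $\theta/(1 + \beta x)$ is a decreasing term. For β > 0 (2.23a) results in

$$ f(x) = \frac{\theta + \delta\,x\,(1 + \beta x)}{(1 + \beta x)^{\theta/\beta + 1}}\,\exp\left\{-\frac{\delta x^2}{2}\right\}, \tag{2.23b} $$

$$ F(x) = 1 - \frac{\exp\left\{-\delta x^2/2\right\}}{(1 + \beta x)^{\theta/\beta}}, \tag{2.23c} $$

$$ H(x) = \frac{\delta x^2}{2} + \frac{\theta\,\ln(1 + \beta x)}{\beta}. \tag{2.23d} $$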

For β = 0 we have to take the limit of f (x) and F (x) as β → 0. Special cases of (2.23a) — see
Fig. 2/3 — are:

• θ = 0 ⇒ increasing hazard rate δ x of a RAYLEIGH distribution,

• δ = 0 ⇒ decreasing hazard rate of a distribution with $F(x) = 1 - (1 + \beta x)^{-\theta/\beta}$, which
is the reduced LOMAX distribution when β = 1,

• δ = β = 0 ⇒ constant hazard rate of an exponential distribution,

• δ ≥ θ β ⇒ increasing hazard rate,

• 0 < δ < θ β ⇒ bathtub–shaped hazard rate with change point (minimum) at
$x = \dfrac{1}{\beta}\left(\sqrt{\dfrac{\theta\,\beta}{\delta}} - 1\right)$.
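
The bathtub case of the HJORTH hazard rate can be sketched as follows. The cumulative hazard is obtained here by integrating (2.23a) term by term, which is consistent with the CDF (2.23c); function names and parameter values are illustrative assumptions.

```python
import math

def hjorth_hazard(x, delta, theta, beta):
    """h(x) = delta * x + theta / (1 + beta * x), see (2.23a)."""
    return delta * x + theta / (1.0 + beta * x)

def hjorth_cum_hazard(x, delta, theta, beta):
    # assumes beta > 0; integral of the hazard rate over [0, x]
    return delta * x * x / 2.0 + theta / beta * math.log1p(beta * x)

def hjorth_change_point(delta, theta, beta):
    """Location of the hazard minimum; requires 0 < delta < theta * beta."""
    if not 0.0 < delta < theta * beta:
        raise ValueError("bathtub shape requires 0 < delta < theta * beta")
    return (math.sqrt(theta * beta / delta) - 1.0) / beta

delta, theta, beta = 0.5, 2.0, 1.0     # illustrative values with 0 < delta < theta * beta
x_star = hjorth_change_point(delta, theta, beta)
```

With these values the change point is x = 1, and the hazard rate is larger on both sides of it.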

Figure 2/3: Hazard rates of several H JORTH distributions



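An example for a compound distribution with possible bathtub–shaped hazard rate is the
exponential power distribution of DHILLON (1981), called DHILLON–I distribution. The hazard rate
reads:

$$ h(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}\exp\left\{\left(\frac{x-a}{b}\right)^{c}\right\};\quad x \ge a,\ a \in \mathbb{R},\ b, c > 0. \tag{2.24a} $$

This hazard rate is displayed in Fig. 2/5 further down. The following functions belong to (2.24a):

$$ f(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}\exp\left\{1 - \exp\left[\left(\frac{x-a}{b}\right)^{c}\right] + \left(\frac{x-a}{b}\right)^{c}\right\}, \tag{2.24b} $$

$$ F(x) = 1 - \exp\left\{1 - \exp\left[\left(\frac{x-a}{b}\right)^{c}\right]\right\}, \tag{2.24c} $$

$$ H(x) = \exp\left[\left(\frac{x-a}{b}\right)^{c}\right] - 1. \tag{2.24d} $$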

In the parameterization of (2.24a) the shape is governed by c. When c = 1 we have a log–WEIBULL
distribution, also known as type–I extreme value distribution of the minimum, and the hazard rate
is increasing for c ≥ 1. The bathtub–shaped hazard rate comes up for 0 < c < 1 with a change point
(minimum) at $x = a + b\left(\dfrac{1-c}{c}\right)^{1/c}$.
A first example for a generalized distribution is STACY’s (1962) generalized gamma distribution13
with14

$$ f(x) = \frac{d\,x^{d c - 1}}{b^{d c}\,\Gamma(c)}\exp\left\{-\left(\frac{x}{b}\right)^{d}\right\};\quad x \ge 0;\ b, c, d > 0. \tag{2.25} $$

This distribution includes many other distributions as special cases, see RINNE (2009, pp. 111 ff.).
The behavior of the hazard rate does not depend on the scaling parameter b, but it depends on
c d − 1 as follows:

• c d − 1 < 0

  ⋆ d ≤ 1 ⇒ DHR
  ⋆ d > 1 ⇒ DIHR

• c d − 1 > 0

  ⋆ d ≥ 1 ⇒ IHR
  ⋆ d < 1 ⇒ IDHR

• c d − 1 = 0

  ⋆ d = 1 ⇒ constant hazard rate
  ⋆ d < 1 ⇒ DHR
  ⋆ d > 1 ⇒ IHR.
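
The case distinction above translates directly into a small lookup function. This is only a sketch of the classification as listed; the function name and the returned labels are illustrative.

```python
def generalized_gamma_aging_class(c, d):
    """Aging class of the generalized gamma distribution (2.25); b plays no role."""
    if c <= 0 or d <= 0:
        raise ValueError("c and d must be positive")
    k = c * d - 1.0
    if k < 0:
        return "DHR" if d <= 1 else "DIHR"
    if k > 0:
        return "IHR" if d >= 1 else "IDHR"
    # remaining case: c * d == 1
    if d == 1:
        return "constant"
    return "DHR" if d < 1 else "IHR"
```

For example, c = d = 1 is the exponential distribution with constant hazard rate, while c = 0.25, d = 2 falls into the bathtub (DIHR) case.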

13
The ordinary gamma distribution with $f(x) = \dfrac{1}{b\,\Gamma(d)}\left(\dfrac{x}{b}\right)^{d-1}\exp\left(-\dfrac{x}{b}\right)$; x ≥ 0, b, d > 0; is IHR for
d ≥ 1 and DHR for 0 < d ≤ 1.
14
$\Gamma(z) = \int_0^\infty u^{z-1} e^{-u}\,du$ is the complete gamma function.

A second example is the generalized exponential geometric distribution given by SILVA et al.
(2010) with

$$ F(x) = \left[\frac{1 - \exp(-b\,x)}{1 - p\,\exp(-b\,x)}\right]^{c};\quad x \ge 0,\ b, c > 0,\ p \in (0, 1), \tag{2.26a} $$

$$ f(x) = \frac{c\,b\,(1-p)\exp(-b\,x)\,\bigl[1 - \exp(-b\,x)\bigr]^{c-1}}{\bigl[1 - p\,\exp(-b\,x)\bigr]^{c+1}}, \tag{2.26b} $$

$$ h(x) = \frac{c\,b\,(1-p)\exp(-b\,x)\,\bigl[1 - \exp(-b\,x)\bigr]^{c-1}\,\bigl[1 - p\,\exp(-b\,x)\bigr]^{-1}}{\bigl[1 - p\,\exp(-b\,x)\bigr]^{c} - \bigl[1 - \exp(-b\,x)\bigr]^{c}}. \tag{2.26c} $$

This distribution is:

• IDHR for $p \in \left(\dfrac{c-1}{c+1},\ 1\right)$ and c > 1,
• IHR for $p \in \left(0,\ \dfrac{c-1}{c+1}\right)$ and c > 1,
• DHR otherwise.
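
The three representatives (2.26a–c) can be cross-checked against each other numerically. The sketch below implements them as printed and leaves the aging classification itself untouched; function names and the parameter values used for checking are illustrative assumptions.

```python
import math

def geg_cdf(x, b, c, p):
    """CDF (2.26a) of the generalized exponential geometric distribution."""
    e = math.exp(-b * x)
    return ((1.0 - e) / (1.0 - p * e)) ** c

def geg_pdf(x, b, c, p):
    """PDF (2.26b)."""
    e = math.exp(-b * x)
    return c * b * (1.0 - p) * e * (1.0 - e) ** (c - 1.0) / (1.0 - p * e) ** (c + 1.0)

def geg_hazard(x, b, c, p):
    """Hazard rate (2.26c)."""
    e = math.exp(-b * x)
    u, v = 1.0 - e, 1.0 - p * e
    return c * b * (1.0 - p) * e * u ** (c - 1.0) / v / (v ** c - u ** c)
```

For any admissible parameter set, `geg_hazard` should agree with `geg_pdf / (1 - geg_cdf)`, and a central difference of `geg_cdf` should reproduce `geg_pdf`.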

2.3 MRL classes of Distributions15


The behavior of the mean residual life function MRL

• $\mu(x) = E(X - x \mid X \ge x) = \dfrac{1}{S(x)}\displaystyle\int_x^\infty S(u)\,du$ in the continuous case and

• $L_i = E(X - i \mid X \ge i) = \dfrac{1}{S_i}\displaystyle\sum_{j > i} S_j$ in the discrete case

may also be used to characterize aging and to classify lifetime distributions. While the hazard
rate function at x provides information about a small interval just after x, the MRL function at
x considers information about the whole interval after x (all after x). This intuition explains the
difference between the two.
When MRL is monotone and increasing (decreasing), abbreviated IMRL (DMRL), we have
beneficial (adverse) aging. But there are also distributions with non–monotone MRL. Of special
interest in this case are the DIMRL class where MRL has a bathtub shape (first decreasing,
afterwards increasing) and the IDMRL class with an upside–down bathtub shape (first increasing,
afterwards decreasing).
What functions are MRL functions? — Several characterizations are possible which answer this.
We cite the characterization given by G UESS /P ROSCHAN (1988, p. 217).16
Theorem 17: Consider the following conditions:

(i) µ(x) : [0, ∞) → [0, ∞).

(ii) µ(0) > 0.


15
Suggested reading for the section: BRYSON/SIDDIQUI (1969), EBRAHIMI (1986), GUESS/PARK (1988),
GUESS/PROSCHAN (1988), KEMP (2004), KLEFSJÖ (1982b), LAI et al. (2001), WATSON/WELLS (1961).
16
A characterization with examples of DMRL (IMRL) in the discrete case may be found in EBRAHIMI (1986).

(iii) µ(x) is right–continuous (not necessarily continuous).

(iv) m(x) := µ(x) + x is increasing on [0, ∞).

(v) When there exists x0 such that $\mu(x_0^-) = \lim_{x \to x_0^-} \mu(x) = 0$, then µ(x) = 0 holds for
x ∈ [x0 , ∞). Otherwise, when there does not exist such an x0 with $\mu(x_0^-) = 0$, then
$\int_0^\infty \frac{1}{\mu(u)}\,du = \infty$ holds.

A function µ(x) satisfies (i) – (v) if and only if µ(x) is the MRL function of a lifetime distribution
that is not degenerate at x = 0.
Note that condition (ii) rules out the distribution that is degenerate at x = 0. (iv) is a statement on
the expected time of death given that a unit has survived to time x, see (1.16a,b). Theorem 17
delineates which functions can serve as MRL functions, and hence, it provides models for
life–lengths. For recovering the other representatives of a lifetime distribution see (1.20a–f).
We now look at the relationship between the HR classes and the MRL classes.
Theorem 18: A lifetime distribution that is IHR (DHR) has a decreasing (increasing) MRL func-
tion, i.e., the IHR (DHR) class is contained in the DMRL (IMRL) class. 
The implication IHR ⇒ DMRL (DHR ⇒ IMRL) of Theorem 18 cannot be reversed in general.
We give a proof of Theorem 18 for a discrete lifetime distribution:17

$$ L_i - L_{i+1} = \sum_{j>i}\left(\frac{S_j}{S_i} - \frac{S_{j+1}}{S_{i+1}}\right) = \sum_{j>i}\left[(h_j - h_i)\prod_{k=i+1}^{j-1}(1 - h_k)\right] \underset{(\le)}{\ge} 0, \tag{2.27} $$

accordingly as $h_j \underset{(\le)}{\ge} h_i$, j > i. Therefore, when $h_i < h_{i+1}$ ∀ i (= IHR), the MRL function
decreases and when $h_i > h_{i+1}$ ∀ i (= DHR), the MRL function increases.
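
The discrete argument can be traced with a few lines of code. The sketch assumes the product representation S_j = (1 − h_0)···(1 − h_{j−1}) with S_0 = 1 and a finite support that ends where the hazard rate reaches 1; function names and hazard values are illustrative.

```python
def survival_from_hazards(h):
    """S_0 = 1 and S_j = prod_{k<j} (1 - h_k); returns [S_0, ..., S_n] for n = len(h)."""
    s = [1.0]
    for hk in h:
        s.append(s[-1] * (1.0 - hk))
    return s

def discrete_mrl(h):
    """L_i = sum_{j>i} S_j / S_i for i = 0, ..., len(h) - 1 (requires S_i > 0)."""
    s = survival_from_hazards(h)
    tail = 0.0
    mrl = [0.0] * len(h)
    for i in range(len(h) - 1, -1, -1):
        tail += s[i + 1]              # now tail = S_{i+1} + ... + S_n
        mrl[i] = tail / s[i]
    return mrl

h_ihr = [0.1, 0.2, 0.4, 0.7, 1.0]     # illustrative IHR hazards; h = 1 ends the support
L = discrete_mrl(h_ihr)
```

For these increasing hazard rates the resulting sequence L_0, L_1, ... is decreasing, in line with IHR ⇒ DMRL.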

Example 2/3: Distributions with monotone HR and monotone MRL

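The reduced WEIBULL distribution has

$$ h(x) = c\,x^{c-1};\quad x \ge 0,\ c > 0; $$

which is DHR for 0 < c ≤ 1 and IHR for c ≥ 1. With

$$ S(x) = \exp\left\{-x^{c}\right\} $$

we find with the help of (1.21d)

$$ \mu(x) = \frac{\displaystyle\int_x^\infty \exp\left\{-u^{c}\right\}du}{\exp\left\{-x^{c}\right\}} = \frac{\gamma\!\left(\dfrac{1}{c},\ x^{c}\right)}{c\,\exp\left\{-x^{c}\right\}}, \tag{2.28} $$

where

$$ \gamma(a, z) = \int_z^\infty u^{a-1} e^{-u}\,du $$

is the complementary incomplete gamma function.18 We have

$$ \frac{d\mu(x)}{dx} = \frac{x^{c}\exp\left\{x^{c}\right\}\gamma\!\left(\dfrac{1}{c},\ x^{c}\right) - x}{x}\ \begin{cases} \le 0 & \text{for } c \ge 1 \ \Rightarrow\ \text{DMRL}, \\ \ge 0 & \text{for } 0 < c \le 1 \ \Rightarrow\ \text{IMRL}. \end{cases} $$

17
For a proof in the continuous case see BRYSON/SIDDIQUI (1969).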

Fig. 2/4 shows the hazard rate functions and the corresponding mean residual life functions.
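
Since the Python standard library has no incomplete gamma function, the MRL of the reduced WEIBULL distribution can be sketched by numerical integration instead. The code uses the substitution t = u^c behind (2.28) and composite Simpson's rule; the truncation width, the number of subintervals, and the function name are illustrative assumptions, and the routine is meant for x > 0 only.

```python
import math

def weibull_mrl(x, c, width=50.0, n=20000):
    """MRL of the reduced Weibull distribution, S(x) = exp(-x**c), for x > 0.

    Integrates t**(1/c - 1) * exp(-t) over [x**c, x**c + width] with Simpson's
    rule; the neglected tail is of order exp(-width).
    """
    lo = x ** c
    step = width / n                       # n must be even for Simpson's rule

    def g(t):
        # integrand already divided by exp(-lo), which avoids overflow for large x
        return t ** (1.0 / c - 1.0) * math.exp(-(t - lo))

    total = g(lo) + g(lo + width)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(lo + k * step)
    return total * step / 3.0 / c
```

For c = 1 the function returns the constant MRL 1 of the exponential distribution; for c = 0.5 it reproduces the closed form 2(√x + 1), which increases in x, and for c = 2 the values decrease in x.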
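A discrete distribution analyzed by KEMP (2004) is the zero–inflated geometric distribution with
0 < λ < 1 and 0 < α < 1. We have

$$ h_0 = 1 - \alpha\,\lambda,\qquad h_i = 1 - \lambda\ \text{ for } i \ge 1, \tag{2.29a} $$

and

$$ S_0 = 1,\qquad S_1 = \alpha\,\lambda,\qquad S_i = \alpha\,\lambda^{i}\ \text{ for } i > 1. \tag{2.29b} $$

Thus

$$ L_0 = \frac{\sum_{j>0} S_j}{S_0} = \frac{\alpha\,\lambda}{1 - \lambda},\qquad L_i = \frac{\sum_{j>i} S_j}{S_i} = \frac{\lambda}{1 - \lambda},\ \ i > 0. \tag{2.29c} $$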
Now
h0 > h1 = h2 = . . . and L0 < L1 = L2 = . . .
Therefore, the zero–inflated geometric distribution is DHR and IMRL.
Figure 2/4: HR and corresponding MRL of two reduced W EIBULL distributions

From the fact that IHR ⇒ DMRL and DHR ⇒ IMRL one may conjecture that DIHR implies
IDMRL, i.e., a bathtub–shaped hazard rate implies an upside–down bathtub–shaped MRL. The
following Theorem 19 is stated and proved by MI (1995).
Theorem 19: If the hazard rate function h(x) has a bathtub shape, then the associated MRL has
an upside–down bathtub shape. 
Fig. 2/5 demonstrates Theorem 19 for the exponential power distribution of DHILLON (1981) with
the hazard rate given in (2.24a), which is DIHR for 0 < c < 1.
18
$\Gamma(a, z) = \int_0^z u^{a-1} e^{-u}\,du$ is the incomplete gamma function.

Figure 2/5: HR function and MRL function of a DHILLON–I distribution (a = 0, b = 1, c = 0.5)

Figure 2/6: HR function and MRL function of a log–normal distribution (a = 0, b = 1)

The converse of Theorem 19 is not necessarily true as demonstrated by the following example of
MI (1995):

$$ \mu(x) = \begin{cases} x^2 + 1 & \text{for } 0 \le x < 1, \\ 2\,x & \text{for } 1 \le x < 2, \\ 4\exp\left\{-0.25\,(x - 2)\right\} & \text{for } x \ge 2. \end{cases} $$

This µ(x) has an upside–down bathtub shape and is an MRL function of a certain lifetime
distribution. Applying (1.20d) we find the corresponding hazard rate function:

$$ h(x) = \begin{cases} \dfrac{2\,x + 1}{x^2 + 1} & \text{for } 0 \le x < 1, \\[2ex] \dfrac{3}{2\,x} & \text{for } 1 \le x < 2, \\[2ex] 0.25\,\bigl[\exp\left\{0.25\,(x - 2)\right\} - 1\bigr] & \text{for } x \ge 2. \end{cases} $$


This hazard rate is not bathtub–shaped: over [0, 2) it first increases and then decreases, it drops
to h(2) = 0 at x = 2, and it increases to infinity over (2, ∞).
From Theorem 19 we may conjecture that a distribution with an upside–down bathtub–shaped hazard
rate might have a bathtub–shaped mean residual life function. As Fig. 2/6 shows, this conjecture
holds at least for the log–normal distribution.
The reader interested in DIMRL and IDMRL mean residual life functions of discrete distributions
should consult GUESS/PARK (1988).

2.4 Classification According to other Aging Criteria19


We start this section with a figure showing the most common aging criteria and the way they are
linked, i.e., which criterion is implied by which other criterion or, stated otherwise, which
criterion defines a subclass of the class defined by another criterion. The chain in the upper
part refers to positive aging while the chain in the lower part refers to negative aging. A proof of
the implications in Fig. 2/7 can be found in BRYSON/SIDDIQUI (1969) and in KLEFSJÖ (1982a).

Figure 2/7: Chains of implications for several aging criteria

Positive aging:
IAF ⇐⇒ IHR ⇐⇒ IIHRA
PF2 =⇒ IHR =⇒ IHRA =⇒ NBU =⇒ NBUHR =⇒ NBUHRA
NBU =⇒ NBUE =⇒ HNBUE
IHR =⇒ DMRL =⇒ NBUE

Negative aging:
DAF ⇐⇒ DHR ⇐⇒ DIHRA
DHR =⇒ DHRA =⇒ NWU =⇒ NWUHR =⇒ NWUHRA
NWU =⇒ NWUE =⇒ HNWUE
DHR =⇒ IMRL =⇒ NWUE

PF2 means PÓLYA density of order 2 and is the strongest aging criterion.

We now present and discuss those criteria in Fig. 2/7 which have not been described in the
preceding sections and we start on the left–hand side and move to the right. The criteria IAF and
DAF have been introduced by BRYSON/SIDDIQUI (1969), and IAF (DAF) stands for increasing
(decreasing) specific aging factor. The specific aging factor is defined as

$$ A(x, y) := \frac{S(x)\,S(y)}{S(x + y)};\quad x, y \ge 0. \tag{2.30a} $$
19
Suggested reading for the section: BARLOW/PROSCHAN (1975), BRYSON/SIDDIQUI (1969), JOHNSON/
KOTZ/BALAKRISHNAN (1995, pp. 663 ff.), KEMP (2004), KLEFSJÖ (1982a,b), MARSHALL/PROSCHAN
(1972).

Notice the interchangeability of the arguments x and y and the relationship to NBU and NWU
in (2.32). If a distribution is NBU (NWU), its specific aging factor satisfies A(x, y) ≥ 1 (≤ 1). The
motivation for A(x, y) may be seen as follows: Consider two units with lifetimes described by
one and the same distribution F (·), and let x denote the chronological age of one of them. The
other unit is ‘new’, i.e., has a chronological age of zero. Then S(y) is the probability that the
new unit will survive for at least a duration y. Correspondingly, the ratio S(x + y)/S(x) is the
probability that the older unit will survive for that same duration, given its prior survival up to
time x. The specific aging factor is a comparison of these two survival probabilities. It will
be strictly greater than unity if and only if the older unit has ‘aged’ in that it has less chance of
surviving for duration y than does a new unit. The range of A(x, y) is the extended positive real
line; however, it is undefined if either numerator factor vanishes.
Definition: A distribution function F (·) is called IAF if

A(x2 , y) ≥ A(x1 , y) ∀ y ≥ 0, x2 ≥ x1 ≥ 0, (2.30b)

and DAF if

A(x2 , y) ≤ A(x1 , y) ∀ y ≥ 0, x2 ≥ x1 ≥ 0.  (2.30c)

We now prove that criteria IAF and IHR are equivalent.


Proof: IAF specifies that A(x, y) is an increasing function of x for all y. Differentiating ln A(x, y)
with respect to x, which has the same sign as the derivative of A(x, y),

$$ \frac{1}{A(x, y)}\,\frac{\partial A(x, y)}{\partial x} = \frac{S(x + y)\,S'(x) - S(x)\,S'(x + y)}{S(x)\,S(x + y)} \ \ge\ 0. $$

Hence,

$$ -\frac{S'(x + y)}{S(x + y)} \ \ge\ -\frac{S'(x)}{S(x)} $$

or

$$ h(x + y) \ \ge\ h(x). $$

Since this holds for all y ≥ 0, h(·) is increasing and IHR holds. Conversely, if IHR holds, then
the foregoing steps may be reversed to show that A(x, y) is an increasing function of x.
The equivalence of DHR and DAF can be proved along the same lines.
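
The equivalences IHR ⇔ IAF and DHR ⇔ DAF can be illustrated with the reduced WEIBULL survival function S(x) = exp(−x^c), which is IHR for c ≥ 1 and DHR for c ≤ 1. The sketch below evaluates the specific aging factor (2.30a) on a small grid; the function names, the grid, and the value of y are illustrative choices.

```python
import math

def weibull_sf(x, c):
    """Survival function of the reduced Weibull distribution."""
    return math.exp(-(x ** c))

def aging_factor(x, y, sf):
    """Specific aging factor A(x, y) = S(x) S(y) / S(x + y) of (2.30a)."""
    return sf(x) * sf(y) / sf(x + y)

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
y = 0.8
a_ihr = [aging_factor(x, y, lambda t: weibull_sf(t, 2.0)) for x in xs]   # c > 1: IHR
a_dhr = [aging_factor(x, y, lambda t: weibull_sf(t, 0.5)) for x in xs]   # c < 1: DHR
a_exp = [aging_factor(x, y, lambda t: weibull_sf(t, 1.0)) for x in xs]   # c = 1: exponential
```

The factor increases in x for c = 2, decreases in x for c = 0.5, and stays at 1 for the exponential case c = 1.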
An obvious generalization of the hazard rate average HRA(x), see (2.12a), is the specific
interval–average hazard rate

$$ HRA(x, y) := \frac{1}{x}\int_y^{x+y} h(u)\,du = \frac{H(x + y) - H(y)}{x}. \tag{2.31a} $$

We have

$$ HRA(x, 0) = HRA(x) = \frac{1}{x}\int_0^x h(u)\,du. \tag{2.31b} $$

Definition: A distribution F (·) is called IIHRA (increasing interval–average hazard rate) if

HRA(x2 , y) ≥ HRA(x1 , y) ∀ x2 ≥ x1 ≥ 0, y ≥ 0, (2.31c)

and DIHRA (decreasing interval–average hazard rate) if

HRA(x2 , y) ≤ HRA(x1 , y) ∀ x2 ≥ x1 ≥ 0, y ≥ 0.  (2.31d)

To prove the equivalence of criteria IHR and IIHRA we need the following

Lemma: Let h(x) be integrable with no more than finitely many discontinuities in any finite
interval. Then h(x) is monotone increasing for all x > 0 if and only if

$$ h(y) \ \le\ \frac{1}{x}\int_y^{y+x} h(u)\,du \ \le\ h(y + x)\quad \forall\ y \ge 0,\ x > 0. $$

Proof of IHR ⇔ IIHRA: In accordance with IHR, h(x2 ) ≥ h(x1 ) ∀ x2 ≥ x1 ≥ 0, let h(·) be
monotone increasing, and choose x2 ≥ x1 . Then

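$$
\begin{aligned}
HRA(x_2, y) - HRA(x_1, y) &= \left(\frac{1}{x_2} - \frac{1}{x_1}\right)\int_y^{y+x_1} h(u)\,du + \frac{1}{x_2}\int_{y+x_1}^{y+x_2} h(u)\,du \\
&= \frac{x_2 - x_1}{x_2}\left[\frac{1}{x_2 - x_1}\int_{y+x_1}^{y+x_2} h(u)\,du - \frac{1}{x_1}\int_y^{y+x_1} h(u)\,du\right] \\
&\ge \frac{x_2 - x_1}{x_2}\,\bigl[h(y + x_1) - h(y + x_1)\bigr] = 0
\end{aligned}
$$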

by the above Lemma. Hence

HRA(x2 , y) ≥ HRA(x1 , y).

Conversely, suppose HRA(x2 , y) ≥ HRA(x1 , y) ∀ y ≥ 0 and x2 ≥ x1 . Then

$$ \frac{1}{x_2}\int_y^{y+x_2} h(u)\,du \ \ge\ \frac{1}{x_1}\int_y^{y+x_1} h(u)\,du. $$

In particular, if x1 → 0, then the right–hand side becomes

$$ \lim_{x \to 0}\frac{\int_y^{y+x} h(u)\,du}{x} = h(y^+), $$

so that, for any positive x2

$$ \frac{1}{x_2}\int_y^{y+x_2} h(u)\,du \ \ge\ h(y^+). $$

Since y and x2 are arbitrary, the above Lemma applies to prove the monotonicity of h(·). 
The equivalence of DHR and DIHRA in the lower chain of Fig. 2/7 can be proved along the same
lines.
The probability of an x–survivor living another y units of time is

$$ S(y \mid x) = \Pr(Y > y \mid X \ge x) = \frac{S(x + y)}{S(x)}, $$

whereas the probability of a new unit living more than y units of time is

Pr(Y > y | X ≥ 0) = S(y).



Definition:20 A lifetime distribution is said to be NBU (new better than used) or NWU
(new worse than used) accordingly as the conditional survival probability S(y | x) is less (or
greater) than the unconditional survival probability S(y), i.e.,

$$ S(y) \underset{(\le)}{\ge} \frac{S(x + y)}{S(x)}\quad \forall\ x \ge 0,\ y \ge 0. \tag{2.32} $$

We will prove the implications IHR ⇒ NBU and DHR ⇒ NWU for a discrete distribution, where
NBU (NWU) means

$$ \frac{S_{i+j}}{S_i} \underset{(\ge)}{\le} S_j;\quad i \ne j = 0, 1, \ldots $$

Proof: From (1.61c) we have

$$ \frac{S_{i+j}}{S_i\,S_j} = \frac{\prod_{k=0}^{i+j-1}(1 - h_k)}{\prod_{k=0}^{i-1}(1 - h_k)\ \prod_{k=0}^{j-1}(1 - h_k)} = \prod_{k=0}^{j-1}\frac{1 - h_{i+k}}{1 - h_k}. $$

If $h_i$ increases with i, then

$$ \frac{1 - h_{i+j}}{1 - h_j} < 1 $$

and

$$ \frac{S_{i+j}}{S_i} < S_j. $$

Similarly, if $h_i$ decreases with i, then

$$ \frac{S_{i+j}}{S_i} > S_j. $$
It is to be noticed that F (·) is NBU (NWU) if and only if Y | x is stochastically smaller (greater)
than X. Furthermore, we have
Theorem 20:

a) F (·) is NBU if and only if H(·) is superadditive, i.e.,

H(x + y) ≥ H(x) + H(y) for x, y > 0.

b) F (·) is NWU if and only if H(·) is subadditive, i.e.,

H(x + y) ≤ H(x) + H(y) for x, y > 0. 

For a continuous distribution it is easily shown that NBU (NWU) can be equivalently stated as

$$ S(y) \underset{(\le)}{\ge} \frac{S(x + y)}{S(x)} $$
20
HOLLANDER/PARK/PROSCHAN (1986) introduced subclasses of the NBU (NWU) distributions, called new
better (worse) than used of age x0 . For these classes the survival probability at age 0 is greater (smaller) than or
equal to the conditional survival probability at specified age x0 > 0 :

$$ S(x + x_0) \underset{(\ge)}{\le} S(x)\,S(x_0)\quad \forall\ x > 0. $$

They presented preservation and non–preservation properties of the two classes under various reliability
operations and also showed how to test whether or not a distribution is new better than used at age x0 .

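and

$$ \int_0^x h(u)\,du \underset{(\ge)}{\le} \int_y^{x+y} h(u)\,du;\quad x, y > 0. $$

But for discrete NBU (NWU) distributions the corresponding equivalence of

$$ S_j \underset{(\le)}{\ge} \frac{S_{i+j}}{S_i};\quad i, j = 0, 1, \ldots $$

and

$$ \sum_{k=0}^{j} h_k \underset{(\ge)}{\le} \sum_{k=i}^{i+j} h_k $$

does not hold.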
From the mean residual life of an x–survivor

$$ \mu(x) = E(Y \mid X \ge x) = \frac{1}{S(x)}\int_x^\infty S(u)\,du, $$

and the mean life of a new unit

$$ \mu(0) = E(X) = \int_0^\infty S(u)\,du $$

we have the following


Definition: A distribution F (·) is called NBUE (new better than used in expectation) if

$$ \mu(0) \ge \mu(x),\ x > 0,\quad \text{or equivalently}\quad \int_x^\infty S(u)\,du \le \mu(0)\,S(x),\ x > 0. \tag{2.33a} $$

A distribution F (·) is called NWUE (new worse than used in expectation) if

$$ \mu(0) \le \mu(x),\ x > 0,\quad \text{or equivalently}\quad \int_x^\infty S(u)\,du \ge \mu(0)\,S(x),\ x > 0. \tag{2.33b} $$

We will prove the implication NBU ⇒ NBUE (NWU ⇒ NWUE) for a discrete distribution.
Proof: If

$$ S_{i+j} \underset{(\ge)}{\le} S_i\,S_j;\quad j = 0, 1, \ldots $$

then

$$ \sum_{j \ge 0} S_{i+j} \underset{(\ge)}{\le} S_i \sum_{j \ge 0} S_j. $$

Definition: A continuous distribution F (·) with finite mean $\mu = \int_0^\infty S(u)\,du$ is said to be HNBUE
(harmonic new better than used in expectation) if

$$ \int_x^\infty S(u)\,du \ \le\ \mu\,\exp\left(-\frac{x}{\mu}\right)\quad \text{for } x \ge 0. \tag{2.34a} $$

If the reversed inequality is true, F (·) is said to be HNWUE (harmonic new worse than used
in expectation). This gives a dual class to the HNBUE class of distributions in the same way as
the IHR, IHRA, NBU, NBUE, and DMRL classes have their duals. The HNBUE and HNWUE
classes have been introduced by ROLSKI (1975). The names HNBUE resp. HNWUE are to be
explained as follows. Starting from the mean residual life

$$ \mu(x) = \frac{1}{S(x)}\int_x^\infty S(u)\,du, $$

the inequality (2.34a) can be written as21

$$ \left[\frac{1}{x}\int_0^x \frac{1}{\mu(u)}\,du\right]^{-1} \le\ \mu\quad \text{for } x > 0. \tag{2.34b} $$

This inequality says that the integral harmonic mean of µ(u), 0 ≤ u ≤ x, is less than or equal to
the mean life µ = µ(0) of a new unit. For HNBUE distributions we have

$$ S(x) \ \le\ \begin{cases} 1 & \text{for } x \le \mu \\[1ex] \exp\left\{\dfrac{\mu - x}{\mu}\right\} & \text{for } x > \mu \end{cases} \tag{2.34c} $$

and for HNWUE distributions

$$ S(x) \ \le\ \frac{\mu}{x}\left[1 - \exp\left(-\frac{x}{\mu}\right)\right]\quad \text{for } x \ge 0 \tag{2.34d} $$

holds.
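
The bounds (2.34c) and (2.34d) can be tried out on the reduced WEIBULL distribution, which is IHR (and hence HNBUE by the chain in Fig. 2/7) for c = 2 and DHR (and hence HNWUE) for c = 0.5. The sketch below only checks the inequalities on a finite grid; the mean formula Γ(1 + 1/c) is the standard mean of the reduced WEIBULL distribution, and all names and grid choices are illustrative.

```python
import math

def hnbue_bound(x, mu):
    """Upper bound (2.34c) on S(x) for an HNBUE distribution with mean mu."""
    return 1.0 if x <= mu else math.exp((mu - x) / mu)

def hnwue_bound(x, mu):
    """Upper bound (2.34d) on S(x), evaluated for x > 0, for an HNWUE distribution."""
    return mu / x * (1.0 - math.exp(-x / mu))

def weibull_sf(x, c):
    return math.exp(-(x ** c))

def weibull_mean(c):
    return math.gamma(1.0 + 1.0 / c)        # mean of the reduced Weibull distribution

grid = [0.01 * k for k in range(1, 1001)]   # x = 0.01, ..., 10.0
ihr_ok = all(weibull_sf(x, 2.0) <= hnbue_bound(x, weibull_mean(2.0)) for x in grid)
dhr_ok = all(weibull_sf(x, 0.5) <= hnwue_bound(x, weibull_mean(0.5)) for x in grid)
```

Both flags come out true on this grid, i.e., the survival functions stay below their respective bounds.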
Definition: For an absolutely continuous distribution F (·) with hazard rate h(·), we say that F (·)
is NBUHR (NWUHR), i.e., new better (worse) than used in hazard rate, if

$$ h(x) \underset{(\le)}{\ge} h(0)\quad \forall\ x \ge 0. \tag{2.35} $$

Definition: For an absolutely continuous distribution F (·), we say that F (·) is NBUHRA
(NWUHRA), i.e., new better (worse) than used in hazard rate average, if

$$ h(0) \underset{(\ge)}{\le} \frac{1}{x}\int_0^x h(u)\,du = -\frac{\ln\bigl[1 - F(x)\bigr]}{x}\quad \forall\ x \ge 0. \tag{2.36} $$
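21
Notice that from (2.34a) we first have

$$ \frac{\mu(x)}{\mu}\,S(x) \ \le\ \exp\left(-\frac{x}{\mu}\right). $$

Using (1.19e) we then find

$$ \exp\left\{-\int_0^x \frac{1}{\mu(u)}\,du\right\} \ \le\ \exp\left(-\frac{x}{\mu}\right) $$

and

$$ -\int_0^x \frac{1}{\mu(u)}\,du \ \le\ -\frac{x}{\mu}. $$

Rearranging the latter inequality results in (2.34b).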



In the following Table 2/1 we have compiled from HOLLANDER/PROSCHAN (1984) and
KLEFSJÖ (1982a) the results pertaining to the closure of classes of lifetime distributions under
three reliability operations: mixture and convolution of distributions and formation of coherent
systems.
Table 2/1: Closure and inheritance of classes of lifetime distributions
under reliability operations

Class     Mixture of       Convolution of   Formation of
          distributions    distributions    coherent systems
IHR       not closed       closed           not closed
IHRA      not closed       closed           closed
NBU       not closed       closed           closed
NBUE      not closed       closed           not closed
DMRL      not closed       not closed       not closed
HNBUE     not closed       closed           not closed
DHR       closed           not closed       not closed
DHRA      closed           not closed       not closed
NWU       not closed       not closed       not closed
NWUE      not closed       not closed       not closed
IMRL      closed           not closed       not closed
HNWUE     closed           not closed       not closed
3 Presentation of Univariate
Parametric Distributions
In this chapter we have compiled for a great number of univariate parametric distributions
• the PDF f (x) in the continuous case or the PMF Pi in the discrete case,
• the CCDF S(x) or Si ,
• the HR h(x) or hi , and
• the MRL µ(x) or Li , if existing.
We also indicate the aging class. Not all distributions may serve as lifetime distributions.

3.1 Continuous Distributions1


For those distributions defined on R or with negative argument x we have not given results for the
variant resulting from lower truncation at x = 0. The formulas for such a truncated distribution
easily follow by applying (1.52a-e). We have applied the following notation for parameters:
• a ∈ R for a location parameter which causes a shift of the distribution along the abscissa,
• b > 0 a scale parameter causing a stretching or a shrinkage of the distribution,
• any other Latin or Greek letter for a parameter that affects the shape of the distribution.
The aging property does not depend on a and b.
We use the following mathematical and statistical functions:
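$$ B(c, d) = \frac{\Gamma(c)\,\Gamma(d)}{\Gamma(c + d)} = \int_0^1 t^{c-1}(1 - t)^{d-1}\,dt \quad \text{(complete beta function)} $$

$$ B(P, c, d) = \int_0^P t^{c-1}(1 - t)^{d-1}\,dt \quad \text{(incomplete beta function)} $$

$$ \Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\,dt \quad \text{(complete gamma function)} $$

$$ \Gamma(a, u) = \int_0^u t^{a-1} e^{-t}\,dt \quad \text{(incomplete gamma function)} $$

$$ \gamma(a, u) = \int_u^\infty t^{a-1} e^{-t}\,dt \quad \text{(complementary incomplete gamma function)} $$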
1
Suggested reading for this section: JOHNSON/KOTZ/BALAKRISHNAN (1994, 1995), LEEMIS (1995),
MEEKER/ESCOBAR (1998), PATEL (1973), RINNE (2009, 2010). The interactive program ContDist which
is written in MATLAB and which is included in the accompanying file ‘Distributions.zip’ displays for all
distributions presented here a graph of the functions f (x), S(x), h(x) and (if existing) µ(x) for any set of
user–chosen parameter values.

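$$ \Phi(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} \exp\left[-t^2/2\right]dt \quad \text{(CDF of the standardized normal distribution)} $$

$$ \phi(u) = \frac{1}{\sqrt{2\pi}}\exp\left[-u^2/2\right] \quad \text{(PDF of the standardized normal distribution)} $$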

$$ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x \exp\left\{-u^2\right\}du \quad \text{(error function)} $$

Alpha distribution
This distribution is related to the normal distribution in the following way: Consider Y ∼
No(µ; σ), truncated to the left of y = 0. Then, X = 1/Y has an alpha distribution with
parameters α = µ/σ and b = 1/σ. The alpha distribution has been applied to tool wear and has also
been suggested in modeling lifetimes under accelerated life testing, see SALVIA (1985):

$$ f(x) = \frac{b\,\exp\left[-0.5\left(\alpha - \dfrac{b}{x}\right)^2\right]}{\sqrt{2\pi}\ \Phi(\alpha)\,x^2};\quad x \ge 0,\ \alpha \in \mathbb{R},\ b > 0 $$

$$ S(x) = 1 - \frac{\Phi\!\left(\alpha - \dfrac{b}{x}\right)}{\Phi(\alpha)} $$

$$ h(x) = \frac{b\,\exp\left[-0.5\left(\alpha - \dfrac{b}{x}\right)^2\right]}{\sqrt{2\pi}\left[\Phi(\alpha) - \Phi\!\left(\alpha - \dfrac{b}{x}\right)\right]x^2}\ \Rightarrow\ \text{IDHR} $$

µ(x): does not exist

The PDF has its mode at $x = b\left(\sqrt{\alpha^2 + 8} - \alpha\right)/4$ which moves to the left (right) as α (b)
increases.

Arcsine distribution
This distribution is a special case of the beta distribution (see below) with shape parameters c =
d = 0.5.2 The name is derived from the fact that its CDF and CCDF are written in terms of the
arcsine function, the inverse of the sine function. The arcsine distribution with a = 0 and b > 0
having support [−b; b] gives the position at random time of a particle engaged in simple harmonic
motion with amplitude b > 0.
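$$ f(x) = \frac{1}{b\,\pi\,\sqrt{1 - \left(\dfrac{x-a}{b}\right)^2}};\quad a - b \le x \le a + b;\ a \in \mathbb{R},\ b > 0 $$

$$ S(x) = \frac{1}{2} - \frac{\arcsin\!\left(\dfrac{x-a}{b}\right)}{\pi} $$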
2
A beta distribution with c + d = 1, but c ≠ 0.5 is sometimes called a generalized arcsine distribution.

$$ h(x) = \frac{1}{b\,\sqrt{1 - \left(\dfrac{x-a}{b}\right)^2}\ \arccos\!\left(\dfrac{x-a}{b}\right)}\ \Rightarrow\ \text{DIHR} $$

$$ \mu(x) = (a - x) + \frac{\sqrt{(a + b - x)\,(x + b - a)}}{\arccos\!\left(\dfrac{x-a}{b}\right)}\ \Rightarrow\ \text{IDMRL} $$

The arcsine distribution is a location–scale distribution. The PDF is symmetric and U–shaped,
its minimum is at x = a with $f(a) = (b\,\pi)^{-1}$. The bathtub–shaped hazard rate has its minimum
at x∗ ≈ a − 0.4421 b with $h(x^*) \approx (1.8197\,b)^{-1}$. The upside–down bathtub–shaped MRL has its
maximum at $x^+$, the solution of

$$ \sqrt{(a + b - x^+)\,(x^+ + b - a)} + (a - x^+)\arccos\!\left(\frac{x^+ - a}{b}\right) = \sqrt{(a + b - x^+)\,(x^+ + b - a)}\ \left[\arccos\!\left(\frac{x^+ - a}{b}\right)\right]^2. $$
Beta distribution
The name of this distribution has its origin in the complete beta function B(c, d) which is part of
the formulas for the PDF and other distribution representatives. The PDF of a beta distribution
can take on a great variety of shapes depending on its two shape parameters c, d > 0:
• symmetric for c = d,
• unimodal for c > 1 and d > 1,
• U–shaped for c < 1 and d < 1,
• J–shaped for d ≤ 1 ≤ c, but c ≠ d,
• inversely (or reflected) J–shaped for c ≤ 1 ≤ d, but c ≠ d.

The beta distribution also includes several other distributions as special cases, e.g.,
• the uniform distribution for c = d = 1,
• the right–angled negatively (positively) skewed triangular distribution for d = 1 and c =
2 (c = 1 and d = 2),
• the arcsine distribution for c = d = 0.5,
• the power function distribution for d = 1, c > 0.

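$$ f(x) = \frac{1}{B(c, d)}\,\frac{(x - a)^{c-1}\,(a + b - x)^{d-1}}{b^{c+d-1}};\quad a \le x \le a + b;\ a \in \mathbb{R};\ b, c, d > 0 $$

$$ S(x) = 1 - \frac{1}{B(c, d)}\int_0^{(x-a)/b} u^{c-1}(1 - u)^{d-1}\,du = 1 - I_{\frac{x-a}{b}}(c, d) = I_{1 - \frac{x-a}{b}}(d, c) $$

The function $I_z(c, d)$ is the incomplete beta function ratio which has to be evaluated numerically.

$$ h(x):\ \text{no closed form}\ \Rightarrow\ \sim \begin{cases} \text{DIHR for } 0 < c < 0.8 \text{ and } d \text{ arbitrary} \\ \text{IHR for all other combinations of } c \text{ and } d \end{cases} $$

$$ \mu(x):\ \text{no closed form}\ \Rightarrow\ \sim \begin{cases} \text{IDMRL for } 0 < c < 0.8 \text{ and } d \text{ arbitrary} \\ \text{DMRL for all other combinations of } c \text{ and } d \end{cases} $$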

BIRNBAUM–SAUNDERS distribution
This distribution has been suggested by BIRNBAUM/SAUNDERS (1968, 1969) as a lifetime model
for materials subject to cyclic patterns of stress where the ultimate failure comes from the growth
of prominent flaws.

$$ f(x) = \frac{\sqrt{\dfrac{x}{b}} + \sqrt{\dfrac{b}{x}}}{2\,c\,x\,\sqrt{2\pi}}\ \exp\left\{-\frac{1}{2\,c^2}\left(\sqrt{\frac{x}{b}} - \sqrt{\frac{b}{x}}\right)^2\right\};\quad x \ge 0;\ b, c > 0 $$

The variate $Y = \left(\sqrt{x/b} - \sqrt{b/x}\right)/c$ has a standard normal distribution, so

$$ S(x) = \Phi\!\left[-\frac{1}{c}\left(\sqrt{\frac{x}{b}} - \sqrt{\frac{b}{x}}\right)\right]. $$

h(x) = f (x)/S(x) has no closed form, but it is IDHR with a maximum at $x^* \approx b/(-0.4604 + 1.8417\,c)^2$
which is close to zero for b small and c large, so that the upside–down bathtub shape
does not come up clearly and the hazard rate seems to be DHR. The IDHR pattern is best seen for
b and c around 1. For h(x) we further notice:

• h(0) = 0 and

• $\lim_{x \to \infty} h(x) = \dfrac{1}{2\,b\,c^2}$.

µ(x) has no closed form and it is DIMRL for b and c around 1, otherwise IMRL.

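BURR distribution of type XII

BURR (1942) suggested a number of forms of a CDF that might be useful for purposes of graduation. Special attention has been devoted to type XII.

$$f(x) = \frac{c\,d}{b}\left(\frac{x-a}{b}\right)^{d-1}\left[1+\left(\frac{x-a}{b}\right)^{d}\right]^{-c-1};\quad x \ge a;\ a \in \mathbb{R};\ b, c, d > 0$$

$$S(x) = \left[1+\left(\frac{x-a}{b}\right)^{d}\right]^{-c}$$

$$h(x) = \frac{c\,d}{b}\left(\frac{x-a}{b}\right)^{d-1}\left[1+\left(\frac{x-a}{b}\right)^{d}\right]^{-1} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } 0 < d \le 1\\ \text{IDHR} & \text{for } d > 1\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } 0 < d \le 1\\ \text{DIMRL} & \text{for } d > 1\\ \text{does not exist} & \text{for } c\,d \le 1\end{cases}$$

For d > 1 the hazard rate has a maximum at $x^* = a + b\,(d-1)^{1/d}$ with $h(x^*) = (c/b)\,(d-1)^{1-1/d}$.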
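The closed-form hazard maximum of the BURR type XII distribution can be cross-checked numerically. The following is a minimal sketch, assuming the hazard rate h(x) = f(x)/S(x) implied by the PDF and survival function above; the function names and the parameter values are illustrative choices, not part of the original text.

```python
def burr12_hazard(x, a=0.0, b=1.0, c=1.0, d=2.0):
    """Hazard rate h(x) = f(x)/S(x) of the Burr type XII distribution, x > a."""
    y = (x - a) / b
    return (c * d / b) * y ** (d - 1) / (1.0 + y ** d)


def burr12_hazard_mode(a=0.0, b=1.0, c=1.0, d=2.0):
    """Closed-form location and height of the hazard maximum (d > 1 only)."""
    if d <= 1:
        raise ValueError("for d <= 1 the hazard rate is decreasing (DHR)")
    x_star = a + b * (d - 1) ** (1.0 / d)
    h_star = (c / b) * (d - 1) ** (1.0 - 1.0 / d)
    return x_star, h_star


# Illustrative parameter values (assumptions, not taken from the text).
a, b, c, d = 1.0, 2.0, 1.5, 3.0
x_star, h_star = burr12_hazard_mode(a, b, c, d)

# Crude grid search as an independent check of the closed form.
grid = [a + 0.001 * k for k in range(1, 20001)]
x_grid = max(grid, key=lambda x: burr12_hazard(x, a, b, c, d))
```

The grid maximiser `x_grid` should agree with `x_star` up to the grid spacing, and evaluating the hazard at `x_star` should reproduce `h_star`.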
3.1 Continuous Distributions 99

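CAUCHY distribution
The CAUCHY distribution is a symmetric distribution defined on R. Moments, and thus the MRL, do not exist.

$$f(x) = \left\{\pi\,b\left[1+\left(\frac{x-a}{b}\right)^{2}\right]\right\}^{-1};\quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \frac{1}{2} - \frac{1}{\pi}\arctan\!\left(\frac{x-a}{b}\right)$$

$$h(x) = \frac{1}{b\left[1+\left(\dfrac{x-a}{b}\right)^{2}\right]\left[\dfrac{\pi}{2} - \arctan\!\left(\dfrac{x-a}{b}\right)\right]} \;\Rightarrow\; \text{IDHR}$$

The hazard rate has its maximum at x∗ ≈ a + 0.428978 b with h(x∗) ≈ 0.7246/b.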
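The quoted location and height of the CAUCHY hazard maximum can be reproduced with a one-dimensional search. This is a sketch only: the golden-section routine is a generic textbook method chosen here for illustration, and the parameter values are assumptions.

```python
import math


def cauchy_hazard(x, a=0.0, b=1.0):
    """Hazard rate of the Cauchy distribution with location a and scale b."""
    y = (x - a) / b
    return 1.0 / (b * (1.0 + y * y) * (math.pi / 2.0 - math.atan(y)))


def golden_section_max(func, lo, hi, tol=1e-10):
    """Maximise a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(x1) < func(x2):      # maximum lies in [x1, hi]
            lo = x1
            x1 = x2
            x2 = lo + inv_phi * (hi - lo)
        else:                        # maximum lies in [lo, x2]
            hi = x2
            x2 = x1
            x1 = hi - inv_phi * (hi - lo)
    return 0.5 * (lo + hi)


# Illustrative parameter values (assumptions).
a, b = 2.0, 3.0
x_star = golden_section_max(lambda x: cauchy_hazard(x, a, b), a - 5 * b, a + 5 * b)
y_star = (x_star - a) / b                  # standardised location of the maximum
h_star_scaled = b * cauchy_hazard(x_star, a, b)
```

Because the hazard rate is IDHR (first increasing, then decreasing), it is unimodal on the search interval, which is what the golden-section method requires.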

χ distribution
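The χ distribution is related to the χ² distribution. For ν = 2 the χ distribution is equal to the RAYLEIGH distribution.

$$f(x) = \frac{x^{\nu-1}\exp\!\left(-\dfrac{x^{2}}{2}\right)}{2^{\nu/2-1}\,\Gamma\!\left(\dfrac{\nu}{2}\right)};\quad x \ge 0;\ \nu > 0$$

$$S(x) = 1 - \frac{\Gamma(\nu/2,\, x^{2}/2)}{\Gamma\!\left(\dfrac{\nu}{2}\right)}$$

$$h(x) = \frac{x^{\nu-1}\exp\!\left(-\dfrac{x^{2}}{2}\right)}{2^{\nu/2-1}\left[\Gamma\!\left(\dfrac{\nu}{2}\right) - \Gamma(\nu/2,\, x^{2}/2)\right]} \;\Rightarrow\; \begin{cases}\text{DIHR} & \text{for } 0 < \nu < 1\\ \text{IHR} & \text{for } \nu \ge 1\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{IDMRL} & \text{for } 0 < \nu < 1\\ \text{DMRL} & \text{for } \nu \ge 1\end{cases}$$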

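χ² distribution
The χ² distribution is a special case of the gamma distribution (see below) with b = 2.

$$f(x) = \frac{x^{\nu/2-1}\exp\!\left(-\dfrac{x}{2}\right)}{2^{\nu/2}\,\Gamma\!\left(\dfrac{\nu}{2}\right)};\quad x \ge 0;\ \nu > 0$$

$$S(x) = 1 - \frac{\Gamma(\nu/2,\, x/2)}{\Gamma\!\left(\dfrac{\nu}{2}\right)}$$

$$h(x) = \frac{x^{\nu/2-1}\exp\!\left(-\dfrac{x}{2}\right)}{2^{\nu/2}\left[\Gamma\!\left(\dfrac{\nu}{2}\right) - \Gamma(\nu/2,\, x/2)\right]} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } 0 < \nu < 2\\ 0.5 & \text{for } \nu = 2 \text{ (exponential distribution)}\\ \text{IHR} & \text{for } \nu > 2\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } 0 < \nu < 2\\ 2 & \text{for } \nu = 2\\ \text{DMRL} & \text{for } \nu > 2\end{cases}$$

We notice that $\lim_{x\to\infty} h(x) = 0.5$ and $\lim_{x\to\infty} \mu(x) = 2$.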

Cosine distribution, ordinary


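This distribution has a concave PDF and thus is not a good approximation to the bell–shaped PDF of the normal distribution.

$$f(x) = \frac{1}{2b}\cos\!\left(\frac{x-a}{b}\right);\quad a - \frac{b\pi}{2} \le x \le a + \frac{b\pi}{2};\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 0.5\left[1 - \sin\!\left(\frac{x-a}{b}\right)\right]$$

$$h(x) = \frac{1}{b\left[\sec\!\left(\dfrac{x-a}{b}\right) - \tan\!\left(\dfrac{x-a}{b}\right)\right]} \;\Rightarrow\; \text{IHR}$$

$$\mu(x) = \frac{a + \dfrac{b\pi}{2} - x - b\cos\!\left(\dfrac{x-a}{b}\right)}{1 + \sin\!\left(\dfrac{a-x}{b}\right)} \;\Rightarrow\; \text{DMRL}$$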

Cosine distribution, raised


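This variant of the cosine distribution is bell–shaped and resembles the PDF of the normal distribution.

$$f(x) = \frac{1}{2b}\left[1 + \cos\!\left(\pi\,\frac{x-a}{b}\right)\right];\quad a - b \le x \le a + b;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 0.5\left[1 - \frac{x-a}{b} - \frac{1}{\pi}\sin\!\left(\pi\,\frac{x-a}{b}\right)\right]$$

$$h(x) = \frac{1 + \cos\!\left(\pi\,\dfrac{x-a}{b}\right)}{b\left[1 - \dfrac{x-a}{b} - \dfrac{1}{\pi}\sin\!\left(\pi\,\dfrac{x-a}{b}\right)\right]} \;\Rightarrow\; \text{IHR}$$

$$\mu(x) \approx b\,\frac{0.199339 - \dfrac{x-a}{2b} + \left(\dfrac{x-a}{2b}\right)^{2} - 0.0506606\,\cos\!\left(\pi\,\dfrac{x-a}{b}\right)}{0.5\left[1 - \dfrac{x-a}{b} - \dfrac{1}{\pi}\sin\!\left(\pi\,\dfrac{x-a}{b}\right)\right]} \;\Rightarrow\; \text{DMRL}$$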

DHILLON–I distribution
DHILLON (1979) proposed the following hazard rate

$$h(x) = k\,\lambda\,c\,x^{c-1} + (1-k)\,\beta\,x^{\beta-1}\,b\,\exp\!\left(b\,x^{\beta}\right)$$

which is a linear combination of two components, 0 ≤ k ≤ 1 being the combining linear factor. λ and b are the scale factors in the first and second component, respectively, while c and β are shape parameters in the two components, where λ, b, c, β > 0. This model includes several other distributions, e.g., the GOMPERTZ–MAKEHAM distribution for c = β = 1, the WEIBULL distribution for k = 1 and the Log–WEIBULL distribution (= extreme value distribution of type I for the minimum) for k = 0, β = 1, and it can represent different courses of the hazard rate. The DHILLON–I distribution, proposed in DHILLON (1981), results when k = 0 and thus is less complicated, but it still models increasing and bathtub–shaped hazard rates. Introducing a location parameter a, a ∈ R, the DHILLON–I distribution has:

$$f(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}\exp\!\left\{1 - \exp\!\left[\left(\frac{x-a}{b}\right)^{c}\right] + \left(\frac{x-a}{b}\right)^{c}\right\};\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = \exp\!\left\{1 - \exp\!\left[\left(\frac{x-a}{b}\right)^{c}\right]\right\}$$

$$h(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}\exp\!\left[\left(\frac{x-a}{b}\right)^{c}\right] \;\Rightarrow\; \begin{cases}\text{DIHR} & \text{for } 0 < c < 1\\ \text{IHR} & \text{for } c \ge 1\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{IDMRL} & \text{for } 0 < c < 1\\ \text{DMRL} & \text{for } c \ge 1\end{cases}$$

For 0 < c < 1 the hazard rate has its minimum at $x^* = a + b\,[(1-c)/c]^{1/c}$ with $h(x^*) = (c/b)\,[(1-c)/c]^{(c-1)/c}\,\exp[(1-c)/c]$.

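DHILLON–II distribution
This distribution of DHILLON (1981) is capable of generating decreasing or upside–down bathtub–shaped hazard rates.

$$f(x) = \frac{c+1}{x-a+b}\left[\ln\!\left(\frac{x-a}{b}+1\right)\right]^{c}\exp\!\left\{-\left[\ln\!\left(\frac{x-a}{b}+1\right)\right]^{c+1}\right\};\quad x \ge a;\ a \in \mathbb{R};\ b, c \ge 0$$

$$S(x) = \exp\!\left\{-\left[\ln\!\left(\frac{x-a}{b}+1\right)\right]^{c+1}\right\}$$

$$h(x) = \frac{c+1}{x-a+b}\left[\ln\!\left(\frac{x-a}{b}+1\right)\right]^{c} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } c = 0\\ \text{IDHR} & \text{for } c > 0\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } c = 0\\ \text{DIMRL} & \text{for } 0 < c \lesssim 2\\ \text{DMRL} & \text{for } c > 2\end{cases}$$

For c > 0 the hazard rate has its maximum at $x^* = a + b\,(e^{c} - 1)$ with $h(x^*) = (c/e)^{c}\,(c+1)/b$.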
 

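Double WEIBULL distribution

Upon combining the (common) WEIBULL distribution and the reflected WEIBULL distribution into one distribution we arrive at the double WEIBULL distribution. A special case of the latter distribution (for c = 1) is the LAPLACE distribution.

$$f(x) = \frac{c}{2b}\left|\frac{x-a}{b}\right|^{c-1}\exp\!\left\{-\left|\frac{x-a}{b}\right|^{c}\right\};\quad x \in \mathbb{R};\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = \begin{cases} 1 - 0.5\,\exp\!\left\{-\left(\dfrac{a-x}{b}\right)^{c}\right\} & \text{for } x \le a\\[2ex] 0.5\,\exp\!\left\{-\left(\dfrac{x-a}{b}\right)^{c}\right\} & \text{for } x \ge a\end{cases}$$

$$h(x) = \begin{cases} \dfrac{c\left(\dfrac{a-x}{b}\right)^{c-1}\exp\!\left\{-\left(\dfrac{a-x}{b}\right)^{c}\right\}}{b\left[2 - \exp\!\left\{-\left(\dfrac{a-x}{b}\right)^{c}\right\}\right]} & \text{for } x \le a\\[4ex] \dfrac{c\left(\dfrac{x-a}{b}\right)^{c-1}\exp\!\left\{-\left(\dfrac{x-a}{b}\right)^{c}\right\}}{b\,\exp\!\left\{-\left(\dfrac{x-a}{b}\right)^{c}\right\}} & \text{for } x \ge a\end{cases}$$

µ(x) − no closed form

Except for c = 1, the hazard rate is far from being monotone; it is asymmetric around x = a in any case. The mean residual life function generally decreases, but for c > 3 it is not monotone.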

Exponential distribution
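Formerly, the exponential distribution was regarded as the prototype of a lifetime distribution.

$$f(x) = \frac{1}{b}\exp\!\left(-\frac{x-a}{b}\right);\quad x \ge a;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \exp\!\left(-\frac{x-a}{b}\right)$$

$$h(x) = \frac{1}{b} \;\Rightarrow\; \text{IHR and DHR}$$

$$\mu(x) = b \;\Rightarrow\; \text{IMRL and DMRL}$$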

Exponentiated exponential distribution


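The name of this distribution is derived from the fact that its CDF is the exponentiated CDF of the exponential distribution. This distribution also goes by the name generalized exponential distribution. A special case (for c = 1) is the exponential distribution.

$$f(x) = \frac{c}{b}\exp\!\left(-\frac{x-a}{b}\right)\left[1 - \exp\!\left(-\frac{x-a}{b}\right)\right]^{c-1};\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = 1 - \left[1 - \exp\!\left(-\frac{x-a}{b}\right)\right]^{c}$$

$$h(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } 0 < c \le 1\\ \text{IHR} & \text{for } c \ge 1\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } 0 < c \le 1\\ \text{DMRL} & \text{for } c \ge 1\end{cases}$$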

F distribution
This distribution is also known as the FISHER distribution, after FISHER, who discovered it in the context of variance analysis. It is the distribution of the ratio of two independently distributed χ² variables, each divided by its degrees of freedom. More precisely, if $X_1 \sim \chi^2_{\nu_1}$ and $X_2 \sim \chi^2_{\nu_2}$, then $X = \dfrac{X_1/\nu_1}{X_2/\nu_2} \sim F_{\nu_1,\nu_2}$. The parameters ν1, ν2 are called degrees of freedom, but nevertheless they are not restricted to integer values.

$$f(x) = \frac{\Gamma\!\left(\dfrac{\nu_1+\nu_2}{2}\right)}{\Gamma\!\left(\dfrac{\nu_1}{2}\right)\Gamma\!\left(\dfrac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}\frac{x^{(\nu_1-2)/2}}{\left(1+\dfrac{\nu_1}{\nu_2}\,x\right)^{(\nu_1+\nu_2)/2}};\quad x \ge 0;\ \nu_1, \nu_2 > 0$$

There are no closed formulas for S(x), h(x) and µ(x). As $E(X) = \nu_2/(\nu_2 - 2)$ only exists for ν2 > 2, the MRL also exists only for ν2 > 2. The hazard rate is either IDHR or IHR and the mean residual life function is either DIMRL or DMRL, depending on the ν1–ν2–combination.

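FRÉCHET distribution
This distribution is also known as the extreme value distribution of type II for the minimum.

$$f(x) = \frac{c}{b}\left(\frac{a-x}{b}\right)^{-c-1}\exp\!\left[-\left(\frac{a-x}{b}\right)^{-c}\right];\quad x \le a;\ a \le 0;\ b, c > 0$$

$$S(x) = \exp\!\left[-\left(\frac{a-x}{b}\right)^{-c}\right]$$

$$h(x) = \frac{c}{b}\left(\frac{a-x}{b}\right)^{-c-1} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$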

Gamma distribution
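For c ∈ N this distribution is called ERLANG distribution.

$$f(x) = \frac{(x-a)^{c-1}\exp\!\left(-\dfrac{x-a}{b}\right)}{b^{c}\,\Gamma(c)};\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = \frac{\gamma\!\left(c,\ \dfrac{x-a}{b}\right)}{\Gamma(c)}$$

$$h(x) = \frac{(x-a)^{c-1}\exp\!\left(-\dfrac{x-a}{b}\right)}{b^{c}\,\gamma\!\left(c,\ \dfrac{x-a}{b}\right)} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } 0 < c \le 1\\ \text{IHR} & \text{for } c \ge 1\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } 0 < c \le 1\\ \text{DMRL} & \text{for } c \ge 1\end{cases}$$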

Generalized exponential geometric distribution


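This rather complicated looking distribution has been proposed by SILVA et al. (2010) to model different types of aging. We have introduced an additional location parameter a and changed the original scaling factor b to a scaling parameter 1/b.

$$f(x) = \frac{c\,(1-p)\exp\!\left(-\dfrac{x-a}{b}\right)\left[1-\exp\!\left(-\dfrac{x-a}{b}\right)\right]^{c-1}}{b\left[1 - p\,\exp\!\left(-\dfrac{x-a}{b}\right)\right]^{c+1}};\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0,\ p \in (0,1)$$

$$S(x) = 1 - \left[\frac{1-\exp\!\left(-\dfrac{x-a}{b}\right)}{1 - p\,\exp\!\left(-\dfrac{x-a}{b}\right)}\right]^{c}$$

$$h(x) = \frac{c\,(1-p)\exp\!\left(-\dfrac{x-a}{b}\right)\left[1-\exp\!\left(-\dfrac{x-a}{b}\right)\right]^{c-1}}{b\left\{\left[1 - p\,\exp\!\left(-\dfrac{x-a}{b}\right)\right]^{c+1} - \left[1 - p\,\exp\!\left(-\dfrac{x-a}{b}\right)\right]\left[1-\exp\!\left(-\dfrac{x-a}{b}\right)\right]^{c}\right\}}$$

µ(x) − no closed form

We have

$$h(x) = \begin{cases}\text{IDHR} & \text{for } p \in \left(\dfrac{c-1}{c+1},\, 1\right) \text{ and } c > 1,\\[2ex] \text{IHR} & \text{for } p \in \left(0,\, \dfrac{c-1}{c+1}\right) \text{ and } c > 1,\\[2ex] \text{DHR} & \text{otherwise.}\end{cases}$$

For the MRL we have:

$$\mu(x) = \begin{cases}\text{DIMRL or with multiple bends} & \text{when IDHR or DHR},\\ \text{DMRL} & \text{when IHR.}\end{cases}$$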

Generalized gamma distribution


This generalization of the three–parameter gamma distribution by introducing a fourth parameter d goes back to STACY (1962). The generalization allows for four types of aging with respect to the hazard rate.

$$f(x) = \frac{d\,(x-a)^{c\,d-1}}{b^{c\,d}\,\Gamma(c)}\exp\!\left[-\left(\frac{x-a}{b}\right)^{d}\right];\quad x \ge a;\ a \in \mathbb{R};\ b, c, d > 0$$

$$S(x) = \frac{\gamma\!\left[c,\ \left(\dfrac{x-a}{b}\right)^{d}\right]}{\Gamma(c)}$$

$$h(x) = \frac{d\,(x-a)^{c\,d-1}\exp\!\left[-\left(\dfrac{x-a}{b}\right)^{d}\right]}{b^{c\,d}\,\gamma\!\left[c,\ \left(\dfrac{x-a}{b}\right)^{d}\right]}$$

µ(x) − no closed form

The behavior of h(x) and µ(x) is as follows:

• constant HR and constant MRL for c d − 1 = 0 and d = 1,
• DHR for c d − 1 = 0 and 0 < d < 1 ⇒ IMRL, and for c d − 1 < 0 and 0 < d ≤ 1 ⇒ IMRL,
• DIHR for c d − 1 < 0 and d > 1 ⇒ IDMRL,
• IHR for c d − 1 = 0 and d > 1 ⇒ DMRL, and for c d − 1 > 0 and d ≥ 1 ⇒ DMRL,
• IDHR for c d − 1 > 0 and 0 < d < 1 ⇒ IMRL, DMRL or DIMRL.

Generalized linear hazard rate distribution


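This generalization of the linear hazard rate distribution (see below) includes the linear hazard rate as the special case for c = 1. When c = 1 and β = 0 we have an exponential distribution. For other values of c the hazard rate is not linear.

$$f(x) = c\,(\alpha + \beta x)\left\{1 - \exp\!\left[-\left(\alpha x + \frac{\beta}{2}x^{2}\right)\right]\right\}^{c-1}\exp\!\left[-\left(\alpha x + \frac{\beta}{2}x^{2}\right)\right];$$
$$x \ge 0;\ \alpha, \beta \ge 0,\ \text{but not } \alpha = \beta = 0;\ c > 0$$

$$S(x) = 1 - \left\{1 - \exp\!\left[-\left(\alpha x + \frac{\beta}{2}x^{2}\right)\right]\right\}^{c}$$

$$h(x) = \frac{c\,(\alpha + \beta x)\exp\!\left[-\left(\alpha x + \dfrac{\beta}{2}x^{2}\right)\right]}{\left\{1 - \exp\!\left[-\left(\alpha x + \dfrac{\beta}{2}x^{2}\right)\right]\right\}\left\langle\left\{1 - \exp\!\left[-\left(\alpha x + \dfrac{\beta}{2}x^{2}\right)\right]\right\}^{-c} - 1\right\rangle}$$

µ(x) − no closed form

With respect to h(x), µ(x) we have:

• 0 < c < 1, β = 0 ⇒ DHR and IDMRL,
• 0 < c < 1, β > 0 ⇒ DIHR and IDMRL,
• c > 0, α > 0 ⇒ IHR and DMRL,
• c = 1, β = 0, α > 0 ⇒ h(x) = α, µ(x) = 1/α.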

Generalized logistic distribution


The broadest generalized logistic distribution is that of type IV.

$$f(x) = \frac{1}{B(c,d)}\,\frac{\exp(-d\,x/b)}{b\,\left[1+\exp(-x/b)\right]^{c+d}};\quad x \in \mathbb{R};\ b, c, d > 0$$

$$S(x) = 1 - I_{1/[1+\exp(-x/b)]}(c,d),\quad \text{see Beta distribution}$$

$$h(x):\ \text{no closed form} \;\Rightarrow\; \text{IHR with } \lim_{x\to\infty} h(x) = \frac{d}{b}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL with } \lim_{x\to\infty} \mu(x) = \frac{b}{d}$$

Interchanging c and d gives the type IV generalized logistic distribution for −X. For c = d we have the type III generalized logistic distribution. For d = 1 we have the type I generalization, and type II results for d = 1 and −X.

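Generalized LOMAX distribution

This distribution is also known as generalized PARETO distribution of the second kind.

$$f(x) = \frac{1}{b}\left(1 + c\,\frac{x-a}{b}\right)^{-(1+1/c)};\quad \begin{cases} a \le x \le a - b/c & \text{for } c < 0\\ x \ge a & \text{for } c > 0\end{cases};\quad a \in \mathbb{R};\ b > 0,\ c \in \mathbb{R}\setminus\{0\}$$

$$\lim_{c\to 0} f(x) = \frac{1}{b}\exp\!\left(-\frac{x-a}{b}\right)\quad \text{(exponential distribution)}$$

$$S(x) = \left(1 + c\,\frac{x-a}{b}\right)^{-1/c}\quad \text{for } c \ne 0$$

$$h(x) = \frac{1}{b}\left(1 + c\,\frac{x-a}{b}\right)^{-1} \;\Rightarrow\; \begin{cases}\text{IHR} & \text{for } c < 0\\ \text{DHR} & \text{for } c > 0\end{cases}$$

$$\mu(x) = \frac{\int_x^{a-b/c} S(u)\,du}{S(x)}\ \ \text{or}\ \ \frac{\int_x^{\infty} S(u)\,du}{S(x)} \;\Rightarrow\; \begin{cases}\text{DMRL} & \text{for } c < 0\\ \text{IMRL or IDMRL} & \text{for } 0 < c < 1\\ \text{does not exist} & \text{for } c \ge 1\end{cases}$$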

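Generalized RAYLEIGH distribution

The additional shape parameter c causes different types of aging compared to the ordinary RAYLEIGH distribution, which results for c = 1.

$$f(x) = c\,\frac{x-a}{b^{2}}\exp\!\left[-\frac{1}{2}\left(\frac{x-a}{b}\right)^{2}\right]\left\{1-\exp\!\left[-\frac{1}{2}\left(\frac{x-a}{b}\right)^{2}\right]\right\}^{c-1};\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = 1 - \left\{1-\exp\!\left[-\frac{1}{2}\left(\frac{x-a}{b}\right)^{2}\right]\right\}^{c}$$

$$h(x) = \frac{f(x)}{S(x)} \;\Rightarrow\; \begin{cases}\text{DIHR} & \text{for } 0 < c < 0.5\\ \text{IHR} & \text{for } c \ge 0.5\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{IDMRL} & \text{for } 0 < c < 0.5\\ \text{DMRL} & \text{for } c \ge 0.5\end{cases}$$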

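GOMPERTZ distribution
This distribution was suggested by GOMPERTZ to smooth the course of mortality rates in human life tables for higher ages.

$$f(x) = \alpha\,\exp(\beta x)\,\exp\!\left\{\frac{\alpha}{\beta}\left[1 - \exp(\beta x)\right]\right\};\quad x \ge 0;\ \alpha, \beta > 0$$

$$S(x) = \exp\!\left\{\frac{\alpha}{\beta}\left[1 - \exp(\beta x)\right]\right\}$$

$$h(x) = \alpha\,\exp(\beta x) \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$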

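GOMPERTZ–MAKEHAM distribution

This distribution has an extra parameter γ allowing for pure random failures, whereas the common GOMPERTZ distribution only models failure by wear and tear.

$$f(x) = \left[\gamma + \alpha\,\exp(\beta x)\right]\exp\!\left\{\frac{\alpha}{\beta}\left[1 - \exp(\beta x)\right] - \gamma x\right\};\quad x \ge 0;\ \alpha, \beta, \gamma > 0$$

$$S(x) = \exp\!\left\{\frac{\alpha}{\beta}\left[1 - \exp(\beta x)\right] - \gamma x\right\}$$

$$h(x) = \gamma + \alpha\,\exp(\beta x) \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$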

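GUMBEL distribution
This is one of the extreme value distributions, namely, type I for the maximum. Because of the PDF formula it is also known as double exponential distribution.

$$f(x) = \frac{1}{b}\exp\!\left[-\frac{x-a}{b} - \exp\!\left(-\frac{x-a}{b}\right)\right];\quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 1 - \exp\!\left[-\exp\!\left(-\frac{x-a}{b}\right)\right]$$

$$h(x) = \frac{\exp\!\left(-\dfrac{x-a}{b}\right)}{b\left\{\exp\!\left[\exp\!\left(-\dfrac{x-a}{b}\right)\right] - 1\right\}} \;\Rightarrow\; \text{IHR with } \lim_{x\to\infty} h(x) = \frac{1}{b}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL with } \lim_{x\to\infty} \mu(x) = b$$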

Half–CAUCHY distribution

Left truncation of the CAUCHY distribution at its median x = a gives the half–CAUCHY distribution.

$$f(x) = 2\left\{b\,\pi\left[1+\left(\frac{x-a}{b}\right)^{2}\right]\right\}^{-1};\quad x \ge a;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 1 - \frac{2}{\pi}\arctan\!\left(\frac{x-a}{b}\right)$$

$$h(x) = \frac{2}{b\left[1+\left(\dfrac{x-a}{b}\right)^{2}\right]\left[\pi - 2\arctan\!\left(\dfrac{x-a}{b}\right)\right]} \;\Rightarrow\; \text{IDHR}$$

h(x) has a maximum at x∗, which is the solution of $b - \pi\,(x^*-a) + 2\,(x^*-a)\arctan\!\left(\dfrac{x^*-a}{b}\right) = 0$.
µ(x) does not exist.

Half–logistic distribution
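Left truncation of the logistic distribution at its mean (= mode = median) x = a gives the half–logistic distribution.

$$f(x) = \frac{2\exp\!\left(\dfrac{x-a}{b}\right)}{b\left[1+\exp\!\left(\dfrac{x-a}{b}\right)\right]^{2}};\quad x \ge a;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \frac{2}{1+\exp\!\left(\dfrac{x-a}{b}\right)}$$

$$h(x) = \frac{1}{b\left[1+\exp\!\left(-\dfrac{x-a}{b}\right)\right]} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$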

Half–normal distribution
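Left truncation of the normal distribution at its mean (= mode = median) x = a gives the half–normal distribution.

$$f(x) = \frac{1}{b}\sqrt{\frac{2}{\pi}}\,\exp\!\left[-\frac{(x-a)^{2}}{2b^{2}}\right];\quad x \ge a;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 2\,\Phi\!\left(\frac{a-x}{b}\right)$$

$$h(x) = \frac{1}{b\sqrt{2\pi}}\,\frac{\exp\!\left[-\dfrac{(x-a)^{2}}{2b^{2}}\right]}{\Phi\!\left(\dfrac{a-x}{b}\right)} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$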

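HJORTH distribution
This distribution can exhibit different types of aging.

$$f(x) = \frac{\theta + \delta\,x\,(1+\beta x)}{(1+\beta x)^{\theta/\beta+1}}\exp\!\left(-\delta\,\frac{x^{2}}{2}\right);\quad x \ge 0;\ \beta, \delta, \theta > 0$$

$$S(x) = \frac{\exp\!\left(-\delta\,\dfrac{x^{2}}{2}\right)}{(1+\beta x)^{\theta/\beta}}$$

For β = 0 we have to take the limits of f(x) and S(x), leading to

$$f(x) = \exp\!\left(-\theta x - \frac{\delta}{2}x^{2}\right)(\theta + \delta x),$$

$$S(x) = \exp\!\left(-\theta x - \frac{\delta}{2}x^{2}\right).$$

$$h(x) = \delta x + \frac{\theta}{1+\beta x} \;\Rightarrow\; \begin{cases}\text{IHR} & \text{for } \theta = 0\\ \text{DHR} & \text{for } \delta = 0\\ \text{DIHR} & \text{for } 0 < \delta < \theta\beta\\ \text{IHR} & \text{for } \delta \ge \theta\beta\\ \text{constant} & \text{for } \beta = \delta = 0\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{does not exist} & \text{for } \delta = \theta = 0\\ \text{DMRL} & \text{for } \theta = 0\\ \text{IMRL} & \text{for } \delta = 0\\ \text{IDMRL or DMRL} & \text{for } 0 < \delta < \theta\beta\\ \text{DMRL} & \text{for } \delta \ge \theta\beta\\ \text{constant} & \text{for } \beta = \delta = 0\end{cases}$$

In case of DIHR the hazard rate has its minimum at $x^* = \left(\sqrt{\theta\beta/\delta} - 1\right)/\beta$ with $h(x^*) = 2\,\theta\sqrt{\delta/(\theta\beta)} - \delta/\beta$.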

Hyperbolic secant distribution



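For a = 0 and b = 2/π this distribution shares many properties with the standardized (= reduced) normal distribution.

$$f(x) = \frac{1}{b\,\pi}\,\mathrm{sech}\!\left(\frac{x-a}{b}\right);\quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 1 - \frac{2}{\pi}\arctan\!\left[\exp\!\left(\frac{x-a}{b}\right)\right]$$

$$h(x) = \frac{\mathrm{sech}\!\left(\dfrac{x-a}{b}\right)}{b\left\{\pi - 2\arctan\!\left[\exp\!\left(\dfrac{x-a}{b}\right)\right]\right\}} \;\Rightarrow\; \text{IHR with } \lim_{x\to\infty} h(x) = \frac{1}{b}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL with } \lim_{x\to\infty} \mu(x) = b$$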

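Inverse GAUSSIAN distribution

This distribution is also known as WALD distribution.

$$f(x) = \sqrt{\frac{b}{2\pi x^{3}}}\,\exp\!\left[-\frac{b\,(x-a)^{2}}{2\,a^{2}\,x}\right];\quad x > 0;\ a, b > 0$$

$$S(x) = \Phi\!\left[\sqrt{\frac{b}{x}}\left(1 - \frac{x}{a}\right)\right] - \exp\!\left(\frac{2b}{a}\right)\Phi\!\left[-\sqrt{\frac{b}{x}}\left(\frac{x}{a} + 1\right)\right]$$

$$h(x):\ \text{no closed form} \;\Rightarrow\; \text{IDHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DIMRL}$$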

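Inverse RAYLEIGH distribution

$$f(x) = \frac{2b}{(x-a)^{3}}\exp\!\left[-\frac{b}{(x-a)^{2}}\right];\quad x \ge a;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 1 - \exp\!\left[-\frac{b}{(x-a)^{2}}\right]$$

$$h(x) = \frac{2b}{(x-a)^{3}\left\{\exp\!\left[\dfrac{b}{(x-a)^{2}}\right] - 1\right\}} \;\Rightarrow\; \text{IDHR}$$

$$\mu(x) = \frac{u\left[\exp\!\left(-b/u^{2}\right) - 1\right] + \sqrt{b}\,\sqrt{\pi}\;\mathrm{erf}\!\left(\sqrt{b}/u\right)}{1 - \exp\!\left(-b/u^{2}\right)}\quad \text{with } u = x - a \;\Rightarrow\; \text{DIMRL or DMRL}$$

h(x) has a maximum at x∗, which is the solution of $2\exp\!\left[b/(x^*-a)^{2}\right]\left[2b - 3\,(x^*-a)^{2}\right] + 6\,(x^*-a)^{2} = 0$.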

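Inverse WEIBULL distribution

This distribution is also known as extreme value distribution of type II for the maximum.

$$f(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{-c-1}\exp\!\left[-\left(\frac{x-a}{b}\right)^{-c}\right];\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = 1 - \exp\!\left[-\left(\frac{x-a}{b}\right)^{-c}\right]$$

$$h(x) = \frac{c\left(\dfrac{x-a}{b}\right)^{-c-1}\exp\!\left[-\left(\dfrac{x-a}{b}\right)^{-c}\right]}{b\left\{1 - \exp\!\left[-\left(\dfrac{x-a}{b}\right)^{-c}\right]\right\}} \;\Rightarrow\; \text{IDHR}$$

$$\mu(x):\ \begin{cases}\text{no closed form} \;\Rightarrow\; \text{DIMRL}\\ \text{does not exist for } 0 < c \le 1\end{cases}$$

h(x) has a maximum at x∗, which is the solution of $\left(\dfrac{b}{x^*-a}\right)^{c} \Big/ \left\{1 - \exp\!\left[-\left(\dfrac{b}{x^*-a}\right)^{c}\right]\right\} = \dfrac{c+1}{c}$.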

LAPLACE distribution
This distribution is also known as double, bilateral or two–tailed exponential distribution.

$$f(x) = \frac{1}{2b}\exp\!\left(-\frac{|x-a|}{b}\right);\quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 1 - \frac{1}{2}\left\{1 + \mathrm{sign}(x-a)\left[1 - \exp\!\left(-\frac{|x-a|}{b}\right)\right]\right\}$$

$$h(x) = \frac{\exp\!\left(-\dfrac{|x-a|}{b}\right)}{b\left\langle 2 - \left\{1 + \mathrm{sign}(x-a)\left[1 - \exp\!\left(-\dfrac{|x-a|}{b}\right)\right]\right\}\right\rangle}$$

µ(x) − no closed form

h(x) is increasing over (−∞, a) and constant with h(x) = 1/b over [a, ∞). µ(x) is decreasing over (−∞, a) and constant with µ(x) = b over [a, ∞).

Linear hazard rate distribution


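This distribution includes the exponential distribution for α ≠ 0, β = 0 and the RAYLEIGH distribution for α = 0, β ≠ 0. For α = β = 0 we have a distribution with everlasting life.

$$f(x) = (\alpha + \beta x)\exp\!\left(-\alpha x - \frac{\beta}{2}x^{2}\right);\quad x \ge 0;\ \alpha, \beta \ge 0$$

$$S(x) = \exp\!\left(-\alpha x - \frac{\beta}{2}x^{2}\right)$$

$$h(x) = \alpha + \beta x \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{constant with } 1/\alpha & \text{for } \alpha \ne 0,\ \beta = 0\\ \text{DMRL with } \lim_{x\to\infty}\mu(x) = 0 & \text{for } \alpha = 0,\ \beta \ne 0\end{cases}$$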

Logistic distribution
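The logistic distribution shares many properties with the normal distribution.

$$f(x) = \frac{1}{b}\,\frac{\exp\!\left(\dfrac{x-a}{b}\right)}{\left[1+\exp\!\left(\dfrac{x-a}{b}\right)\right]^{2}};\quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \left[1+\exp\!\left(\frac{x-a}{b}\right)\right]^{-1}$$

$$h(x) = \frac{1}{b}\left[1+\exp\!\left(-\frac{x-a}{b}\right)\right]^{-1} \;\Rightarrow\; \text{IHR with } \lim_{x\to\infty} h(x) = \frac{1}{b}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL with } \lim_{x\to\infty} \mu(x) = b$$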

Log–gamma distribution
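We present the version of the log–gamma distribution given in JOHNSON/KOTZ/BALAKRISHNAN (1995, pp. 89ff.),3 which, for c ≠ 1, is a generalization of the extreme value distribution of type I for the minimum (= log–WEIBULL distribution).

$$f(x) = \frac{1}{b\,\Gamma(c)}\exp\!\left[c\,\frac{x-a}{b} - \exp\!\left(\frac{x-a}{b}\right)\right];\quad x \in \mathbb{R};\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = \frac{\gamma\!\left[c,\ \exp\!\left(\dfrac{x-a}{b}\right)\right]}{\Gamma(c)}$$

$$h(x):\ \text{no closed form} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$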


3
J OHNSON /KOTZ / BALAKRISHNAN (1994, p. 383) define another log–gamma function which is the distribution
of a variate X when − ln X has a gamma distribution with a = 0 and b, c. The PDF of this version is
1 (− ln x)c−1
f (x) = c ; 0 < x < 1.
b Γ(c) x1+1/b

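Log–LAPLACE distribution

If Y has a LAPLACE distribution, then X = exp(Y) is said to have a log–LAPLACE distribution. We look at the version with a scale parameter b and two shape parameters c and d. For c = d the PDF is symmetric in the sense that the variate and its reciprocal have the same distribution.

$$f(x) = \frac{1}{b}\,\frac{c\,d}{c+d} \times \begin{cases}\left(\dfrac{x}{b}\right)^{c-1} & \text{for } 0 \le x \le b\\[2ex] \left(\dfrac{b}{x}\right)^{d+1} & \text{for } x \ge b\end{cases};\quad b, c, d > 0$$

$$S(x) = \begin{cases} 1 - \dfrac{d}{c+d}\left(\dfrac{x}{b}\right)^{c} & \text{for } 0 \le x \le b\\[2ex] \dfrac{c}{c+d}\left(\dfrac{b}{x}\right)^{d} & \text{for } x \ge b\end{cases}$$

$$h(x) = \begin{cases}\dfrac{\dfrac{c\,d}{b}\left(\dfrac{x}{b}\right)^{c-1}}{c + d - d\left(\dfrac{x}{b}\right)^{c}} & \text{for } 0 \le x \le b\\[4ex] \dfrac{\dfrac{c\,d}{b}\left(\dfrac{b}{x}\right)^{d+1}}{c\left(\dfrac{b}{x}\right)^{d}} & \text{for } x \ge b\end{cases}$$

µ(x) − complicated closed form

HR and MRL have rather different courses depending on the special c–d–value combination, e.g., for c > 1 and d > 1 we have IHR over [0, b) and DHR over [b, ∞) with DIMRL or IMRL over [0, ∞). MRL does not exist for d ≤ 1.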
Log–logistic distribution
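In economics this distribution is known as FISK distribution, where it describes the distribution of income.

$$f(x) = \frac{\dfrac{c}{b}\left(\dfrac{x-a}{b}\right)^{c-1}}{\left[1+\left(\dfrac{x-a}{b}\right)^{c}\right]^{2}};\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = \frac{1}{1+\left(\dfrac{x-a}{b}\right)^{c}}$$

$$h(x) = \frac{\dfrac{c}{b}\left(\dfrac{x-a}{b}\right)^{c-1}}{1+\left(\dfrac{x-a}{b}\right)^{c}} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } 0 < c \le 1\\ \text{IDHR} & \text{for } c > 1\end{cases}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \begin{cases}\text{does not exist} & \text{for } 0 < c \le 1\\ \text{IMRL or DIMRL} & \text{for } c > 1\end{cases}$$

In case of IDHR the hazard rate has its maximum at $x^* = a + b\,(c-1)^{1/c}$ with $h(x^*) = (1/b)\,(c-1)^{(c-1)/c}$.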

Log–normal with lower threshold


The log–normal distribution with lower threshold is more popular than that with an upper threshold. A variate X is said to be log–normally distributed with lower threshold when there is a real number a such that Y = ln(X − a) is normally distributed. In economics it is known as GALTON or GIBRAT distribution.4 We have E[ln(X − a)] = α and Var[ln(X − a)] = β².

$$f(x) = \frac{1}{\beta\,(x-a)\sqrt{2\pi}}\exp\!\left\{-\frac{[\ln(x-a) - \alpha]^{2}}{2\beta^{2}}\right\};\quad x \ge a;\ a \in \mathbb{R},\ \alpha \in \mathbb{R},\ \beta > 0$$

$$S(x) = \Phi\!\left[-\frac{\ln(x-a) - \alpha}{\beta}\right]$$

$$h(x) = \frac{\varphi\!\left[-\dfrac{\ln(x-a) - \alpha}{\beta}\right]}{\beta\,(x-a)\,\Phi\!\left[-\dfrac{\ln(x-a) - \alpha}{\beta}\right]} \;\Rightarrow\; \text{IDHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DIMRL}$$

h(x) has its maximum at x∗ = a + exp(α + β u) where u is the solution of φ(u) = u [1 −


Φ(u)] (1 + β)/2.
Log–normal with upper threshold
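This distribution is defined on (−∞, a), a being the upper threshold.

$$f(x) = \frac{1}{\beta\,(a-x)\sqrt{2\pi}}\exp\!\left\{-\frac{[\ln(a-x) - \alpha]^{2}}{2\beta^{2}}\right\};\quad x < a;\ a \in \mathbb{R},\ \alpha \in \mathbb{R},\ \beta > 0$$

$$S(x) = \Phi\!\left[\frac{\ln(a-x) - \alpha}{\beta}\right]$$

$$h(x) = \frac{\varphi\!\left[\dfrac{\ln(a-x) - \alpha}{\beta}\right]}{\beta\,(a-x)\,\Phi\!\left[\dfrac{\ln(a-x) - \alpha}{\beta}\right]} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$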

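Log–WEIBULL distribution

When Y is WEIBULL distributed, then X = ln Y has a log–WEIBULL distribution, also known as extreme value distribution of type I for the minimum.

$$f(x) = \frac{1}{b}\exp\!\left[\frac{x-a}{b} - \exp\!\left(\frac{x-a}{b}\right)\right];\quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \exp\!\left[-\exp\!\left(\frac{x-a}{b}\right)\right]$$

$$h(x) = \frac{1}{b}\exp\!\left(\frac{x-a}{b}\right) \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$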


4
The hazard rate of this log–normal distribution is discussed in detail by SWEET (1990).

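LOMAX distribution
This distribution is also known as PARETO distribution of the second kind.

$$f(x) = \frac{c}{b}\left(1+\frac{x-a}{b}\right)^{-c-1};\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = \left(1+\frac{x-a}{b}\right)^{-c}$$

$$h(x) = \frac{c}{x-a+b} \;\Rightarrow\; \text{DHR}$$

$$\mu(x) = \begin{cases}\text{does not exist} & \text{for } 0 < c \le 1\\[1ex] \dfrac{x-a+b}{c-1} & \text{for } c > 1 \;\Rightarrow\; \text{IMRL}\end{cases}$$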
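The linear mean residual life of the LOMAX distribution follows from µ(x) = ∫ₓ^∞ S(u) du / S(x). The sketch below compares the closed form (x − a + b)/(c − 1) with a numerical integral; the function names, the log-spaced trapezoidal rule and the parameter values are illustrative assumptions.

```python
import math


def lomax_survival(x, a, b, c):
    """S(x) = [1 + (x - a)/b]^(-c) for x >= a."""
    return (1.0 + (x - a) / b) ** (-c)


def lomax_mrl_closed(x, a, b, c):
    """Closed-form mean residual life, which exists only for c > 1."""
    if c <= 1:
        raise ValueError("the mean residual life does not exist for c <= 1")
    return (x - a + b) / (c - 1.0)


def lomax_mrl_numeric(x, a, b, c, decades=6.0, n=20000):
    """Approximate int_x^inf S(u) du / S(x) by the trapezoidal rule.

    The substitution t = u - a + b, s = ln t turns the heavy-tailed
    integrand into a decaying exponential in s; the upper limit is
    truncated `decades` powers of ten above the starting point.
    """
    s0 = math.log(x - a + b)
    step = decades * math.log(10.0) / n

    def g(s):
        t = math.exp(s)
        return (t / b) ** (-c) * t   # S(u) * dt/ds

    total = 0.5 * (g(s0) + g(s0 + n * step))
    total += sum(g(s0 + k * step) for k in range(1, n))
    return total * step / lomax_survival(x, a, b, c)


# Illustrative parameter values (assumptions).
a, b, c, x = 1.0, 2.0, 3.0, 4.0
mrl_closed = lomax_mrl_closed(x, a, b, c)
mrl_numeric = lomax_mrl_numeric(x, a, b, c)
```

For shape values c close to 1 the tail decays slowly, so the truncation controlled by `decades` would have to be widened before the two values can be expected to agree.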

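MAXWELL–BOLTZMANN distribution

This distribution is nothing but the χ–distribution with 3 degrees of freedom; see also the χ–distribution above.

$$f(x) = \frac{1}{b}\sqrt{\frac{2}{\pi}}\left(\frac{x-a}{b}\right)^{2}\exp\!\left[-\frac{1}{2}\left(\frac{x-a}{b}\right)^{2}\right];\quad x \ge a;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 1 - F_{\chi^2_3}\!\left[\left(\frac{x-a}{b}\right)^{2}\right];\quad F_{\chi^2_3}(\cdot) - \text{CDF of the } \chi^2\text{–distribution with } \nu = 3$$

$$h(x):\ \text{no closed form} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$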

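MUTH distribution

$$f(x) = \frac{1}{b}\left[\exp\!\left(c\,\frac{x-a}{b}\right) - c\right]\exp\!\left[-\frac{1}{c}\exp\!\left(c\,\frac{x-a}{b}\right) + c\,\frac{x-a}{b} + \frac{1}{c}\right];$$
$$x \ge a;\ a \in \mathbb{R},\ b > 0,\ 0 < c \le 1$$

$$S(x) = \exp\!\left[-\frac{1}{c}\exp\!\left(c\,\frac{x-a}{b}\right) + c\,\frac{x-a}{b} + \frac{1}{c}\right]$$

$$h(x) = \frac{1}{b}\left[\exp\!\left(c\,\frac{x-a}{b}\right) - c\right] \;\Rightarrow\; \text{IHR with } h(a) = \frac{1-c}{b}\ \ \forall\, c$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$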

Normal distribution
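The normal or GAUSS distribution is of utmost importance in statistics. We have parameterized this distribution by a, which is the mean µ = E(X), and by b, which is the standard deviation $\sigma = \sqrt{\mathrm{Var}(X)}$.

$$f(x) = \frac{1}{b\sqrt{2\pi}}\exp\!\left[-\frac{(x-a)^{2}}{2b^{2}}\right] = \frac{1}{b}\,\varphi\!\left(\frac{x-a}{b}\right);\quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 1 - \Phi\!\left(\frac{x-a}{b}\right) = \Phi\!\left(\frac{a-x}{b}\right)$$

$$h(x) = \frac{1}{b}\,\frac{\varphi\!\left(\dfrac{x-a}{b}\right)}{\Phi\!\left(\dfrac{a-x}{b}\right)} \;\Rightarrow\; \text{IHR}$$

$$\mu(x) = a + b^{2}\,h(x) - x \;\Rightarrow\; \text{DMRL}$$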
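For the normal distribution the mean residual life is available through the hazard rate, µ(x) = a + b² h(x) − x. The following sketch checks this identity against a direct numerical evaluation of ∫ₓ^∞ S(u) du / S(x); the helper names, the integration width and the parameter values are assumptions made for illustration.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normal_hazard(x, a, b):
    """h(x) = phi((x - a)/b) / (b * Phi((a - x)/b))."""
    return std_normal_pdf((x - a) / b) / (b * std_normal_cdf((a - x) / b))


def normal_mrl(x, a, b):
    """Mean residual life via the identity mu(x) = a + b^2 h(x) - x."""
    return a + b * b * normal_hazard(x, a, b) - x


def normal_mrl_numeric(x, a, b, width=12.0, n=20000):
    """Trapezoidal approximation of int_x^inf S(u) du / S(x)."""
    upper = max(x, a) + width * b    # survival is negligible beyond this point

    def survival(u):
        return std_normal_cdf((a - u) / b)

    step = (upper - x) / n
    total = 0.5 * (survival(x) + survival(upper))
    total += sum(survival(x + k * step) for k in range(1, n))
    return total * step / survival(x)


# Illustrative parameter values (assumptions).
a, b = 10.0, 2.0
mrl_identity = normal_mrl(11.0, a, b)
mrl_integral = normal_mrl_numeric(11.0, a, b)
```

Far in the upper tail the ratio φ/Φ becomes numerically delicate, so the identity should be used with care for x many standard deviations above a.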

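Parabolic U–shaped distribution

$$f(x) = \frac{3}{2b}\left(\frac{x-a}{b}\right)^{2};\quad a - b \le x \le a + b;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \frac{1}{2}\left[1 - \left(\frac{x-a}{b}\right)^{3}\right]$$

$$h(x) = \frac{3\,(a-x)^{2}}{b^{3} + (a-x)^{3}} \;\Rightarrow\; \text{DIHR with minimum at } x^* = a \text{ and } h(x^*) = 0$$

$$\mu(x) = \frac{0.75\,b^{4} - b^{3}(x-a) + 0.25\,(x-a)^{4}}{b^{3} - (x-a)^{3}} \;\Rightarrow\; \text{IDMRL with maximum at } x^* \approx a - 0.596072\,b$$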

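Parabolic inverted U–shaped distribution

$$f(x) = \frac{3}{4b}\left[1 - \left(\frac{x-a}{b}\right)^{2}\right];\quad a - b \le x \le a + b;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \frac{1}{2} - \frac{1}{4}\left[3\,\frac{x-a}{b} - \left(\frac{x-a}{b}\right)^{3}\right]$$

$$h(x):\ \text{complicated form} \;\Rightarrow\; \text{IHR}$$

$$\mu(x) = b\,\frac{0.75 - 2\,\dfrac{x-a}{b} + 1.5\left(\dfrac{x-a}{b}\right)^{2} - 0.25\left(\dfrac{x-a}{b}\right)^{4}}{\left(2 + \dfrac{x-a}{b}\right)\left(\dfrac{x-a}{b} - 1\right)^{2}} \;\Rightarrow\; \text{DMRL}$$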

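PARETO distribution of the first kind

In economics this distribution serves as a model for the distribution of income.

$$f(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{-c-1};\quad x \ge a + b;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = \left(\frac{x-a}{b}\right)^{-c}$$

$$h(x) = \frac{c}{x-a} \;\Rightarrow\; \text{DHR}$$

$$\mu(x) = \begin{cases}\text{does not exist} & \text{for } c \le 1\\[1ex] \dfrac{b}{c-1}\,\dfrac{x-a}{b} & \text{for } c > 1 \;\Rightarrow\; \text{IMRL}\end{cases}$$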

Power function distribution


This distribution gives the uniform distribution for c = 1 and the right–angled negatively skewed triangular distribution for c = 2 as special cases.

$$f(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1};\quad a \le x \le a + b;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = 1 - \left(\frac{x-a}{b}\right)^{c}$$

$$h(x) = \frac{c}{a-x}\left[1 + \frac{1}{\left(\dfrac{x-a}{b}\right)^{c} - 1}\right] \;\Rightarrow\; \begin{cases}\text{DIHR} & \text{for } 0 < c < 1\\ \text{IHR} & \text{for } c \ge 1\end{cases}$$

$$\mu(x) = \frac{\dfrac{c\,(x-a-b)}{\left(\dfrac{x-a}{b}\right)^{c} - 1} - (x-a)}{c+1} \;\Rightarrow\; \begin{cases}\text{IDMRL} & \text{for } 0 < c \lesssim 0.815\\ \text{DMRL} & \text{for } c \gtrsim 0.815\end{cases}$$

The DIHR hazard rate has its minimum at $x^* = a + b\,(1-c)^{1/c}$ with $h(x^*) = (1/b)\,(1-c)^{(c-1)/c}$.


RAYLEIGH distribution
This distribution is a special case of the χ–distribution when ν = 2, a special case of the WEIBULL distribution when c = 2 and a special case of the generalized gamma distribution when c = 1 and d = 2. It is also a linear hazard rate distribution.

$$f(x) = \frac{x-a}{b^{2}}\exp\!\left[-\frac{1}{2}\left(\frac{x-a}{b}\right)^{2}\right];\quad x \ge a;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \exp\!\left[-\frac{1}{2}\left(\frac{x-a}{b}\right)^{2}\right]$$

$$h(x) = \frac{x-a}{b^{2}} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$

Reflected exponential distribution


Upon reflecting the exponential distribution with f(x) = (1/b) exp[−(x − a)/b] around x = a we arrive at the reflected exponential distribution, whereby the lower threshold turns into an upper threshold and the constant hazard rate property gets lost.

$$f(x) = \frac{1}{b}\exp\!\left(\frac{x-a}{b}\right);\quad x \le a;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 1 - \exp\!\left(\frac{x-a}{b}\right)$$

$$h(x) = \frac{1}{b\left[\exp\!\left(\dfrac{a-x}{b}\right) - 1\right]} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$

Reflected W EIBULL distribution


This distribution is also known as extreme value distribution of type III for the maximum and results from the WEIBULL distribution by reflection around x = a, whereby a turns into an upper threshold. For c = 1 the reflected WEIBULL distribution is equal to the reflected exponential distribution.

$$f(x) = \frac{c}{b}\left(\frac{a-x}{b}\right)^{c-1}\exp\!\left[-\left(\frac{a-x}{b}\right)^{c}\right];\quad x \le a;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = 1 - \exp\!\left[-\left(\frac{a-x}{b}\right)^{c}\right]$$

$$h(x) = \frac{c\left(\dfrac{a-x}{b}\right)^{c-1}}{b\left\{\exp\!\left[\left(\dfrac{a-x}{b}\right)^{c}\right] - 1\right\}} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$

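Semi–elliptical distribution
This distribution is also known as WIGNER's semi–circle distribution. For $b = \sqrt{2/\pi} \approx 0.7979$ the graph of f(x) is a semi–circle, otherwise a semi–ellipse.

$$f(x) = \frac{2}{b\,\pi}\sqrt{1 - \left(\frac{x-a}{b}\right)^{2}};\quad a - b \le x \le a + b;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \frac{1}{2} - \frac{1}{\pi}\left[\frac{x-a}{b}\sqrt{1 - \left(\frac{x-a}{b}\right)^{2}} + \arcsin\!\left(\frac{x-a}{b}\right)\right]$$

$$h(x) = \frac{4\sqrt{1 - \left(\dfrac{x-a}{b}\right)^{2}}}{b\left\{\pi - 2\left[\dfrac{x-a}{b}\sqrt{1 - \left(\dfrac{x-a}{b}\right)^{2}} + \arcsin\!\left(\dfrac{x-a}{b}\right)\right]\right\}} \;\Rightarrow\; \text{IHR}$$

$$\mu(x):\ \text{no closed form} \;\Rightarrow\; \text{DMRL}$$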

t distribution
This distribution is also known as STUDENT's distribution, STUDENT being the pseudonym of W. S. GOSSET, its discoverer.

$$f(x) = \frac{\Gamma\!\left(\dfrac{\nu+1}{2}\right)}{\sqrt{\pi\,\nu}\;\Gamma\!\left(\dfrac{\nu}{2}\right)}\left(1 + \frac{x^{2}}{\nu}\right)^{-(\nu+1)/2};\quad x \in \mathbb{R};\ \nu > 0$$

There are no closed formulas for S(x), h(x) and µ(x). The latter does not exist for ν ≤ 1. For smaller ν we have IDHR and DIMRL. With ν → ∞ the t distribution goes to a normal distribution, so then we have IHR and DMRL.

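TEISSIER distribution
This distribution, suggested by TEISSIER (1934), is characterized by an exponentially declining mean residual life function.

$$f(x) = \frac{1}{b}\left[\exp\!\left(\frac{x-a}{b}\right) - 1\right]\exp\!\left[1 + \frac{x-a}{b} - \exp\!\left(\frac{x-a}{b}\right)\right];\quad x \ge a;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = \exp\!\left[1 + \frac{x-a}{b} - \exp\!\left(\frac{x-a}{b}\right)\right]$$

$$h(x) = \frac{1}{b}\left[\exp\!\left(\frac{x-a}{b}\right) - 1\right] \;\Rightarrow\; \text{IHR}$$

$$\mu(x) = b\,\exp\!\left(-\frac{x-a}{b}\right) \;\Rightarrow\; \text{DMRL}$$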

Triangular distribution, continuous


The triangular distribution includes the following special cases:
• c = 0.5 ⇒ symmetric triangular distribution,
• c = 0 ⇒ right–angled and positively skewed triangular distribution,
• c = 1 ⇒ right–angled and negatively skewed triangular distribution.
A triangular distribution results from folding (= summing) two independently and uniformly distributed variates. The triangle is symmetric when both uniform distributions are identical, otherwise it is asymmetric.

$$f(x) = \begin{cases}\dfrac{2\,(x-a)}{c\,b^{2}} & \text{for } a \le x \le a + c\,b\\[2ex] \dfrac{2\,(a+b-x)}{(1-c)\,b^{2}} & \text{for } a + c\,b \le x \le a + b\end{cases}\qquad a \in \mathbb{R},\ b > 0,\ 0 < c < 1$$

$$S(x) = \begin{cases} 1 - \dfrac{(x-a)^{2}}{c\,b^{2}} & \text{for } a \le x \le a + c\,b\\[2ex] \dfrac{(a+b-x)^{2}}{(1-c)\,b^{2}} & \text{for } a + c\,b \le x \le a + b\end{cases}$$

$$h(x) = \begin{cases}\dfrac{2\,(x-a)}{c\,b^{2} - (x-a)^{2}} & \text{for } a \le x \le a + c\,b\\[2ex] \dfrac{2}{a+b-x} & \text{for } a + c\,b \le x \le a + b\end{cases} \;\Rightarrow\; \text{IHR}$$

$$\mu(y) = \begin{cases} b\,\dfrac{c + c^{2} - 3\,c\,y + y^{3}}{3\,(c - y^{2})} & \text{for } 0 \le y \le c\\[2ex] b\,\dfrac{1-y}{3} & \text{for } c \le y \le 1\end{cases}\quad \text{with } y = \frac{x-a}{b} \;\Rightarrow\; \text{DMRL}$$

Uniform distribution, continuous


The uniform or rectangular distribution is the simplest continuous distribution, having a constant PDF.

$$f(x) = \frac{1}{b};\quad a \le x \le a + b;\ a \in \mathbb{R},\ b > 0$$

$$S(x) = 1 - \frac{x-a}{b}$$

$$h(x) = \frac{1}{a+b-x} \;\Rightarrow\; \text{IHR}$$

$$\mu(x) = \frac{a+b-x}{2} \;\Rightarrow\; \text{DMRL}$$

V–shaped distribution
We present the symmetric V–shaped distribution, which may be regarded as a linear approximation to the U–shaped parabolic distribution.

$$f(x) = \begin{cases}\dfrac{2\,(2a + b - 2x)}{b^{2}} & \text{for } a \le x \le a + b/2\\[2ex] \dfrac{2\,(2x - 2a - b)}{b^{2}} & \text{for } a + b/2 \le x \le a + b\end{cases}\qquad a \in \mathbb{R},\ b > 0$$

$$S(x) = \begin{cases} 1 - \dfrac{2\,(x-a)\,(a+b-x)}{b^{2}} & \text{for } a \le x \le a + b/2\\[2ex] 0.5 - \dfrac{(2x - 2a - b)^{2}}{2\,b^{2}} & \text{for } a + b/2 \le x \le a + b\end{cases}$$

$$h(x) = \begin{cases}\dfrac{2\,(2a + b - 2x)}{b^{2} + 2\,(a-x)\,(a+b-x)} & \text{for } a \le x \le a + b/2\\[2ex] \dfrac{2a + b - 2x}{(a-x)\,(a+b-x)} & \text{for } a + b/2 \le x \le a + b\end{cases} \;\Rightarrow\; \text{DIHR}$$

$$\mu(y) = \begin{cases} b\,\dfrac{3 - 2\,y\,[3 + y\,(2y - 3)]}{6\,[1 + 2\,y\,(y-1)]} & \text{for } 0 \le y \le 0.5\\[2ex] b\,\dfrac{(y-1)^{2}\,(y + 0.5)}{3\,y\,(1-y)} & \text{for } 0.5 \le y \le 1\end{cases}\quad \text{with } y = \frac{x-a}{b} \;\Rightarrow\; \text{DMRL}$$

The hazard rate has its minimum at x∗ = a + b/2 with h(x∗) = 0.




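WEIBULL distribution
This distribution is also known as extreme value distribution of type III for the minimum.

$$f(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}\exp\!\left[-\left(\frac{x-a}{b}\right)^{c}\right];\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$

$$S(x) = \exp\!\left[-\left(\frac{x-a}{b}\right)^{c}\right]$$

$$h(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } 0 < c \le 1\\ \text{IHR} & \text{for } c \ge 1\end{cases}$$

$$\mu(y) = \frac{b}{c}\,\exp(y^{c})\;\gamma\!\left(\frac{1}{c},\ y^{c}\right) \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } 0 < c \le 1\\ \text{DMRL} & \text{for } c \ge 1\end{cases}\quad \text{with } y = \frac{x-a}{b}$$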

We conclude this section on continuous distributions by showing a typical output of the program ContDist. After the user has chosen one of the 62 distributions implemented in that program (here the DHILLON–II distribution), the program shows the PDF formula of this distribution as a reminder. Then the user is asked to input a value for each of the pertaining parameters. The program checks these values for admissibility. The chosen parameter values are displayed together with the graphs of f(x), S(x), h(x) and µ(x).

Figure 3/1: PDF-formula display of the DHILLON–II distribution by the program ContDist

Figure 3/2: Display of the functions of a DHILLON–II distribution by the program ContDist
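The workflow described for ContDist (check the parameters for admissibility, then evaluate the functions) can be imitated in a few lines. This is only a sketch in Python and not the original program: the function names are hypothetical, b > 0 is assumed for the scale parameter, and µ(x) is left out because it has no closed form.

```python
import math


def check_dhillon2_parameters(a, b, c):
    """Admissibility check: a real, b > 0 (assumed here), c >= 0."""
    if not all(math.isfinite(v) for v in (a, b, c)):
        raise ValueError("parameters must be finite real numbers")
    if b <= 0:
        raise ValueError("scale parameter b must be positive")
    if c < 0:
        raise ValueError("shape parameter c must be non-negative")


def dhillon2(x, a, b, c):
    """Return (f, S, h) of the Dhillon-II distribution at x >= a."""
    check_dhillon2_parameters(a, b, c)
    if x < a:
        raise ValueError("x must not be smaller than the location parameter a")
    t = math.log((x - a) / b + 1.0)
    hazard = (c + 1.0) / (x - a + b) * t ** c
    survival = math.exp(-t ** (c + 1.0))
    return hazard * survival, survival, hazard   # f(x) = h(x) S(x)


# Illustrative parameter values (assumptions).
a, b, c = 0.0, 2.0, 1.5
# Written in t = ln((x - a)/b + 1), the hazard is ((c + 1)/b) t^c exp(-t),
# which peaks at t = c, i.e. at x = a + b (exp(c) - 1).
x_peak = a + b * (math.exp(c) - 1.0)
f_peak, s_peak, h_peak = dhillon2(x_peak, a, b, c)
```

A table or plot of `dhillon2(x, a, b, c)` over a grid of x values would correspond to the graphs that the program displays.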

3.2 Discrete Distributions5


The most basic representative of a discrete distribution is its PMF

$$P_i = \Pr(X = i);\quad i = 0, 1, 2, \ldots\ \ \text{or}\ \ i = 1, 2, 3, \ldots$$

which can always be given in explicit and closed form. The survival function

$$S_i = \Pr(X \ge i) = \sum_{j \ge i} P_j$$

seldom exists in closed form and must be found numerically by summing the $P_i$'s. Consequently, the hazard rate

$$h_i = \frac{P_i}{S_i}$$

mostly has no closed form either. The same holds for the mean residual life function

$$L_i = E(X - i \mid X \ge i),$$

which may be calculated as

$$L_i = \frac{\sum_{j>i} S_j}{S_i} = \frac{\sum_{j \ge i} j\,P_j}{S_i} - i$$

when the support is finite. For an infinite support $i = i_{\min}, i_{\min}+1, \ldots, \infty$ we avoid evaluating the sum $\sum_{j=i+1}^{\infty} S_j$ with its infinite number of summands by remembering that, for existing E(X), we have $L_{i_{\min}} = \sum_{j > i_{\min}} S_j = E(X) - i_{\min}$ as $S_{i_{\min}} = 1$. Thus, we can evaluate $L_i$ for $i > i_{\min}$ as

$$L_i = \left[E(X) - i_{\min} - \sum_{j=i_{\min}+1}^{i} S_j\right] \Big/ S_i;\quad i = i_{\min}+1,\ i_{\min}+2,\ \ldots$$

5 Suggested reading for this section: LAI (2013), JOHNSON/KOTZ/KEMP (1992), PADGETT/SPURRIER (1985), RINNE (2009), SALVIA/BOLLINGER (1982), SHAKED et al. (1995), XEKALAKI (1983a, b). The interactive program DiscDist, which is written in MATLAB and which is included in the accompanying file ‘Distributions.zip’, displays for all distributions presented here a graph of the functions $P_i$, $S_i$, $h_i$ and, if existing, $L_i$ for any set of parameter values.

Together with $P_i$ and, if existing in closed form, $S_i$, $h_i$, $L_i$ we will give the ratio

$$q_i = \frac{P_{i+1}}{P_i}$$

which comes up in the recursion formula $P_{i+1} = q_i\,P_i$. The quantity

$$\Delta\eta_i = \frac{P_{i+1}}{P_i} - \frac{P_{i+2}}{P_{i+1}} = q_i - q_{i+1},$$

which has been defined in (2/10b), will be given, too, as it serves in identifying the type of monotonicity of the hazard rate, i.e.:

$$\Delta\eta_i < 0 \;\Rightarrow\; \text{DHR},\qquad \Delta\eta_i > 0 \;\Rightarrow\; \text{IHR}.$$

Many univariate discrete distributions can be explained by the so–called urn model. An urn (=
population) contains either a finite number $N$ or an infinite number of balls of which either a
finite number $M$ or a fraction $P$ is red. The color 'red' stands for any attribute. Sampling from
this urn may be either with or without replacement of each ball drawn before drawing the next
ball. The PMF then gives either the probability of having a certain number of red balls in the sample
or the probability of the number of balls to be drawn until the first, the second, etc. red ball is found in the
sample.
A caveat applies to the numerical evaluation of discrete distributions: severe rounding errors
may distort the result for extreme values of the parameters or of the variable of the distribution.
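One common way to reduce such rounding problems is to work on the logarithmic scale. The following Python sketch illustrates this for the binomial PMF; the function name `binomial_log_pmf` is an assumption made for illustration, and `math.lgamma` is used in place of factorials.

```python
import math

def binomial_log_pmf(i, n, p):
    """Natural logarithm of the binomial PMF, evaluated without forming
    huge binomial coefficients or tiny powers (illustrative helper)."""
    log_coeff = (math.lgamma(n + 1) - math.lgamma(i + 1)
                 - math.lgamma(n - i + 1))
    return log_coeff + i * math.log(p) + (n - i) * math.log1p(-p)

# Moderate parameters: exponentiating recovers the ordinary PMF value.
small = math.exp(binomial_log_pmf(3, 10, 0.3))

# Extreme parameters: the log-PMF remains a finite number, whereas
# p**i alone would underflow to zero in floating-point arithmetic.
extreme = binomial_log_pmf(2500, 5000, 0.001)
```

Ratios such as $q_i$ or $h_i$ can then be formed from differences of log values before exponentiating.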
122 3 Presentation of Univariate Parametric Distributions

Binomial distribution
The binomial PMF gives the probability of having $i$ red balls in a sample of size $n$, drawn with
replacement from an urn with fraction $P$ of red balls.
$$P_i = \binom{n}{i} P^i (1-P)^{n-i}; \quad i = 0, 1, \ldots, n; \quad n \ge 1,\ 0 < P < 1$$
$$q_i = \frac{n-i}{i+1}\,\frac{P}{1-P}$$
$$\Delta\eta_i = \frac{n+1}{(i+1)(i+2)}\,\frac{P}{1-P} > 0 \ \ \forall\, i \ \Rightarrow \text{IHR}$$
$L_i$: no closed form $\Rightarrow$ DMRL

Binomial distribution, positive


The positive binomial distribution is a binomial distribution truncated on the left–hand side at
$i = 0$.
$$P_i = \binom{n}{i} \frac{P^i (1-P)^{n-i}}{1 - (1-P)^n}; \quad i = 1, 2, \ldots, n; \quad n \ge 1,\ 0 < P < 1$$
$q_i$, $\Delta\eta_i$: as for the ordinary binomial distribution above
We have IHR and DMRL. For $i = 1, 2, \ldots, n$ the hazard rates of the ordinary and the positive
binomial distributions are identical.
Geometric distribution
The geometric distribution is a 'waiting time distribution' in the sense that its PMF gives the
probability of drawing $i + 1$ balls (with replacement from a population having fraction $P$ of
red balls) until the first red ball is drawn, i.e., $i$ is the excess number of sampled balls until the
event occurs. It is the discrete analogue to the exponential distribution, and it is a special
case (with $m = 1$) of the negative binomial distribution.
$$P_i = P\,(1-P)^i; \quad i = 0, 1, 2, \ldots; \quad 0 < P < 1$$
$$S_i = (1-P)^i$$
$$q_i = 1-P$$
$$\Delta\eta_i = 0$$
$$h_i = P \ \Rightarrow \text{IHR and DHR}$$
$$L_i = \frac{1-P}{P} \ \Rightarrow \text{IMRL and DMRL}$$

Geometric distribution, positive


The positive geometric distribution results from truncating the ordinary geometric distribution on
the left–hand side at $i = 0$. The resulting distribution gives the probability of the total number of
balls to be drawn until the first red ball. It is nothing but an ordinary geometric distribution
shifted one step to the right.
$$P_i = \frac{P\,(1-P)^i}{1-P} = P\,(1-P)^{i-1}; \quad i = 1, 2, 3, \ldots; \quad 0 < P < 1$$
$$S_i = (1-P)^{i-1}$$
$$q_i = 1-P$$
$$\Delta\eta_i = 0$$
$$h_i = P \ \Rightarrow \text{IHR and DHR}$$
$$L_i = \frac{\sum_{j>i} S_j}{S_i} = \frac{1-P}{P} \ \Rightarrow \text{IMRL and DMRL}$$

Geometric distribution, zero–inflated


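This modification of the ordinary geometric distribution has two parameters:

1. $\lambda$ which corresponds to $1 - P$ of the ordinary geometric distribution,

2. $\alpha$ which is responsible for the inflation of $\Pr(X = 0)$.

$$P_0 = 1 - \alpha\lambda, \qquad P_i = \alpha\,(1-\lambda)\,\lambda^i; \quad i = 1, 2, 3, \ldots; \qquad 0 < \lambda < 1,\ 0 < \alpha < 1$$
$$S_0 = 1, \qquad S_i = \alpha\,\lambda^i; \quad i = 1, 2, 3, \ldots$$
$$h_0 = 1 - \alpha\lambda, \qquad h_i = 1 - \lambda; \quad i = 1, 2, 3, \ldots \ \Rightarrow \text{DHR}$$
$$L_0 = \frac{\alpha\lambda}{1-\lambda}, \qquad L_i = \frac{\lambda}{1-\lambda}; \quad i = 1, 2, 3, \ldots \ \Rightarrow \text{IMRL}$$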

Hypergeometric distribution
The hypergeometric PMF gives the probability of having $i$ red balls in a sample of size $n$, drawn
without replacement from a population of size $N$ containing $M$ red balls.
$$P_i = \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}; \quad \max(0, M+n-N) \le i \le \min(n, M); \quad n, N, M \in \mathbb{N}^+;\ n < N,\ M < N$$
For $M = 0$ or $M = N$ we would have a degenerate distribution with $P_0 = 1$ or $P_n = 1$,
respectively. We would also have a degenerate distribution for $n = N$ with $P_M = 1$.
$$q_i = \frac{(M-i)(n-i)}{(i+1)(N-M-n+i+1)}$$
$$\Delta\eta_i > 0 \ \Rightarrow \text{IHR}$$
$L_i$: no closed form $\Rightarrow$ DMRL



Hypergeometric distribution, positive


The positive hypergeometric distribution results from the ordinary hypergeometric distribution
by truncation on the left–hand side at $i = 0$. For the truncation to be possible we must have
$n \le N - M$.
$$P_i = \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n} - \binom{N-M}{n}}; \quad 1 \le i \le \min(n, M); \quad n, N, M \in \mathbb{N}^+;\ n < N,\ M < N,\ n \le N - M$$
$q_i$, $\Delta\eta_i$ are as with the ordinary hypergeometric distribution. We have IHR and DMRL. For $i =
1, 2, \ldots, \min(n, M)$ the hazard rates of the ordinary and the positive hypergeometric distributions
are identical.

Logarithmic distribution
This distribution, also known as logarithmic series distribution, is derived from the MACLAURIN
series expansion
$$-\ln(1-P) = P + \frac{P^2}{2} + \frac{P^3}{3} + \ldots$$
It is the limit as $m \to 0$ of the zero–truncated negative binomial distribution.
$$P_i = \frac{a\,P^i}{i}; \quad i = 1, 2, 3, \ldots; \quad 0 < P < 1,\ a = -\frac{1}{\ln(1-P)}$$
$$E(X) = a\,\frac{P}{1-P}$$
$$q_i = \frac{i}{i+1}\,P$$
$$\Delta\eta_i = -\frac{P}{(i+1)(i+2)} < 0 \ \Rightarrow \text{DHR}$$
$L_i$: no closed form $\Rightarrow$ IMRL

Logarithmic distribution, right–truncated


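This truncated logarithmic distribution results from omitting all realizations of the ordinary
logarithmic distribution greater than $r$, $r \ge 2$. For $r = 1$ we would have a degenerate distribution
with $P_1 = 1$ and $P_i = 0 \ \forall\, i > 1$.
$$P_i = \frac{P^i}{i} \left[\sum_{j=1}^{r} \frac{P^j}{j}\right]^{-1}; \quad i = 1, 2, \ldots, r; \quad 0 < P < 1$$
$$h_i \ \Rightarrow \begin{cases} \text{IHR} & \text{for } P \text{ large and } r \text{ small} \\ \text{DIHR} & \text{otherwise} \end{cases}$$
$$L_i \ \Rightarrow \begin{cases} \text{DMRL} & \text{for } P \text{ large and } r \text{ small} \\ \text{IDMRL} & \text{otherwise} \end{cases}$$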

Matching distribution
There is a population of size $N \in \mathbb{N}^+$ and the entities of this population are numbered $1, 2, \ldots, N$.
In the classical matching model the entities are arranged in a random order. Let $X$ be the number
of entities for which their position in the random order is the same as the number assigned to
them.
$$P_i = \frac{1}{i!} \sum_{j=0}^{N-i} \frac{(-1)^j}{j!}; \quad i = 0, 1, \ldots, N; \quad N \in \mathbb{N}^+,$$
where $P_N = 1/N!$ and $P_{N-1} = 0$. $h_i$ and $L_i$ are not monotone.
Negative binomial distribution
We look at successive random trials, each having a constant probability $P$ of success (= drawing
of a red ball). The number of extra trials to perform in order to observe a given number $m$ of
successes has a negative binomial distribution. For integer $m$ it is called PASCAL distribution
and for $m = 1$ we have the geometric distribution.
$$P_i = \binom{m+i-1}{i} P^m (1-P)^i; \quad i = 0, 1, 2, \ldots; \quad 0 < P < 1;\ m = 1, 2, \ldots$$
More generally, $m$ may be any positive real number. Then we write:
$$P_i = \frac{\Gamma(m+i)}{\Gamma(i+1)\,\Gamma(m)}\, P^m (1-P)^i; \quad i = 0, 1, 2, \ldots; \quad 0 < P < 1;\ m > 0.$$
As $\Gamma(z+1) = z!$ for integer $z$ the latter version of the negative binomial distribution is more
general than the first one.
$$E(X) = m\,\frac{1-P}{P}$$
$$q_i = \frac{m+i}{i+1}\,(1-P)$$
$$\Delta\eta_i = \frac{m-1}{(i+1)(i+2)}\,(1-P) \ \begin{cases} < 0 & \text{for } 0 < m < 1 \ \Rightarrow \text{DHR with IMRL} \\ = 0 & \text{for } m = 1 \ \Rightarrow h_i = P \text{ with } L_i = (1-P)/P \\ > 0 & \text{for } m > 1 \ \Rightarrow \text{IHR with DMRL} \end{cases}$$
Negative hypergeometric distribution


The model of the negative binomial distribution above is modified in the following sense:
1. The drawing of balls is without replacement from a finite population of size $N$ having
$M$, $1 \le M < N$, red balls.
2. The variate is the total number of trials until the occurrence of the $m$–th success (= red
ball).
$$P_i = \frac{\binom{i-1}{m-1}\binom{N-i}{M-m}}{\binom{N}{M}}; \quad m \le i \le N - M + m; \quad m, N, M \in \mathbb{N}^+;\ M < N,\ m \le M$$
$$E(X) = m\,\frac{N+1}{M+1}$$
$$q_i = \frac{i\,(N-M+m-i)}{(i-m+1)(N-i)}$$
$$\Delta\eta_i > 0 \ \Rightarrow \text{IHR}$$
$L_i$: no closed form $\Rightarrow$ DMRL

Occupancy distribution
We have $N$ distinct objects, e.g., balls, and $m$ distinct boxes or cells. Now consider the placement
of these objects into the $m$ boxes. The number of ways to do this is clearly $m^N$. Each of these ways
is considered equiprobable. We are interested in the distribution of $X$, the number of empty boxes
in a placement.
$$P_i = \binom{m}{i} \sum_{j=0}^{m-i} (-1)^j \binom{m-i}{j} \left(1 - \frac{i+j}{m}\right)^{N}; \quad i = 0, 1, \ldots, m-1; \quad m, N \in \mathbb{N}^+$$

We have:

• IHR as $\Delta\eta_i > 0$ and

• DMRL.

POISSON distribution
This distribution is for the number of occurrences of an event in an interval of given length when
the intensity of event–occurrence in this interval is $\lambda$.
$$P_i = \frac{\lambda^i}{i!}\,\exp(-\lambda); \quad i = 0, 1, 2, \ldots; \quad \lambda > 0$$
$$E(X) = \lambda$$
$$q_i = \frac{\lambda}{i+1}$$
$$\Delta\eta_i = \frac{\lambda}{(i+1)(i+2)} > 0 \ \Rightarrow \text{IHR}$$
$L_i$: no closed form $\Rightarrow$ DMRL

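POISSON distribution, positive

This distribution results from the ordinary POISSON distribution by truncation of the realization
$i = 0$.
$$P_i = \frac{\lambda^i}{i!\,(e^{\lambda} - 1)}; \quad i = 1, 2, 3, \ldots; \quad \lambda > 0$$
$$E(X) = \frac{\lambda}{1 - e^{-\lambda}}$$
$$q_i = \frac{\lambda}{i+1}$$
$$\Delta\eta_i = \frac{\lambda}{(i+1)(i+2)} > 0 \ \Rightarrow \text{IHR}$$
$L_i$: no closed form $\Rightarrow$ DMRL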



PÓLYA distribution
This distribution gives the probabilities of the number of successes (= red balls) in a rather general
urn model. An urn contains $N$ balls, $M$ being red. A sample of size $n$ is to be drawn from this
urn. Each ball drawn is returned to the urn together with $K$ balls of the color just
drawn; $K$ may be any integer $\ldots, -2, -1, 0, 1, 2, \ldots$. $K < 0$ means that a number $|K|$ of balls
of the color just drawn is eliminated from the urn, but the ball just drawn is returned anyway.
Three special values of $K$ lead to special distributions:

• $K = -1$ gives the hypergeometric distribution as effectively no ball is replaced.

• $K = 0$ gives the binomial distribution with $P = M/N$.

• $K = N/2$, when $M = N/2$, gives the discrete uniform distribution.

$$P_i = \binom{n}{i} \frac{\prod_{j=1}^{i} [M + (j-1)K] \ \prod_{j=1}^{n-i} [N - M + (j-1)K]}{\prod_{j=1}^{n} [N + (j-1)K]};$$
$$i = 0, 1, \ldots, n \ \text{ for } K \ge 0; \qquad \max\!\left(0,\ n + \frac{N-M}{K}\right) \le i \le \min\!\left(n,\ -\frac{M}{K}\right) \ \text{ for } K < 0$$
$$n, N, M \in \mathbb{N}^+;\ M < N;\ K = \ldots, -2, -1, 0, 1, 2, \ldots;\ N + K(n-1) > 0$$
$$q_i = \frac{n-i}{i+1}\ \frac{M + iK}{N - M + (n-i-1)K}$$
$$\Delta\eta_i > 0 \ \Rightarrow \text{IHR}$$
$L_i$: no closed form $\Rightarrow$ DMRL

Runs distribution
There are $N_0$ balls labeled 0 and $N_1$ balls labeled 1, arranged in random order. Let $r_{0j}$ be the
number of runs of $j$ consecutive 0's and $r_{1j}$ that of $j$ consecutive 1's. Then we have
$$\sum_j j\,r_{0j} = N_0 \quad \text{and} \quad \sum_j j\,r_{1j} = N_1.$$
The distribution of the total number of runs of 1's, $X = R_1$, is
$$\Pr(X = i) = P_i = \frac{\binom{N_0}{i-1}\binom{N_1}{N_1-i}}{\binom{N_0+N_1}{N_1-1}}; \quad i = 1, 2, \ldots, \min(N_0+1, N_1); \quad N_0, N_1 \in \mathbb{N}^+$$
$i = 1$ means that all the 1's are standing together with no 0's in between.
$$E(X) = \frac{N_1\,(N_0+1)}{N_0+N_1}$$
$$q_i = \frac{(N_0-i+1)(N_1-i)}{i\,(i+1)}$$
$$\Delta\eta_i > 0 \ \Rightarrow \text{IHR}$$
$L_i$: no closed form $\Rightarrow$ DMRL



SALVIA–BOLLINGER's DHR distribution
This and the following distribution of SALVIA/BOLLINGER (1982) have their origin in looking for a rather simple form for the hazard rate, namely
some modification of a harmonic series.
$$h_i = \frac{c}{i+1}; \quad i = 0, 1, 2, \ldots; \quad 0 < c < 1$$
$$P_0 = h_0 = c$$
$$P_i = \frac{c}{i+1} \prod_{j=0}^{i-1} \frac{j+1-c}{j+1}; \quad i = 1, 2, \ldots$$
$$S_0 = 1$$
$$S_i = \prod_{j=0}^{i-1} \frac{j+1-c}{j+1}; \quad i = 1, 2, \ldots$$
$$q_i = \frac{i+1-c}{i+2}$$
$$\Delta\eta_i = -\frac{c+1}{(i+2)(i+3)} < 0 \ \Rightarrow \text{DHR with } \lim_{i\to\infty} h_i = 0$$
$L_i$: no closed form $\Rightarrow$ IMRL
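Because this distribution is specified through its hazard rate, $P_i$ and $S_i$ follow from the products above. A minimal Python sketch of that construction, assuming the hypothetical helper name `pmf_from_hazard` and the illustrative value $c = 0.4$:

```python
def pmf_from_hazard(h):
    """Build P_i and S_i from hazard values h_0, h_1, ...

    Uses S_0 = 1, S_{i+1} = S_i * (1 - h_i) and P_i = h_i * S_i.
    (Helper name chosen for illustration only.)
    """
    P, S, s = [], [], 1.0
    for hi in h:
        S.append(s)
        P.append(hi * s)
        s *= 1.0 - hi
    return P, S

c = 0.4                                   # illustrative parameter value
h = [c / (i + 1) for i in range(6)]       # h_i = c / (i + 1)
P, S = pmf_from_hazard(h)
q = [P[i + 1] / P[i] for i in range(5)]   # compare with (i + 1 - c) / (i + 2)
```

The ratios in `q` can be compared with the closed form $q_i = (i+1-c)/(i+2)$ as a plausibility check.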

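SALVIA–BOLLINGER's IHR distribution

$$h_i = 1 - \frac{c}{i+1}; \quad i = 0, 1, 2, \ldots; \quad 0 < c < 1$$
$$P_i = \frac{(i+1-c)\,c^i}{(i+1)!}$$
$$S_i = \frac{c^i}{i!}$$
$$q_i = \frac{c\,(i+2-c)}{(i+2)(i+1-c)}$$
$$\Delta\eta_i > 0 \ \Rightarrow \text{IHR with } \lim_{i\to\infty} h_i = 1$$
$L_i$: no closed form $\Rightarrow$ DMRL with $\lim_{i\to\infty} L_i = 0$, since $L_i = \sum_{j>i} S_j / S_i = \sum_{k \ge 1} c^k\, i!/(i+k)!$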

SALVIA–BOLLINGER's generalized DHR distribution

This and the following generalization of SALVIA–BOLLINGER's distributions by PADGETT/SPURRIER
(1985) have an extra parameter $\alpha$ causing a faster or a slower decline of the hazard
rate. With $\alpha = 1$ we have the original distribution of SALVIA and BOLLINGER and with
$\alpha = 0$ we have the geometric distribution with constant hazard rate.
$$h_i = \frac{c}{\alpha i + 1}; \quad i = 0, 1, 2, \ldots; \quad 0 < c < 1;\ \alpha \ge 0$$
$$P_0 = h_0 = c$$
$$P_i = \frac{c}{\alpha i + 1} \prod_{j=0}^{i-1} \frac{\alpha j + 1 - c}{\alpha j + 1}; \quad i = 1, 2, 3, \ldots$$
$$S_0 = 1$$
$$S_i = \prod_{j=0}^{i-1} \frac{\alpha j + 1 - c}{\alpha j + 1}; \quad i = 1, 2, 3, \ldots$$
$$q_i = \frac{\alpha i + 1 - c}{\alpha (i+1) + 1}$$
$$\Delta\eta_i = -\frac{\alpha c + \alpha^2}{[\alpha (i+1) + 1]\,[\alpha (i+2) + 1]} < 0 \ \text{ for } \alpha > 0 \ \Rightarrow \text{DHR with } \lim_{i\to\infty} h_i = 0$$
$L_i$: no closed form $\Rightarrow$ IMRL

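SALVIA–BOLLINGER's generalized IHR distribution

For $\alpha = 1$ we have the original IHR distribution of SALVIA/BOLLINGER and for $\alpha = 0$ we have
a geometric distribution.
$$h_i = 1 - \frac{c}{\alpha i + 1}; \quad i = 0, 1, 2, \ldots; \quad 0 < c < 1;\ \alpha \ge 0$$
$$P_0 = h_0 = 1 - c$$
$$P_i = \left(1 - \frac{c}{\alpha i + 1}\right) \prod_{j=0}^{i-1} \frac{c}{\alpha j + 1}; \quad i = 1, 2, 3, \ldots$$
$$S_0 = 1$$
$$S_i = \prod_{j=0}^{i-1} \frac{c}{\alpha j + 1}; \quad i = 1, 2, 3, \ldots$$
$$q_i = \frac{c\,[\alpha (i+1) + 1 - c]}{[\alpha (i+1) + 1]\,(\alpha i + 1 - c)}$$
$$\Delta\eta_i > 0 \ \Rightarrow \text{IHR with } \lim_{i\to\infty} h_i = 1 \ \text{ for } \alpha > 0$$
$L_i$: no closed form $\Rightarrow$ DMRL with $\lim_{i\to\infty} L_i = 0$ for $\alpha > 0$, since $L_i = \sum_{j>i} S_j / S_i$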

Triangular distribution, right–angled and negatively skew

The triangular and the uniform distributions are simple forms of a discrete distribution with linear
PMF.
$$P_i = \frac{2i}{m(m+1)}; \quad i = 1, 2, \ldots, m; \quad m \ge 2$$
$$S_i = 1 - \frac{i\,(i-1)}{m(m+1)}$$
$$h_i = \frac{2i}{m(m+1) - i\,(i-1)}$$
$$q_i = \frac{i+1}{i}$$
$$\Delta\eta_i = \frac{1}{i\,(i+1)} > 0 \ \Rightarrow \text{IHR}$$
$$L_i = \frac{\sum_{j>i} S_j}{S_i} = \frac{(m-i)(2m+i+1)}{3\,(m+i)} \ \Rightarrow \text{DMRL}$$

Triangular distribution, right–angled and positively skew

$$P_i = \frac{2\,(m-i+1)}{m(m+1)}; \quad i = 1, 2, \ldots, m; \quad m \ge 2$$
$$S_i = \frac{(i-m-2)(i-m-1)}{m(m+1)}$$
$$h_i = \frac{2}{m-i+2}$$
$$q_i = \frac{m-i}{m-i+1}$$
$$\Delta\eta_i = \frac{1}{(m-i)(m-i+1)} > 0 \ \Rightarrow \text{IHR}$$
$$L_i = \frac{m-i}{3} \ \Rightarrow \text{DMRL}$$

Triangular distribution, symmetric


We have to distinguish two cases depending on m, the length of the support of the distribution:

• m = 2 k, k = 1, 2, ...
• m = 2 k + 1; k = 1, 2, ...

We require m > 3. For m = 2 we will have the uniform distribution over two points i = 1 and
i = 2.
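Case 1: $m = 2k$, $k = 1, 2, \ldots$
$$P_i = \begin{cases} \dfrac{i}{k(k+1)} & \text{for } i = 1, 2, \ldots, k \\[2mm] \dfrac{m-i+1}{k(k+1)} & \text{for } i = k+1, k+2, \ldots, 2k\ (= m) \end{cases}$$
$$S_i = \begin{cases} 1 + \dfrac{i\,(1-i)}{2k(k+1)} & \text{for } i = 1, 2, \ldots, k \\[2mm] \dfrac{(i-2k-1)\,[i-2(k+1)]}{2k(k+1)} & \text{for } i = k+1, k+2, \ldots, 2k\ (= m) \end{cases}$$
$$h_i = \begin{cases} \dfrac{2i}{i\,(1-i) + 2k(k+1)} & \text{for } i = 1, 2, \ldots, k \\[2mm] \dfrac{2}{2-i+2k} & \text{for } i = k+1, k+2, \ldots, 2k\ (= m) \end{cases}$$
$$q_i = \begin{cases} \dfrac{i+1}{i} & \text{for } i = 1, 2, \ldots, k-1 \\[2mm] 1 & \text{for } i = k \\[2mm] \dfrac{2k-i}{2k-i+1} & \text{for } i = k+1, k+2, \ldots, 2k-1 \end{cases}$$
$$\Delta\eta_i > 0 \ \Rightarrow \text{IHR}$$
$L_i$: complicated form $\Rightarrow$ DMRL
Case 2: $m = 2k+1$, $k = 1, 2, \ldots$
$$P_i = \begin{cases} \dfrac{i}{(k+1)^2} & \text{for } i = 1, 2, \ldots, k+1 \\[2mm] \dfrac{m-i+1}{(k+1)^2} & \text{for } i = k+2, k+3, \ldots, 2k+1\ (= m) \end{cases}$$
$$S_i = \begin{cases} 1 + \dfrac{i\,(1-i)}{2\,(k+1)^2} & \text{for } i = 1, 2, \ldots, k+1 \\[2mm] \dfrac{(i-2k-3)\,[i-2(k+1)]}{2\,(k+1)^2} & \text{for } i = k+2, k+3, \ldots, 2k+1\ (= m) \end{cases}$$
$$h_i = \begin{cases} \dfrac{2i}{i\,(1-i) + 2\,(k+1)^2} & \text{for } i = 1, 2, \ldots, k+1 \\[2mm] \dfrac{2}{3-i+2k} & \text{for } i = k+2, k+3, \ldots, 2k+1\ (= m) \end{cases}$$
$$q_i = \begin{cases} \dfrac{i+1}{i} & \text{for } i = 1, 2, \ldots, k \\[2mm] \dfrac{2k-i+1}{2k-i+2} & \text{for } i = k+1, k+2, \ldots, 2k \end{cases}$$
$$\Delta\eta_i > 0 \ \Rightarrow \text{IHR}$$
$L_i$: complicated form $\Rightarrow$ DMRL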

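Uniform distribution
$$P_i = \frac{1}{m}; \quad i = 1, 2, \ldots, m; \quad m \in \mathbb{N}^+,\ m \ge 2$$
$$S_i = 1 - \frac{i-1}{m}$$
$$h_i = \frac{1}{m-i+1}$$
$$q_i = 1$$
$$\Delta\eta_i = 0 \ \Rightarrow \text{IHR}$$
$$L_i = \frac{m-i}{2} \ \Rightarrow \text{DMRL}$$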

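WEIBULL distribution of type I

This and the following two discrete WEIBULL distributions mimic the continuous WEIBULL
distribution with respect to the hazard rate behavior which may be IHR, DHR or constant depending
on the value of a certain parameter $\beta$. For $\beta = 1$ this discrete WEIBULL distribution is equal to
the geometric distribution with $P = 1 - q$.
$$P_i = q^{\,i^{\beta}} - q^{\,(i+1)^{\beta}}; \quad i = 0, 1, 2, \ldots; \quad 0 < q < 1,\ \beta > 0$$
$$S_i = q^{\,i^{\beta}}$$
$$h_i = 1 - q^{\,(i+1)^{\beta} - i^{\beta}} \ \Rightarrow \begin{cases} \text{DHR} & \text{for } 0 < \beta < 1 \\ \text{constant (IHR and DHR)} & \text{for } \beta = 1 \text{ with } h_i = 1 - q \\ \text{IHR} & \text{for } \beta > 1 \end{cases}$$
$$L_i: \text{ no closed form} \ \Rightarrow \begin{cases} \text{IDMRL} & \text{for } 0 < \beta < 1 \\ \text{constant} & \text{for } \beta = 1 \text{ with } L_i = \dfrac{q}{1-q} \\ \text{DIMRL} & \text{for } \beta > 1 \end{cases}$$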
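The dependence of the hazard rate on $\beta$ can be checked numerically. The following Python sketch evaluates $h_i = 1 - q^{(i+1)^\beta - i^\beta}$ for three values of $\beta$; the function name `weibull1_hazard` and the parameter values are assumptions chosen for illustration.

```python
def weibull1_hazard(i, q, beta):
    """Hazard rate of the discrete Weibull distribution of type I,
    h_i = 1 - q ** ((i + 1) ** beta - i ** beta), for i = 0, 1, 2, ..."""
    return 1.0 - q ** ((i + 1) ** beta - i ** beta)

q = 0.8                                                    # illustrative value
h_dhr = [weibull1_hazard(i, q, 0.5) for i in range(10)]    # beta < 1
h_const = [weibull1_hazard(i, q, 1.0) for i in range(10)]  # beta = 1
h_ihr = [weibull1_hazard(i, q, 2.0) for i in range(10)]    # beta > 1
```

With these inputs the first list should decrease, the second should stay at $1 - q$, and the third should increase.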

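WEIBULL distribution of type II

For $\beta = 1$ this discrete WEIBULL distribution is equal to the geometric distribution with $P = \alpha$.
$$P_1 = \alpha$$
$$P_i = \alpha\, i^{\beta-1} \prod_{j=1}^{i-1} \left(1 - \alpha\, j^{\beta-1}\right); \quad i = 2, 3, \ldots, m; \quad 0 < \alpha < 1,\ \beta > 0$$
$$m = \begin{cases} \infty & \text{for } 0 < \beta \le 1 \\ \operatorname{int}\!\left(\alpha^{-1/(\beta-1)}\right) & \text{for } \beta > 1 \end{cases}$$
$$S_1 = 1$$
$$S_i = \prod_{j=1}^{i-1} \left(1 - \alpha\, j^{\beta-1}\right); \quad i = 2, 3, \ldots, m$$
$$h_i = \alpha\, i^{\beta-1} \ \Rightarrow \begin{cases} \text{DHR} & \text{for } 0 < \beta < 1 \\ \text{constant (IHR and DHR)} & \text{for } \beta = 1 \text{ with } h_i = \alpha \\ \text{IHR} & \text{for } \beta > 1 \end{cases}; \quad i = 1, 2, \ldots, m$$
$$L_i: \text{ no closed form} \ \Rightarrow \begin{cases} \text{IMRL} & \text{for } 0 < \beta < 1 \\ \text{constant} & \text{for } \beta = 1 \text{ with } L_i = \dfrac{1-\alpha}{\alpha} \\ \text{DMRL} & \text{for } \beta > 1 \end{cases}; \quad i = 1, 2, \ldots, m$$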

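WEIBULL distribution of type III

For $\beta = 0$ this discrete WEIBULL distribution is equal to the geometric distribution with $P =
1 - \exp(-d)$.
$$P_0 = 1 - \exp(-d)$$
$$P_i = \left[1 - \exp\!\left\{-d\,(i+1)^{\beta}\right\}\right] \exp\!\left(-d \sum_{j=1}^{i} j^{\beta}\right); \quad i = 1, 2, \ldots; \quad \beta \in \mathbb{R},\ d > 0$$
$$S_0 = 1$$
$$S_i = \exp\!\left(-d \sum_{j=1}^{i} j^{\beta}\right); \quad i = 1, 2, \ldots$$
$$h_i = 1 - \exp\!\left\{-d\,(i+1)^{\beta}\right\} \ \Rightarrow \begin{cases} \text{DHR} & \text{for } \beta < 0 \\ \text{constant (IHR and DHR)} & \text{for } \beta = 0 \text{ with } h_i = 1 - \exp(-d) \\ \text{IHR} & \text{for } \beta > 0 \end{cases}; \quad i = 0, 1, \ldots$$
$$L_i: \text{ no closed form} \ \Rightarrow \begin{cases} \text{IMRL} & \text{for } \beta < 0 \\ \text{constant} & \text{for } \beta = 0 \text{ with } L_i = \dfrac{1}{\exp(d) - 1} \\ \text{DMRL} & \text{for } \beta > 0 \end{cases}; \quad i = 0, 1, \ldots$$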

YULE distribution
The YULE distribution plays a role in biostatistics where it is the distribution for the number of
species of biological organisms per family, see also XEKALAKI (1983a, b).
$$P_i = \rho\, B(i, \rho+1) = \rho\, \frac{\Gamma(i)\,\Gamma(\rho+1)}{\Gamma(i+\rho+1)}; \quad i = 1, 2, \ldots; \quad \rho > 0$$
$$S_1 = 1$$
$$S_i = (i-1)\, B(i-1, \rho+1); \quad i = 2, 3, \ldots$$
$$E(X) = \frac{\rho}{\rho-1} \ \text{ for } \rho > 1$$
$$h_i = \frac{\rho}{i+\rho}$$
$$q_i = \frac{i}{1+\rho+i}$$
$$\Delta\eta_i = \frac{-\rho-1}{(1+\rho+i)(2+\rho+i)} < 0 \ \Rightarrow \text{DHR}$$
$$L_i: \text{ no closed form} \ \Rightarrow \begin{cases} \text{does not exist} & \text{for } \rho \le 1 \\ \text{IMRL (linear)} & \text{for } \rho > 1 \end{cases}$$

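Zeta distribution of ZIPF

This distribution is also called discrete PARETO distribution. The name zeta is justified by the
appearance of RIEMANN's zeta function in the formula of the PMF. This distribution plays a role
in linguistics giving the probability of the number of words appearing $i$ times in long sequences
of text.
$$P_i = \frac{i^{-(\theta+1)}}{\zeta(\theta+1)}; \quad i = 1, 2, \ldots; \quad \theta > 0$$
$$\zeta(\theta+1) = \sum_{i=1}^{\infty} i^{-(\theta+1)} \quad \text{(RIEMANN's zeta function)}$$
$$E(X) = \begin{cases} \dfrac{\zeta(\theta)}{\zeta(\theta+1)} & \text{for } \theta > 1 \\[2mm] \infty & \text{else} \end{cases}$$
$h_i$: no closed form
$$q_i = \left(\frac{i}{i+1}\right)^{\theta+1}$$
$$\Delta\eta_i = \left(\frac{i}{i+1}\right)^{\theta+1} - \left(\frac{i+1}{i+2}\right)^{\theta+1} < 0 \ \Rightarrow \text{DHR}$$
$$L_i: \text{ no closed form} \ \Rightarrow \begin{cases} \text{does not exist} & \text{for } \theta \le 1 \\ \text{IMRL} & \text{for } \theta > 1 \end{cases}$$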

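Zeta distribution of HAIGHT

The name of the distribution has its reason in the fact that the first two raw moments can be
expressed in terms of RIEMANN's zeta function.
$$P_i = \frac{1}{(2i-1)^{\alpha}} - \frac{1}{(2i+1)^{\alpha}}; \quad i = 1, 2, \ldots; \quad \alpha > 0$$
$$E(X) = \begin{cases} \text{does not exist} & \text{for } \alpha \le 1 \\ \left(1 - 2^{-\alpha}\right) \zeta(\alpha) & \text{for } \alpha > 1 \end{cases}$$
$h_i$: no closed form
$$q_i = \frac{[2i+1]^{-\alpha} - [2i+3]^{-\alpha}}{[2i-1]^{-\alpha} - [2i+1]^{-\alpha}}$$
$$\Delta\eta_i = q_i - q_{i+1} = \frac{[2i+1]^{-\alpha} - [2i+3]^{-\alpha}}{[2i-1]^{-\alpha} - [2i+1]^{-\alpha}} - \frac{[2i+3]^{-\alpha} - [2i+5]^{-\alpha}}{[2i+1]^{-\alpha} - [2i+3]^{-\alpha}} < 0 \ \Rightarrow \text{DHR}$$
$$L_i: \text{ no closed form} \ \Rightarrow \begin{cases} \text{does not exist} & \text{for } \alpha \le 1 \\ \text{IMRL} & \text{for } \alpha > 1 \end{cases}$$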

We conclude this section on discrete distributions by showing a typical output of the program
DiscDist. After one of the 31 distributions implemented in that program has been chosen (here the
discrete WEIBULL distribution of type I), the program shows the PMF formula of this distribution
as a reminder. Then the user is asked to input a value for each of the pertaining
parameters. The program checks these values for admissibility. The chosen parameter values are
displayed together with the graphs of $P_i$, $S_i$, $h_i$ and $L_i$.

Figure 3/3: PMF-formula display of the WEIBULL type I distribution by the program DiscDist

Figure 3/4: Display of the functions of a WEIBULL type I distribution by the program DiscDist
Part II

Inferential Aspects
4 Sampling Lifetime Data¹
The estimation approach as well as the testing approach for the hazard rate and for any other
lifetime representative have to take account of the type of the sampled data, i.e., of the way the data
set has been generated and of the form in which the data are handed over to the statistician, either
as individual observations (non–grouped) or as frequency counts per interval of time (grouped).
We will revert to the latter aspect at the end of this chapter when commenting on Figures 4/1 and
4/2.
The true problem of sampling lifetime data is the fact that the characteristic to be measured is time
itself. We might have long–lasting life–testing experiments unless we shorten them in one way
or the other, e.g. by acceleration, see Sect. 1.1.2.4, or by censoring. Thus, in a lifetime data set
we will find observations of complete lifetimes from birth to death (called failure data hereafter)
and incomplete lifetimes ending before death or failure (called censored data hereafter). When
the data set consists of the entire time spans ranging from birth or start to death or failure of each
sampled unit, the data set is said to be complete or uncensored. In practice, uncensored lifetime
data sets will be the rare exception. Clinical studies and biological trials or technical life testing
will seldom lead to complete lifetimes of all sampled units for several reasons. A sample is called
incomplete or censored when it consists of time spans covering the whole period of the unit's
existence as well as of time spans with missing early and/or late lifetime. The latter type of time spans
are called censored times. In most cases the late lifetime is missing because the observation of an
individual is not terminated by its death or its failure but by some other event. Thus, in clinical
trials there may be a loss to follow–up of a person or a death due to a risk other than that
under study, and in technical life testing we meet planned withdrawal of units still alive or stopping of the
experiment before all units have failed.
We may distinguish between random censoring and non–random censoring. Most of the results
presented in Part II are valid for random censoring only, but in practice the findings are also used
when censoring is deterministic and planned. An assumption concerning random as well as non–
random censoring is that censoring is non–informative. This means that the failure mechanism
and the censoring mechanism are assumed to act independently. Stated otherwise, for each unit
the censoring must not be predictive for the future and unobserved failure. Specifically, it must
be true for each unit at each lifetime $x$ that
$$\Pr\big[X \in [x, x+dx) \mid X \ge x\big] = \Pr\big[X \in [x, x+dx) \mid X \ge x,\ Z \ge x\big], \quad dx \text{ small}, \qquad (4.1)$$
$Z$ being the censoring variate. (4.1) means that the probability of failing shortly after $x$, given
survival up to and including $x$, is unchanged by the added condition that censoring has not
occurred up to and including time $x$. Unfortunately, the truth of (4.1) cannot be tested from the
censored sample alone. In practice, a judgement about the truth of (4.1) should be based on the
best available understanding of the nature of the censoring applied.
A very simple random censoring process that is often realistic is one in which each unit is assumed
to be endowed with two random variables, a lifetime X and a censoring time Z, X and Z being
independent and continuous variates, having CDFs F (x) and G(z), respectively. For example, Z
may be the time associated to the happening of a competing risk. With n being the sample size
let $(X_i, Z_i)$; $i = 1, 2, \ldots, n$; be independent and define
$$Y_i = \min(X_i, Z_i), \qquad (4.2a)$$
and the indicator
$$\delta_i = I(X_i \le Z_i) = \begin{cases} 1, & \text{if } X_i \le Z_i \text{ (uncensored observation)}, \\ 0, & \text{if } X_i > Z_i \text{ (censored observation)}. \end{cases} \qquad (4.2b)$$
¹ Suggested reading for this chapter: RINNE (2009, Chapter 8).

The data from observing $n$ units now consist of the pairs $(y_i, \delta_i)$, i.e., it is known which
observation is a failure time and which is a censored time. The joint probability of $(y_i, \delta_i)$ is obtained
using $f(x)$ and $g(z)$, the PDFs of $X$ and $Z$, respectively. We have
$$\Pr(Y_i = y, \delta_i = 0) = \Pr(X_i > y, Z_i = y) = \big[1 - F(y)\big]\, g(y) \qquad (4.2c)$$
and
$$\Pr(Y_i = y, \delta_i = 1) = \Pr(Z_i > y, X_i = y) = \big[1 - G(y)\big]\, f(y). \qquad (4.2d)$$

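These probabilities can be combined into the single expression
$$\Pr(Y_i = y, \delta_i) = \big\{f(y)\,[1 - G(y)]\big\}^{\delta_i}\, \big\{g(y)\,[1 - F(y)]\big\}^{1-\delta_i}. \qquad (4.2e)$$
From (4.2e) the joint probability of the $n$ pairs $(y_i, \delta_i)$ results as
$$\prod_{i=1}^{n} \big\{f(y_i)\,[1 - G(y_i)]\big\}^{\delta_i}\, \big\{g(y_i)\,[1 - F(y_i)]\big\}^{1-\delta_i} = \left(\prod_{i=1}^{n} \big[1 - G(y_i)\big]^{\delta_i}\, g(y_i)^{1-\delta_i}\right) \left(\prod_{i=1}^{n} f(y_i)^{\delta_i}\, \big[1 - F(y_i)\big]^{1-\delta_i}\right). \qquad (4.2f)$$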

If $G(y)$ and $g(y)$ do not involve any parameters of $F(y)$ and $f(y)$, then the first factor on the
right–hand side of (4.2f) can be neglected and the resulting expression taken to be proportional to
the likelihood function of the data:
$$L \propto \prod_{i=1}^{n} f(y_i)^{\delta_i}\, \big[1 - F(y_i)\big]^{1-\delta_i}, \qquad (4.2g)$$

which constitutes the basis for maximum likelihood estimation, see Sections 7.1 and 7.2.
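To make (4.2a), (4.2b) and (4.2g) concrete, the following Python sketch simulates the random censoring model with exponentially distributed $X$ and $Z$ and maximizes (4.2g). The exponential model, the function names and the parameter values are assumptions made purely for illustration. For $f(y) = \lambda e^{-\lambda y}$ the likelihood (4.2g) becomes $\lambda^{\sum \delta_i} \exp(-\lambda \sum y_i)$, which is maximized by $\hat\lambda = \sum \delta_i / \sum y_i$.

```python
import random

def simulate_censored(n, rate_x, rate_z, seed=0):
    """Simulate pairs (y_i, delta_i) with y_i = min(x_i, z_i) and
    delta_i = 1 if x_i <= z_i, else 0 (exponential X and Z assumed)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.expovariate(rate_x)   # lifetime
        z = rng.expovariate(rate_z)   # censoring time
        data.append((min(x, z), 1 if x <= z else 0))
    return data

def exponential_mle(data):
    """Maximizer of (4.2g) for an exponential lifetime:
    number of failures divided by total observed time."""
    failures = sum(delta for _, delta in data)
    exposure = sum(y for y, _ in data)
    return failures / exposure

data = simulate_censored(20000, rate_x=0.5, rate_z=0.2)
rate_hat = exponential_mle(data)
```

Treating the censored times as if they were failures would replace the numerator by $n$ and overstate the rate, which is why the indicator $\delta_i$ has to be recorded.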
Random censoring prevails in clinical studies, e.g., when a person who is a member
of a special cancer survival study dies of a stroke or has a fatal traffic accident. Non–random
censoring prevails in life testing of technical units where the times of removing non–failed units
are scheduled at the beginning of the experiment. Non–random censoring takes different forms.²
According to what part of a lifetime is cut off and not reported by the sampling process, we
distinguish between

• censoring from above (on the right) when we do not observe the failure of a unit, i.e., the
last part of a lifetime is missing,

• censoring from below (on the left) when we do not know the 'date of birth' of a unit, i.e.,
observation starts at an unknown age of the unit and the first part of its lifetime is missing,

• censoring on both sides, which is a combination of censoring from above and below.
² A detailed description of life test plans, their motivation and their economic as well as their statistical advantages
and drawbacks is given in RINNE (2009, Chapter 8).

Censoring from above is most common with clinical studies and with technical life testing, and
that is why we will assume this type of censoring throughout Part II unless stated otherwise.
We further distinguish between:

• type–I censoring (time–dependent censoring) when testing and observing of lifetimes are
suspended when a fixed time $x_{end}$ has been reached (the maximum lifetime to be observed
is $x_{end}$; the number of failures up to $x_{end}$ is random; sometimes, depending on further
censoring prescriptions, the number of censored lifetimes up to $x_{end}$ is random, too),

• type–II censoring (failure–dependent censoring) when testing and observing are suspended
on reaching a fixed number $k$ of failures, $k < n$, $n$ being the sample size (the
maximum observable lifetime, which also is the length or the duration of the experiment,
is the $k$–th order statistic $X_{k:n}$, which is random³),

• a combination of type–I and type–II censoring, meaning that testing and observation stop
at $\min(x_{end}, X_{k:n})$.

A third criterion in classifying non–random censoring is whether we have:

• single censoring, when all units that have not failed up to and including a certain time
$x_{end}$, $X_{k:n}$ or $\min(x_{end}, X_{k:n})$ are withdrawn from the test so that all censored times
are of equal length, or

• multiple censoring (hypercensoring or progressive censoring) when the withdrawal
of units still alive is performed in several stages so that the censored times are not of
equal length, as is the case with random censoring.
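The two basic single–censoring schemes can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, lifetimes are assumed to be distinct, and each unit is returned as a pair (observed time, indicator) with indicator 1 for a failure and 0 for a censored time.

```python
def censor_type_1(lifetimes, x_end):
    """Single type-I censoring: observation stops at the fixed time x_end."""
    return [(min(x, x_end), 1 if x <= x_end else 0) for x in lifetimes]

def censor_type_2(lifetimes, k):
    """Single type-II censoring: observation stops at the k-th smallest
    lifetime (distinct lifetimes assumed, 1 <= k <= n)."""
    ordered = sorted(lifetimes)
    x_k = ordered[k - 1]              # k-th order statistic
    return [(min(x, x_k), 1 if x <= x_k else 0) for x in ordered]

lifetimes = [5, 1, 9, 3, 7]           # made-up values for illustration
type_1 = censor_type_1(lifetimes, x_end=6)
type_2 = censor_type_2(lifetimes, k=2)
```

Under type–I censoring the number of failures depends on the data, whereas under type–II censoring it is fixed at $k$ and the stopping time depends on the data.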

The following two figures show the quantities and data appearing in lifetime sampling. Fig. 4/1 is
related to non–grouped data and depicts the most general case, i.e., multiple or random censoring
with possibly tied observations. Other types of sampling with non–grouped data result from
Fig. 4/1 when assigning special values to the quantities ci and di which represent counts. Let
x1 < x2 < ... < xk , k ≤ n, be the observed distinct times of a failure. We admit three
possibilities of ties:

1) ties among censored observations,

2) ties among failure times, di ≥ 1, being the number of failures happening at xi ,

3) ties among censored lifetimes and failure times.

As time is a continuous variable ties of type 2) and 3) are theoretically impossible, but in practice
time is nearly always counted in some unit, for instance in minutes or hours, so that two or
more events may happen 'simultaneously'. In order to avoid difficulties with case 3), censored
times tied with failure times, we adopt the convention of moving such censored times in a tie
a small amount to the right so that censoring is assumed to occur a little later than failure. This
convention is sensible, since a unit observed alive at time $x_i$ certainly survives past $x_i$.
Besides $d_i$, which is attached to a certain point of time $x_i$, we have $c_i$, $c_i \ge 0$, the observed number
of censored lifetimes between failure times $x_i$ and $x_{i+1}$. Thus, $c_i$ is attached to an interval of
time, more precisely, to an interval of random length, which is right–opened: $[x_i, x_{i+1})$; $i =
0, 1, \ldots, k$; $x_0 = 0$ and $x_{k+1} = \infty$. This interval is in accordance with the convention above and
any censoring happening at $x_{i+1}$ is counted in the number $c_{i+1}$ of the following interval.
³ Commonly, $x_{1:n} \le x_{2:n} \le \ldots \le x_{n:n}$ denotes the ordered sample values. Sometimes, when there is no danger
of confusion and in order to keep the mathematical notation as lean as possible, we will refrain from using this
special notation and $x_i$ will stand for the $i$–th smallest lifetime.

Figure 4/1: Illustration of the numbers ci , di , ni and the failure times xi on the time axis (non–
grouped data)

The numbers ni ; i = 0, 1, ..., k; are attached to a point just prior to xi . ni is called the number
of units at risk at xi , and it counts the number of units which are alive and not censored and
which are exposed to the risk of failure at xi . The quantities ni , ci , di are linked as

ni = ni−1 − ci−1 − di−1 ; i = 1, 2, ..., k; (4.3a)

where
n0 = n, d0 = 0.
The sample size n can be expressed as

n = n0 = (d0 + ... + dk ) + (c0 + ... + ck ), (4.3b)

i.e., it is divided into the number of units with complete lifetimes and of incomplete lifetimes,
respectively. The number of censored lifetimes in the right–opened interval [xi , xi+1 ) is

ci = ni − di − ni+1 . (4.3c)
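The bookkeeping in (4.3a) to (4.3c) can be sketched in Python as follows. The function name `risk_table` and the small data set are made up for illustration; all observed times are assumed to be positive, and a censored time tied with a failure time $x_i$ is counted in $[x_i, x_{i+1})$, in line with the convention above.

```python
def risk_table(data):
    """data: list of (y, delta) with delta = 1 for a failure, 0 for censoring.
    Returns rows (x_i, n_i, d_i, c_i) for i = 0, ..., k with x_0 = 0."""
    times = sorted({y for y, delta in data if delta == 1})  # distinct failure times
    bounds = [0.0] + times + [float("inf")]
    rows, at_risk = [], len(data)                           # n_0 = n
    for i in range(len(bounds) - 1):
        lo, hi = bounds[i], bounds[i + 1]
        d = sum(1 for y, delta in data if delta == 1 and y == lo)
        c = sum(1 for y, delta in data if delta == 0 and lo <= y < hi)
        rows.append((lo, at_risk, d, c))
        at_risk -= d + c                                    # (4.3a)
    return rows

data = [(2, 1), (3, 0), (3, 1), (5, 1), (5, 1), (6, 0), (7, 1), (8, 0)]
rows = risk_table(data)
```

Summing the $d_i$ and $c_i$ columns of `rows` should return the sample size, as required by (4.3b).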

We now look at some special situations.

1. For $c_0 = c_1 = \ldots = c_k = 0$ there is no censoring and we have
$$n_i = n - \sum_{j=1}^{i-1} d_j; \quad i = 1, 2, \ldots, k;$$
and especially
$$d_k = n_k = n - \sum_{i=1}^{k-1} d_i.$$

2. When there are no tied failure times (di = 1 ∀ i) and no censoring we have

a) k = n and
b) ni = n − i + 1; i = 1, 2, ..., n.

3. For single censoring of type–II with $\ell$ as the given number of failures to be observed and
possibly tied failure times we have $k$ so that
$$\sum_{i=1}^{k-1} d_i < \ell \le \sum_{i=1}^{k} d_i$$
and $c_0 = c_1 = \ldots = c_{k-1} = 0$ and $c_k = n - \sum_{i=1}^{k} d_i$.

4. For single censoring of type–I with the single censoring time xend and possibly tied failure
times we may have

a) no failures before xend , so that neither d1 , d2 , ..., dk nor c1 , c2 , ..., ck exist and c0 =
n0 = n, or
b) $k \ge 1$ failure times $x_1 < x_2 < \ldots < x_k < x_{end}$ so that $c_k = n_k - d_k \ge 0$ lifetimes
will be censored at $x_{end}$, somewhere after the last failure time $x_k$. It might happen
that $c_k = 0$ when all units have failed before $x_{end}$.

Figure 4/2: Illustration of the numbers cj , dj , nj for a divided time axis (grouped data)

Fig. 4/2 depicts the situation for grouped sampling data. The time axis is divided into m + 1
intervals, not necessarily of equal length:

Ij = [tj−1 , tj ); j = 1, 2, ..., m + 1; (4.4a)

where $t_0 = 0$, $t_m = t_{end}$ and $t_{m+1} = \infty$, and $t_{end}$ is an upper limit on observation. These
intervals are fixed and have non–random width

wj = tj − tj−1 ; j = 1, 2, ..., m + 1. (4.4b)

The analysis of grouped lifetime data is done by actuarial methods within a so–called life table. In
life tables for human populations, i.e., in demography, tend generally is 100 years and the interval
width has a constant length of one year for the first m = 100 intervals. But in general, the widths
need not be constant. Each member in a sample of n units whose lifetime starts at t0 either has
a failure time or a censoring time. These times are counted per interval. We now define the
following quantities:

$d_j$ = number of lifetimes ending by a failure or death in $I_j$ (remember that for non–grouped
data $d_i$ is attached to a point of time),

$c_j$ = number of lifetimes in $I_j$ ending by censoring,

$n_j$ = number of units entering $I_j$ alive (= number of units at risk at $t_{j-1}$).

These numbers are linked as

nj = nj−1 − dj−1 − cj−1 ; j = 2, 3, ..., m + 1; (4.4c)

where n1 = n is the sample size. In the last interval Im+1 = [tm , tm+1 ) = [tm , ∞) it can be
considered that only uncensored lifetimes are in this interval since the nm+1 units not failed by
tm = tend must fail somewhere in Im+1 . Thus we have

nm+1 = dm+1 and cm+1 = 0. (4.4d)


5 Hazard Rate Estimation and the KAPLAN/MEIER and NELSON/AALEN Approaches
The estimating approach for the hazard rate in this chapter is especially apt for non–grouped data
sets of small to medium size. The quantities $n_i$, $c_i$ and $d_i$ coming up here have been defined in
Chapter 4 and are explained in Fig. 4/1. The hazard rate estimator which is derived in Sect. 5.1
is a by–product of estimating the survival function and comes as a pointwise estimator which,
for continuously distributed lifetimes, may be smoothed by several methods, see Chapter 8.
The hazard rate estimator of Sect. 5.1 is input for estimating the cumulative hazard rate by the
NELSON/AALEN method in Sect. 5.2.¹

5.1 Estimating the Hazard Rate and the Survival Function2


The search for an estimator of the survival function begins by assuming that the data have arisen from a discrete distribution with probability mass at the ordered distinct failure times $x_1 < x_2 < \ldots < x_k$, $k \le n$. For a discrete distribution the hazard rate $h_i$ is a conditional probability, see (1.60a):

$$h_i = \Pr(X = x_i \mid X \ge x_i) = \frac{\Pr(X = x_i)}{\Pr(X \ge x_i)}.$$

As has been shown in (1.61c), the survival function can be written in terms of the hazard rate:

$$S(x) = \prod_{i:\,x_i \le x} (1 - h_i). \tag{5.1a}$$

Thus, a reasonable estimator of $S(x)$ will be

$$\hat S(x) = \prod_{i:\,x_i \le x} (1 - \hat h_i), \tag{5.1b}$$

which reduces the problem of estimating the survival function to that of estimating the hazard rate at each observed failure time $x_i$. Choosing the maximum likelihood procedure, an appropriate element of the likelihood function is³

$$L_i = h_i^{d_i}\,(1 - h_i)^{n_i - d_i}; \quad i = 1, 2, \ldots, k. \tag{5.2a}$$
This expression is correct since

1. $d_i \ge 1$ is the number of failures at $x_i$ and $h_i$ is the conditional probability of failure at $x_i$ and

2. $n_i - d_i$ is the number of units on test not failing at $x_i$ with $1 - h_i$ as the probability of failing after $x_i$, conditioned on survival to time $x_i$.

¹ The procedures of this chapter are programmed and assembled in the MATLAB-file Hazard01 which can be downloaded from htpp://geb.uni-giessen.de/volltexte/2013/?????.
² Suggested reading for this chapter: COX/OAKES (1984), KAPLAN/MEIER (1958), KLEIN/MOESCHBERGER (1997), LAWLESS (1982), LEEMIS (1995), SMITH (2002).
³ Remember that $n_i$, the number of units at risk at $x_i$, is given by $n_i = n_{i-1} - c_{i-1} - d_{i-1}$; $i = 1, 2, \ldots, k$; with $n_0 = n$ and $d_0 = 0$.

Thus, the likelihood function for all $h_i$ is

$$L(h_1, \ldots, h_k) = \prod_{i=1}^{k} L_i = \prod_{i=1}^{k} h_i^{d_i}\,(1 - h_i)^{n_i - d_i}, \tag{5.2b}$$

with log-likelihood function

$$\mathcal{L}(h_1, \ldots, h_k) = \ln L(h_1, \ldots, h_k) = \sum_{i=1}^{k} \bigl[ d_i \ln h_i + (n_i - d_i) \ln(1 - h_i) \bigr]. \tag{5.2c}$$

The i-th element of the so-called score vector is

$$\frac{\partial \mathcal{L}(h_1, \ldots, h_k)}{\partial h_i} = \frac{d_i}{h_i} - \frac{n_i - d_i}{1 - h_i}. \tag{5.2d}$$

Equating the score vector to zero and solving for $h_i$ yields the maximum likelihood estimator (MLE) for $h_i$:

$$\hat h_i = \frac{d_i}{n_i}; \quad i = 1, 2, \ldots, k. \tag{5.2e}$$
This estimator is sensible, since $d_i$ of the $n_i$ units still at risk at time $x_i$ fail, so the ratio of $d_i$ to $n_i$ is an appropriate estimator of the conditional probability of failure at $x_i$. This derivation may strike a familiar chord since, at each time $x_i$, estimating $h_i$ with $d_i$ divided by $n_i$ is equivalent to estimating the probability of 'success', i.e., failing at $x_i$, for each of the $n_i$ units still on test. Thus, this derivation is equivalent to finding the MLE for the probability of success for k binomial variables $P_i$.
Using the particular estimator (5.2e) for the hazard rate, the survival function estimator (5.1b) becomes

$$\hat S(x) = \prod_{i:\,x_i \le x} \left(1 - \frac{d_i}{n_i}\right); \quad i = 1, \ldots, k; \tag{5.3a}$$

which is a so-called product-limit estimator (PLE). This version is commonly known as the KAPLAN/MEIER estimator (KME), see KAPLAN/MEIER (1958).⁴ This estimator is a maximum likelihood estimator, too, as it is a function of the maximum likelihood estimates $\hat h_i$. In (5.3a) the censored observations are not forgotten: they have been allowed for in $n_i$, the number of units at risk just before $x_i$. The effect of censored observations on the survival function estimator is a larger downward step compared to the step size if there had been no censoring.

One problem that arises with the KME is that it is not defined past the last observed failure time $x_k$. The usual way to handle this problem is to cut off the estimator at $x_k$. But there are other suggestions. Some authors define

• $\hat S(x) = 0$ for $x > x_k$ when $d_k = n_k$, i.e., when no sample units survived past $x_k$,

• $\hat S(x) = \prod_{i=1}^{k} \left(1 - d_i/n_i\right)$ for $x > x_k$ when $d_k < n_k$, i.e., when there are sample units surviving past $x_k$.

The first suggestion means that in the sampled population no individual would survive age $x_k$, whereas the second suggestion supposes everlasting lifetime for some individuals in the sampled population.

⁴ Another PLE has been suggested by HERD (1960) and JOHNSON (1964), see the Excursus further down.
We now mention two special variants of the KME–formula (5.3a).

1. When there are neither multiple failures nor censored observations before xk , the longest
recorded failure time, we will have

• di = 1 for i = 1, ..., n (or k in the case of single censoring) and


• ni = n − i + 1 as the number of units at risk (not failed) just before xi .

In this case the KME of (5.3a) will turn into

$$\hat S(x) = \prod_{j=1}^{i} \frac{n - j}{n - j + 1} = \frac{n - i}{n} \quad \text{for } x_i \le x < x_{i+1},\; i = 1, \ldots, n - 1 \text{ or } k - 1, \tag{5.3b}$$

which is the familiar staircase empirical survival function with a downward step of size $1/n$ at each $x_i$. The hazard rate estimator (5.2e) turns into

$$\hat h_i = \hat h(x_i) = \frac{1}{n - i + 1}; \quad i = 1, \ldots, n \text{ or } k. \tag{5.3c}$$

2. When we have records on all observed times, failure times as well as censored times, which are given as pairs $(y_i, \delta_i)$, see (4.2a,b), the KME of (5.3a) may be written as

$$\hat S(x) = \prod_{i:\,y_i \le x} \left(\frac{n - i}{n - i + 1}\right)^{\delta_i}, \tag{5.3d}$$

when there are no tied failure times. $y_i$ is any observation, either censored or not. The hazard rate estimator for this case is

$$\hat h_i = \hat h(y_i) = \begin{cases} \dfrac{1}{n - i + 1} & \text{for } \delta_i = 1, \\[1ex] \text{not defined} & \text{for } \delta_i = 0. \end{cases} \tag{5.3e}$$

Excursus: The KAPLAN/MEIER estimator and the HERD/JOHNSON estimator written in terms of reverse ranks

Besides the KME we have another PLE which has been proposed by HERD (1960) and JOHNSON (1964) and which will be called HERD/JOHNSON estimator, abbreviated HJE. When there are no tied observations both estimators can be defined recursively using the reverse ranks $r_i$ of $y_1 < y_2 < \ldots < y_n$:

$$r_i = n - i + 1, \quad i = 1, 2, \ldots, n. \tag{5.4}$$

Using $r_i$ the KME of (5.3d) turns into

$$\hat S(x) = {}_{\text{KME}}\hat P_i = \left(\frac{r_i - 1}{r_i}\right)^{\delta_i} {}_{\text{KME}}\hat P_{i-1}; \quad y_i \le x < y_{i+1}; \; i = 1, \ldots, n; \tag{5.5a}$$

with starting value

$${}_{\text{KME}}\hat P_0 = 1. \tag{5.5b}$$

The HJE is defined as

$$\hat S(x) = {}_{\text{HJE}}\hat P_i = \left(\frac{r_i}{r_i + 1}\right)^{\delta_i} {}_{\text{HJE}}\hat P_{i-1}; \quad y_i \le x < y_{i+1}; \; i = 1, \ldots, n; \tag{5.6a}$$

with starting value

$${}_{\text{HJE}}\hat P_0 = 1. \tag{5.6b}$$

The recursion factor $r_i/(r_i + 1)$ of the HJE is greater than the factor $(r_i - 1)/r_i$ of the KME, so the HJE is never smaller than the KME. When the sample is uncensored, i.e., $\delta_i = 1 \;\forall\, i$, the HJE will be

$${}_{\text{HJE}}\hat P_i = \frac{n + 1 - i}{n + 1}$$

while the KME will be

$${}_{\text{KME}}\hat P_i = \frac{n - i}{n}.$$

We conclude this excursus by stating that the KME, being an MLE, is more popular than the HJE.
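
The two recursions (5.5a) and (5.6a) differ only in their factor, which a short sketch makes explicit. The function name `product_limit`, its `kind` switch and the toy indicator vectors are assumptions introduced here for illustration; exact fractions are used so that the uncensored closed forms can be compared without rounding.

```python
from fractions import Fraction

def product_limit(deltas, kind="KME"):
    """Recursive PLE over ordered, untied observations y_1 < ... < y_n.

    deltas[i-1] is the indicator of y_i (1 = failure, 0 = censored).
    kind="KME" uses the factor (r-1)/r, kind="HJE" uses r/(r+1).
    """
    n = len(deltas)
    p = Fraction(1)                      # starting value (5.5b) / (5.6b)
    out = []
    for i, delta in enumerate(deltas, start=1):
        r = n - i + 1                    # reverse rank (5.4)
        factor = Fraction(r - 1, r) if kind == "KME" else Fraction(r, r + 1)
        if delta == 1:                   # factor**delta: censored steps leave p unchanged
            p *= factor
        out.append(p)
    return out

uncensored = [1, 1, 1, 1, 1]             # hypothetical sample, n = 5, no censoring
kme = product_limit(uncensored, "KME")   # (n - i)/n
hje = product_limit(uncensored, "HJE")   # (n + 1 - i)/(n + 1)
print(kme)
print(hje)
```

For the uncensored toy sample the KME reaches zero at the last observation, while the HJE stays positive.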

We now want to find estimators for the variances of $\hat h_i$ and $\hat S(x)$. If the possible failure times $x_i$ are fixed and the censoring mechanism allows the number of failures $d_i$ at each $x_i$ to increase at the same rate as the sample size n, then the standard large-sample theory for MLEs applies, as COX/OAKES (1984) stated. Thus, asymptotically $\sqrt{n}\,(\hat h_i - h_i)$ will have a multivariate normal distribution with mean vector equal to zero and a variance-covariance matrix which can be estimated by the observed FISHER information matrix. The elements of the latter matrix require the derivatives of the score vector (5.2d) and read

$$-\frac{\partial^2 \mathcal{L}(h_1, \ldots, h_k)}{\partial h_i\, \partial h_\ell} = \begin{cases} \dfrac{d_i}{h_i^2} + \dfrac{n_i - d_i}{(1 - h_i)^2} & \text{for } i = \ell, \\[1ex] 0 & \text{for } i \ne \ell; \end{cases} \quad i, \ell = 1, \ldots, k. \tag{5.7a}$$

Substituting $h_i$ by its MLE the diagonal elements, which are not equal to zero, read

$$-\left.\frac{\partial^2 \mathcal{L}(h_1, \ldots, h_k)}{\partial h_i\, \partial h_i}\right|_{h_i = d_i/n_i} = \frac{n_i^3}{d_i\,(n_i - d_i)}. \tag{5.7b}$$

Thus, the estimated variance of $\hat h_i$ is

$$\widehat{\mathrm{Var}}\bigl(\hat h_i\bigr) = \frac{d_i\,(n_i - d_i)}{n_i^3} \tag{5.7c}$$

which is also the variance of $1 - \hat h_i$:

$$\widehat{\mathrm{Var}}\bigl(1 - \hat h_i\bigr) = \widehat{\mathrm{Var}}\bigl(\hat h_i\bigr). \tag{5.7d}$$

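When the data set is given as pairs $(y_i, \delta_i)$ the variance of $\hat h_i$ in (5.3e) and of $1 - \hat h_i$ is

$$\widehat{\mathrm{Var}}\bigl(\hat h_i\bigr) = \widehat{\mathrm{Var}}\bigl(1 - \hat h_i\bigr) = \begin{cases} \dfrac{n - i}{(n - i + 1)^3} & \text{for } \delta_i = 1, \\[1ex] \text{not defined} & \text{for } \delta_i = 0. \end{cases} \tag{5.7e}$$

Pointwise confidence intervals for $h_i$ can now be obtained via the normal approximation. A two-sided $(1 - \alpha)$-confidence interval such as

$$\hat h_i - \tau_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}\bigl(\hat h_i\bigr)} \;\le\; h_i \;\le\; \hat h_i + \tau_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}\bigl(\hat h_i\bigr)} \tag{5.7f}$$

refers to a single observation. A larger multiplier than the standard normal percentile $\tau_{1-\alpha/2}$ would be needed for a simultaneous confidence interval over more than one lifetime.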
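
As an illustration of (5.7c) and (5.7f), the sketch below computes the estimated variance and a pointwise interval for one failure time. The function name `hazard_ci` and the counts $d_i = 3$, $n_i = 21$ are assumptions for the example; the interval is not truncated at zero, so with few failures the lower limit may turn out negative.

```python
import math
from statistics import NormalDist

def hazard_ci(d, n, alpha=0.05):
    """Return (h_hat, var_hat, (lower, upper)) following (5.2e), (5.7c), (5.7f)."""
    h = d / n
    var = d * (n - d) / n**3
    tau = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal percentile
    half_width = tau * math.sqrt(var)
    return h, var, (h - half_width, h + half_width)

h_hat, var_hat, (lower, upper) = hazard_ci(d=3, n=21)   # illustrative counts (assumed)
print(round(h_hat, 4), round(var_hat, 4), lower, upper)
```

A practitioner would typically report max(0, lower) as the lower limit; that truncation is a presentation choice and not part of (5.7f).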
Turning to the variance of $\hat S(x)$ we have from (5.1b)

$$\ln \hat S(x) = \sum_{i:\,x_i \le x} \ln\bigl(1 - \hat h_i\bigr). \tag{5.8a}$$

We have just seen that the $\hat h_i$'s are asymptotically independent, so that the asymptotic variance of $\ln \hat S(x)$ and hence of $\hat S(x)$ can easily be found for any fixed x. First, using (1.30b) for the approximate variance of a transformed variate we find

$$\widehat{\mathrm{Var}}\bigl[\ln\bigl(1 - \hat h_i\bigr)\bigr] \approx \left(\frac{1}{1 - \hat h_i}\right)^{2} \widehat{\mathrm{Var}}\bigl(1 - \hat h_i\bigr) = \frac{d_i}{n_i\,(n_i - d_i)} \tag{5.8b}$$

and then

$$\widehat{\mathrm{Var}}\bigl[\ln \hat S(x)\bigr] \approx \sum_{i:\,x_i \le x} \widehat{\mathrm{Var}}\bigl[\ln\bigl(1 - \hat h_i\bigr)\bigr] = \sum_{i:\,x_i \le x} \frac{d_i}{n_i\,(n_i - d_i)}. \tag{5.8c}$$

Finally, applying (1.30b) once more, but now to (5.8c) we have

$$\widehat{\mathrm{Var}}\bigl[\hat S(x)\bigr] \approx \bigl[\hat S(x)\bigr]^{2} \sum_{i:\,x_i \le x} \frac{d_i}{n_i\,(n_i - d_i)}, \tag{5.8d}$$

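which is known as GREENWOOD's formula, GREENWOOD (1926). When the data are given as pairs $(y_i, \delta_i)$ so that (5.3d,e) and (5.7e) hold, we have instead of (5.8b-d)

$$\widehat{\mathrm{Var}}\bigl[\ln\bigl(1 - \hat h_i\bigr)\bigr] \approx \begin{cases} \dfrac{1}{(n - i)\,(n - i + 1)} & \text{for } \delta_i = 1, \\[1ex] \text{not defined} & \text{for } \delta_i = 0, \end{cases} \tag{5.8e}$$

$$\widehat{\mathrm{Var}}\bigl[\ln \hat S(x)\bigr] \approx \sum_{i:\,y_i \le x} \frac{\delta_i}{(n - i)\,(n - i + 1)}, \tag{5.8f}$$

$$\widehat{\mathrm{Var}}\bigl[\hat S(x)\bigr] \approx \bigl[\hat S(x)\bigr]^{2} \sum_{i:\,y_i \le x} \frac{\delta_i}{(n - i)\,(n - i + 1)}. \tag{5.8g}$$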

A pointwise confidence interval for $S(x)$ can be obtained analogously to (5.7f). GREENWOOD's formula may be unstable in the right tail of the distribution,⁵ so some authors have proposed an alternative and simpler estimator originating in the binomial distribution, namely

$$\widehat{\mathrm{Var}}\bigl[\hat S(x_i)\bigr] \approx \frac{\bigl[\hat S(x_i)\bigr]^{2}\,\bigl[1 - \hat S(x_i)\bigr]}{n_i}. \tag{5.9}$$

A rationale for (5.9) is given in COX/OAKES (1984, p. 51) who also suggest likelihood based confidence intervals resting upon a $\chi^2$-distribution.

⁵ We see that (5.8d) grows with $x_i$ because $d_i$ comes closer to $n_i$. (5.8d) would even become $\infty$ when for the last observed failure time $x_k$ we would have $d_k = n_k$.

Example 5/1: Computation of $\hat h_i$, $\hat S(x_i)$ and their variances

The following data have often been used in the literature on lifetime analysis, e.g., see COX/OAKES (1984) or LEEMIS (1986). The following observations $(y_i, \delta_i)$ with time measured in weeks

yi 6 6 6 6 7 9 10 10 11 13 16 17 19 20 22 23 25 32 32 34 35
δi 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 0
are the ordered times of remission, i.e., freedom of symptoms in a precisely defined medical sense, of 21
leukaemia patients who have been treated with the drug 6–MP (= 6-mercapto-purine). 12 patients have
been lost to follow–up (δi = 0). From the data above we have extracted and displayed in Tab. 5/1 the
‘failure times’ xi , where — in this example — failure is a positive event.

Table 5/1: Estimates of $h_i$ and $S(x_i)$ for the 21 leukaemia patients' data

i    $x_i$   $n_i$   $d_i$   $\hat h_i$   $\widehat{\mathrm{Var}}(\hat h_i)$   $\hat S(x_i)$   $\widehat{\mathrm{Var}}[\hat S(x_i)]$
1     6      21      3       0.1429       0.0058       0.8571       5.8303 · 10^{-3}
2     7      17      1       0.0588       0.0033       0.8067       7.5488 · 10^{-3}
3    10      15      1       0.0667       0.0041       0.7529       9.2965 · 10^{-3}
4    13      12      1       0.0833       0.0064       0.6902       0.0114
5    16      11      1       0.0909       0.0075       0.6275       0.0130
6    22       7      1       0.1429       0.0175       0.5378       0.0165
7    23       6      1       0.1667       0.0231       0.4482       0.0181

$\hat S(x)$ and $\widehat{\mathrm{Var}}[\hat S(x)]$ are not defined beyond x = 23.
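
The columns of Tab. 5/1 can be recomputed directly from the pairs $(y_i, \delta_i)$. The sketch below is one possible implementation, not the text's MATLAB program; the function name `km_table` and the row layout are assumptions. The risk set $n_i$ is taken as the number of observations with $y \ge x_i$, so a censoring tied with a failure counts as still at risk.

```python
from collections import Counter

# Remission times (weeks) and indicators of the 21 leukaemia patients (Example 5/1)
y     = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
delta = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

def km_table(y, delta):
    """Rows (x_i, n_i, d_i, h_i, Var(h_i), S(x_i), Greenwood Var(S(x_i)))."""
    deaths = Counter(t for t, e in zip(y, delta) if e == 1)
    rows, surv, gw_sum = [], 1.0, 0.0
    for x in sorted(deaths):
        n_i = sum(1 for t in y if t >= x)     # units at risk just before x
        d_i = deaths[x]
        h = d_i / n_i                         # (5.2e)
        var_h = d_i * (n_i - d_i) / n_i**3    # (5.7c)
        surv *= 1.0 - h                       # (5.3a)
        gw_sum += d_i / (n_i * (n_i - d_i))   # raises ZeroDivisionError if d_i == n_i
        rows.append((x, n_i, d_i, h, var_h, surv, surv**2 * gw_sum))  # (5.8d)
    return rows

rows = km_table(y, delta)
for r in rows:
    print(r[0], r[1], r[2], *(round(v, 4) for v in r[3:]))
```

The last column is GREENWOOD's formula evaluated with unrounded intermediate values, so its trailing digits can differ slightly from a table computed with rounded inputs.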

Figure 5/1: Estimated hazard rate and survival function with pointwise 95%–confidence intervals for the
21 leukaemia–patients’ data

Under the non-predictive censoring assumption the KME can be motivated in several ways. This estimator is

1. the generalized MLE in the same sense that the empirical distribution function is in the case of uncensored data, see MILLER (1981, pp. 57 ff.),

2. the limit of the life-table estimators, i.e., for data grouped in time intervals (see Sec. 8.2), as the intervals increase in number and decrease in length,

3. a product of estimators of conditional probabilities, see (5.1b),

4. a self-consistent estimator, see MILLER (1981, pp. 52 ff.),

5. the redistribution-to-the-right estimator, as proposed by EFRON (1967), see the following Example 5/2.

Example 5/2: The redistribution–to–the-right algorithm applied to the 21 leukemia patients’ data

The algorithm starts with an empirical distribution that puts mass 1/n at each observation yi ; i =
1, 2, ..., n; and then eliminates and moves the mass of each censored observation (yi , 0) by distribut-
ing it equally to all observations to the right of it. After the last redistribution the estimated survival
function at each (yi , 1) is unity minus the sum of the redistributed masses up to and including (yi , 1).

Mass at start (∗ marks a censored observation)


6 6 6 6∗ 7 9∗ 10 10∗ 11∗ 13 16 ... 34∗ 35∗
1/21 1/21 1/21 1/21 1/21 1/21 1/21 1/21 1/21 1/21 1/21 . . . 1/21 1/21

Mass after the first redistribution


Combining the first three tied uncensored observations at x = 6 and first redistribution of 1/21 at 6∗
among the 17 subsequent observations we have

6 6∗ 7 9∗ 10 10∗ 11∗ 13 16 ... 34∗ 35∗


1/7 0 6/119 6/119 6/119 6/119 6/119 6/119 6/119 . . . 6/119 6/119
since 1/21 + (1/21)/17 = 6/119.

Mass after the second redistribution


Redistribution of 6/119 at 9∗ to the 15 subsequent observations gives

6 6∗ 7 9∗ 10 10∗ 11∗ 13 16 ... 34∗ 35∗


1/7 0 6/119 0 32/595 32/595 32/595 32/595 32/595 . . . 32/595 32/595
since 6/119 + (6/119)/15 = 32/595.

Mass after the third redistribution


Redistribution of 32/595 at each of the two consecutive censored observations 10∗ and 11∗ to the 12
subsequent observations gives

6 6∗ 7 9∗ 10 10∗ 11∗ 13 16 ... 34∗ 35∗


1/7 0 6/119 0 32/595 0 0 16/255 16/255 . . . 16/255 16/255
since 32/595 + 2 · (32/595)/12 = 16/255.

When this process is continued through all the observed data and the resulting mass function is processed as described at the beginning of this example, the KME results. To check this for one special value, e.g., for $x_5 = 16$, note that

$$\hat S(16) = 1 - [1/7 + 6/119 + 32/595 + 16/255 + 16/255] = 0.6275$$

which matches the result given in Tab. 5/1.
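
The redistribution steps of this example can be traced with exact fractions. The following is a minimal sketch under the assumption that the observations are already ordered, with failures placed before censored observations at tied times; the function names `redistribute_to_the_right` and `survival` are introduced here for illustration only.

```python
from fractions import Fraction

y     = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
delta = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

def redistribute_to_the_right(y, delta):
    """Final masses after moving each censored mass equally onto all
    observations to its right; the last observation keeps its mass."""
    n = len(y)
    mass = [Fraction(1, n)] * n
    for i in range(n - 1):
        if delta[i] == 0:
            share = mass[i] / (n - i - 1)
            for j in range(i + 1, n):
                mass[j] += share
            mass[i] = Fraction(0)
    return mass

mass = redistribute_to_the_right(y, delta)

def survival(x):
    """One minus the redistributed mass at failure times up to and including x."""
    return 1 - sum(m for t, e, m in zip(y, delta, mass) if e == 1 and t <= x)

print(mass[4], mass[6], mass[9])              # masses at 7, 10 and 13
print(survival(16), float(survival(16)))
```

Handling the censored observations one at a time, as done here, gives the same masses as treating the two consecutive censorings 10∗ and 11∗ in a single step.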

5.2 Estimating the Cumulative Hazard Rate⁶


We have two approaches to estimate H(x) resulting from the two possibilities of defining H(x)
in the discrete case, see (1.62a,b):

• an indirect estimator built upon a previously estimated survival function and

• a direct estimator resting upon a previously estimated hazard rate.

Of course, both estimators have differing statistical properties and the generated estimates will
not be identical to each other.
In Chapter 1, see (1.11f), we have seen that the cumulative hazard rate H(x) and the survival
function S(x) are related as
H(x) = − ln S(x).
So, the indirect estimator of H(x), sometimes called the natural estimator of H(x), is

H(x)
e = − ln S(x).
b (5.10)

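The KME of $S(x)$ is, see (5.3a) and (5.3d):⁷

$$\hat S(x) = \begin{cases} \displaystyle\prod_{i:\,x_i \le x} \left(1 - \frac{d_i}{n_i}\right); \; i = 1, 2, \ldots, k & \text{for tied failure times,} \\[2ex] \displaystyle\prod_{i:\,y_i \le x} \left(\frac{n - i}{n - i + 1}\right)^{\delta_i}; \; i = 1, 2, \ldots, n & \text{for untied failure times.}^{8} \end{cases} \tag{5.11a}$$

Thus, the natural estimator of $H(x)$ is

$$\tilde H(x) = \begin{cases} \displaystyle -\sum_{i:\,x_i \le x} \ln\left(1 - \frac{d_i}{n_i}\right), \\[2ex] \displaystyle -\sum_{i:\,y_i \le x} \delta_i \ln\left(\frac{n - i}{n - i + 1}\right). \end{cases} \tag{5.11b}$$

The estimated variance of $\ln \hat S(x)$ is given in (5.8c) and (5.8f) so that for $\tilde H(x) = -\ln \hat S(x)$ the estimated variance reads

$$\widehat{\mathrm{Var}}\bigl[\tilde H(x)\bigr] = \begin{cases} \displaystyle\sum_{i:\,x_i \le x} \frac{d_i}{n_i\,(n_i - d_i)}, \\[2ex] \displaystyle\sum_{i:\,y_i \le x} \frac{\delta_i}{(n - i)\,(n - i + 1)}. \end{cases} \tag{5.11c}$$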

⁶ Suggested reading for this section: AALEN (1978), ELANDT-JOHNSON/JOHNSON (1980, Chapter 2), GROSS/CLARK (1975, Sect. 4.7), LAWLESS (1982, Sect. 2.4), MILLER (1981, Chapter 3), NELSON (1969, 1970, 1972, 1982), SMITH (2002, Chapter 6).
⁷ When we have grouped data we have to take the life table estimator of $S(x)$, see Sect. 6.2, which resembles the formula for tied failure times.
⁸ For the definition of $n_i$ and $d_i$ see (4.31) and for that of $y_i$ and $\delta_i$ see (4.2a,b).

The direct estimator of $H(x)$ comes with different names, either as empirical accumulated hazard function, see (1.62b), or as NELSON/AALEN estimator. It was first suggested by NELSON (1972) in a reliability context and was rediscovered by AALEN (1978) using modern counting process techniques. Using the MLEs of $h_i$ in (5.2e) or (5.3e) the empirical accumulated hazard rate function results as⁹

$$\hat H(x) = \begin{cases} \displaystyle\sum_{i:\,x_i \le x} \frac{d_i}{n_i} & \text{for tied failure times,} \\[2ex] \displaystyle\sum_{i:\,y_i \le x} \frac{\delta_i}{n - i + 1} & \text{for untied failure times,} \end{cases} \tag{5.12a}$$

with estimated variance

$$\widehat{\mathrm{Var}}\bigl[\hat H(x)\bigr] = \begin{cases} \displaystyle\sum_{i:\,x_i \le x} \frac{d_i\,(n_i - d_i)}{n_i^3}, \\[2ex] \displaystyle\sum_{i:\,y_i \le x} \frac{\delta_i\,(n - i)}{(n - i + 1)^3}. \end{cases} \tag{5.12b}$$

AALEN (1978) gives an approximation: $\widehat{\mathrm{Var}}\bigl[\hat H(x_j)\bigr] \approx \sum_{i=1}^{j} d_i / n_i^2$.

Since $\tilde H(x) = -\ln \hat S(x)$ we have from (5.11a)

$$\tilde H(x) = -\ln\left\{\prod_{i:\,x_i \le x} \left(1 - \frac{d_i}{n_i}\right)\right\} = -\sum_{i:\,x_i \le x} \ln\left(1 - \frac{d_i}{n_i}\right) = \sum_{i:\,x_i \le x} \left(\frac{d_i}{n_i} + \frac{d_i^2}{2\,n_i^2} + \ldots\right).$$

Thus, $\hat H(x)$ in (5.12a) can be viewed as a first-order approximation to $\tilde H(x)$. Comparing the estimators $\hat H(x)$ and $\tilde H(x)$ we see that for a given set of failure data

• $\hat H(x)$ produces smaller estimates than $\tilde H(x)$ and

• the variance of $\hat H(x)$ is smaller than that of $\tilde H(x)$,

see Example 5/3 and Fig. 5/2. Under certain regularity conditions one can show that both estimators are

• non-parametric MLEs,

• consistent,

• asymptotically equivalent and

• converge weakly to GAUSSIAN PROCESSES, see MILLER (1981, pp. 67 ff.), meaning that for fixed x, the estimators are approximately normally distributed.

⁹ We remark that based on the NELSON/AALEN estimator we may give another estimator of $S(x)$, namely

$$\tilde S(x) = \exp\bigl\{-\hat H(x)\bigr\}.$$

Furthermore, the estimator $\hat H(x) = \sum_{i:\,x_i \le x} d_i/n_i$ is correct only when we have non-grouped data with tied failure times where both $d_i$ and $n_i$ refer to the same points of time $x_i$, the failure time. With grouped data (see Sect. 6.2) $n_j$ refers to the start $t_{j-1}$ of the j-th interval $I_j = [t_{j-1}, t_j)$ and $d_j$ is the number of failures in $I_j$. Thus, it seems reasonable in this case to use a slightly modified estimator

$$\hat H(t_j^*) = \sum_{i=1}^{j} \frac{d_i}{n_i^*}$$

where

$$t_j^* = \frac{t_{j-1} + t_j}{2} \quad \text{and} \quad n_j^* = \frac{n_j + n_{j+1}}{2}.$$

Example 5/3: Estimation of the CHR for the 21 leukaemia patients’ data

We return to the data of Example 5/1 and Tab. 5/1.


Table 5/2: Estimates of $H(x_i)$ and its variances for the 21 leukaemia patients' data

i    $x_i$   $n_i$   $d_i$   $\hat S(x_i)$   $\tilde H(x_i)$   $\widehat{\mathrm{Var}}[\tilde H(x_i)]$   $\hat H(x_i)$   $\widehat{\mathrm{Var}}[\hat H(x_i)]$
1     6      21      3       0.8571       0.1542       0.0079       0.1429       0.0058
2     7      17      1       0.8067       0.2148       0.0116       0.2017       0.0091
3    10      15      1       0.7529       0.2838       0.0164       0.2683       0.0132
4    13      12      1       0.6902       0.3708       0.0240       0.3517       0.0196
5    16      11      1       0.6275       0.4661       0.0330       0.4426       0.0271
6    22       7      1       0.5378       0.6202       0.0569       0.5854       0.0446
7    23       6      1       0.4482       0.8026       0.0902       0.7521       0.0678
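
Both cumulative hazard estimators and their variances follow from the pairs $(n_i, d_i)$ alone. The sketch below recomputes the last four columns of Tab. 5/2 from (5.11b), (5.11c), (5.12a) and (5.12b); the function name `cumulative_hazards` and the tuple layout are assumptions made for this illustration.

```python
import math

# (n_i, d_i) at the distinct failure times x_i = 6, 7, 10, 13, 16, 22, 23 (Tab. 5/1)
risk_sets = [(21, 3), (17, 1), (15, 1), (12, 1), (11, 1), (7, 1), (6, 1)]

def cumulative_hazards(risk_sets):
    """Running values (H_natural, Var_natural, H_nelson_aalen, Var_nelson_aalen)."""
    rows = []
    h_nat = var_nat = h_na = var_na = 0.0
    for n, d in risk_sets:
        h_nat += -math.log(1 - d / n)      # (5.11b), indirect estimator
        var_nat += d / (n * (n - d))       # (5.11c)
        h_na += d / n                      # (5.12a), direct estimator
        var_na += d * (n - d) / n**3       # (5.12b)
        rows.append((h_nat, var_nat, h_na, var_na))
    return rows

table = cumulative_hazards(risk_sets)
for row in table:
    print(*(round(v, 4) for v in row))
```

In every row the direct estimate and its variance lie below the corresponding indirect values, in line with the comparison of the two estimators given above.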

Figure 5/2: Estimated CHR with pointwise 95%-confidence intervals (left part: indirect estimates; right part: direct NELSON/AALEN estimates)
6 Estimating the Hazard Rate from Life Tables
One of the oldest tools in empirical statistics is the life table. Life tables have been used since the 17th century¹ by demographers and actuaries to find the law of decrement for human populations
and to calculate premiums for life insurances. But life tables apply equally well to reliability
and to biostatistical situations for which grouped data rather than individual observations are
available. The life table methods for estimating the survival function and other representatives
of lifetime data are primarily designed for situations in which the sample size n is not too small
and in which exact censored times and failure times shall not be processed or are not available at
all. A drawback of hazard rates estimated by life table methods is their lack of smoothness and
continuity.
This chapter is organized as follows:

• In Sect. 6.1 we will give basic definitions of life table functions and show how they are
related.

• Sect. 6.2 is the core of this chapter where we see how life table functions including the
hazard rate are to be estimated.

• In Sect.6.3 we review some further estimators of the hazard rate which are related to the
life table approach.

6.1 Life Table Functions²


A divided time axis is needed to set up a life table. We introduce m + 1 preassigned time intervals³

$$I_j = [t_{j-1}, t_j); \quad j = 1, 2, \ldots, m + 1;$$

where $t_0 = 0$, $t_m = t_{\text{end}}$ and $t_{m+1} = \infty$ with interval width

$$w_j = t_j - t_{j-1}; \quad j = 1, 2, \ldots, m + 1.$$

$t_{\text{end}}$ is an upper limit of observation, and for life tables we usually have $t_{\text{end}} = 100$ years.⁴ For those life tables the interval width, generally, is constant and amounts to one year. In order to document the high infant mortality and its fast decline we sometimes have a finer division for the first and second years of life, i.e., days for the first week after birth, weeks and months thereafter. So-called abridged life tables, on the other hand, have a width of five or even ten years.

¹ The first life tables have been compiled by J. GRAUNT in 1666 and E. HALLEY in 1693. Another pioneer is the Prussian statistician and demographer J. P. SÜSSMILCH (1707 - 1767). In the 19th century B. GOMPERTZ (1779 - 1865) and W. MAKEHAM (1829 - 1871) were interested in the graduation of crude death rates for older people.
² Suggested reading for this section: ELANDT-JOHNSON/JOHNSON (1980), SHYROCK/SIEGEL (1976), SPIEGELMAN (1968).
³ Another, but not so popular way of grouping is to construct the intervals so that they will contain fixed numbers of failures. In this way, the interval limits and their widths are random. Hazard rate estimation for this kind of grouping will be presented in Sect. 6.3.
⁴ As human life expectancy is growing we nowadays may find $t_{\text{end}} = 105$ years or even $t_{\text{end}} = 110$ years, especially for economically developed countries.
A life table combines two types of quantities:

• stocks, measured or counted at a certain point of time or age, and

• flows, measured or counted within a certain interval of time.

Stocks and flows are linked by the general updating formula

stock at tj = stock at tj−1 + incoming flows in [tj−1 , tj ) − outgoing flows in [tj−1 , tj ).

The updating formula shows that there are flows into the stock as well as out of the stock. Looking at a demographic life table we will only see flows out of the stock, where the stock is the number of persons $\ell_x$ alive at exact age x and the outgoing flow is the number ${}_1d_x$ of persons dying in $[x, x + 1)$.
We will first look at the quantities and functions coming up in a demographic life table. Later on, when describing hazard rate estimation, we will change and simplify the notation and introduce some more quantities. The key quantities of a demographic life table are ${}_1q_x$; $x = 0, 1, \ldots, x_{\text{end}}$; the conditional probabilities of dying in $[x, x + 1)$, given an individual is alive at exact age x. Roughly speaking, this ratio is obtained by dividing the number of persons dying in $[x, x + 1)$, taken from a nation's mortality statistics, by the number of persons entering age x, taken from a nation's population census. Then, a number $\ell_0$ of starters at age x = 0 is taken to derive all the other quantities. $\ell_0$ is called the radix of the life table, and it is conventionally taken as 100,000 or 10,000. The $\ell_x$, the expected numbers of persons at exact age x, are found recursively by applying

$${}_1p_x = 1 - {}_1q_x \tag{6.1a}$$

to $\ell_0$ with ${}_1p_x$ as an individual's conditional probability of not dying in $[x, x + 1)$, given being alive at exact age x. We thus find

$$\ell_x = (1 - {}_1q_{x-1})\,\ell_{x-1} = {}_1p_{x-1}\,\ell_{x-1} = \ell_0 \prod_{y=0}^{x-1} {}_1p_y. \tag{6.1b}$$

The quantity

$${}_1d_x = {}_1q_x\,\ell_x = \ell_x - \ell_{x+1} \tag{6.1c}$$

is the expected number of deaths in $[x, x + 1)$ from where we may write

$${}_1q_x = \frac{{}_1d_x}{\ell_x} = \frac{\ell_x - \ell_{x+1}}{\ell_x} \tag{6.1d}$$

and

$${}_1p_x = 1 - {}_1q_x = \frac{\ell_{x+1}}{\ell_x}. \tag{6.1e}$$

Whereas $\ell_x$ is the expected number of survivors out of $\ell_0$ at exact age x, the expected proportion of survivors out of $\ell_0$ is

$$\Pi_x = \frac{\ell_x}{\ell_0}. \tag{6.1f}$$

This is in fact the survival function $S(x)$. With (6.1a,b) we may write $\Pi_x$ as a telescoping product:

$$\Pi_x = \prod_{y=0}^{x-1} {}_1p_y = \prod_{y=0}^{x-1} (1 - {}_1q_y), \tag{6.1g}$$

which parallels the product-limit estimator (PLE) of (5.3a). Also, the expected proportion surviving for k years, given alive at exact age x, is

$${}_kp_x = \prod_{y=x}^{x+k-1} {}_1p_y = \prod_{y=x}^{x+k-1} (1 - {}_1q_y) = \frac{\ell_{x+1}}{\ell_x}\,\frac{\ell_{x+2}}{\ell_{x+1}} \cdots \frac{\ell_{x+k}}{\ell_{x+k-1}} = \frac{\ell_{x+k}}{\ell_x}. \tag{6.1h}$$

There are some more basic functions. The first one is $L_x$, the expected total number of years lived in $[x, x + 1)$. $L_x$ is nothing but the number of 'person × years' that $\ell_x$ persons, aged x exactly, are expected to live through $[x, x + 1)$ and is recognized as a contribution to what is called the total-time-on-test statistic in life testing. Each member of the group who survives the full year x to x + 1 contributes exactly one year to $L_x$, whereas each member who dies in $[x, x + 1)$ only contributes a fraction of a year to $L_x$. Formally, we have

$$L_x = \int_{x}^{x+1} \ell_y \, dy = \int_{0}^{1} \ell_{x+u} \, du. \tag{6.2a}$$

This integral may be evaluated exactly when the age at death of each member is known. An approximation to $L_x$ is

$$L_x \approx \frac{\ell_x + \ell_{x+1}}{2} = \ell_x - 0.5\;{}_1d_x, \tag{6.2b}$$

assuming the deaths to be equally distributed within $[x, x + 1)$. If the interval width is other than one year, $L_x$ of (6.2b) has to be multiplied by this width. The approximation (6.2b) tends to overestimate $L_x$ for younger ages and to underestimate it for older ages.
$T_x$ is the expected total number of years lived beyond age x by the $\ell_x$ persons alive at that age:⁵

$$T_x = L_x + L_{x+1} + \ldots + L_{x_{\text{end}}-1} = \sum_{u=0}^{x_{\text{end}}-x-1} L_{x+u}. \tag{6.2c}$$

Of course,

$$T_x = T_{x+1} + L_x \tag{6.2d}$$

and using the approximation (6.2b)

$$T_x \approx \frac{\ell_x}{2} + \sum_{u=x+1}^{x_{\text{end}}} \ell_u, \qquad T_0 \approx \sum_{x=0}^{x_{\text{end}}-1} (x + 0.5)\;{}_1d_x = \frac{\ell_0}{2} + \sum_{x=0}^{x_{\text{end}}-1} x\;{}_1d_x, \tag{6.2e}$$

where the last two expressions refer to the starting age x = 0.

⁵ Remember that $x_{\text{end}}$ is the oldest age reported, which is assumed not to be survived so that $\ell_{x_{\text{end}}} = 0$ and we have ${}_1d_{x_{\text{end}}-1} = \ell_{x_{\text{end}}-1} - \ell_{x_{\text{end}}} = \ell_{x_{\text{end}}-1}$. Furthermore, since all persons entering life at x = 0 will die before $x_{\text{end}}$ we have $\ell_0 = \sum_{x=0}^{x_{\text{end}}-1} {}_1d_x$.
A last basic life table function is

$$\overset{\circ}{e}_x = \frac{T_x}{\ell_x}, \tag{6.3a}$$

the expected future lifetime of an individual alive at x. In Sect. 1.1.1.6 this quantity has been introduced by the name 'mean residual life' (MRL). Using (6.2e) $\overset{\circ}{e}_x$ may be written as

$$\overset{\circ}{e}_x = \frac{1}{2} + \frac{1}{\ell_x} \sum_{u=x+1}^{x_{\text{end}}} \ell_u. \tag{6.3b}$$

The expected age at death of a person surviving x is

$$E(X \mid X \ge x) = x + \overset{\circ}{e}_x. \tag{6.3c}$$

The basic functions ${}_1q_x$, $\ell_x$, ${}_1d_x$, $L_x$, $T_x$ and $\overset{\circ}{e}_x$ are usually tabulated in a standard format as in Tab. 6/1, which is the life table 2000 - 2002 for German males. It has been constructed using updated results from the 1987 German population census and death statistics for the years 2000 through 2002. It is common practice to take the death statistics for more than one year to allow for years of over-mortality and of sub-mortality.
Table 6/1: Extract from the German life table 2000 - 2002 for males

x      ${}_1q_x$      $\ell_x$     ${}_1d_x$    $L_x$      $T_x$        $\overset{\circ}{e}_x$
0      0.00451281   100,000    451       99,605    7,537,995   75.38
1      0.00043340    99,549     43       99,527    7,438,390   74.72
2      0.00024513    99,506     24       99,493    7,338,863   73.75
3      0.00022031    99,481     22       99,479    7,239,370   72.77
4      0.00013878    99,459     14       99,452    7,139,899   71.79
...    ...           ...        ...      ...       ...         ...
96     0.30296858     2,242    679        1,902        5,541    2.47
97     0.32184621     1,563    503        1,311        3,639    2.33
98     0.34097780     1,060    361          879        2,328    2.20
99     0.36031243       698    252          573        1,448    2.07
100    0.37979995       447    170          362          876    1.96

Source: Statistisches Bundesamt, ed., (2004)
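
The recursions (6.1b), (6.1c) and the approximation (6.2b) can be retraced for the first ages of Tab. 6/1. The sketch below is illustrative only; the function name `life_table` is an assumption, and only the tabulated ${}_1q_x$ for x = 0, ..., 4 and the radix 100,000 are taken from the table.

```python
def life_table(q, radix=100_000):
    """Build l_x and 1d_x from one-year death probabilities q[x], see (6.1b,c)."""
    l = [float(radix)]
    d = []
    for qx in q:
        d.append(qx * l[-1])       # expected deaths in [x, x+1)
        l.append(l[-1] - d[-1])    # survivors at exact age x+1
    return l, d

# 1q_x for x = 0, ..., 4 as tabulated in Tab. 6/1
q_first = [0.00451281, 0.00043340, 0.00024513, 0.00022031, 0.00013878]
l, d = life_table(q_first)

# Approximation (6.2b) for the years lived in [x, x+1)
L_approx = [lx - 0.5 * dx for lx, dx in zip(l, d)]

print([round(v) for v in l[:5]])   # survivors at exact ages 0..4
print([round(v) for v in d])       # deaths in [x, x+1)
print([round(v) for v in L_approx])
```

For x = 0 the midpoint approximation gives a larger value than the tabulated $L_0$, which is consistent with the remark that (6.2b) overestimates $L_x$ at young ages, where deaths concentrate early in the interval.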

6.2 Estimators for Life Table Functions Including the Hazard Rate⁶
In this section we assume strictly grouped data, i.e., we only know how many sample units either failed or had censored lifetimes in each interval. The case where we have individually recorded failure times and censored times for each interval will be treated in Sect. 6.3.
We now revert to the notation for grouped data as introduced in Fig. 4/2, i.e., we drop the two indices in ${}_nq_x$, ${}_np_x$ and ${}_nd_x$, which indicate flows out of the interval $[x, x + n)$, and we switch from x and y to i and j, conventionally used in counting. The demographic life table does not know censored observations, which are usually encountered in life tables used for displaying and evaluating data from clinical and biological survival studies or from life testing. The type of life table used now is outlined in Tab. 6/2. We start by commenting upon the seven columns 2 - 8 which either reflect observations ($n_j$, $n'_j$, $c_j$, $d_j$) or are quantities defined on the time axis ($t_j$, $t_j^*$, $w_j$). The last five columns show estimates, and their estimators are the core of this section.

⁶ Suggested reading for this section: ELANDT-JOHNSON/JOHNSON (1980), GEHAN (1969), KIMBALL (1960), LONDON (1988), MILLER (1981), MÜLLER et al. (1997), SINGPURWALLA/WONG (1983), SMITH (2002).
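Table 6/2: Layout of a non-demographic life table

j       $I_j$              $w_j$   $t_j^*$   $n_j$       $c_j$       $n'_j$       $d_j$       $\hat q_j$   $\hat p_j$   $\hat\Pi_j$        $\hat f(t_j^*)$   $\hat h(t_j^*)$
1       $[t_0, t_1)$       $w_1$   $t_1^*$   $n_1$       $c_1$       $n'_1$       $d_1$       $\hat q_1$   $\hat p_1$   $\hat\Pi_1 = 1$    $\hat f(t_1^*)$   $\hat h(t_1^*)$
2       $[t_1, t_2)$       $w_2$   $t_2^*$   $n_2$       $c_2$       $n'_2$       $d_2$       $\hat q_2$   $\hat p_2$   $\hat\Pi_2$        $\hat f(t_2^*)$   $\hat h(t_2^*)$
...     ...                ...     ...       ...         ...         ...          ...         ...          ...          ...                ...               ...
m       $[t_{m-1}, t_m)$   $w_m$   $t_m^*$   $n_m$       $c_m$       $n'_m$       $d_m$       $\hat q_m$   $\hat p_m$   $\hat\Pi_m$        $\hat f(t_m^*)$   $\hat h(t_m^*)$
m + 1   $[t_m, \infty)$    -       -         $n_{m+1}$   $c_{m+1}$   $n'_{m+1}$   $d_{m+1}$   1            0            $\hat\Pi_{m+1}$    -                 -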

1. $I_j = [t_{j-1}, t_j)$; $j = 1, 2, \ldots, m + 1$; $t_{m+1} = \infty$, $t_0 = 0$; (6.4a)
is the half-open time interval. The last interval is infinite in length.

2. $w_j = t_j - t_{j-1}$; $j = 1, 2, \ldots, m$; (6.4b)
is the width of interval j. A constant width will be denoted w. The widths are required to estimate rates such as PDF and HR. Since the width of the last interval is infinite, no estimate of either PDF or HR can be given for this interval.

3. $t_j^* = \dfrac{t_{j-1} + t_j}{2}$; $j = 1, 2, \ldots, m$; (6.4c)
is the midpoint of interval j. The midpoints are used as points of reference for the estimated PDF and HR which are assumed to be constant within $I_j$.

4. $c_j$; $j = 1, 2, \ldots, m + 1$; is the total number of units whose lifetime is censored in $I_j$ without regard to the reason of censoring.⁷

5. $d_j$; $j = 1, 2, \ldots, m + 1$; is the total number of units who die or fail in the j-th interval.

6. $n_j$; $j = 1, 2, \ldots, m + 1$; is the number of units entering $I_j$. Especially we have

$$n_1 = n, \tag{6.4d}$$

n being the sample size. $n_j$ is the number of units exposed to the risk of either dying (failing) or being censored in $I_j$. The updating formula linking the $n_j$'s is

$$n_j = n_{j-1} - d_{j-1} - c_{j-1}; \quad j = 2, 3, \ldots, m + 1. \tag{6.4e}$$

Furthermore, the sample size can be decomposed as

$$n = \sum_{j=1}^{m+1} d_j + \sum_{j=1}^{m} c_j. \tag{6.4f}$$

⁷ There are life tables with $c_j$ split into several categories: the number of losses to follow-up, the number of withdrawals not failed or the number of deaths or failures due to another reason than that under investigation. These numbers are irrelevant for the estimation process and thus are not displayed here.

7. $n'_j$; $j = 1, 2, \ldots, m + 1$; is the number of units exposed to the risk of dying or failing in $I_j$ for the special reason under study. If all the censoring occurred immediately at the start of $I_j$, then the number at risk (with the potential of dying or failing in the interval) would essentially be $n_j - c_j$. On the other hand, if the censoring occurred just before the end of the interval, the censored units would be at risk for the whole duration of $I_j$ and the number of units at risk would be $n_j$, i.e., it is obtained by setting $c_j = 0$. Averaging these extreme numbers of units at risk for estimation purposes we have

$$n'_j = \frac{(n_j - c_j) + n_j}{2} = n_j - \frac{c_j}{2}. \tag{6.4g}$$

This definition of $n'_j$ essentially assumes that all censoring happens at $t_j^*$ or occurs uniformly across $I_j$.
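
The bookkeeping of (6.4e) and (6.4g) is easily mechanised. The following sketch uses hypothetical interval counts (sample size 20, four intervals, no censoring in the last one); the function name `at_risk` and all numbers are assumptions chosen only to show the updating step.

```python
def at_risk(n, d, c):
    """Return (n_j, n'_j) per interval from the sample size n and the
    per-interval failures d[j] and censorings c[j], see (6.4e) and (6.4g)."""
    n_enter, n_eff = [], []
    alive = n
    for dj, cj in zip(d, c):
        n_enter.append(alive)           # units entering I_j
        n_eff.append(alive - cj / 2)    # censored units count as half at risk
        alive -= dj + cj                # updating formula (6.4e)
    return n_enter, n_eff

d = [3, 4, 2, 5]     # hypothetical failures per interval
c = [1, 2, 3, 0]     # hypothetical censorings per interval (none in the last)
n_j, n_eff_j = at_risk(20, d, c)
print(n_j)
print(n_eff_j)
```

In this toy data set the decomposition (6.4f) holds, since the failures and censorings add up to the sample size of 20, and the last interval satisfies $n_{m+1} = d_{m+1}$ as in (4.4d).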

The estimated quantities $\hat q_j$, $\hat p_j$ and $\hat\Pi_j$, which are needed in calculating $\hat f(t^*_j)$ and $\hat h(t^*_j)$, have counterparts in the sampled population. We will present these first. The unconditional probability for a member of the sampled population to survive up to the start of $I_j$ is

Πj = S(tj−1 ); j = 1, 2, ..., m + 1 with (6.5a)


Π1 = S(t0 ) = S(0) = 1.

The unconditional probability of its failing in Ij is

$$\pi_j = \int_{t_{j-1}}^{t_j} f(u)\,du = F(t_j) - F(t_{j-1}) = S(t_{j-1}) - S(t_j) = \Pi_j - \Pi_{j+1}; \quad j = 1, 2, \ldots, m+1; \quad \text{with } \Pi_{m+2} = 0. \tag{6.5b}$$

Combining πj and Πj we have the following conditional probability of failing in Ij , given survival
up to the start of Ij :
$$q_j = \frac{\pi_j}{\Pi_j} = 1 - \frac{\Pi_{j+1}}{\Pi_j}. \tag{6.5c}$$
The complement
$$p_j = 1 - q_j = \frac{\Pi_{j+1}}{\Pi_j} \tag{6.5d}$$
is the conditional probability of surviving $I_j$. We immediately see that

Πj = p1 · p2 · ... · pj−1 ; j = 1, 2, ..., m + 1; with p0 = 1. (6.5e)

Thus, Πj is the cumulated product of the conditional survival probabilities for the first j − 1
intervals. (6.5e) is the life table analogue of the product limit estimator (5.3a). Upon combining
(6.5c,e) we may write the unconditional failure probability in Ij as8
$$\pi_j = \Pi_j\, q_j = q_j \prod_{i=1}^{j-1} p_i, \tag{6.5f}$$
8 Remember $\prod_{i=1}^{k} a_i = 1$ for $k < 1$.
158 6 Estimating the Hazard Rate from Life Tables

and we have

$$\pi_1 = q_1, \qquad p_1 = 1 - q_1 = \Pi_2.$$

Estimating Πj , qj and pj when there is no censoring

When there is no censoring the construction of estimators for qj , pj and Πj is straightforward. In


this case we have
$$n = \sum_{j=1}^{m+1} d_j, \tag{6.6a}$$

i.e., all members of the sample will be observed dying or failing. The set {d1 , d2 , ..., dm+1 } has
a multinomial distribution:
$$\Pr(d_1, d_2, \ldots, d_{m+1}) = n! \prod_{j=1}^{m+1} \frac{\pi_j^{d_j}}{d_j!} \tag{6.6b}$$

with
$$E(d_j) = n\,\pi_j, \tag{6.6c}$$

$$\operatorname{Var}(d_j) = n\,\pi_j\,(1 - \pi_j), \tag{6.6d}$$

$$\operatorname{Cov}(d_j, d_k) = -n\,\pi_j\,\pi_k, \quad j \neq k. \tag{6.6e}$$

The total number of units entering Ij , i.e., surviving up to tj−1 is

nj = n − (d1 + d2 + ... + dj−1 ); j = 2, 3, ..., m + 1; (6.7a)

with binomial distribution


 
$$\Pr(n_j) = \binom{n}{n_j}\, \Pi_j^{\,n_j}\, (1 - \Pi_j)^{\,n - n_j}; \quad j = 2, 3, \ldots, m+1. \tag{6.7b}$$

The well–known MLE of the binomial parameter $\Pi_j$ is
$$\hat\Pi_j = \frac{n_j}{n}; \quad j = 2, 3, \ldots, m+1; \tag{6.7c}$$
which, up to the scale factor $1/n$, is also binomially distributed and has
$$E(\hat\Pi_j) = \Pi_j, \tag{6.7d}$$
$$\operatorname{Var}(\hat\Pi_j) = \frac{\Pi_j\,(1 - \Pi_j)}{n}. \tag{6.7e}$$

We mention that $\hat\Pi_i$ and $\hat\Pi_j$ with $i < j$ are positively correlated:
$$\operatorname{Cor}(\hat\Pi_i, \hat\Pi_j) = \sqrt{\frac{1 - \Pi_i}{\Pi_i} \bigg/ \frac{1 - \Pi_j}{\Pi_j}}. \tag{6.7f}$$

With $n \to \infty$ the distribution of $\hat\Pi_j$ goes to a normal distribution, and its mean and variance are estimated by (6.7d,e) upon substituting $\Pi_j$ by its estimator $\hat\Pi_j$. Thus, we easily have approximate confidence intervals for $\Pi_j$.
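The interval suggested by (6.7c) to (6.7e) can be sketched in a few lines of Python. This is only an illustration under the normal approximation; the function name `survival_ci`, the default 95% quantile and the counts used in the call are assumptions made for the example and do not come from the text.

```python
import math

def survival_ci(n, n_j, z=1.959964):
    """Normal-approximation interval for Pi_j when there is no censoring.

    n   : sample size
    n_j : number of units entering interval I_j
    z   : standard normal quantile (default: two-sided 95%)
    """
    pi_hat = n_j / n                               # MLE (6.7c)
    se = math.sqrt(pi_hat * (1.0 - pi_hat) / n)    # root of (6.7e) with Pi_j estimated
    lower = max(0.0, pi_hat - z * se)              # clip to the unit interval
    upper = min(1.0, pi_hat + z * se)
    return pi_hat, lower, upper

# Illustrative counts only: 144 of 200 units enter the interval.
pi_hat, lower, upper = survival_ci(n=200, n_j=144)
print(f"Pi_hat = {pi_hat:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

The clipping to [0, 1] is a convenience of this sketch; for $\hat\Pi_j$ close to 0 or 1 the normal approximation itself becomes doubtful.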

In (6.5c) we have seen that the conditional probability of dying or failing in Ij = [tj−1 , tj ),
conditional on survival up to tj−1 , is qj . So, the proportion of deaths or failures in Ij ,

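$$\hat q_j = \frac{d_j}{n_j} \tag{6.8a}$$

has a binomial distribution with parameters $n_j$ and $q_j$ and

$$E(\hat q_j \mid n_j) = q_j, \tag{6.8b}$$
$$E(\hat p_j \mid n_j) = p_j \quad \text{with} \quad \hat p_j = 1 - \hat q_j = \frac{n_j - d_j}{n_j}, \tag{6.8c}$$
$$\operatorname{Var}(\hat q_j \mid n_j) = \operatorname{Var}(\hat p_j \mid n_j) = \frac{p_j\, q_j}{n_j}. \tag{6.8d}$$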

Conditional on $(n_1 = n, n_2, \ldots, n_j)$ the random variables $\hat q_1, \ldots, \hat q_j$ are mutually independent. Besides the estimator of $\Pi_j$ in (6.7c) there is another estimator which rests upon (6.5e):
$$\hat\Pi_j = \hat p_1 \cdot \hat p_2 \cdots \hat p_{j-1}; \quad j = 1, 2, \ldots, m+1; \quad \hat p_0 = 1. \tag{6.9a}$$
Because of the mutual independence of the $\hat q_j$’s and hence of the $\hat p_j$’s the estimator (6.9a) is unbiased and has the approximate conditional variance9
$$\operatorname{Var}(\hat\Pi_j \mid n_1, \ldots, n_j) = \Pi_j^2 \sum_{i=1}^{j-1} \frac{q_i}{n_i\, p_i}; \quad j = 2, 3, \ldots, m+1; \tag{6.9b}$$

which is GREENWOOD’s formula. Substituting $\Pi_j$, $q_i$ and $p_i$ by their estimates we obtain the estimated variance
$$\widehat{\operatorname{Var}}(\hat\Pi_j \mid n_1, \ldots, n_j) = \hat\Pi_j^2 \sum_{i=1}^{j-1} \frac{d_i}{n_i\,(n_i - d_i)}; \quad j = 2, 3, \ldots, m+1. \tag{6.9c}$$

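Observing that $n_i - d_i = n_{i+1}$ we can rewrite a term of the sum in (6.9c) as
$$\frac{d_i}{n_i\,(n_i - d_i)} = \frac{n_i - n_{i+1}}{n_i\, n_{i+1}} = \frac{1}{n_{i+1}} - \frac{1}{n_i},$$
so that the sum telescopes into
$$\sum_{i=1}^{j-1} \left( \frac{1}{n_{i+1}} - \frac{1}{n_i} \right) = \frac{1}{n_j} - \frac{1}{n_1}.$$
As $n_1 = n$, we finally have
$$\widehat{\operatorname{Var}}(\hat\Pi_j \mid n_1, \ldots, n_j) = \hat\Pi_j^2 \left( \frac{1}{n_j} - \frac{1}{n} \right) = \frac{\hat\Pi_j^2\,(n - n_j)}{n\, n_j} = \frac{\hat\Pi_j^2\,(1 - \hat\Pi_j)}{n_j} = \frac{\hat\Pi_j\,(1 - \hat\Pi_j)}{n}, \tag{6.9d}$$
since $\hat\Pi_j = n_j / n$. So, (6.9c) also estimates the unconditional (exact) variance of $\hat\Pi_j$ in (6.7e) after substituting $\Pi_j$ with $\hat\Pi_j$.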

Estimating Πj , qj and pj when there is censoring

With censoring (6.7a) no longer holds, and the number $n_j$ of units entering $I_j = [t_{j-1}, t_j)$ is not the number of survivors up to $t_{j-1}$ because some of the lifetimes censored before $t_{j-1}$ may last longer than $t_{j-1}$. Consequently, $\Pi_j$ cannot be estimated by (6.7c), and we have to revert to
9
For a proof see E LANDT–J OHNSON /J OHNSON (1980, p. 140).

(6.5e) and its conditional survival probabilities in search of an estimator $\hat\Pi_j$. Thus, we have to look for an estimator of $q_j = 1 - p_j$ under censoring. The estimator $d_j / n_j$ in (6.8a) might be expected to underestimate $q_j$, since it is possible that some of the units censored in $I_j$ might have failed or died before the end of $I_j$, had they not been censored first. It is therefore desirable to make some adjustment for the censored units. The most commonly used procedure is to estimate $q_j$ by the so–called actuarial estimator, also called standard life table estimator, which is
$$\hat q_j = \frac{d_j}{n_j - c_j/2} = \frac{d_j}{n'_j}, \tag{6.10a}$$
i.e., we replace $n_j$ in (6.8a) by $n'_j$, the average number of units exposed to risk of failing or dying, see (6.4g). This adjustment is arbitrary, but sensible in many situations. Its appropriateness depends, of course, on the failure and censoring process. Once estimates $\hat q_j$ and $\hat p_j = 1 - \hat q_j$ have been calculated, $\Pi_j$ can be estimated using (6.9a).
Conditioned on $n_j$ and $c_j$ and assuming that $\hat q_j$ of (6.10a) is approximately a binomial proportion we have
$$\operatorname{Var}(\hat q_j \mid n_j, c_j) = \operatorname{Var}(\hat p_j \mid n_j, c_j) \approx \frac{p_j\, q_j}{n'_j} = \frac{p_j\, q_j}{n_j - c_j/2}. \tag{6.10b}$$
An estimator $\widehat{\operatorname{Var}}(\hat q_j \mid n_j, c_j)$ is found by replacing $p_j$ and $q_j$ by their estimators. Conditional on the sets $\{n_j\} = (n_1, \ldots, n_j)$ and $\{c_j\} = (c_1, \ldots, c_j)$, GREENWOOD’s formula for the conditional variance of $\hat\Pi_j$, derived for uncensored data in (6.9b,c), is approximately valid here when $n_j$ is replaced with $n'_j = n_j - c_j/2$. So we have


$$\widehat{\operatorname{Var}}(\hat\Pi_j \mid \{n_j\}, \{c_j\}) \approx \hat\Pi_j^2 \sum_{i=1}^{j-1} \frac{\hat q_i}{n'_i\, \hat p_i}; \quad j = 2, 3, \ldots, m+1. \tag{6.10c}$$

This estimator is reasonable provided $E(n'_j)$ is not too small, though, when there is a lot of censoring, m + 1, the number of intervals, should not be too small. (6.10c) sometimes tends to underestimate the variance of $\hat\Pi_j$ for intervals in the right–hand tail of the lifetime distribution, essentially when $E(n'_j)$ is quite small. However, in such instances the distribution of $\hat\Pi_j$ is typically highly skewed, and its variance is not a particularly good indicator of estimator precision anyway.
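The chain of formulas (6.4e), (6.4g), (6.10a), (6.9a) and (6.10c) can be put together as follows. This is a minimal sketch, not a reference implementation: the function name `life_table` and the dictionary keys are chosen for illustration, and the counts are those of Example 6/1 (Table 6/3) later in this section.

```python
def life_table(d, c):
    """Actuarial life table estimates from per-interval failures d and censorings c.

    Assumes every interval is entered by at least one unit (n'_j > 0).
    """
    n_j = sum(d) + sum(c)        # n_1 = n, cf. (6.4d) and (6.4f)
    surv = 1.0                   # Pi_hat_1 = 1
    gw_sum = 0.0                 # Greenwood sum over the preceding intervals
    rows = []
    for d_j, c_j in zip(d, c):
        n_adj = n_j - c_j / 2.0              # n'_j of (6.4g)
        q = d_j / n_adj                      # actuarial estimator (6.10a)
        p = 1.0 - q
        rows.append({"n": n_j, "n_adj": n_adj, "q": q, "p": p,
                     "surv": surv,                        # Pi_hat_j of (6.9a)
                     "var_surv": surv ** 2 * gw_sum})     # (6.10c)
        if p > 0.0:                          # last interval may have p = 0
            gw_sum += q / (n_adj * p)
        surv *= p
        n_j = n_j - d_j - c_j                # updating formula (6.4e)
    return rows

# Failures and censorings per interval as in Example 6/1 (Table 6/3).
d = [1, 4, 6, 14, 24, 27, 40, 36, 17, 11, 5]
c = [0, 1, 1, 2, 3, 1, 3, 1, 2, 1, 0]
rows = life_table(d, c)
for j, r in enumerate(rows, start=1):
    print(j, r["n"], r["n_adj"], round(r["q"], 4), round(r["surv"], 4))
```

Keeping a running Greenwood sum avoids recomputing the inner sum of (6.10c) for every interval.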

Estimating f (t∗j )

The PDF $f(x)$ of a lifetime distribution is also called curve of deaths in the context of life table analysis. With $t^*_j = (t_{j-1} + t_j)/2$ as midpoint and $w_j = t_j - t_{j-1}$ as width of the j–th interval10 we can approximate the PDF at $t^*_j$ by the well-known formula
$$\begin{aligned}
f(t^*_j) &= -\left.\frac{dS(t)}{dt}\right|_{t = t^*_j} \\
&\approx \frac{S(t_{j-1}) - S(t_j)}{w_j} \\
&= \frac{\Pi_j - \Pi_{j+1}}{w_j} \\
&= \frac{\Pi_j - p_j\, \Pi_j}{w_j} \\
&= \frac{\Pi_j\, q_j}{w_j} \\
&= \frac{\pi_j}{w_j}; \quad j = 1, 2, \ldots, m; \ \text{see (6.5f);}
\end{aligned} \tag{6.11}$$
i.e., $f(t^*_j)$ is the unconditional probability $\pi_j$ of failing in $I_j$ per unit width, the definition of the PDF.

10 As the last interval $I_{m+1}$ has length ∞, in all the following formulas j ranges from 1 to m.

To arrive at an estimator we insert estimators of $\Pi_j$ and $q_j$. When the sample has not been censored we take $\hat q_j = d_j / n_j$ and $\hat\Pi_j = n_j / n$ and have the following estimator of the PDF at the midpoint $t^*_j$ of $I_j$:
$$\hat f(t^*_j) = \frac{\hat\Pi_j\, \hat q_j}{w_j} = \frac{d_j}{n\, w_j}. \tag{6.12a}$$
$\hat f(t^*_j)$ is taken to construct the histogram–estimator of the PDF. From (6.6c,d) and (6.5c) we have
$$E(d_j) = n\, \pi_j = n\, q_j\, \Pi_j, \tag{6.12b}$$
$$\operatorname{Var}(d_j) = n\, \pi_j\,(1 - \pi_j) = n\, q_j\, \Pi_j\,(1 - q_j\, \Pi_j), \tag{6.12c}$$
so that the unconditional variance of $\hat f(t^*_j)$ is
$$\operatorname{Var}\big[\hat f(t^*_j)\big] = \left(\frac{1}{n\, w_j}\right)^2 \operatorname{Var}(d_j) = \frac{q_j\, \Pi_j\,(1 - q_j\, \Pi_j)}{n\, w_j^2} = \frac{\pi_j\,(1 - \pi_j)}{n\, w_j^2}. \tag{6.12d}$$
The estimated version of (6.12d) results when $q_j$ and $\Pi_j$ are replaced by their estimators $\hat q_j = d_j / n_j$ and $\hat\Pi_j = n_j / n$, respectively:
$$\widehat{\operatorname{Var}}\big[\hat f(t^*_j)\big] = \frac{d_j\,(n - d_j)}{n^3\, w_j^2}. \tag{6.12e}$$

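When the sample is censored we have to use
$$\hat f(t^*_j) = \frac{\hat\Pi_j\, \hat q_j}{w_j} = \frac{\hat p_1\, \hat p_2 \cdots \hat p_{j-1}\, \hat q_j}{w_j} =: g(\hat p_1, \hat p_2, \ldots, \hat p_{j-1}, \hat p_j), \tag{6.13a}$$
where $\hat q_j = d_j / n'_j$ and $\hat p_j = 1 - \hat q_j$. For this estimator we can only give a conditional variance.11 The problem now is to find this variance for the function $g(\hat p_1, \hat p_2, \ldots, \hat p_{j-1}, \hat p_j)$ of j variates. Based on the large–sample–approximation formula
$$\operatorname{Var}\big[g(\hat p_1, \hat p_2, \ldots, \hat p_{j-1}, \hat p_j)\big] \approx \sum_{i,k=1}^{j} \frac{\partial g}{\partial \hat p_i}\, \frac{\partial g}{\partial \hat p_k}\, \operatorname{Cov}(\hat p_i, \hat p_k),$$
GEHAN (1969, p. 644) gives the result
$$\widehat{\operatorname{Var}}\big[\hat f(t^*_j) \mid \{n'_j\}\big] \approx \left(\frac{\hat\Pi_j\, \hat q_j}{w_j}\right)^2 \left[\, \sum_{i=1}^{j-1} \frac{\hat q_i}{n'_i\, \hat p_i} + \frac{\hat p_j}{n'_j\, \hat q_j} \right]. \tag{6.13b}$$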

Estimating h(t∗j )

As is well known the hazard rate at any point of time is defined as $h(t) = f(t)/S(t)$. For $t = t^*_j$ it is not directly estimated as $\hat f(t^*_j) / \hat\Pi_j$, since $\hat\Pi_j$ is the estimated probability of survival up to $t_{j-1}$
11 Of course, when there is no censoring we may use (6.13a) with $n_j$ instead of $n'_j$, and we have a conditional variance (6.13b) with $n_j$.

and not up to $t^*_j$. We thus must find $\hat\Pi(t^*_j)$, the estimated probability of surviving up to $t^*_j$, the midpoint of $I_j$. By linear interpolation

$$\hat\Pi(t^*_j) = \frac{\hat\Pi_j + \hat\Pi_{j+1}}{2} = \frac{\hat\Pi_j\,(1 + \hat p_j)}{2}. \tag{6.14a}$$
Upon combining (6.13a) and (6.14a) we have

$$\hat h^{(1)}(t^*_j) = \frac{\hat f(t^*_j)}{\hat\Pi(t^*_j)} = \frac{2\, \hat q_j}{w_j\,(1 + \hat p_j)}. \tag{6.14b}$$

$\hat h^{(1)}(t^*_j)$ is the most popular estimator, sometimes called classical or actuarial estimator of $h(t^*_j)$. (6.14b) can be transformed by inserting $\hat q_j = d_j / n'_j$ into

$$\hat h^{(1)}(t^*_j) = \frac{d_j}{w_j\,(n'_j - d_j/2)}, \tag{6.14c}$$

which is known by the name central death rate. The denominator $w_j\,(n'_j - d_j/2)$ estimates the total–time–on–test spent in $I_j$, and so $\hat h^{(1)}(t^*_j)$ is related to the estimators of Chapter 7, and in fact, GRENANDER (1956) has shown that this is a MLE when the hazard rate is non–decreasing. An approximate variance estimator, conditioned on $\{n'_j\}$, has been derived by GEHAN (1969):
$$\widehat{\operatorname{Var}}\big[\hat h^{(1)}(t^*_j) \mid \{n'_j\}\big] \approx \frac{\big[\hat h^{(1)}(t^*_j)\big]^2}{n'_j\, \hat q_j} \left\{ 1 - \left[ \frac{w_j\, \hat h^{(1)}(t^*_j)}{2} \right]^2 \right\}. \tag{6.14d}$$

When $n'_j$ is small, (6.14d) loses accuracy. This implies that if there are very few survivors in the later intervals, the computation of $\widehat{\operatorname{Var}}\big[\hat h^{(1)}(t^*_j) \mid \{n'_j\}\big]$ is not worthwhile for these later stages.
Another estimator for $h(t^*_j)$ is found by converting the conditional failure probability $\hat q_j = d_j / n'_j$ into a rate:
$$\hat h^{(2)}(t^*_j) = \frac{\hat q_j}{w_j} = \frac{d_j}{w_j\, n'_j}, \tag{6.15a}$$

called death rate. Its estimated approximate variance, conditioned on $\{n'_j\}$, follows from (6.10b) as
$$\widehat{\operatorname{Var}}\big[\hat h^{(2)}(t^*_j) \mid \{n'_j\}\big] \approx \frac{1}{w_j^2}\, \frac{\hat p_j\, \hat q_j}{n'_j}. \tag{6.15b}$$

The statistics $\hat h^{(1)}(t^*_j)$ and $\hat h^{(2)}(t^*_j)$ converge to different functions as $n \to \infty$, assuming for now a constant, fixed interval width $w$:12

$$\lim_{n\to\infty} \hat h^{(1)}(t^*_j) = \frac{S(t^*_j - w/2) - S(t^*_j + w/2)}{\dfrac{w}{2}\,\big[ S(t^*_j - w/2) + S(t^*_j + w/2) \big]} =: \tilde h^{(1)}(t^*_j), \tag{6.16a}$$

$$\lim_{n\to\infty} \hat h^{(2)}(t^*_j) = \frac{S(t^*_j - w/2) - S(t^*_j + w/2)}{w\, S(t^*_j - w/2)} =: \tilde h^{(2)}(t^*_j). \tag{6.16b}$$

We see that
$$\tilde h^{(1)}(t^*_j) \le \frac{2}{w} \quad \text{and} \quad \tilde h^{(2)}(t^*_j) \le \frac{1}{w},$$
12 See MÜLLER et al. (1997).

so that neither statistic can approximate $h(t^*_j)$ whenever $h(t^*_j) > 2/w$. This is the underlying reason for the discretisation biases inherent in $\hat h^{(1)}(t^*_j)$ and $\hat h^{(2)}(t^*_j)$ when viewed as estimators of $h(t^*_j)$.
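The size of this discretisation bias can be illustrated numerically. The sketch below assumes an exponential lifetime with constant hazard `lam` and no censoring, so that the population value of the conditional failure probability is $q_j = 1 - \exp(-\lambda w)$ in every interval; it then evaluates the expressions (6.14b) and (6.15a) at this population value. The function name `limiting_rates` and the chosen rates and width are illustrative assumptions.

```python
import math

def limiting_rates(lam, w):
    """Population versions of the actuarial estimator and the death rate
    for an exponential lifetime with hazard lam and interval width w."""
    q = 1.0 - math.exp(-lam * w)      # conditional failure probability q_j
    p = 1.0 - q                       # conditional survival probability p_j
    h1 = 2.0 * q / (w * (1.0 + p))    # (6.14b) evaluated at q_j, p_j
    h2 = q / w                        # (6.15a) evaluated at q_j
    return h1, h2

w = 24.0
results = {}
for lam in (0.01, 0.1, 1.0):
    h1, h2 = limiting_rates(lam, w)
    results[lam] = (h1, h2)
    print(f"lambda = {lam:5.2f}   h1 = {h1:.5f}   h2 = {h2:.5f}   "
          f"bounds: 2/w = {2.0 / w:.5f}, 1/w = {1.0 / w:.5f}")
```

For the exponential case the first expression equals $(2/w)\tanh(\lambda w/2)$, which stays below both $\lambda$ and $2/w$; the understatement grows as $\lambda w$ increases.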
The estimators in (6.14b) and (6.15a) have been derived for the hazard rate at the midpoint of $I_j = [t_{j-1}, t_j)$, but they are often taken as estimators for all $t$ in $I_j$, i.e., by assuming the hazard rate is constant within $I_j$. There exists, however, a special estimator for the assumption that $h(t)$ is constant within an interval, but varies among the intervals. This estimator has been proposed by SACHER (1956). Let $h_j$ be this constant hazard rate in $I_j$; then, given survival up to the start $t_{j-1}$ of $I_j$, the chance of failure or death in $I_j$ is $1 - \exp(-h_j w_j)$ and the chance of surviving is $\exp(-h_j w_j)$. Assuming that there is no censoring, the number of failures in $I_j$ has a binomial distribution with parameters $n_j$ (number of survivors up to $t_{j-1}$) and binomial proportion $1 - \exp(-h_j w_j)$, i.e.,
 
$$\Pr(d_j) = \binom{n_j}{d_j}\, \big[1 - \exp(-h_j w_j)\big]^{d_j}\, \big[\exp(-h_j w_j)\big]^{n_j - d_j}. \tag{6.17a}$$

The usual MLE of the binomial parameter $1 - \exp(-h_j w_j)$ is

$$\widehat{1 - \exp(-h_j w_j)} = \frac{d_j}{n_j}. \tag{6.17b}$$

Observing $d_j = n_j - n_{j+1}$ and $\hat p_j = n_{j+1} / n_j$ we find, after some manipulations, SACHER’s estimator13
$$\hat h^{(3)}_j = -\frac{\ln \hat p_j}{w_j}. \tag{6.17c}$$
This is a MLE too, and as it is a natural log–function of the binomial variate $d_j$ its variance can be approximated by the method of statistical differentials using (1.30b). The result for the estimated version of the conditional variance then is

$$\widehat{\operatorname{Var}}\big[\hat h^{(3)}_j \mid \{n_j\}\big] \approx \frac{1}{w_j^2}\, \frac{\hat q_j}{n_j\, \hat p_j}. \tag{6.17d}$$

Often, (6.17c,d) are applied when there is censoring, so that $n_j$ is replaced with $n'_j$. This procedure is satisfactory as long as the numbers of censored lifetimes per interval do not differ much. As GEHAN (1969) reports, Monte Carlo studies have shown that $\hat h^{(1)}(t^*_j)$ is less biased than $\hat h^{(3)}_j$, and SINGPURWALLA/WONG (1983) report that $\hat h^{(3)}_j$ is almost positively biased whereas $\hat h^{(1)}(t^*_j)$ tends to be negatively biased as $t^*_j$ increases.
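The three interval estimators and their approximate variances can be computed side by side. The following sketch is one possible arrangement, assuming $0 < \hat q_j < 1$ and using $n'_j$ in place of $n_j$ in (6.17c,d) as described above; the function name `hazard_estimates` and the returned keys are illustrative. The call uses the tenth interval of Example 6/1 below ($d_j = 11$, $n'_j = 16.5$, $w_j = 24$).

```python
import math

def hazard_estimates(d_j, n_adj, w_j):
    """Actuarial estimator, death rate and Sacher's estimator for one interval.

    d_j   : failures in the interval
    n_adj : adjusted number at risk n'_j = n_j - c_j / 2
    w_j   : interval width
    Requires 0 < d_j < n_adj so that 0 < q < 1.
    """
    q = d_j / n_adj
    p = 1.0 - q
    h1 = d_j / (w_j * (n_adj - d_j / 2.0))                           # (6.14c)
    var_h1 = h1 ** 2 / (n_adj * q) * (1.0 - (w_j * h1 / 2.0) ** 2)   # (6.14d)
    h2 = q / w_j                                                     # (6.15a)
    var_h2 = p * q / (w_j ** 2 * n_adj)                              # (6.15b)
    h3 = -math.log(p) / w_j                                          # (6.17c)
    var_h3 = q / (w_j ** 2 * n_adj * p)                              # (6.17d)
    return {"h1": h1, "var_h1": var_h1, "h2": h2, "var_h2": var_h2,
            "h3": h3, "var_h3": var_h3}

est = hazard_estimates(d_j=11, n_adj=16.5, w_j=24.0)
for name, value in est.items():
    print(f"{name:7s} {value:.6e}")
```

Because $-\ln(1 - q) \ge q$, Sacher's estimator is never smaller than the death rate computed from the same $\hat q_j$, which the output makes visible.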

Example 6/1: Life testing of 200 pieces of equipment

n = 200 pieces of equipment had been put on a life test for a certain type of failure. The test was scheduled
to last at most tend = 240 hours. By the end of the test 5 pieces had not failed. The reason for censoring
in this life test is failure due to another reason than that under study and due to withdrawal of non–failed
units for special further investigations. Table 6/3 gives the data and Table 6/4 displays estimates together
with their estimated variances.
13 An estimator similar to (6.17c) is
$$\hat h^{(4)}_j = -\frac{1}{2}\,\big( \ln \hat p_{j-1} + \ln \hat p_j \big),$$
and from the series representation $\ln \hat p_j = \ln(1 - \hat q_j) = -\hat q_j - 0.5\, \hat q_j^2 - \ldots$ we are, for $h_j$ and $q_j$ very small, back to (6.15a).
Table 6/3: Data for life table estimation

 j   I_j          w_j   t*_j   d_j   c_j   n_j   n'_j
 1   0 − 24       24    12     1     0     200   200
 2   24 − 48      24    36     4     1     199   198.5
 3   48 − 72      24    60     6     1     194   193.5
 4   72 − 96      24    84     14    2     187   186
 5   96 − 120     24    108    24    3     171   169.5
 6   120 − 144    24    132    27    1     144   143.5
 7   144 − 168    24    156    40    3     116   114.5
 8   168 − 192    24    180    36    1     73    72.5
 9   192 − 216    24    204    17    2     36    35
10   216 − 240    24    228    11    1     17    16.5
11   240 − ∞      −     −      5     0     5     5

Table 6/4: Estimates (with variances) of life table quantities

Columns: (1) $\hat q_j$ and (2) $\hat p_j$, see (6.10a); (3) $\widehat{\operatorname{Var}}(\hat q_j \mid n'_j)$, see (6.10b), ×10^−4; (4) $\hat\Pi_j$, see (6.9a); (5) $\widehat{\operatorname{Var}}(\hat\Pi_j \mid \{n'_j\})$, see (6.10c), ×10^−4; (6) $\hat f(t^*_j)$, see (6.13a), ×10^−3; (7) $\widehat{\operatorname{Var}}[\hat f(t^*_j) \mid \{n'_j\}]$, see (6.13b), ×10^−4; (8) $\hat h^{(1)}(t^*_j)$, see (6.14c), ×10^−2; (9) $\widehat{\operatorname{Var}}[\hat h^{(1)}(t^*_j) \mid \{n'_j\}]$, see (6.14d), ×10^−6; (10) $\hat h^{(3)}(t^*_j)$, see (6.17c), ×10^−2; (11) $\widehat{\operatorname{Var}}[\hat h^{(3)}(t^*_j) \mid \{n'_j\}]$, see (6.17d), ×10^−6.

 j   (1)      (2)      (3)       (4)      (5)      (6)     (7)      (8)     (9)       (10)    (11)
 1   0.0050   0.9950   0.249     1        0        0.208   0.010    0.021   0.044     0.021   0.044
 2   0.0202   0.9798   0.995     0.9950   0.249    0.835   0.208    0.085   0.180     0.085   0.180
 3   0.0310   0.9690   1.553     0.9749   1.224    1.260   0.534    0.131   0.287     0.131   0.287
 4   0.0753   0.9247   3.742     0.9447   2.625    2.963   3.093    0.326   0.757     0.326   0.760
 5   0.1416   0.8584   7.171     0.8736   5.584    5.154   9.595    0.635   1.670     0.636   1.689
 6   0.1882   0.8118   10.645    0.7499   9.588    5.879   12.747   0.865   2.744     0.869   2.804
 7   0.3493   0.6507   19.852    0.6088   12.305   8.861   29.358   1.764   7.428     1.791   8.141
 8   0.4966   0.5034   34.481    0.3961   12.568   8.196   25.555   2.752   18.747    2.859   23.618
 9   0.4857   0.5143   71.370    0.1994   8.596    4.036   6.421    2.673   37.704    2.771   46.847
10   0.6667   0.3333   134.680   0.1026   5.112    2.849   3.495    4.167   118.371   4.578   210.438
11   1        0        0         0.0342   1.985    −       −        −       −         −       −

Figure 6/1: Plot of life table quantities

6.3 Related Hazard Rate estimators


Even if the data are grouped in fixed intervals, we may still have records of the exact failure and censoring times. Let $t^*_{ji}$ denote the recorded time of failure or censoring for the i–th unit in interval $I_j = [t_{j-1}, t_j)$; j = 1, 2, ..., m + 1; $t_{m+1} = \infty$. For $t^*_{ji}$ we have
$$t^*_{ji} = \begin{cases}
t^+_{ji} & \text{if unit } i \text{ failed in } I_j, \\
t^-_{ji} & \text{if unit } i \text{ has been censored in } I_j, \\
t_j & \text{if unit } i \text{ survived } I_j.
\end{cases} \tag{6.18a}$$

The $t^*_{ji}$’s can be processed to give the amount of ‘item × time units’ spent in $I_j$:
$$L_j = \sum_{i=1}^{n_j} (t^*_{ji} - t_{j-1}) = \sum_i (t^+_{ji} - t_{j-1}) + \sum_i (t^-_{ji} - t_{j-1}) + n_{j+1}\, w_j. \tag{6.18b}$$

$L_j$ is the sample equivalent of $L_x$ in (6.2a), called the amount exposed to risk. The central death rate for $I_j$, which is taken as an estimator of the hazard rate in $I_j$, results as
$$\hat h^{(5)}(t^*_j) = \frac{d_j}{L_j}, \tag{6.18c}$$
expressing the number of failures per item per unit of time. We note that when we assume that over $I_j$

• the time at failure has mean $(t_{j-1} + t_j)/2$ and

• the time of censoring has mean $(t_{j-1} + t_j)/2$, too,

then the average amount of item × time units is
$$L^*_j = w_j \left( n_j - \frac{c_j}{2} - \frac{d_j}{2} \right) = w_j \left( n'_j - \frac{d_j}{2} \right), \quad \text{see (6.4g).} \tag{6.18d}$$

With $L^*_j$ inserted in (6.18c) instead of $L_j$ we are back to $\hat h^{(1)}(t^*_j)$ of (6.14c).
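The relation between the exact exposure (6.18b) and its midpoint approximation (6.18d) can be checked on a toy interval. The sketch below is illustrative only: the function names and the recorded times are made up for the example, with failure and censoring times chosen so that their means coincide with the interval midpoint.

```python
def exposure(fail_times, cens_times, n_survivors, t_lo, t_hi):
    """Exact exposure L_j of (6.18b) and central death rate (6.18c)
    for the interval [t_lo, t_hi)."""
    spent_failed = sum(t - t_lo for t in fail_times)     # units failing in I_j
    spent_censored = sum(t - t_lo for t in cens_times)   # units censored in I_j
    L_j = spent_failed + spent_censored + n_survivors * (t_hi - t_lo)
    return L_j, len(fail_times) / L_j

def exposure_midpoint(n_j, d_j, c_j, w_j):
    """Approximate exposure L*_j of (6.18d)."""
    return w_j * (n_j - c_j / 2.0 - d_j / 2.0)

# Hypothetical interval [0, 24): 10 units enter, 2 fail, 1 is censored, 7 survive.
L_j, h5 = exposure([6.0, 18.0], [12.0], n_survivors=7, t_lo=0.0, t_hi=24.0)
L_star = exposure_midpoint(n_j=10, d_j=2, c_j=1, w_j=24.0)
print(L_j, L_star, h5)
```

With times that are not centred on the midpoint the two exposures differ, and so do $\hat h^{(5)}$ and $\hat h^{(1)}$.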
The sampling procedure or ‘stop rule’ for another hazard rate estimator is based on the number of failures observed rather than on the attainment of predetermined time limits, i.e., we now have fixed numbers of failures per interval and the boundaries of the intervals are random variables. Based on this sampling procedure SEAL (1954) has found an unbiased estimator of $h(t)$ when there is no censoring. The boundaries $t_j$ and the width $w_j$ of the j–th interval now are random and result from the cumulated numbers of failures $n_j$ such that $n_{j+1} - n_j = d_j$ is equal to the preassigned number of failures in $I_j$. If $w_j$ is small enough so that one may assume

$$h(t_{j-1} + \tau) = h_j \quad \text{for } 0 \le \tau < w_j \tag{6.19a}$$

the estimator given by SEAL reads

$$\hat h^{(6)}_j = \frac{d_j - 1}{\sum\limits_{i=0}^{d_j - 1} (n_j - i)\,\big(t^+_{j,i+1} - t^+_{ji}\big)}, \quad t_{j-1} \le t < t_j, \tag{6.19b}$$

where $t^+_{ji}$ is the time of failure of item i in $I_j$. The denominator in (6.19b) is nothing but the number of time units lived in $I_j$ by those items that failed in $I_j$. The estimated variance of $\hat h^{(6)}_j$ is

$$\widehat{\operatorname{Var}}\big(\hat h^{(6)}_j\big) \approx \frac{\big(\hat h^{(6)}_j\big)^2}{d_j - 2}. \tag{6.19c}$$

Assumption (6.19a), although suitable for some purposes, is not consistent with most data on mortality and failure intensities which indicate that $h(t)$ usually is a non–decreasing function of $t$. KIMBALL (1960) has given hazard rate estimators when (6.19a) does not hold, but instead it is assumed that $h(t)$ increases linearly with $\tau$ over the interval for which $h(t)$ is being estimated. The estimates and their variances are obtained recursively.
7 Maximum Likelihood Estimation of
Monotone Hazard Rates1
The direct non–parametric ML estimation of the hazard rate started as early as 1956 with papers by GRENANDER (1956) and KIEFER/WOLFOWITZ (1956). They assumed a continuous distribution with increasing hazard rate and a sample with uncensored data. Later papers generalized their ideas to decreasing as well as to U–shaped hazard rates, to discrete distributions and to censored samples, and established the statistical properties of the resulting estimators. A general drawback of the ML–estimated hazard rate is its non–smoothness, i.e., the hazard rate is assumed constant between observed failure times. Another drawback is that for U–shaped hazard rates the change point has to be known and that for monotone hazard rates one first has to test whether the distribution is IHR or DHR.2 We will only discuss the estimation of monotone hazard rates (for U–shaped hazard rates see MYKYTYN/SANTNER (1981)), and we will first present results for complete sample data (Sect. 7.1) and then give results for censored samples (Sect. 7.2).
ML methods are especially apt for processing non–grouped data of smaller sample sizes. When
we have a continuous lifetime distribution, the estimates found here subsequently should be
smoothed in one way or the other, whereas for a discrete lifetime distribution the estimates found
here are definitive.

7.1 The Case of Complete Samples


Since the hazard rate is given by
$$h(x) = \frac{f(x)}{S(x)}$$
and the survival function by
$$S(x) = \exp\left\{ -\int_0^x h(u)\, du \right\}$$
the likelihood function for n independent uncensored lifetimes $x_1 \le x_2 \le \ldots \le x_n$ follows as
$$L = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} h(x_i)\, S(x_i) = \prod_{i=1}^{n} h(x_i)\, \exp\left\{ -\int_0^{x_i} h(u)\, du \right\}, \tag{7.1a}$$
so that the log–likelihood is
$$\mathcal{L} = \ln L = \sum_{i=1}^{n} \ln h(x_i) - \sum_{i=1}^{n} \int_0^{x_i} h(u)\, du. \tag{7.1b}$$

1 Suggested reading for this chapter: BARLOW et al. (1972), GRENANDER (1956), KIEFER/WOLFOWITZ (1956), MARSHALL/PROSCHAN (1965), MYKYTYN/SANTNER (1981), PADGETT (1988), PADGETT/WEI (1980), PRAKASA RAO (1970)
2 Such a test will be presented in Sect. 10.2.2.
168 7 Maximum Likelihood Estimation of Monotone Hazard Rates

We first consider the case of a continuous IHR distribution. It is not possible to obtain a MLE directly by maximizing either the likelihood $L$ or the log–likelihood $\mathcal{L} = \ln L$, since $h(x)$ can be arbitrarily large. It follows from the argumentation of MARSHALL/PROSCHAN (1965) that
$$\mathcal{L} \le \sum_{i=1}^{n} \ln h(x_i) - \sum_{i=1}^{n-1} (n - i)\,(x_{i+1} - x_i)\, h(x_i) =: \mathcal{L}^*. \tag{7.1c}$$

The maximization of $\mathcal{L}^*$ subject to $h(x_1) \le h(x_2) \le \ldots \le h(x_n)$ is performed in GRENANDER (1956). This yields for $h(x_i)$:
$$\hat h(x_i) = \min_{\nu \ge i+1}\ \max_{\kappa \le i} \left\{ \frac{\nu - \kappa}{\sum\limits_{j=\kappa}^{\nu-1} (n - j)\,(x_{j+1} - x_j)} \right\}; \quad i = 1, 2, \ldots, n-1 \tag{7.1d}$$
and
$$\hat h(x_n) = \infty. \tag{7.1e}$$
For the remaining values of $x$, $\hat h(x)$ is determined as
$$\hat h(x) = \begin{cases}
0 & \text{for } x < x_1 \\
\hat h(x_i) & \text{for } x_i \le x < x_{i+1};\ i = 1, 2, \ldots, n-1 \\
\infty & \text{for } x \ge x_n
\end{cases}, \tag{7.1f}$$
so that $\hat h(x)$ is a monotone increasing step–function, see Fig. 7/1. The estimator (7.1d) is consistent, and its asymptotic distribution, which is not of a simple form, has been found by PRAKASA RAO (1970). The corresponding estimators of $S(x)$ and $f(x)$ are obtained using $\hat h(x)$:
$$\hat S(x) = \exp\left\{ -\int_0^x \hat h(u)\, du \right\} = \exp\left\{ -\sum \hat h(x_i)\, \big[ \min(x, x_{i+1}) - x_i \big] \right\} \tag{7.1g}$$
where the sum is over i such that $x_i \le x$, and
$$\hat f(x) = \hat h(x)\, \hat S(x). \tag{7.1h}$$

The resulting curve for $\hat S(x)$ has knees (break points) at the distinct failure times and that for $\hat f(x)$ looks rather strange, see Fig. 7/1.
The estimator in (7.1d) is built on quantities

Ti = (n − i) (xi+1 − xi ); i = 0, 1, ..., n − 1; (7.2a)

where x0 = 0. Ti is nothing but the total–time–on–test spent by the xi –survivors in the interval
[xi , xi+1 ), i.e., between the i–th and the (i + 1)–st failure times. Another estimator of h(x),
called naive estimator by some authors, is the reciprocal of Ti :
 
$$\tilde h(x) = \begin{cases}
\dfrac{1}{(n - i)\,(x_{i+1} - x_i)} & \text{for } x_i \le x < x_{i+1};\ i = 0, 1, \ldots, n-1 \\[2ex]
0 & \text{for } x \ge x_n.
\end{cases} \tag{7.2b}$$

The naive estimator is asymptotically unbiased, but it is not consistent since it has a limiting non-degenerate distribution. Furthermore, since for any n distinct time points $x_1, \ldots, x_n$ the estimators $\tilde h(x_1), \ldots, \tilde h(x_n)$ are asymptotically independent, the graph of $\tilde h(x)$, $x \ge 0$, will exhibit wild fluctuations, prohibiting its use as an estimator for a monotone hazard rate. For this reason SINGPURWALLA/WONG (1983) have proposed a smoothed version of $\tilde h(x)$, smoothed by kernel methods.
The estimator $\hat h(x_i)$ in (7.1d) can be interpreted as the result of averaging naive estimators until an increasing sequence $\hat h(x_1) \le \hat h(x_2) \le \ldots \le \hat h(x_n)$ has been found. First, the maximum of $\mathcal{L}^*$ in (7.1c) is found, giving $\tilde h(x_i)$ in (7.2b). If there is a reversal, say $\tilde h(x_i) > \tilde h(x_{i+1})$, then set $h(x_i) = h(x_{i+1})$ in (7.1c) and repeat the procedure. After at most n steps of this kind, a monotone estimator is obtained. The maximum of $\mathcal{L}^*$ derived with $h(x_i) = h(x_{i+1})$ can be directly obtained by replacing $\tilde h(x_i)$ and $\tilde h(x_{i+1})$ by their harmonic mean $\big\{ \big[ \tilde h(x_i)^{-1} + \tilde h(x_{i+1})^{-1} \big] / 2 \big\}^{-1}$. Succeeding steps amount to further such harmonic averaging which is extended just to the point necessary to eliminate all reversals.

Example 7/1: ML–estimation of an increasing hazard rate of a continuous distribution

The following n = 10 observations xi in Tab. 7/1 have been simulated from a W EIBULL distribution (see
Sect. 3.1) with parameters a = 0, b = 4 and c = 2 and thus come from an IHR distribution. Tab. 7/1
gives the ML–estimated hazard rate values according to (7.1d) together with the naive estimates according
to (7.2b).
Table 7/1: ML–estimates and naive estimates of an increasing hazard rate

i   $x_{i-1} \le x < x_i$   $\hat h(x)$   $\tilde h(x)$

1 0 0.69 0 0.1449
2 0.69 1.20 0.1720 0.2179
3 1.20 2.08 0.1720 0.1420
4 2.08 2.21 0.3110 1.0989
5 2.21 3.13 0.3110 0.1811
6 3.13 3.28 0.5894 1.3333
7 3.28 3.72 0.5894 0.5681
8 3.72 4.58 0.5894 0.3876
9 4.58 5.09 0.8230 0.9804
10 5.09 6.50 0.8230 0.7092

Figure 7/1: ML–estimates for a continuous distribution with increasing hazard rate

We now show how to find $\hat h(x)$ from $\tilde h(x)$. There is a first reversal of $\tilde h(x)$ between 0.2179 and 0.1420 (i = 2 and i = 3), so we replace both values by $\big[ (0.2179^{-1} + 0.1420^{-1}) / 2 \big]^{-1} = 0.1720$. The next reversal is between 1.0989 and 0.1811, and both values are replaced by $\big[ (1.0989^{-1} + 0.1811^{-1}) / 2 \big]^{-1} = 0.3110$. The next reversal involves 1.3333, 0.5681 and 0.3876, which are replaced by $\big[ (1.3333^{-1} + 0.5681^{-1} + 0.3876^{-1}) / 3 \big]^{-1} = 0.5894$. The last reversal, between 0.9804 and 0.7092, is removed by replacing both values with $\big[ (0.9804^{-1} + 0.7092^{-1}) / 2 \big]^{-1} = 0.8230$. Fig. 7/1 depicts the estimated hazard rate, the estimated density function and the estimated survival function (solid lines), each supplemented by their true curves (dashed lines).
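The averaging steps of this example can be automated by pooling adjacent violators. The sketch below is one way to do this and is not taken from the cited papers: each block stores the number of pooled intervals and their summed total time on test, so that the block value is the ratio appearing in (7.1d). The function name `ihr_mle` is illustrative, and distinct failure times are assumed.

```python
def ihr_mle(x):
    """Increasing-hazard ML estimates h_hat(x_1), ..., h_hat(x_{n-1})
    for a complete sample with distinct failure times."""
    xs = sorted(x)
    n = len(xs)
    # T_i = (n - i)(x_{i+1} - x_i) for i = 1, ..., n-1, see (7.2a)
    T = [(n - i) * (xs[i] - xs[i - 1]) for i in range(1, n)]
    blocks = []                      # [number of pooled intervals, pooled T]
    for t in T:
        blocks.append([1, t])
        # A reversal means the previous block has a larger rate than the last one:
        # cnt_prev / T_prev > cnt_last / T_last, compared by cross-multiplication.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            cnt, tot = blocks.pop()
            blocks[-1][0] += cnt     # pool the two blocks
            blocks[-1][1] += tot
    rates = []
    for cnt, tot in blocks:
        rates.extend([cnt / tot] * cnt)   # one value per original interval
    return rates

# Observations of Example 7/1.
x = [0.69, 1.20, 2.08, 2.21, 3.13, 3.28, 3.72, 4.58, 5.09, 6.50]
h_hat = ihr_mle(x)
print([round(h, 4) for h in h_hat])
```

Pooling counts and times rather than repeatedly averaging two already pooled values keeps the result equal to the harmonic mean of all naive estimates in a block, also when a block grows in several steps.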

We now turn to a continuous DHR distribution. Estimation in the DHR case parallels that of the
preceding IHR case with some obvious modifications:

1. As the hazard rate is assumed decreasing there is no trivial estimate for x < x1 .

2. For the same reason the estimator is defined only for x ≤ xn , but it may be extended
beyond xn in any manner that preserves the DHR property.

Thus the estimator in the DHR case reads

$$\hat h(x) = \hat h(x_i) \quad \text{for } x_{i-1} < x \le x_i;\ i = 2, 3, \ldots, n; \tag{7.3a}$$

where
$$\hat h(x_i) = \max_{\nu \ge i}\ \min_{\kappa \le i-1} \left\{ \frac{\nu - \kappa}{\sum\limits_{j=\kappa}^{\nu-1} (n - j)\,(x_{j+1} - x_j)} \right\}; \quad i = 2, 3, \ldots, n. \tag{7.3b}$$

As in the IHR case the estimator (7.3b) results from harmonic averaging the naive estimators until
a decreasing sequence has been reached.

Example 7/2: ML–estimation of a decreasing hazard rate of a continuous distribution

The n = 6 observations $x_i$ in Tab. 7/2 come from a WEIBULL distribution with parameters a = 0, b = 4 and c = 0.8, so the sampled population has a decreasing hazard rate. Tab. 7/2 shows the estimated hazard rate $\hat h(x)$ according to (7.3a,b) and the naive estimates. There is only one reversal, for i = 4 and i = 5. Replacing the values 0.1494 and 1.0000 by their harmonic mean yields $\big[ (0.1494^{-1} + 1^{-1}) / 2 \big]^{-1} = 0.2600$.

Table 7/2: ML–estimates and naive estimates of a decreasing hazard rate

i   $x_{i-1} \le x < x_i$   $\hat h(x)$   $\tilde h(x)$

2 0.25 0.60 0.5714 0.5714


3 0.60 1.30 0.3571 0.3571
4 1.30 3.53 0.2600 0.1494
5 3.53 4.03 0.2600 1.0000
6 4.03 7.89 0.2591 0.2591

Fig. 7/2 shows the estimated hazard rate, density function and survival function (solid lines) together with
the true curves (dashed lines).

Figure 7/2: ML–estimates for a continuous distribution with decreasing hazard rate

A related problem of interest occurs in the case of a discrete distribution. Let $F(\cdot)$ be a discrete distribution with probability mass $P_i$ at $x_i$, the $x_i$ ordered increasingly. Then, for convenience we encode $x_i =: i$; i = 1, 2, .... The ratio
$$h_i = \frac{P_i}{S_i}; \quad i = 1, 2, \ldots \tag{7.4a}$$
with survival function
$$S_i = \Pr(X \ge x_i) = \sum_{j=i}^{\infty} P_j \tag{7.4b}$$
is called discrete hazard rate.3 Given $h_i$ we find $P_i$ as4
$$P_i = h_i \prod_{j=1}^{i-1} (1 - h_j); \quad i = 1, 2, \ldots \tag{7.4c}$$

Let a sample of n independent observations from $F(\cdot)$ consist of $k_i$ occurrences of $x_i = i$; i = 1, 2, ..., m where
$$n = \sum_{i=1}^{m} k_i.$$
The log–likelihood function now reads
$$\mathcal{L} = \sum_{i=1}^{m} k_i \ln P_i. \tag{7.5a}$$

Inserting (7.4c) we find after some rearrangements5
$$\mathcal{L} = \sum_{i=1}^{m} \left[ k_i \ln h_i + \ln(1 - h_i) \sum_{j=i+1}^{m} k_j \right]. \tag{7.5b}$$
3 For more on discrete hazard rates see Sect. 1.2.1.
4 Remember: $\prod_{i=j}^{k} a_i = 1$ for $k < j$.
5 Remember: $\sum_{i=j}^{k} a_i = 0$ for $k < j$.

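The maximization of $\mathcal{L}$ with respect to $h_i$ yields the naive estimators

$$\tilde h_i = \begin{cases}
0 & \text{for } i < 1 \\[1ex]
\dfrac{k_i}{k_i + \ldots + k_m} & \text{for } i = 1, 2, \ldots, m \\[2ex]
1 & \text{for } i > m.
\end{cases} \tag{7.5c}$$

$\tilde h_i = k_i / (k_i + \ldots + k_m)$ is nothing but the estimator of the conditional probability $P_i / S_i = \Pr(X = x_i \mid X \ge x_i)$ where

$$\hat P_i = \frac{k_i}{n} \quad \text{and} \quad \hat S_i = \sum_{j=i}^{m} k_j \Big/ n.$$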

In Sect. 1.2.1 we have shown that for a discrete distribution the hazard rate corresponds to this conditional probability, see (1.60c).
An increasing hazard rate of a discrete distribution is found by maximizing (7.5b) subject to $h_1 \le h_2 \le \ldots \le h_m$. The result is

$$\hat h_i = \min_{i \le \nu \le m}\ \max_{\kappa \le i} \left\{ \frac{k_\kappa + k_{\kappa+1} + \ldots + k_\nu}{\sum\limits_{j=\kappa}^{\nu} (k_j + \ldots + k_m)} \right\}; \quad i = 1, 2, \ldots, m. \tag{7.6}$$

As in the continuous case, the $\hat h_i$ of (7.6) are found by averaging, here by adding numerators and denominators of the naive estimators involved in any reversal.
A decreasing discrete hazard rate is estimated as

$$\hat h_i = \max_{i \le \nu \le m}\ \min_{\kappa \le i} \left\{ \frac{k_\kappa + k_{\kappa+1} + \ldots + k_\nu}{\sum\limits_{j=\kappa}^{\nu} (k_j + \ldots + k_m)} \right\}; \quad i = 1, 2, \ldots, m. \tag{7.7}$$

Estimates of the probability mass function and the survival function can be found by evaluating (7.4c) and (7.4b), respectively, with the $\hat h_i$ of (7.6) or (7.7).

Example 7/3: ML–estimation of an increasing hazard rate of a discrete distribution

Tab. 7/3 contains the counts $k_i$ of n = 100 replicates of a binomial distribution with parameters N = 10 and P = 0.3. The binomial distribution always has an increasing hazard rate, see Sect. 3.2. In Tab. 7/3 we also give, besides the $\hat h_i$ according to (7.6), the naive estimates $\tilde h_i$ according to (7.5c).
We have a reversal between $\tilde h_6$ and $\tilde h_7$. The result of averaging through adding the pertinent numerators and denominators is $\tilde h_6^{\text{new}} = \tilde h_7^{\text{new}} = (8 + 1)/(12 + 4) = 0.5625$. Now, we have a reversal between $\tilde h_5$, $\tilde h_6^{\text{new}}$, $\tilde h_7^{\text{new}}$ and $\tilde h_8$. Thus, we replace these estimates by $(23 + 8 + 1 + 1)/(35 + 12 + 4 + 3) = 0.6111$.
Fig. 7/3 displays the estimated hazard rate together with the estimated PMF and survival function.
7.2 The case of Randomly Censored Samples 173

Table 7/3: ML–estimates and naive estimates of the increasing hazard rate of a discrete distribution

i   $k_i$   $\hat h_i$   $\tilde h_i$

1 2 0.0200 0.0200
2 12 0.1224 0.1224
3 30 0.3488 0.3488
4 21 0.3750 0.3750
5 23 0.6111 0.6571
6 8 0.6111 0.6667
7 1 0.6111 0.2500
8 1 0.6111 0.3333
9 2 1.0000 1.0000
10 0 1.0000 1.0000
11 0 1.0000 1.0000
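The entries of Table 7/3 for i = 1, ..., 9 can be reproduced by pooling numerators and denominators of the naive estimators (7.5c) whenever a reversal occurs. The sketch below is an illustration, not the authors' algorithm; the function name `discrete_ihr_mle` is made up, and the counts are passed only up to the last index m with a positive count so that every risk set is non-empty.

```python
def discrete_ihr_mle(k):
    """Increasing discrete hazard estimates for counts k_1, ..., k_m (k_m > 0)."""
    m = len(k)
    risk = [sum(k[i:]) for i in range(m)]    # k_i + ... + k_m for each i
    blocks = []                              # [pooled count, pooled risk, length]
    for k_i, r_i in zip(k, risk):
        blocks.append([k_i, r_i, 1])
        # Reversal: previous ratio exceeds the last one (cross-multiplied).
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            num, den, length = blocks.pop()
            blocks[-1][0] += num             # add numerators
            blocks[-1][1] += den             # add denominators
            blocks[-1][2] += length
    h = []
    for num, den, length in blocks:
        h.extend([num / den] * length)
    return h

# Counts k_1, ..., k_9 of Example 7/3 (k_10 = k_11 = 0 are left out).
k = [2, 12, 30, 21, 23, 8, 1, 1, 2]
h_discrete = discrete_ihr_mle(k)
print([round(v, 4) for v in h_discrete])
```

For indices beyond the last positive count the estimate is set to 1 by convention, as in (7.5c) and in the last two rows of the table.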

Figure 7/3: ML–estimates of an increasing discrete hazard rate

7.2 The case of Randomly Censored Samples


When the observations are singly censored on the right the formulas of the preceding section hold, but they can be evaluated only for the k < n failure times observed, and we will only have estimates $\hat h(x)$ for $x < x_k$.
The methods of this section require samples that are randomly censored on the right. The data set for such samples consists of the pairs $(y_i, \delta_i)$; i = 1, 2, ..., n. $\delta_i$ indicates whether $y_i$ is an uncensored observation ($\delta_i = 1$) or not ($\delta_i = 0$), and $y_i = \min(x_i, z_i)$, where $x_i$ is a realization of the lifetime variate $X$ of interest and $z_i$ a realization of some censoring variate $Z$, independent of $X$. Using the hazard rate $h(x) = f(x)/S(x)$ of $X$, the likelihood function, see (4.2g),

$$L = \prod_{i=1}^{n} f(y_i)^{\delta_i}\, S(y_i)^{1 - \delta_i} \tag{7.8a}$$

turns into
$$L = \prod_{i=1}^{n} h(y_i)^{\delta_i}\, S(y_i). \tag{7.8b}$$
From $S(y) = \exp\big[ -\int_0^y h(u)\, du \big]$ we finally have the log–likelihood function

$$\mathcal{L} = \sum_{i=1}^{n} \delta_i \ln h(y_i) - \sum_{i=1}^{n} \int_0^{y_i} h(u)\, du. \tag{7.8c}$$

Suppose that $h(x)$ is increasing and, without loss of generality, further assume that $y_1 \le y_2 \le \ldots \le y_n$. It follows from (7.1b) that
$$\mathcal{L} \le \sum_{i=1}^{n} \delta_i \ln h(y_i) - \sum_{i=1}^{n-1} (n - i)\,(y_{i+1} - y_i)\, h(y_i) =: \mathcal{L}^*, \tag{7.8d}$$

and the problem of maximizing $\mathcal{L}$ is equivalent to that of maximizing $\mathcal{L}^*$. The following results are due to PADGETT/WEI (1980).
We denote the distinct uncensored failure times by $x_1 < x_2 < ... < x_k$ and let $d_j$ be the number of uncensored failure times exactly at $x_j$; $j = 1, 2, ..., k$. Also, let $c_j$ denote the number of losses (due to censoring) which occur in the interval $[x_j, x_{j+1})$ for $j = 0, 1, ..., k$, where $x_0 = 0$ and $x_{k+1} = \infty$. Furthermore, let the times of the $c_j$ losses be denoted by $\ell_\iota^{(j)}$; $\iota = 1, 2, ..., \lambda_j$. The quantities just defined, except for the $\ell_\iota^{(j)}$, are illustrated in Fig. 4/1.
Now, for any given increasing $h(x)$, we can define

$$h^*(x) = \begin{cases} 0 & \text{for } x < x_1 \\ h(x_j) & \text{for } x_j \le x < x_{j+1};\ j = 1, 2, ..., k-1 \\ h(x_k) & \text{for } x \ge x_k. \end{cases}$$

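Then, for each $j$ we have that

$$\begin{aligned}
\sum_{i=a_j+1}^{b_j} (n-i)\,(y_{i+1}-y_i)\,h(y_i)
&\ge \Big[(n-a_j)\big(\ell_1^{(j)} - x_j\big) + (n-a_j-1)\big(\ell_2^{(j)} - \ell_1^{(j)}\big) + \dots + (n-b_j)\big(x_{j+1} - \ell_{c_j}^{(j)}\big)\Big]\, h^*(x_j) \\
&= \Big[\sum_{\iota=1}^{\lambda_j} \ell_\iota^{(j)} + (n-b_j)\,x_{j+1} - (n-a_j)\,x_j\Big]\, h(x_j),
\end{aligned} \tag{7.9a}$$

where

$$a_j = \sum_{i=0}^{j-1} c_i + \sum_{i=1}^{j} d_i, \tag{7.9b}$$

$$b_j = \sum_{i=0}^{j} c_i + \sum_{i=1}^{j} d_i. \tag{7.9c}$$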
Replacing $h\big(\ell_\iota^{(0)}\big)$ by zero for $\iota = 1, 2, ..., \lambda_0$, we have that

$$\mathcal{L} \le \sum_{j=1}^{k} d_j \ln h(x_j) - \sum_{j=1}^{k} \alpha_j\, h(x_j) =: \mathcal{L}^{**}, \tag{7.10a}$$

say, where

$$\alpha_j = \begin{cases} \displaystyle\sum_{\iota=1}^{c_j} \ell_\iota^{(j)} + (n-b_j)\,x_{j+1} - (n-a_j)\,x_j; & j = 1, 2, ..., k-1 \\[2ex] \displaystyle\sum_{\iota=1}^{c_k} \ell_\iota^{(k)} - c_k\,x_k; & j = k. \end{cases} \tag{7.10b}$$

Since $h^*(x)$ is increasing, it follows that the maximization of $\mathcal{L}$ is equivalent to that of $\mathcal{L}^{**}$. Note that only $\alpha_k$ can be zero, and this happens when there are no censored observations strictly larger than $x_k$, the largest uncensored lifetime observed. The problem of obtaining an estimator of $h(x)$ subject to its being increasing is thus reduced to that of maximizing $\mathcal{L}^{**}$ subject to the constraint $h(x_1) \le h(x_2) \le ... \le h(x_k)$.
In maximizing $\mathcal{L}^{**}$ we have to distinguish two cases.
1. The last observation $y_n$ is uncensored so that $\alpha_k = 0$. In this case $\mathcal{L}^{**}$ is unbounded, and it is not possible to find MLEs of $h(x)$ directly from $\mathcal{L}^{**}$. Following the argument of MARSHALL/PROSCHAN (1965) we estimate $h(x)$ by

$$\hat h(x) = \begin{cases} 0 & \text{for } x < x_1, \\ \hat h(x_j) & \text{for } x_j \le x < x_{j+1};\ j = 1, 2, ..., k-1; \\ \hat h(x_k) & \text{for } x \ge x_k, \end{cases} \tag{7.11a}$$

where

$$\hat h(x_j) = \min_{j \le \nu \le k-1}\ \max_{1 \le \kappa \le j}\ \frac{\sum_{\mu=\kappa}^{\nu} d_\mu}{\sum_{\mu=\kappa}^{\nu} \alpha_\mu};\ j = 1, ..., k-1; \qquad \hat h(x_k) = \infty. \tag{7.11b}$$

Strictly speaking, $\hat h(x)$ is not the MLE of $h(x)$, but it can be considered as the limit of a sequence of MLEs in the sense of MARSHALL/PROSCHAN (1965).

2. The last observation $y_n$ is censored so that $\alpha_k \ne 0$. In this case the MLE of $h(x)$ is given by (7.11a) with

$$\hat h(x_j) = \min_{j \le \nu \le k}\ \max_{1 \le \kappa \le j}\ \frac{\sum_{\mu=\kappa}^{\nu} d_\mu}{\sum_{\mu=\kappa}^{\nu} \alpha_\mu};\ j = 1, ..., k. \tag{7.11c}$$

The ML–estimation of a decreasing hazard rate with sample data randomly censored on the right
follows along the same lines as above. But there are some evident minor modifications which we
already encountered in Sect. 7.1 for the change–over from the IHR case to the DHR case.

• As h(x) is now assumed decreasing there is no trivial estimate of h(x) for x < x1 .

• For the same reason the estimate is defined only for x < xk , but it may be extended beyond
xk in any manner that preserves the DHR property.

So we have

$$\hat h(x) = \hat h(x_j) \quad \text{for } x_j \le x < x_{j+1};\ j = 1, 2, ..., k-1;$$

with

$$\hat h(x_j) = \max_{j \le \nu \le k-1}\ \min_{1 \le \kappa \le j}\ \frac{\sum_{\mu=\kappa}^{\nu} d_\mu}{\sum_{\mu=\kappa}^{\nu} \alpha_\mu};\ j = 1, ..., k-1. \tag{7.12}$$

Formulas (7.11a-c) and (7.12) do not only process multiply censored data sets, where the censored
and uncensored observations are mixed, but they also cope with complete data sets as well as with
data sets which are singly censored whether of type–I or of type–II. For these situations the input
has to be organized properly. First of all the observed times yi together with their indicators δi
have to be in ascending order with respect to y.

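• When there are no censored observations within the sample of size $n$ the input has to look like:
$$\begin{pmatrix} y_1 \le y_2 \le \dots \le y_{n-1} \le y_n \\ 1 \quad\ 1 \quad \dots \quad 1 \quad\ 1 \end{pmatrix}.$$

• When the sample of size $n$ is singly censored of type-I with censoring time $y_\ell$, $\ell \le n$, the input has to be
$$\begin{pmatrix} y_1 \le y_2 \le \dots \le y_{\ell-1} \le y_\ell = y_{\ell+1} = \dots = y_n \\ 1 \quad\ 1 \quad \dots \quad 1 \quad\ 0 \quad\ 0 \quad \dots \quad 0 \end{pmatrix}.$$

• When the sample of size $n$ is singly censored of type-II with censoring at the $k$-th failure the input should read
$$\begin{pmatrix} y_1 \le y_2 \le \dots \le y_{k-1} \le y_k = y_{k+1} = \dots = y_n \\ 1 \quad\ 1 \quad \dots \quad 1 \quad\ 1 \quad\ 0 \quad \dots \quad 0 \end{pmatrix}.$$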

Example 7/4: ML–estimation of an increasing hazard rate with multiply censored data

The following observations in Tab. 7/4 have been taken from KAPLAN/MEIER (1958, p. 464). The hazard rate has been found by (7.11a,c) because the last observation $y_8$ is censored.
Table 7/4: Hazard rate estimate for KAPLAN/MEIER's data set

 $i$   $y_i$   $\delta_i$   $j$   $x_{j-1} \le x < x_j$   $\hat h(x)$

 1     0.8     1            1     $-\infty$ ... 0.8        0.0000
 2     1.0     0
 3     2.7     0
 4     3.1     1            2     0.8 ... 3.1              0.0735
 5     5.4     1            3     3.1 ... 5.4              0.1087
 6     7.0     0
 7     9.2     1            4     5.4 ... 9.2              0.1087
 8    12.1     0            5     9.2 ... $\infty$         0.3448
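
The computation behind this example can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the author's program: the function name is hypothetical, $\alpha_j$ is computed from (7.10b) with $a_j$, $b_j$ from (7.9b,c), and the min-max formulas (7.11b,c) are evaluated by brute force. For the data of this example the ratios $d_j/\alpha_j$ are $1/13.6$, $1/9.2$, $1/9.2$ and $1/2.9$, which are already increasing.

```python
def ihr_mle_censored(y, delta):
    """Increasing hazard rate estimate from randomly right-censored data.

    Returns the distinct failure times x_j, the weights alpha_j of (7.10b)
    and the estimates h(x_j) from the min-max formulas (7.11b) or (7.11c).
    """
    n = len(y)
    obs = list(zip(y, delta))
    x = sorted({t for t, dl in obs if dl == 1})          # distinct failure times
    k = len(x)
    d = [sum(1 for t, dl in obs if dl == 1 and t == xj) for xj in x]
    a = sum(1 for t, dl in obs if dl == 0 and t < x[0])  # c_0: losses before x_1
    alpha = []
    for j in range(k):
        upper = x[j + 1] if j + 1 < k else float("inf")
        losses = [t for t, dl in obs if dl == 0 and x[j] <= t < upper]
        a += d[j]                  # a_j of (7.9b)
        b = a + len(losses)        # b_j of (7.9c)
        if j + 1 < k:
            alpha.append(sum(losses) + (n - b) * x[j + 1] - (n - a) * x[j])
        else:
            alpha.append(sum(losses) - len(losses) * x[j])
        a = b
    # alpha_k = 0 (no loss after x_k): use (7.11b) and set h(x_k) to infinity
    last = k if alpha[-1] > 0 else k - 1
    h = []
    for j in range(last):
        h.append(min(
            max(sum(d[kap:nu + 1]) / sum(alpha[kap:nu + 1]) for kap in range(j + 1))
            for nu in range(j, last)))
    if last < k:
        h.append(float("inf"))
    return x, alpha, h


y = [0.8, 1.0, 2.7, 3.1, 5.4, 7.0, 9.2, 12.1]
delta = [1, 0, 0, 1, 1, 0, 1, 0]
x, alpha, h = ihr_mle_censored(y, delta)
```

The brute-force min-max evaluation is quadratic in $k$ for each $j$; for larger samples a pool-adjacent-violators pass over the pairs $(d_j, \alpha_j)$ would be the more economical choice.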
8 Smooth Hazard Rate Estimators
In the preceding chapters we have presented several estimation approaches that only led to pointwise or to non-smooth estimates of the hazard rate. When lifetime is a continuous variate with a continuous hazard rate we would, of course, like to have a continuous estimate for its whole course or at least for the course between the shortest and longest lifetime observed. So, we have to look for an appropriate method to smooth or to graduate discrete hazard rate estimates. Many smoothing or graduating techniques have been developed in mathematics and in statistics, e.g., moving averages, least squares, splines, orthogonal series, wavelets, or kernels. However, an accepted standard solution to hazard rate smoothing does not exist. It seems that the kernel technique prevails because it rests upon an intuitive idea, has mathematical and statistical properties that are mostly known, and is relatively easy to implement. The kernel smoothing approach has been most thoroughly developed and has an extensive literature. For these reasons the focus of this chapter is on kernel smoothing, but we also present other techniques in the final sections of this chapter.

8.1 Kernel Smoothing¹


A kernel estimator is a convolution of a smooth function and a rough empirical function estimator
chosen in such a way as to produce a smooth functional estimator. The underlying idea is to take
advantage of the fact that this linear functional transfers continuity properties from the smooth
function, the so-called kernel,² to the final estimator. Although potentially useful in a variety of
settings, kernel methods have been principally exploited in four settings:

• probability density estimation,

• hazard rate estimation, which is very closely related to PDF estimation,

• spectral density estimation, and

• non–parametric regression.

In this Section 8.1 we will only present PDF and HR kernel smoothing for non–grouped data.

8.1.1 Motivation and Basic Concepts

Kernel estimators for the PDF and the HR have the same structure, similar formulas, and share the same set of problems and nearly the same set of approaches to solve these problems. We start this section by looking at PDF smoothers, which were the first field of kernel estimation and which began with papers by ROSENBLATT (1956) and PARZEN (1962), whereas HR estimation by kernels started a little later with papers by WATSON/LEADBETTER (1964a,b).

¹ Suggested reading for this section: IZENMAN (1991), PRAKASA RAO (1983), WAND/JONES (1995).
² CACOULOS (1966) appears to be the first to call this smoothing function a kernel function. Previously it was referred to as a weight function or as a window.

8.1.1.1 The Convolution Formula

The simplest kernel estimator of a PDF for an uncensored set of distinct and ordered observations $x_1 < x_2 < ... < x_n$ is given by

$$\hat f_n(x) = \frac{1}{n\,b} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{b}\right). \tag{8.1}$$

The idea of this estimator is the following: The empirical distribution function $\hat F_n(x) = i/n$, where $i$ is the number of sample observations less than or equal to $x$, and the empirical survival function $\hat S_n(x) = (n - i)/n$ are discrete functions, each placing mass $1/n$ at each of the observations $x_i$, giving the rough empirical function. By formula (8.1) this probability mass is smeared out continuously, according to the choice of the kernel. The kernel is a smooth function³ that determines the pattern of how the mass $1/n$ is redistributed around the observation $x_i$, and $b$, the bandwidth or window width,⁴ is responsible for 'how far' the kernel stretches out to either side of $x_i$ when the kernel is symmetric. Stated in another way, one may say that all observations $x_i$ that are within a distance $b$ on either side of a given point $x$ contribute to the density estimate at this point $x$.
(8.1) can be motivated by generalizing the sliding histogram:

$$\frac{\hat F_n(x + b) - \hat F_n(x - b)}{2\,b} = \frac{1}{2\,b} \int_{x-b}^{x+b} d\hat F_n(u) \tag{8.2a}$$

where $d\hat F_n(\cdot)$ is the empirical measure. (8.2a) is a special case of (8.1) when $d\hat F_n(\cdot) = 1/n$ and

$$K\!\left(\frac{x - x_i}{b}\right) = \begin{cases} 1/2 & \text{for } |x - x_i| \le b \\ 0 & \text{else.} \end{cases} \tag{8.2b}$$

The kernel (8.2b) is known as uniform kernel or rectangular kernel.


Generally, a kernel estimator of the PDF smoothes the increments of the empirical distribution function $\hat F_n(\cdot)$, which are also the decrements of the empirical survival function $\hat S_n(\cdot)$. It is more common to write the kernel PDF estimator by means of the empirical survival function because, using the KAPLAN/MEIER estimator of Sect. 5.1, we have a compact notation which covers both censored and uncensored data sets.
When there are no tied observations⁵ we denote any observed lifetime, whether uncensored or not, by $y_i$; these are given in ascending order. $\delta_i = 1$ indicates a failure time (uncensored lifetime) and $\delta_i = 0$ a censored observation. The data set thus consists of $n$ pairs $(y_i, \delta_i)$. The KME of $S(x)$ reads, see (5.3d):

$$\hat S_n(x) = \begin{cases} 1 & \text{for } x < x_1 \\ \displaystyle\prod_{i:\, y_i \le x} \left(\frac{n-i}{n-i+1}\right)^{\delta_i} & \text{else,} \end{cases} \tag{8.3a}$$

³ More on the properties and types of kernels is found in Sect. 8.1.1.3.
⁴ More on bandwidths is found in Sect. 8.1.1.4.
⁵ Censored observations may be tied among themselves or with uncensored data. In the latter case censored lifetimes are moved a little to the right of the uncensored lifetime so that censoring is assumed to happen later.

where $x_1 = \min_i (y_i, 1)$, i.e., $x_1$ is the shortest uncensored lifetime observed. Let

$${}_n\Delta_i = \hat S_n(y_{i-1}) - \hat S_n(y_i);\ i = 1, 2, ..., n;\ \hat S_n(y_0) = 1; \tag{8.3b}$$

be the magnitude of the discontinuity of $\hat S_n(\cdot)$ at $y_i$ with

$${}_n\Delta_i \begin{cases} = 0 & \text{for } (y_i, 0), \\ > 0 & \text{for } (y_i, 1). \end{cases} \tag{8.3c}$$

The PDF kernel estimator is then given by

$$\hat f_n(x) = \frac{1}{b} \sum_{i=1}^{n} {}_n\Delta_i\, K\!\left(\frac{x - y_i}{b}\right). \tag{8.3d}$$

In the special case where all observations are uncensored (δi = 1 ∀ i), Sbn (x) is a staircase with
n steps and 1/n as constant step height and (8.3d) reduces to (8.1). When there are censored
observations we have fewer steps and the step heights are greater than 1/n.
In the case of tied observations, di ≥ 1 being the size of the tie at xi with at least one di > 1, we
have k < n distinct uncensored lifetimes xi . In the interval [xi , xi+1 ); i = 0, 1, ..., k; between
two uncensored lifetimes, where x0 = 0 and xk+1 = ∞, there may be ci , ci ≥ 0, censored
lifetimes. The number of sample units at risk just before xi is

$$n_i = n_{i-1} - c_{i-1} - d_{i-1};\ i = 1, 2, ..., k;$$

where $n_0 = n$ and $d_0 = 0$.
For an illustration of these quantities see Fig. 4/1. The KME of the survival function in this case of tied uncensored lifetimes, see (5.3a), reads

$$\hat S_n(x) = \prod_{i:\, x_i \le x} \left(1 - \frac{d_i}{n_i}\right);\ i = 1, 2, ..., k; \tag{8.4a}$$

with step height

$${}_n\Delta_i = \hat S_n(x_{i-1}) - \hat S_n(x_i);\ i = 1, 2, ..., k;\ \hat S_n(x_0) = 1. \tag{8.4b}$$

With respect to the step height we can state

$${}_n\Delta_i \begin{cases} = d_i/n & \text{when the sample has no censored observations at all} \\ \ge d_i/n & \text{when there are censored observations somewhere in the sample.} \end{cases} \tag{8.4c}$$

The formula of the PDF kernel estimator again is (8.3d), but the summation now goes from $i = 1$ to $i = k$ and $y_i$ is replaced by $x_i$:

$$\hat f_n(x) = \frac{1}{b} \sum_{i=1}^{k} {}_n\Delta_i\, K\!\left(\frac{x - x_i}{b}\right). \tag{8.5}$$
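
As a minimal sketch of how (8.3a) to (8.3d) fit together for untied data, the following code computes the KAPLAN/MEIER step heights, taking the product-limit factor as $(n-i)/(n-i+1)$, and uses them as weights in the kernel sum. Function names are hypothetical, the choice of the EPANECHNIKOV kernel is arbitrary, and the data are those of KAPLAN/MEIER (1958) from Tab. 7/4.

```python
def km_step_heights(delta):
    """Step heights of the Kaplan-Meier estimate, cf. (8.3b), for untied
    observations sorted by time; delta[i-1] = 1 marks a failure."""
    n = len(delta)
    s_prev, heights = 1.0, []
    for i, d in enumerate(delta, start=1):
        s = s_prev * ((n - i) / (n - i + 1)) ** d   # product-limit factor
        heights.append(s_prev - s)
        s_prev = s
    return heights


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def pdf_kernel_estimate(x, y, delta, b, kernel=epanechnikov):
    """Kernel density estimate (8.3d) with Kaplan-Meier weights."""
    weights = km_step_heights(delta)
    return sum(w * kernel((x - yi) / b) for w, yi in zip(weights, y)) / b


y = [0.8, 1.0, 2.7, 3.1, 5.4, 7.0, 9.2, 12.1]
delta = [1, 0, 0, 1, 1, 0, 1, 0]
steps = km_step_heights(delta)
```

Because the largest observation is censored here, the step heights sum to less than one, so the resulting estimate does not integrate to one without further adjustment.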

We now turn to the kernel estimator of the hazard rate. The rough empirical estimator to be convolved with a smooth kernel is given by the increments of an estimated CHR. In Sect. 5.2 we have found two such estimators, the indirect or natural estimator, see (5.10), and the direct estimator, known as the NELSON/AALEN estimator, see (5.12a), which is nothing but the cumulation of the empirical hazard rate values. In kernel estimation the NELSON/AALEN estimator is preferred over the indirect estimator because it avoids taking logarithms and has a smaller variance, compare (5.11c) to (5.12b). So we have, when there are no tied observations,

$$\hat H_n(x) = \begin{cases} 0 & \text{for } x < x_1 = \min_i (y_i, 1) \\ \displaystyle\sum_{i:\, y_i \le x} \frac{\delta_i}{n-i+1} & \text{else,} \end{cases} \tag{8.6a}$$

with step height

$${}_nD_i = \hat H_n(y_i) - \hat H_n(y_{i-1});\ i = 1, 2, ..., n;\ \hat H_n(y_0) = 0 \tag{8.6b}$$

and

$${}_nD_i = \begin{cases} 0 & \text{for } (y_i, 0) \\ \dfrac{1}{n-i+1} & \text{for } (y_i, 1). \end{cases} \tag{8.6c}$$
For tied observations we have

$$\hat H_n(x) = \begin{cases} 0 & \text{for } x < x_1 = \min_i (y_i, 1) \\ \displaystyle\sum_{i:\, x_i \le x} \frac{d_i}{n_i} & \text{else,} \end{cases} \tag{8.7a}$$

with step height

$${}_nD_i = \hat H_n(x_i) - \hat H_n(x_{i-1});\ i = 1, 2, ..., k;\ \hat H_n(x_0) = 0 \tag{8.7b}$$

and

$${}_nD_i = \frac{d_i}{n_i} \ge \frac{d_i}{n}, \tag{8.7c}$$

where equality holds only as long as all $n$ sample units are still at risk, i.e., $n_i = n$.

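The HR kernel estimator is then given by

$$\hat h_n(x) = \begin{cases} \displaystyle\frac{1}{b} \sum_{i=1}^{n} {}_nD_i\, K\!\left(\frac{x - y_i}{b}\right) & \text{for a sample having no ties} \\[3ex] \displaystyle\frac{1}{b} \sum_{i=1}^{k} {}_nD_i\, K\!\left(\frac{x - x_i}{b}\right) & \text{for a sample with tied observations.} \end{cases} \tag{8.8}$$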
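
The first line of (8.8) can be sketched as follows for untied data sorted by time. The function names are hypothetical and the EPANECHNIKOV kernel is an arbitrary choice; the increments are those of (8.6c). The data are again those of Tab. 7/4.

```python
def nelson_aalen_increments(delta):
    """Increments (8.6c) of the Nelson-Aalen estimate for untied data
    sorted by time: 1/(n-i+1) at a failure, 0 at a censored time."""
    n = len(delta)
    return [d / (n - i + 1) for i, d in enumerate(delta, start=1)]


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def hazard_kernel_estimate(x, y, delta, b, kernel=epanechnikov):
    """Kernel-smoothed hazard rate, first line of (8.8)."""
    increments = nelson_aalen_increments(delta)
    return sum(D * kernel((x - yi) / b) for D, yi in zip(increments, y)) / b


y = [0.8, 1.0, 2.7, 3.1, 5.4, 7.0, 9.2, 12.1]
delta = [1, 0, 0, 1, 1, 0, 1, 0]
D = nelson_aalen_increments(delta)
h_at_failure = hazard_kernel_estimate(3.1, y, delta, b=1.0)
```

With $b = 1$ only the failure at $y_4 = 3.1$ contributes at $x = 3.1$, since the neighbouring observation $2.7$ is censored and carries no increment.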

8.1.1.2 Performance of Kernel Smoothing

Like any statistical procedure, kernel estimators are recommended only if they possess desirable
properties. These properties depend on — besides the sample size — the chosen kernel and the
chosen size of the bandwidth, but the greatest influence comes from the bandwidth. Finite–sample
properties are available for special situations, but, in general, research emphasis is and has been
on large–sample properties.
Let ψ(x) denote the continuous curve to be estimated by kernel techniques, e.g.,

• the PDF f (x) in density estimation,

• the HR h(x) in hazard rate estimation,



• the regression function in regression estimation, or

• the spectral density in spectral estimation,

and let $\hat\psi(x)$ denote its kernel estimator.
Consider, for example, unbiasedness. The estimator $\hat\psi(x)$ is unbiased for $\psi(x)$ if, for all $x \in \mathbb{R}$, which, without loss of generality, is assumed to be the domain of $\psi(x)$, we have $E[\hat\psi(x)] = \psi(x)$. Unbiasedness seldom exists for kernel estimators, hence attention has focused on sequences $\{\hat\psi_n(x)\}$ of kernel estimators that are asymptotically unbiased for $\psi(x)$, that is, for all $x \in \mathbb{R}$, $E[\hat\psi_n(x)] \to \psi(x)$ as $n \to \infty$.
A more important property is consistency. As $n \to \infty$, $\hat\psi(x)$ is weakly pointwise consistent for $\psi(x)$ if $\hat\psi(x) \to \psi(x)$ in probability for every $x \in \mathbb{R}$, and it is strongly consistent if convergence holds almost surely. Other types of consistency depend on the error criterion chosen, i.e., on the distance function (= metric) used, either the $L_1$-norm or the $L_2$-norm, but $L_2$-approaches are dominant as being more tractable than $L_1$-approaches.
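We first look at $L_2$-approaches, also known as squared-error criteria. If $\psi(x)$ is assumed square-integrable, then the performance of $\hat\psi(x)$ at $x \in \mathbb{R}$ is measured by the mean squared error

$$\mathrm{MSE}\big[\hat\psi(x)\big] = E\big[\{\hat\psi(x) - \psi(x)\}^2\big] = \mathrm{Var}\big[\hat\psi(x)\big] + \big\{\mathrm{Bias}\big[\hat\psi(x)\big]\big\}^2 \tag{8.9a}$$

where the expectation is with respect to random sampling and

$$\mathrm{Var}\big[\hat\psi(x)\big] = E\big[\{\hat\psi(x) - E[\hat\psi(x)]\}^2\big] \tag{8.9b}$$

$$\mathrm{Bias}\big[\hat\psi(x)\big] = E\big[\hat\psi(x)\big] - \psi(x). \tag{8.9c}$$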
If $\mathrm{MSE}[\hat\psi(x)] \to 0$ for all $x \in \mathbb{R}$ as $n \to \infty$, then $\hat\psi(x)$ is said to be pointwise consistent in quadratic mean. More important is to measure how well the entire curve $\hat\psi(x)$ estimates $\psi(x)$. One such measure of goodness of fit is found by integrating $\mathrm{MSE}[\hat\psi(x)]$ over all values of $x$, yielding the integrated mean squared error⁶

$$\mathrm{IMSE}\big[\hat\psi(\cdot)\big] = \int E\big[\{\hat\psi(x) - \psi(x)\}^2\big]\,dx. \tag{8.10}$$

Another measure is the integrated squared error

$$\mathrm{ISE}\big[\hat\psi(\cdot)\big] = \int \{\hat\psi(x) - \psi(x)\}^2\,dx. \tag{8.11a}$$

Taking expectation over $\hat\psi(\cdot)$ in (8.11a) gives the mean integrated squared error

$$\mathrm{MISE}\big[\hat\psi(\cdot)\big] = E\big\{\mathrm{ISE}\big[\hat\psi(\cdot)\big]\big\}. \tag{8.11b}$$

Note that $\mathrm{MISE}[\hat\psi(\cdot)] = \mathrm{IMSE}[\hat\psi(\cdot)]$. $\mathrm{ISE}[\hat\psi(\cdot)]$ is often preferred as a criterion rather than its mean MISE, since ISE determines how closely $\hat\psi(\cdot)$ approximates $\psi(\cdot)$ for a given data set, whereas MISE is concerned with the average over all possible data sets. On the other hand, asymptotic MSE and MISE are more often used to find the optimal kernel and the optimal bandwidth, as will be shown below.

⁶ At all times, an unqualified integral sign $\int$ will be taken to mean integration over $\mathbb{R}$.
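
The identity of MISE and IMSE rests on interchanging expectation and integration. A brief sketch of the step, assuming that the squared error is jointly measurable so that TONELLI's theorem applies to the non-negative integrand:

```latex
\begin{align*}
\mathrm{MISE}\big[\hat\psi(\cdot)\big]
  &= E\Big[\int \{\hat\psi(x)-\psi(x)\}^2\,dx\Big] \\
  &= \int E\big[\{\hat\psi(x)-\psi(x)\}^2\big]\,dx
   = \int \mathrm{MSE}\big[\hat\psi(x)\big]\,dx
   = \mathrm{IMSE}\big[\hat\psi(\cdot)\big].
\end{align*}
```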

When $L_2$-approaches are used in PDF kernel estimation the tail behavior of the density becomes less important, possibly resulting in peculiarities in the tails of the density estimates. For this and other reasons some authors prefer $L_1$-approaches like the integrated absolute error

$$\mathrm{IAE}\big[\hat\psi(\cdot)\big] = \int \big|\hat\psi(x) - \psi(x)\big|\,dx \tag{8.12a}$$

which is invariant under monotone transformations, with $0 \le \mathrm{IAE} \le 2$ for $\psi(x) = f(x)$. The expectation of (8.12a) over all $\hat\psi(\cdot)$ yields the mean integrated absolute error

$$\mathrm{MIAE}\big[\hat\psi(\cdot)\big] = E\big\{\mathrm{IAE}\big[\hat\psi(\cdot)\big]\big\}. \tag{8.12b}$$

Obtaining $L_1$-results is more laborious than obtaining analogous $L_2$-results. It should be realized that the MIAE and the MISE do not necessarily conform to the human perception of closeness of a curve estimate to its target.
We now take a closer look at the $L_2$-criteria MSE and MISE when a PDF is to be estimated. These results for $f(x)$ are needed in Sect. 8.1.2 on indirect hazard rate smoothing, which is based on $\hat f_n(x)$. Furthermore, these results can, more or less easily, be transferred to and generalised for direct hazard rate smoothing, i.e., smoothing the increments of the empirical cumulative hazard rate as given in (8.8). To keep things easy we assume an uncensored sample with untied observations so that the estimator to be studied reads

$$\hat f_n(x) = \frac{1}{n\,b_n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{b_n}\right), \tag{8.13}$$

where $X_i$ is the $i$-th ordered lifetime to be observed in the sample.


We will make the following assumptions concerning the PDF, the kernel and the bandwidth:

(i) $f(x)$ is such that its second derivative $f''(x)$, which measures the curvature of $f(x)$, is continuous, square integrable and ultimately monotone.⁷

(ii) The bandwidth $b := b_n$ is a non-random sequence of positive numbers, where the dependence on the sample size $n$ will be suppressed in the following formulas in order to keep the notation as lean as possible, but we assume

$$\lim_{n\to\infty} b_n = 0 \quad\text{and}\quad \lim_{n\to\infty} (n\,b_n) = \infty,$$

i.e., $b_n$ approaches zero, but at a rate slower than $n^{-1}$.

(iii) The kernel is a bounded PDF which is symmetric about $X_i$ and has a finite second moment about the origin.

We first look at the expectation of (8.13) at a given $x \in \mathbb{R}$. The $X_i$'s in (8.13) are iid (= independently and identically distributed) variables with the same PDF as given by the target function $f(\cdot)$. So we have

$$\begin{aligned}
E\big[\hat f_n(x; b)\big] &= \frac{1}{n\,b}\, E\!\left[\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{b}\right)\right] \\
&= \frac{1}{n\,b} \sum_{i=1}^{n} \int K\!\left(\frac{x - v}{b}\right) f(v)\,dv \\
&= \frac{1}{b} \int K\!\left(\frac{x - v}{b}\right) f(v)\,dv.
\end{aligned}$$

⁷ An ultimately monotone function is one that is monotone over both $(-\infty, x^*)$ and $(x^*, \infty)$ for some $x^* > 0$.

Introducing a new variable

$$z = \frac{x - v}{b}$$

and observing the symmetry of $K(\cdot)$ around $z = 0$ we arrive at

$$E\big[\hat f_n(x; b)\big] = \int K(z)\, f(x - b\,z)\,dz. \tag{8.14a}$$

Expanding $f(x - b\,z)$ in a TAYLOR series around $x$ gives

$$f(x - b\,z) = f(x) - b\,z\,f'(x) + \frac{1}{2}\, b^2 z^2 f''(x) + o(b^2) \tag{8.14b}$$

uniformly in $z$, with a remainder that approaches zero more rapidly than $b^2$ goes to zero as $n \to \infty$. With (8.14b) we have for (8.14a):

$$E\big[\hat f_n(x; b)\big] = f(x) + \frac{1}{2}\, b^2 f''(x) \int z^2 K(z)\,dz + o(b^2) \tag{8.14c}$$

because assumption (iii) above implies

$$\int K(z)\,dz = 1, \quad \int z\,K(z)\,dz = 0, \quad \int z^2 K(z)\,dz < \infty.$$

For the second moment about zero of the kernel we write

$$\mu_2(K) := \int z^2 K(z)\,dz \tag{8.15}$$

and thus find the bias of $\hat f_n(x; b)$ as

$$\mathrm{Bias}\big[\hat f_n(x; b)\big] = E\big[\hat f_n(x; b)\big] - f(x) = \frac{1}{2}\, b^2 f''(x)\, \mu_2(K) + o(b^2). \tag{8.16}$$
2
A closer look at (8.16) reveals that the bias

• is of order $O(b^2)$, implying that $\hat f_n(x; b)$ is asymptotically unbiased,

• is proportional to the square of the given bandwidth $b$,
• is proportional to the variance of the kernel,⁸
• depends on the second derivative of the target function $f(x)$; it is zero when there is no curvature at $x$, and the greater $|f''(x)|$, the larger its magnitude.
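
The quality of the leading bias term in (8.16) can be checked in a case where the expectation (8.14a) is known in closed form. The sketch below assumes a standard normal target density and the GAUSS kernel (for which $\mu_2(K) = 1$); smoothing then amounts to convolving two normal densities, so that $E[\hat f_n(x; b)]$ is the density of a normal distribution with variance $1 + b^2$. Function names are hypothetical.

```python
import math


def normal_pdf(x, sigma=1.0):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def exact_bias(x, b):
    """E[f_n(x;b)] - f(x) for f = N(0,1) and the Gauss kernel:
    the convolution of N(0,1) with N(0,b^2) is N(0, 1 + b^2)."""
    return normal_pdf(x, math.sqrt(1.0 + b * b)) - normal_pdf(x)


def leading_bias(x, b, mu2=1.0):
    """Leading term of (8.16): (1/2) b^2 f''(x) mu_2(K),
    with f''(x) = (x^2 - 1) f(x) for the standard normal density."""
    return 0.5 * b * b * (x * x - 1.0) * normal_pdf(x) * mu2


for b in (0.4, 0.2, 0.1, 0.05):
    print(b, exact_bias(0.0, b), leading_bias(0.0, b))
```

At the mode $x = 0$ the curvature is negative, so the estimator is biased downwards; at the inflection points $x = \pm 1$ the leading term vanishes and only the $o(b^2)$ remainder is left.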

We now look at the variance of (8.13) which, after some manipulations like those for finding the expectation, reads

$$\begin{aligned}
\mathrm{Var}\big[\hat f_n(x; b)\big] &= \frac{1}{n\,b} \int K(z)^2 f(x - b\,z)\,dz - \frac{1}{n}\, \big\{E\big[\hat f_n(x; b)\big]\big\}^2 \\
&= \frac{1}{n\,b} \int K(z)^2 \big\{f(x) + o(1)\big\}\,dz - \frac{1}{n}\, \big\{f(x) + o(1)\big\}^2 \\
&= \frac{1}{n\,b}\, f(x) \int K(z)^2\,dz + o\big[(n\,b)^{-1}\big] \\
&= \frac{1}{n\,b}\, f(x)\, R(K) + o\big[(n\,b)^{-1}\big]
\end{aligned} \tag{8.17a}$$

⁸ Remember that, because of $\mu_1(K) = \int z\,K(z)\,dz = 0$, $\mu_2(K)$ is equal to the variance. Tab. 8/1 shows this variance for different kernels.

where we have introduced⁹

$$R(K) := \int K(z)^2\,dz, \tag{8.17b}$$

and in general

$$R(\varphi) := \int \varphi(u)^2\,du \tag{8.18}$$

for any square-integrable function $\varphi(\cdot)$. Since the variance is of order $(n\,b)^{-1}$, assumption (ii) above assures that $\mathrm{Var}[\hat f_n(x; b)]$ converges to zero.
Adding (8.17a) and the square of (8.16) gives the mean squared error of $\hat f_n(x; b)$:

$$\mathrm{MSE}\big[\hat f_n(x; b)\big] = \frac{1}{n\,b}\, f(x)\, R(K) + \frac{1}{4}\, b^4 f''(x)^2 \mu_2(K)^2 + o\big[(n\,b)^{-1} + b^4\big]. \tag{8.19}$$

Integrating (8.19) under the integrability assumptions in (i) above we obtain

$$\mathrm{MISE}\big[\hat f_n(\cdot; b)\big] = \frac{1}{n\,b}\, R(K) + \frac{1}{4}\, b^4 \mu_2(K)^2 R(f'') + o\big[(n\,b)^{-1} + b^4\big]. \tag{8.20}$$

The first two terms on the right-hand side constitute AMISE, the asymptotic mean integrated squared error:

$$\mathrm{AMISE}\big[\hat f_n(\cdot; b)\big] = \frac{1}{n\,b}\, R(K) + \frac{1}{4}\, b^4 \mu_2(K)^2 R(f''), \tag{8.21}$$
which is a large-sample approximation to the MISE. Besides $n$, AMISE is influenced by

• the bandwidth $b$,

• the kernel $K$ and

• the target function $f(\cdot)$ via its curvature $f''(\cdot)$.

We see that the second term of AMISE, the integrated squared bias, is proportional to $b^4$, so for this term to decrease one needs to take $b$ to be small. However, taking $b$ small means an increase in the leading factor of the first term, the integrated variance, which is proportional to $(n\,b)^{-1}$. Therefore, as $n$ increases $b$ should vary in such a way that each of the two terms of AMISE becomes smaller. This is known as the variance-bias trade-off and is in accordance with the intuitive role of $b$ demonstrated in Fig. 8/1 below. For very small $b$, $\hat f_n(\cdot; b)$ is very spiky and hence very variable in the sense that, over repeated sampling from $f(\cdot)$, the spikes would appear in different places. There is, however, very little bias.
(8.21) lends itself to finding the optimal bandwidth with respect to this criterion. The bandwidth minimizing AMISE can be given in closed form as

$$b_{\mathrm{AMISE}} = \left[\frac{R(K)}{n\, R(f'')\, \mu_2(K)^2}\right]^{1/5}. \tag{8.22}$$

Aside from its dependence on the known $R(K)$, $\mu_2(K)$ and $n$, (8.22) shows that $b_{\mathrm{AMISE}}$ is inversely proportional to the unknown $R(f'')^{1/5}$. The functional $R(f'') = \int f''(x)^2\,dx$ measures the total curvature of $f(\cdot)$. Thus, for a PDF with little curvature, $R(f'')$ will be small and a large bandwidth is called for. On the other hand, when $R(f'')$ is large, little smoothing with a smaller bandwidth will be optimal. Unfortunately, direct use of (8.22) to choose a good bandwidth in practice is impossible since $R(f'')$ is not known. Some proposals for estimating $R(f'')$ and then selecting $b$ will be presented in Sect. 8.1.1.4.

⁹ Tab. 8/1 shows $R(K)$ for different kernels.

Inserting (8.22) into (8.21) leads to the smallest possible AMISE of $\hat f_n(\cdot; b)$ using a given kernel $K$:

$$\inf_{b>0} \mathrm{AMISE}\big[\hat f_n(\cdot; b)\big] = \frac{5}{4}\, \big[\mu_2(K)^2\, R(K)^4\, R(f'')\big]^{1/5}\, n^{-4/5}. \tag{8.23}$$

This expression gives the rate of convergence of the minimum AMISE to zero as $n \to \infty$. Under the stated assumptions, the best obtainable rate is of order $n^{-4/5}$. This rate is slower than the typical parametric rate of order $n^{-1}$, e.g., $\widehat{E(X)} = \bar X$ with $\mathrm{Var}(\bar X) = \mathrm{Var}(X)/n$. To arrive at a higher order of convergence one has to choose special kernels, the so-called higher-order kernels, see Sect. 8.1.1.3.
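
Formulas (8.21) to (8.23) are easy to check numerically once $R(f'')$ is known. The sketch below assumes, purely for illustration, a normal target density with standard deviation $\sigma$, for which $R(f'') = 3/(8\sqrt{\pi}\,\sigma^5)$, together with the GAUSS kernel, where $R(K) = 1/(2\sqrt{\pi})$ and $\mu_2(K) = 1$. In that case (8.22) reduces to $b_{\mathrm{AMISE}} = (4/(3n))^{1/5}\,\sigma$. The function names are hypothetical.

```python
import math


def amise(b, n, R_K, mu2_K, R_f2):
    """Asymptotic mean integrated squared error (8.21)."""
    return R_K / (n * b) + 0.25 * b ** 4 * mu2_K ** 2 * R_f2


def b_amise(n, R_K, mu2_K, R_f2):
    """AMISE-optimal bandwidth (8.22)."""
    return (R_K / (n * R_f2 * mu2_K ** 2)) ** 0.2


def min_amise(n, R_K, mu2_K, R_f2):
    """Smallest attainable AMISE (8.23)."""
    return 1.25 * (mu2_K ** 2 * R_K ** 4 * R_f2) ** 0.2 * n ** (-0.8)


sigma, n = 1.0, 80
R_f2 = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)  # R(f'') of a normal density (assumed target)
R_K, mu2_K = 1.0 / (2.0 * math.sqrt(math.pi)), 1.0    # Gauss kernel
b_opt = b_amise(n, R_K, mu2_K, R_f2)
```

Evaluating `amise` slightly to either side of `b_opt` gives larger values, and `amise(b_opt, ...)` agrees with `min_amise(...)`, which is a quick way of confirming that (8.23) follows from (8.21) and (8.22).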

Example 8/1: Effect of varying bandwidth on smoothed PDF estimates

We have randomly generated $n = 80$ realizations of the following mixture of two normal distributions:

$$0.75 \cdot No(\mu = 0, \sigma = 0.6) + 0.25 \cdot No(\mu = 1, \sigma = 0.2).$$

Fig. 8/1 displays the true density as a solid line. Using the biweight kernel, see Tab. 8/1, with three different bandwidths we have smoothed the data. Smoothing with $b = 0.2$ gives a very rugged curve because the kernel is very narrow and the averaging process only covers relatively few observations. This estimate pays too much attention to the particular data set at hand and does not allow for the variation across the sample and thus is undersmoothed. Using $b = 0.9$ results in a much smoother estimate of $f(\cdot)$ which is really too smooth since the true bimodality has been smoothed away, so this is an oversmoothed estimate. The graph in the middle of Fig. 8/1 is a compromise with $b = 0.6$. This kernel estimate is not overly noisy and the structure of the true density, i.e., its bimodality, has been recovered.

Figure 8/1: Biweight–kernel smoothing with different bandwidths
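
An experiment of this kind can be sketched as follows. The sample generated here is not the one underlying Fig. 8/1: the seed is arbitrary, the function names are hypothetical, and the number of local maxima found on a grid is only a crude indicator of under- or oversmoothing, so the printed counts need not match the figure.

```python
import random


def biweight(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def kde(x, data, b, kernel=biweight):
    """Basic kernel density estimate (8.1)."""
    return sum(kernel((x - xi) / b) for xi in data) / (len(data) * b)


def count_modes(data, b, lo=-3.0, hi=3.0, m=600):
    """Number of strict local maxima of the estimate on an equally spaced grid."""
    grid = [lo + (hi - lo) * j / m for j in range(m + 1)]
    f = [kde(x, data, b) for x in grid]
    return sum(1 for j in range(1, m) if f[j - 1] < f[j] > f[j + 1])


rng = random.Random(1)  # arbitrary seed, for reproducibility only
sample = [rng.gauss(0.0, 0.6) if rng.random() < 0.75 else rng.gauss(1.0, 0.2)
          for _ in range(80)]

for b in (0.2, 0.6, 0.9):
    print("b =", b, "local maxima on the grid:", count_modes(sample, b))
```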



8.1.1.3 Kernel Selection

The simplest class of kernels consists of symmetric¹⁰ PDFs satisfying

(1) $K(u) \ge 0\ \forall\ u \in \mathbb{R}$,

(2) $K(u) = K(-u) \Longrightarrow \int u\,K(u)\,du = 0$,

(3) $\int K(u)\,du = 1$,

(4) $\int u^2 K(u)\,du =: \mu_2(K) \ne 0$.

Because of (4) such a kernel is called a kernel of order 2 or second order kernel. The argument of the kernel is the scaled variable

$$u = \frac{x - x_i}{b}.$$
Second order kernels with an infinite support are, e.g.:

• the GAUSS kernel,

• the CAUCHY kernel,

• the LAPLACE kernel,

• the logistic kernel.

For more details of these kernels see Tab. 8/1. More popular, especially in HR and PDF estimation of lifetime data, are kernels with finite support which mostly are polynomial functions related to the beta distribution, more precisely, they are symmetric beta distributions on the interval $[-1, 1]$. Their generating formula is

$$K(u) = \kappa_{r,s}\, \big(1 - |u|^r\big)^s\, I_{|u| \le 1} \tag{8.24a}$$

with

$$\kappa_{r,s} = \frac{r}{2\, B(s + 1, 1/r)},\quad r > 0,\ s \ge 0, \tag{8.24b}$$

and the beta function

$$B(p, q) = \int_0^1 v^{p-1} (1 - v)^{q-1}\,dv = \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p + q)}. \tag{8.24c}$$

Among these kernels, some of which come with different names, the most popular are:

• the uniform kernel or rectangular kernel with $s = 0,\ r = 1 \Longrightarrow \kappa_{1,0} = 1/2$;

• the triangular kernel with $s = 1,\ r = 1 \Longrightarrow \kappa_{1,1} = 1$;

• the EPANECHNIKOV kernel or quadratic kernel with $s = 1,\ r = 2 \Longrightarrow \kappa_{2,1} = 3/4$;

• the biweight kernel or quartic kernel or biquadratic kernel with $s = 2,\ r = 2 \Longrightarrow \kappa_{2,2} = 15/16$;

• the triweight kernel or triquadratic kernel with $s = 3,\ r = 2 \Longrightarrow \kappa_{2,3} = 35/32$;

• the tricube kernel with $s = 3,\ r = 3 \Longrightarrow \kappa_{3,3} = 70/81$.

¹⁰ Asymmetric kernels will be needed when estimating near the boundaries, see further down.

After a suitable rescaling the GAUSS kernel is seen to be of the above type with $r = 2$, $s = \infty$. Two other kernels with finite support but not of polynomial type are the cosine kernel

$$K(u) = \frac{\pi}{4} \cos\!\left(\frac{\pi}{2}\, u\right) I_{|u| \le 1} \tag{8.25}$$

and the semi-elliptical kernel

$$K(u) = \frac{2}{\pi} \sqrt{1 - u^2}\; I_{|u| \le 1}. \tag{8.26}$$

Tab. 8/1 summarizes the kernels mentioned above and displays them together with the pertaining $\mu_2(K) = \int u^2 K(u)\,du$ and $R(K) = \int K(u)^2\,du$. $I_{|u| \le 1}$ is the indicator function:

$$I_{|u| \le 1} = \begin{cases} 1 & \text{for } u \in [-1, 1] \\ 0 & \text{else.} \end{cases} \tag{8.27}$$

Fig. 8/2 displays all the kernels mentioned above.
Table 8/1: Common kernels
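Name                            Formula                                                                  $\mu_2(K)$                          $R(K)$

uniform (rectangular)           $K(u) = \frac{1}{2}\, I_{|u|\le 1}$                                      $1/3 \approx 0.3333$                $1/2 = 0.5$
triangular                      $K(u) = (1 - |u|)\, I_{|u|\le 1}$                                        $1/6 \approx 0.1667$                $2/3 \approx 0.6667$
EPANECHNIKOV (quadratic)        $K(u) = \frac{3}{4}\,(1 - u^2)\, I_{|u|\le 1}$                           $1/5 = 0.2$                         $3/5 = 0.6$
biweight (quartic, biquadratic) $K(u) = \frac{15}{16}\,(1 - u^2)^2\, I_{|u|\le 1}$                       $1/7 \approx 0.1429$                $5/7 \approx 0.7143$
triweight (triquadratic)        $K(u) = \frac{35}{32}\,(1 - u^2)^3\, I_{|u|\le 1}$                       $1/9 \approx 0.1111$                $350/429 \approx 0.8159$
tricube                         $K(u) = \frac{70}{81}\,(1 - |u|^3)^3\, I_{|u|\le 1}$                     $35/243 \approx 0.1440$             $175/247 \approx 0.7085$
cosine                          $K(u) = \frac{\pi}{4} \cos\!\big(\frac{\pi}{2}\, u\big)\, I_{|u|\le 1}$  $1 - 8/\pi^2 \approx 0.1894$        $\pi^2/16 \approx 0.6169$
semi-elliptical                 $K(u) = \frac{2}{\pi} \sqrt{1 - u^2}\; I_{|u|\le 1}$                     $1/4 = 0.25$                        $16/(3\pi^2) \approx 0.5404$
GAUSS                           $K(u) = \frac{1}{\sqrt{2\pi}} \exp(-u^2/2),\ u \in \mathbb{R}$           $1$                                 $1/(2\sqrt{\pi}) \approx 0.2821$
CAUCHY                          $K(u) = \frac{1}{\pi\,(1 + u^2)},\ u \in \mathbb{R}$                     non-existent                        $1/(2\pi) \approx 0.1592$
LAPLACE                         $K(u) = \frac{1}{2} \exp(-|u|),\ u \in \mathbb{R}$                       $2$                                 $1/4 = 0.25$
logistic                        $K(u) = \frac{\exp(-u)}{[1 + \exp(-u)]^2},\ u \in \mathbb{R}$            $\pi^2/3 \approx 3.2899$            $1/6 \approx 0.1667$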

Figure 8/2: Common kernels
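
The entries of Tab. 8/1 for the polynomial kernels can be recomputed from the generating formula (8.24a-c). The following sketch uses hypothetical function names and a simple midpoint rule for the integrals; it is intended as a cross-check of the table, not as production code.

```python
import math


def beta_fn(p, q):
    """Beta function (8.24c) via gamma functions."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def make_kernel(r, s):
    """Normalizing constant (8.24b) and kernel function (8.24a)."""
    kappa = r / (2.0 * beta_fn(s + 1.0, 1.0 / r))

    def K(u):
        return kappa * (1.0 - abs(u) ** r) ** s if abs(u) <= 1.0 else 0.0

    return kappa, K


def integrate(g, lo=-1.0, hi=1.0, m=20000):
    """Midpoint rule on [lo, hi] with m panels."""
    h = (hi - lo) / m
    return h * sum(g(lo + (j + 0.5) * h) for j in range(m))


def mu2(K):
    return integrate(lambda u: u * u * K(u))


def roughness(K):
    return integrate(lambda u: K(u) ** 2)


for name, r, s in [("uniform", 1, 0), ("triangular", 1, 1), ("Epanechnikov", 2, 1),
                   ("biweight", 2, 2), ("triweight", 2, 3), ("tricube", 3, 3)]:
    kappa, K = make_kernel(r, s)
    print(name, round(kappa, 4), round(mu2(K), 4), round(roughness(K), 4))
```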



Optimizing AMISE (8.21) with respect to the kernel is not an easy problem since the scaling of $K$ is coupled with the bandwidth $b$. EPANECHNIKOV (1969) has found the AMISE-optimizing kernel to be

$$K(u) = \frac{3}{4}\,(1 - u^2)\, I_{|u| \le 1}. \tag{8.28}$$

Investigations have revealed that using another 'suboptimal' kernel does not cause great loss of efficiency, i.e., seldom more than 5%. Indeed, these results suggest that most unimodal densities perform about the same as each other when used as a kernel. Thus, the choice between kernels can be made on other grounds such as computational efficiency. The kernel affects the local smoothness whereas the bandwidth is responsible for the global smoothness of the estimate. The smaller the sample size the greater the effect of the kernel, and when $n = 1$ the kernel wholly determines the graph of the estimate.
Fig. 8/3 shows the effect of the kernel. The data used in this figure are: 12, 14, 15, 16, 18, 23, 35, 42, 50, and the bandwidth has been set to $b = 10$. The uniform kernel gives an estimate of the density that is piecewise constant; the triangular kernel and the LAPLACE kernel have a kink which is reflected in the estimated densities. Even the EPANECHNIKOV kernel gives an estimate having a discontinuous first derivative, which sometimes can be unattractive because of its kinks. Very smooth estimates are produced by the GAUSS and the CAUCHY kernels, respectively. The use of the triweight kernel seems to be a good compromise.

Figure 8/3: Effect of kernel choice

We have seen in (8.23) that the best obtainable rate of convergence of the kernel estimator considered there is of order $n^{-4/5}$. But it is possible to obtain a better rate of convergence at the price of relaxing the restriction that the kernel be a density and $K(u) \ge 0\ \forall\ u$. Such kernels are of higher order than two. We say that $K(u)$ is an $\ell$-th order kernel if

$$\begin{aligned}
\mu_0(K) &= \int u^0 K(u)\,du = 1, \\
\mu_j(K) &= \int u^j K(u)\,du = 0 \quad\text{for } j = 1, 2, ..., \ell - 1 \text{ and} \\
\mu_\ell(K) &= \int u^\ell K(u)\,du \ne 0.
\end{aligned}$$

Still requiring that $K(u)$ be symmetric we see that $\ell$ must be even. With $\ell \to \infty$ the convergence rate can be made arbitrarily close to $n^{-1}$, the parametric convergence rate.
There are several rules to construct higher-order kernels. Let $K_\ell(u)$ denote the $\ell$-th order kernel which is assumed to be differentiable, then formula

$$K_{\ell+2}(u) = \frac{3}{2}\, K_\ell(u) + \frac{1}{2}\, u\, K_\ell'(u) \tag{8.29}$$

can be used to generate higher-order kernels. Taking the GAUSS kernel

$$K_2(u) = \frac{1}{\sqrt{2\pi}} \exp(-u^2/2)$$

we find from (8.29)

$$K_4(u) = 0.5\,(3 - u^2)\, \frac{1}{\sqrt{2\pi}} \exp(-u^2/2),$$

and taking the triweight kernel

$$K_2(u) = \frac{35}{32}\,(1 - u^2)^3\, I_{|u| \le 1}$$

we have

$$K_4(u) = \frac{105}{64}\,(1 - u^2)^2\,(1 - 3\,u^2)\, I_{|u| \le 1}.$$

Fig. 8/4 shows these kernels. Notice the negative lobes of $K_4(u)$ which entail that the resulting smoothed density will not be a density itself. Higher-order kernels necessarily take on negative values, so there is a price to be paid in interpretability and plausibility.
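
The moment conditions of a fourth-order kernel can be verified numerically for the kernel obtained from the triweight kernel via (8.29). The sketch below uses hypothetical function names, supplies the derivative of the triweight kernel by hand, and evaluates the moments with a midpoint rule.

```python
def triweight(u):
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def d_triweight(u):
    """Derivative of the triweight kernel."""
    return -105.0 / 16.0 * u * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def raise_order(K, dK):
    """Kernel of order l+2 built from an l-th order kernel K and its
    derivative dK according to (8.29)."""
    return lambda u: 1.5 * K(u) + 0.5 * u * dK(u)


def moment(K, j, lo=-1.0, hi=1.0, m=20000):
    """j-th moment of K by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / m
    return h * sum((lo + (i + 0.5) * h) ** j * K(lo + (i + 0.5) * h) for i in range(m))


K4 = raise_order(triweight, d_triweight)
moments = [moment(K4, j) for j in range(5)]   # mu_0, ..., mu_4
```

The moments $\mu_0, \ldots, \mu_3$ come out as $1, 0, 0, 0$ up to integration error, while $\mu_4 = -1/33 \ne 0$, and $K_4(u)$ is negative for $1/\sqrt{3} < |u| < 1$.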

Figure 8/4: Fourth-order kernels based on GAUSS and triweight kernels, respectively

There are some situations where there is scope for improvement of the basic kernel estimator presented up to here. Some of the modifications will be needed in Sect. 8.1.3, and we will briefly present their ideas here. A first modification is the local kernel estimator. Given that the optimal amount of smoothing varies across the real line, an obvious extension of

$$\hat f_n(x; b) = \frac{1}{n\,b} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{b}\right)$$

is to allow a different bandwidth $b(x)$, say, for each $x$ where $f(\cdot)$ is to be estimated. This leads to the local kernel estimator

$$\hat f_n[x; b(x)] = \frac{1}{n\,b(x)} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{b(x)}\right), \tag{8.30}$$

where a different basic kernel estimator is employed at each point. A popular method which fits
into the framework of (8.30) is the nearest neighbor kernel estimator that uses distances from
x to the data point being the k-th nearest to x.
A quite different idea from local kernel estimation is that of variable kernel estimation where the single $b$ is replaced by $n$ values $b(x_i);\ i = 1, 2, \ldots, n;$ rather than by $b(x)$. The estimator has the form
$$\hat f_n[x; b(x_i)] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{b(x_i)}\, K\left(\frac{x - x_i}{b(x_i)}\right), \tag{8.31}$$

so that the kernel centered on xi has associated with it its own scale parameter b(xi ) allowing
different degrees of smoothing depending on where xi is in relation to other data points. The aim
is to smooth out the mass associated with data values that are in sparse regions much more than
those situated in the main body of the data. Variable kernel estimation can also be realized by a
nearest neighbor approach, but using the distance from xi to the data point being the k-th nearest
to xi .
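The difference between (8.30) and (8.31) may be easier to see in code. The sketch below is a minimal illustration under assumptions made here: both estimators use a nearest neighbor bandwidth, the EPANECHNIKOV kernel is chosen for convenience, and all function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kth_distance(points, data, k, exclude_self=False):
    """Distance from each point to its k-th nearest data value."""
    d = np.sort(np.abs(points[:, None] - data[None, :]), axis=1)
    # if the points are the data themselves, column 0 is the zero self-distance
    return d[:, k] if exclude_self else d[:, k - 1]

def local_kde(x, data, k, kernel=epanechnikov):
    """(8.30) with b(x) = distance from x to its k-th nearest observation."""
    b = kth_distance(x, data, k)
    u = (x[:, None] - data[None, :]) / b[:, None]
    return kernel(u).sum(axis=1) / (len(data) * b)

def variable_kde(x, data, k, kernel=epanechnikov):
    """(8.31) with b(x_i) = distance from x_i to its k-th nearest other observation."""
    b = kth_distance(data, data, k, exclude_self=True)
    u = (x[:, None] - data[None, :]) / b[None, :]
    return (kernel(u) / b[None, :]).sum(axis=1) / len(data)
```

In the local version one bandwidth is computed per evaluation point, in the variable version one bandwidth per observation. Since every summand of (8.31) is a density, the variable kernel estimate integrates to one, which is not guaranteed for the local version.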
A special situation arises when the function to be estimated has a bounded support. For the lifetime variable we have a natural lower bound equal to zero. When the point $x$ at which $f(\cdot)$ or $h(\cdot)$ is to be estimated is smaller than the bandwidth, a symmetric kernel is not appropriate because no lifetimes less than zero are observable. In the region $0 \le x < b$ the use of an asymmetric kernel is suggested. This is also true when there is a right endpoint $x_{\text{end}}$ of the support, where we have to consider an asymmetric kernel for $x_{\text{end}} - b < x \le x_{\text{end}}$. The right endpoint is often taken to be the greatest observed lifetime in the sample. The asymmetric kernels needed near the boundaries, which are different for each $x$ within a distance of $b$ to the boundary, are called boundary kernels. In Sect. 8.1.3.1 we will present different types of boundary kernels.

8.1.1.4 Bandwidth Selection

The implementation of a kernel estimator requires the specification of a bandwidth b. One pos-
sibility is to choose the bandwidth subjectively by eye and on aesthetic grounds. This would
involve looking at several PDF or HR estimates over a range of bandwidths and selecting the
estimate that is the ‘most pleasing’ in some sense. One such strategy is to begin with a large
bandwidth and to decrease the amount of smoothing until fluctuations that are more random than
structural start to appear. However, there are also many circumstances where it is very beneficial
to have the bandwidth automatically selected from the data.
A method that uses the data x1 , x2 , ..., xn to produce a bandwidth b is called a bandwidth selec-
tor. A look into the professional journals reveals that work on constructing bandwidth selectors
is still going on. Available selectors can be roughly divided into two classes:

• quick and simple selectors, sometimes called ‘quick and dirty methods’, and

• sophisticated selectors aiming to minimize AMISE or some other criterion.

The first class consists of formulas which are easy to evaluate, but without any mathematical
guarantee of being close to the optimal bandwidth. These selectors often provide a starting point
for the subjective choice of the smoothing parameter. The two methods falling into the category
‘quick and dirty’ are the rule of thumb, sometimes called normal scale bandwidth selector, and the maximal smoothing or oversmoothing principle. Both are based on the optimal bandwidth minimizing the AMISE:
$$b_{\text{AMISE}} = \left[\frac{R(K)}{n\, R(f'')\, \mu_2(K)^2}\right]^{1/5}.$$
Note that only the term $R(f'')$ is unknown in this expression.
The rule of thumb replaces the unknown PDF $f(\cdot)$ in this functional by the PDF of a reference distribution. The reference distribution is rescaled to have variance equal to the sample variance. If, e.g., we take $K$ as the GAUSS kernel and the normal distribution as reference distribution, the rule of thumb yields the bandwidth
$$b_{\text{RoT}} = 1.0592\; \hat\sigma\; n^{-1/5} \tag{8.32a}$$
where $\hat\sigma^2 = \sum (x_i - \bar x)^2 \big/ (n - 1)$ is the sample variance. A version which is more robust against outliers in the sample uses the interquartile range $r_q$ as a measure of spread instead of the variance, giving the modified estimator
$$b_{\text{RoT}} = 1.0592\, \min\left[\hat\sigma,\ \frac{r_q}{1.34}\right] n^{-1/5}. \tag{8.32b}$$
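Both versions of the rule of thumb are easy to compute. The following sketch assumes the GAUSS kernel (hence the constant 1.0592); the function name and the use of `numpy.percentile` for the quartiles are choices made for this illustration.

```python
import numpy as np

def rule_of_thumb_bandwidth(x, robust=False):
    """Normal scale bandwidth for the GAUSS kernel, see (8.32a) and (8.32b)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = x.std(ddof=1)                 # sample standard deviation, divisor n - 1
    spread = sigma
    if robust:
        q75, q25 = np.percentile(x, [75, 25])
        spread = min(sigma, (q75 - q25) / 1.34)   # interquartile range r_q / 1.34
    return 1.0592 * spread * n ** (-1.0 / 5.0)
```

With a heavy outlier in the sample the robust version gives a clearly smaller bandwidth, since the interquartile range is hardly affected by a single extreme value.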
The maximal smoothing principle is due to TERRELL (1990). He showed that there is a lower bound for the functional $R(f'')$ for all densities having standard deviation $\sigma$, and this bound is attained by the triweight density, see Tab. 8/1. Thus we have an upper bound for $b_{\text{AMISE}}$ leading to the oversmoothed bandwidth
$$b_{\text{os}} = \left[\frac{243\, R(K)}{35\, \mu_2(K)^2\, n}\right]^{1/5} \hat\sigma. \tag{8.33}$$
While bos will give a too large bandwidth for optimal estimation of a general density f (·) it
provides an excellent starting point for subjective choice of the bandwidth. A graphical strategy
is to plot an estimate with the bandwidth bos and then successively look at plots based on fractions
of bos to see what features are present in the data.
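As a small illustration of (8.33), the sketch below computes the oversmoothed bandwidth for given kernel constants. The function name is hypothetical; the constants for the GAUSS kernel, $R(K) = 1/(2\sqrt{\pi})$ and $\mu_2(K) = 1$, are standard values.

```python
import numpy as np

# Kernel constants of the GAUSS kernel
R_GAUSS = 1.0 / (2.0 * np.sqrt(np.pi))   # R(K), integral of K(u)^2
MU2_GAUSS = 1.0                          # second moment of K

def oversmoothed_bandwidth(x, r_k, mu2_k):
    """Oversmoothed bandwidth (8.33) for a kernel with constants R(K) and mu_2(K)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = x.std(ddof=1)
    return (243.0 * r_k / (35.0 * mu2_k**2 * n)) ** 0.2 * sigma
```

For the GAUSS kernel this amounts to roughly $1.144\, \hat\sigma\, n^{-1/5}$, i.e., a bandwidth about 8 % larger than the rule of thumb (8.32a). Plots based on fractions of this value can then be inspected as described above.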
There are two fundamental approaches in the class of sophisticated selectors:

• the cross–validation method and


• the plug–in method,

each coming with several versions.


The most popular and best studied cross-validation method is least-squares cross-validation, originally proposed by RUDEMO (1982) and BOWMAN (1984). Its motivation comes from expanding the MISE of $\hat f_n(\cdot; b)$ to obtain
$$\text{MISE}\left[\hat f_n(\cdot; b)\right] = E\left[\int \hat f_n(x; b)^2\, dx\right] - 2\, E\left[\int \hat f_n(x; b)\, f(x)\, dx\right] + \int f(x)^2\, dx. \tag{8.34a}$$
As $\int f(x)^2\, dx$ does not depend on $b$, the minimization of MISE is equivalent to minimization of
$$\text{MISE}\left[\hat f_n(\cdot; b)\right] - \int f(x)^2\, dx = E\left[\int \hat f_n(x; b)^2\, dx - 2 \int \hat f_n(x; b)\, f(x)\, dx\right]. \tag{8.34b}$$
The right-hand side of (8.34b) is unknown since it depends on $f(x)$. Using a method of moments to estimate this term results in the least-squares cross-validation function
$$\text{LSCV}(b) = \int \hat f_n(x; b)^2\, dx - \frac{2}{n} \sum_{i=1}^{n} \hat f_{-i}(x_i; b) \tag{8.35a}$$
where
$$\hat f_{-i}(x_i; b) = \frac{1}{n - 1} \sum_{j \ne i} \frac{1}{b}\, K\left(\frac{x_i - x_j}{b}\right) \tag{8.35b}$$
is the density estimate based on the sample with $x_i$ deleted, often called the leave-one-out density estimator. This is the reason for the name ‘cross-validation’, which refers to the use of part of a sample to obtain information about another part. It therefore seems reasonable to choose $b$ to minimize $\text{LSCV}(b)$; the bandwidth chosen in this way is denoted $b_{\text{LSCV}}$. This estimate of $b$ is very sensitive to sample variation, i.e., for different samples from the same distribution the estimated bandwidths have a large variance. Another drawback of LSCV is that it often has several minima. Simulation studies have shown that this problem can be mitigated by selecting the largest value of $b$ for which a local minimum occurs.
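For the GAUSS kernel the integral in (8.35a) has a closed form, because the convolution of two normal densities with variance $b^2$ is a normal density with variance $2b^2$. The following sketch rests on that fact; the function names and the grid search are assumptions made here for illustration and are not a definitive implementation.

```python
import numpy as np

def gauss_pdf(z, s):
    """Normal density with mean 0 and standard deviation s."""
    return np.exp(-0.5 * (z / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def lscv(b, x):
    """LSCV(b) of (8.35a) for the GAUSS kernel."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diff = x[:, None] - x[None, :]
    # integral of fhat^2: double sum over N(0, 2 b^2) densities
    term1 = gauss_pdf(diff, np.sqrt(2.0) * b).sum() / n**2
    # leave-one-out estimates (8.35b): drop the diagonal j = i
    k = gauss_pdf(diff, b)
    loo = (k.sum(axis=1) - np.diag(k)) / (n - 1)
    return term1 - 2.0 * loo.mean()

def lscv_bandwidth(x, grid):
    """Largest grid value at which LSCV has a local minimum (global minimum as fallback)."""
    values = np.array([lscv(b, x) for b in grid])
    interior = [i for i in range(1, len(grid) - 1)
                if values[i] <= values[i - 1] and values[i] <= values[i + 1]]
    best = interior[-1] if interior else int(np.argmin(values))
    return grid[best], values
```

The choice of the largest local minimizer follows the recommendation quoted above. Plotting the returned `values` against `grid` is a simple way to see how flat or multimodal the criterion is for a given sample.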
The idea of plug-in methods goes back to WOODROOFE (1970). These methods are based on the asymptotically best choice of $b$ given by $b_{\text{AMISE}}$ in (8.22). The only unknown quantity in $b_{\text{AMISE}}$ is the functional $R(f'')$. WOODROOFE proposed to use a first bandwidth $b_1$ to calculate $\hat f_n(x; b_1)$, to take this estimate to calculate $\hat R(f'') = R\left[\hat f_n''(x; b_1)\right]$ and to plug $\hat R(f'')$ into (8.22) to obtain $b_2$, the final bandwidth. This direct plug-in rule may be generalized by iterating the process, i.e., calculating $\hat R(f'') = R\left[\hat f_n''(x; b_2)\right]$, plugging $\hat R(f'')$ into (8.22) to obtain $b_3$ etc., until $b_i$ converges.

8.1.2 Indirect Smoothing — The Ratio–type Estimator11


The hazard rate is defined as
$$h(x) = \frac{f(x)}{S(x)}.$$
A smoothed hazard rate estimator based on this definition and on suitably chosen estimators of the numerator and the denominator is called a ratio-type estimator or an indirectly smoothed estimator. The latter name is justified by the fact that we do not smooth rough estimates of the hazard rate itself. Direct smoothing as presented in Sect. 8.1.3 prevails owing to its theoretical tractability (exact mean square errors are available) and aesthetic superiority over the ratio-type estimator even though, as has been shown by RICE/ROSENBLATT (1976), the direct and the indirect estimators have the same asymptotic variance but different asymptotic biases.
The ratio-type estimator comes in two variants, resulting from the use of different estimators of the survival function $S(x)$. The first variant, called simple indirect estimator, takes the KAPLAN/MEIER estimator, see (8.3a), (8.4a) and footnote 12:
$$\hat S_n(x) = \begin{cases} \displaystyle\prod_{i:\, y_i \le x} \left(\frac{n - i}{n - i + 1}\right)^{\delta_i} & \text{for untied observations} \\[2ex] \displaystyle\prod_{i:\, x_i \le x} \left(1 - \frac{d_i}{n_i}\right) & \text{for tied observations.} \end{cases} \tag{8.36a}$$

The second variant, called smoothed indirect estimator, is based on the integrated smoothed density estimator
$$\hat S_n(x) = 1 - \int_0^x \hat f_n(u)\, du \tag{8.36b}$$
where $\hat f_n(\cdot)$ is a kernel estimator of $f(\cdot)$ with a kernel that has to be a PDF itself. The results from using (8.36a) or (8.36b) do not differ much, but as (8.36a) is a stair-case function the simple indirect estimator will generate hazard rate courses which are less smooth than those coming from the smoothed indirect estimator.

11 Suggested reading for this section: LO et al. (1989), RICE/ROSENBLATT (1976), WATSON/LEADBETTER (1964a,b).
12 SARDA/VIEU (1990) show how to find the ISE-minimizing bandwidth by cross-validation.
To show this effect we have plotted in Fig. 8/5 both versions of the ratio-type estimator. This figure rests upon the data of Example 5/1 (leukaemia patients’ data). We have used the function ‘ksdensity’ of MATLAB with a positive support, the EPANECHNIKOV kernel and bandwidth 10.

Figure 8/5: Ratio–type estimates of the hazard rate for the leukaemia patients’ data

The smoothed indirect estimator was first studied by WATSON/LEADBETTER (1964a,b) for uncensored samples. The paper of LO et al. (1989) investigates this estimator for samples with censored observations. We briefly repeat the results of WATSON/LEADBETTER, but will not go further into the details of indirect smoothing as it is of no great importance in practice. WATSON/LEADBETTER use a sequence $\{\delta_n(x)\}$ of smoothing functions tending, as $n \to \infty$, to a DIRAC delta-function (see footnote 13). This delta-sequence method is quite general and covers several types of smoothing methods, including the kernel method with $\delta_n(u) = (1/b)\, K(u/b)$. The smoothed indirect estimator of WATSON/LEADBETTER reads
$$\hat h_n(x) = \frac{\hat f_n(x)}{1 - \hat F_n(x)} \tag{8.37a}$$
with
$$\hat f_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(x - x_i), \tag{8.37b}$$
$$\hat F_n(x) = \int_0^x \hat f_n(u)\, du. \tag{8.37c}$$

13 The δ-function is (informally) a generalized function on $\mathbb{R}$ that is zero everywhere except at zero, with an integral of unity over $\mathbb{R}$. This unit impulse function may be written as
$$\delta(x) = \begin{cases} +\infty & \text{for } x = 0 \\ 0 & \text{for } x \ne 0 \end{cases} \qquad \text{with} \quad \int_{-\infty}^{+\infty} \delta(x)\, dx = 1.$$
Rigorously defined the δ-function is a distribution or a measure.

If the sequence $\{\delta_n(x)\}$ is suitably chosen, i.e., if
$$\alpha_n = \int \delta_n^2(x)\, dx < \infty, \tag{8.37d}$$
then at every point of continuity $x$ of $h(\cdot)$ at which $F(x) < 1$, $\hat h_n(x)$ is shown to be asymptotically unbiased with an asymptotic variance
$$\text{Var}\left[\hat h_n(x)\right] \approx \frac{\alpha_n}{n}\, \frac{h(x)}{1 - F(x)}. \tag{8.37e}$$
If in addition to (8.37d) $\alpha_n = o(n)$, then the asymptotic variance converges to zero in order of $\alpha_n / n$, i.e., $\hat h_n(x)$ is consistent. Under some further slightly more restrictive conditions on $\delta_n(x)$ the random variable
$$\left[\frac{n}{\alpha_n\, f(x)}\right]^{1/2} \left[1 - F(x)\right] \left[\hat h_n(x) - h(x)\right]$$
is asymptotically standard normally distributed at every continuity point of $h(\cdot)$.
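A minimal sketch of the smoothed indirect estimator (8.37a) to (8.37c) for an uncensored sample, using the kernel delta-sequence $\delta_n(u) = (1/b)\, K(u/b)$. The EPANECHNIKOV kernel is chosen here only because its integral has a simple closed form; the function names are hypothetical and no boundary correction is applied.

```python
import numpy as np

def epan_pdf(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def epan_cdf(u):
    """Integral of the EPANECHNIKOV kernel from -1 to u."""
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

def smoothed_indirect_hazard(x, data, b):
    """Ratio-type estimate f_hat(x) / (1 - F_hat(x)) for uncensored data."""
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    u = (x[:, None] - data[None, :]) / b
    f_hat = epan_pdf(u).mean(axis=1) / b                       # (8.37b)
    # (8.37c): integral of f_hat from 0 to x, taken kernel by kernel
    F_hat = (epan_cdf(u) - epan_cdf(-data[None, :] / b)).mean(axis=1)
    surv = 1.0 - F_hat
    out = np.full_like(f_hat, np.nan)
    np.divide(f_hat, surv, out=out, where=surv > 0.0)          # (8.37a)
    return out
```

Because the integral in (8.37c) starts at zero, kernel mass placed on the negative axis by observations smaller than $b$ is ignored, so the denominator is slightly too large and the estimate is biased downwards for small samples or large bandwidths.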


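SALHA (w.y.) has generalized the smoothed indirect estimator to incorporate a bandwidth depending on $x_i$ as $b_i = c_i\, b$ with
$$c_i = \left[\frac{\tilde f(x_i)}{g}\right]^{-\alpha} \tag{8.38a}$$
where $\tilde f(x_i)$ is a pilot estimate and
$$g = \left[\prod_{i=1}^{n} \tilde f(x_i)\right]^{1/n} \tag{8.38b}$$
is the geometric mean of the pilot estimates, and $0 < \alpha < 1$. He suggests taking $\alpha = 0.5$.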

8.1.3 Direct Smoothing


In this section we will generally look at samples with untied observations and possible censoring. The general form of a direct kernel estimator of the hazard rate in such a situation can be written as
$$\hat h_n(x) = \sum_{i=1}^{n} \frac{\delta_i}{n - i + 1}\, \frac{1}{b_n(i, x)}\, K_x\left(\frac{x - y_i}{b_n(i, x)}\right); \quad b_n(i, x) > 0,\ 0 \le x \le x_{\text{end}}. \tag{8.39}$$
$\delta_i$ is the censoring indicator corresponding to the $i$-th ordered observation $y_i$. $\delta_i = 1$ stands for an uncensored observation and $\delta_i = 0$ for a censored observation. $b_n(i, x)$ represents the bandwidth function. The bandwidth will depend inversely on the sample size $n$, but we can also make it dependent on the point $x$ for which $h(\cdot)$ is to be estimated and/or on the data point $y_i$ processed by the kernel. The numerous ways of specifying the bandwidth have been developed in order to optimize the estimator, and here they serve as a guideline for the organization of this section.
The bandwidth function leads to different properties of the resulting hazard rate estimator whereas the influence of the particular kernel function $K_x(\cdot)$ is only marginal except for the behavior near the boundaries. We thus may have different kernels near the boundaries, i.e., for $x \in \left[0;\ b_n(i, x)\right)$ and $x \in \left(x_{\text{end}} - b_n(i, x);\ x_{\text{end}}\right]$, and some other kernel in the interior of the data body, i.e., for $x \in \left[b_n(i, x);\ x_{\text{end}} - b_n(i, x)\right]$.
We will first present boundary kernels (Sect. 8.1.3.1) before we turn to different possibilities of
choosing a bandwidth (Sect. 8.1.3.2 – 8.1.3.3).

8.1.3.1 Boundary Kernels14


 
Let $[0, x_{\text{end}}]$ be the support of the hazard rate to be estimated, where $x_{\text{end}}$ is the greatest uncensored observation in the sample. When we have a kernel with bandwidth $b$, i.e., $K\left(\frac{x - x_i}{b}\right)$, a part of those kernels having either $0 \le x_i < b$ or $x_{\text{end}} - b < x_i \le x_{\text{end}}$ is outside the support. It has been noted that bias problems occur when estimating near an endpoint of the data. The application of unmodified kernel estimators leads to meaningless estimates in the boundary regions near the endpoints. Therefore some authors suggest estimating $h(x)$, or $f(x)$ in case of density estimation, only for the interior region $[b, x_{\text{end}} - b]$. Another suggestion is the so-called ‘cut-and-normalize’ modification whereby that part of the kernel lying outside the boundary is omitted and the remaining part is normalized to be a proper density. This solution achieves consistency near the boundary but results in a large bias there. A variety of further modifications is possible to achieve a smaller bias, see KARUNAMUNI/ALBERTS (2005). One can think of these boundary modifications in terms of special boundary kernels which are different for each $x$ within a distance of $b$ to the boundary.
For the following formulas we define
$$q = \frac{x}{b} \quad \text{for } 0 \le x < b$$
and
$$q = \frac{x_{\text{end}} - x}{b} \quad \text{for } x_{\text{end}} - b < x \le x_{\text{end}},$$
so we have $0 \le q < 1$. We will look at boundary kernels corresponding to some basic second order kernels.
One simple family of boundary kernels for the left-hand side (lower boundary side) is the following linear multiple of a given kernel $K(u)$
$$K_q^L(u) = \frac{\mu_{2,q}(K) - \mu_{1,q}(K)\, u}{\mu_{0,q}(K)\, \mu_{2,q}(K) - \mu_{1,q}(K)^2}\; K(u)\; I_{\{-1 \le u \le q\}} \tag{8.40a}$$
with
$$\mu_{\ell,q}(K) = \int_{-1}^{q} u^\ell\, K(u)\, du. \tag{8.40b}$$
This family goes back to GASSER/MÜLLER (1979). For the left boundary we have

Uniform kernel
$${}_{U}K(u) = 0.5 \quad \text{for } -1 \le u \le 1,$$
$${}_{U}K_q^L(u) = \frac{4\left[2\,(1 - q + q^2) + 3\,(1 - q)\, u\right]}{(1 + q)^3}\; {}_{U}K(u) \quad \text{for } -1 \le u \le q \tag{8.41a}$$

EPANECHNIKOV kernel
$${}_{E}K(u) = 0.75\, (1 - u^2) \quad \text{for } -1 \le u \le 1,$$
$${}_{E}K_q^L(u) = \frac{64\,(2 - 4q + 6q^2 - 3q^3) + 240\,(1 - q)^2\, u}{(1 + q)^4\, (19 - 18q + 3q^2)}\; {}_{E}K(u) \quad \text{for } -1 \le u \le q \tag{8.41b}$$

Biweight kernel
$${}_{B}K(u) = \frac{15}{16}\, (1 - u^2)^2 \quad \text{for } -1 \le u \le 1,$$
$${}_{B}K_q^L(u) = \frac{64\,(8 - 24q + 48q^2 - 45q^3 + 15q^4) + 1120\,(1 - q)^3\, u}{(1 + q)^5\, (81 - 168q + 126q^2 - 40q^3 + 5q^4)}\; {}_{B}K(u) \quad \text{for } -1 \le u \le q \tag{8.41c}$$

Triweight kernel
$${}_{T}K(u) = \frac{35}{32}\, (1 - u^2)^3 \quad \text{for } -1 \le u \le 1,$$
$${}_{T}K_q^L(u) = \frac{256\,\bigl(-8\,(-16 + q\,(64 + 5q\,(-32 + q\,(43 + 7\,(-4 + q)\,q))))\bigr) + 80640\,(1 - q)^4\, u}{(1 + q)^6\, \bigl(5359 + 5q\,(-3550 + q\,(4909 + q\,(-3620 + q\,(1517 + 35\,(-10 + q)\,q))))\bigr)}\; {}_{T}K(u) \quad \text{for } -1 \le u \le q \tag{8.41d}$$

14 Suggested reading for this section: BOUEZMARNI/ROMBOUTS (2008), GASSER/MÜLLER (1979), GASSER/MÜLLER/MAMMITZSCH (1985), KARUNAMUNI/ALBERTS (2005), MESSER/GOLDSTEIN (1993), MÜLLER (1991, 1993), MÜLLER/WANG (1994).


Two other classes have been suggested by MÜLLER (1991) and MÜLLER/WANG (1994). Both classes result as solutions of a variational problem under asymmetric support and lead to classes of compactly supported polynomial kernel functions. The class proposed in MÜLLER/WANG (1994) gives rise to smaller leading constants of the asymptotic MSE than the previously suggested class. When the basic kernel is the uniform kernel the corresponding boundary kernels in both classes are the same as (8.41a), but for other types of basic kernels the formulas are different from formulas (8.41b-d).
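The left boundary versions of the 1991-class are

EPANECHNIKOV kernel
$${}^{91}_{E}K_q(u) = \frac{6\,(1 + u)(q - u)}{(1 + q)^3} \left\{1 + 5 \left(\frac{1 - q}{1 + q}\right)^2 + 10\, \frac{1 - q}{(1 + q)^2}\, u\right\} \quad \text{for } -1 \le u \le q, \tag{8.42a}$$

Biweight kernel
$${}^{91}_{B}K_q(u) = \frac{30\,(1 + u)^2 (q - u)^2}{(1 + q)^5} \left\{1 + 7 \left(\frac{1 - q}{1 + q}\right)^2 + 14\, \frac{1 - q}{(1 + q)^2}\, u\right\} \quad \text{for } -1 \le u \le q, \tag{8.42b}$$

Triweight kernel
$${}^{91}_{T}K_q(u) = \frac{140\,(1 + u)^3 (q - u)^3}{(1 + q)^7} \left\{1 + 9 \left(\frac{1 - q}{1 + q}\right)^2 + 18\, \frac{1 - q}{(1 + q)^2}\, u\right\} \quad \text{for } -1 \le u \le q, \tag{8.42c}$$

and for the 1994-class

EPANECHNIKOV kernel
$${}^{94}_{E}K_q(u) = \frac{12\,(u + 1)}{(1 + q)^4} \left[(1 - 2q)\, u + \frac{3q^2 - 2q + 1}{2}\right] \quad \text{for } -1 \le u \le q, \tag{8.43a}$$

Biweight kernel
$${}^{94}_{B}K_q(u) = \frac{15\,(u + 1)^2 (q - u)}{(1 + q)^5} \left[2u \left(5\, \frac{1 - q}{1 + q} - 1\right) + (3q - 1) + \frac{5\,(1 - q)^2}{1 + q}\right] \quad \text{for } -1 \le u \le q, \tag{8.43b}$$

Triweight kernel
$${}^{94}_{T}K_q(u) = \frac{70\,(u + 1)^3 (q - u)^2}{(1 + q)^7} \left[2u \left(7\, \frac{1 - q}{1 + q} - 1\right) + (3q - 1) + \frac{7\,(1 - q)^2}{1 + q}\right] \quad \text{for } -1 \le u \le q. \tag{8.43c}$$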
Fig. 8/6 displays all three variants of the triweight boundary kernel.

Figure 8/6: Triweight boundary kernels
(left: linear multiple, center: MÜLLER-91, right: MÜLLER/WANG-94)

For all three classes of boundary kernels we can state the following results:
1. All kernels conform to the moment conditions of second order kernels:
$$\int_{-1}^{q} K_q(u)\, du = 1, \quad \int_{-1}^{q} u\, K_q(u)\, du = 0, \quad \int_{-1}^{q} u^2\, K_q(u)\, du \ne 0.$$

2. For the right boundary the formulas are the same, but with −u instead of u.
3. For q → 1 the boundary kernel approaches the basic kernel.
4. For q → 0 the boundary kernel takes on negative values. This might lead to negative
hazard rate estimates which should be replaced by zero values.
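The linear-multiple family (8.40a), (8.40b) can be built for any basic kernel by computing the truncated moments numerically. The sketch below does this for the EPANECHNIKOV kernel; the function names, the grid size and the trapezoid rule are assumptions of this illustration. The right boundary version uses $-u$ instead of $u$, as stated in result 2.

```python
import numpy as np

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def truncated_moment(kernel, ell, q, m=20001):
    """mu_{ell,q}(K) of (8.40b), integral of u^ell K(u) over [-1, q]."""
    u = np.linspace(-1.0, q, m)
    return trapezoid(u**ell * kernel(u), u)

def left_boundary_kernel(u, q, kernel=epanechnikov):
    """Linear multiple (8.40a) of the basic kernel on [-1, q]."""
    m0, m1, m2 = (truncated_moment(kernel, ell, q) for ell in range(3))
    weight = (m2 - m1 * u) / (m0 * m2 - m1**2)
    inside = (u >= -1.0) & (u <= q)
    return np.where(inside, weight * kernel(u), 0.0)

def right_boundary_kernel(u, q, kernel=epanechnikov):
    """Right boundary version: same formula with -u instead of u."""
    return left_boundary_kernel(-u, q, kernel)
```

Integrating `left_boundary_kernel` and `u * left_boundary_kernel` over $[-1, q]$ should reproduce the first two moment conditions of result 1, and for small $q$ the kernel becomes negative near $u = -1$, in line with result 4.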

8.1.3.2 Kernel Estimators with Globally Constant (Fixed) Bandwidth15

15 Suggested reading for this section: DIEHL/STUTE (1988), LO et al. (1989), RAMLAU-HANSEN (1983), RICE/ROSENBLATT (1976), TANNER/WONG (1983), UZUNOGULLARI/WANG (1992), WATSON/LEADBETTER (1964a,b), YANDELL (1983).

In (8.39) we have introduced the general form of the direct kernel estimator with a bandwidth depending on the sample size $n$, the data to be processed $\{y_i\}$, and the point $x$ for which the hazard rate is to be estimated. The simplest case of kernel estimation is given by a bandwidth which is independent of $\{y_i\}$ as well as of $x$ and is fixed for a given sample size:
$$b_n = b_n(i, x) \quad \forall\, i \text{ and } \forall\, x.$$
Sometimes this bandwidth is called global and is determined by one of the methods presented in Sect. 8.1.1.4. Of course, the optimal, i.e., the MISE or AMISE minimizing global bandwidth depends on the kernel chosen and on the distribution of the lifetime variable $X$ and on that of the censoring variable $Z$, respectively, see (8.52b). The fixed-bandwidth kernel estimator to be presented in this section reads
$$\hat h_n(x) = \frac{1}{b_n} \sum_{i=1}^{n} \frac{\delta_i}{n - i + 1}\, K\left(\frac{x - y_i}{b_n}\right). \tag{8.44}$$

This estimator has been discussed extensively and analyzed with different techniques.
We first look at the properties of the estimator when there is no censoring, i.e., we have $\delta_i = 1\ \forall\, i$ and each observation $y_i$ is a failure time $x_i$. This case has already been studied by WATSON/LEADBETTER (1964a,b), and subsequently by RICE/ROSENBLATT (1976) and RAMLAU-HANSEN (1983). The estimator in this case of no censoring reads
$$\hat h_n(x) = \frac{1}{b_n} \sum_{i=1}^{n} \frac{1}{n - i + 1}\, K\left(\frac{x - x_i}{b_n}\right). \tag{8.45a}$$
Sometimes we use a shortened notation for the kernel function
$$K_b(x - x_i) := \frac{1}{b_n}\, K\left(\frac{x - x_i}{b_n}\right) \tag{8.45b}$$
resulting in
$$\hat h_n(x) = \sum_{i=1}^{n} \frac{1}{n - i + 1}\, K_b(x - x_i). \tag{8.45c}$$

The failure times are assumed to be in increasing order $x_1 < x_2 < \ldots < x_n$. Thus, the mean or expectation of $\hat h_n(x)$ is
$$E\left[\hat h_n(x)\right] = \sum_{i=1}^{n} \frac{1}{n - i + 1} \int_0^\infty K_b(x - u)\, f_{i:n}(u)\, du \tag{8.46a}$$
where
$$f_{i:n}(u) = \frac{n!}{(i - 1)!\, (n - i)!}\, F(u)^{i-1} \left[1 - F(u)\right]^{n-i} f(u) \tag{8.46b}$$
is the PDF of the $i$-th order statistic. Upon inserting (8.46b) into (8.46a) and changing the order of summation and integration we have
$$E\left[\hat h_n(x)\right] = \int_0^\infty \sum_{i=1}^{n} \binom{n}{i-1} F(u)^{i-1} \left[1 - F(u)\right]^{n-i} K_b(x - u)\, f(u)\, du. \tag{8.46c}$$
As
$$\sum_{i=1}^{n} \binom{n}{i-1} F(u)^{i-1} \left[1 - F(u)\right]^{n-i} = \frac{1 - F(u)^n}{1 - F(u)}$$
and observing $h(x) = f(x) \big/ \left[1 - F(x)\right]$ we finally arrive at
$$E\left[\hat h_n(x)\right] = \int_0^\infty \left[1 - F(u)^n\right] h(u)\, K_b(x - u)\, du = \int_0^\infty K_b(x - u)\, h(u)\, du - \int_0^\infty F(u)^n\, h(u)\, K_b(x - u)\, du. \tag{8.46d}$$
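The two algebraic steps behind (8.46c) and (8.46d) can be spelled out as follows. This is a short worked derivation added for convenience; it uses only the binomial theorem, with $F = F(u)$ and the index shift $j = i - 1$.

```latex
% Step 1: the factor 1/(n-i+1) turns the order-statistic coefficient into a binomial coefficient
\frac{1}{n-i+1}\,\frac{n!}{(i-1)!\,(n-i)!} \;=\; \frac{n!}{(i-1)!\,(n-i+1)!} \;=\; \binom{n}{i-1}

% Step 2: the sum is a binomial expansion with the term j = n missing
\sum_{i=1}^{n}\binom{n}{i-1}F^{\,i-1}(1-F)^{\,n-i}
  \;=\; \frac{1}{1-F}\sum_{j=0}^{n-1}\binom{n}{j}F^{\,j}(1-F)^{\,n-j}
  \;=\; \frac{1}{1-F}\Bigl[\bigl(F+(1-F)\bigr)^{n}-F^{\,n}\Bigr]
  \;=\; \frac{1-F^{\,n}}{1-F}
```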

If $x$ is such that $F(x) < 1$, the second term tends to zero geometrically as $n \to \infty$. If $f(x)$ is twice continuously differentiable the first term of (8.46d) can be expanded in a TAYLOR series analogous to (8.14a,b):
$$\int_0^\infty \frac{1}{b_n}\, K\left(\frac{x - u}{b_n}\right) \frac{f(u)}{1 - F(u)}\, du = \int_0^\infty K(v)\, \frac{f(x - b_n v)}{1 - F(x - b_n v)}\, dv = \frac{f(x)}{1 - F(x)} + \frac{b_n^2}{2} \left[\frac{f(x)}{1 - F(x)}\right]'' \int_0^\infty K(v)\, v^2\, dv + o\left(b_n^2\right). \tag{8.46e}$$
In (8.46e) we see that the bias of $\hat h_n(x)$ depends on the second derivative of $h(x) = f(x) \big/ \left[1 - F(x)\right]$, but $\hat h_n(x)$ is asymptotically unbiased.
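WATSON/LEADBETTER (1964a,b) give the exact variance of $\hat h_n(x)$ as
$$\text{Var}\left[\hat h_n(x)\right] = \int_0^\infty \frac{K_b^2(x - u)}{1 - F(u)}\, I_n\left[F(u)\right] dF(u) + 2 \iint\limits_{0 \le u_1 \le u_2 < \infty} \frac{K_b(x - u_1)\, K_b(x - u_2)}{1 - F(u_2)} \left[F(u_2)^n - \frac{1 - F(u_1)^n}{1 - F(u_1)}\, \frac{F(u_2)^n - F(u_1)^n}{F(u_2) - F(u_1)}\right] dF(u_1)\, dF(u_2) \tag{8.46f}$$
where
$$I_n(F) = \int_0^{1-F} \frac{(F + B)^n - F^n}{B}\, dB.$$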

Formula (8.46f) is difficult to appraise for finite $n$, but as $n \to \infty$ only the first term needs to be considered with the result that
$$\text{Var}\left[\hat h_n(x)\right] \approx \frac{1}{n\, b_n}\, \frac{h(x)}{1 - F(x)} \int K^2(u)\, du. \tag{8.46g}$$
This variance formula is similar in construction to that of the PDF kernel estimator given in (8.17a). Furthermore, at each continuity point of $h(x)$ the estimator is asymptotically normally distributed.
We now look at the properties of $\hat h_n(x)$ when there is censoring. We assume a random censorship model and observe $Y_i = \min(X_i, Z_i)$ together with $\delta_i = I_{(X_i \le Z_i)}$. The $X_i;\ i = 1, 2, \ldots, n;$ are i.i.d. lifetimes with CDF $F_X(\cdot)$ and PDF $f_X(\cdot)$. The $X_i$ are independent of the censoring variables $Z_1, \ldots, Z_n$, each having CDF $F_Z(\cdot)$ and PDF $f_Z(\cdot)$. The CDF of $Y_i$ will be denoted $F_Y(\cdot)$ and is given via
$$1 - F_Y(t) = \left[1 - F_X(t)\right] \left[1 - F_Z(t)\right] \tag{8.47a}$$
and the PDF of the $Y_i$ follows as
$$f_Y(t) = f_X(t) \left[1 - F_Z(t)\right] + f_Z(t) \left[1 - F_X(t)\right]. \tag{8.47b}$$
The observations $y_i$ are assumed increasingly ordered together with the corresponding indicator $\delta_i$.
Let
$$m(t) = \frac{f_X(t) \left[1 - F_Z(t)\right]}{f_Y(t)} \quad \text{if } f_Y(t) > 0, \tag{8.48a}$$
then, see TANNER/WONG (1983),
$$E(\delta_i \mid Y_i = t) = m(t) \tag{8.48b}$$
$$E(\delta_i\, \delta_j \mid Y_i = t,\ Y_j = s) = m(t)\, m(s) \quad \forall\ i < j,\ t < s. \tag{8.48c}$$
Using (8.48b,c) the derivation of the formulas for $E\left[\hat h_n(x)\right]$ and for $\text{Var}\left[\hat h_n(x)\right]$ proceeds in essentially the same way as in WATSON/LEADBETTER (1964a,b) for the uncensored case. To illustrate the idea,
$$\begin{aligned} E\left[\hat h_n(x)\right] &= \sum_{i=1}^{n} \int_0^\infty K_b(x - u)\, \frac{E(\delta_i \mid Y_i = u)}{n - i + 1}\, f^{Y}_{i:n}(u)\, du \\ &= \int_0^\infty \left[\sum_{i=1}^{n} \binom{n}{i-1} F_Y(u)^{i-1} \left[1 - F_Y(u)\right]^{n-i}\right] f_Y(u)\, m(u)\, K_b(x - u)\, du \\ &= \int_0^\infty \frac{1 - F_Y(u)^n}{1 - F_Y(u)}\, f_X(u) \left[1 - F_Z(u)\right] K_b(x - u)\, du \\ &= \int_0^\infty \left[1 - F_Y(u)^n\right] h_X(u)\, K_b(x - u)\, du, \end{aligned} \tag{8.49}$$
observing (8.47a), (8.48a) and $h_X(u) = f_X(u) \big/ \left[1 - F_X(u)\right]$. The only difference between (8.49) and (8.46d) is that we have to use the CDF of $Y$ instead of the CDF of $X$ in the censoring case.
The asymptotic variance ($n \to \infty$) in the censoring case is similar to (8.46g):
$$\text{Var}\left[\hat h_n(x)\right] \approx \frac{1}{n\, b_n}\, \frac{h_X(x)}{\left[1 - F_X(x)\right] \left[1 - F_Z(x)\right]} \int K^2(u)\, du. \tag{8.50}$$

The rate of convergence of the kernel hazard rate estimator
$$\hat h_n(x) = \frac{1}{b_n} \sum_{i=1}^{n} \frac{\delta_i}{n - i + 1}\, K\left(\frac{x - y_i}{b_n}\right)$$
depends on the order of the kernel, the bandwidth and the differentiability of the hazard rate. Typically, the order of the kernel is chosen to be an even number with $k = 2$ being the standard choice. The resulting bias and variance are
$$\text{Bias}\left[\hat h_n(x)\right] = b_n^k \left[h^{(k)}(x)\, \frac{(-1)^k}{k!} \int u^k\, K(u)\, du + o(1)\right] \tag{8.51a}$$
$$\text{Var}\left[\hat h_n(x)\right] = \frac{1}{n\, b_n} \left[\frac{h(x)}{\left[1 - F_X(x)\right] \left[1 - F_Z(x)\right]} \int K^2(u)\, du + o(1)\right] \tag{8.51b}$$

The influence of the bandwidth $b_n$ and the trade-off between the bias and the variance is seen from (8.51a) and (8.51b). The optimal rate for the MSE of $\hat h_n(x)$ is attained when the squared bias and the variance are of the same order. This results in an optimal MSE rate of convergence of $n^{2k/(2k+1)}$, i.e., the MSE is of order $n^{-2k/(2k+1)}$, which gives $n^{4/5}$ for the standard choice of $k = 2$. This rate is slower than the usual parametric rate of $n$ regardless of the order $k$. For the asymptotic distribution we further assume that

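$$d = \lim_{n \to \infty} n\, b_n^{2k+1}$$
exists for some $0 \le d < \infty$. Then
$$\sqrt{n\, b_n}\, \left[\hat h_n(x) - h(x)\right] \xrightarrow{\ D\ } No\left(\sqrt{d}\; \frac{(-1)^k}{k!}\, h^{(k)}(x) \int u^k\, K(u)\, du\ ;\ \frac{h(x)}{\left[1 - F_X(x)\right] \left[1 - F_Z(x)\right]} \int K^2(u)\, du\right). \tag{8.51c}$$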

Extensions to the estimation of derivatives $h^{(k)}(x)$ of the hazard function can be found in MÜLLER/WANG (1990b). These essentially involve a change in the kernel. Derivatives are of interest to detect rapid changes in the hazard rate or for data based bandwidth choice.
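To find the optimal global bandwidth, we have to restrict the range of $x$ to a compact interval $[0, \tau]$ with $F_X(\tau) < 1$ and $F_Z(\tau) < 1$. The global optimal bandwidth which minimizes the leading term of
$$\text{MISE}\left[\hat h_n(x)\right] = E\left\{\int_0^\tau \left[\hat h_n(u) - h(u)\right]^2 du\right\} \tag{8.52a}$$
is
$$b_{\text{opt}} = \left\{\frac{1}{2\, n\, k}\; \frac{\displaystyle\int K^2(u)\, du \int_0^\tau \frac{h(u)}{\left[1 - F_X(u)\right] \left[1 - F_Z(u)\right]}\, du}{\displaystyle\left[\frac{(-1)^k}{k!} \int u^k\, K(u)\, du\right]^2 \int_0^\tau \left[h^{(k)}(u)\right]^2 du}\right\}^{1/(2k+1)}. \tag{8.52b}$$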
This optimal global bandwidth involves unknown quantities, so that in practice one has to find
alternatives. There is an extensive literature on bandwidth selection, see Sect. 8.1.1.4.
One way to pick a good bandwidth is to use a cross-validation technique for determining the bandwidth that minimizes some measure of how well the estimate performs. One such measure is the mean integrated squared error (MISE) of $\hat h_n(x)$ over the range 0 to $\tau$, see (8.52a), defined by
$$\begin{aligned} \text{MISE}\left[\hat h_n(x)\right] &= E\left\{\int_0^\tau \left[\hat h_n(u) - h(u)\right]^2 du\right\} \\ &= E\left[\int_0^\tau \hat h_n^2(u)\, du\right] - 2\, E\left[\int_0^\tau \hat h_n(u)\, h(u)\, du\right] + E\left[\int_0^\tau h^2(u)\, du\right]. \end{aligned} \tag{8.53a}$$

This function depends both on the kernel used to estimate $h(\cdot)$ and on the bandwidth $b_n$. Note that, although the last term in (8.53a) depends on the unknown hazard rate, it is independent of the choice of the kernel and of the bandwidth and can be ignored when finding the best value of $b_n$. The first term of (8.53a) can be estimated by $\int_0^\tau \hat h_n^2(u)\, du$. If we evaluate $\hat h_n(\cdot)$ at a grid of points $0 < u_1 < \ldots < u_\ell = \tau$, then we find an approximation to this integral by some formula of numerical integration over $\hat h_n(\cdot)$, e.g., by the trapezoid rule as
$$\int_0^\tau \hat h_n^2(u)\, du \approx \sum_{i=1}^{\ell-1} \frac{u_{i+1} - u_i}{2} \left[\hat h_n^2(u_i) + \hat h_n^2(u_{i+1})\right]. \tag{8.53b}$$

The second term of (8.53a) can be estimated by a cross-validation estimate suggested by RAMLAU-HANSEN (1983). This estimate is
$$\hat E\left[\int_0^\tau \hat h_n(u)\, h(u)\, du\right] = \frac{1}{b_n} \sum_{i \ne j} K\left(\frac{y_i - y_j}{b_n}\right) \frac{\delta_i}{n - i + 1}\, \frac{\delta_j}{n - j + 1} \tag{8.53c}$$
where the sum is over all observed times between 0 and $\tau$. Thus, to find the best value of $b_n$ which minimizes the MISE for a fixed kernel, we find $b_n$ which minimizes the function
$$g(b_n) = \sum_{i=1}^{\ell-1} \frac{u_{i+1} - u_i}{2} \left[\hat h_n^2(u_i) + \hat h_n^2(u_{i+1})\right] - \frac{2}{b_n} \sum_{i \ne j} K\left(\frac{y_i - y_j}{b_n}\right) \frac{\delta_i}{n - i + 1}\, \frac{\delta_j}{n - j + 1}. \tag{8.53d}$$
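The goal function (8.53d) can be evaluated on a set of candidate bandwidths. The following sketch is an illustration under assumptions made here: the EPANECHNIKOV kernel, an evenly spaced grid on $[0, \tau]$ that starts at zero, and hypothetical function names. The observations must be sorted increasingly with their censoring indicators in the same order.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def hazard_increments(delta):
    """delta_i / (n - i + 1) for the ordered sample."""
    n = len(delta)
    return delta / (n - np.arange(1, n + 1) + 1.0)

def cv_score(b, y, delta, tau, n_grid=200, kernel=epanechnikov):
    """g(b_n) of (8.53d); y sorted increasingly with matching delta."""
    w = hazard_increments(delta)
    grid = np.linspace(0.0, tau, n_grid)
    h = (kernel((grid[:, None] - y[None, :]) / b) * w[None, :]).sum(axis=1) / b
    first = np.sum(np.diff(grid) * (h[:-1] ** 2 + h[1:] ** 2) / 2.0)     # (8.53b)
    keep = y <= tau                          # only observed times in [0, tau]
    yk, wk = y[keep], w[keep]
    kmat = kernel((yk[:, None] - yk[None, :]) / b) * wk[:, None] * wk[None, :]
    cross = (kmat.sum() - np.trace(kmat)) / b                             # (8.53c), i != j
    return first - 2.0 * cross

def cv_bandwidth(y, delta, tau, candidates):
    """Candidate bandwidth with the smallest value of g(b_n)."""
    scores = np.array([cv_score(b, y, delta, tau) for b in candidates])
    return candidates[int(np.argmin(scores))], scores
```

The weights $\delta_i/(n - i + 1)$ are computed from the full sample before the restriction to $[0, \tau]$, so that the numbers at risk stay correct. As with LSCV for densities, it is advisable to inspect the whole curve of scores rather than only its minimizer.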

One also has to find alternatives to calculate the variance (8.51b), which contains unknown quantities. One possibility is to use $\hat h_n(x)$ for $h(x)$ and $\exp\left\{-\int_0^x \hat h_n(u)\, du\right\}$ for $1 - F_X(x)$, where the integral may be evaluated by the trapezoid rule, and to neglect $1 - F_Z(x)$, which will lead to an underestimation of $\text{Var}\left[\hat h_n(x)\right]$ for censored samples. Another possibility has been suggested by KLEIN/MOESCHBERGER (1997, p. 153). Since the kernel-smoothed estimator is a linear combination of the increments of the cumulated hazard rate
$$\hat h_n(x) = \frac{1}{b_n} \sum_{i=1}^{n} K\left(\frac{x - y_i}{b_n}\right) \Delta \hat H_n(y_i)$$
with
$$\Delta \hat H_n(y_i) = \frac{\delta_i}{n - i + 1}$$
a crude estimator of $\text{Var}\left[\hat h_n(x)\right]$ follows as
$$\widehat{\text{Var}}\left[\hat h_n(x)\right] = \frac{1}{b_n^2} \sum_{i=1}^{n} K^2\left(\frac{x - y_i}{b_n}\right) \Delta \widehat{\text{Var}}\left[\hat H_n(y_i)\right] \tag{8.54a}$$
with $\widehat{\text{Var}}\left[\hat H_n(x)\right]$ given in (5.12b) so that
$$\Delta \widehat{\text{Var}}\left[\hat H_n(y_i)\right] = \frac{\delta_i\, (n - i)}{(n - i + 1)^3}. \tag{8.54b}$$

Example 8/2: Fixed–bandwidth kernel estimation of the hazard rate

The following data are from BYSON/SIDDIQUI (1961) and represent the ordered times (in days) at death
of n = 43 patients suffering from chronic granulocytic leukemia with x = 0 taken as the patient’s date of
diagnosis. This is an uncensored sample.

7 47 58 74 177 232 273 285 317 429


440 445 455 468 495 497 532 571 579 581
650 702 715 779 881 900 930 968 1077 1109
1314 1334 1367 1534 1712 1784 1877 1886 2045 2056
2260 2429 2509

Fig. 8/7 shows the hazard rate estimated with four different kernels. The grid has 100 evenly spaced points
between 0 and 2509 in each case and the bandwidth is bn = 250 for all cases. The resulting hazard rate
is nearly constant for 0 < x < 1600 and is increasing for x ≥ 1600. With the exception of the uniform
kernel the other three kernels produce nearly the same hazard rates.
In Fig. 8/8 we find four estimated hazard rates, each using the EPANECHNIKOV kernel with a grid of 100 evenly spaced points between 0 and 2509, and bandwidths 600, 300, 200, 100, respectively. bn = 600 gives an oversmoothed estimate whereas bn = 100 produces a rather erratic course. bn = 300 and bn = 200 seem to be good compromises. The computations for this example have been done by using the MATLAB program Hazard 04 of the appended software package Inference.zip.
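The computation of this example can also be sketched outside MATLAB. The Python code below is not the program mentioned above but a minimal re-implementation of (8.44) for the 43 survival times, with the EPANECHNIKOV kernel, bandwidth 250 and the grid of 100 points described in the example. The function names are hypothetical, and the variance function follows the crude estimator of (8.54a), (8.54b) with the kernel squared, as in (8.55b).

```python
import numpy as np

# Times at death (days) of the n = 43 patients, copied from the table above.
TIMES = np.array([
    7, 47, 58, 74, 177, 232, 273, 285, 317, 429,
    440, 445, 455, 468, 495, 497, 532, 571, 579, 581,
    650, 702, 715, 779, 881, 900, 930, 968, 1077, 1109,
    1314, 1334, 1367, 1534, 1712, 1784, 1877, 1886, 2045, 2056,
    2260, 2429, 2509], dtype=float)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_hazard(x, y, delta, b, kernel=epanechnikov):
    """(8.44): y sorted increasingly, delta are the censoring indicators."""
    n = len(y)
    increments = delta / (n - np.arange(1, n + 1) + 1.0)    # delta_i / (n - i + 1)
    u = (np.asarray(x)[:, None] - y[None, :]) / b
    return (kernel(u) * increments[None, :]).sum(axis=1) / b

def kernel_hazard_variance(x, y, delta, b, kernel=epanechnikov):
    """Crude variance estimate in the spirit of (8.54a) and (8.54b)."""
    n = len(y)
    i = np.arange(1, n + 1)
    dvar = delta * (n - i) / (n - i + 1.0) ** 3
    u = (np.asarray(x)[:, None] - y[None, :]) / b
    return (kernel(u) ** 2 * dvar[None, :]).sum(axis=1) / b**2

grid = np.linspace(0.0, 2509.0, 100)      # 100 evenly spaced points
delta = np.ones_like(TIMES)               # uncensored sample
h_hat = kernel_hazard(grid, TIMES, delta, b=250.0)
```

No boundary kernels are used in this sketch, so the values within one bandwidth of 0 and of 2509 should be read with the caution discussed in Sect. 8.1.3.1. Calling `kernel_hazard` with other bandwidths gives the kind of comparison shown in Fig. 8/8.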

Figure 8/7: Hazard rate estimates for the survival time of 43 patients having granulocytic leukemia —
Different kernels and common bandwidth bn = 250

Figure 8/8: Hazard rate estimates for the survival time of 43 patients having granulocytic leukemia —
Different bandwidths and common EPANECHNIKOV kernel

Up to now we have presented results and formulas for samples with untied observations. The key formulas have to be modified properly when the sample has tied observations where the ties may occur among the uncensored as well as among the censored observations. We have the following notation, also see Fig. 4/1:

$x_j;\ j = 1, 2, \ldots, k$: distinct failure times (= uncensored observations),

$d_j \ge 1$: number of failures at $x_j$,

$c_j \ge 0;\ j = 0, 1, \ldots, k - 1$: number of censored observations in the right-open interval $[x_j, x_{j+1})$ where $x_0 = 0$,

$n_j = n_{j-1} - c_{j-1} - d_{j-1};\ j = 1, 2, \ldots, k$: number of units at risk at $x_j$ where $n_0 = n$ and $d_0 = 0$,

$\hat h_j = \dfrac{d_j}{n_j}$: MLE for $h_j = \dfrac{f(x_j)}{S(x_j)}$, see (5.2e),

$\widehat{\text{Var}}\left(\hat h_j\right) = \dfrac{d_j\, (n_j - d_j)}{n_j^3}$, see (5.7c).

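The fixed-bandwidth kernel estimator of (8.44) turns into
$$\hat h_n(x) = \frac{1}{b_n} \sum_{j=1}^{k} \frac{d_j}{n_j}\, K\left(\frac{x - x_j}{b_n}\right) \tag{8.55a}$$
and the crude estimator of $\text{Var}\left[\hat h_n(x)\right]$ in (8.54a) now reads
$$\widehat{\text{Var}}\left[\hat h_n(x)\right] = \frac{1}{b_n^2} \sum_{j=1}^{k} K^2\left(\frac{x - x_j}{b_n}\right) \frac{d_j\, (n_j - d_j)}{n_j^3} \tag{8.55b}$$
and the goal function (8.53d) is
$$g(b_n) = \sum_{i=1}^{\ell-1} \frac{u_{i+1} - u_i}{2} \left[\hat h_n^2(u_i) + \hat h_n^2(u_{i+1})\right] - \frac{2}{b_n} \sum_{i \ne j} K\left(\frac{x_i - x_j}{b_n}\right) \frac{d_i}{n_i}\, \frac{d_j}{n_j}. \tag{8.55c}$$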

8.1.3.3 Kernel Estimators with Varying Bandwidth16

The fixed–bandwidth kernel estimator of the hazard rate has many good properties, e.g., asymp-
totic unbiasedness, mean square error consistency, asymptotic normality, but in practical applica-
tion it has been observed that a globally constant bandwidth leads to undesirable effects whenever
the data are not evenly distributed over the whole range of interest. The fixed–bandwidth kernel
estimator cannot adapt to unevenness in the distribution of the data and thus tends to oversmooth
in regions with many observations and to undersmooth in regions with few observations reveal-
ing many misleading peaks. One approach to overcome these problems of the fixed–bandwidth
estimator is to incorporate the idea of nearest neighbor into the definition of the bandwidth. The
resulting estimator has a bandwidth which adapts to the configuration of the data. There are two
estimators emerging from this idea depending on what is the point from where to look for the k-th
nearest neighbor.
$^{16}$ Suggested reading for this section: BAGKAVOS/PATIL (2009), CHENG (1987), DETTE/GEFELLER (1995),
GEFELLER/DETTE (1991, 1992), GEFELLER/MICHELS (1992), HESS et al. (1999), MÜLLER/WANG (1990b,
1994), NIELSEN (2003), SCHÄFER (1985, 1986), TANNER (1983, 1984), TANNER/WONG (1984).

1. The local kernel estimator takes the distance from x, the point of interest where to estimate
h(x), to its k-th nearest neighbor among the observations as bandwidth.

2. The variable kernel estimator defines the bandwidth as the distance from $X_i$, the i-th uncensored
observation in ascending order, to its k-th nearest neighbor among the observations.

There are questions to be answered for each of these two approaches:

• How to choose k? FAILING (1984), based on published experience, suggests specifying k as

\[
k \approx c\,\sqrt{n} \quad \text{with } 1 \le c \le 2.5. \tag{8.56}
\]

TANNER (1984) finds k by a time–consuming cross–validation method.$^{17}$

• How to find the k-th nearest neighbor when there are censored observations? — This
question has been answered in different ways as will be shown further down.

We first turn to the local kernel estimator for a sample of n uncensored and distinct observations
$X_1, X_2, \ldots, X_n$. In this case, the kernel estimator of h(x) is defined as

\[
\hat h(x) = \sum_{i=1}^{n} \frac{1}{n-i+1}\,\frac{1}{D(k_n, x)}\, K\!\left(\frac{x - X_i}{D(k_n, x)}\right), \tag{8.57}
\]

where $D(k_n, x)$ is the distance of x to its k-th nearest neighbor among $X_1, X_2, \ldots, X_n$.
DETTE/GEFELLER (1995) have derived the asymptotic MISE of this estimator. TANNER (1983)
has shown that the estimator (8.57) converges (almost surely) to h(·) at each point of continuity of
h(·), provided that the sequence $k_n$ fulfills the condition $k_n = n^{\alpha}$, $\alpha \in (0.5, 1)$. Other versions of
the estimator $\hat h(x)$ have been considered by LIU/VAN RYZIN (1985) and CHENG (1987). These
authors have proved asymptotic normality and strong consistency. Mathematical disadvantages
of (8.57) are that the bandwidth function defined by the k-th nearest neighbor distance is
not differentiable at every x > 0 and that the integral of (8.57) from 0 to ∞ is not bounded in
general.
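
A minimal Python sketch of the local kernel estimator (8.57) for an uncensored sample may make the role of the nearest–neighbor distance concrete. The function names, the biweight kernel and the toy sample are assumptions made for illustration; this is not the book's MATLAB code.

```python
def biweight(u):
    """Biweight kernel K(u) = (15/16) (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def knn_distance(x, sample, k):
    """D(k, x): distance from the point x to its k-th nearest observation."""
    return sorted(abs(x - xi) for xi in sample)[k - 1]


def local_kernel_hazard(x, sample, k, kernel=biweight):
    """Local kernel estimate (8.57) for uncensored, distinct observations."""
    xs = sorted(sample)                 # X_1 < X_2 < ... < X_n
    n = len(xs)
    dist = knn_distance(x, xs, k)
    if dist == 0.0:
        raise ValueError("k-th nearest-neighbor distance is zero; increase k")
    # enumerate() is 0-based, so n - i here equals n - i + 1 in (8.57)
    return sum(kernel((x - xi) / dist) / (n - i) for i, xi in enumerate(xs)) / dist


# Made-up toy sample (not from the book)
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
h_local = local_kernel_hazard(3.0, sample, k=2)
print(h_local)
```

The bandwidth is recomputed for every evaluation point x, which is why the resulting curve is not smooth in x even for a smooth kernel.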
The problem of transferring appropriately the definition of D(kn , x) from the uncensored to the
censored setting has been solved in different ways.

1. A straightforward and simple solution, as suggested in TANNER (1983), TANNER/WONG
   (1984), LIU/VAN RYZIN (1985) and CHENG (1987), ignores all censored observations
   when looking for the k-th neighbor and defines $D_1(k_n, x)$ as the distance of x to its k-th
   neighbor among the uncensored data $(Y_i, \delta_i = 1);\ i = 1, 2, \ldots, n$:

\[
\hat h(x) = \sum_{i=1}^{n} \frac{\delta_i}{n-i+1}\,\frac{1}{D_1(k_n, x)}\, K\!\left(\frac{x - Y_i}{D_1(k_n, x)}\right). \tag{8.58}
\]

   SCHÄFER (1985) points out that these distances are biased by the censoring distribution in
   the sense that they adapt to the conditional density of $X_i$ under the condition $X_i \le Z_i$ of
   being uncensored rather than to the density function f(·) or the hazard rate to be estimated.

2. Therefore, SCHÄFER (1985) proposes an alternative definition of $D(k_n, x)$:

\[
D_2(k_n, x) = \sup\Big\{ d > 0 \;\Big|\; \hat H_n(x + d) - \hat H_n(x - d) \le \frac{k_n - 1}{n} \Big\} \tag{8.59a}
\]
$^{17}$ The paper of TANNER gives a FORTRAN code for the variable kernel estimator of the hazard rate.

resulting in

\[
\hat h(x) = \sum_{i=1}^{n} \frac{\delta_i}{n-i+1}\,\frac{1}{D_2(k_n, x)}\, K\!\left(\frac{x - Y_i}{D_2(k_n, x)}\right). \tag{8.59b}
\]

   This definition of $D(k_n, x)$ incorporates the information of the censored observations by
   using the NELSON/AALEN estimator (5.12a) of the cumulated hazard rate, and the nearest
   neighbor distances are no longer biased by the censoring distribution. It suffers, however,
   as GEFELLER/DETTE (1992) and DETTE/GEFELLER (1995) criticize, from other serious
   conceptual drawbacks:

   a) "Even if no censored data were observed in the sample, $D_2(k_n, x)$ is not identical to
      $D(k_n, x)$ of (8.57), i.e., $D_2(k_n, x)$ does not reveal the natural definition of nearest
      neighbor distances in the uncensored setting."
   b) "In addition, one inherent property of $\hat H_n(\cdot)$ has an awkward effect on the definition
      of nearest neighbor distances: the heights of the steps in $\hat H_n(\cdot)$ increase automatically
      by definition as $x \to Y_n$. Consequently, in the right tail of the lifetime distribution
      this effect dominates the value $(k_n - 1)/n$ used in the definition of $D_2(\cdot, \cdot)$."

3. To avoid the problems mentioned in a) and b) above, DETTE and GEFELLER
   propose a modification of SCHÄFER's idea, defining

\[
D_3(k_n, x) = \sup\Big\{ d > 0 \;\Big|\; \hat S_n(x - d) - \hat S_n(x + d - 0) \le \frac{k_n - 1}{n} \Big\} \tag{8.60}
\]

   where $\hat S_n(x + d - 0)$ denotes the limit from the left of the KAPLAN/MEIER estimator of
   S(·), see (5.3d), at the point x + d. Using $\hat S_n(\cdot)$ instead of $\hat H_n(\cdot)$ resolves the drawbacks
   of $D_2(\cdot, \cdot)$.

In an intensive simulation study GEFELLER/DETTE (1992) found that the kernel hazard rate estimator
based on $D_3(\cdot, \cdot)$ always had a smaller MISE than that based on $D_1(\cdot, \cdot)$ or $D_2(\cdot, \cdot)$.
We now turn to the variable kernel estimator of the hazard rate, which has been proposed by
TANNER/WONG (1984), using a bandwidth which is the distance $D(k_n, X_i)$ between the observation
$X_i$ and its k-th nearest neighbor among the remaining uncensored observations:

\[
\hat h_n(x) = \sum_{i=1}^{n} \frac{1}{n-i+1}\,\frac{1}{D(k_n, X_i)}\, K\!\left(\frac{x - X_i}{D(k_n, X_i)}\right), \tag{8.61}
\]

assuming a sample with uncensored data. This definition of the bandwidth is independent of the
point of interest x and adapts only to the configuration of the data. The kernel estimator (8.61)
is differentiable for appropriate kernel functions K(·) at all x > 0 and its integral is bounded.
We mention that, contrary to the hazard rate estimator with fixed bandwidth, the bandwidth here
is not globally constant and, contrary to the local kernel estimators (8.57)–(8.59), the number
of observations influencing $\hat h(x)$ is not fixed either. Statistical properties of (8.61), e.g., uniform
convergence to h(x), have been derived by SCHÄFER (1985).
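
The variable kernel estimator (8.61) differs from the local one only in where the nearest–neighbor distance is measured. The following Python sketch for an uncensored sample illustrates this; the function names, the biweight kernel and the toy sample are hypothetical choices, not the FORTRAN or MATLAB code mentioned in the text.

```python
def biweight(u):
    """Biweight kernel K(u) = (15/16) (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def variable_kernel_hazard(x, sample, k, kernel=biweight):
    """Variable kernel estimate (8.61) for uncensored, distinct observations."""
    xs = sorted(sample)                     # X_1 < X_2 < ... < X_n
    n = len(xs)
    total = 0.0
    for i, xi in enumerate(xs):
        others = xs[:i] + xs[i + 1:]        # the remaining observations
        # D(k, X_i): distance from X_i to its k-th nearest neighbor
        dist = sorted(abs(xi - xj) for xj in others)[k - 1]
        # enumerate() is 0-based, so n - i here equals n - i + 1 in (8.61)
        total += kernel((x - xi) / dist) / ((n - i) * dist)
    return total


# Made-up toy sample (not from the book)
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
h_variable = variable_kernel_hazard(3.0, sample, k=2)
print(h_variable)
```

Because each bandwidth depends on $X_i$ only, the bandwidths can be computed once and reused for all gridpoints x.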

Example 8/3: Local and variable kernel estimators of the hazard rate

Using the data of the preceding Example 8/2, Fig. 8/9 shows the smoothed hazard rate estimated by the
local and the variable kernel method, respectively, taking the biweight kernel with k = 13 and 100
gridpoints. Evidently, and ceteris paribus, the variable kernel method produces a smoother curve. The
computations have been done with the help of the MATLAB programs Hazard 05 and Hazard 06 of the
appended software package Inference.zip.

Figure 8/9: Hazard rate estimates for the survival time of 43 patients having granulocytic leukemia using
a local and a variable biweight kernel with k = 13 and 100 gridpoints

A kernel–based estimator with varying bandwidths can be found by optimizing the bandwidth for
a given point of interest x. For kernel estimators, the bandwidth regulates the trade–off between
local bias and local variance. Thus, another way to overcome the non–adaptive behavior of the
fixed–bandwidth estimator is to vary the local bandwidth in order to balance local variance and
local bias. A natural objective function is the local MSE expressing the estimation error as a
function of local bias and local variance. An optimal local bandwidth kernel estimator can be
found by minimizing an estimate of the local MSE with respect to the bandwidth. This problem
has been solved by MÜLLER/WANG (1990b, 1994) who also give a MATLAB program named
HADES for doing this job.$^{18}$

8.2 Further Smoothing Techniques$^{19}$


There are two serious competitors to the kernel approach when looking for smooth hazard rate
estimators, the spline approach and the wavelet approach, neither of which is very popular in practice.
We will briefly comment on these two techniques without going into details, starting with the
spline estimator.
In mathematics a spline$^{20}$ is a piecewise–defined polynomial function that possesses a high degree
of smoothness at the places where the polynomial pieces connect. These places are called knots and
in hazard rate estimation they generally are the observed lifetimes $X_i$. The most commonly used
splines are cubic splines, i.e., they consist of polynomials of degree 3 whose first and second
derivatives coincide at the knots so that the curve has no sharp bends.

$^{18}$ See www.stat.ucdavis.edu/ entyang/hades/. Other programs, written in FORTRAN or S–plus and
performing different approaches in kernel estimation of the hazard rate, can be found on the website
http://odin.mdacc.tmc.edu of HESS et al. (1999).
$^{19}$ Suggested reading for this section: ANDERSON/SENTHILSELVAN (1980), ANTONIADIS et al. (1994, 1999),
BÉZANDRY et al. (2005), DUPUY/GNEYOU (2011), GU (1996), JARJOURA (1988), LI (2002), O'SULLIVAN
(1988a,b), PATIL (1997), ROSENBERG (1995), WU/WELLS (2003).
$^{20}$ The term 'spline' is adapted from the name of a flexible strip of metal used by draftsmen to assist in drawing
curved lines. Such strips are very popular among naval architects in designing a ship's hull.
There are several types of spline methods; the most widely investigated one for hazard
smoothing is the penalized likelihood approach. Let η(x) = ln h(x) be the log hazard rate. Then
the log likelihood function for the censored data is

\[
L(\eta) = \sum_{i=1}^{n} \Big\{ \delta_i\,\eta(Y_i) - \int_0^{Y_i} e^{\eta(x)}\,dx \Big\}, \tag{8.62a}
\]

which is unbounded if no shape restriction on η(·) is imposed. A penalty P(η), measuring the
roughness of η(·), is therefore introduced in (8.62a). The penalized estimator $\hat\eta(\cdot)$ of η(·) is the
maximizer of the penalized log likelihood

\[
L(\eta) = \frac{1}{n} \sum_{i=1}^{n} \Big\{ \delta_i\,\eta(Y_i) - \int_0^{Y_i} e^{\eta(x)}\,dx \Big\} - \frac{\alpha}{2}\,P(\eta) \tag{8.62b}
\]

among all η(·) in a HILBERT space. α is a smoothing parameter playing the same role as the
bandwidth b in kernel estimation. A smaller α yields a better fit but a rougher curve. JARJOURA
(1988) describes how to determine the smoothing parameter α by a cross–validation likelihood
approach.
The penalty function P(η) determines the kind of spline resulting from (8.62b). With

\[
P(\eta) = \int \big[\eta^{(2)}(x)\big]^2\,dx \tag{8.62c}
\]

we find an estimator which is twice continuously differentiable and a piecewise cubic polynomial
between two consecutive $X_i$'s. For computing details see the papers of O'SULLIVAN (1988a,b)
and for asymptotic results see GU (1996). ANDERSON/SENTHILSELVAN (1980) take

\[
P(\eta) = \int \big[h'(x)\big]^2\,dx \tag{8.62d}
\]

which leads to $\hat h(x)$ as a piecewise quadratic spline that may result in negative values under heavy
censoring. Another type of spline method is regression splines or B–splines, which adopt a fixed
number of knots and basis functions. ROSENBERG (1995) describes how to select the number
and the location of the knots.
Wavelet–based hazard rate estimation has been treated by many authors for a long time. A
wavelet is a wave–like oscillation with an amplitude that begins at zero, increases, and then
decreases back to zero. For instance, it can be visualized as a brief oscillation like one might see
recorded by a seismograph or heart monitor.
Let φ be the so–called 'father wavelet' and ψ the 'mother wavelet'.$^{21}$ We assume that the functions

\[
\phi_{j,k}(x) := 2^{j/2}\,\phi\big(2^{j} x - k\big) \quad \text{and} \quad \psi_{j,k}(x) := 2^{j/2}\,\psi\big(2^{j} x - k\big)
\]

satisfy the following conditions:

1. $\{\phi_{j,k},\ \psi_{j,k} : k \in \mathbb{Z},\ j \ge j_0\}$ is an orthogonal basis of $L^2(\mathbb{R})$,

2. φ and ψ are bounded and compactly supported,

3. $\int y^{k}\,\psi(y)\,dy = 0$ for $0 \le k \le r - 1$ and $\kappa = (r!)^{-1} \int y^{r}\,\psi(y)\,dy \ne 0$.

$^{21}$ The term 'father' (respectively 'mother') comes from the fact that the functions $\phi_{j,k}$ (respectively $\psi_{j,k}$) are
derived by dilations and translations from the original function φ (respectively ψ).


This implies that an arbitrary square–integrable function may be expanded in a generalized
FOURIER series.
Readers who are interested in the rather complicated theory and computation of wavelet–based
hazard rate estimation should consult one of the original papers on this subject: ANTONIADIS et
al. (1994, 1999), BÉZANDRY et al. (2005), DUPUY/GNEYOU (2011), LI (2002), PATIL (1997),
WU/WELLS (2003). An introduction to wavelets and their statistical applications is HÄRDLE et
al. (1998).
9 Hazard Plotting
Hazard plotting allows estimation as well as informal testing of hypotheses. So this chapter marks
the transition from one topic of statistical inference to another one.

9.1 Introduction and Motivation


Data plotting, especially probability plotting, has been applied for a long time by engineers to
display and interpret failure data because of its simplicity and effectiveness. Data plotting is often
used in place of or in addition to standard numerical methods of data analysis because it serves a
lot of purposes:

• A plot provides a complete and easy–to–grasp picture of data according to an old Chinese
proverb saying that one picture is worth a thousand words, or in the present context, a
thousand numbers. A plot is particularly useful in presenting data, since it aids in convincing
others of conclusions drawn from data by numerical methods.

• A plot provides a convenient means of fitting a theoretical distribution to data. This can
be done by drawing a straight line by eye through the plotted data points on specialized
graph paper. This line is used to smooth, interpolate and extrapolate data. Estimates of
distribution parameters, percentiles, and predictions of number of failed and unfailed units
in specified periods of time are easily obtained from this straight line.

• A plot allows one to assess whether a chosen theoretical distribution provides an adequate
fit to the data or not. The data points will tend to plot as a straight line on the plotting paper
for a satisfactory distribution. Non–random departures of the plotted data from a straight
line can provide useful information to the statistician. Such departures may indicate that
the chosen distribution is incorrect, that there is more than one failure mode, or that certain
data points are outliers that do not fit in with the rest of the data.

The object of plotting in this chapter is the cumulative hazard function H(x). For this purpose
we first have to find estimates of H(x), see Sect. 5.2, which then will be displayed either on
normal (= naturally or linearly scaled) graph paper or on specialized graph paper as presented
in Sect. 9.2. The plot on normal graph paper serves to make non–parametric inference whereas
the plot on special hazard paper aims at making inference on a hypothetical parametric lifetime
distribution. We close this chapter by a section on how to find the appropriate hazard–scale for
distributions belonging to the location–scale family.

9.2 Hazard Plots and Hazard Paper$^{1}$


Little can be learned from line plots of the direct estimator $\hat H(x)$ or of the indirect
estimator $\tilde H(x)$ as given in Fig. 5/2. When the left–hand edges of the stair–case plot form a
nearly convex (concave) line we may guess that the sampled distribution is IHR (DHR), and when
they scatter around a straight line we may have a sample from an exponential distribution. More
insight can be gained from a point plot on hazard paper. The basic idea is to make plots
that should be roughly linear if the proposed family of distributions has generated the
sample at hand, since departures from linearity can readily be appreciated by eye.

$^{1}$ Suggested reading for this section: NELSON (1969, 1970, 1972, 1982), RINNE (2010).


Hazard plotting, like probability plotting, can be successfully applied to location–scale
distributions$^{2}$ and to those distributions that can be converted into a location–scale type by a
suitable transformation. Hazard plots and probability plots are closely related to one another. The
main differences are the scaling of the ordinate, which carries the CHR–values instead of the
CDF–values, and the choice of the plotting position, i.e., the ordinate–value to be plotted against
the ordered sample values $x_i$ on the abscissa. Each member of the location–scale family has an
ordinate–scaling of its own, distorted in such a way that, when the sample comes from the
pertinent distribution, the plotted points on the graph paper will randomly scatter around a straight
line, thus giving a graphical and informal goodness–of–fit test. The fitted line, either fitted by
eye or by regression, enables the statistician to read off estimates of the location parameter and
of the scale parameter, as a point and as an interval on the abscissa, respectively.
We will first demonstrate how to construct a hazard paper and how to find estimates of the location
and scale parameters for a genuine location–scale distribution and for distributions transformed
to location–scale type. Then we comment on the choice of the plotting position.
A random variable X is said to belong to the location–scale family when its CDF

\[
F_X(x \mid a, b) = \Pr(X \le x \mid a, b) \tag{9.1a}
\]

is a function of only (x − a)/b:

\[
F_X(x \mid a, b) = F\!\left(\frac{x - a}{b}\right); \quad a \in \mathbb{R},\ b > 0; \tag{9.1b}
\]

where F(·) denotes a distribution having no other parameters. Different F(·)'s correspond to
different members of the family. The random variable

\[
Y = \frac{X - a}{b} \tag{9.1c}
\]

is called the reduced variable.$^{3}$ The location parameter a is either a measure of central tendency
(mean, median, mode) or a threshold parameter. The scale parameter b can be either the standard
deviation or the length of the distribution's support or the length of a central (1 − α)–interval
for X.
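We will write the reduced CDF as

\[
F_Y(y) := F\!\left(\frac{x - a}{b}\right), \quad y = \frac{x - a}{b}, \tag{9.1d}
\]

and the CCDFs belonging to (9.1a,b) read

\[
S_X(x \mid a, b) = 1 - F_X(x \mid a, b) \tag{9.1e}
\]
\[
S_Y(y) = 1 - F_Y(y). \tag{9.1f}
\]

We now turn to the CHR and find

\[
\begin{aligned}
H_X(x \mid a, b) &= -\ln\big[1 - F_X(x \mid a, b)\big] \\
&= -\ln\big[1 - F_Y(y)\big] \\
&= H_Y(y), \quad y = (x - a)/b.
\end{aligned} \tag{9.1g}
\]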
$^{2}$ See RINNE (2010) for a detailed presentation of probability plotting and linear estimation techniques. This
monograph is supplemented by a MATLAB program that, among other features, produces probability
papers.
$^{3}$ Some authors call it the standardized variable. We will refrain from using this name because, conventionally,
a standardized variable is defined as $Z = [X - E(X)]/\sqrt{\mathrm{Var}(X)}$, and thus has mean E(Z) = 0 and variance
Var(Z) = 1. The normal distribution, which is a member of the location–scale family, is the only distribution
with a = E(X) and $b^2$ = Var(X). So, in this case reducing and standardizing are the same.


Let Λ, Λ ≥ 0, be a value of the CHR, then the hazard quantile of order Λ in the reduced case is

\[
y_\Lambda = H_Y^{-1}(\Lambda) \tag{9.1h}
\]

and consequently

\[
x_\Lambda = a + b\,y_\Lambda. \tag{9.1i}
\]

The hazard paper for a location–scale distribution is now constructed by taking the horizontal
axis (abscissa) for x or $x_\Lambda$ and the vertical axis (ordinate) for y or $y_\Lambda$,$^{4}$ where the labeling of this
axis is according to the corresponding CHR–value Λ. This procedure gives a scaling with respect
to Λ which is non–linear; the only exception is the exponential distribution,$^{5}$ where Λ and $y_\Lambda$
coincide:

\[
\begin{aligned}
F_X(x \mid a, b) &= 1 - \exp\!\left(-\frac{x - a}{b}\right) = F_Y(y) \\
H_X(x \mid a, b) &= \frac{x - a}{b} = y = H_Y(y) = \Lambda \\
y_\Lambda &= H_Y^{-1}(\Lambda) = \Lambda.
\end{aligned}
\]

The probability grid on a probability paper and the hazard grid on a hazard paper for one and the
same distribution are related to one another because

Λ = − ln(1 − P ) (9.1j)
P = 1 − exp(−Λ), (9.1k)

where P is a given value of the CDF. Thus, a probability grid may be used for hazard plotting
when the P –scaling on the ordinate is supplemented by a Λ–scaling, see Fig. 9/1. Conversely, a
hazard paper may be used for probability plotting.
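
The conversion between the two grids in (9.1j) and (9.1k) is easily scripted. The following Python lines are a small sketch; the function names are chosen here for illustration only.

```python
import math


def hazard_from_prob(p):
    """(9.1j): CHR value Lambda = -ln(1 - P) for a CDF value 0 <= P < 1."""
    return -math.log1p(-p)


def prob_from_hazard(lam):
    """(9.1k): CDF value P = 1 - exp(-Lambda) for a CHR value Lambda >= 0."""
    return -math.expm1(-lam)


# Supplement a P-scaled ordinate by the matching Lambda labels
for p in (0.10, 0.50, 0.90, 0.99):
    print(f"P = {p:.2f}  ->  Lambda = {hazard_from_prob(p):.4f}")
```

The functions `math.log1p` and `math.expm1` are used instead of `log(1 - p)` and `1 - exp(-lam)` only to keep precision for values close to zero.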
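The extreme value distribution of type I for the minimum, the log–WEIBULL distribution, is a
genuine location–scale distribution:

\[
F_X(x \mid a, b) = 1 - \exp\!\left[-\exp\!\left(\frac{x - a}{b}\right)\right]; \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R}, \tag{9.2a}
\]
\[
F_Y(y) = 1 - \exp\big[-\exp(y)\big], \quad y \in \mathbb{R}. \tag{9.2b}
\]

So, the reduced CHR reads

\[
\begin{aligned}
H_Y(y) &= -\ln\big[1 - F_Y(y)\big] \\
&= -\ln\big\{\exp\big[-\exp(y)\big]\big\} \\
&= \exp(y).
\end{aligned} \tag{9.2c}
\]

The reduced hazard quantile of order Λ is

\[
y_\Lambda = \ln \Lambda \tag{9.2d}
\]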
$^{4}$ Some authors construct hazard paper by interchanging the axes. This approach has some justification when
looking at (9.1i), where the dependent variable is $x_\Lambda$, which normally is laid down on the ordinate, and $y_\Lambda$ is the
independent variable, to be displayed on the abscissa.
$^{5}$ For a discrete distribution the exception is the geometric distribution with

\[
\Lambda = H_Y(y) = P\,y; \quad y = 0, 1, 2, \ldots,\ 0 \le P < 1
\]

and

\[
y_\Lambda = H_Y^{-1}(\Lambda) = \frac{\Lambda}{P}.
\]


and the X–hazard quantile is

\[
x_\Lambda = a + b\,y_\Lambda = a + b \ln \Lambda. \tag{9.2e}
\]

The hazard paper of the log–WEIBULL distribution has a logarithmic scale on the ordinate and a
linear scale on the abscissa. For didactic reasons we have given three scalings in Fig. 9/1, the Λ–,
the P– and the y–scaling, and only the last one is linear.

Figure 9/1: Hazard paper of the log–WEIBULL distribution

When the straight line $x_\Lambda = a + b\,y_\Lambda$ is given, we may find a and b by suitably chosen values of
Λ. For Λ = 1 we have

\[
x_1 = a + b \ln 1 = a \tag{9.2f}
\]

and the distance between the x–hazard quantiles of order Λ = 1 and Λ = exp(1) = e leads to b:

\[
x_e - x_1 = (a + b \ln e) - (a + b \ln 1) = b. \tag{9.2g}
\]
A distribution transformable to location–scale type is the WEIBULL distribution with

\[
F_X(x \mid a, b, c) = 1 - \exp\!\left[-\left(\frac{x - a}{b}\right)^{c}\right]; \quad a \in \mathbb{R},\ b, c > 0,\ x \ge a. \tag{9.3a}
\]

a, b, c are the original location, scale and shape parameters. When a, the lower threshold of X,
is known (mostly a = 0) or has been estimated in some way or other, the transformed variable

\[
X^{*} = \ln(X - a) \tag{9.3b}
\]

has the log–WEIBULL distribution:

\[
F_{X^{*}}(x^{*} \mid a^{*}, b^{*}) = 1 - \exp\!\left[-\exp\!\left(\frac{x^{*} - a^{*}}{b^{*}}\right)\right]; \quad a^{*} \in \mathbb{R},\ b^{*} > 0,\ x^{*} \in \mathbb{R} \tag{9.3c}
\]

where

\[
a^{*} = \ln b, \tag{9.3d}
\]
\[
b^{*} = 1/c. \tag{9.3e}
\]

So, the hazard paper of the WEIBULL distribution has the same ordinate as that of the log–WEIBULL
distribution, but a logarithmic scale on the abscissa.
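
The transformation (9.3b) with the parameter mapping (9.3d,e) can be checked numerically. The sketch below is only an illustration; the function names and the parameter values are hypothetical.

```python
import math


def weibull_cdf(x, a, b, c):
    """WEIBULL CDF (9.3a) with threshold a, scale b and shape c."""
    return 1.0 - math.exp(-(((x - a) / b) ** c)) if x > a else 0.0


def log_weibull_cdf(x, a, b):
    """Log-WEIBULL (type I minimum) CDF (9.2a) with location a and scale b."""
    return 1.0 - math.exp(-math.exp((x - a) / b))


# Hypothetical parameter values, chosen only for illustration
a, b, c = 0.0, 100.0, 2.5
a_star, b_star = math.log(b), 1.0 / c          # (9.3d), (9.3e)

pairs = []
for x in (20.0, 80.0, 150.0):
    lhs = weibull_cdf(x, a, b, c)
    rhs = log_weibull_cdf(math.log(x - a), a_star, b_star)   # X* = ln(X - a)
    pairs.append((lhs, rhs))
    print(x, lhs, rhs)

# Back-transformation of estimates read off the log-WEIBULL line
b_back, c_back = math.exp(a_star), 1.0 / b_star
```

Estimates of $a^{*}$ and $b^{*}$ read off the hazard paper are thus converted into WEIBULL estimates by $b = \exp(a^{*})$ and $c = 1/b^{*}$.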
We now turn to how to find an estimate of H(x) = Λ, i.e., how to find a plotting position on
the hazard grid. Hazard plots are not based on the PLE of S(x), which would give
$\tilde H(x_i) = -\ln \hat S(x_i)$, but they rest upon the empirical cumulative hazard function (5.12a). When the data
set is singly type–II censored, the observed distinct lifetimes $x_1 < x_2 < \ldots < x_k$ are the first k
lifetimes in a sample of size n,$^{6}$ and the number of units at risk at $x_i$ is $n_i = n - i + 1$. This gives

\[
\hat H(x_j) = \hat\Lambda_j = \sum_{i=1}^{j} \frac{1}{n - i + 1}; \quad j = 1, 2, \ldots, k. \tag{9.4}
\]

The quantity ni = n − i + 1 in (9.4) is nothing but the reverse rank which results in the case
of random censoring when all observations — censored as well as uncensored — would be or-
dered, but then the summation is only over those reciprocal reverse ranks belonging to uncensored
observations, see Example 9/1.
One argument in support of (9.4) is that with singly type–II censoring it can be shown that the
estimator (9.4) is unbiased:

\[
E\big[\hat H(x_j)\big] = H(x_j). \tag{9.5}
\]

To prove this suppose X has survivor function S(x). As is well known, the random variable
U = S(X) has a uniform distribution on [0, 1], and hence W = − ln U = H(X) has the
reduced exponential distribution with PDF f(w) = exp(−w), w ≥ 0. Therefore, if
$X_1 < X_2 < \ldots < X_n$ are the ordered random observations in a sample of size n, the random variable
$W_j = H(X_j)$ is the j-th ordered observation in a random sample of size n from the reduced
exponential distribution. As is known too, see RINNE (2010, p. 121), the mean of $W_j$ is

\[
E(W_j) = \sum_{i=1}^{j} \frac{1}{n - i + 1}, \tag{9.6}
\]

and thus the stated result follows from (9.4). It is more tedious, see NELSON (1972), to show that
if the data are progressively (= multiply) type–II censored, the result (9.5) still holds, where $X_j$
represents the j-th smallest uncensored observation. NELSON (1982) suggests modified plotting
positions obtained by averaging the hazard step function at the jumps $x_j$. The modified position
of the earliest failure $x_1$ is half its regular hazard value $\hat\Lambda_1 = 1/n$. The modified positions agree
better with a distribution fitted by maximum likelihood.
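
The plotting positions can be computed with a few lines of Python. The sketch below sums the reciprocal reverse ranks of the uncensored observations and also returns modified positions. Two points are assumptions of this sketch rather than statements of the text: at tied times failures are ranked before censored units, and the modified position is taken as the midpoint of the hazard step at each jump. The data are those tabulated in Example 9/1.

```python
def hazard_plotting_positions(times, delta):
    """Return (x_j, Lambda_j, modified Lambda_j) for each uncensored observation.

    times: observed times (failures and censorings); delta: 1 = failure, 0 = censored.
    """
    n = len(times)
    # Assumption: at tied times, failures are ranked before censored units.
    order = sorted(range(n), key=lambda i: (times[i], -delta[i]))
    positions, cum = [], 0.0
    for rank, i in enumerate(order):
        if delta[i] == 1:
            prev = cum
            cum += 1.0 / (n - rank)          # reciprocal reverse rank
            positions.append((times[i], cum, 0.5 * (prev + cum)))
    return positions


# Data of Example 9/1 (n = 15, randomly censored)
times = [16.3, 16.5, 17.5, 17.9, 17.9, 18.7, 20.0, 20.1,
         20.4, 20.7, 21.7, 22.2, 24.2, 26.3, 26.3]
delta = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0]

positions = hazard_plotting_positions(times, delta)
for x, lam, lam_mod in positions:
    print(f"{x:5.1f}  {lam:.4f}  {lam_mod:.4f}")
```

For an uncensored or singly type–II censored sample all reverse ranks up to the last failure enter the sum, and the regular positions reduce to (9.4).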
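$^{6}$ The reasoning will be the same for an uncensored sample where k = n.

How to find estimates for the location–scale parameters a and b? A first possibility is
least–squares estimation of (9.1i), interpreted as a regression of $x_\Lambda$ on $y_\Lambda$, where the regressand is
taken as the uncensored observation of order j and the regressor is

\[
Y_{\hat\Lambda_j} = H_Y^{-1}\big(\hat\Lambda_j\big), \tag{9.7a}
\]

the reduced hazard quantile of order $\hat\Lambda_j$ estimated by (9.4). Introducing the following vectors and
matrices

\[
\mathbf{x} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{pmatrix}; \quad
\mathbf{1} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}; \quad
\hat{\mathbf{y}} = \begin{pmatrix} Y_{\hat\Lambda_1} \\ Y_{\hat\Lambda_2} \\ \vdots \\ Y_{\hat\Lambda_k} \end{pmatrix}; \quad
\boldsymbol{\theta} = \begin{pmatrix} a \\ b \end{pmatrix}; \quad
\mathbf{A} = \big(\mathbf{1} \;\; \hat{\mathbf{y}}\big) \tag{9.7b}
\]

the ordinary least–squares (OLS) estimator of θ is

\[
\hat{\boldsymbol{\theta}} = \begin{pmatrix} \hat a \\ \hat b \end{pmatrix} = \big(\mathbf{A}'\mathbf{A}\big)^{-1} \mathbf{A}'\,\mathbf{x}. \tag{9.7c}
\]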

This estimator is not statistically optimal as the regressor variables $Y_{\hat\Lambda_j}$ are neither homoscedastic
nor free of autocorrelation.$^{7}$
A second possibility is an eye–fitted straight line through the point cluster on the hazard paper,
whereby the user should keep in mind that the horizontal distances between the $x_j$'s and the
straight line have to be rendered as small as possible. If we look at (9.1i) we see that

\[
a = x_{\Lambda_0}, \quad \text{where } \Lambda_0 \text{ is such that } y_{\Lambda_0} = 0, \text{ and} \tag{9.8a}
\]
\[
b = x_{\Lambda_1} - x_{\Lambda_0}, \quad \text{where } \Lambda_1 \text{ is such that } y_{\Lambda_1} = 1. \tag{9.8b}
\]

So we find so–called hazard–quantile estimates of a and b by processing, according to
(9.8a,b), $\hat x_{\Lambda_0}$ and $\hat x_{\Lambda_1}$ belonging to $\Lambda_0$ and $\Lambda_1$ on the eye–fitted straight line.$^{8}$ When the
straight line has been fitted by OLS the estimates read off according to (9.8a,b) will be identical,
apart from errors due to rounding and reading off, to the OLS estimates.

Example 9/1: Hazard plotting and parameter estimation for a logistic distribution

The following table gives a simulated data set (n = 15) from a logistic distribution having a = 20 and
b = 2. The sample has been randomly censored. The logistic distribution has

\[
S_X(x) = \left[1 + \exp\!\left(\frac{x - a}{b}\right)\right]^{-1}, \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R}, \tag{9.9a}
\]

giving the cumulated hazard rate

\[
H_X(x) = -\ln S_X(x) = \ln\!\left[1 + \exp\!\left(\frac{x - a}{b}\right)\right]. \tag{9.9b}
\]

From the reduced distribution we find

\[
\Lambda = H_Y(y) = \ln\big[1 + \exp(y)\big], \quad y \in \mathbb{R}, \tag{9.9c}
\]
\[
y_\Lambda = H_Y^{-1}(\Lambda) = \ln\big[\exp(\Lambda) - 1\big], \quad \Lambda \ge 0. \tag{9.9d}
\]

The hazard paper in Fig. 9/2 has an ordinate scaled according to (9.9c). The figure displays the plotted data
and the OLS–fitted straight line with parameter estimates

\[
\hat a = 21.1580, \quad \hat b = 2.1428
\]
$^{7}$ See RINNE (2010, Chapter 4) for the alternative general least–squares estimator, which is implemented in the
MATLAB program LEPP appended to the monograph.
$^{8}$ We have location–scale distributions where we have to choose other reduced quantiles than y = 0 and/or y = 1
to find hazard–quantile estimators for a and b, see, e.g., the arc–sine distribution in Sect. 9.3.


which do not differ much from the input parameters a = 20 and b = 2. In order to find the hazard–quantile
estimates we need

\[
\begin{aligned}
\Lambda_0 &= \ln\big[1 + \exp(0)\big] = \ln 2 \approx 0.6931, \\
\Lambda_1 &= \ln\big[1 + \exp(1)\big] \approx \ln 3.7183 \approx 1.3133.
\end{aligned}
\]

Fig. 9/2 then shows the way to read off the hazard–quantile estimates which, apart from random
errors, agree with the OLS–estimates.

  j    x_j    δ_j   n_j   δ_j/n_j   Λ̂_j = Σ_{i=1}^{j} δ_i/n_i
  1   16.3     1    15    0.0667    0.0667
  2   16.5     0    14    0         −
  3   17.5     1    13    0.0769    0.1436
  4   17.9     1    12    0.0833    0.2269
  5   17.9     0    11    0         −
  6   18.7     1    10    0.1000    0.3269
  7   20.0     1     9    0.1111    0.4380
  8   20.1     1     8    0.1250    0.5630
  9   20.4     1     7    0.1429    0.7059
 10   20.7     0     6    0         −
 11   21.7     1     5    0.2000    0.9059
 12   22.2     1     4    0.2500    1.1559
 13   24.2     1     3    0.3333    1.4892
 14   26.3     1     2    0.5000    1.9892
 15   26.3     0     1    0         −

Figure 9/2: Hazard plot for the logistic distribution
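
The OLS fit of Example 9/1 can be retraced with the following Python sketch. It regresses the uncensored observations on the reduced hazard quantiles (9.9d) using the closed–form solution of the simple linear regression, which is equivalent to (9.7c) for a single regressor. The variable names are chosen here for illustration; this is not the MATLAB program used for the book.

```python
import math

# Data of Example 9/1, already ordered (delta: 1 = failure, 0 = censored)
x_all = [16.3, 16.5, 17.5, 17.9, 17.9, 18.7, 20.0, 20.1,
         20.4, 20.7, 21.7, 22.2, 24.2, 26.3, 26.3]
delta = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0]

n = len(x_all)
xs, ys, cum = [], [], 0.0
for rank, (x, d) in enumerate(zip(x_all, delta)):
    if d == 1:
        cum += 1.0 / (n - rank)                      # plotting position, sum of 1/n_i
        xs.append(x)                                 # regressand: uncensored lifetime
        ys.append(math.log(math.exp(cum) - 1.0))     # regressor: reduced quantile (9.9d)

k = len(xs)
x_bar, y_bar = sum(xs) / k, sum(ys) / k
b_hat = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
         / sum((y - y_bar) ** 2 for y in ys))
a_hat = x_bar - b_hat * y_bar

# Hazard-quantile read-off (9.8a,b) on the fitted line: y = 0 and y = 1
x_at_y0 = a_hat + b_hat * 0.0
x_at_y1 = a_hat + b_hat * 1.0
print(round(a_hat, 4), round(b_hat, 4), round(x_at_y1 - x_at_y0, 4))
```

On the fitted line the read–off values reproduce the OLS estimates exactly, which is the statement made about (9.8a,b) in the text.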



9.3 Hazard Papers for Location–scale Distributions


This section shows how to scale the cumulative hazard rate when the random variable either
has a genuine location–scale distribution or a distribution transformable to location–scale type.
We will also give those CHR–values and their hazard quantiles from which estimates of the
location and scale parameters can be read off conveniently.
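
All CHR–values of this section follow from $\Lambda = H_Y(y) = -\ln S_Y(y)$ evaluated at suitable reduced values y. The Python sketch below evaluates this relation for a few reduced survival functions; the dictionary, the function names and the selection of distributions are illustrative assumptions.

```python
import math


def std_normal_cdf(y):
    """CDF of the standardized normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


# Reduced survival functions S_Y(y) of some location-scale distributions
reduced_survival = {
    "exponential":   lambda y: math.exp(-y),
    "log-Weibull":   lambda y: math.exp(-math.exp(y)),
    "logistic":      lambda y: 1.0 / (1.0 + math.exp(y)),
    "half-logistic": lambda y: 2.0 * math.exp(-y) / (1.0 + math.exp(-y)),
    "half-normal":   lambda y: 2.0 * (1.0 - std_normal_cdf(y)),
    "Rayleigh":      lambda y: math.exp(-0.5 * y * y),
}


def chr_value(name, y):
    """CHR value Lambda = H_Y(y) = -ln S_Y(y) of the named reduced distribution."""
    return -math.log(reduced_survival[name](y))


for name in reduced_survival:
    print(f"{name:14s} Lambda(0) = {chr_value(name, 0.0):.4f}   "
          f"Lambda(1) = {chr_value(name, 1.0):.4f}")
```

Since $x_\Lambda = a + b\,y_\Lambda$, the hazard quantiles belonging to Λ(0) and Λ(1) give a and a + b, respectively, whenever y = 0 and y = 1 lie in the support of the reduced variable.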
We start with genuine location–scale distributions.
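Arcsine distribution

\[
\begin{aligned}
S(x) &= \frac{\pi - 2 \arcsin\!\left(\frac{x - a}{b}\right)}{2\pi}; \quad a \in \mathbb{R},\ b > 0,\ a - b \le x \le a + b \\
\Lambda &= H_Y(y) = \ln(2\pi) - \ln\big[\pi - 2 \arcsin(y)\big]; \quad -1 \le y \le 1 \\
y_\Lambda &= \cos\big[\pi \exp(-\Lambda)\big]; \quad \Lambda \ge 0 \\
a &= x_{-\ln(1/2)} \approx x_{0.6931} \\
b &= x_{-\ln(1/3)} - x_{-\ln(2/3)} \approx x_{1.0986} - x_{0.4055}
\end{aligned}
\]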

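CAUCHY distribution

\[
\begin{aligned}
S(x) &= \frac{1}{2} - \frac{1}{\pi} \arctan\!\left(\frac{x - a}{b}\right); \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R} \\
\Lambda &= H_Y(y) = \ln(2\pi) - \ln\big[\pi - 2 \arctan(y)\big]; \quad y \in \mathbb{R} \\
y_\Lambda &= \cot\big[\pi \exp(-\Lambda)\big]; \quad \Lambda \ge 0 \\
a &= x_{-\ln(1/2)} \approx x_{0.6931} \\
b &= \tfrac{1}{2}\big(x_{-\ln(1/4)} - x_{-\ln(3/4)}\big) \approx \tfrac{1}{2}\big(x_{1.3863} - x_{0.2877}\big)
\end{aligned}
\]

The hazard quantiles of order − ln(1/4) and − ln(3/4) belong to y = 1 and y = −1, so their distance is 2 b.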

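Cosine distribution, ordinary

\[
\begin{aligned}
S(x) &= 0.5 \left[1 - \sin\!\left(\frac{x - a}{b}\right)\right]; \quad a \in \mathbb{R},\ b > 0,\ a - \tfrac{\pi}{2}\,b \le x \le a + \tfrac{\pi}{2}\,b \\
\Lambda &= H_Y(y) = \ln 2 - \ln\big[1 - \sin(y)\big]; \quad -\tfrac{\pi}{2} \le y \le \tfrac{\pi}{2} \\
y_\Lambda &= \arcsin\big[1 - 2 \exp(-\Lambda)\big]; \quad \Lambda \ge 0 \\
a &= x_{-\ln(1/2)} \approx x_{0.6931} \\
b &= x_{\ln\{2/[1 - \sin(0.5)]\}} - x_{\ln\{2/[1 - \sin(-0.5)]\}} \approx x_{1.3460} - x_{0.3015}
\end{aligned}
\]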

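Cosine distribution, raised

\[
\begin{aligned}
S(x) &= \frac{1}{2} \left[1 - \frac{x - a}{b} - \frac{1}{\pi} \sin\!\left(\pi\,\frac{x - a}{b}\right)\right]; \quad a \in \mathbb{R},\ b > 0,\ a - b \le x \le a + b \\
\Lambda &= H_Y(y) = \ln(2\pi) - \ln\big[\pi\,(1 - y) - \sin(\pi y)\big]; \quad -1 \le y \le 1 \\
y_\Lambda &\ \text{cannot be given in closed form} \\
a &= x_{-\ln(1/2)} \approx x_{0.6931} \\
b &\approx x_{-\ln 0.0908} - x_{-\ln 0.9092} \approx x_{2.3991} - x_{0.0952}
\end{aligned}
\]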

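Exponential distribution

\[
\begin{aligned}
S(x) &= \exp\!\left(-\frac{x - a}{b}\right); \quad a \in \mathbb{R},\ b > 0,\ x \ge a \\
\Lambda &= H_Y(y) = y; \quad y \ge 0 \\
y_\Lambda &= \Lambda; \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= x_1 - x_0
\end{aligned}
\]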

Exponential distribution, reflected

\[
\begin{aligned}
S(x) &= 1 - \exp\!\left(\frac{x - a}{b}\right); \quad a \in \mathbb{R},\ b > 0,\ x \le a \\
\Lambda &= H_Y(y) = -\ln\big[1 - \exp(y)\big]; \quad y \le 0 \\
y_\Lambda &= \ln\big[1 - \exp(-\Lambda)\big]; \quad \Lambda \ge 0
\end{aligned}
\]

a is the upper threshold of the reflected exponential distribution where $\Lambda = H_X(a) = \infty$. Thus,
a cannot be read off as a point on the abscissa. Therefore, we propose the following procedure.
First, from the difference of x(y = −1) = a − 1 b and x(y = −2) = a − 2 b we find

\[
b = x_{\Lambda[-1]} - x_{\Lambda[-2]} \approx x_{0.4587} - x_{0.1454},
\]

and then from

\[
x(y = -1) + b = x(y = -1) + \big[x(y = -1) - x(y = -2)\big] = 2\,x(y = -1) - x(y = -2)
\]

we have

\[
a = 2\,x_{\Lambda[-1]} - x_{\Lambda[-2]} \approx 2\,x_{0.4587} - x_{0.1454}.
\]

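Extreme value distribution of type I for the maximum (GUMBEL distribution)

\[
\begin{aligned}
S(x) &= 1 - \exp\!\left[-\exp\!\left(-\frac{x - a}{b}\right)\right]; \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R} \\
\Lambda &= H_Y(y) = -\ln\big\{1 - \exp\big[-\exp(-y)\big]\big\}; \quad y \in \mathbb{R} \\
y_\Lambda &= -\ln\big\{-\ln\big[1 - \exp(-\Lambda)\big]\big\}; \quad \Lambda \ge 0 \\
a &= x_{-\ln\{1 - \exp[-\exp(0)]\}} \approx x_{0.4587} \\
b &= x_{-\ln\{1 - \exp[-\exp(0)]\}} - x_{-\ln\{1 - \exp[-\exp(1)]\}} \approx x_{0.4587} - x_{0.0683}
\end{aligned}
\]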

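Extreme value distribution of type I for the minimum (Log–WEIBULL distribution)

\[
\begin{aligned}
S(x) &= \exp\!\left[-\exp\!\left(\frac{x - a}{b}\right)\right]; \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R} \\
\Lambda &= H_Y(y) = \exp(y); \quad y \in \mathbb{R} \\
y_\Lambda &= \ln(\Lambda); \quad \Lambda \ge 0 \\
a &= x_{\exp(0)} = x_1 \\
b &= x_{\exp(1)} - x_{\exp(0)} \approx x_{2.7183} - x_1
\end{aligned}
\]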

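Half–CAUCHY distribution

\[
\begin{aligned}
S(x) &= 1 - \frac{2}{\pi} \arctan\!\left(\frac{x - a}{b}\right); \quad a \in \mathbb{R},\ b > 0,\ x \ge a \\
\Lambda &= H_Y(y) = -\ln\!\left[1 - \frac{2}{\pi} \arctan(y)\right]; \quad y \ge 0 \\
y_\Lambda &= \tan\!\left\{\frac{\pi}{2}\big[1 - \exp(-\Lambda)\big]\right\}; \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= x_{-\ln[1 - (2/\pi) \arctan(1)]} - x_0 \approx x_{0.6931} - x_0
\end{aligned}
\]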

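Half–logistic distribution

\[
\begin{aligned}
S(x) &= \frac{2 \exp\!\left(-\frac{x - a}{b}\right)}{1 + \exp\!\left(-\frac{x - a}{b}\right)}; \quad a \in \mathbb{R},\ b > 0,\ x \ge a \\
\Lambda &= H_Y(y) = -\ln\!\left[\frac{2}{1 + \exp(y)}\right]; \quad y \ge 0 \\
y_\Lambda &= \ln\big[2 \exp(\Lambda) - 1\big]; \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= x_{-\ln\{2/[1 + \exp(1)]\}} - x_0 \approx x_{0.6201} - x_0
\end{aligned}
\]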

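Half–normal distribution

\[
\begin{aligned}
S(x) &= 2 \left[1 - \Phi\!\left(\frac{x - a}{b}\right)\right]; \quad a \in \mathbb{R},\ b > 0,\ x \ge a \\
\Lambda &= H_Y(y) = -\ln\big\{2\,\big[1 - \Phi(y)\big]\big\}; \quad y \ge 0 \\
y_\Lambda &= \Phi^{-1}\!\left[1 - \frac{\exp(-\Lambda)}{2}\right]; \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= x_{-\ln\{2\,[1 - \Phi(1)]\}} - x_0 \approx x_{1.1479} - x_0
\end{aligned}
\]

Φ(·) is the CDF of the standardized normal distribution and $\Phi^{-1}(\cdot)$ is its percentile function.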

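Hyperbolic secant distribution

\[
\begin{aligned}
S(x) &= 1 - \frac{2}{\pi} \arctan\!\left[\exp\!\left(\frac{x - a}{b}\right)\right]; \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R} \\
\Lambda &= H_Y(y) = -\ln\!\left\{1 - \frac{2}{\pi} \arctan\big[\exp(y)\big]\right\}; \quad y \in \mathbb{R} \\
y_\Lambda &= \ln \tan\!\left\{\frac{\pi}{2}\big[1 - \exp(-\Lambda)\big]\right\}; \quad \Lambda \ge 0 \\
a &= x_{-\ln\{1 - (2/\pi) \arctan[\exp(0)]\}} \approx x_{0.6931} \\
b &= x_{-\ln\{1 - (2/\pi) \arctan[\exp(1)]\}} - x_{-\ln\{1 - (2/\pi) \arctan[\exp(0)]\}} \approx x_{1.4942} - x_{0.6931}
\end{aligned}
\]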

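LAPLACE distribution

\[
\begin{aligned}
S(x) &= \begin{cases} 1 - 0.5 \exp\!\left(-\frac{a - x}{b}\right) & \text{for } x \le a \\[4pt] 0.5 \exp\!\left(-\frac{x - a}{b}\right) & \text{for } x \ge a \end{cases}; \quad a \in \mathbb{R},\ b > 0 \\
\Lambda &= H_Y(y) = \begin{cases} -\ln\big[1 - 0.5 \exp(y)\big] & \text{for } y \le 0 \\ -\ln\big[0.5 \exp(-y)\big] & \text{for } y \ge 0 \end{cases} \\
y_\Lambda &= \begin{cases} \ln\big\{2\,\big[1 - \exp(-\Lambda)\big]\big\} & \text{for } \Lambda \le -\ln 0.5 \approx 0.6931 \\ -\ln\big[2 \exp(-\Lambda)\big] & \text{for } \Lambda \ge -\ln 0.5 \approx 0.6931 \end{cases} \\
a &= x_{-\ln[1 - 0.5]} \approx x_{0.6931} \\
b &= x_{-\ln[0.5 \exp(-1)]} - x_{-\ln[1 - 0.5]} \approx x_{1.6931} - x_{0.6931}
\end{aligned}
\]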

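Logistic distribution

$$S(x) = \left[1 + \exp\left(\frac{x-a}{b}\right)\right]^{-1};\quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R}$$
$$\Lambda = H_Y(y) = \ln\left[1 + \exp(y)\right];\quad y \in \mathbb{R}$$
$$y_\Lambda = \ln\left[\exp(\Lambda) - 1\right];\quad \Lambda \ge 0$$
$$a = x_{\ln[1+\exp(0)]} \approx x_{0.6931}$$
$$b = x_{\ln[1+\exp(1)]} - x_{\ln[1+\exp(0)]} \approx x_{1.3133} - x_{0.6931}$$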

MAXWELL–BOLTZMANN distribution

$$S(x) = 1 - F_{\chi^2_3}\left[\left(\frac{x-a}{b}\right)^2\right];\quad a \in \mathbb{R},\ b > 0,\ x \ge a$$
$$\Lambda = H_Y(y) = -\ln\left[1 - F_{\chi^2_3}(y^2)\right];\quad y \ge 0$$
$y_\Lambda$ cannot be given in closed form.
$$a = x_{-\ln[1-F_{\chi^2_3}(0)]} = x_0$$
$$b = x_{-\ln[1-F_{\chi^2_3}(1)]} - x_{-\ln[1-F_{\chi^2_3}(0)]} \approx x_{0.2216} - x_0$$

$F_{\chi^2_3}(\cdot)$ is the CDF of the $\chi^2$–distribution with 3 degrees of freedom.


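Normal distribution

$$S(x) = 1 - \Phi\left(\frac{x-a}{b}\right);\quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R}$$
$$\Lambda = H_Y(y) = -\ln\left[1 - \Phi(y)\right];\quad y \in \mathbb{R}$$
$$y_\Lambda = \Phi^{-1}\left[1 - \exp(-\Lambda)\right];\quad \Lambda \ge 0$$
$$a = x_{-\ln[1-\Phi(0)]} \approx x_{0.6931}$$
$$b = x_{-\ln[1-\Phi(1)]} - x_{-\ln[1-\Phi(0)]} \approx x_{1.8410} - x_{0.6931}$$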

Φ(·) is the CDF of the standardized normal distribution and Φ−1 (·) is its percentile function.

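Parabolic U–shaped distribution

$$S(x) = \frac{1}{2}\left[1 - \left(\frac{x-a}{b}\right)^3\right];\quad a \in \mathbb{R},\ b > 0,\ a - b \le x \le a + b$$
$$\Lambda = H_Y(y) = \ln 2 - \ln\left(1 - y^3\right);\quad -1 \le y \le 1$$
$$y_\Lambda = \left[1 - \exp(\ln 2 - \Lambda)\right]^{1/3};\quad \Lambda \ge 0$$
$$a = x_{\ln 2 - \ln[1-0^3]} \approx x_{0.6931}$$
$$b = x_{\ln 2 - \ln[1-0.5^3]} - x_{\ln 2 - \ln[1-(-0.5)^3]} \approx x_{0.8267} - x_{0.5754}$$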

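Parabolic inverted U–shaped distribution

$$S(x) = \frac{1}{2} - \frac{1}{4}\left[3\,\frac{x-a}{b} - \left(\frac{x-a}{b}\right)^3\right];\quad a \in \mathbb{R},\ b > 0,\ a - b \le x \le a + b$$
$$\Lambda = H_Y(y) = \ln 4 - \ln\left(2 - 3\,y + y^3\right);\quad -1 \le y \le 1$$
$y_\Lambda$ is the admissible solution, i.e., $-1 \le y_\Lambda \le 1$, of $y_\Lambda^3 - 3\,y_\Lambda = 4\exp(-\Lambda) - 2$.
$$a = x_{\ln 4 - \ln 2} \approx x_{0.6931}$$
$$b = x_{\ln 4 - \ln[2 - 3\cdot 0.5 + 0.5^3]} - x_{\ln 4 - \ln[2 - 3(-0.5) + (-0.5)^3]} \approx x_{1.8563} - x_{0.1699}$$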

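RAYLEIGH distribution

$$S(x) = \exp\left[-\frac{1}{2}\left(\frac{x-a}{b}\right)^2\right];\quad a \in \mathbb{R},\ b > 0,\ x \ge a$$
$$\Lambda = H_Y(y) = \frac{y^2}{2};\quad y \ge 0$$
$$y_\Lambda = \sqrt{2\,\Lambda};\quad \Lambda \ge 0$$
$$a = x_0$$
$$b = x_{0.5} - x_0$$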

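RAYLEIGH distribution, inverse

$$S(x) = 1 - \exp\left[-\left(\frac{b}{x-a}\right)^2\right];\quad a \in \mathbb{R},\ b > 0,\ x \ge a$$
$$\Lambda = H_Y(y) = -\ln\left[1 - \exp\left(-\frac{1}{y^2}\right)\right];\quad y \ge 0$$
$$y_\Lambda = \sqrt{\frac{1}{-\ln\left[1 - \exp(-\Lambda)\right]}};\quad \Lambda \ge 0$$
$$a = x_0$$
$$b = x_{-\ln[1-\exp(-1)]} - x_0 \approx x_{0.4587} - x_0$$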

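Semi–elliptical distribution

$$S(x) = \frac{1}{2} - \frac{1}{\pi}\left[\frac{x-a}{b}\sqrt{1 - \left(\frac{x-a}{b}\right)^2} + \arcsin\left(\frac{x-a}{b}\right)\right];\quad a \in \mathbb{R},\ b > 0,\ a - b \le x \le a + b$$
$$\Lambda = H_Y(y) = -\ln\left\{\frac{1}{2} - \frac{1}{\pi}\left[y\sqrt{1-y^2} + \arcsin(y)\right]\right\};\quad -1 \le y \le 1$$
$y_\Lambda$ is the admissible solution, i.e., $-1 \le y_\Lambda \le 1$, of
$$y_\Lambda\sqrt{1 - y_\Lambda^2} + \arcsin(y_\Lambda) = \pi\left[0.5 - \exp(-\Lambda)\right].$$
$$a = x_{-\ln 0.5} \approx x_{0.6931}$$
$$b = x_{-\ln[(1/2)-(1/\pi)\{0.5\sqrt{1-0.5^2}+\arcsin(0.5)\}]} - x_{-\ln[(1/2)-(1/\pi)\{-0.5\sqrt{1-(-0.5)^2}+\arcsin(-0.5)\}]} \approx x_{1.6322} - x_{0.2175}$$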

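TEISSIER distribution

$$S(x) = \exp\left[1 + \frac{x-a}{b} - \exp\left(\frac{x-a}{b}\right)\right];\quad a \in \mathbb{R},\ b > 0,\ x \ge a$$
$$\Lambda = H_Y(y) = \exp(y) - y - 1;\quad y \ge 0$$
$y_\Lambda$ is the admissible solution, i.e., $y_\Lambda \ge 0$, of $\exp(y_\Lambda) - y_\Lambda = 1 + \Lambda$.
$$a = x_0$$
$$b = x_{\exp(1)-2} - x_0 \approx x_{0.7183} - x_0$$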

Triangular distribution, right–angled and negatively skew

$$S(x) = \frac{a-x}{b}\left(\frac{x-a}{b} + 2\right);\quad a \in \mathbb{R},\ b > 0,\ a - b \le x \le a$$
$$\Lambda = H_Y(y) = -\ln\left(-y^2 - 2\,y\right);\quad -1 \le y \le 0$$
$y_\Lambda$ is the admissible solution, i.e., $-1 \le y_\Lambda \le 0$, of $y_\Lambda^2 + 2\,y_\Lambda = -\exp(-\Lambda)$.
$a$ is the upper threshold where $H_X(a) = \infty$, and thus cannot be read off. From the difference of $x(y=-0.5) = a - 0.5\,b$ and $x(y=-1) = a - b$ we find
$$\frac{b}{2} = x_{\Lambda(-0.5)} - x_{\Lambda(-1)} \approx x_{0.2877} - x_0$$
and then
$$a = x_0 + 2\,\frac{b}{2} \approx 2\,x_{0.2877} - x_0.$$
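Triangular distribution, right–angled and positively skew

$$S(x) = \left(\frac{a+b-x}{b}\right)^2;\quad a \in \mathbb{R},\ b > 0,\ a \le x \le a + b$$
$$\Lambda = H_Y(y) = -2\ln(1 - y);\quad 0 \le y \le 1$$
$$y_\Lambda = 1 - \exp\left(-\frac{\Lambda}{2}\right);\quad \Lambda \ge 0$$
$$a = x_0$$
$$b = 2\left(x_{1.3863} - x_0\right)$$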

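Triangular distribution, symmetric

$$S(x) = \begin{cases} 1 - \dfrac{1}{2}\left(1 + \dfrac{x-a}{b}\right)^2 & \text{for } a - b \le x \le a \\[1ex] \dfrac{1}{2}\left(1 - \dfrac{x-a}{b}\right)^2 & \text{for } a \le x \le a + b \end{cases};\quad a \in \mathbb{R},\ b > 0$$
$$\Lambda = H_Y(y) = \begin{cases} -\ln\left[1 - \frac{1}{2}(1+y)^2\right] & \text{for } -1 \le y \le 0 \\ \ln 2 - 2\ln(1-y) & \text{for } 0 \le y \le 1 \end{cases}$$
$$y_\Lambda = \begin{cases} \sqrt{2\left[1 - \exp(-\Lambda)\right]} - 1 & \text{for } \Lambda \le \ln 2 \approx 0.6931 \\ 1 - \sqrt{2\exp(-\Lambda)} & \text{for } \Lambda \ge \ln 2 \approx 0.6931 \end{cases}$$
$$a = x_{\ln 2} \approx x_{0.6931}$$
$$b = x_{\ln 2 - 2\ln[1-0.5]} - x_{-\ln[1-0.5\,(1-0.5)^2]} \approx x_{2.0794} - x_{0.1335}$$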

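Uniform distribution

$$S(x) = 1 - \frac{x-a}{b};\quad a \in \mathbb{R},\ b > 0,\ a \le x \le a + b$$
$$\Lambda = H_Y(y) = -\ln(1 - y);\quad 0 \le y \le 1$$
$$y_\Lambda = 1 - \exp(-\Lambda);\quad \Lambda \ge 0$$
$$a = x_0$$
$$b = 2\left(x_{-\ln[0.5]} - x_0\right) \approx 2\left(x_{0.6931} - x_0\right)$$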

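V–shaped distribution

$$S(x) = \begin{cases} \dfrac{1}{2}\left[1 + \left(\dfrac{a-x}{b}\right)^2\right] & \text{for } a - b \le x \le a \\[1ex] \dfrac{1}{2}\left[1 - \left(\dfrac{a-x}{b}\right)^2\right] & \text{for } a \le x \le a + b \end{cases};\quad a \in \mathbb{R},\ b > 0$$
$$\Lambda = H_Y(y) = \begin{cases} \ln 2 - \ln\left(1 + y^2\right) & \text{for } -1 \le y \le 0 \\ \ln 2 - \ln\left(1 - y^2\right) & \text{for } 0 \le y \le 1 \end{cases}$$
$$y_\Lambda = \begin{cases} -\sqrt{2\exp(-\Lambda) - 1} & \text{for } \Lambda \le \ln 2 \approx 0.6931 \\ \sqrt{1 - 2\exp(-\Lambda)} & \text{for } \Lambda \ge \ln 2 \approx 0.6931 \end{cases}$$
$$a = x_{\ln 2} \approx x_{0.6931}$$
$$b = x_{\ln 2 - \ln[1-0.5^2]} - x_{\ln 2 - \ln[1+(-0.5)^2]} \approx x_{0.9808} - x_{0.4700}$$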
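The plotting positions quoted in the list above (for example 0.4587, 0.0683, 0.6201 or 1.3133) are simply values of the standardized cumulative hazard $\Lambda = H_Y(y)$ at $y = 0$, $y = 1$ or $y = -1$. The following Python sketch recomputes a few of them; the dictionary keys and the helper name `hazard_paper_positions` are illustrative choices, not notation from the text.

```python
import math
from statistics import NormalDist

phi = NormalDist().cdf  # CDF of the standardized normal distribution

# Standardized cumulative hazard functions Lambda = H_Y(y) for some of the
# location-scale distributions listed above (keys are hypothetical labels).
cumulative_hazard = {
    "gumbel_max": lambda y: -math.log(1.0 - math.exp(-math.exp(-y))),
    "log_weibull": lambda y: math.exp(y),
    "half_logistic": lambda y: -math.log(2.0 / (1.0 + math.exp(y))),
    "half_normal": lambda y: -math.log(2.0 * (1.0 - phi(y))),
    "logistic": lambda y: math.log(1.0 + math.exp(y)),
}


def hazard_paper_positions(name, y_low, y_high):
    """Return (Lambda(y_low), Lambda(y_high)).

    Because x = a + b*y, the abscissa values read off at these two ordinates
    differ by b*(y_high - y_low); with y_high - y_low = 1 the difference is b.
    """
    H = cumulative_hazard[name]
    return H(y_low), H(y_high)


if __name__ == "__main__":
    for name, (lo, hi) in {"gumbel_max": (-1.0, 0.0), "logistic": (0.0, 1.0)}.items():
        low, high = hazard_paper_positions(name, lo, hi)
        print(f"{name}: Lambda({lo}) = {low:.4f}, Lambda({hi}) = {high:.4f}")
```

For a distribution whose support starts at $a$ (half–logistic, half–normal), the lower position is $\Lambda(0) = 0$, which is why $a = x_0$ there.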

We now turn to those distributions that can be converted into a location–scale type by a ln–transformation. Generally, the original form of these distributions has three parameters: a location (shift) parameter $a$, a scale parameter $b$ and a shape parameter $c$. The location parameter $a$ must be known to make the ln–transformation. One way to find an estimate of $a$ is by trial and error, i.e., $a$ is chosen so that the hazard plot of $\hat\Lambda_i$ over $\ln(x_i - a)$ is sufficiently linear.
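The trial-and-error search for $a$ can be organized as a simple grid search. The Python sketch below is only one possible arrangement and rests on assumptions that are not in the text: the sample is complete (uncensored), the cumulative hazard is estimated by the sum $\hat\Lambda_i = \sum_{j \le i} 1/(n-j+1)$, linearity is judged by the PEARSON correlation, and the candidate values of $a$ are supplied by the user. The default transform $y_\Lambda = \ln \Lambda$ corresponds to the WEIBULL case; all function names are hypothetical.

```python
import math


def nelson_aalen(n):
    """Cumulative hazard estimates for a complete ordered sample of size n."""
    total, out = 0.0, []
    for i in range(1, n + 1):
        total += 1.0 / (n - i + 1)
        out.append(total)
    return out


def pearson(u, v):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)


def best_threshold(x_sorted, candidates, y_of_lambda=math.log):
    """Pick the candidate a for which y(Lambda_hat_i) is most linear in ln(x_i - a)."""
    y = [y_of_lambda(lam) for lam in nelson_aalen(len(x_sorted))]
    best_a, best_r = None, -math.inf
    for a in candidates:
        if a >= x_sorted[0]:  # ln(x - a) must exist for every observation
            continue
        r = pearson([math.log(x - a) for x in x_sorted], y)
        if r > best_r:
            best_a, best_r = a, r
    return best_a, best_r
```

In practice the correlation is usually flat near its maximum, so the resulting value of $a$ should be read as a rough graphical estimate, in the spirit of the informal procedure described above.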
Extreme value distribution of type II for the maximum (Inverse WEIBULL distribution)

$$S(x) = 1 - \exp\left[-\left(\frac{x-a}{b}\right)^{-c}\right];\quad a \in \mathbb{R},\ b > 0,\ c > 0,\ x \ge a$$
$X^* = \ln(X - a)$ has an extreme value distribution of type I for the maximum (GUMBEL distribution) with parameters $a^* = \ln b$ and $b^* = 1/c$. Thus we find
$$a^* \approx x^*_{0.4587}$$
$$b^* \approx x^*_{0.4587} - x^*_{0.0683}$$
on the $\ln(x - a)$–scaled abscissa of a hazard paper with an ordinate scaled as $\Lambda = -\ln\{1 - \exp[-\exp(-y)]\}$, $y \in \mathbb{R}$. $b$ and $c$ follow as re–transforms of $a^*$ and $b^*$.


Extreme value distribution of type II for the minimum (FRÉCHET distribution)

$$S(x) = \exp\left[-\left(\frac{a-x}{b}\right)^{-c}\right];\quad a \in \mathbb{R},\ b > 0,\ c > 0,\ x \le a$$
$X^* = -\ln(a - X)$ has an extreme value distribution of type I for the minimum (Log–WEIBULL distribution) with parameters $a^* = -\ln b$ and $b^* = 1/c$. Thus we find
$$a^* \approx x^*_1$$
$$b^* \approx x^*_{2.7183} - x^*_1$$
on the $-\ln(a - x)$–scaled abscissa of a hazard paper with an ordinate scaled as $\Lambda = \exp(y)$, $y \in \mathbb{R}$. $b$ and $c$ follow as re–transforms of $a^*$ and $b^*$.
Extreme value distribution of type III for the maximum (Reflected WEIBULL distribution)

$$S(x) = 1 - \exp\left[-\left(\frac{a-x}{b}\right)^{c}\right];\quad a \in \mathbb{R},\ b > 0,\ c > 0,\ x \le a$$
$X^* = -\ln(a - X)$ has an extreme value distribution of type I for the maximum (GUMBEL distribution) with parameters $a^* = -\ln b$ and $b^* = 1/c$. Thus we find
$$a^* \approx x^*_{0.4587}$$
$$b^* \approx x^*_{0.4587} - x^*_{0.0683}$$
on the $-\ln(a - x)$–scaled abscissa of a hazard paper with an ordinate scaled as $\Lambda = -\ln\{1 - \exp[-\exp(-y)]\}$, $y \in \mathbb{R}$. $b$ and $c$ follow as re–transforms of $a^*$ and $b^*$.
Extreme value distribution of type III for the minimum (WEIBULL distribution)

$$S(x) = \exp\left[-\left(\frac{x-a}{b}\right)^{c}\right];\quad a \in \mathbb{R},\ b > 0,\ c > 0,\ x \ge a$$
$X^* = \ln(X - a)$ has an extreme value distribution of type I for the minimum (Log–WEIBULL distribution) with parameters $a^* = \ln b$ and $b^* = 1/c$. Thus we find
$$a^* \approx x^*_1$$
$$b^* \approx x^*_{2.7183} - x^*_1$$
on the $\ln(x - a)$–scaled abscissa of a hazard paper with an ordinate scaled as $\Lambda = \exp(y)$, $y \in \mathbb{R}$. $b$ and $c$ follow as re–transforms of $a^*$ and $b^*$.
Log–normal distribution with lower threshold

$$S(x) = \Phi\left[-\frac{\ln(x-a) - \tilde a}{\tilde b}\right];\quad a \in \mathbb{R},\ \tilde a \in \mathbb{R},\ \tilde b > 0,\ x > a$$
$X^* = \ln(X - a)$ has a normal distribution with parameters $\tilde a = E(X^*)$ and $\tilde b = \sqrt{\mathrm{Var}(X^*)}$. Thus we find
$$\tilde a \approx x^*_{0.6931}$$
$$\tilde b \approx x^*_{1.8410} - x^*_{0.6931}$$
on the $\ln(x - a)$–scaled abscissa of a hazard paper with an ordinate scaled as $\Lambda = -\ln[1 - \Phi(y)]$, $y \in \mathbb{R}$.
Log–normal distribution with upper threshold

$$S(x) = \Phi\left[\frac{\ln(a-x) - \tilde a}{\tilde b}\right];\quad a \in \mathbb{R},\ \tilde a \in \mathbb{R},\ \tilde b > 0,\ x < a$$
$X^* = \ln(a - X)$ has a normal distribution with parameters $\tilde a = E(X^*)$ and $\tilde b = \sqrt{\mathrm{Var}(X^*)}$. $\tilde a$ and $\tilde b$ are found in the same way as above.

PARETO distribution of the first kind

$$S(x) = \left(\frac{x-a}{b}\right)^{-c};\quad a \in \mathbb{R},\ b > 0,\ c > 0,\ x \ge a + b$$
$X^* = \ln(X - a)$ has an exponential distribution with parameters $a^* = \ln b$ and $b^* = 1/c$. Thus we find
$$a^* = x^*_0$$
$$b^* = x^*_1 - x^*_0$$
on the $\ln(x - a)$–scaled abscissa of a hazard paper with an ordinate scaled as $\Lambda = y$, $y \ge 0$. $b$ and $c$ follow as re–transforms of $a^*$ and $b^*$.
Power function distribution

$$S(x) = 1 - \left(\frac{x-a}{b}\right)^{c};\quad a \in \mathbb{R},\ b > 0,\ c > 0,\ a \le x \le a + b$$
$X^* = \ln(X - a)$ has a reflected exponential distribution with parameters $a^* = \ln b$ and $b^* = 1/c$. Thus we find
$$a^* = x^*_0$$
$$b^* \approx x^*_{0.9808} - x^*_{0.4700}$$
on the $\ln(x - a)$–scaled abscissa of a hazard paper with an ordinate scaled as $\Lambda = -\ln(1 - y)$, $0 \le y \le 1$. $b$ and $c$ follow as re–transforms of $a^*$ and $b^*$.
10 Testing Hypotheses on
Life Distributions
This chapter is mainly devoted to testing hypotheses concerning the hazard rate (Sect. 10.2), but in Sect. 10.3 we will also look at tests deciding whether a life distribution has one or another of the aging properties introduced in Sect. 2.4. The approaches of this chapter are non–parametric, the exception being Sect. 10.2.1, where we test the constancy of the hazard rate. As the only continuous distribution with constant hazard rate is the exponential distribution, the tests of Sect. 10.2.1 will be tests for an exponential distribution.
This chapter is organized as follows:

• Sect. 10.1 presents those statistical concepts which directly or indirectly give the test statistics for most of the subsequent tests. These concepts are order statistics, which lead to spacings, and the TTT–statistics, which are a function of spacings.

• Sect. 10.2 examines the behavior of the hazard rate, i.e., its course and curvature.

• Finally, Sect. 10.3 deals with tests for several classes of aging.

Topics not treated here are the comparison of hazard rates, see QIU/SHENG (2008), and testing the equality of two hazard rates, see CHENG (1985).

10.1 Prerequisites: Order Statistics, Spacings, TTT–statistics¹


In life testing without replacement of failed units the observed lifetimes naturally occur in ascending order. Thus, it is natural to investigate order statistics or functions thereof, like spacings and TTT–statistics. These concepts will furnish us with test statistics for many hypotheses on life distributions.
Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ denote a sample of $n$ independent random variables, each with CDF $F(\cdot)$ and PDF $f(\cdot)$, and let $\mathbf{X}_n = (X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ denote the vector of the associated order statistics: $X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$. The joint PDF of any subset of the order statistics $X_{(k_1)}, X_{(k_2)}, \ldots, X_{(k_m)}$ with $1 \le k_1 < k_2 < \ldots < k_m \le n$ is given by PYKE (1965) as

$$f_{\mathbf{X}_k}(x_1, \ldots, x_m) = \begin{cases} n! \displaystyle\prod_{j=1}^{m+1} \frac{\left[F(x_j) - F(x_{j-1})\right]^{k_j - k_{j-1} - 1}}{(k_j - k_{j-1} - 1)!}\, f(x_j) & \text{if } x_1 < x_2 < \ldots < x_m \\[1ex] 0 & \text{otherwise,} \end{cases} \tag{10.1}$$

where $x_0 = -\infty$, $x_{m+1} = \infty$, $F(x_0) = 0$, $F(x_{m+1}) = 1$, $k_{m+1} = n + 1$, $k_0 = 0$ and $f(x_{m+1}) = 1$.
¹ Suggested reading for this section: BALAKRISHNAN/COHEN (1991), BARLOW (1979), BARLOW/CAMPO (1975), BARLOW et al. (1972), BERGMAN (1979), BERGMAN/KLEFSJÖ (1984), DAVID/NAGARAJA (2003), KLEFSJÖ (1982b), PYKE (1965).

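We look at special subsets of $\mathbf{X}_n$.

1. $m = 1$: a single order statistic $X_{(\ell)}$; $\ell = 1, 2, \ldots, n$

$$f_{X_{(\ell)}}(x) = \frac{n!}{(\ell-1)!\,(n-\ell)!}\, f(x) \left[F(x)\right]^{\ell-1} \left[1 - F(x)\right]^{n-\ell},\quad x \in \mathbb{R} \tag{10.2a}$$
$$F_{X_{(\ell)}}(x) = \sum_{i=\ell}^{n} \binom{n}{i} \left[F(x)\right]^{i} \left[1 - F(x)\right]^{n-i},\quad x \in \mathbb{R} \tag{10.2b}$$

$F_{X_{(\ell)}}(x)$ is the CCDF of a binomial distribution with parameters $n$ and $P = F(x)$ and may be expressed by an incomplete beta function ratio (= CDF of a beta distribution with parameters $\ell$ and $n - \ell + 1$).
In particular, we have:

• $\ell = 1$: sample minimum
$$f_{X_{(1)}}(x) = n\, f(x) \left[1 - F(x)\right]^{n-1},\quad x \in \mathbb{R} \tag{10.2c}$$
$$F_{X_{(1)}}(x) = 1 - \left[1 - F(x)\right]^{n},\quad x \in \mathbb{R} \tag{10.2d}$$

• $\ell = n$: sample maximum
$$f_{X_{(n)}}(x) = n\, f(x) \left[F(x)\right]^{n-1},\quad x \in \mathbb{R} \tag{10.2e}$$
$$F_{X_{(n)}}(x) = \left[F(x)\right]^{n},\quad x \in \mathbb{R}. \tag{10.2f}$$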

2. $m = 2$: a pair of order statistics $(X_{(k)}, X_{(\ell)})$; $k, \ell = 1, 2, \ldots, n$; $k \ne \ell$

$$f_{X_{(k)},X_{(\ell)}}(x, y) = \begin{cases} \dfrac{n!}{(k-1)!\,(\ell-k-1)!\,(n-\ell)!}\, f(x)\, f(y) \left[F(x)\right]^{k-1} \left[F(y) - F(x)\right]^{\ell-k-1} \left[1 - F(y)\right]^{n-\ell} & \text{for } k < \ell,\ x < y \text{ and } x, y \in \mathbb{R} \\[1ex] 0 & \text{otherwise} \end{cases} \tag{10.3a}$$

$$F_{X_{(k)},X_{(\ell)}}(x, y) = \frac{n!}{(k-1)!\,(\ell-k-1)!\,(n-\ell)!} \int_0^{F(x)} \int_u^{F(y)} u^{k-1} (v - u)^{\ell-k-1} (1 - v)^{n-\ell}\, dv\, du \quad \text{for } k < \ell,\ x < y \text{ and } x, y \in \mathbb{R} \tag{10.3b}$$

In particular, we have:

• $k = i$, $\ell = i + 1$, $i = 1, 2, \ldots, n - 1$ (two adjacent order statistics)

$$f_{X_{(i)},X_{(i+1)}}(x, y) = \begin{cases} \dfrac{n!}{(i-1)!\,(n-i-1)!}\, f(x)\, f(y) \left[F(x)\right]^{i-1} \left[1 - F(y)\right]^{n-i-1} & \text{for } i = 1, 2, \ldots, n-1,\ x < y \text{ and } x, y \in \mathbb{R} \\[1ex] 0 & \text{otherwise} \end{cases} \tag{10.3c}$$

• $k = 1$, $\ell = n$ (sample minimum and sample maximum)

$$f_{X_{(1)},X_{(n)}}(x, y) = \begin{cases} n\,(n-1)\, f(x)\, f(y) \left[F(y) - F(x)\right]^{n-2} & \text{for } x < y \text{ and } x, y \in \mathbb{R} \\ 0 & \text{otherwise} \end{cases} \tag{10.3d}$$

3. $m = n$: the complete vector of order statistics $\mathbf{X}_n = (X_{(1)}, \ldots, X_{(n)})$

$$f_{\mathbf{X}_n}(x_1, \ldots, x_n) = \begin{cases} n!\, f(x_1) \cdot \ldots \cdot f(x_n) & \text{for } x_1 < \ldots < x_n \text{ and } x_i \in \mathbb{R}\ \forall\, i \\ 0 & \text{otherwise} \end{cases} \tag{10.4}$$
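The binomial form (10.2b) of the CDF of a single order statistic is easy to evaluate numerically. The short Python sketch below does so for a given value of $F(x)$; the function name `order_stat_cdf` is a hypothetical choice. For $\ell = 1$ and $\ell = n$ the sum collapses to (10.2d) and (10.2f), which gives a quick plausibility check.

```python
from math import comb


def order_stat_cdf(F_x, ell, n):
    """CDF of the ell-th order statistic at x, given F_x = F(x), via the sum (10.2b)."""
    return sum(comb(n, i) * F_x**i * (1.0 - F_x) ** (n - i) for i in range(ell, n + 1))


if __name__ == "__main__":
    n, F_x = 5, 0.3
    print(order_stat_cdf(F_x, 1, n), 1.0 - (1.0 - F_x) ** n)  # minimum, cf. (10.2d)
    print(order_stat_cdf(F_x, n, n), F_x**n)                   # maximum, cf. (10.2f)
```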


Order statistics form a MARKOV process; more precisely, $\{X_{(i)},\ 1 \le i \le n\}$ is a non-homogeneous, discrete-parameter, real-valued MARKOV process whose initial measure is given by (10.2d):
$$F_{X_{(1)}}(x) = 1 - \left[1 - F(x)\right]^{n},$$
and whose transition CDF $\Pr(X_{(i+1)} \le x \mid X_{(i)} = y)$ is the CDF of the minimum of $n - i$ independent observations from the CDF $F(\cdot)$ truncated at $y$, namely:
$$\Pr(X_{(i+1)} \le x \mid X_{(i)} = y) = 1 - \left[1 - F(x)\right]^{n-i} \left[1 - F(y)\right]^{-(n-i)} \quad \text{for } x > y. \tag{10.5}$$

The PDF and CDF of order statistics from most parametric distributions have no simple closed form. Two exceptions are the uniform and the exponential distributions, which play a dominant role in testing hypotheses on life distributions.

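1. Order statistics from the reduced uniform distribution

$$f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{10.6a}$$
$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases} \tag{10.6b}$$
$$f_{X_{(\ell)}}(x) = \frac{n!}{(\ell-1)!\,(n-\ell)!}\, x^{\ell-1} (1 - x)^{n-\ell} \quad \text{for } \ell = 1, \ldots, n;\ 0 \le x \le 1 \tag{10.6c}$$
$$f_{X_{(k)},X_{(\ell)}}(x, y) = \frac{n!}{(k-1)!\,(\ell-k-1)!\,(n-\ell)!}\, x^{k-1} (y - x)^{\ell-k-1} (1 - y)^{n-\ell} \quad \text{for } k < \ell,\ x < y \text{ and } x, y \in [0, 1] \tag{10.6d}$$
$$f_{\mathbf{X}_n}(x_1, \ldots, x_n) = n! \quad \text{for } 0 \le x_1 < x_2 < \ldots < x_n \le 1 \tag{10.6e}$$
$$E(X_{(\ell)}) = \frac{\ell}{n+1};\quad \ell = 1, 2, \ldots, n \tag{10.6f}$$
$$E\left(X_{(\ell)}^2\right) = \frac{\ell\,(\ell+1)}{(n+1)\,(n+2)};\quad \ell = 1, 2, \ldots, n \tag{10.6g}$$
$$\mathrm{Var}(X_{(\ell)}) = \frac{\ell\,(n - \ell + 1)}{(n+1)^2\,(n+2)};\quad \ell = 1, 2, \ldots, n \tag{10.6h}$$
$$\mathrm{Cov}(X_{(k)}, X_{(\ell)}) = \frac{k\,(n - \ell + 1)}{(n+1)^2\,(n+2)};\quad k < \ell;\ k, \ell = 1, \ldots, n \tag{10.6i}$$
$$\mathrm{Cor}(X_{(k)}, X_{(\ell)}) = \sqrt{\frac{k\,(n - \ell + 1)}{\ell\,(n - k + 1)}};\quad k < \ell;\ k, \ell = 1, \ldots, n \tag{10.6j}$$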

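2. Order statistics from the exponential distribution

$$f(x) = \begin{cases} \dfrac{1}{b}\exp\left(-\dfrac{x}{b}\right) & \text{for } x \ge 0,\ b > 0 \\ 0 & \text{otherwise} \end{cases} \tag{10.7a}$$
$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - \exp\left(-\dfrac{x}{b}\right) & \text{for } x \ge 0 \end{cases} \tag{10.7b}$$
$$f_{X_{(\ell)}}(x) = \frac{n!}{b\,(\ell-1)!\,(n-\ell)!} \exp\left[-\frac{(n - \ell + 1)\,x}{b}\right] \left[1 - \exp\left(-\frac{x}{b}\right)\right]^{\ell-1};\quad x \ge 0 \tag{10.7c}$$
$$f_{X_{(k)},X_{(\ell)}}(x, y) = \frac{n!}{b^2\,(k-1)!\,(\ell-k-1)!\,(n-\ell)!} \exp\left[-\frac{x + (n - \ell + 1)\,y}{b}\right] \left[1 - \exp\left(-\frac{x}{b}\right)\right]^{k-1} \left[\exp\left(-\frac{x}{b}\right) - \exp\left(-\frac{y}{b}\right)\right]^{\ell-k-1};\quad k < \ell,\ x < y \text{ and } x, y \ge 0 \tag{10.7d}$$
$$f_{\mathbf{X}_n}(x_1, \ldots, x_n) = \frac{n!}{b^n} \exp\left(-\frac{1}{b}\sum_{i=1}^{n} x_i\right);\quad 0 \le x_1 < x_2 < \ldots < x_n < \infty \tag{10.7e}$$
$$E(X_{(\ell)}) = b \sum_{i=1}^{\ell} \frac{1}{n - i + 1};\quad \ell = 1, 2, \ldots, n \tag{10.7f}$$
$$E\left(X_{(\ell)}^2\right) = b^2 \left[\sum_{i=1}^{\ell} \frac{1}{(n - i + 1)^2} + \left(\sum_{i=1}^{\ell} \frac{1}{n - i + 1}\right)^2\right];\quad \ell = 1, 2, \ldots, n \tag{10.7g}$$
$$\mathrm{Var}(X_{(\ell)}) = b^2 \sum_{i=1}^{\ell} \frac{1}{(n - i + 1)^2};\quad \ell = 1, 2, \ldots, n \tag{10.7h}$$
$$\mathrm{Cov}(X_{(k)}, X_{(\ell)}) = \mathrm{Var}(X_{(k)}) = b^2 \sum_{i=1}^{k} \frac{1}{(n - i + 1)^2};\quad k < \ell;\ k, \ell = 1, \ldots, n \tag{10.7i}$$
$$\mathrm{Cor}(X_{(k)}, X_{(\ell)}) = \sqrt{\frac{\sum_{i=1}^{k} (n - i + 1)^{-2}}{\sum_{i=1}^{\ell} (n - i + 1)^{-2}}};\quad k < \ell;\ k, \ell = 1, \ldots, n \tag{10.7j}$$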
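The moment formulas (10.7f), (10.7h), (10.7i) and (10.7j) for exponential order statistics are finite sums and can be evaluated exactly. The Python sketch below does this with rational arithmetic; the function name and the returned dictionary layout are illustrative assumptions, not part of the text.

```python
import math
from fractions import Fraction


def exp_order_stat_moments(k, ell, n, b=1):
    """Moments of exponential order statistics X_(k), X_(ell) with k < ell <= n.

    Returns the mean and variance of X_(ell), and the covariance and
    correlation of (X_(k), X_(ell)), following (10.7f), (10.7h)-(10.7j).
    """
    mean_l = b * sum(Fraction(1, n - i + 1) for i in range(1, ell + 1))
    var_l = b**2 * sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, ell + 1))
    var_k = b**2 * sum(Fraction(1, (n - i + 1) ** 2) for i in range(1, k + 1))
    return {
        "mean": mean_l,
        "var": var_l,
        "cov": var_k,                       # Cov(X_(k), X_(ell)) = Var(X_(k))
        "cor": math.sqrt(var_k / var_l),    # square root of the variance ratio
    }


if __name__ == "__main__":
    print(exp_order_stat_moments(1, 3, 3))
```

For $n = 3$, $k = 1$, $\ell = 3$ and $b = 1$ the sums are $1/3 + 1/2 + 1$ for the mean and $1/9 + 1/4 + 1$ for the variance.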

The difference $S_i$ of two adjacent order statistics $X_{(i-1)}, X_{(i)}$ is called a spacing:

$$S_i = X_{(i)} - X_{(i-1)};\quad i = 2, 3, \ldots, n. \tag{10.8a}$$

For a lifetime variable the spacing is nothing but the waiting time (time elapsed) between the $(i-1)$-st and the $i$-th failure. Let
$$\mathbf{S} = (S_2, \ldots, S_n) \tag{10.8b}$$
be the vector of all spacings in a sample of size $n$; then by means of a linear transformation of (10.4) we find
$$f_{\mathbf{S}}(s_2, \ldots, s_n) = \begin{cases} n! \displaystyle\int_{-\infty}^{\infty} f(x) \prod_{i=2}^{n} f(x + s_2 + \ldots + s_i)\, dx & \text{for } s_i > 0,\ 2 \le i \le n \\ 0 & \text{otherwise.} \end{cases} \tag{10.8c}$$

For a single spacing we have from (10.3c)

$$f_{S_i}(x) = \int_{-\infty}^{\infty} f_{X_{(i-1)},X_{(i)}}(y, y + x)\, dy = \frac{n!}{(i-2)!\,(n-i)!} \int_{-\infty}^{\infty} \left[F(y)\right]^{i-2} \left[1 - F(y + x)\right]^{n-i} f(y)\, f(y + x)\, dy;\quad 2 \le i \le n,\ x > 0. \tag{10.8d}$$

As this expression indicates, the formulas for the PDFs of sets of arbitrary spacings are not particularly simple, although they are derived straightforwardly. PYKE (1965, p. 400), based on the MARKOV property of order statistics, gives the following PDF of $(S_i, S_j)$, $i \ne j$,
$$f_{S_i,S_j}(u, v) = \frac{n!}{(i-2)!\,(j-i-2)!\,(n-j)!} \int_{-\infty}^{\infty} \int_{x+u}^{\infty} \left[F(x)\right]^{i-2} \left[F(y) - F(x + u)\right]^{j-i-2} \left[1 - F(y + v)\right]^{n-j} f(x)\, f(x + u)\, f(y)\, f(y + v)\, dy\, dx;\quad i + 1 < j;\ u, v > 0. \tag{10.8e}$$

For two adjacent spacings $S_i$ and $S_{i+1}$ we have

$$f_{S_i,S_{i+1}}(u, v) = \frac{n!}{(i-2)!\,(n-i-1)!} \int_{-\infty}^{\infty} \left[F(x)\right]^{i-2} \left[1 - F(x + u + v)\right]^{n-i-1} f(x)\, f(x + u)\, f(x + u + v)\, dx;\quad i = 2, \ldots, n-1;\ u, v > 0. \tag{10.8f}$$

We now look at spacings from the uniform and the exponential distributions. These spacings are of special interest in lifetime analysis.
Spacings of the reduced uniform distribution
Let $\mathbf{X}_n = (X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ denote the order statistics in a sample of size $n$ from the reduced uniform distribution (10.6a,b). Set $X_{(0)} = 0$ and $X_{(n+1)} = 1$. The spacings are defined by
$$S_i = X_{(i)} - X_{(i-1)} \quad \text{for } 1 \le i \le n + 1. \tag{10.9a}$$
Since $S_1 + S_2 + \ldots + S_{n+1} = 1$, the random vector $\mathbf{S} = (S_1, S_2, \ldots, S_{n+1})$ has a singular distribution, but when restricted to this hyperplane it has the PDF

$$f_{\mathbf{S}}(s_1, \ldots, s_{n+1}) = n! \quad \text{if } s_i \ge 0 \text{ and } s_1 + \ldots + s_{n+1} = 1. \tag{10.9b}$$

It follows from (10.9b) that the PDF of $\mathbf{S}$ remains unchanged under any permutation of its coordinates, i.e., uniform spacings are interchangeable variates. This implies in particular that the PDF of any $S_i$ is equal to that of $S_1 = X_{(1)}$ and that the joint PDF of any pair $(S_i, S_j)$, $i \ne j$, is the same as that of $(S_1, S_2)$. So we have

$$F_{S_i}(x) = F_{S_1}(x) = F_{X_{(1)}}(x) = 1 - (1 - x)^n,\quad 0 \le x \le 1, \tag{10.9c}$$
$$f_{S_i}(x) = f_{S_1}(x) = f_{X_{(1)}}(x) = n\,(1 - x)^{n-1},\quad 0 \le x \le 1, \tag{10.9d}$$

and for $x, y \ge 0$ with $x + y \le 1$

$$F_{S_i,S_j}(x, y) = \Pr(X_{(1)} \le x,\ X_{(2)} - X_{(1)} \le y) = \int_0^{x} n \left[1 - \left(1 - \frac{y}{1-u}\right)^{n-1}\right] (1 - u)^{n-1}\, du = 1 - (1 - x)^n - (1 - y)^n + (1 - x - y)^n, \tag{10.9e}$$
$$f_{S_i,S_j}(x, y) = n\,(n-1)\,(1 - x - y)^{n-2}. \tag{10.9f}$$

For the single moments of $S_i$ we find

$$E(S_i) = E(S_1) = E(X_{(1)}) = \frac{1}{n+1}, \tag{10.9g}$$
$$E(S_i^2) = E(S_1^2) = E\left(X_{(1)}^2\right) = \frac{2}{(n+2)\,(n+1)}, \tag{10.9h}$$
$$\mathrm{Var}(S_i) = \mathrm{Var}(S_1) = \mathrm{Var}(X_{(1)}) = \frac{n}{(n+2)\,(n+1)^2} \tag{10.9i}$$
and, since
$$E(S_2 \mid S_1 = u) = \frac{1 - u}{n} \quad \text{for } 0 < u < 1,$$
the product moments are
$$E(S_1 S_2) = \frac{E[S_1 (1 - S_1)]}{n} = \frac{1}{(n+1)(n+2)} = E(S_i S_j);\quad i \ne j;\ i, j = 1, \ldots, n+1, \tag{10.9j}$$
$$\mathrm{Cov}(S_i, S_j) = -\frac{1}{(n+1)^2\,(n+2)};\quad i \ne j;\ i, j = 1, \ldots, n+1, \tag{10.9k}$$
$$\mathrm{Cor}(S_i, S_j) = -\frac{1}{n};\quad i \ne j;\ i, j = 1, \ldots, n+1. \tag{10.9l}$$

Spacings of the exponential distribution

Let $\mathbf{X}_n = (X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ denote the set of order statistics from an exponential distribution with $f(x) = (1/b)\exp[-(x/b)]$, $x \ge 0$. Set $X_{(0)} = 0$ and $S_i = X_{(i)} - X_{(i-1)}$, $1 \le i \le n$. Then for the vector
$$\mathbf{S} = (S_1, S_2, \ldots, S_n)$$
the joint PDF follows from (10.4) as
$$f_{\mathbf{S}}(s_1, \ldots, s_n) = \frac{n!}{b^n} \prod_{i=1}^{n} \exp\left(-\frac{s_1 + \ldots + s_i}{b}\right) = \frac{n!}{b^n} \exp\left[-\frac{1}{b} \sum_{i=1}^{n} (n - i + 1)\, s_i\right] = \prod_{i=1}^{n} \frac{n - i + 1}{b} \exp\left[-\frac{(n - i + 1)\, s_i}{b}\right];\quad s_i > 0;\ 1 \le i \le n. \tag{10.10a}$$

Hence, the joint PDF of exponential spacings is the product of $n$ marginal exponential densities
$$f_{S_i}(s) = \frac{n - i + 1}{b} \exp\left[-\frac{(n - i + 1)\, s}{b}\right];\quad s > 0;\ i = 1, \ldots, n; \tag{10.10b}$$

the rate parameters being $n/b,\ (n-1)/b,\ \ldots,\ 1/b$ and

$$E(S_i) = \frac{b}{n - i + 1}, \tag{10.10c}$$
$$\mathrm{Var}(S_i) = \frac{b^2}{(n - i + 1)^2}. \tag{10.10d}$$

Equivalently, one may say that

$$D_i^* = \frac{n - i + 1}{b}\, S_i;\quad 1 \le i \le n; \tag{10.10e}$$
the normalized exponential spacings, are iid exponential variates with parameter $b = 1$ and $E(D_i^*) = \mathrm{Var}(D_i^*) = 1$; $i = 1, \ldots, n$.
For a general variate $X$ the quantity

$$D_i = (n - i + 1)\,(X_{(i)} - X_{(i-1)}) \tag{10.11}$$

is simply called a normalized spacing. It plays a dominant role in testing hypotheses on the hazard rate.
Let $0 = X_{(0)} < X_{(1)} < X_{(2)} < \ldots < X_{(n)}$ denote an ordered sample of size $n$ from a life distribution with CDF $F(\cdot)$, $F(0^-) = 0$, survival function $S(\cdot) = 1 - F(\cdot)$ and finite mean $\mu = \int_0^\infty S(x)\, dx$. $T_n$, the total time spent on test by the $n$ sample units until the failure of the longest living unit, may be expressed in two different ways:

1. as the sum of all observed life spans

$$T_n = \sum_{i=1}^{n} X_{(i)}, \tag{10.12a}$$

i.e., as the area given by $n$ horizontal beams of length $X_{(i)}$, each having height equal to 1, see the upper display in Fig. 10/1, or

2. as the sum of all normalized spacings

$$T_n = \sum_{i=1}^{n} (n - i + 1)\,(X_{(i)} - X_{(i-1)}) = \sum_{i=1}^{n} D_i, \tag{10.12b}$$

i.e., as the area given by $n$ vertical beams having width $X_{(i)} - X_{(i-1)}$ and corresponding height $n - i + 1$. Such a vertical beam gives the time spent on test by those $n - i + 1$ units that were alive during the interval $(X_{(i-1)}, X_{(i)}]$, see the lower display in Fig. 10/1.

$T_n$ is known as the TTT–statistic (total–time–on–test statistic). The successive TTT–statistics are defined

• according to (10.12a) as
$$T_i = \sum_{j=1}^{i} X_{(j)} + (n - i)\, X_{(i)}, \tag{10.12c}$$

• according to (10.12b) as
$$T_i = \sum_{j=1}^{i} (n - j + 1)\,(X_{(j)} - X_{(j-1)}) = \sum_{j=1}^{i} D_j. \tag{10.12d}$$

The scaled TTT–statistics, defined on [0, 1], are

$$T_i^* = \frac{T_i}{T_n};\quad i = 1, 2, \ldots, n. \tag{10.12e}$$
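The two definitions (10.12c) and (10.12d) of the successive TTT–statistics give the same numbers, which is easy to confirm on a small sample. The Python sketch below computes $T_i$ both ways and then the scaled statistics (10.12e); the function names are hypothetical, and an uncensored sample of positive lifetimes is assumed.

```python
def ttt_from_spacings(sample):
    """Successive TTT-statistics T_i as cumulative sums of normalized spacings, cf. (10.12d)."""
    x = sorted(sample)
    n = len(x)
    previous, running, T = 0.0, 0.0, []
    for i, xi in enumerate(x, start=1):
        running += (n - i + 1) * (xi - previous)  # normalized spacing D_i
        T.append(running)
        previous = xi
    return T


def ttt_from_lifetimes(sample):
    """Successive TTT-statistics T_i from observed lifetimes, cf. (10.12c)."""
    x = sorted(sample)
    n = len(x)
    return [sum(x[:i]) + (n - i) * x[i - 1] for i in range(1, n + 1)]


def scaled_ttt(sample):
    """Scaled TTT-statistics T_i* = T_i / T_n, cf. (10.12e)."""
    T = ttt_from_spacings(sample)
    return [t / T[-1] for t in T]


if __name__ == "__main__":
    data = [3.0, 1.0, 2.0]
    print(ttt_from_spacings(data), ttt_from_lifetimes(data), scaled_ttt(data))
```

Plotting `scaled_ttt(sample)` against $i/n$ and joining the points by straight lines gives the TTT–plot.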

Figure 10/1: Two ways of expressing the total time spent on test

By plotting $T_i^*$ on the ordinate against the empirical CDF $F_n(x_{(i)}) = i/n$ on the abscissa and then connecting these points by straight lines we obtain a curve within the unit square of the $(x, y)$–plane, called the TTT–plot. The message of the TTT–plot is easy to understand: the shortest living $100\,(i/n)\%$ of the sample units contribute $100\,(T_i^*)\%$ of the total time lived by all sample units.²
TTT–statistics were first used by EPSTEIN/SOBEL (1953) to make inference about the exponential distribution. Starting with a paper by BARLOW/CAMPO (1975), researchers have studied generalizations of the original TTT–concept that have proven to be very useful in a great number of applications, e.g., for model identification, as a basis for the characterization of life distribution classes, for hypothesis testing, and to determine optimal age replacement intervals.
The empirical quantities defined in (10.12a–e) have theoretical counterparts.
$$H_F^{-1}(P) = \int_0^{F^{-1}(P)} S(x)\, dx,\quad 0 \le P \le 1, \tag{10.13a}$$
the counterpart of $T_i$, is called the TTT–transform of $F(\cdot)$. $F^{-1}(P)$ is the percentile $x_P$ of order $P$. For $P = 1$ we have
$$H_F^{-1}(1) = \int_0^{F^{-1}(1)} S(x)\, dx = E(X) = \mu. \tag{10.13b}$$

The counterpart of $T_i^*$ in (10.12e) is

$$\phi_F(P) = \frac{H_F^{-1}(P)}{H_F^{-1}(1)} = \frac{H_F^{-1}(P)}{\mu}, \tag{10.13c}$$

the scaled TTT–transform of $F(\cdot)$; for examples see Fig. 10/4 and 10/5.

² The TTT–plot resembles the LORENZ–curve which is used in economics to describe the inequality in income distributions. Contrary to the TTT–plot, the LORENZ–curve is never situated above the 45°–line.
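We realize (see, e.g., Fig. 10/2) that the TTT–plot of a sample from a population with $F(\cdot)$ will approach the graph of the scaled TTT–transform $\phi_F(P)$ of $F(\cdot)$ as $n$, the sample size, increases. This is so because

$$\frac{T_i}{n} = \frac{1}{n} \sum_{j=1}^{i} (n - j + 1)\,(X_{(j)} - X_{(j-1)}) = \int_0^{F_n^{-1}(i/n)} \left[1 - F_n(x)\right] dx,$$

$F_n(\cdot)$ being the empirical CDF, and because by the GLIVENKO–CANTELLI theorem and the strong law of large numbers, with probability one,

$$\frac{T_i}{n} \longrightarrow \int_0^{F^{-1}(P)} \left[1 - F(x)\right] dx \quad \text{uniformly in } P \in [0, 1]$$

when $n \to \infty$ and $i/n \to P$. Since $T_n/n$ is the sample mean and tends to $\mu$, the scaled statistic $T_i^* = T_i/T_n$ tends to $\phi_F(P)$.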


It is easy to verify that for an exponential distribution with $F(x) = 1 - \exp[-(x/b)]$, $F^{-1}(P) = -b \ln(1 - P)$ and $\mu = b$ we have

$$H_F^{-1}(P) = \int_0^{-b \ln(1-P)} \exp[-(x/b)]\, dx = b\,P \tag{10.14a}$$

and
$$\phi_F(P) = \frac{b\,P}{b} = P,\quad 0 \le P \le 1. \tag{10.14b}$$

So, the scaled TTT–transform is the 45°–line running from (0, 0) to (1, 1), around which the TTT–plot of an exponential sample fluctuates, see Fig. 10/2. The notation of the TTT–transform as an inverse CDF indicates that the inverse of $H_F^{-1}(P)$ will be a CDF of some variate $Y$ with support $[0, \mu]$. In case of (10.14a) the corresponding CDF is

$$H_F(y) = P = y/b,\quad 0 \le y \le b, \tag{10.14c}$$

i.e., a uniform distribution on $[0, b]$.
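For distributions other than the exponential the integral in (10.13a) usually has to be evaluated numerically. The following Python sketch does this for a WEIBULL distribution with $S(x) = \exp[-(x/b)^c]$ and mean $\mu = b\,\Gamma(1 + 1/c)$, using composite SIMPSON integration. The choice of the WEIBULL family, the integration rule, the number of steps and the function name are assumptions made for illustration only.

```python
import math


def weibull_scaled_ttt(P, c, b=1.0, steps=2000):
    """Scaled TTT-transform phi_F(P) of a Weibull(b, c) distribution, cf. (10.13a,c).

    The integral of S(x) from 0 to the percentile F^{-1}(P) is approximated
    by composite Simpson integration with an even number of steps.
    """
    if P <= 0.0:
        return 0.0
    if P >= 1.0:
        return 1.0
    upper = b * (-math.log(1.0 - P)) ** (1.0 / c)  # percentile F^{-1}(P)

    def survival(x):
        return math.exp(-((x / b) ** c))

    h = upper / steps
    total = survival(0.0) + survival(upper)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) * survival(j * h)
    integral = total * h / 3.0
    mu = b * math.gamma(1.0 + 1.0 / c)  # mean of the Weibull distribution
    return integral / mu


if __name__ == "__main__":
    for shape in (0.5, 1.0, 2.0):
        print(shape, round(weibull_scaled_ttt(0.5, shape), 4))
```

With $c = 1$ the function reproduces (10.14b), i.e., $\phi_F(P) = P$ up to the integration error.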



Figure 10/2: TTT–plots based on simulated exponential data (b = 5; n = 10, 50, 100) and the
scaled TTT–transform of the exponential distribution

We state the following properties of the TTT–transform $H_F^{-1}(P)$:³

1. There is a one–to–one correspondence between life distributions and their TTT–transforms.

2. If $F(\cdot)$ is strictly increasing or, equivalently, if $F^{-1}(\cdot)$ is continuous, then $H_F^{-1}(\cdot)$ is continuous.

3. If $F(\cdot)$ is absolutely continuous and strictly increasing, the derivative of $H_F^{-1}(P)$ is found to be
$$\frac{dH_F^{-1}(P)}{dP} = \frac{1 - F(x)}{f(x)} = \frac{1}{h(x)},\quad x = F^{-1}(P), \tag{10.15}$$
for almost all $P \in [0, 1]$, where $h(\cdot)$ is the hazard rate. The property that the derivative of the scaled TTT–transform $\phi_F(P) = H_F^{-1}(P)/\mu$ is proportional to the reciprocal of the hazard rate is of utmost importance in finding test statistics for hypotheses on the hazard rate in later sections.

10.2 Testing Hazard Rate Properties


We are interested in the behavior of the hazard rate and want to know whether it is constant (Sect. 10.2.1), monotonically increasing or decreasing (Sect. 10.2.2) or has a bathtub shape or an upside–down bathtub shape (Sect. 10.2.3).

10.2.1 Constancy of the Hazard Rate⁴

The exponential distribution is the only continuous distribution with constant hazard rate. Thus, testing

H0 : ‘h(x) is constant.’ against HA : ‘h(x) is not constant.’

amounts to deciding whether the sample comes from an exponential distribution or not.⁵ For this purpose we may resort to one of the numerous existing goodness–of–fit tests, e.g., the tests of KOLMOGOROV–SMIRNOV, ANDERSON–DARLING, CRAMÉR–VON MISES or WATSON. Here, we will only recommend informal procedures which are based on graphs to be judged by eye. These approaches have neither a test statistic nor a calculable error probability. The accompanying MATLAB–program HAZARD 09 produces the following graphs.

³ For more properties see BARLOW et al. (1972), BARLOW/CAMPO (1975) and BARLOW (1979).
⁴ Suggested reading for this section: DOKSUM/YANDELL (1984), EPSTEIN (1960), EPSTEIN/SOBEL (1953).

1) Plot of the transformed empirical CDF

The CDF of the general exponential distribution reads
$$F(x) = 1 - \exp\left(-\frac{x-a}{b}\right);\quad a \in \mathbb{R},\ b > 0,\ x \ge a. \tag{10.16a}$$
It follows that
$$y = \ln\left[\frac{1}{1 - F(x)}\right] = \frac{x-a}{b}. \tag{10.16b}$$

So, if we plot $y = \ln\{1/[1 - F(x)]\}$ against $x$, which is nothing but the probability plot of the exponential distribution, see RINNE (2010, pp. 118 ff.), we get a straight line with slope $1/b$ cutting the $x$–axis at the point $a$. This suggests the following procedure for testing departures from exponentiality:
1.1) If we have an uncensored sample of $n$ items, $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$ being the ordered times of failure, we estimate $F(x)$ by
$$\hat F_n(x_{(i)}) = \frac{i}{n};\quad i = 1, 2, \ldots, n; \tag{10.16c}$$
and plot
$$\hat y_i = \ln\left[\frac{1}{1 - \hat F_n(x_{(i)})}\right] \tag{10.16d}$$
against $x_{(i)}$. The points $(x_{(i)}, \hat y_i)$ will fluctuate at random around a straight line for an exponential population.
1.2) If we have a singly censored type–I or type–II sample with altogether $k$ failures observed, $k$ being fixed for type–II and random for type–I, we evaluate (10.16c,d) for $i = 1, 2, \ldots, k$ and find a good linear fit up to the point $\left(x_{(k)},\ \ln[1/(1 - k/n)]\right)$ under exponentiality.
1.3) If we have a multiply censored or randomly censored sample with observations $(x_{(i)}, \delta_i)$, $\delta_i = 1$ for uncensored data and $\delta_i = 0$ for censored data, we take the KAPLAN/MEIER estimator
$$\hat S_n(x_{(i)}) = \prod_{j=1}^{i} \left(\frac{n - j}{n - j + 1}\right)^{\delta_j};\quad i = 1, 2, \ldots, n. \tag{10.16e}$$

Then we look for linearity in the plot of

$$\hat y_i = \ln\left[\frac{1}{\hat S_n(x_{(i)})}\right];\quad \delta_i = 1, \tag{10.16f}$$
against $x_{(i)}$ for the uncensored observations $(x_{(i)}, 1)$.

⁵ Testing constancy of the hazard rate for a discrete distribution based on the score statistics of the log–likelihood function is described in SCHIFFMAN (1986).

2) TTT–plot
The scaled TTT–transform of an exponential distribution is $\phi_F(P) = P$, i.e., a 45°–line running from (0, 0) to (1, 1), and the scaled TTT–statistic $T_i^*$ from an exponential sample will deviate randomly from this 45°–line.

2.1) For an uncensored sample of size $n$ we plot $T_i^*$ on the ordinate against $i/n$ on the abscissa.
2.2) For a singly censored type–I or type–II sample we plot

$$T_i^* = \frac{T_i^{**}}{T_k^{**}};\quad i = 1, 2, \ldots, k; \tag{10.17a}$$

against $i/k$ with $k$ as the total number of failed items observed and

$$T_i^{**} = \sum_{j=1}^{i} (k - j + 1)\,(x_{(j)} - x_{(j-1)});\quad i = 1, 2, \ldots, k. \tag{10.17b}$$

2.3) For a multiply censored or randomly censored sample we take the KAPLAN/MEIER estimator of (10.16e) and plot

$$T_i^* = \frac{T_i^{***}}{T_k^{***}};\quad i = 1, 2, \ldots, k; \tag{10.18a}$$

against $\hat F_n(x_{(i)}) = 1 - \hat S_n(x_{(i)})$ for the uncensored observations $(x_{(i)}, 1)$. $k$ is the total number of failed items observed and

$$T_i^{***} = \sum_{j=1}^{i} \frac{\hat S_n(x_{(j)}) + \hat S_n(x_{(j-1)})}{2}\,(x_{(j)} - x_{(j-1)});\quad i = 1, \ldots, k;\ x_{(0)} = 0;\ \hat S_n(x_{(0)}) = 1. \tag{10.18b}$$

We mention that the TTT–graphs resulting from (10.17a,b) and (10.18a,b) would lie below those that would result had the sample been uncensored.

3) We mention another graphical approach suggested by EPSTEIN (1960). This approach needs a larger sample size. A property of the exponential distribution is that its conditional probability of failing in $(x, x + \Delta x)$, given survival up to $x$, is independent of $x$. For small $\Delta x$ this conditional probability is

$$\frac{f(x)\,\Delta x}{1 - F(x)} = \frac{\frac{1}{b}\exp(-x/b)\,\Delta x}{\exp(-x/b)} = \frac{\Delta x}{b}. \tag{10.19}$$

So, if we start with a large number $n$ of items, divide the $x$–axis into intervals $(0, \Delta), (\Delta, 2\Delta), (2\Delta, 3\Delta), \ldots$ where $\Delta$ is suitably chosen, and if $n_1, n_2, n_3, \ldots$ are the numbers of items failing in these intervals, then

$$\frac{n_1}{n},\ \frac{n_2}{n - n_1},\ \frac{n_3}{n - n_1 - n_2},\ \ldots$$

should fluctuate within reasonable limits about a constant, namely approximately $\Delta/b$, the hazard rate $1/b$ multiplied by the interval length $\Delta$.
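EPSTEIN's interval ratios are simple to tabulate. The Python sketch below counts the failures per interval of length $\Delta$ and divides each count by the number of items still at risk at the start of that interval; it assumes an uncensored sample, and the function name is hypothetical.

```python
def interval_failure_fractions(failure_times, delta):
    """Ratios n_1/n, n_2/(n - n_1), n_3/(n - n_1 - n_2), ... for intervals of length delta."""
    n = len(failure_times)
    if n == 0:
        return []
    n_intervals = int(max(failure_times) // delta) + 1
    counts = [0] * n_intervals
    for t in failure_times:
        counts[int(t // delta)] += 1  # index of the interval containing t
    fractions, at_risk = [], n
    for c in counts:
        if at_risk == 0:
            break
        fractions.append(c / at_risk)
        at_risk -= c
    return fractions


if __name__ == "__main__":
    print(interval_failure_fractions([0.5, 1.5, 1.7, 2.2], 1.0))
```

The last few ratios rest on very few items at risk and are therefore erratic, which is one reason why the approach needs a large sample.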

Example 10/1: Checking for exponentiality

The following n = 20 observations are randomly censored. They have been generated from a WEIBULL distribution with scale parameter b = 1 and shape parameter c = 2.5, so the hazard rate is increasing more than linearly.

x(i)  0.2727  0.3877  0.4556  0.4565  0.5271  0.5487  0.5789  0.7846  0.8142  0.9329
δi    1       1       1       1       1       1       1       0       1       1

x(i)  0.9659  1.1217  1.1267  1.2378  1.3001  1.3012  1.3181  1.3803  1.4362  1.9594
δi    1       0       1       0       1       1       1       0       1       1

Fig. 10/3 shows the probability plot on the left, where the points clearly deviate from the straight OLS–fitted line. The same is true for the TTT–plot on the right, where the points lie on a concave curve above the 45°–line, which is typical for IHR distributions.
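The plotting coordinates for the probability plot of this example can be recomputed from the table. The Python sketch below applies the product formula (10.16e) and the transformation (10.16f) to the 20 observations; it is not the MATLAB program mentioned in the text, the function names are hypothetical, and the sketch assumes there are no ties. Because the largest observation is a failure, the estimated survival function drops to zero there and the corresponding point, whose ordinate would be infinite, is left out.

```python
import math

# Data of Example 10/1: ordered observations and censoring indicators
# (1 = failure observed, 0 = censored).
x = [0.2727, 0.3877, 0.4556, 0.4565, 0.5271, 0.5487, 0.5789, 0.7846, 0.8142, 0.9329,
     0.9659, 1.1217, 1.1267, 1.2378, 1.3001, 1.3012, 1.3181, 1.3803, 1.4362, 1.9594]
delta = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
         1, 0, 1, 0, 1, 1, 1, 0, 1, 1]


def kaplan_meier(delta):
    """Product-limit estimates S_hat(x_(i)), i = 1..n, for ordered data, cf. (10.16e)."""
    n = len(delta)
    s, out = 1.0, []
    for j, d in enumerate(delta, start=1):
        if d == 1:
            s *= (n - j) / (n - j + 1)
        out.append(s)
    return out


def exponential_plot_points(x, delta):
    """Points (x_(i), ln[1 / S_hat(x_(i))]) for uncensored observations, cf. (10.16f)."""
    S = kaplan_meier(delta)
    return [(xi, math.log(1.0 / si))
            for xi, di, si in zip(x, delta, S) if di == 1 and si > 0.0]


if __name__ == "__main__":
    for xi, yi in exponential_plot_points(x, delta):
        print(f"{xi:.4f}  {yi:.4f}")
```

Under exponentiality the printed points should scatter around a straight line; for these data the ordinates grow faster than linearly in $x$.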
Figure 10/3: Graphs for judging exponentiality

10.2.2 Monotonicity of the Hazard Rate⁶

The topic of this section is to decide whether a distribution is IHR (DHR) or not. Many tests have been designed to test

H0 : ‘F(x) = 1 − exp(−x/b), b unspecified.’

against
HA : ‘F(x) is IHR and not exponential.’
or
H*A : ‘F(x) is DHR and not exponential.’

⁶ Suggested reading for this section: BARLOW (1968), BARLOW/PROSCHAN (1965, 1969), BICKEL (1964), BICKEL/DOKSUM (1969), DOKSUM/YANDELL, HALL/VAN KEILEGOM (2005), HOLLANDER/PROSCHAN (1984), KLEFSJÖ (1983a), PROSCHAN/PYKE (1967).

We will only present a few of these tests that can be found in the suggested reading for this section.
We first mention a graphical approach that starts from (10.13c) in connection with (10.15):
 
d φF (P ) d 1 −1 1 1
= HF (P ) = . (10.20)
dP dP µ µ h(x)
From (10.20) we see that the graph of the scaled TTT–transform φF (P ) will be

• a straight line for an exponential distribution,


• concave for an IHR distribution and
• convex for a DHR distribution.

As the graph of $\phi_F(P)$ runs from (0, 0) to (1, 1), this graph will have

• a decreasing slope and lie above the 45°–line for an IHR distribution,
• an increasing slope and lie below the 45°–line for a DHR distribution,

see Fig. 10/4, which shows $\phi_F(P)$ for three WEIBULL distributions having $h(x) = c\,x^{c-1}$: c =
0.5 gives DHR, c = 1 gives the exponential distribution and c = 2 gives IHR. The graph of the
scaled TTT–statistic $T_i^*$ approaches that of $\phi_F(P)$ as n → ∞, and we can reject H0 in favor of HA
($H_A^*$) when the $T_i^*$–graph lies wholly above (below) the 45°–line of the exponential distribution
and is concave (convex). In case of an uncensored sample of size n the level of significance will
be α = 1/n.
Figure 10/4: Scaled TTT–transforms of three W EIBULL distributions
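The points of such a TTT–plot can be computed directly from the normalized spacings. The following Python sketch is only an illustration and is not taken from the accompanying MATLAB programs; the function name `scaled_ttt` is hypothetical, an uncensored sample is assumed, and the data are the second (IHR) sample of Example 10/2.

```python
from itertools import accumulate


def scaled_ttt(sample):
    """Scaled TTT-statistics T_1*, ..., T_n* of an uncensored sample."""
    xs = sorted(sample)
    n = len(xs)
    # Normalized spacings D_i = (n - i + 1) * (X_(i) - X_(i-1)), X_(0) = 0.
    spacings = [(n - i) * (x - prev)
                for i, (prev, x) in enumerate(zip([0.0] + xs[:-1], xs))]
    totals = list(accumulate(spacings))       # T_i = D_1 + ... + D_i
    return [t / totals[-1] for t in totals]   # T_i* = T_i / T_n


# Second (IHR) sample of Example 10/2, n = 20.
sample_ihr = [2.5349, 2.6149, 4.4532, 4.8567, 5.7627,
              11.0273, 15.5141, 16.8996, 18.3318, 18.5556,
              18.6857, 19.8772, 20.7875, 22.1906, 22.9138,
              25.4471, 26.2949, 29.9485, 34.3137, 40.8301]

ttt = scaled_ttt(sample_ihr)
n = len(ttt)
# Compare each point (i/n, T_i*) with the 45-degree line, i = 1, ..., n - 1.
wholly_above = all(ttt[i - 1] > i / n for i in range(1, n))
print("TTT-plot wholly above the diagonal:", wholly_above)
```

The sketch only checks the position relative to the 45°–line; judging concavity still requires looking at the plot or at the differences in (10.21a).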

A numerical testing approach attached to the curvature of the TTT–transform goes back to
KLEFSJÖ (1983a). This test needs an uncensored sample! Suppose that the $\phi_F(P)$–graph is
concave (convex). Since the graph of the scaled TTT–statistic $T_i^*$ converges to that of $\phi_F(P)$, it is
reasonable to expect the TTT–plot based on a sample from an IHR (DHR) distribution to behave
concavely (convexly), too, i.e.:
\[
2\,T_i^* - T_{i-1}^* - T_{i+1}^* \;\underset{(<)}{>}\; 0; \quad i = 1, \ldots, n-1; \quad T_0^* = 0,\ T_n^* = 1. \tag{10.21a}
\]

A possible test statistic against the IHR (DHR) alternative therefore is
\[
A_1 = \sum_{i=1}^{n-1} \left( 2\,T_i^* - T_{i-1}^* - T_{i+1}^* \right), \tag{10.21b}
\]

and we expect a positive (negative) value of $A_1$ if F (·) is IHR (DHR), but not exponential. Using the
normalized spacings $D_i$ of (10.11), we immediately see that $A_1$ can be written as
\[
A_1 = T_1^* + T_{n-1}^* - 1 = \frac{D_1 - D_n}{T_n}. \tag{10.21c}
\]

KLEFSJÖ gives the asymptotic distribution of $A_1$ under H0 as a LAPLACE distribution and
remarks that, because the numerator $(D_1 - D_n)$ of $A_1$ is independent of $D_2, D_3, \ldots, D_{n-1}$,
a test based on $A_1$ is not consistent against the whole IHR (DHR) class. For this reason KLEFSJÖ
suggests a second test based on the idea that, when $\phi_F(P)$ is concave (convex), we would not only
expect (10.21a) to hold, but we also expect the TTT–plot to lie above (below) each of its chords, i.e.,
for j = 0, 1, . . . , n − 2 and k = 2, 3, . . . , n − j, that
\[
T_j^* + \frac{T_{j+k}^* - T_j^*}{k/n}\,\frac{\nu}{n} \;\underset{(>)}{<}\; T_{j+\nu}^* \quad \text{for } \nu = 1, 2, \ldots, k-1 \tag{10.22a}
\]
or
\[
k \left( T_{j+\nu}^* - T_j^* \right) \;\underset{(<)}{>}\; \nu \left( T_{j+k}^* - T_j^* \right). \tag{10.22b}
\]

From (10.22b) we can construct the test statistic
\[
A_2 = \sum_{j=0}^{n-2} \sum_{k=2}^{n-j} \sum_{\nu=1}^{k-1} \left[ k \left( T_{j+\nu}^* - T_j^* \right) - \nu \left( T_{j+k}^* - T_j^* \right) \right]. \tag{10.22c}
\]

For F (·) to be IHR (DHR), but not exponential, we expect $A_2$ to be positive (negative). $A_2$ can
be written more conveniently as
\[
A_2 = \sum_{j=1}^{n} \alpha_j\,\frac{D_j}{T_n} \tag{10.22d}
\]
with
\[
\alpha_j = \frac{1}{6} \left[ (n+1)^3\,j - 3\,(n+1)^2\,j^2 + 2\,(n+1)\,j^3 \right]. \tag{10.22e}
\]
KLEFSJÖ then considers the slightly modified test statistic
\[
A = A_2\,\sqrt{\frac{7560}{n^7}} \tag{10.22f}
\]
which is asymptotically No(0, 1). So, the asymptotic critical values for A are the percentiles $\tau_\gamma$
of No(0, 1):

lower tail upper tail


γ 0.01 0.05 0.10 0.90 0.95 0.99
τγ −2.3263 −1.6449 −1.2816 1.2816 1.6449 2.3263

H0 is rejected in favor of HA ($H_A^*$) at level α when A is greater (smaller) than the critical value.
Exact critical values have also been calculated by KLEFSJÖ, and an extract of his table is given in
the following Tab. 10/1.

Table 10/1: Critical values of $A = A_2 \sqrt{7560/n^7}$

upper tail†
n α = 0.10 α = 0.05 α = 0.01
5 2.739 3.396 4.402
10 1.912 2.412 3.270
15 1.680 2.133 2.936
20 1.573 2.002 2.777
25 1.511 1.927 2.683
30 1.470 1.878 2.622
35 1.442 1.843 2.578
40 1.421 1.817 2.546
45 1.405 1.797 2.521
50 1.392 1.782 2.501
55 1.382 1.769 2.485
60 1.373 1.758 2.472
65 1.366 1.749 2.460
70 1.361 1.742 2.450
75 1.358 1.736 2.442
∞ 1.282 1.645 2.326
Source: K LEFSJ Ö (1983a, p. 922)
† For lower tail change the sign!
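The weighted form (10.22d) to (10.22f) is easy to evaluate numerically. The following Python sketch is an illustration only, not the HAZARD 10 program; the helper names `normalized_spacings` and `klefsjo_a` are hypothetical. It uses the first (DHR) sample of Example 10/2, for which the text reports A = −1.766.

```python
import math


def normalized_spacings(sample):
    """D_i = (n - i + 1) * (X_(i) - X_(i-1)) with X_(0) = 0 (uncensored sample)."""
    xs = sorted(sample)
    n = len(xs)
    return [(n - i) * (x - prev)
            for i, (prev, x) in enumerate(zip([0.0] + xs[:-1], xs))]


def klefsjo_a(sample):
    """Statistic A of (10.22f), computed via the weights alpha_j of (10.22e)."""
    d = normalized_spacings(sample)
    n = len(d)
    m = n + 1
    alpha = [(m**3 * j - 3 * m**2 * j**2 + 2 * m * j**3) / 6
             for j in range(1, n + 1)]
    a2 = sum(a * dj for a, dj in zip(alpha, d)) / sum(d)   # (10.22d)
    return a2 * math.sqrt(7560 / n**7)                      # (10.22f)


# First (DHR) sample of Example 10/2, n = 15.
sample_dhr = [0.0309, 0.3641, 0.5317, 0.9545, 1.0119,
              1.9145, 3.5331, 3.9321, 4.1219, 10.9776,
              13.5405, 14.9801, 15.2600, 15.8278, 31.4019]

a_stat = klefsjo_a(sample_dhr)
print("A =", round(a_stat, 3))
```

A negative value points towards DHR and a positive one towards IHR; the decision itself is made by comparing A with the percentiles of Tab. 10/1.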
Many tests of H0 versus HA or $H_A^*$ are based on the normalized spacings $D_i = (n - i + 1)\,(X_{(i)} - X_{(i-1)})$,
while other tests only utilize the ranks of the normalized spacings. One of the oldest
tests based on normalized spacings is the cttot–test (cumulative total–time–on–test) of EPSTEIN
(1960), see also BARLOW (1968). This test can be used for uncensored samples as well as for
singly censored type–I and type–II samples. The test rests upon the fact that under H0 the total lives
(= successive TTT–statistics) $T_i = \sum_{j=1}^{i} D_j = \sum_{j=1}^{i} (n - j + 1)\,(X_{(j)} - X_{(j-1)})$ are uniformly
distributed over $[0, T_r]$, where 1 ≤ r ≤ n is the number of failures in a sample of size n. The test
statistic considered is
\[
K_r = \frac{\sum_{j=1}^{r-1} \sum_{i=1}^{j} D_i}{\sum_{i=1}^{r} D_i} = \frac{\sum_{j=1}^{r-1} (r - j)\,D_j}{\sum_{i=1}^{r} D_i}. \tag{10.23a}
\]

BARLOW (1968) has given the exact percentage points $k_{r,\gamma}$ (critical values) for this test statistic,
see Tab. 10/2. H0 is rejected in favor of HA ($H_A^*$) at level α when
\[
K_r \ge k_{r,\,1-\alpha} \quad \left( K_r \le k_{r,\,\alpha} \right). \tag{10.23b}
\]
Even for small r we can use a normal approximation of $K_r$ under H0. The approximate critical
values are
\[
k_{r,\gamma} \approx \frac{r-1}{2} + \tau_\gamma\,\sqrt{\frac{r-1}{12}}, \tag{10.23c}
\]
with $\tau_\gamma$ as the γ–percentile of No(0, 1).
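As a small illustration of (10.23a) and (10.23c), the following Python sketch computes $K_r$ and the approximate critical values. It is not the HAZARD 10 program: the function names are hypothetical, and the optional argument `n` (number of items on test when only the r smallest lifetimes are observed) is an assumption about how a type–II censored sample would be passed. The data are the first sample of Example 10/2, where the text reports $K_r$ = 5.58.

```python
import math
from statistics import NormalDist


def cttot_statistic(failure_times, n=None):
    """K_r of (10.23a) from the r ordered failure times out of n items on test."""
    xs = sorted(failure_times)
    r = len(xs)
    n = r if n is None else n          # uncensored sample: r = n
    d = [(n - i) * (x - prev)
         for i, (prev, x) in enumerate(zip([0.0] + xs[:-1], xs))]
    return sum((r - j) * dj for j, dj in enumerate(d, start=1)) / sum(d)


def cttot_critical_value(r, gamma):
    """Normal approximation (10.23c) of the percentage point k_{r, gamma}."""
    return (r - 1) / 2 + NormalDist().inv_cdf(gamma) * math.sqrt((r - 1) / 12)


sample_dhr = [0.0309, 0.3641, 0.5317, 0.9545, 1.0119,
              1.9145, 3.5331, 3.9321, 4.1219, 10.9776,
              13.5405, 14.9801, 15.2600, 15.8278, 31.4019]

k_r = cttot_statistic(sample_dhr)
lower_10 = cttot_critical_value(len(sample_dhr), 0.10)
print("K_r =", round(k_r, 2), " approx. k_{15, 0.10} =", round(lower_10, 2))
print("reject H0 in favor of DHR at alpha = 0.10:", k_r <= lower_10)
```

For r ≤ 10 the exact percentage points of Tab. 10/2 can be used instead of the approximation.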


Table 10/2: Percentage points $k_{r,\gamma}$ of EPSTEIN’s cttot–test

r \ γ 0.01 0.05 0.10 0.90 0.95 0.99
2 0.01 0.05 0.10 0.90 0.95 0.99
3 0.14 0.32 0.45 1.55 1.68 1.85
4 0.39 0.68 0.84 2.15 2.33 2.61
5 0.69 1.04 1.25 2.75 2.95 3.30
6 1.02 1.43 1.65 3.34 3.57 4.00
7 1.41 1.83 2.08 3.90 4.15 4.60
8 1.77 2.24 2.52 4.49 4.75 5.24
9 2.12 2.65 2.94 5.06 5.35 5.88
10 2.52 3.06 3.38 5.62 5.92 6.45
Source: BARLOW (1968, p. 558)

BICKEL/DOKSUM (1969) extensively studied tests of H0 versus HA ($H_A^*$) based on the ranks of
the normalized spacings. Their test statistics (see below) are partially motivated by the test of
PROSCHAN/PYKE (1967), which is as follows. Let
\[
V_{ij} = \begin{cases} 1 & \text{if } D_i \ge D_j \\ 0 & \text{otherwise} \end{cases} \quad \text{for } i, j = 1, 2, \ldots, n. \tag{10.24a}
\]
So, this test requires uncensored samples. The test statistic is
\[
V_n = \sum_{\substack{i,\,j = 1 \\ i < j}}^{n} V_{ij}. \tag{10.24b}
\]

H0 is rejected in favor of HA ($H_A^*$) at level α if
\[
V_n \ge v_{n,\,1-\alpha} \quad \left( V_n \le v_{n,\,\alpha} \right) \tag{10.24c}
\]
where the critical value $v_{n,\gamma}$ is determined so that
\[
\Pr(V_n < v_{n,\gamma} \mid H_0) = \gamma. \tag{10.24d}
\]
The exact value of $v_{n,\gamma}$ is calculated from
\[
\Pr(V_n = k \mid H_0) = \frac{P_n(k)}{n!}, \tag{10.24e}
\]
where $P_n(k)$ is the number of orderings of $D_1, D_2, \ldots, D_n$ with exactly k inversions of indices.
An inversion of indices i < j occurs when $D_i > D_j$. $V_n$ is asymptotically normal with
\[
E(V_n) = \frac{n\,(n-1)}{4} \quad \text{and} \quad \mathrm{Var}(V_n) = \frac{(2n+5)\,(n-1)\,n}{72}. \tag{10.24f}
\]
So, we have
\[
v_{n,\gamma} \approx \frac{n\,(n-1)}{4} + \tau_\gamma\,\sqrt{\frac{(2n+5)\,(n-1)\,n}{72}}. \tag{10.24g}
\]
This test is justified as follows: Under H0 the normalized spacings $D_1, D_2, \ldots, D_n$ are iid, each
with PDF $\exp(-x/b)/b$, so that $\Pr(V_{ij} = 1) = 0.5$ for i, j = 1, 2, . . . , n; i ≠ j. However, under
HA we have $\Pr(V_{ij} = 1) > 0.5$ for i, j = 1, 2, . . . , n; i < j. Thus, each $V_{ij}$, and consequently
$V_n$, tends to be large under HA, so that rejection of H0 in favor of HA occurs for large values of
$V_n$. We finally mention that the asymptotic relative efficiency of the cttot–test is higher than that
of the PROSCHAN/PYKE–test.
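The counting in (10.24a) and (10.24b) and the normal approximation based on (10.24f) can be sketched as follows. This Python fragment is an illustration only (the function name `proschan_pyke` is hypothetical and the code is not the HAZARD 10 program); it uses the first sample of Example 10/2, for which the text reports $V_n$ = 41.

```python
import math
from statistics import NormalDist


def proschan_pyke(sample):
    """Return (V_n, z) with z the standardized V_n under H0, see (10.24f)."""
    xs = sorted(sample)
    n = len(xs)
    # Normalized spacings D_1, ..., D_n of the uncensored sample.
    d = [(n - i) * (x - prev)
         for i, (prev, x) in enumerate(zip([0.0] + xs[:-1], xs))]
    # V_n counts the pairs i < j with D_i >= D_j.
    v = sum(1 for i in range(n) for j in range(i + 1, n) if d[i] >= d[j])
    mean = n * (n - 1) / 4
    var = (2 * n + 5) * (n - 1) * n / 72
    return v, (v - mean) / math.sqrt(var)


sample_dhr = [0.0309, 0.3641, 0.5317, 0.9545, 1.0119,
              1.9145, 3.5331, 3.9321, 4.1219, 10.9776,
              13.5405, 14.9801, 15.2600, 15.8278, 31.4019]

v_n, z = proschan_pyke(sample_dhr)
# Small V_n speaks for DHR, so the lower tail gives the approximate level.
p_lower = NormalDist().cdf(z)
print("V_n =", v_n, " approximate lower-tail probability =", round(p_lower, 3))
```

For small n the exact distribution (10.24e) should be preferred to this approximation.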
Let $R_i$ be the rank of $D_i$. Based on these ranks BICKEL/DOKSUM (1969) suggested a great
number of test statistics, e.g.:
\[
\begin{aligned}
W_0 &= \sum_{i=1}^{n} \frac{i}{n+1}\,\frac{R_i}{n+1}, \\
W_1 &= -\sum_{i=1}^{n} \frac{i}{n+1}\,\ln\!\left( 1 - \frac{R_i}{n+1} \right), \\
W_2 &= \sum_{i=1}^{n} \ln\!\left( 1 - \frac{i}{n+1} \right) \ln\!\left( 1 - \frac{R_i}{n+1} \right), \\
W_3 &= \sum_{i=1}^{n} \ln\!\left[ -\ln\!\left( 1 - \frac{i}{n+1} \right) \right] \ln\!\left( 1 - \frac{R_i}{n+1} \right).
\end{aligned}
\]

Large (small) values of these test statistics are significant for HA ($H_A^*$).
The cttot–test and the tests of KLEFSJÖ and PROSCHAN/PYKE have been implemented
in the accompanying MATLAB–program HAZARD 10.

Example 10/2: Testing for IHR and DHR

A first uncensored data set of n = r = 15 from a WEIBULL distribution with scale parameter b = 10 and
shape parameter c = 0.7 is:

0.0309 0.3641 0.5317 0.9545 1.0119


1.9145 3.5331 3.9321 4.1219 10.9776
13.5405 14.9801 15.2600 15.8278 31.4019

So, this sample comes from a DHR–distribution. The cttot–test gives $K_r$ = 5.58, so that H0 is rejected in
favor of $H_A^*$ (DHR), the level of significance is about 0.10. The KLEFSJÖ–test gives A = −1.766, so that
H0 is rejected in favor of $H_A^*$ with a level of significance of about 0.10. The PROSCHAN/PYKE–test gives
$V_n$ = 41, so that H0 is rejected in favor of $H_A^*$ with a level of significance of approximately 0.12. A second
uncensored sample of n = r = 20 from a WEIBULL distribution with b = 20, c = 2 is:

2.5349 2.6149 4.4532 4.8567 5.7627


11.0273 15.5141 16.8996 18.3318 18.5556
18.6857 19.8772 20.7875 22.1906 22.9138
25.4471 26.2949 29.9485 34.3137 40.8301

This sample comes from an IHR–distribution. The cttot–test gives $K_r$ = 12.57 and H0 is rejected in favor
of HA (IHR) with α < 0.01. The KLEFSJÖ–test with A = 3.260 rejects H0 in favor of HA with α ≪ 0.01.
The PROSCHAN/PYKE–test with $V_n$ = 122 rejects H0 in favor of HA with α ≈ 0.04.

10.2.3 Bathtub Shape of the Hazard Rate⁷


There are two non–monotone courses of a hazard rate which are of special interest:

• the bathtub shape (DIHR = decreasing–increasing hazard rate) where the hazard rate
initially is decreasing during the so–called ‘infant mortality’ phase, then constant during the
‘useful life’ phase, and finally increasing during the ‘wear–out’ phase, and

• the inverted bathtub shape (IDHR = increasing–decreasing hazard rate) where the three
phases mentioned above occur in reverse order.

⁷ Suggested reading for the section: AARSET (1985, 1987), BERGMAN (1979), KUNITZ (1989).

From the behavior of the scaled TTT–transform φF (P ) of F (·) with respect to the hazard rate as
given in (10.20) we expect for the DIHR case that

• φF (P ) is convex and lies below the 45◦ –line for P being small, i.e., in the leftmost part of
the plot, and
• φF (P ) is concave and lies above the 45◦ –line for P being large, i.e., in the rightmost part
of the plot.

In the IDHR case the order of these phases of $\phi_F(P)$ is reversed. Fig. 10/5 shows the scaled
TTT–transforms of a lognormal distribution, which is an IDHR distribution, and of a power function
distribution, which is DIHR.

Figure 10/5: Scaled TTT–transforms of lognormal and power function distributions

B ERGMAN (1979) suggests the following procedure for testing

H0 : ‘F (·) is the exponential distribution.’


against
HA : ‘F (·) is DIHR (bathtub–shaped).’

based on the TTT-plot with i/n on the abscissa and $T_i^*$ on the ordinate. This test asks for
uncensored samples! We introduce
\[
V_n^* = \begin{cases} \min\left\{ i \ge 1 : T_i^* \ge \dfrac{i}{n} \right\} & \\[1ex] n & \text{if } T_i^* < \dfrac{i}{n} \text{ for } i = 1, 2, \ldots, n-1; \end{cases} \tag{10.25a}
\]
\[
K_n^* = \begin{cases} \max\left\{ i \le n-1 : T_i^* \le \dfrac{i}{n} \right\} & \\[1ex] 0 & \text{if } T_i^* > \dfrac{i}{n} \text{ for } i = 1, 2, \ldots, n-1; \end{cases} \tag{10.25b}
\]
\[
G_n^* = V_n^* + n - K_n^*, \tag{10.25c}
\]
and reject H0 in favor of HA when $G_n^*$ is large. The motivation for this test is that when the
distribution is DIHR, then we may expect $V_n^*$ as well as $(n - K_n^*)$ to be large. $G_n^*$ obviously takes
on integer values in [2, n + 1].
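The three quantities of (10.25a) to (10.25c) only require the scaled TTT–statistics. The following Python sketch is an illustration under the assumption of an uncensored sample; the function name `bergman_statistic` is hypothetical and the code is not the HAZARD 11 program. The data are those of Example 10/3, for which the text reports $G_{50}^*$ = 45.

```python
from itertools import accumulate


def bergman_statistic(sample):
    """Return (V_n*, K_n*, G_n*) of (10.25a)-(10.25c) for an uncensored sample."""
    xs = sorted(sample)
    n = len(xs)
    d = [(n - i) * (x - prev)
         for i, (prev, x) in enumerate(zip([0.0] + xs[:-1], xs))]
    totals = list(accumulate(d))
    ttt = [t / totals[-1] for t in totals]          # ttt[i - 1] holds T_i*
    # V_n*: first i with T_i* >= i/n (i = n always qualifies, as T_n* = 1).
    v = next(i for i in range(1, n + 1) if ttt[i - 1] >= i / n)
    # K_n*: last i <= n - 1 with T_i* <= i/n, or 0 if there is no such i.
    k = max((i for i in range(1, n) if ttt[i - 1] <= i / n), default=0)
    return v, k, v + n - k


# Data set of Example 10/3, n = 50.
sample_bathtub = [0.2, 0.4, 2, 2, 2, 2, 2, 4, 6, 12,
                  14, 22, 24, 36, 36, 36, 36, 36, 43, 64,
                  72, 80, 90, 92, 94, 100, 110, 120, 126, 126,
                  134, 134, 134, 134, 144, 150, 158, 164, 164, 166,
                  168, 168, 168, 170, 170, 170, 170, 170, 172, 172]

v_star, k_star, g_star = bergman_statistic(sample_bathtub)
print("V* =", v_star, " K* =", k_star, " G* =", g_star)
```

Swapping the two inequality signs inside the function gives the IDHR variant of the statistic.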
The graph of $\phi_F(P)$ for an IDHR distribution (inverted bathtub shape) behaves like the reflection
of $\phi_F(P)$ for a DIHR distribution, see Fig. 10/5, the line of reflection being the 45°–line of the
exponential distribution. Thus, we can modify (10.25a-c) to test for IDHR in the following way:
\[
V_n^{**} = \begin{cases} \min\left\{ i \ge 1 : T_i^* \le \dfrac{i}{n} \right\} & \\[1ex] n & \text{if } T_i^* > \dfrac{i}{n} \text{ for } i = 1, 2, \ldots, n-1; \end{cases} \tag{10.26a}
\]
\[
K_n^{**} = \begin{cases} \max\left\{ i \le n-1 : T_i^* \ge \dfrac{i}{n} \right\} & \\[1ex] 0 & \text{if } T_i^* < \dfrac{i}{n} \text{ for } i = 1, 2, \ldots, n-1; \end{cases} \tag{10.26b}
\]
\[
G_n^{**} = V_n^{**} + n - K_n^{**}. \tag{10.26c}
\]

H0 is rejected in favor of $H_A^*$ : ‘F (·) is IDHR’ when $G_n^{**}$ is large.
AARSET (1985) has derived the following distribution of $G_n^*$ under exponentiality, the so–called
null distribution:
\[
\Pr(G_n^* = i) = \begin{cases}
\displaystyle \sum_{\ell=1}^{i-1} \frac{(n-1)!}{(\ell-1)!\,(n-i+1)!\,(i-\ell-1)!} \left( \frac{\ell}{n} \right)^{\ell-1} \left( \frac{n-i}{n} \right)^{n-i+1} \left( \frac{i-\ell}{n} \right)^{i-\ell-1} \frac{1}{\ell}\,\frac{1}{i-\ell} & \text{for } i = 2, \ldots, n-1 \\[2ex]
0 & \text{for } i = n \\[1ex]
\displaystyle \frac{2}{n} \left( \frac{n+1}{n} \right)^{n-2} & \text{for } i = n+1.
\end{cases} \tag{10.27a}
\]
He also tabulated the CCDF of $G_n^*$, see Tab. 10/3, which also gives the level of significance
of this test. We see from Tab. 10/3 that for smaller sample sizes we cannot achieve a small
type–I error probability. An asymptotic result for $\Pr(G_n^* = n - k)$ is also given by
AARSET:
\[
\lim_{n \to \infty} n\,\Pr(G_n^* = n - k) = 2\,\frac{k^{k+1}}{(k+1)!}\,\exp(-k). \tag{10.27b}
\]
Table 10/3: $\Pr(G_n^* \ge n - k)$
k \ n 10 50 75 100 125 150 175 200 250
−1 0.42872 0.10348 0.07013 0.05303 0.04263 0.03565 0.03063 0.02685 0.02153
1 0.46698 0.11091 0.07506 0.05673 0.04559 0.03811 0.03273 0.02869 0.02300
2 0.51102 0.11843 0.08001 0.06041 0.04852 0.04055 0.03482 0.03051 0.02446
3 0.56003 0.12564 0.08470 0.06389 0.05129 0.04284 0.03678 0.03222 0.02582
4 0.61577 0.13257 0.08917 0.06718 0.05389 0.04499 0.03862 0.03383 0.02710
5 0.68139 0.13927 0.09343 0.07030 0.05636 0.04703 0.04035 0.03534 0.02830
6 0.76202 0.14579 0.09753 0.07329 0.05871 0.04897 0.04200 0.03677 0.02944
7 0.86578 0.15217 0.10149 0.07617 0.06097 0.05082 0.04358 0.03814 0.03052
8 1.00000 0.15846 0.10535 0.07895 0.06314 0.05261 0.04509 0.03945 0.03156

Source: A ARSET (1985, p. 59)
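The entries of Tab. 10/3 can be reproduced from the null distribution (10.27a). The Python sketch below is an illustration only; the function names are hypothetical, and the formula is coded as read from (10.27a), i.e. with the factors $1/\ell$ and $1/(i-\ell)$ inside the sum. It evaluates the tail probability needed in Example 10/3.

```python
from math import factorial


def bergman_null_pmf(n, i):
    """Pr(G_n* = i) under exponentiality, following (10.27a)."""
    if i == n + 1:
        return (2 / n) * ((n + 1) / n) ** (n - 2)
    if i < 2 or i >= n:
        return 0.0                      # includes the case i = n
    total = 0.0
    for l in range(1, i):
        coeff = factorial(n - 1) / (factorial(l - 1) * factorial(n - i + 1)
                                    * factorial(i - l - 1))
        total += (coeff * (l / n) ** (l - 1)
                  * ((n - i) / n) ** (n - i + 1)
                  * ((i - l) / n) ** (i - l - 1) / (l * (i - l)))
    return total


def bergman_null_tail(n, g):
    """Pr(G_n* >= g) under exponentiality (the CCDF tabulated in Tab. 10/3)."""
    return sum(bergman_null_pmf(n, i) for i in range(g, n + 2))


# Level of significance for G_50* = 45, i.e. k = 5 in Tab. 10/3.
p_value = bergman_null_tail(50, 45)
print("Pr(G_50* >= 45) =", round(p_value, 5))
```

The factorials are evaluated exactly as Python integers, which is adequate for the sample sizes of Tab. 10/3; for much larger n a logarithmic evaluation would be the safer choice.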


AARSET (1987) proposed another test of H0 against HA, again for uncensored samples, which is
an adaptation of the well–known CRAMÉR–VON MISES goodness–of–fit test. The latter test has
the test statistic
\[
W_n^2 = n \int_{-\infty}^{\infty} \left[ F_n(x) - F(x) \right]^2 dF(x), \tag{10.28}
\]
i.e., it rests upon the squared distances between the empirical CDF $F_n(x)$ and the hypothetical
F (x). AARSET now suggests the test statistic
\[
R_n = \int_0^1 \Delta_n^2(u)\,du \tag{10.29a}
\]
where
\[
\Delta_n(u) = \begin{cases} \sqrt{n}\,\left[ T_i^* - \phi_F(u) \right] & \text{for } \dfrac{i-1}{n} < u \le \dfrac{i}{n},\ 1 \le i \le n \\[1ex] 0 & \text{for } u = 0. \end{cases} \tag{10.29b}
\]
$R_n$ rests upon the squared distances between the scaled TTT–statistics $T_i^*$ and the TTT–transform
of F (·). Under H0 F (·) is given by the exponential distribution with $\phi_F(u) = u$, and the test statistic turns into
\[
R_n = \sum_{i=1}^{n} T_i^* \left( T_i^* - \frac{2i-1}{n} \right) + \frac{n}{3}. \tag{10.29c}
\]
According to the invariance principle $R_n$ has the same asymptotic distribution as the
CRAMÉR–VON MISES statistic $W_n^2$ with the following percentage points:

limn→∞ Pr(Wn2 ≤ w) 0.90 0.95 0.99 0.999


w 0.34730 0.46136 0.74346 1.16786
Source: A NDERSON /DARLING (1952, p. 203)
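Formula (10.29c) needs nothing but the scaled TTT–statistics. The following Python sketch is an illustration only (the name `aarset_statistic` is hypothetical, the sample must be uncensored, and the code is not the HAZARD 11 program); it evaluates $R_n$ for the second sample of Example 10/2 and compares it with the asymptotic 0.99 percentage point given above.

```python
from itertools import accumulate


def aarset_statistic(sample):
    """R_n of (10.29c) for an uncensored sample."""
    xs = sorted(sample)
    n = len(xs)
    d = [(n - i) * (x - prev)
         for i, (prev, x) in enumerate(zip([0.0] + xs[:-1], xs))]
    totals = list(accumulate(d))
    ttt = [t / totals[-1] for t in totals]          # T_1*, ..., T_n*
    return sum(t * (t - (2 * i - 1) / n)
               for i, t in enumerate(ttt, start=1)) + n / 3


# Second (IHR) sample of Example 10/2, n = 20.
sample_ihr = [2.5349, 2.6149, 4.4532, 4.8567, 5.7627,
              11.0273, 15.5141, 16.8996, 18.3318, 18.5556,
              18.6857, 19.8772, 20.7875, 22.1906, 22.9138,
              25.4471, 26.2949, 29.9485, 34.3137, 40.8301]

r_n = aarset_statistic(sample_ihr)
print("R_n =", round(r_n, 4), " exceeds the 0.99 point 0.74346:", r_n > 0.74346)
```

A sample whose normalized spacings are all equal has $T_i^* = i/n$ and yields the smallest conceivable discrepancy, $R_n = 1/(3n)$, which is a convenient check of an implementation.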

The reason for a large value of $R_n$ is any great discrepancy between $T_i^*$ and $\phi_F(P)$ of the
exponential distribution. Thus, we cannot rely on $R_n$ alone to decide for DIHR (IDHR). We have to
take into account other evidence like the TTT–plot and the statistics $G_n^*$ or $G_n^{**}$.
The tests of BERGMAN and AARSET are implemented in the accompanying MATLAB–program
HAZARD 11.

Example 10/3: Testing for DIHR (bathtub–shaped hazard rate)

We want to know whether the following data set (n = 50) comes from a DIHR distribution or not:

0.2 0.4 2 2 2 2 2 4 6 12
14 22 24 36 36 36 36 36 43 64
72 80 90 92 94 100 110 120 126 126
134 134 134 134 144 150 158 164 164 166
168 168 168 170 170 170 170 170 172 172

Fig. 10/6 clearly indicates DIHR. The BERGMAN–test gives $G_{50}^*$ = 45. As, according to Tab. 10/3,
$\Pr(G_{50}^* \ge 45) = 0.13927$, this test has a level of significance of about 0.14 for rejecting exponentiality against DIHR.
The AARSET–test gives $R_{50}$ = 1.2922 and we can reject exponentiality in favor of DIHR at α ≪ 0.001.

Figure 10/6: TTT–plot for a data set coming from a DIHR distribution

When we submit the second data set (n = 20) of Example 10/2 to the BERGMAN–test and to the AARSET–test
we find $G_{20}^*$ = 21, which is insignificant for DIHR as well as for IDHR, and $R_{20}$ = 0.8750. The latter
statistic alone would be significant (α < 0.01) for deviation from exponentiality, but the TTT–plot in
Fig. 10/7 clearly indicates IHR and neither DIHR nor IDHR.
Figure 10/7: TTT–plot of a data set coming from an IHR distribution

When there is evidence for an IDHR or a DIHR distribution we often want to know the special
lifetime where the hazard rate changes from increasing to decreasing or vice versa. This special
lifetime is called change point.⁸ There exists an extensive literature on turning points of the
hazard rate. Most papers deal with the estimation of the change point, see ANTONIADIS et al.
(2000), BEBBINGTON et al. (2008), JOSHI/MCEACHERN (1997), LAI et al. (2001), LOADER
(1991), MÜLLER/WANG (1990a), NGUYEN et al. (1984) or PATRA/DEY (2002). The paper of
HENDERSON (1990) tests for the existence of a change point.

10.3 Testing for Aging Classes⁹


In Chapter 2 we have introduced several classes of aging. A detailed description of the classes
IHRA, NBU, NBUE, DMRL and HNBUE along with their duals DHRA, NWU, NWUE, IMRL
and HNWUE is given in Sect. 2.4. Here, we look for testing procedures to decide whether a
sample comes from a distribution belonging to one of these classes. Most of these tests rest upon
the relationships between the TTT–transform and the aging properties as described in KLEFSJÖ
(1982b). The exponential distribution is the border case of each of these classes, and each test is of
H0 : ‘F (·) is exponential’ against HA : ‘F (·) is a member of the class in question, but not
exponential’.

10.3.1 IHRA (DHRA) Tests


A distribution F (·) is IHRA (DHRA), i.e., it has an increasing (decreasing) hazard rate average, if
\[
\frac{H(x)}{x} = \frac{1}{x} \int_0^x h(u)\,du
\]
is increasing (decreasing). BARLOW/CAMPO (1975) and BARLOW (1979) give the following
theorem relating IHRA (DHRA) to the TTT–transform $\phi_F(P)$:

‘If F (·) is a life distribution which is IHRA (DHRA) then $\phi_F(P)/P$ is decreasing
(increasing) for 0 < P < 1.’

Thus, since $\phi_F(P)/P$ being decreasing is a necessary (but not sufficient) condition for F (·) to be
IHRA, KLEFSJÖ (1983a) proposes a statistic which investigates whether the analogous property
holds for the TTT–plot. If $\phi_F(P)/P$ is decreasing we expect the corresponding property to hold for the
TTT–plot. This means that
\[
\frac{T_i^*}{i/n} > \frac{T_j^*}{j/n} \quad \text{for } j > i,\ i = 1, 2, \ldots, n-1. \tag{10.30a}
\]

Multiplication by $(i\,j)/n$ and summing over i and j gives the test statistic
\[
B = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( j\,T_i^* - i\,T_j^* \right). \tag{10.30b}
\]

Significantly large (small) values of B lead to the rejection of

H0 : ‘F (·) is exponential.’
in favor of
HA : ‘F (·) is IHRA, but not exponential.’
(in favor of
$H_A^*$ : ‘F (·) is DHRA, but not exponential’.)
⁸ In statistics the notion ‘change point’ generally has another meaning: it is that realization of a variate where
there is a change of the distribution function, see CSÖRGÖ/HORVÁTH (1968) or KRISHNAIAH/MIAO (1988).
⁹ Suggested reading for this section: HOLLANDER/PROSCHAN (1984), KLEFSJÖ (1982b, 1983a).

The test statistic B can be simplified by using the normalized spacings $D_i = (n - i + 1)\,(X_{(i)} - X_{(i-1)})$
and the total–time–on–test statistic $T_n = \sum_{i=1}^{n} D_i$:
\[
B = \frac{1}{T_n} \sum_{j=1}^{n} b_j\,D_j \tag{10.30c}
\]
where
\[
b_j = \frac{1}{6} \left[ 2\,j^3 - 3\,j^2 + j\,(1 - 3n - 3n^2) + 2n + 3n^2 + n^3 \right]. \tag{10.30d}
\]
KLEFSJÖ (1983a) has found the exact null distribution of the slightly modified test statistic
\[
B^* = B\,\sqrt{\frac{210}{n^5}}. \tag{10.30e}
\]

The upper 0.01, 0.05, 0.10 percentiles are in Tab. 10/4. The asymptotic distribution of $B^*$
under H0 is No(0, 1). Example 10/4 further down shows the working of this test, which has been
implemented in the accompanying MATLAB–program HAZARD 12.

Table 10/4: Critical values of $B^* = B \sqrt{210/n^5}$

upper tail†
n α = 0.10 α = 0.05 α = 0.01
5 1.703 2.257 3.227
10 1.508 2.003 2.951
15 1.441 1.909 2.815
20 1.406 1.858 2.736
25 1.385 1.827 2.684
30 1.371 1.804 2.646
35 1.360 1.788 2.618
40 1.352 1.755 2.595
45 1.346 1.765 2.577
50 1.341 1.757 2.562
55 1.337 1.750 2.550
60 1.333 1.744 2.538
65 1.330 1.739 2.528
70 1.328 1.734 2.520
75 1.325 1.730 2.512
∞ 1.282 1.645 2.326
Source: K LEFSJ Ö (1983a, p. 923)
† For lower tail change the sign!
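The double sum (10.30b) and the weighted form (10.30c) with the coefficients (10.30d) must agree, which gives a simple numerical check. The following Python sketch is an illustration only; the function name `ihra_statistics` is hypothetical, the sample is assumed to be uncensored, and the code is not the HAZARD 12 program. The data are the first sample of Example 10/2.

```python
import math
from itertools import accumulate


def ihra_statistics(sample):
    """Return (B from (10.30b), B from (10.30c), B* of (10.30e))."""
    xs = sorted(sample)
    n = len(xs)
    d = [(n - i) * (x - prev)
         for i, (prev, x) in enumerate(zip([0.0] + xs[:-1], xs))]
    total = sum(d)                                   # T_n
    ttt = [t / total for t in accumulate(d)]         # ttt[i - 1] holds T_i*
    # Double sum (10.30b) over all pairs i < j.
    b_double = sum(j * ttt[i - 1] - i * ttt[j - 1]
                   for i in range(1, n) for j in range(i + 1, n + 1))
    # Weighted form (10.30c) with the coefficients b_j of (10.30d).
    b_weighted = sum(
        (2 * j**3 - 3 * j**2 + j * (1 - 3 * n - 3 * n**2)
         + 2 * n + 3 * n**2 + n**3) / 6 * dj
        for j, dj in enumerate(d, start=1)) / total
    return b_double, b_weighted, b_weighted * math.sqrt(210 / n**5)


sample_dhr = [0.0309, 0.3641, 0.5317, 0.9545, 1.0119,
              1.9145, 3.5331, 3.9321, 4.1219, 10.9776,
              13.5405, 14.9801, 15.2600, 15.8278, 31.4019]

b_double, b_weighted, b_star = ihra_statistics(sample_dhr)
print("B (double sum) =", round(b_double, 4),
      " B (weighted) =", round(b_weighted, 4),
      " B* =", round(b_star, 4))
```

The weighted form needs only n terms instead of n (n − 1)/2 and is therefore preferable for larger samples.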

There are other tests for IHRA (DHRA). BARLOW (1968) derives a likelihood ratio test statistic,
lower percentiles of which are intended for testing IHRA against DHRA. He also suggests
taking the cttot–test of EPSTEIN (1960), see (10.23a-c), for testing H0 versus HA or $H_A^*$.
BARLOW/CAMPO (1975) proposed to take the number of crossings between the TTT–graph
and the 45°–line as a test statistic and to reject H0 when this number is small. BERGMAN (1977)
gives the exact and the asymptotic distribution of this test statistic under H0 together with a table
of its CCDF.

10.3.2 NBU (NWU) Tests


A life distribution F (·) is NBU (new better than used) if

S(x + y) ≤ S(x) S(y) ∀ x, y ≥ 0,

and NWU (new worse than used) if the reversed inequality holds. The following NBU/NWU test
of HOLLANDER/PROSCHAN (1972) is motivated by considering the parameter
\[
\begin{aligned}
\gamma &= \int_0^\infty \int_0^\infty \left[ S(x)\,S(y) - S(x+y) \right] dF(x)\,dF(y) \\
&= \frac{1}{4} - \int_0^\infty \int_0^\infty S(x+y)\,dF(x)\,dF(y) \\
&= \frac{1}{4} - \Delta(F)
\end{aligned} \tag{10.31a}
\]
with
\[
\Delta(F) = \int_0^\infty \int_0^\infty S(x+y)\,dF(x)\,dF(y) = \Pr(X_1 > X_2 + X_3) \tag{10.31b}
\]
where $X_1, X_2, X_3$ are iid according to F (·). γ is a measure of deviation of F (·) from
exponentiality towards NBU (NWU) alternatives. $\Delta(F) = 1/4$ when F (·) is exponential.
The classical non–parametric approach of replacing F (·) by the empirical CDF $F_n(\cdot)$ suggests
rejecting

H0 : ‘F (·) is exponential.’
in favor of
HA : ‘F (·) is NBU, but not exponential.’
(in favor of
$H_A^*$ : ‘F (·) is NWU, but not exponential.’)

if $\iint \left[ 1 - F_n(x+y) \right] dF_n(x)\,dF_n(y)$ is too small (large). HOLLANDER/PROSCHAN (1972)
found it more convenient to reject H0 for small (large) values of the asymptotically equivalent
statistic
\[
J = \frac{2}{n\,(n-1)\,(n-2)} \sum\nolimits^{*} \Psi\!\left( X_{a_1},\, X_{a_2} + X_{a_3} \right) \tag{10.31c}
\]
where
\[
\Psi(a, b) = \begin{cases} 1 & \text{for } a > b \\ 0 & \text{for } a \le b \end{cases} \tag{10.31d}
\]
and $\sum^{*}$ is the sum over all $n\,(n-1)\,(n-2)/2$ triples $(a_1, a_2, a_3)$ of three integers such that
$1 \le a_i \le n$, $a_1 \ne a_2$, $a_1 \ne a_3$ and $a_2 < a_3$. Defining
\[
M_n = \frac{n\,(n-1)\,(n-2)}{2}\,J = \sum\nolimits^{*} \Psi\!\left( X_{a_1},\, X_{a_2} + X_{a_3} \right) \tag{10.31e}
\]
and denoting by $X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$ the ordered X’s, and since $i \le \max(j, k)$ implies
$\Psi\!\left( X_{(i)},\, X_{(j)} + X_{(k)} \right) = 0$, we can rewrite $M_n$ as
\[
M_n = \sum_{i > j > k} \Psi\!\left( X_{(i)},\, X_{(j)} + X_{(k)} \right) \tag{10.31f}
\]
with $M_n = 0, 1, 2, \ldots, n\,(n-1)\,(n-2)/6$. The following Tab. 10/5, based on the critical
values of $M_n$ to be found as table 4.1 in HOLLANDER/PROSCHAN (1972), gives critical values
$j_{n,\gamma}$ of the test statistic J in (10.31c). H0 has to be rejected in favor of HA ($H_A^*$) at level α = 0.05
or 0.10 if $J \le j_{n,\alpha}$ ($J \ge j_{n,1-\alpha}$). The normal approximation treats
\[
J^* = \left( J - \frac{1}{4} \right) \sqrt{\frac{432\,n}{5}} \tag{10.31g}
\]

as No(0, 1)–distributed.
Table 10/5: Critical values $j_{n,\gamma}$ of J

n \ γ 0.05 0.10 0.90 0.95
8 0.1488 0.1786 0.3095 0.3155
9 0.1587 0.1865 0.3016 0.3095
10 0.1667 0.1917 0.2972 0.3056
11 0.1737 0.1960 0.2949 0.3030
12 0.1772 0.1984 0.2924 0.3000
13 0.1830 0.2028 0.2902 0.2972
14 0.1864 0.2042 0.2885 0.2949
15 0.1897 0.2066 0.2872 0.2938
16 0.1923 0.2083 0.2857 0.2923
17 0.1946 0.2103 0.2843 0.2907
18 0.1961 0.2116 0.2831 0.2896
19 0.1985 0.2129 0.2820 0.2883
20 0.2003 0.2140 0.2810 0.2871
25 0.2068 0.2184 0.2780 0.2835
30 0.2113 0.2220 0.2750 0.2800
35 0.2147 0.2238 0.2729 0.2783
40 0.2171 0.2260 0.2716 0.2763
45 0.2194 0.2275 0.2702 0.2751
50 0.2214 0.2286 0.2691 0.2736
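The count $M_n$ of (10.31f) and the statistics J and $J^*$ follow from a loop over all triples of order statistics. The Python sketch below is an illustration only; the function name `nbu_statistic` is hypothetical, nonnegative lifetimes are assumed, and the code is not the HAZARD 12 program. The four-point sample is an artificial toy example chosen so that the count can be checked by hand.

```python
import math
from itertools import combinations


def nbu_statistic(sample):
    """Return (M_n, J, J*) of (10.31f), (10.31c) and (10.31g)."""
    xs = sorted(sample)
    n = len(xs)
    # combinations yields index triples k < j < i, so xs[i] is the largest value.
    m = sum(1 for k, j, i in combinations(range(n), 3)
            if xs[i] > xs[j] + xs[k])
    j_stat = 2 * m / (n * (n - 1) * (n - 2))
    j_star = (j_stat - 0.25) * math.sqrt(432 * n / 5)
    return m, j_stat, j_star


# Toy sample: 4 > 2 + 1, 8 > 2 + 1, 8 > 4 + 1 and 8 > 4 + 2, hence M_n = 4.
m_toy, j_toy, _ = nbu_statistic([1, 2, 4, 8])
print("M_n =", m_toy, " J =", round(j_toy, 4))
```

The loop costs on the order of n³ comparisons, which is unproblematic for the sample sizes of Tab. 10/5.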

An application of this test — worked out with the MATLAB–program HAZARD 12 — is found
further down in Example 10/4.

10.3.3 DMRL (IMRL) Tests


A continuous life distribution F (·) with S(·) = 1 − F (·) is DMRL (IMRL) if its mean residual
life function
\[
\mu(x) = \frac{1}{S(x)} \int_x^\infty S(u)\,du
\]

is decreasing (increasing). KLEFSJÖ (1982b) proved the following theorem connecting this aging
property with the TTT–transform $\phi_F(P)$:

‘A life distribution is DMRL (IMRL) if and only if
\[
Q(P) = \frac{1 - \phi_F(P)}{1 - P}
\]
is decreasing (increasing) for 0 ≤ P < 1.’
Based on this theorem and on the same idea as in Sect. 10.3.1 we expect that, if F (·) is DMRL
(IMRL), the following holds:
\[
\frac{1 - T_j^*}{1 - j/n} \;\underset{(>)}{<}\; \frac{1 - T_i^*}{1 - i/n} \quad \text{for } j > i \text{ and } i = 0, 1, 2, \ldots, n-1. \tag{10.32a}
\]

After multiplication by $(n-i)\,(n-j)/n$ and summation we get the test statistic
\[
K = \sum_{i=0}^{n-1} \sum_{j=i+1}^{n} \left[ (n-j)\,(1 - T_i^*) - (n-i)\,(1 - T_j^*) \right]. \tag{10.32b}
\]

If F (·) is DMRL (IMRL), but not exponential, we expect K to be positive (negative). It is
to be noted that the test statistic K is proportional to the test statistic proposed by
HOLLANDER/PROSCHAN (1975) for the same test. Their test statistic is
\[
W_n^* = \frac{1}{\bar{X}} \left[ \frac{1}{n^4} \sum_{i=1}^{n} c_i\,X_{(i)} \right] \tag{10.33a}
\]
where $\bar{X} = \sum X_i / n$ and
\[
c_i = \frac{4}{3}\,i^3 - 4\,n\,i^2 + 3\,n^2\,i - 0.5\,n^3 + 0.5\,n^2 - 0.5\,i^2 + \frac{i}{6}. \tag{10.33b}
\]
Significantly large (small) values of $W_n^*$ suggest DMRL (IMRL) alternatives to exponentiality.
Under H0 : ‘F (·) is exponential’ the statistic
\[
W = W_n^*\,\sqrt{210\,n}
\]
is asymptotically No(0, 1). For smaller sample sizes HOLLANDER/PROSCHAN (1975) give
upper and lower critical values found by Monte Carlo simulation, see Tab. 10/6. The simulation
evidently accounts for the slight disturbances in the monotonicity of the tabulated percentiles.
Table 10/6: Critical values $w_{n,\gamma}$ for W
n \ γ 0.01 0.05 0.10 0.90 0.95 0.99
8 −3.029 −2.095 −1.565 1.415 1.703 2.162
9 −2.956 −2.017 −1.532 1.385 1.659 2.131
10 −2.946 −2.038 −1.511 1.365 1.651 2.155
11 −2.887 −1.983 −1.506 1.355 1.657 2.125
12 −2.838 −1.959 −1.496 1.339 1.642 2.145
13 −2.828 −1.950 −1.482 1.349 1.639 2.143
14 −2.826 −1.918 −1.474 1.347 1.641 2.145
15 −2.822 −1.910 −1.466 1.317 1.615 2.143
16 −2.736 −1.900 −1.458 1.323 1.630 2.131
17 −2.788 −1.905 −1.447 1.314 1.625 2.153
18 −2.752 −1.881 −1.444 1.303 1.623 2.127
19 −2.669 −1.889 −1.429 1.309 1.620 2.141
20 −2.730 −1.865 −1.434 1.299 1.617 2.137
25 −2.694 −1.834 −1.398 1.315 1.627 2.141
30 −2.678 −1.805 −1.382 1.288 1.609 2.129
35 −2.570 −1.767 −1.365 1.286 1.606 2.153
40 −2.606 −1.775 −1.358 1.282 1.605 2.171
45 −2.564 −1.745 −1.361 1.273 1.606 2.154
50 −2.529 −1.762 −1.351 1.293 1.609 2.159
∞ −2.326 −1.645 −1.282 1.282 1.645 2.326
Source: H OLLANDER /P ROSCHAN (1975, p. 589)
An application of this test — worked out with the MATLAB–program HAZARD 12 — is found
further down in Example 10/4.

10.3.4 NBUE (NWUE) Tests


A continuous life distribution F (·) with S(·) = 1 − F (·) is called NBUE (new better than used
in expectation) [NWUE (new worse than used in expectation)] if
\[
\mu(0) \ge \mu(x) \quad [\mu(0) \le \mu(x)], \quad \forall\, x > 0,
\]
or equivalently
\[
\int_x^\infty S(u)\,du \le \mu(0)\,S(x) \quad \left[ \int_x^\infty S(u)\,du \ge \mu(0)\,S(x) \right], \quad \forall\, x > 0.
\]

The following theorem of KLEFSJÖ (1983a) relates these properties to the TTT–transform
$\phi_F(P)$:

‘A life distribution F (·) is NBUE (NWUE) if and only if $\phi_F(P) \ge P$ ($\phi_F(P) \le P$) for
0 ≤ P ≤ 1.’

This theorem leads to the following test statistic based on the differences between the scaled
TTT–statistics $T_j^* = T_j / T_n$ and the empirical CDF values $j/n$:
\[
C = \sum_{j=1}^{n} \left( T_j^* - \frac{j}{n} \right) = \sum_{j=1}^{n-1} T_j^* - \frac{n-1}{2}, \quad \text{as } T_n^* = 1. \tag{10.34a}
\]

If F (·) is NBUE (NWUE), but not exponential, C is expected to be positive (negative). We
remark that
\[
V = \sum_{j=1}^{n-1} T_j^* = \frac{\sum_{j=1}^{n-1} T_j}{T_n} = \frac{\sum_{j=1}^{n-1} T_j}{\sum_{j=1}^{n} D_j} \tag{10.34b}
\]

is the cumulative TTT–statistic used in (10.23a) for IHR (DHR)–testing. Thus, we have
\[
C = V - \frac{n-1}{2}. \tag{10.34c}
\]

HOLLANDER/PROSCHAN (1975) proposed another test statistic for testing

H0 : ‘F (·) is exponential.’
in favor of
HA : ‘F (·) is NBUE, but not exponential.’
(in favor of
$H_A^*$ : ‘F (·) is NWUE, but not exponential.’)

Their statistic rests upon the weighted difference between µ(0) and µ(x) and reads
\[
K^* = \frac{\sum_{j=1}^{n} \left( \dfrac{3n+1}{2} - 2\,j \right) X_{(j)}}{n \sum_{j=1}^{n} D_j}. \tag{10.34d}
\]

We note that
\[
C = V - \frac{n-1}{2} = n\,K^*. \tag{10.34e}
\]
Hence, C, V and $K^*$ are equivalent test statistics which can be traced back to the test statistic
\[
K_n = \frac{\sum_{j=1}^{n-1} T_j}{T_n}
\]
of (10.23a). We have
\[
K_n = n\,K^* + \frac{n-1}{2}. \tag{10.34f}
\]
Significantly large (small) values of $K^*$ suggest NBUE (NWUE). We do not need to furnish
critical values of $K^*$ because, based on (10.34f), we can use the percentage points $k_{n,\gamma}$ of
Tab. 10/2 in the following way: “Reject H0 in favor of HA ($H_A^*$) if
\[
K^* \ge \frac{1}{n}\,k_{n,\,1-\alpha} - \frac{n-1}{2\,n} \quad \left( K^* \le \frac{1}{n}\,k_{n,\,\alpha} - \frac{n-1}{2\,n} \right).”
\]

Under H0 we have $K^* \sqrt{n} \to No(0, 1/12)$ so that for large n we reject H0 in favor of HA ($H_A^*$)
with level α if
\[
K^* \ge \tau_{1-\alpha}\,\sqrt{\frac{1}{12\,n}} \quad \left( K^* \le \tau_{\alpha}\,\sqrt{\frac{1}{12\,n}} \right).
\]
This test has been implemented in the MATLAB–program HAZARD 12. For an application see
Example 10/4.
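The relation (10.34f) between $K^*$ and the cttot–statistic can be checked numerically. The Python sketch below is an illustration only (hypothetical function name `nbue_statistic`, uncensored sample assumed, not the HAZARD 12 program). It uses the first sample of Example 10/2, for which Example 10/2 reports $K_r$ = 5.58.

```python
import math
from statistics import NormalDist


def nbue_statistic(sample):
    """K* of (10.34d) for an uncensored sample."""
    xs = sorted(sample)
    n = len(xs)
    total = sum(xs)     # for an uncensored sample the sum of all D_j equals the sum of all X_j
    return sum(((3 * n + 1) / 2 - 2 * j) * x
               for j, x in enumerate(xs, start=1)) / (n * total)


sample_dhr = [0.0309, 0.3641, 0.5317, 0.9545, 1.0119,
              1.9145, 3.5331, 3.9321, 4.1219, 10.9776,
              13.5405, 14.9801, 15.2600, 15.8278, 31.4019]

n = len(sample_dhr)
k_star = nbue_statistic(sample_dhr)
k_n = n * k_star + (n - 1) / 2                 # (10.34f): back to the cttot-statistic
z = k_star * math.sqrt(12 * n)                 # asymptotically No(0, 1) under H0
print("K* =", round(k_star, 4), " K_n =", round(k_n, 2),
      " lower-tail probability =", round(NormalDist().cdf(z), 3))
```

A negative $K^*$, as for this DHR sample, points towards NWUE; the exact decision can be based on Tab. 10/2 via (10.34f).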

10.3.5 HNBUE (HNWUE) Tests


A continuous distribution with F (·) and S(·) = 1 − F (·) is said to be HNBUE (harmonic new
better than used in expectation) [HNWUE (harmonic new worse than used in expectation)] if
\[
\int_x^\infty S(u)\,du \;\underset{(\ge)}{\le}\; \mu\,\exp\!\left( -\frac{x}{\mu} \right) \quad \text{for } x \ge 0.
\]
K LEFSJ Ö (1983b) proposed several test statistics for testing

H0 : ‘F (·) is exponential.’
in favor of
HA : ‘F (·) is HNBUE, but not exponential.’
(in favor of
$H_A^*$ : ‘F (·) is HNWUE, but not exponential.’)

He recommends
\[
Q_1 = \sum_{j=1}^{n} \left[ 3 \left( 1 - \frac{j}{n} \right)^2 - \frac{1}{3} \right] \frac{X_{(j)}}{T_n} \tag{10.35a}
\]

as it has high PITMAN efficiency and good power.

He derived the following exact null distribution of $Q_1$:
\[
\Pr(Q_1 > x) = \sum_{j=1}^{n} \vartheta_j \prod_{\substack{i = 1 \\ i \ne j}}^{n} \frac{\alpha_j - x}{\alpha_j - \alpha_i} \tag{10.35b}
\]
with
\[
\alpha_j = -\frac{1}{3} + \left( 1 - \frac{j}{n} \right)^2 + \frac{1 - j/n}{2\,n} \tag{10.35c}
\]
and
\[
\vartheta_j = \begin{cases} 1 & \text{for } \alpha_j > x \\ 0 & \text{otherwise.} \end{cases} \tag{10.35d}
\]

The limiting distribution of
\[
Q = Q_1\,\sqrt{\frac{45\,n}{4}} \tag{10.35e}
\]
is No(0, 1). Exact critical values $q_{n,\gamma}$ resulting from (10.35b) are in the following table:
Table 10/7: Critical values $q_{n,\gamma}$ of $Q_1 \sqrt{45\,n/4}$

n \ γ 0.01 0.05 0.10 0.90 0.95 0.99
10 −2.087 −1.656 −1.397 0.929 1.309 2.032
20 −2.191 −1.678 −1.383 1.060 1.449 2.195
∞ −2.326 −1.645 −1.282 1.282 1.645 2.326
Source: K LEFSJ Ö (1983b, p. 71)
H0 is rejected in favor of HA ($H_A^*$) with level α if
\[
Q_1\,\sqrt{\frac{45\,n}{4}} > q_{n,\,1-\alpha} \quad \left( Q_1\,\sqrt{\frac{45\,n}{4}} < q_{n,\,\alpha} \right).
\]

This test is implemented in the MATLAB–program HAZARD 12.
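The statistic (10.35a) and the exact upper tail probability (10.35b) can be sketched as follows. This Python fragment is an illustration only and not the HAZARD 12 program; the function names are hypothetical, an uncensored sample is assumed, and the indicator of (10.35d) is taken as 1 for $\alpha_j > x$, so that the tail probability is 1 below the smallest $\alpha_j$ and 0 above the largest.

```python
import math


def hnbue_q1(sample):
    """Q_1 of (10.35a) for an uncensored sample."""
    xs = sorted(sample)
    n = len(xs)
    return sum((3 * (1 - j / n) ** 2 - 1 / 3) * x
               for j, x in enumerate(xs, start=1)) / sum(xs)


def hnbue_alpha(n):
    """Coefficients alpha_1, ..., alpha_n of (10.35c)."""
    return [-1 / 3 + (1 - j / n) ** 2 + (1 - j / n) / (2 * n)
            for j in range(1, n + 1)]


def q1_upper_tail(x, n):
    """Exact Pr(Q_1 > x) under exponentiality, following (10.35b)."""
    alpha = hnbue_alpha(n)
    prob = 0.0
    for j, aj in enumerate(alpha):
        if aj > x:                       # indicator of (10.35d)
            term = 1.0
            for i, ai in enumerate(alpha):
                if i != j:
                    term *= (aj - x) / (aj - ai)
            prob += term
    return prob


# Upper 5% point for n = 10 from Tab. 10/7, transformed back to the Q_1 scale.
n = 10
p_upper = q1_upper_tail(1.309 / math.sqrt(45 * n / 4), n)
print("Pr(Q > 1.309) for n = 10:", round(p_upper, 4))
```

The product formula involves differences of neighbouring $\alpha_j$ and becomes numerically delicate for large n, where the normal limit (10.35e) is the more practical choice.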

Example 10/4: Testing for aging classes

We take the data set of Example 8/8: survival times (in days) of the n = 43 patients suffering from
granulocytic leukemia. x = 0 is taken as the patient’s date of diagnosis and beginning of treatment. We want
to test whether this sample comes from a distribution belonging to one or the other of the aging classes discussed
in Sect. 10.3.
The TTT–plot of Fig. 10/8 clearly indicates an IHR distribution. This is also confirmed by the three IHR
tests of Sect. 10.2.2 with EPSTEIN’s cttot–test statistic $K_r$ = 24.11, KLEFSJÖ’s test statistic A = 1.079
and PROSCHAN/PYKE’s test statistic $V_n$ = 512.
For the test of the aging classes we get the following results:

• $B^*$ = 8.1546 → Exponentiality is rejected in favor of IHRA with α ≪ 0.01.

• J = 0.2248 → Exponentiality is rejected in favor of NBU with α ≈ 0.10.

• W = 1.3881 → Exponentiality is rejected in favor of DMRL with α < 0.10.

• K ∗ = 0.0723 → Exponentiality is rejected in favor of NBUE with α ≈ 0.05.

• Q = 1.4356 → Exponentiality is rejected in favor of HNBUE with α ≈ 0.07.

These results are in accordance with the chain of implications in Fig. 2/7. As we have evidence for IHR
we expect to have evidence for IHRA, NBU, DMRL, NBUE and HNBUE.

Figure 10/8: TTT–plot for the 43 granulocytic leukemia patients


Part III

Appendices
Bibliography
A-A-A-A-A
AALEN, O. O. (1978): Nonparametric inference for a family of counting processes; Ann. Statist. 6,
701–726
AARSET, M. V. (1985): The null distribution for a test of constant versus bathtub–failure rate; Scand. Jour.
Statist. 12, 55–62
AARSET, M. V. (1987): How to identify a bathtub hazard rate; IEEE Trans. Rel. 36, 106–108
AL–HUSSAINI, E. K. / SULTAN, K. S. (2001): Reliability and hazard based on finite mixture models; in
BALAKRISHNAN/RAO (eds.): Handbook of Statistics, Vol. 20 (Advances in Reliability), 139–183,
North–Holland, Amsterdam etc.
ANDERSON, J. / SENTHILSELVAN, A. (1980): Smooth estimates for the hazard function; Jour. Roy.
Statist. Soc. B 42, 322–327
ANDERSON, T. W. / DARLING, D. A. (1952): Asymptotic theory of certain goodness–of–fit criteria based
on stochastic processes; Ann. Math. Statist. 23, 193–213
ANTONIADIS, A. / GIJBELS, I. / MCGIBBON, B. (2000): Non–parametric estimation of the location of
a change point in an otherwise smooth hazard rate function under random censoring; Scand. Jour. Statist.
27, 501–519
ANTONIADIS, A. / GRÉGOIRE, G. / MCKEAGUE, I. W. (1994): Wavelet methods for curve estimation;
Jour. Amer. Statist. Ass. 89, 1340–1353
ANTONIADIS, A. / GRÉGOIRE, G. / NASON, G. (1999): Density and hazard rate estimation for
right–censored data using wavelet methods; Jour. Roy. Statist. Soc. B 61, 63–84
ARNOLD, B. C. / ZAHEDI, H. (1988): On multivariate mean remaining life functions; Jour. Multivar.
Analysis 25, 1–9
ASADIA, M. (1999): Multivariate distributions characterized by a relationship between mean residual life
and hazard rate; Metrika 49, 121–126
B-B-B-B-B
BAGKAVOS , D. / PATIL , P. (2009): Variable bandwidth for nonparametric hazard rate estimation; Comm.
Statist. — Theory & Methods 38, 1055–1078
BAIN , L. J. / E NGELHARDT, M. (1991): Statistical Analysis of Reliability and Life–Testing Models —
Theory and Methods, 2nd ed.; Marcel Dekker, New York etc.
BALAKRISHNAN , N. / C OHEN , A. C. (1991): Order Statistics and Inference: Estimation Methods; Aca-
demic Press, San Diego
BARLOW, R. E. (1968): Likelihood ratio tests for restricted families of probability distributions; Ann.
Math. Statist. 39, 547–566
BARLOW, R. E. (1979): Geometry of the total time on test transforms; Naval Res. Log. Quart. 26 393–402
BARLOW, R. E. / BARTHOLOMEW, D. J. / B REMNER , J. M. / B RUNK , H. D. (1972): Statistical Inference
under Order Restrictions; Wiley, New York etc.
BARLOW, R. E. / C AMPO , R. (1975): Total time on test processes and applications to failure analysis; in
BARLOW /F USSEL /S INGPURWALLA (eds.): Reliability and Fault Tree Analysis, SIAM, Philadelphia
BARLOW, R. E. / M ARSHALL , A. W. (1964): Bounds for distributions with monotone hazard rate I, II;
Ann. Math. Statist. 35, 1234–1257, 1258–1274
BARLOW, R. E. / M ARSHALL , A. W. / P ROSCHAN , F. (1963): Properties of probability distributions with
monotone hazard rate; Ann. Math. Statist. 34, 375–389
BARLOW, R.E. / P ROSCHAN , F. (1965): Mathematical Theory of Reliability; Wiley, New York etc.
BARLOW, R.E. / P ROSCHAN , F. (1969): A note on tests for monotone failure rate based on incomplete
data; Ann. Math. Statist. 40, 595–600
BARLOW, R. E. / PROSCHAN, F. (1975): Statistical Theory of Reliability and Life Testing; Holt, Rinehart and Winston, New York etc.
BASU, A. P. (1971): Bivariate failure rate; Jour. Amer. Statist. Ass. 66, 103–104
BEBBINGTON, M. / LAI, C. D. / ZITIKIS, R. (2008): Estimating the turning point of a bathtub–shaped failure distribution; Jour. Statist. Plan. Inf. 138, 1157–1166
BERGMAN, B. (1977): Crossings in the total–time–on–test plot; Scand. Jour. Statist. 4, 171–177
BERGMAN, B. (1979): On age replacement and the total time on test concept; Scand. Jour. Statist. 6, 161–168
BERGMAN, B. / KLEFSJÖ, B. (1984): The total time on test concept and its use in reliability theory; Operations Research 32, 596–606
BÉZANDRY, D. H. / BONNEY, G. E. / GANNOUN, A. (2005): Consistent estimation of the density and hazard rate functions for censored data via the wavelet method; Statistics and Probability Letters 74, 366–372
BICKEL, P. J. (1969): Tests for monotone failure rate II; Ann. Math. Statist. 40, 1250–1260
BICKEL, P. J. / DOKSUM, K. A. (1969): Tests for monotone failure rate based on normalized spacings; Ann. Math. Statist. 40, 1216–1235
BIRNBAUM, Z. W. / SAUNDERS, S. C. (1968): A probabilistic interpretation of Miner’s rule; SIAM — Jour. Appl. Math. 16, 637–652
BIRNBAUM, Z. W. / SAUNDERS, S. C. (1969): A new family of life distributions; Jour. Appl. Prob. 6, 319–327
BLOCK, H. W. / SAVITS, T. H. (1982): The class of MIFRA lifetimes and its relation to other classes; Nav. Res. Log. Quart. 29, 55–61
BLOCK, H. W. / SAVITS, T. H. (1984): Multivariate nonparametric classes in reliability; in KRISHNAIAH/RAO (eds.): Handbook of Statistics, Vol. 7 (Quality Control and Reliability), 121–129, North–Holland, Amsterdam etc.
BOUEZMARNI, T. / ROMBOUTS, J. V. K. (2008): Density and hazard rate estimation for censored and α–mixing data using gamma kernels; Jour. Nonpar. Statist. 20, 627–643
BOWMAN, A. (1984): An alternative method of cross–validation for the smoothing of density estimates; Biometrika 71, 353–366
BRINDLEY, E. C. JR. / THOMPSON, W. A. JR. (1972): Dependence and aging aspects of multivariate survival; Jour. Amer. Statist. Ass. 67, 822–830
BRYSON, M. C. / SIDDIQUI, M. M. (1969): Some criteria for ageing; Jour. Amer. Statist. Ass. 64, 1472–1483
BURR, I. W. (1942): Cumulative frequency functions; Ann. Math. Statist. 13, 215–232
C-C-C-C-C
CACOULOS, T. (1966): Estimation of a multivariate density; Ann. Inst. Statist. Math. 18, 178–189
CHANDRA, N. K. / ROY, D. (2001): Some results on reversed hazard rate; Probability in the Engineering and Informational Sciences 15, 95–102
CHANDRA, N. K. / ROY, D. (2005): Classification of distribution based on reversed hazard rate; Calcutta Statist. Ass. Bull. 56, 231–249
CHENG, K. F. (1985): Tests for the equality of failure rates; Biometrika 72, 211–215
CHENG, P. E. (1987): A nearest neighbour hazard rate estimator for randomly censored data; Comm. Statist. — Theory & Methods 16, 613–625
COHEN, A. C. (1991): Truncated and Censored Samples; Marcel Dekker, New York etc.
COX, D. R. (1972): Regression models and life tables; Jour. Roy. Statist. Soc. B 34, 187–220
COX, D. R. / OAKES, D. (1984): Analysis of Survival Data; Chapman & Hall, London
CROWDER, M. J. / KIMBER, A. C. / SMITH, R. L. / SWEETING, T. J. (1991): Statistical Analysis of Reliability Data; Chapman & Hall, London etc.
CSÖRGÖ, M. / HORVÁTH, L. (1988): Nonparametric methods for change point problems; in KRISHNAIAH/RAO (eds.): Handbook of Statistics, Vol. 7 (Quality Control and Reliability), 403–425, North–Holland, Amsterdam etc.
D-D-D-D-D
DAVID, H. A. / NAGARAJA, H. N. (2003): Order Statistics; Wiley, New York etc.
DETTE, H. / GEFELLER, O. (1995): The impact of different definitions of nearest neighbour distances for censored data on nearest neighbour kernel estimators of the hazard rate; Jour. Nonpar. Statist. 4, 271–282
DHILLON, B. S. (1979): A hazard rate model; IEEE Trans. Rel. 28, 150
DHILLON, B. S. (1981): Life distributions; IEEE Trans. Rel. 30, 457–459
DIEHL, S. / STUTE, W. (1988): Kernel density and hazard function estimator in the presence of censoring; Jour. Multivar. Analysis 25, 293–310
DOKSUM, K. A. / YANDELL, B. S. (1984): Testing for exponentiality; in KRISHNAIAH/SEN (eds.): Handbook of Statistics, Vol. 4 (Nonparametric Methods), 579–611, North–Holland, Amsterdam etc.
DUPUY, J.–F. / GNEYOU, K. E. (2011): A wavelet estimator of the intensity function with censored data; Quality Technology and Quantitative Management 8, 401–410
E-E-E-E-E
EBRAHIMI, N. (1986): Classes of discrete decreasing and increasing mean residual life distributions; IEEE Trans. Rel. 35, 403–405
EFRON, B. (1967): The two sample problem with censored data; Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. IV, Univ. of California Press, Berkeley, 831–853
ELANDT–JOHNSON, R. C. / JOHNSON, N. L. (1980): Survival Models and Data Analysis; Wiley, New York etc.
EPANECHNIKOV, V. A. (1969): Non–parametric estimation of a multivariate probability density; Theory Probab. Appl. 14, 153–158
EPSTEIN, B. (1960): Testing for the validity of the assumption that the underlying distribution of life is exponential; Technometrics 2, 83–101, 167–183
EPSTEIN, B. / SOBEL, M. (1953): Life testing; Jour. Amer. Statist. Ass. 48, 486–502
F-F-F-F-F
FAILING, K. (1984): Neue Methoden zur nicht–parametrischen Schätzung von Dichte- und Hazardfunktionen bei zensierten Daten mit Anwendungen in klinischen Studien; in KÖHLER/TAUTU/WAGNER (eds.): Der Beitrag der Informationsverarbeitung zum Fortschritt der Medizin, 92–99, Springer, Berlin etc.
G-G-G-G-G
GASSER, T. / MÜLLER, H.–G. (1979): Kernel estimation for regression functions; in GASSER/ROSENBLATT (eds.): Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics, 23–68, Springer, Berlin/Heidelberg
GASSER, T. / MÜLLER, H.–G. / MAMMITZSCH, V. (1985): Kernels for nonparametric curve estimation; Jour. Roy. Statist. Soc. B 47, 238–252
GEFELLER, O. / DETTE, H. (1991): A comparative study on hazard function estimators employing nearest neighbour distances as bandwidths; in ADLASSNIG/GRABNER/BENGTSON/HANSEN (eds.): Medical Information Europe 1991, 963–987, Springer, Berlin etc.
GEFELLER, O. / DETTE, H. (1992): Nearest neighbour kernel estimation of the hazard function from censored data; Jour. Statist. Comp. Simul. 43, 93–101
GEFELLER, O. / MICHELS, P. (1992): A review on smoothing methods for the estimation of the hazard rate based on kernel functions; in DODGE/WHITTAKER (eds.): Proceedings of the 10th Symposium on Computational Statistics, 459–464
GEHAN, E. A. (1969): Estimating survival functions from the life table; Jour. Chronic Disease 21, 629–644
GLASER, R. E. (1980): Bathtub and related failure rate characterizations; Jour. Amer. Statist. Ass. 75, 667–672
GREENWOOD, M. (1926): The natural duration of cancer; Report on Public Health and Medical Subjects; Her Majesty’s Stationery Office, London, Vol. 33, 1–26
GRENANDER, U. (1956): On the theory of mortality measurement, Part II; Skand. Aktuarietidskr. 39, 125–153
GRIFFITH, W. S. (1982): Representations of distributions having monotone or bathtub–shaped failure rate; IEEE Trans. Rel. 31, 95–96
GROSS, A. J. / CLARK, V. A. (1975): Survival Distributions: Reliability Applications in the Biomedical Sciences; Wiley, New York etc.
GU, C. (1996): Penalized likelihood hazard estimation: A general procedure; Statistica Sinica 6, 861–876
GUESS, F. M. / PARK, D. H. (1988): Modeling discrete bathtub and upside–down bathtub mean residual life functions; IEEE Trans. Rel. 37, 545–549
GUESS, F. M. / PROSCHAN, F. (1988): Mean residual life: Theory and application; in KRISHNAIAH/RAO (eds.): Handbook of Statistics, Vol. 7 (Quality Control and Reliability), 215–224, North–Holland, Amsterdam etc.
GUPTA, P. L. / GUPTA, R. C. (1997): On the multivariate normal hazard; Jour. Multivar. Analysis 62, 64–73
GUPTA, P. L. / GUPTA, R. C. / TRIPATHI, R. C. (1997): On the monotonic properties of discrete failure rates; Jour. Statist. Plan. Inf. 65, 255–268
GUPTA, R. C. (1981): Moments in terms of the mean residual life function; IEEE Trans. Rel. 30, 450–451
GUPTA, R. D. / NANDA, A. K. (2001): Some results on reversed hazard rate ordering; Comm. Statist. — Theory & Methods 30, 2447–2457
H-H-H-H-H
HÄRDLE, W. / KERKYACHARIAN, G. / PICARD, D. / TSYBAKOV, A. (1998): Wavelets, Approximations and Statistical Applications; Springer, Berlin etc.
HALL, P. / VAN KEILEGOM, I. (2005): Testing for monotone increasing hazard rate; Ann. Statist. 33, 1109–1137
HARRIS, R. (1970): A multivariate definition for increasing hazard rate distribution functions; Ann. Math. Statist. 41, 713–717
HENDERSON, R. (1990): A problem with the likelihood ratio test for a change–point hazard rate model; Biometrika 77, 835–843
HERD, G. R. (1960): Estimation of reliability from incomplete data; Proc. of the Sixth National Symposium on Reliability and Quality Control, 202–217
HESS, K. R. / SERACHITOPOL, D. M. / BROWN, B. W. (1999): Hazard function estimators: A comparative simulation study; Statistics in Medicine 18, 3075–3088
HJORTH, U. (1980): A reliability distribution with increasing, decreasing, constant, and bathtub–shaped failure rates; Technometrics 22, 99–107
HOLLANDER, M. / PARK, D. H. / PROSCHAN, F. (1986): A class of life distributions for aging; Jour. Amer. Statist. Ass. 81, 91–95
HOLLANDER, M. / PROSCHAN, F. (1972): Testing whether new is better than used; Ann. Math. Statist. 43, 1136–1146
HOLLANDER, M. / PROSCHAN, F. (1975): Tests for mean residual life; Biometrika 62, 585–593
HOLLANDER, M. / PROSCHAN, F. (1984): Nonparametric concepts and methods in reliability; in KRISHNAIAH/SEN (eds.): Handbook of Statistics, Vol. 4 (Nonparametric Methods), 613–655, Elsevier, Amsterdam etc.
I-I-I-I-I
IZENMAN, A. I. (1991): Recent developments in nonparametric curve estimation; Jour. Amer. Statist. Ass. 86, 205–224
J-J-J-J-J
JARJOURA, D. (1988): Smoothing hazard rates with cubic splines; Comm. Statist. — Simul. & Comp. 17, 377–392
JOHNSON, L. G. (1964): The Statistical Treatment of Fatigue Experiments; Amsterdam
JOHNSON, N. L. / KOTZ, S. (1975): A vector multivariate hazard rate; Jour. Multivar. Analysis 5, 53–66, 498
JOHNSON, N. L. / KOTZ, S. / BALAKRISHNAN, N. (1994): Continuous Univariate Distributions, Vol. I, 2nd ed.; Wiley, New York etc.
JOHNSON, N. L. / KOTZ, S. / BALAKRISHNAN, N. (1995): Continuous Univariate Distributions, Vol. II, 2nd ed.; Wiley, New York etc.
JOHNSON, N. L. / KOTZ, S. / KEMP, A. W. (1992): Univariate Discrete Distributions; Wiley, New York etc.
JOSHI, S. N. / MCEACHERN, S. N. (1997): Isotonic maximum likelihood estimation for the change point of a hazard rate; Sankhya A 59, 392–407
K-K-K-K-K
KALBFLEISCH, J. D. / PRENTICE, R. L. (1980): The Statistical Analysis of Failure Time Data; Wiley, New York etc.
KAPLAN, E. L. / MEIER, P. (1958): Non–parametric estimation from incomplete data; Jour. Amer. Statist. Ass. 53, 457–481
KARUNAMUNI, R. J. / ALBERTS, T. (2005): On boundary correction in kernel density estimation; www.cims.nyu.edu alberts/pub/SM2005.pdf
KEMP, A. W. (2004): Classes of discrete lifetime distributions; Comm. Statist. — Theory & Methods 33, 3069–3093
KIEFER, J. / WOLFOWITZ, J. (1956): Consistency of the maximum likelihood estimator in the presence of infinitely many identical parameters; Ann. Math. Statist. 27, 897–906
KIMBALL, A. W. (1960): Estimation of mortality intensities in animal experiments; Biometrics 16, 505–521
KLEFSJÖ, B. (1982a): The HNBUE and HNWUE classes of life distributions; Nav. Res. Log. Quart. 29, 331–344
KLEFSJÖ, B. (1982b): On ageing properties and total time on test transforms; Scand. Jour. Statist. 9, 37–41
KLEFSJÖ, B. (1983a): Some tests against ageing based on the total time on test transform; Comm. Statist. — Theory & Methods 12, 907–927
KLEFSJÖ, B. (1983b): Testing exponentiality against HNBUE; Scand. Jour. Statist. 10, 65–75
KLEIN, J. P. / MOESCHBERGER, M. L. (1997): Survival Analysis; Springer, New York etc.
KOTZ, S. / BALAKRISHNAN, N. / JOHNSON, N. L. (2000): Continuous Multivariate Distributions, Vol. I; Wiley, New York etc.
KRISHNAIAH, P. R. / MIAO, B. Q. (1988): Review about estimation of change points; in KRISHNAIAH/RAO (eds.): Handbook of Statistics, Vol. 7 (Quality Control and Reliability), 375–402, North–Holland, Amsterdam etc.
KUNDU, C. / NANDA, A. K. / HU, T. (2009): A note on the reversed hazard rate of order statistics and record values; Jour. Statist. Plan. Inf. 139, 1257–1265
KUNITZ, H. (1989): A new class of bathtub–shaped hazard rates and its application in a comparison of two test–statistics; IEEE Trans. Rel. 38, 351–354
L-L-L-L-L
LAGAKOS, S. W. / BARRAJ, L. M. / DE GRUTTOLA, V. (1988): Nonparametric analysis of truncated survival data with application to AIDS; Biometrika 75, 515–523
LAI, C. D. (2013): Issues concerning constructions of discrete lifetime models; Quality Technology and Quantitative Management 10, 251–262
LAI, C. D. / XIE, M. / MURTHY, D. N. P. (2001): Bathtub shaped failure rate life distributions; in BALAKRISHNAN/RAO (eds.): Handbook of Statistics, Vol. 20 (Advances in Reliability), 69–104, Elsevier, Amsterdam
LANGBERG, N. A. / LEON, R. V. / LYNCH, J. / PROSCHAN, F. (1980): Extreme points of the class of discrete decreasing failure rate life distributions; Math. Oper. Res. 5, 35–42
LANGBERG, N. A. / LEON, R. V. / LYNCH, J. / PROSCHAN, F. (1982): Extreme points of the class of discrete decreasing failure rate average life distributions; TIMS — Studies in the Management Sciences 19, 297–304
LAWLESS, J. F. (1982): Statistical Models and Methods for Lifetime Data; Wiley, New York etc.
LEEMIS, L. M. (1986): Lifetime distribution identities; IEEE Trans. Rel. 35, 170–174
LEEMIS, L. M. (1995): Reliability: Probabilistic Models and Statistical Methods; Prentice–Hall, New Jersey
LI, L. (2002): Hazard rate estimation for censored data by wavelet methods; Comm. Statist. — Theory & Methods 31, 943–960
LIU, Y. C. / VAN RYZIN, J. (1985): A histogram estimator of the hazard rate with censored data; Ann. Statist. 13, 592–605
LO, S. H. / MACK, Y. P. / WANG, J.–L. (1989): Density and hazard rate estimation for censored data via strong representation of Kaplan/Meier estimate; Prob. Theo. Rel. Fields 80, 461–473
LOADER, C. R. (1991): Inference for a hazard rate change point; Biometrika 78, 749–757
LONDON, D. (1988): Survival Models and Their Estimation, 2nd ed.; ACTEX Publications, Winstedt & Avon, Conn.
M-M-M-M-M
MA, C. (2000): A note on the multivariate normal hazard; Jour. Multivar. Analysis 73, 282–283
MARSHALL, A. W. / OLKIN, I. (2007): Life Distributions — Structure of Nonparametric, Semiparametric and Parametric Families; Springer, New York
MARSHALL, A. W. / PROSCHAN, F. (1965): Maximum likelihood estimation for distributions with monotone failure rate; Ann. Math. Statist. 36, 69–77
MARSHALL, A. W. / PROSCHAN, F. (1972): Classes of distributions applicable in replacement with renewal implications; in Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 3, 395–415; Univ. of California Press, Berkeley
MCGILL, J. I. (1992): The multivariate hazard gradient and moments of the truncated multinormal distribution; Comm. Statist. — Theory & Methods 21, 3053–3060
MEEKER, W. Q. / ESCOBAR, L. A. (1998): Statistical Methods for Reliability Data; Wiley, New York etc.
MEILIJSON, I. (1972): Limiting properties of the mean residual lifetime function; Ann. Math. Statist. 43, 354–357
MESSER, K. / GOLDSTEIN, L. (1993): A new class of kernels for nonparametric curve estimation; Ann. Statist. 21, 179–195
MI, J. (1995): Bathtub failure rates and upside–down bathtub mean residual life; IEEE Trans. Rel. 44, 388–391
MILLER, R. G. JR. (1981): Survival Analysis; Wiley, New York etc.
MILLS, J. P. (1926): Tables of the ratio area to bounding ordinate, for any portion of normal curve; Biometrika 18, 395–400
MORRISON, D. G. (1978): On linearly increasing mean residual lifetimes; Jour. Appl. Prob. 15, 617–620
MÜLLER, H.–G. (1991): Smooth optimum kernel estimators near endpoints; Biometrika 78, 521–530
MÜLLER, H.–G. (1993): On the boundary kernel method for non–parametric curve estimation near endpoints; Scand. Jour. Statist. 20, 313–328
MÜLLER, H.–G. / WANG, J.–L. (1990a): Nonparametric analysis of changes in hazard rates for censored data: An alternative to change–point models; Biometrika 77, 305–314
MÜLLER, H.–G. / WANG, J.–L. (1990b): Locally adaptive hazard smoothing; Probability Theory and Related Fields 85, 823–838
MÜLLER, H.–G. / WANG, J.–L. (1994): Hazard rate estimation under random censoring with varying kernels and bandwidths; Biometrics 50, 61–76
MÜLLER, H.–G. / WANG, J.–L. / CAPRA, W. B. (1997): From life tables to hazard rates: the transformation approach; Biometrika 84, 881–892
MUTH, E. J. (1974): Moments expressed in terms of the hazard function and applications; Microelectronics and Reliability 13, 469–471
MUTH, E. J. (1980): Memory as a property of probability functions; IEEE Trans. Rel. 29, 160–164
MYKYTYN, S. W. / SANTNER, T. J. (1981): Maximum likelihood estimation of the survival function based on censored data under hazard rate assumptions; Comm. Statist. — Theory & Methods 10, 1369–1387
N-N-N-N-N
NAKAGAWA, T. / OSAKI, S. (1975): The discrete Weibull distribution; IEEE Trans. Rel. 24, 300–301
NANDA, A. K. / GUPTA, R. D. (2001): Some properties of reversed hazard rate function; Statist. Methods 3, 108–124
NANDA, A. K. / GUPTA, R. D. (2004): Some properties of reversed hazard rate function — Corrections; Statist. Methods 6, 90–91
NAVARRO, J. / RUIZ, J. M. (2004): A characterization of the multivariate normal distribution by using the hazard gradient; Ann. Inst. Statist. Math. 56, 361–367
NELSON, W. (1969): Hazard plotting for incomplete data; Journal of Quality Technology 1, 27–52
NELSON, W. (1970): Hazard plotting methods for analysis of life data with different failure modes; Journal of Quality Technology 2, 126–149
NELSON, W. (1972): Theory and applications of hazard plotting for censored failure data; Technometrics 14, 945–966
NELSON, W. (1982): Applied Life Data Analysis; Wiley, New York etc.
NELSON, W. (1984): Accelerated Testing — Statistical Models, Plans, and Data Analysis; Wiley, New York etc.
NGUYEN, H. T. / ROGERS, G. S. / WALKER, E. A. (1984): Estimation in change–point hazard rate models; Biometrika 71, 299–304
NIELSEN, J. P. (2003): Variable bandwidth kernel hazard estimators; Jour. Nonpar. Statist. 15, 355–376
O-O-O-O-O
OAKES, D. / DASU, T. (1980): A note on residual life; Biometrika 71, 409–410
O’SULLIVAN, F. (1988a): Nonparametric estimation of relative risk using splines and cross–validation; SIAM — Jour. Sci. Statist. Comp. 9, 531–542
O’SULLIVAN, F. (1988b): Fast computation of fully automated log–density and log–hazard estimators; SIAM — Jour. Sci. Statist. Comp. 9, 363–379
P-P-P-P-P
PADGETT, W. J. (1988): Nonparametric estimation of density and hazard rate function when samples are censored; in KRISHNAIAH/RAO (eds.): Handbook of Statistics, Vol. 7 (Quality Control and Reliability), 313–333, North–Holland, Amsterdam etc.
PADGETT, W. J. / SPURRIER, J. D. (1985): On discrete failure models; IEEE Trans. Rel. 34, 253–256
PADGETT, W. J. / WEI, L. J. (1980): Maximum likelihood estimation of a distribution with increasing failure rate based on censored observations; Biometrika 67, 470–474
PARZEN, E. (1962): On estimation of a probability density function and mode; Ann. Math. Statist. 33, 1065–1076
PATEL, J. (1973): A catalogue of failure distributions; Comm. Statist. 1, 281–284
PATIL, P. (1997): Nonparametric hazard rate estimation by orthogonal wavelet methods; Jour. Statist. Plan. Inf. 60, 153–168
PATRA, K. / DEY, D. K. (2002): A general class of change point and change curve models for lifetime data; Ann. Inst. Statist. Math. 54, 517–530
PRAKASA RAO, B. L. S. (1970): Estimation for distributions with monotone failure rate; Ann. Math. Statist. 41, 507–529
PRAKASA RAO, B. L. S. (1983): Nonparametric Functional Estimation; Academic Press, New York
PROSCHAN, F. / PYKE, R. (1967): Tests for monotone failure rate; Proc. Fifth Berkeley Symp. on Math. Statist. and Prob. Vol. 3, 293–312
PURI, P. S. / RUBIN, H. (1974): On a characterization of the family of distributions with constant multivariate failure rates; The Annals of Probability 2, 738–740
PYKE, R. (1965): Spacings; Jour. Roy. Statist. Soc. B 27, 395–449
Q-Q-Q-Q-Q
QIU, P. / SHENG, J. (2008): A two–stage procedure for comparing hazard rate functions; Jour. Roy. Statist. Soc. B 70, 191–208
R-R-R-R-R
RAMLAU–HANSEN, H. (1983): Smoothing counting process intensities by means of kernel functions; Ann. Statist. 11, 453–466
RICE, J. / ROSENBLATT, M. (1976): Estimation of log survivor function and hazard rate; Sankhya A 38, 60–78
RINNE, H. (2009): The Weibull Distribution — A Handbook; CRC Press, Boca Raton
RINNE, H. (2010): Location–scale Distributions — Linear Estimation and Probability Plotting Using MATLAB; http://geb.uni-giessen.de/volltexte/2010/7607
ROLSKI, T. (1975): Mean residual life; Bull. Int. Statist. Inst. 46, 266–270
ROSENBERG, P. S. (1995): Hazard function estimation using B–splines; Biometrics 51, 874–887
ROSENBLATT, M. (1956): Remarks on some nonparametric estimates of a density function; Ann. Math. Statist. 27, 832–837
ROY, D. / GUPTA, R. P. (1992): Classifications of discrete lives; Microelectronics and Reliability 32, 1459–1473
RUDEMO, M. (1982): Empirical choice of histograms and kernel density estimators; Scand. Jour. Statist. 9, 65–78
S-S-S-S-S
SACHER, G. A. (1956): On the statistical nature of mortality with special reference to chronic radiation mortality; Radiology 67, 250–257
SALHA, R. (w.y.): Adaptive kernel estimation of the hazard rate function; Research Magazine 17, 71–81; Islamic University of Gaza, Department of Mathematics
SALVIA, A. A. (1985): Reliability application of the alpha distribution; IEEE Trans. Rel. 34, 251–252
SALVIA, A. A. / BOLLINGER, R. C. (1982): On discrete hazard rates; IEEE Trans. Rel. 31, 458–459
SANKARAN, P. G. / GLEEJA, V. L. / JACOB, T. M. (2007): Nonparametric estimation of reversed hazard rate; Calcutta Statist. Ass. Bull. 59, 55–68
SARDA, P. / VIEU, P. (1990): Smoothing parameter selections in hazard rate estimation; Statist. Prob. Letters 11, 429–434
SCHÄFER, H. (1985): A note on data–adaptive kernel estimation of the hazard rate and density function in the random censorship situation; Ann. Statist. 13, 818–826
SCHÄFER, H. (1986): Local convergence of empirical measures in the random censorship situation with application to density and rate estimation; Ann. Statist. 14, 1240–1245
SCHIFFMAN, D. A. (1986): The score statistic in constancy testing for a discrete hazard rate; IEEE Trans. Rel. 35, 590–594
SEAL, H. L. (1954): The estimation of mortality and other decrement probabilities; Skand. Aktuarietidskr. 37, 137–162
SHAKED, M. / SHANTHIKUMAR, J. G. (1987): Multivariate hazard rates and stochastic ordering; Adv. Appl. Probab. 19, 123–137
SHAKED, M. / SHANTHIKUMAR, J. G. (1989): Multivariate conditional hazard rate and the MIFRA and MIFR properties; Jour. Appl. Probab. 25, 150–168
SHAKED, M. / SHANTHIKUMAR, J. G. / VALDEZ–TORRES, J. B. (1995): Discrete hazard rate functions; Comput. Oper. Res. 22, 391–402
SHRYOCK, H. S. / SIEGEL, J. S. (1976): The Methods and Materials of Demography; Academic Press, San Diego
SILVA, R. B. / BARRETO–SOUZA, W. / CORDEIRO, G. M. (2010): A new distribution with decreasing, increasing and upside–down bathtub failure rate; Comp. Statist. & Data Analysis 54, 935–944
SINGPURWALLA, N. D. (2006): The hazard potential: Introduction and overview; Jour. Amer. Statist. Ass. 101, 1705–1717
SINGPURWALLA, N. D. / WONG, M. Y. (1983): Estimation of the failure rate — A survey of nonparametric methods (Part I – Non–Bayesian methods); Comm. Statist. — Theory & Methods 12, 559–588
SMITH, P. J. (2002): Analysis of Failure and Survival Data; Chapman & Hall / CRC Press, Boca Raton
SPIEGELMAN, M. (1968): Introduction to Demography; rev. ed., Harvard University Press, Cambridge/Mass.
STACY, E. W. (1962): A generalization of the gamma distribution; Ann. Math. Statist. 33, 1187–1192
STATISTISCHES BUNDESAMT (ed.) (2004): Perioden–Sterbetafeln für Deutschland (1871/1881 bis 2001/2003); Wiesbaden
STEIN, W. E. / DATTERO, R. (1984): A new discrete Weibull distribution; IEEE Trans. Rel. 33, 196–197
SWARTZ, G. B. (1973): The mean residual life function; IEEE Trans. Rel. 22, 108–109
SWEET, A. L. (1990): On the hazard rate of the lognormal distribution; IEEE Trans. Rel. 39, 325–328
T-T-T-T-T
TANNER, M. A. (1983): A note on the variable kernel estimator of the hazard function from randomly censored data; Ann. Statist. 11, 994–998
TANNER, M. A. (1984): Data–based nonparametric hazard estimation (Algorithm AS 202); Jour. Roy. Statist. Soc. (Applied Statistics) C 33, 248–258
TANNER, M. A. / WONG, W. H. (1983): The estimation of the hazard rate function from randomly censored data by the kernel method; Ann. Statist. 11, 989–993
TANNER, M. A. / WONG, W. H. (1984): Data–based nonparametric estimation of the hazard function with applications to model diagnostics and exploratory analysis; Jour. Amer. Statist. Ass. 79, 174–182
TEISSIER, G. (1934): Recherches sur le vieillissement et sur les lois de la mortalité; Annales de Physiologie et de Physico–chimie Biologique, X, 237–284
TERRELL, G. R. (1990): Maximal smoothing principle in density estimation; Jour. Amer. Statist. Ass. 85, 470–477
U-U-U-U-U
UZUNOGULLARI, Ü. / WANG, J.–L. (1992): A comparison of hazard rate estimators for left truncated and right censored data; Biometrika 79, 297–310
W-W-W-W-W
WAND, M. P. / JONES, M. C. (1995): Kernel Smoothing; Chapman & Hall, London
WATSON, G. S. / LEADBETTER, M. R. (1964a): Hazard analysis I; Biometrika 51, 175–184
WATSON, G. S. / LEADBETTER, M. R. (1964b): Hazard analysis II; Sankhya A 26, 101–116
WATSON, G. S. / WELLS, W. T. (1961): On the possibility of improving the mean useful life of items by eliminating those with short lives; Technometrics 3, 281–298
WOODROOFE, M. (1970): On choosing a delta–sequence; Ann. Math. Statist. 41, 1665–1671
WU, S. / WELLS, M. T. (2003): Nonparametric estimation of hazard functions by wavelet methods; Jour. Nonpar. Statist. 15, 187–203
X-X-X-X-X
XEKALAKI, E. (1983a): A property of the Yule distribution and its applications; Comm. Statist. — Theory & Methods 12, 1181–1189
XEKALAKI, E. (1983b): Hazard functions and life distributions in discrete time; Comm. Statist. — Theory & Methods 12, 2503–2509
Y-Y-Y-Y-Y
YANDELL, B. S. (1983): Nonparametric inference for rates with censored survival data; Ann. Statist. 11, 1119–1135
Author Index
Aalen, O. O., 149f., 261
Aarset, M. V., 244, 246f., 261
Al–Hussaini, E. K., 28, 261
Alberts, T., 196, 265
Anderson, J., 208f., 261
Anderson, T. W., 247, 261
Antoniadis, A., 208f., 249, 261
Arnold, B. C., 64, 261
Asadia, M., 60, 261

Bagkavos, D., 205, 261
Bain, L. J., 3, 261
Balakrishnan, N., 62, 111, 227, 261, 265, 266
Barlow, R. E., 14, 21, 30, 68ff., 75f., 88, 167, 227, 234, 236, 239, 242f., 249f., 261
Barraj, L. M., 266
Barreto–Souza, W., 270
Bartholomew, D. J., 261
Basu, A. P., 61, 262
Bebbington, M., 249, 262
Bergman, B., 227, 244f., 262
Bézandry, D. H., 208f., 262
Bickel, P. J., 239, 243, 262
Birnbaum, Z. W., 98, 262
Block, H. W., 67, 262
Bollinger, R. C., 49, 53, 57f., 128, 269
Bonney, G. E., 208, 262
Bouezmarni, T., 196, 262
Bowman, A., 192, 262
Bremner, J. M., 261
Brindley, E. C. Jr., 61, 262
Brown, B. W., 264
Brunk, H. D., 261
Bryson, M. C., 84f., 88f., 203, 262
Burr, I. W., 98, 262

Cacoulos, T., 177, 262
Campo, R., 227, 234, 236, 249f., 261
Capra, W. B., 267
Chandra, N. K., 14, 262
Cheng, K. F., 227, 262
Cheng, P. E., 205f., 262
Clark, V. A., 264
Cohen, A. C., 40, 227, 261, 262
Cordeiro, G. M., 270
Cox, D. R., 9, 38, 51, 60, 64, 142, 145f., 262
Crowder, M. J., 30, 263
Csörgö, M., 249, 263

Darling, D. A., 247, 261
Dasu, T., 17, 268
Dattero, R., 59, 270
David, H. A., 227, 263
De Gruttola, V., 266
Dette, H., 205ff., 263
Dey, D. K., 249, 268
Dhillon, B. S., 78, 83, 86, 100, 263
Diehl, S., 198, 263
Doksum, K. A., 236, 239, 243, 262, 263
Dupuy, J.–F., 208f., 263

Ebrahimi, N., 84, 263
Efron, B., 148, 263
Elandt–Johnson, R. C., 149, 152, 155, 159, 263
Engelhardt, M., 3, 261
Epanechnikov, V. A., 189, 263
Epstein, B., 234, 236, 238, 242, 250, 263
Escobar, L. A., 30, 267

Failing, K., 206, 263

Gannoun, A., 208, 262
Gasser, T., 196, 263
Gefeller, O., 205ff., 263
Gehan, E. A., 155, 161, 264
Gijbels, I., 261
Glaser, R. E., 78ff., 264
Gleeja, V. L., 14, 269
Gneyou, K. E., 208f., 263
Goldstein, L., 196, 267
Greenwood, M., 146, 264
Grenander, U., 162, 167f., 264
Griffith, W. S., 78, 264
Gross, A. J., 264
Grégoire, G., 208, 261
Gu, C., 208f., 264
Guess, F. M., 17, 22, 84, 88, 264
Gupta, P. L., 62, 72ff., 264
Gupta, R. C., 22, 62, 264
Gupta, R. D., 14, 264, 268
Gupta, R. P., 51, 269

Härdle, W., 210, 264
Hall, P., 239, 264
Harris, R., 61, 67, 264
Henderson, R., 249, 264

Herd, G. R., 143f., 264 McEachern, S. N., 249, 265


Hess, K. R., 205, 208, 264 McGibbon, B., 261
Hjorth, U., 78, 82, 265 McGill, J. I., 60, 62, 267
Hollander, M., 67, 91, 94, 239, 249, 251f., McKeague, I. W., 208, 261
253, 254, 265 Meeker, W. Q., 30, 267
Horváth, L, 249, 263 Meier, P., 142f., 176, 265
Hu, T., 266 Meilijson, I., 20, 267
Messer, K., 196, 267
Izenman, A. I., 177, 265 Mi, J., 86, 267
Jacob, T. M., 14, 269 Miao, B. Q., 249, 266
Jarjoura, D., 208f., 265 Michels, P., 205, 264
Johnson, L. G., 143f., 265 Miller, R. G. Jr., 148ff., 155, 267
Johnson, N. L., 60ff., 111, 149, 152, 155, Mills, J. P., 10, 267
159, 263, 265, 266 Moeschberger, M. L., 142, 203, 266
Jones, M. C., 177, 271 Morrison, D. G., 21, 267
Joshi, S. N., 249, 265 Müller, H.–G., 155, 162, 196f., 202, 205,
249, 263, 267
Kalbfleisch, J. D., 52, 265 Murthy, D. N. P., 266
Kaplan, E. L., 142f., 176, 265 Muth, E. J., 12, 17, 267
Karunamuni, R. J., 196, 265 Mykytyn, S. W., 167, 267
Kemp, A. W., 49, 51, 72, 84, 86, 88, 265
Kerkyacharian, G., 264 Nagaraja, H. N., 227, 263
Kiefer, J., 167, 265 Nakagawa, T., 58, 267
Kimball, A. W., 155, 166, 265 Nanda, A. K., 14, 264, 266
Kimber, A. C., 263 Nason, G., 208, 261
Klefsjö, B., 84, 88f., 94, 227, 239ff., 249, Navarro, J., 62, 268
252, 254f., 262, 265 Nelson, W., 38, 149f., 211, 215, 268
Klein, J. P., 142, 203, 266 Nguyen, H. T., 249, 268
Kotz, S., 60ff., 111, 265, 266 Nielsen, J. P., 205, 268
Krishnaiah, P. R., 249, 266 Oakes, D., 9, 17, 38, 51, 142, 145f., 262, 268
Kundu, C., 14, 266 Olkin, I., 63, 67, 267
Kunitz, H., 244, 266 Osaki, S., 58, 267
Lagakos, S. W., 14, 266 O’Sullivan, F., 208f., 268
Lai, C. D., 49, 78, 80, 84, 249, 262, 266 Padgett, W. J., 57f., 128, 167, 174, 268
Langberg, N. A., 72, 75, 266 Park, D. H., 84, 88, 91, 264, 265
Lawless, J. F., 52, 142, 149, 266 Parzen, E., 177, 268
Leadbetter, M. R., 177, 193f., 198ff., 271 Patel, J., 268
Leemis, L. M., 3, 29f., 35, 52, 142, 147, 266 Patil, P., 205, 208f., 261, 268
Leon, R. V., 266 Patra, K., 249, 268
Li, L., 208f., 266 Picard, D., 264
Liu, Y. C., 206, 266 Prakasa Rao, B. L. S., 167f., 177, 268
Lo, S. H., 193f., 198, 266 Prentice, R. L., 52, 265
Loader, C. R., 249, 266 Proschan, F., 17, 21f., 30, 67ff., 75f., 84, 88,
London, D., 155, 266 91, 94, 167f., 175, 239, 243, 249,
Lynch, J., 266 251f., 253f., 261, 264–267
Ma, C., 62, 266 Puri, P. S., 61, 269
Mack, Y. P., 266 Pyke, R., 227, 231, 239, 243, 269
Mammitzsch, V., 196, 263 Qiu, P., 227, 269
Marshall, A. W., 63, 67f., 88, 167f., 175,
261, 267 Ramlau–Hansen, H., 198f., 202, 269

Rice, J., 193, 198f., 269 Tripathi, R. C., 264


Rinne, H., 3, 14, 41, 57f., 64, 83, 137f., 211, Tsybakov, A., 264
215f., 237, 269
Rogers, G. S., 268 Uzunogullari, Ü., 198, 271
Rolski, T., 93, 269
Valdez–Torres, J. B., 270
Rombouts, J. V. K., 196, 262
Van Keilegom, I., 239, 264
Rosenberg, P. S., 208f., 269
Van Ryzin, J., 206, 266
Rosenblatt, M., 177, 193, 198f., 269
Vieu, P., 193, 269
Roy, D., 14, 51, 262, 269
Rubin, H., 61, 269 Walker, E. A., 268
Rudemo, M., 192, 269 Wand, M. P., 177, 271
Ruiz, J. M., 62, 268 Wang, J.–L., 196f., 198, 202, 205, 249,
266f., 271
Sacher, G. A., 163, 269
Watson, G. S., 23, 84, 177, 193f., 198ff., 271
Salha, R., 195, 269
Wei, L. J., 167, 174, 268
Salvia, A. A., 49, 53, 57f., 96, 128, 269
Wells, M. T., 208f., 271
Sankaran, P. G., 14, 269
Wells, W. T., 23, 84, 271
Santner, T. J., 167, 267
Wolfowitz, J., 167, 265
Sarda, P., 193, 269
Wong, M. Y., 155, 163, 169, 270
Saunders, S. C., 98, 262
Wong, W. H., 198, 201, 205f., 270
Savits, T. H., 67, 262
Woodroofe, M., 193, 271
Schäfer, H., 205ff., 269
Wu, S., 208f., 271
Schiffman, D. A., 237, 270
Seal, H. L., 166, 270 Xekalaki, E., 132, 271
Senthilselvan, A., 208f., 261 Xie, M., 266
Serachitopol, D. M., 264
Shaked, M., 49, 60, 65, 67, 72, 270 Yandell, B. S., 198, 236, 263, 271
Shanthikumar, J. G., 67, 270
Sheng, J., 227, 269 Zahedi, H., 64, 261
Shyrock, H. S., 152, 270 Zitikis, R., 262
Siddiqui, M. M., 84f., 88f., 203, 262
Siegel, J. S., 152, 270
Silva, R. B., 78, 84, 103, 270
Singpurwalla, N. D., 16, 155, 163, 169, 270
Smith, P. J., 3, 30, 142, 149, 155, 270
Smith, R. L., 263
Sobel, M., 234, 236, 263
Spiegelman, M., 152, 270
Spurrier, J. D., 57f., 128, 268
Stacy, E. W., 83, 104, 270
Statistisches Bundesamt, 155, 270
Stein, W. E., 59, 270
Stute, W., 198, 263
Sultan, K. S., 28, 261
Swartz, G. B., 17, 19, 270
Sweet, A. L., 113, 270
Sweeting, T. J., 263

Tanner, M. A., 198, 201, 205ff., 270


Teissier, G., 271
Terrell, G. R., 192, 271
Thompson, W. A. Jr., 61, 262
Subject Index
accelerated life model, 38f. type–II, 139
accumulated hazard function, 51 central death rate, 162
actuarial estimator, 160 change point, 249
age–specific death rate, see hazard rate characteristic function, 4
aging, 67 CHR, see cumulative hazard rate
aging factor complementary cumulative distribution func-
decreasing, 88 tion, see survival function
increasing, 88 convolution, 72, 94
AMISE, see asymptotic mean integrated covariate, 38
squared error cttot–test, 242
asymptotic mean integrated squared error, cumulative distribution function, see distri-
184 bution function
cumulative hazard rate, 16ff.
bandwidth, 178
direct estimator, 150f.
global, 199
discrete, 51
optimal, 202
indirect estimator, 149f.
optimal, 184
natural estimator, see cumulative hazard
selection, 191ff.
rate, indirect estimator
cross validation methods, 192f.
cumulative rate, 3
maximal smoothing, 192
curve of deaths, 160
oversmoothing principle, see band-
width selection, maximal smooth-
DAF, see aging factor, decreasing
ing
density function, see failure density
plug–in methods, 192f.
density quantile function, 4
rule of thumb, 192
DHR, see hazard rate, decreasing
bandwidth selector, 191
DHRA, see hazard rate, average, decreasing
baseline model, 38
DIHR, see hazard rate, bathtub–shaped
BED, see distribution, exponential bivariate
DIHRA, see hazard rate, decreasing interval–
beta function, 96
average
incomplete, 96
DIMRL, see mean residual life, bathtub–
incomplete ratio, 97
shaped
block–diagram, 31
D IRAC delta–function, 194
CCDF, see survival function D IRAC function, 57
CDF, see distribution function distribution
censored data, 137 alpha, 96
censoring, 41 arcsine, 96f., 218
from above, see censoring on the left generalized, 96
from below, see censoring on the right B ERNOULLI, 58
hyper–, see censoring, multiple beta, 97, 186
multiple, 139 binomial, 74, 122, 127, 158
non–informative, 137 positive, 122
on both sides, 139 B IRNBAUM –S AUNDERS, 98
on the left, 139 B URR, 98
on the right, 139 C AUCHY, 99, 218
progressive, see censoring, multiple χ, 99, 116
random, 137f. χ2 , 28, 99
single, 139 cosine
type–I, 139 ordinary, 100, 218

raised, 100, 218 G UMBEL, 107, 219, 224f.


degenerate, 58 half–C AUCHY, 21, 107, 220
D HILLON–I, 83, 86f., 100 half–logistic, 108, 220
D HILLON–II, 101 half–normal, 108, 220
E RLANG, see distribution, gamma H JORTH, 82, 108
exponential, 13, 25f., 29, 40, 48, 57, 82, hyperbolic secant, 109, 220
102, 105, 111, 122, 213, 219, 226 hypergeometric, 123, 127
bilateral, see distribution, L APLACE positive, 124
bivariate, 62ff. inverse G AUSSIAN, 80, 109
double, see distribution, G UMBEL, L APLACE, 101, 110, 221
see distribution, L APLACE linear hazard rate, 8, 18, 105, 111, 116
exponentiated, 102 generalized, 105
generalized, see distribution, expo- linear mean residual life, 21
nentiated exponential log–gamma, 111
order statistics, 230 log–L APLACE, 112
reflected, 15, 116, 219, 226 log–logistic, 30, 112
spacing, 232 log–normal, 80, 88
two–tailed, see distribution, L APLACE lower threshold, 113, 225
extreme value of type I upper threshold, 113, 225
for the maximum, see distribution,
log–W EIBULL, 83, 101, 113, 213, 219,
G UMBEL
224, 225
for the minimum, see distribution,
logarithmic, 124
log–W EIBULL
right–truncated, 124
extreme value of type II
logarithmic series, 74, see distribution,
for the maximum, see distribution, in-
logarithmic
verse W EIBULL
logistic, 111, 216, 221
for the minimum, see distribution,
generalized, 105
F R ÉCHET
L OMAX, 82, 114
extreme value of type III
generalized, 106
for the maximum, see distribution, re-
matching, 125
flected W EIBULL
for the minimum, see distribution, M AXWELL –B OLTZMANN, 114, 221
W EIBULL multinomial, 158
F , 103 M UTH, 114
F ISHER, see distribution, F negative binomial, 74, 122, 125
F ISK, see distribution, log–logistic zero–truncated, 124
F R ÉCHET, 103 negative hypergeometric, 125
G ALTON, see distribution, log–normal normal, 114, 221, 225
with lower threshold multivariate, 62
gamma, 103 standardized, 80, 96
generalized, 83, 104, 116 occupancy, 126
G AUSS, see distribution, normal parabolic
generalized exponential geometric, 84, inverted U–shaped, 115
103 U–shaped, 115
geometric, 57, 122, 125, 131f., 213 parabolic inverted U-shaped, 222
positive, 122 parabolic U-shaped, 222
zero–inflated, 86, 123 PARETO
G IBRAT, see distribution, log–normal discrete, see distribution, zeta of Zipf
with lower threshold of the first kind, 115, 226
G OMPERTZ, 14, 106 of the second kind, 21, see distribu-
G OMPERTZ –M AKEHAM, 101, 107 tion, L OMAX

PASCAL, see distribution, negative bi- W IGNER’s semi–circle, see distribution,


nomial semi–elliptical
P OISSON, 28, 53, 56, 74, 126 Y ULE, 53, 132
positive, 126 zeta
power function, 97, 115, 226 of H AIGHT, 133
P ÓLYA, 127 of Z IPF, 133
R AYLEIGH, 13, 82, 99, 111, 116, 222 distribution function, 3, 5f.
generalized, 106 conditional, 10
inverse, 110 discrete, 50
reduced, 8, 44, 48 joint, 60
rectangular with discrete and continuous compo-
continuous, see distribution, uniform nents, 56
continuous DMRL, see mean residual life, decreasing
discrete, see distribution, uniform dis-
crete empirical accumulated hazard function, see
right–angled cumulated hazard rate, direct esti-
negatively skew, 223 mator
positively skew, 223 error function, 45
runs, 127 complementary, 45
S ALVIA –B OLLINGER
failure, 3
DHR, 128
failure data, 137
generalized DHR, 128
failure density, 3, 4f.
generalized IHR, 129
conditional, 10
IHR, 128
joint, 60
semi–elliptical, 117, 222
failure function, see distribution function
Student, see distribution, t
failure rate, see failure density
symmetric, 223
F ISHER information matrix, 145
t, 117
force of decrement, see hazard rate
T EISSIER, 117, 223
force of mortality, see hazard rate
triangular, 97, 115
F OURIER transform, 4
continuous, 118
discrete, 129f. gamma function
uniform complete, 83, 96
continuous, 97, 115, 118, 224 incomplete, 86, 96
discrete, 127, 131 complementary, 86, 96
order statistics, 229 G REENWOOD’s formula, 146, 159f.
reduced, 27
spacing, 231 harmonic new better than used in expecta-
V–shaped, 119, 224 tion, 92
WALD, see distribution, inverse G AUS - harmonic new worse than used in expecta-
SIAN tion, 93
W EIBULL, 8, 40, 70, 76, 101f., 116, hazard gradient, see hazard rate, multivariate
119, 169f., 214, 225 continuous
discrete, 58 hazard paper, 211ff.
double, 101 hazard plotting, 212ff.
inverse, 26, 110, 224 hazard potential, 16
reduced, 14, 25, 85 hazard quantile, 17, 213
reflected, 101, 116, 225 hazard rate, 3
type I, 131 average, 75ff.
type II, 131 decreasing, 75ff.
type III, 132 increasing, 75ff.

baseline, 39 HNBUE, see harmonic new better than used


bathtub–shaped, 78ff. in expectation
constancy testing, 236ff. HNWUE, see harmonic new worse than used
continuous, 9ff. in expectation
decreasing, 68ff. HR, see hazard rate
decreasing interval–average, 89 HRA, see hazard rate, average
DIHR/IDHR testing, 244ff.
IAE, see integrated absolute error
discrete, 49ff.
IAF, see aging factor, increasing
constant, 57
IDHR, see hazard rate, upside–down bathtub–
decreasing, 57
shaped
increasing, 58
IDMRL, see mean residual life, upside–
IHR/DHR testing, 239ff. down bathtub–shaped
increasing, 68ff. IHR, see hazard rate, increasing
increasing interval–average, 89 IHRA, see hazard rate, average, increasing
multivariate iid – independently and identically distributed,
continuous, 61ff. 182
discrete, 65f. IIHRA, see hazard rate, increasing interval–
non–monotone, 78ff. average
of a system, 37 IMRL, see mean residual life, increasing
of a transformed variate, 23ff. IMSE, see mean squared error, integrated
of the maximum order statistic, 38 instantaneous failure rate, see hazard rate
of the minimum order statistic, 37 integrated absolute error, 182
of the parallel system, 37 integrated hazard rate, see cumulative hazard
of the series system, 37 rate
pseudo, 51 integrated squared error, 181
reciprocal, 79f. intensity function, 9, see hazard rate
reversed, 14f. inversion formula, 20
specific interval–average, 89 ISE, see integrated squared error
upside–down bathtub–shaped, 78ff.
K APLAN /M EIER estimator, 143
hazard rate estimator
kernel, 177
actuarial, 162
biquadratic, see kernel, biweight
by death rate, 162
biweight, 187f.
classical, see hazard rate estimator, ac-
boundary, 191, 196ff.
tuarial
C AUCHY, 186f.
indirectly smoothed, see hazard rate es-
cosine, 187f.
timator, ratio–type
E PANECHNIKOV, 186f.
naive, 168 G AUSS, 186f.
of S ACHER, 163 higher–order, 189f.
ratio–type, 193f. L APLACE, 186f.
spline approach, 208f. logistic, 186f.
wavelet approach, 209f. quadratic, see kernel, E PANECHNIKOV
hazard rate function, see hazard rate quartic, see kernel, biweight
hazard rate model rectangular, see kernel, uniform
constant, 13 second order, 186
exponential, 14 semi–elliptical, 187f.
linear, 13 triangular, 186f.
power, 14 tricube, 187f.
hazard–quantile estimator, 216 triquadratic, see kernel, triweight
H ERD /J OHNSON estimator, 144 triweight, 187f.
HJE, see Herd/Johnson estimator uniform, 178, 186f.

kernel estimator mean residual life function, see mean resid-


local, 190f., 206ff. ual life
nearest neighbor, 191 mean squared error, 181
of a HR, 179f. integrated, 181
of a PDF, 178ff. mean time to failure, see mean life
properties, 180ff. median life, 6
variable, 191, 206, 207 M ELLIN transform, 4
with optimal local bandwidth, 208 MIAE, see mean integrated absolute error
kernel smoothing, 177ff. MISE, see mean integrated squared error
KME, see Kaplan/Meier estimator missing observation, 41
mixture model, 72, 94
L APLACE transform, 4 continuous, 28ff.
leave–one–out density estimator, 193 finite, 28ff.
life expectancy of an x–survivor, see mean MLE, see maximum likelihood estimator
residual life moment
life potential, 46ff. raw, 7
scaled, 47f. moment generating function, 4
life table, 152ff. MRL, see mean residual life
abridged, 153 MSE, see mean squared error
lifetime, 3 MTTF, see mean life
early, 41, 42f.
NBU, see new better than used
future, 10, 17, 41f.
NBUE, see new better than used in expecta-
interim, 41, 43f.
tion
remaining, see lifetime, future
NBUHR, see new better than used in hazard
young, see lifetime, early
rate
lifetime distribution function, see distribu-
NBUHRA, see new better than used in haz-
tion function
ard rate average
likelihood function, 138
nearest neighbor, 205ff.
link function, 38
N ELSON /A ALEN estimator, see cumulative
log–linear, 38
hazard rate, direct estimator
location–scale family of distributions, 212ff.
new better than used, 91
new better than used in expectation, 92
maximum likelihood estimator
new better than used in hazard rate, 93
for hi , 143f.
new better than used in hazard rate average,
of the survival function, 143
93
mean, 8
new worse than used, 91
mean age of an x–survivor, 18
new worse than used in expectation, 92
mean future life of an x–survivor, see mean
new worse than used in hazard rate, 93
residual life
new worse than used in hazard rate average,
mean integrated absolute error, 182
93
mean integrated squared error, 181 number of units at risk, 140
mean life, 6 NWU, see new worse than used
mean residual life, 3, 17ff., 155 NWUE, see new worse than used in expec-
bathtub–shaped, 84ff. tation
classes of distributions, 84ff. NWUHR, see new worse than used in hazard
decreasing, 84ff. rate
increasing, 84ff. NWUHRA, see new worse than used in haz-
joint multivariate, 64 ard rate average
of discrete lifetime, 52f.
proportional, 22 order statistics, 72, 227ff.
upside–down bathtub–shaped, 84ff. exponential distribution, 230

uniform distribution, 229 structure function, 31ff.


survival function, 3, 6ff.
PDF, see failure density baseline, 38
percentile function, 6 conditional, 10
PLE, see product–limit estimator discrete, 49
plotting position, 215 joint, 60
PMF, see probability mass function system
potential bridge, 35
cumulative hazard rate, 48 coherent, 32f., 72, 94
density function, 47 k–out–of–m, 34
distribution function, 47 parallel, 32
hazard rate, 48 reliability, 35f.
survival function, 47 series, 31
probability density function, see failure den- state vector, 31
sity
probability generating function, 4 test of P ROSCHAN /P YKE, 243
probability integral transformation, 27 testing for DMRL/IMRL, 252f.
probability mass function, 49 testing for HNBUE/HNWUE, 255f.
joint, 60 testing for IHRA/DHRA, 249f.
product–limit estimator, 143, 154 testing for NBU/NWU, 251f.
proportional hazards model, 39 testing for NBUE/NWUE, 254f.
ties, 139, 178f.
quantile function, see percentile function total–time–on–test, 47
total–time–on–test transform, 4, 47
radix, 153 truncation, 41
rate function, see hazard rate double, 41
redistribution to the right, 148 from above, see truncation, right
reliability from below, see truncation, left
of a coherent system, 36 from below and above, see truncation,
of a parallel system, 36f. double
of a series system, 36f. left, 41
time dependent of a system, 37 lower, see truncation, left
reliability function, see survival function lower point of, 41
retro hazard, see hazard rate, reversed on both sides, see truncation, double
reverse hazard rate, see hazard rate, reversed right, 41
reverse rank, 144, 215 upper, see truncation, right
R IEMANN’s zeta function, 133 upper point of, 41
TTT–plot, 234
safe life, 5 TTT–statistic, 233
sample maximum, 228 scaled, 234
sample minimum, 228 successive, 233
score vector, 143 TTT–transform, 235
shelf–aging, 3, 5 scaled, 235
sliding histogram, 178
spacing, 231 updating formula, 153
exponential distribution, 232 urn model, 121, 127
normalized, 233 variable
uniform distribution, 231 reduced, 212
spacings, 72 standardized, 212
standard life table estimator, see actuarial es- variance, 8
timator
stochastically larger (smaller), 69 window width, see bandwidth
Included MATLAB-programs
The PDF–file of the monograph “The Hazard Rate — Theory and Inference” is supplemented with two
ZIP–files containing MATLAB–programs. You should have version 7.4.0 (R 2007a) or higher of MATLAB
to run these programs successfully.

First ZIP–file

Distributions.zip, which should be extracted into a new directory — perhaps named ‘Distributions’ —
contains the programs creating plots of the density function (or the probability mass function), the survival
function, the hazard rate and the mean residual life function of 62 continuous and 32 discrete distributions.
The programs are menu–driven. To invoke continuous (discrete) distributions type ContDist (DiscDist) into
the Command Window after you have switched to the directory mentioned above. After having chosen
a distribution you will see a picture with the formula of the PDF (PMF) together with the domain of
its parameters. After your parameter–input has been checked you will see a plot of the four functions
mentioned above. You can repeat with another set of parameter values for the same distribution or you can
go to another distribution.

Second ZIP–file

Inference.zip, which should be extracted into a new directory — perhaps named ‘Inference’ — contains
12 programs HAZARD xx intended to do estimation and testing as described in Part II of this monograph.
Here is information on the Hazard–programs.

HAZARD 01 — This program computes the pointwise hazard rate by maximum–likelihood, the survival
function according to Kaplan/Meier and the cumulative hazard rate according to Nelson/Aalen, all
functions with 95%–confidence limits. The relevant formulas are in Chapter 5 of the monograph.
Input: A sample of non–grouped data stored in the Workspace as a (n × 2)–matrix named y. The first
column is for the observations in arbitrary order, the second column is for the corresponding censoring
indicator: 1 for an uncensored observation, 0 for a censored observation.
Output: A table with the numerical results and a figure showing the three estimated functions with their
95%–confidence limits.
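
The input layout described above can be illustrated with a minimal Python sketch. It is not the HAZARD 01 program itself; it only assumes the textbook product–limit and cumulative–hazard recursions, and the function name `product_limit_table` is hypothetical. Confidence limits are left out.

```python
def product_limit_table(y):
    """Return rows (t, at_risk, failures, hazard, survival, cum_hazard).

    y is a list of (observation, indicator) pairs in arbitrary order,
    with indicator 1 = uncensored and 0 = censored.
    """
    # Distinct uncensored observation times in ascending order.
    failure_times = sorted({t for t, d in y if d == 1})
    survival, cum_hazard = 1.0, 0.0
    table = []
    for t in failure_times:
        # Units still under observation just before t (censored ties count).
        at_risk = sum(1 for u, _ in y if u >= t)
        failures = sum(1 for u, d in y if u == t and d == 1)
        hazard = failures / at_risk      # pointwise hazard estimate
        survival *= 1.0 - hazard         # product-limit step
        cum_hazard += hazard             # cumulative hazard step
        table.append((t, at_risk, failures, hazard, survival, cum_hazard))
    return table


# Illustrative data only: four observations, one of them censored.
y = [(7, 1), (3, 1), (5, 0), (9, 1)]
for row in product_limit_table(y):
    print(row)
```

The censored observation at 5 contributes to the number at risk at time 3 but drops out before time 7, which is why the second step divides by 2 rather than 3.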

HAZARD 02 — This program estimates a life table and all its functions according to the formulas in
Chapter 6 of the monograph.
Input: A (k × 3)–matrix named y has to be stored in the Workspace. The first column is for the lower class
limits in ascending order, the second column for the number of censored lifetimes in the corresponding class,
the third column for the number of uncensored lifetimes in the corresponding class. The program asks you for the
sample size n and the number k of classes.
Output: A life table with 12 columns and k rows and a figure displaying the histogram, the survival function
and the hazard rate.
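
As a rough illustration of what such a table contains, the following Python sketch builds a few life–table columns from the three input columns. It assumes the common actuarial convention that censored units count as half at risk within their class; the formulas of Chapter 6 may differ in detail, and the name `actuarial_life_table` is hypothetical.

```python
def actuarial_life_table(lower_limits, censored, failed, n):
    """Return rows (lower_limit, entering, effective, q, survival, hazard).

    survival is the estimated survival function at the lower class limit;
    hazard is None for the last class, whose width is unknown.
    """
    rows = []
    entering, survival = n, 1.0
    k = len(lower_limits)
    for j in range(k):
        c, d = censored[j], failed[j]
        # Assumption: censored units are at risk for half the class on average.
        effective = entering - c / 2.0
        q = d / effective if effective > 0 else 0.0
        hazard = None
        if j + 1 < k:
            width = lower_limits[j + 1] - lower_limits[j]
            denom = width * (effective - d / 2.0)
            hazard = d / denom if denom > 0 else None
        rows.append((lower_limits[j], entering, effective, q, survival, hazard))
        survival *= 1.0 - q
        entering -= c + d
    return rows


# Illustrative data only: three classes, sample size 10.
for row in actuarial_life_table([0, 10, 20], [0, 2, 0], [2, 2, 4], 10):
    print(row)
```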

HAZARD 03 — Maximum likelihood estimation of an increasing hazard rate for a continuous distribution
according to Chapter 7.
Input: A (n×2)–matrix named y has to be stored in the Workspace. The first column is for the observations
in ascending order, the second column for the corresponding censoring indicator: 1 for an uncensored
observation, 0 for a censored observation.
Output: A table showing the uncensored observations and the hazard rate, which is constant between any
two adjacent uncensored observations, and a figure displaying the graphs of the estimated hazard rate, the
density function and the survival function.

HAZARD 04 — Maximum likelihood estimation of a decreasing hazard rate for a continuous distribution
according to Chapter 7.
Input: same as in HAZARD 03
Output: same as in HAZARD 03

HAZARD 05 — Maximum likelihood estimation of the hazard rate for a discrete distribution with realiza-
tions x = 1, 2, 3, . . . according to Chapter 7.
Input: A vector y with the counts for each realization has to be stored in the Workspace. The program asks
whether the hazard rate is to be increasing or decreasing.
Output: A table showing the estimated hazard rate, the probability mass function and the survival function
and a figure displaying the graphs of the three functions.

HAZARD 06 — User–supplied fixed–bandwidth kernel estimation of the hazard rate with one out of four
kernels and corresponding boundary kernel
Input: A (n × 2)–matrix named y to be stored in the Workspace. The first column is for the — not
necessarily ordered — observations, the second column for the corresponding censoring indicator: 1 for
an uncensored observation, 0 for a censored observation. The program asks for the number of gridpoints
in the plot of the smoothed hazard rate, for the kernel to be used (uniform, Epanechnikov, biweight or
triweight) and for a bandwidth. (The first bandwidth is set automatically.)
Output: List of all bandwidths chosen (maximum number: 20), plot of the pointwise cumulative hazard
rate and plot of each smoothed hazard rate with 95%–confidence limits.
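
The idea of fixed–bandwidth kernel smoothing can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it smooths the increments of the cumulative hazard rate with an Epanechnikov kernel, omits the boundary kernel and the confidence limits, and uses the hypothetical names `epanechnikov` and `kernel_hazard`.

```python
def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_hazard(y, grid, b):
    """Smooth the cumulative-hazard increments with a fixed bandwidth b.

    y is a list of (observation, indicator) pairs, indicator 1 = uncensored.
    Assumption: no boundary correction is applied near the ends of the data.
    """
    # Sort by time; at ties the uncensored observation comes first.
    rows = sorted(y, key=lambda r: (r[0], -r[1]))
    n = len(rows)
    estimates = []
    for t in grid:
        total = 0.0
        for i, (x, d) in enumerate(rows):
            if d == 1:
                # n - i units are at risk at the i-th ordered observation.
                total += epanechnikov((t - x) / b) / (n - i)
        estimates.append(total / b)
    return estimates


# Illustrative data only.
y = [(1, 1), (2, 1), (3, 0), (4, 1)]
print(kernel_hazard(y, [1.5, 2.0, 2.5], b=1.0))
```

A larger bandwidth b gives a smoother but more biased curve, which is why the program lets the user try several bandwidths in one session.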

HAZARD 07 — Local kernel estimation of the hazard rate with one out of four kernels and corresponding
boundary kernel
Input: A (n × 2)–matrix named y to be stored in the Workspace. The first column is for the observations
in ascending order, the second column for the corresponding censoring indicator: 1 for an uncensored
observation, 0 for a censored observation. There must be no ties among the uncensored observations and
none among the censored observations, but ties between an uncensored and a censored observation are
allowed. In this case the uncensored observation precedes the censored observation. The program asks for
the number of gridpoints in the plot of the smoothed hazard rate, for the kernel to be used (uniform,
Epanechnikov, biweight or triweight) and for a parameter specifying the k–nearest neighbor.
Output: Plot of the pointwise cumulative hazard rate and plot of the smoothed hazard rate with 95%–
confidence limits.

HAZARD 08 — Variable kernel estimation of the hazard rate with one out of four kernels and correspond-
ing boundary kernel
Input: A (n × 2)–matrix named y to be stored in the Workspace. The first column is for the observations in
ascending order, the second column for the corresponding censoring indicator: 1 for an uncensored
observation, 0 for a censored observation. There must be no ties among the uncensored observations and
none among the censored observations, but ties between an uncensored and a censored observation are
allowed. In this case the uncensored observation precedes the censored observation. The program asks for
the number of gridpoints in the plot of the smoothed hazard rate, for the kernel to be used (uniform,
Epanechnikov, biweight or triweight) and for a parameter specifying the neighborhood.
Output: Plot of the pointwise cumulative hazard rate and plot of the smoothed hazard rate with 95%–
confidence limits.

HAZARD 09 — Graphical check for constancy of the hazard rate


Input: One of the following, named y and stored in the Workspace:
an (n × 1)–matrix of observations, or
a (k × 1)–matrix of singly censored observations in ascending order, where the program asks for the
sample size n, or
an (n × 2)–matrix of multiply or randomly censored observations in ascending order, with the observations
in the first column and the corresponding censoring indicator in the second column: 1 for an uncensored
observation, 0 for a censored observation.
Output: A figure with the probability plot and the TTT–plot
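
For an uncensored sample, the points of a TTT–plot can be computed as in the following Python sketch. It assumes the usual definition of the scaled total–time–on–test statistic and covers only the complete–sample case; the function name `scaled_ttt` is hypothetical.

```python
def scaled_ttt(x):
    """Return the TTT-plot points (i/n, TTT_i / TTT_n) for a complete sample."""
    xs = sorted(x)
    n = len(xs)
    ttt, previous, successive = 0.0, 0.0, []
    for i, value in enumerate(xs):
        # n - i units are still on test during the i-th spacing.
        ttt += (n - i) * (value - previous)
        previous = value
        successive.append(ttt)
    total = successive[-1]
    return [((i + 1) / n, t / total) for i, t in enumerate(successive)]


# Illustrative data only.
print(scaled_ttt([4.0, 1.0, 2.0]))
```

With a constant hazard rate the points scatter around the diagonal of the unit square. A pattern lying above the diagonal and concave points to an increasing hazard rate, and one lying below and convex to a decreasing hazard rate.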

HAZARD 10 — This program tests for IHR or DHR using the procedures of Klefsjö, Epstein and
Proschan/Pyke depending on whether the sample is singly censored or uncensored.
Input: A column vector y to be stored in the Workspace with ascendingly ordered observations. The
program asks you to enter r, the number of observations, and n, the sample size. For r = n we have an
uncensored sample, for r < n a singly censored sample.
Output: TTT–plot and the test statistics with critical values.

HAZARD 11 — This program tests for bathtub–shape or inverted bathtub–shape of the hazard rate using
the procedures of Bergman and Aarset.
Input: A column vector y to be stored in the Workspace with n ascendingly ordered and uncensored
observations. The program asks whether you want to test for bathtub–shape or for inverted bathtub–shape.
Output: TTT–plot and test statistics of Bergman and Aarset together with critical values.

HAZARD 12 — This program tests for aging classes other than IHR and DHR
Input: A column vector y to be stored in the Workspace with n ascendingly ordered and uncensored obser-
vations. The program asks you what aging class you want to test for.
Output: TTT–plot and test statistic for the chosen class together with critical values.
