Hazard Rate Theory and Inference
Horst Rinne
© Prof. em. Dr. Horst Rinne
Department of Economics and Management Science
Chair of Econometrics and Statistics
Justus–Liebig–University, D 35394 Giessen, Germany
Preface
When we look at biological organisms like human beings, animals, plants or at technical devices
like motorcars, aircraft, television sets and parts thereof or at economic and socio–economic
units like enterprises, corporations, labor unions or at social units like families, parties or even
states, we observe that in every moment of their existence we will find them in a well–defined
state. A patient, after some medical treatment, may be alive, an adult person may be out of work,
a piece of machinery may either be down or functioning, a labor union may be on strike or a
state may be at war with some other state. The sojourn–time in a given state for such a unit ends
by the occurrence of some random event. In this book, the time–to–event since entering into a
given state will generally be called lifetime and the terminating event will be called failure. Since
the terminating event is random the lifetime is a random variable with realizations which — in
general — will be non–negative.
The main body of this monograph is given by Parts I and II. The final Part III, entitled ‘Appen-
dices’, gives the usual ingredients of a scientific opus: the bibliography and an author index as
well as a subject index. Here one will also find information about the MATLAB–programs which
have been written to facilitate practical working with the methods and procedures described in
the monograph. Part I is descriptive in the sense that it gives definitions for the various functions
of the variate ‘lifetime’ and explores how these functions are related to one another. This is done
in Chapter 1 where we also differentiate between the univariate and the multivariate cases and
between the continuous and the discrete cases. Chapter 2 introduces several classes of lifetime
distributions with respect to aging. Chapter 3 is devoted to univariate parametric distributions and
enumerates important continuous and discrete distributions known in probability theory. Here the
reader will find the formulas for the four most important representatives of a lifetime variate: its
probability density in the continuous case or its probability mass function in the discrete case, its
survival function, its hazard rate function and its mean residual life function. In order to have a
graph of these four functions for any set of the function parameters the reader can revert to the
MATLAB–programs stored in ‘Distributions.zip’ and described in Part III.
Part II is devoted to the inference of the hazard rate. The focus is on non–BAYESIAN and non–
parametric inferential procedures of — unless stated otherwise — univariate continuous lifetime
distributions. The non–parametric approach to estimate hazard rates from lifetime data is flexible,
model–free and data–driven. No shape assumption is imposed other than that the hazard rate is a
smooth function, or occasionally in Chapter 7, a monotone function. Such an approach typically
involves smoothing of an initial and discrete hazard rate estimate, with arbitrary choice of the
smoother.
In Chapters 5 and 6 we present estimation techniques to find such initial estimates for non–
grouped and grouped data after having introduced — in Chapter 4 — sampling techniques for
lifetime data with the pertaining denotation of the quantities coming up. The core chapter of
Part II is Chapter 8 presenting smoothing techniques. Emphasis here is on smoothing with kernels,
a technique that is most elaborated, explored and used in practice, but we also look at some other
techniques.
Chapter 9 on hazard plotting is an exception from the non–parametric nature of this second part
when we plot on special graph paper. For each location–scale distribution we can design an
especially scaled grid so that the data points will lie on a straight line when coming from that
distribution. So the graph is a means of testing for a special distribution. Estimates of the pertain-
ing parameters of this distribution can be found as special hazard quantiles. Hazard plotting thus
serves as an instrument for estimating and testing and is the bridge to the testing procedures of
Chapter 10.
Testing procedures in Chapter 10 are not devoted to hypotheses on parameters of some parametric
lifetime distribution, but they are concerned with the presence or absence of certain aging prop-
erties. These properties have been described in Chapter 2 and will be tested here by means of
non–parametric methods. The testing of the no–aging property (= constant hazard rate) may be
seen and taken as testing for exponentially distributed lifetime and thus is parametric. Most of
the testing procedures are of numerical type, but with the total–time–on–test plot we also have
a graphical approach. After presentation of the prerequisites like order statistics, spacings and
TTT–transform we test for properties of the hazard rate, i.e., its shape and behavior, and we test
for aging classes.
Contents
Preface
List of Figures
List of Tables
5 Hazard Rate Estimation and the KAPLAN/MEIER and NELSON/AALEN Approaches
5.1 Estimating the Hazard Rate and the Survival Function
5.2 Estimating the Cumulative Hazard Rate
Bibliography
List of Figures
3/1 PDF-formula display of the DHILLON–II distribution by the program ContDist
3/2 Display of the functions of a DHILLON–II distribution by the program ContDist
3/3 PMF-formula display of the WEIBULL type I distribution by the program DiscDist
3/4 Display of the functions of a WEIBULL type I distribution by the program DiscDist
4/1 Illustration of the numbers $c_i$, $d_i$, $n_i$ and the failure times $x_i$ on the time axis (non–grouped data)
4/2 Illustration of the numbers $c_j$, $d_j$, $n_j$ for a divided time axis (grouped data)
5/1 Estimated hazard rate and survival function with pointwise 95%–confidence intervals for the 21 leukaemia–patients’ data
5/2 Estimated CHR with pointwise 95%–confidence intervals; left part: indirect estimates; right part: direct (NELSON/AALEN) estimates
7/1 ML–estimates for a continuous distribution with increasing hazard rate
7/2 ML–estimates for a continuous distribution with decreasing hazard rate
7/3 ML–estimates of an increasing discrete hazard rate
10/1 Two ways of expressing the total time spent on test
10/2 TTT–plots based on simulated exponential data (b = 5; n = 10, 50, 100) and the scaled TTT–transform of the exponential distribution
10/3 Graphs for judging exponentiality
10/4 Scaled TTT–transforms of three WEIBULL distributions
10/5 Scaled TTT–transforms of lognormal and power function distributions
10/6 TTT–plot for a data set coming from a DIHR distribution
10/7 TTT–plot of a data set coming from an IHR distribution
10/8 TTT–plot for the 43 granulocytic leukemia patients
List of Tables
1/1 Relations among the six functions describing a continuously distributed stochastic lifetime
1/2 Relations among the six functions describing a discrete stochastic lifetime
5/1 Estimates of $h_i$ and $S(x_i)$ for the 21 leukaemia patients’ data
5/2 Estimates of $H(x_i)$ and its variances for the 21 leukaemia patients’ data
6/1 Extraction from the German life table 2000 – 2002 for males
6/2 Lay–out of a non–demographic life table
6/3 Data for life table estimation
6/4 Estimates (with variances) of life table quantities
1. the probability density function f (x), abbreviated by PDF, density function for short, also
known as failure density, or as failure rate;
2. the cumulative distribution function F (x), abbreviated by CDF, distribution function for
short, also known as failure function or as lifetime distribution function;
3. the survival function S(x), abbreviated by CCDF, also known as reliability function;
4. the hazard rate function h(x), abbreviated by HR, hazard rate for short, also known as
instantaneous failure rate or as force of mortality;
5. the cumulative hazard rate H(x), abbreviated by CHR, also known as integrated hazard
rate;
6. the mean residual life function µ(x), abbreviated by MRL, also known as life expectancy
of an x–survivor or as mean future life of an x–survivor.
The origins of some of these functions date back to the 17th century when the first life tables came
into existence, whereas their application in the engineering sciences and in the life sciences only
started in the 1950’s. Each of these six functions completely describes the distribution of lifetime,
and any of these functions determines the other five, see Tab. 1/1 at the end of Sect. 1.1.1.6. The
six functions are answering different questions with respect to the lifetime variable. The choice
further depends on whether
¹ Suggested reading for this section: BAIN/ENGELHARDT (1991, Chapter 1), LEEMIS (1986; 1995, Chapter 3),
RINNE (2009, Chapter 2), SMITH (2002, Chapters 1 and 2).
4 1 The Hazard Rate and its Relatives
The six representatives are not the only ways to define the distribution of a random variable X.
Other concepts include, e.g.:
• the characteristic function $E\bigl(e^{\,i s X}\bigr)$, $i := \sqrt{-1}$,
• the total–time–on–test transform $\int_0^{F^{-1}(P)} S(x)\,dx$, $0 \le P \le 1$.
The six representatives used here have been chosen because of their special meaning for lifetime
data, for their intuitive appeal, for their usefulness in lifetime data analysis, and — last but not
least — for their popularity in probability theory and in statistics.
The first lifetime distribution representative to be described is the failure density PDF — or in a
more general context — the probability density function, defined as
$$f(x) := \lim_{\Delta \to 0} \frac{\Pr\left(x - \frac{\Delta}{2} < X \le x + \frac{\Delta}{2}\right)}{\Delta}. \tag{1.1a}$$
Thus, for ∆ small the product f(x) ∆ approximates the probability of failure in the time interval
(x − ∆/2, x + ∆/2] or, roughly speaking, the probability of failure around an age of x. The
probability of reaching an age between $x_l$ and $x_u$, $x_l < x_u$, is
$$\Pr(x_l \le X \le x_u) = \int_{x_l}^{x_u} f(x)\,dx. \tag{1.1b}$$
Especially for a newly born organism or creature or a just produced unit, i.e., for a unit starting
at age x = 0, the probability to fail up to an age x > 0 is given by
$$\Pr(X \le x) = \int_0^x f(u)\,du. \tag{1.1c}$$
For varying x formula (1.1c) gives the lifetime distribution function, see (1.2a).
1.1 The Univariate Continuous Case 5
Theorem 1: All probability density functions for the variate ‘lifetime’ must satisfy two conditions:
1. $f(x) \ge 0 \;\; \forall\, x \ge 0$, (1.1d)
2. $\int_0^\infty f(x)\,dx = 1$. (1.1e)
Remarks:
1. When X has a parametric distribution with a shift parameter a ∈ R, (1.1d,e) turn into
$$f(x) \ge 0 \;\; \forall\, x \ge a \quad \text{and} \quad \int_a^\infty f(x)\,dx = 1.$$
a is called safe life when a > 0, i.e., failing before the age a is impossible. When a < 0
we have shelf–aging.
2. Usually, when describing a particular PDF only its non–zero part will be explicitly stated,
and it should be understood that PDF is zero over any unspecified region of R.
3. Many characteristics such as age, length, weight etc. are true continuous variables, at least
conceptually, although it could be said that due to the physical limitations of measuring
devices, the characteristic can be observed only as a discrete variable. However, the measurement
restrictions are usually insignificant compared to other sources of error, and the continuous
model is mathematically and conceptually much more convenient.
4. For continuous variables we have the following numerical equivalences:
Pr(xl ≤ X ≤ xu ) = Pr(xl < X ≤ xu ) = Pr(xl ≤ X < xu ) = Pr(xl < X < xu ),
i.e., the probability of failing within a given interval is the same whether we include or
exclude none, one or both of its end–points.
Generally, for lifetime X the density function is positively skewed (skewed to the right or steep
on the left–hand side), see Fig. 1/1 below. Thus, f (x) has a flat and relatively long right–hand
tail, meaning that longer lifetimes are less probable than shorter lifetimes and that the mean life
(life expectancy) is greater than the median life, see Sect. 1.1.1.2.
The second lifetime distribution representative is the failure function or lifetime function CDF,
defined as
F (x) := Pr(X ≤ x), x ≥ 0, (1.2a)
giving the probability of failing up to age x or of having a life span of at most length x.
Theorem 2: Any function F(x) may be the CDF of a lifetime variable if it satisfies the following
properties:
1. $\lim_{x \to 0} F(x) = 0$, (1.2b)
Remarks:
0 ≤ F (x) ≤ 1. (1.2f)
Because F(x) is a monotone and increasing² function, see Fig. 1/1 below, the inverse function
$F^{-1}(\cdot)$ exists and is called percentile function or quantile function:
$$F(x_P) = P \;\Longrightarrow\; x_P = F^{-1}(P), \quad 0 \le P \le 1. \tag{1.3}$$
$x_P$ is called the percentile or quantile of order P. The special percentile $x_{0.5}$ is called median
life, i.e., there are equal chances of failing before or surviving beyond the age $x_{0.5}$. Because of
the positive skewness of most lifetime densities the median life is more popular than the mean
life µ := E(X) in measuring the central tendency by a single number. For positive skewness of
PDF we find $x_{0.5} < \mu$.
Another lifetime distribution representative is CCDF, the survival function or reliability func-
tion, defined as
S(x) := Pr(X > x), x ≥ 0, (1.4a)
indicating the probability of surviving an age of x or becoming older than x. From (1.2a) and
(1.4a) we see that the lifetime distribution and the survival function are complementary functions:
$$S(x) = 1 - F(x). \tag{1.4b}$$
Thus, S(x) is the probability of exceeding x and F(x) is the probability of not exceeding x, or,
stated for a technical unit, S(x) gives the probability of its functioning at time x and F(x) is
the probability of its being down at time x. PDF and CCDF are related as
$$f(x) = -\frac{d}{dx}\,S(x) \quad \text{and} \tag{1.4c}$$
$$S(x) = \int_x^\infty f(u)\,du. \tag{1.4d}$$
The study of S(x) is at the heart of survival analysis and reliability theory. The survival function
is important in describing systems of components, i.e., in calculating systems’ reliability, see
Sect. 1.1.2.3.
2
In this text we always use increasing in the sense of non–decreasing, and decreasing has the meaning of
non–increasing.
From (1.4d) we can establish the following relations, simply because PDFs integrate to one:
$$S(0) = \int_0^\infty f(u)\,du = 1. \tag{1.4e}$$
Furthermore,
$$S(\infty) = \lim_{x \to \infty} S(x) = \lim_{x \to \infty} \int_x^\infty f(u)\,du = 0. \tag{1.4f}$$
Finally, for $x_u \ge x_l$:
$$S(x_l) - S(x_u) = \int_{x_l}^{x_u} f(u)\,du \ge 0. \tag{1.4g}$$
if and only if
$$\lim_{x \to \infty} x^{k+1} f(x) = 0 \quad \text{for } k < \infty. \tag{1.5b}$$
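Proof of Theorem 4: Integrating by parts the last term on the right–hand side of (1.5a) gives
$$\begin{aligned}
\int_a^\infty S(x)\,k\,x^{k-1}\,dx &= \Bigl[S(x)\,x^k\Bigr]_a^\infty - \int_a^\infty \bigl(-f(x)\bigr)\,x^k\,dx \\
&= \Bigl[\lim_{x \to \infty} S(x)\,x^k - S(a)\,a^k\Bigr] + \int_a^\infty x^k f(x)\,dx \\
&= \lim_{x \to \infty} \Bigl[S(x)\,x^k\Bigr] - a^k + \mu'_k, \quad \text{because } S(a) = 1.
\end{aligned}$$
Applying once L’HOSPITAL’s rule to the indeterminate form $\lim_{x \to \infty} \bigl[S(x)\,x^k\bigr]$ gives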
1. the mean, often called mean time to failure and abbreviated by MTTF,
$$\mu := \mu'_1 := E(X) = \int_0^\infty S(x)\,dx; \tag{1.5c}$$
(Therefore, to find the average lifespan, we integrate the survival function over its support,
i.e., the mean life is equal to the area beneath the survival function.) and
2. the variance
$$\sigma^2 := \mathrm{Var}(X) = E\bigl(X^2\bigr) - \mu^2 = 2 \int_0^\infty x\,S(x)\,dx - \left(\int_0^\infty S(x)\,dx\right)^2. \tag{1.5d}$$
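As a quick plausibility check of (1.5c) and (1.5d), consider an exponentially distributed lifetime with rate λ > 0, i.e., $S(x) = e^{-\lambda x}$. This distribution is chosen here purely for illustration:
$$\mu = \int_0^\infty e^{-\lambda x}\,dx = \frac{1}{\lambda}, \qquad \sigma^2 = 2 \int_0^\infty x\,e^{-\lambda x}\,dx - \left(\frac{1}{\lambda}\right)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2},$$
which agrees with the well–known mean and variance of the exponential distribution.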
Example 1/1: PDF, CDF, and CCDF of the linear hazard rate distribution
A distribution with the linear hazard rate h(x) = a + b x; x ≥ 0, a ≥ 0, b > 0; has:
$$f(x) = (a + b\,x)\,\exp\left\{-a\,x - \frac{b}{2}\,x^2\right\},$$
$$F(x) = 1 - \exp\left\{-a\,x - \frac{b}{2}\,x^2\right\},$$
$$S(x) = \exp\left\{-a\,x - \frac{b}{2}\,x^2\right\}.$$
Fig. 1/1 shows the functions f(x), F(x), and S(x) for a = 0 and b = 1, which is nothing but the reduced
RAYLEIGH distribution, a special case of the WEIBULL distribution.
Figure 1/1: PDF, CDF, and CCDF of the linear hazard rate distribution with a = 0 and b = 1
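$$x_{\text{mode}} = -\frac{a}{b} + \sqrt{\frac{1}{b}}$$
$$x_P = -\frac{a}{b} + \sqrt{\left(\frac{a}{b}\right)^2 - \frac{2}{b}\,\ln(1 - P)}, \quad 0 \le P \le 1$$
$$\mu_r = E\bigl(X^r\bigr) = \sum_{i=0}^{\infty} \frac{(-b/2)^i}{i!} \left[\frac{\Gamma(2\,i + r + 1)}{a^{2 i + r}} + b\,\frac{\Gamma(2 + r + 2\,i)}{a^{2 i + r + 2}}\right]$$
In the special case a = 0 we have $\mu_r = \Gamma(1 + r/2) \big/ (b/2)^{r/2}$.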
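The quantities of Example 1/1 are easy to evaluate numerically. The following Python sketch checks that F(x_P) = P and that, for a = 0 and b = 1, the area under S(x) reproduces the mean √(π/2) of the reduced RAYLEIGH distribution. The function names (`lhr_sf`, `lhr_quantile`, `trapezoid`) and the parameter values are illustrative assumptions, and the code is not one of the MATLAB programs mentioned in the Preface.

```python
import math

def lhr_sf(x, a, b):
    """Survival function S(x) = exp(-a*x - b*x^2/2) of the linear hazard rate model."""
    return math.exp(-a * x - 0.5 * b * x * x)

def lhr_pdf(x, a, b):
    """Density f(x) = h(x) * S(x) with h(x) = a + b*x."""
    return (a + b * x) * lhr_sf(x, a, b)

def lhr_quantile(p, a, b):
    """Percentile x_P solving F(x_P) = P, for 0 <= P < 1."""
    return -a / b + math.sqrt((a / b) ** 2 - (2.0 / b) * math.log(1.0 - p))

def trapezoid(func, lo, hi, n=20000):
    """Composite trapezoidal rule on [lo, hi] with n sub-intervals."""
    step = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * step)
    return total * step

a, b = 0.5, 2.0                      # illustrative parameter values
median = lhr_quantile(0.5, a, b)
cdf_at_median = 1.0 - lhr_sf(median, a, b)

# Mean as the area under S(x), see (1.5c); the cut-off at 12 drops a negligible tail.
mean_rayleigh = trapezoid(lambda x: lhr_sf(x, 0.0, 1.0), 0.0, 12.0)
mean_exact = math.sqrt(math.pi / 2.0)  # known mean of the reduced Rayleigh case
```

The trapezoidal rule is used only to keep the sketch free of third-party dependencies; any quadrature routine would serve.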
The reliability (survival) function examines the chance that breakdowns of organisms, of technical
units etc. occur beyond a given point in time. To monitor the lifetime of a unit across the support
of its lifetime distribution, the hazard rate h(x) is used.
In fact, the hazard rate usually is more informative about the underlying mechanism of failure than
the other representatives of a lifetime distribution. For this reason, consideration of the hazard rate
may be the dominant method for summarizing survival data. COX/OAKES (1984, p. 16) give the
following reasons why consideration of the hazard rate may be a good idea:
The hazard rate is perhaps the most popular of the six representatives for modeling and analyzing
lifetime data. This is due to its intuitive interpretation as the amount of risk to fail associated with
a unit at age x. Another reason for its popularity is that it is a special case of the intensity function
for a non–homogeneous P OISSON process. A hazard rate function models the occurrence of only
one, namely the first event (= failure), whereas the intensity function models the occurrence of a
sequence of events over time.
The hazard rate goes by several aliases.
• In vital statistics and in the life sciences it is known as the age–specific death rate.
3
This name gives reason for confusion with the failure density.
• In point process and extreme value theory it is known as the rate function or intensity
function.
The hazard rate can be derived using the concept of conditional probability. Let A and B be two
random events with Pr(A) > 0, then the probability of the conditional event B | A (= event B
happens, given event A has happened) is defined as
$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}, \tag{1.6a}$$
where A ∩ B means that events A and B happen simultaneously. Now let A := ‘X > x’
(= lifetime is greater than x) and B := ‘X > x + y’, then, evidently, A ∩ B = ‘X > x + y’.
As Pr(A) = Pr(X > x) = S(x) and Pr(A ∩ B) = Pr(B) = Pr(X > x + y) = S(x + y) the
conditional survival probability according to (1.6a) results as
$$\Pr(X > x + y \mid X > x) = \frac{S(x + y)}{S(x)}. \tag{1.6b}$$
The conditional event ‘X > x + y | X > x’ can be transformed to ‘X − x > y | X > x’ and the
corresponding conditional variate
$$Y \mid X > x \;:=\; X - x \mid X > x$$
has the survival function
$$S(y \mid X > x) = \frac{S(x + y)}{S(x)} \tag{1.6c}$$
which is called conditional survival function. Its complement is the conditional distribution
function
$$F(y \mid X > x) = 1 - \frac{S(x + y)}{S(x)} = \frac{S(x) - S(x + y)}{S(x)} = \frac{F(x + y) - F(x)}{1 - F(x)}. \tag{1.6d}$$
$$f(y \mid X > x) = \frac{dF(y \mid X > x)}{dy} = \frac{d}{dy}\,\frac{F(x + y) - F(x)}{1 - F(x)} = \frac{f(x + y)}{1 - F(x)} = \frac{f(x + y)}{S(x)}. \tag{1.6e}$$
4
This variate plays a role in Sect. 1.1.1.6 and is discussed in more detail under the heading of truncated lifetime
distributions in Sect. 1.1.2.5.
f (y | X > x) really is a density function, as the two conditions (1.1d,e) are fulfilled:
This is an approximation of an x–survivor’s chance to fail within the small time span ∆ adjacent
to x. Now, the hazard rate follows from (1.6e) and (1.7a) with ∆ → 0 :
$$= \lim_{\Delta \to 0} \frac{f(x + \Delta)}{S(x)} = \frac{f(x)}{S(x)}, \quad S(x) > 0. \tag{1.7b}$$
In other words, for a small increment in time, ∆, the conditional probability that an x–survivor
fails in the time interval (x, x + ∆] is roughly equal to the product h(x) ∆. Another possible
interpretation of h(x) is the rate at which failures occur per unit of time relative to the portion of
the population which has not yet failed.
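The interpretation of h(x) ∆ as an approximate conditional failure probability can be illustrated numerically. The sketch below uses the linear hazard rate model with a = 0 and b = 1 as an assumed example and evaluates Pr(x < X ≤ x + ∆ | X > x)/∆ for shrinking ∆; the helper names are hypothetical.

```python
import math

def sf(x, a=0.0, b=1.0):
    """Survival function of the linear hazard rate model, S(x) = exp(-a*x - b*x^2/2)."""
    return math.exp(-a * x - 0.5 * b * x * x)

def cond_failure_prob(x, delta, a=0.0, b=1.0):
    """Pr(x < X <= x + delta | X > x) = 1 - S(x + delta) / S(x)."""
    return 1.0 - sf(x + delta, a, b) / sf(x, a, b)

x = 1.5
hazard = 0.0 + 1.0 * x  # h(x) = a + b*x with a = 0, b = 1

# Conditional failure probability per unit of time for smaller and smaller increments.
deltas = (0.1, 0.01, 0.001)
ratios = [cond_failure_prob(x, d) / d for d in deltas]
errors = [abs(r - hazard) for r in ratios]
```

The entries of `errors` shrink as ∆ decreases, in line with the limit in the definition of the hazard rate.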
When we want to predict the chance of failure at age x for a newly born or produced unit having
F (x) as its CDF we have to use f (x), i.e., f (x) is an unconditional predictor for risk to fail at
x. When we know that a unit has survived up to x, we have to use h(x) which is a conditional
predictor. Comparing numerically f (x) to h(x) we notice:
• f (0) = h(0),
There is a fundamental difference between the hazard rate function h(x) and the conditional
failure density f (y | X > x).
1. h(x) is a function of x, the age reached, whereas f (y | X > x) is a function of the future
lifetime y following a given age x.
2. Both h(x) and f(y | X > x) are non–negative, but h(x) is not a density function as it is
not normalized; instead we have $\int_0^\infty h(x)\,dx = \infty$, see Theorem 5.
1. $h(x) \ge 0 \;\; \forall\, x \ge 0$, (1.7c)
2. $\int_0^\infty h(x)\,dx = \infty$. (1.7d)
= ln S(0) − ln S(∞)
= ln 1 − ln 0
= ∞.
The hazard rate measures the propensity to fail or to die depending on the age reached and it thus
plays a key role in characterizing the process of aging and in classifying lifetime distributions, see
Sect. 2. Generally, HR more precisely describes the stochastic regularity of the variate ‘lifetime’
than the positively skewed course of PDF or the monotone courses of CDF or CCDF. We will
distinguish between
• monotone hazard rates, either increasing, when the unit is wearing out with age, or decreas-
ing, when the unit is improving with age, and
• non–monotone hazard rates either U–shaped (= bathtub–shaped) as, e.g., is the case with
the age–specific death rate in human life tables, or having any other non–monotone course,
e.g., an inverted bathtub–shape.
It is easily possible to express the hazard rate of a population by its PDF, CDF, and CCDF:⁵
$$h(x) = \frac{f(x)}{\int_x^\infty f(u)\,du}, \tag{1.8a}$$
$$= \frac{F'(x)}{1 - F(x)}, \tag{1.8b}$$
$$= -\frac{S'(x)}{S(x)} = -\frac{d \ln S(x)}{dx}. \tag{1.8c}$$
Conversely, we may write the PDF, CDF, and CCDF of a population in terms of its HR. Integrating
in (1.8c) yields
$$\int_0^x h(u)\,du = \Bigl[-\ln S(u)\Bigr]_0^x = -\ln S(x) + \ln S(0) = -\ln S(x), \quad \text{because } S(0) = 1. \tag{1.9a}$$
⁵ It is also possible to write the moments $E\bigl(X^k\bigr)$ in terms of the hazard rate, see MUTH (1974).
so that
$$F(x) = 1 - \exp\left\{-\int_0^x h(u)\,du\right\}. \tag{1.9c}$$
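Formula (1.9c) also gives a practical recipe: any non–negative hazard function can be turned into a CDF by integrating it numerically. The following Python sketch does this with a simple trapezoidal rule and compares the result with the closed form for a linear hazard rate. The function names and parameter values are assumptions made for this illustration.

```python
import math

def cumulative_hazard(h, x, n=2000):
    """Trapezoidal approximation of the integral of h over [0, x]."""
    if x == 0:
        return 0.0
    step = x / n
    total = 0.5 * (h(0.0) + h(x))
    for i in range(1, n):
        total += h(i * step)
    return total * step

def cdf_from_hazard(h, x):
    """F(x) = 1 - exp(-integral of h over [0, x]), cf. (1.9c)."""
    return 1.0 - math.exp(-cumulative_hazard(h, x))

a, b = 0.5, 2.0  # illustrative parameters of a linear hazard h(u) = a + b*u

def linear_hazard(u):
    return a + b * u

x = 1.2
numeric = cdf_from_hazard(linear_hazard, x)
exact = 1.0 - math.exp(-a * x - 0.5 * b * x * x)

# A constant hazard of 0.3 should reproduce the exponential CDF.
numeric_const = cdf_from_hazard(lambda u: 0.3, 2.0)
exact_const = 1.0 - math.exp(-0.3 * 2.0)
```

For constant and linear hazards the trapezoidal rule is exact up to rounding; for other shapes the number of sub-intervals `n` controls the accuracy.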
Thus, the constant hazard rate model gives the exponential distribution.
2. The linear hazard rate model
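From
$$h(x) = a + b\,x \quad \forall\, x \ge 0, \; a \ge 0, \; b > 0,$$
we find, see Example 1/1:
$$f(x) = (a + b\,x)\,\exp\left\{-a\,x - \frac{b}{2}\,x^2\right\},$$
$$F(x) = 1 - \exp\left\{-a\,x - \frac{b}{2}\,x^2\right\},$$
$$S(x) = \exp\left\{-a\,x - \frac{b}{2}\,x^2\right\}.$$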
$$f(x) = e^x \exp\{-e^x + 1\},$$
$$F(x) = 1 - \exp\{-e^x + 1\},$$
$$S(x) = \exp\{-e^x + 1\}.$$
Based on h(x) = f(x)/S(x) it appears that the approximate unconditional probability of failure
in (x, x + dx], Pr(x < X ≤ x + dx) ≈ f(x) dx, is equal to the product of the probability of
surviving beyond x and the approximate conditional probability of failure in (x, x + dx]:
expressing the survival probability in terms of the future lifetime. Looking at
$S(x) = \exp\left\{-\int_0^x h(u)\,du\right\}$ in (1.9b) we see the survival probability expressed in terms of the past
lifetime.
$$rh(x) := \lim_{\Delta \to 0} \frac{\Pr(x - \Delta < X \le x \mid X \le x)}{\Delta} = \frac{f(x)}{F(x)}, \quad F(x) > 0. \tag{1.10a}$$
(1.10a) can be derived along the lines of (1.6a) – (1.7b) with A := ‘X ≤ x’ and B := ‘X ≤ x − y’, so that
A ∩ B = ‘X ≤ x − y’. From (1.10a) it is seen that rh(x) describes the probability of an immediate past
failure, given that the unit has already failed at time x, as opposed to the immediate future failure, given
that the unit has not failed at time x, described by h(x).
⁶ Newer papers on the topic are: CHANDRA/ROY (2001, 2005), GUPTA/NANDA (2001), KUNDU et al. (2009),
NANDA/GUPTA (2001, 2004), and SANKARAN et al. (2007).
h(x) and rh(x) are related as
$$rh(x) = h(x)\,\frac{S(x)}{1 - S(x)}, \quad S(x) < 1, \tag{1.10b}$$
$$h(x) = rh(x)\,\frac{F(x)}{1 - F(x)}, \quad F(x) < 1. \tag{1.10c}$$
Both rates are equal to one another for $x = x_{0.5}$, otherwise we have
$$h(x) \begin{cases} < rh(x) & \text{for } x < x_{0.5}, \\ > rh(x) & \text{for } x > x_{0.5}. \end{cases} \tag{1.10d}$$
The reversed hazard rate may be expressed by the PDF, CDF, and CCDF of X as
$$rh(x) = \frac{f(x)}{\int_0^x f(u)\,du} = \frac{F'(x)}{F(x)} = \frac{-S'(x)}{1 - S(x)}. \tag{1.10e}$$
It is also possible to express the PDF, CDF, and CCDF in terms of the reversed hazard rate. From (1.10a)
we have
$$rh(x) = \frac{d \ln F(x)}{dx}. \tag{1.10f}$$
Integrating (1.10f) yields
$$\int_x^\infty rh(u)\,du = \Bigl[\ln F(u)\Bigr]_x^\infty = -\ln F(x), \quad \text{because } F(\infty) = 1. \tag{1.10g}$$
so that
$$S(x) = 1 - \exp\left\{-\int_x^\infty rh(u)\,du\right\} \tag{1.10i}$$
and
$$f(x) = \frac{dF(x)}{dx} = \frac{d}{dx} \exp\left\{-\int_x^\infty rh(u)\,du\right\} = rh(x)\,\exp\left\{-\int_x^\infty rh(u)\,du\right\}. \tag{1.10j}$$
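The relations between h(x) and rh(x) can be checked on a concrete case. The sketch below assumes the linear hazard rate distribution with the illustrative parameters a = 0.5 and b = 2 and verifies that both rates coincide at the median, with h(x) below rh(x) before it and above afterwards. All names are hypothetical.

```python
import math

A, B = 0.5, 2.0  # illustrative parameters of h(x) = A + B*x

def sf(x):
    """Survival function S(x) of the linear hazard rate model."""
    return math.exp(-A * x - 0.5 * B * x * x)

def pdf(x):
    """Density f(x) = (A + B*x) * S(x)."""
    return (A + B * x) * sf(x)

def hazard(x):
    """h(x) = f(x) / S(x)."""
    return pdf(x) / sf(x)

def reversed_hazard(x):
    """rh(x) = f(x) / F(x), cf. (1.10a)."""
    return pdf(x) / (1.0 - sf(x))

# Median x_0.5 from the percentile formula of the linear hazard rate model.
median = -A / B + math.sqrt((A / B) ** 2 + (2.0 / B) * math.log(2.0))

gap_at_median = hazard(median) - reversed_hazard(median)
```

Because S(x) = F(x) = 0.5 at the median, the ratio F(x)/S(x) linking the two rates equals one there, which is why `gap_at_median` vanishes up to rounding.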
In the previous excursion we have seen that the exponential distribution has a constant hazard rate, but the
reversed hazard of that distribution is decreasing. From (1.10h–j) we see that we cannot find a distribution
defined on [0, ∞) having a constant reversed hazard rate, but the reflected exponential distribution
defined on (−∞, 0] with
$$f(x) = \lambda\,e^{\lambda x}; \quad x \le 0, \; \lambda > 0;$$
$$F(x) = e^{\lambda x}$$
1. H(0) = 0,
2. limx→∞ H(x) = ∞,
and furthermore
$$F(x) = 1 - \exp\{-H(x)\}, \tag{1.11c}$$
$$f(x) = -\frac{d \exp\{-H(x)\}}{dx}. \tag{1.11d}$$
Vice versa we have the following relations between PDF, CDF, and CCDF on the one hand and
CHR on the other hand:
$$H(x) = -\ln\left\{\int_x^\infty f(u)\,du\right\}, \tag{1.11e}$$
$$= -\ln S(x), \tag{1.11f}$$
$$= -\ln[1 - F(x)]. \tag{1.11g}$$
But what is the meaning of H(x)? — Whereas h(x) ∆ can be given an intuitive interpretation as
Pr(x < X ≤ x + ∆ | X > x), H(x) cannot. H(x) is not the sum or the integral of conditional
probabilities because the conditioning event changes with x, and there is no law of probability
leading to H(x). Thus, H(x) does not have a probabilistic connotation. Yet H(x) plays a key
role in reliability and survival analysis, because of the exponentiation formula (1.11b) which says
that with H(x) specified we have
we may claim that the time to failure, X, of a unit coincides with the time at which its cumulative hazard
H(x) crosses a random threshold Z, where Z has an exponential distribution with scale parameter equal to
one, i.e., X = H −1 (Z). The random threshold Z, where Z = H(X), is defined as the hazard potential
of the unit. We may interpret Z as an unknown resource with which the unit is endowed at the time of its
inception. With Z considered a resource, H(x) can be interpreted as the amount of resource consumed by
time x and the HR, h(x) = dH(x)/dx, can be considered the rate at which this resource is consumed.
The unit fails when this resource becomes depleted. The term ‘potential’ refers to a feature parallel to
that of life potential, see Sect. 1.1.2.6. The difference here is that we are alluding to a unit’s resistance to
failure rather than its capacity for work.
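The hazard potential construction doubles as a simulation recipe: draw Z from the unit exponential distribution and set X = H⁻¹(Z). The Python sketch below does this for the linear hazard rate model, whose cumulative hazard H(x) = a x + b x²/2 can be inverted in closed form, and compares the empirical survival fraction with S(x). The parameter values, the seed and the function name are illustrative assumptions.

```python
import math
import random

def inverse_cumulative_hazard(z, a, b):
    """Solve H(x) = a*x + b*x^2/2 = z for x >= 0 (linear hazard rate model)."""
    return -a / b + math.sqrt((a / b) ** 2 + 2.0 * z / b)

rng = random.Random(12345)   # fixed seed so the run is reproducible
a, b = 0.5, 2.0              # illustrative parameters
n = 100_000

# Each unit receives a random hazard potential Z ~ Exp(1) and fails when H(x) reaches Z.
lifetimes = [inverse_cumulative_hazard(rng.expovariate(1.0), a, b) for _ in range(n)]

x = 0.8
empirical_survival = sum(t > x for t in lifetimes) / n
model_survival = math.exp(-a * x - 0.5 * b * x * x)
```

With this sample size the two survival values should agree to about two decimal places; the remaining difference is sampling noise.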
Another possibility giving insight into (1.12) is the provision of an indifference principle for
reliability and survival analysis. Corresponding to every non–negative variate X having an ab-
solutely continuous survival function S(x) = Pr(X > x), there exists a variate Z taking values
H(x), 0 ≤ H(x) < ∞, whose survival function is an exponential with scale parameter equal
to one. The survival function of X is indexed by x, x ≥ 0, whereas that of Z is indexed by
Rx
H(x) = − dS(u) S(u).
0
$$H\bigl(x^H_\Lambda\bigr) = \Lambda, \quad \Lambda \ge 0, \quad \text{or} \tag{1.14a}$$
$$x^H_\Lambda = H^{-1}(\Lambda), \quad \Lambda \ge 0. \tag{1.14b}$$
The hazard quantile plays a role in hazard plotting and in designing hazard papers, see Sections 9.2
and 9.3. Based on (1.11c,g) we see that the ordinary quantile or percentile $x_P$, 0 ≤ P ≤ 1, and the
hazard quantile $x^H_\Lambda$, Λ ≥ 0, are linked as
$$x_P = x^H_{-\ln(1 - P)}, \quad \text{and} \tag{1.14c}$$
$$x^H_\Lambda = x_{1 - \exp(-\Lambda)}. \tag{1.14d}$$
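The link between ordinary and hazard quantiles follows from P = 1 − exp(−Λ), i.e., Λ = −ln(1 − P). A short Python sketch for the linear hazard rate model, used here only as an assumed example with hypothetical function names, confirms both directions numerically.

```python
import math

def cumulative_hazard(x, a, b):
    """H(x) = a*x + b*x^2/2 for the linear hazard rate model."""
    return a * x + 0.5 * b * x * x

def hazard_quantile(lam, a, b):
    """Hazard quantile: the age at which the cumulative hazard equals lam."""
    return -a / b + math.sqrt((a / b) ** 2 + 2.0 * lam / b)

def percentile(p, a, b):
    """Ordinary percentile x_P with F(x_P) = P, for 0 <= P < 1."""
    return -a / b + math.sqrt((a / b) ** 2 - (2.0 / b) * math.log(1.0 - p))

a, b = 0.5, 2.0  # illustrative parameters

# Direction 1: a percentile is the hazard quantile of order -ln(1 - P).
p = 0.9
diff_p = percentile(p, a, b) - hazard_quantile(-math.log(1.0 - p), a, b)

# Direction 2: a hazard quantile is the percentile of order 1 - exp(-lam).
lam = 1.7
diff_lam = hazard_quantile(lam, a, b) - percentile(1.0 - math.exp(-lam), a, b)
```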
In Sect. 1.1.1.4 we have introduced the conditional lifetime variate Y | X > x := X − x | X > x,
called future lifetime or remaining lifetime of an x–survivor. The pertaining PDF reads
$$f(y \mid X > x) = \frac{f(x + y)}{S(x)}, \quad y \ge 0.$$
The mean of this variate, denoted µ(x),⁸ as it depends on the age reached, is called mean residual
life (MRL):
Looking at (1.15b) we see that µ(x) is the area beneath the survival function to the right of x
divided by the ordinate S(x) at x, corresponding to the fraction surviving x.
The mean residual life µ(x) must not be confused with the mean age of an x–survivor:
$$E(X \mid X > x) = \frac{1}{S(x)} \int_x^\infty u\,f(u)\,du. \tag{1.16a}$$
Example 1/2: HR, CHR, and MRL of the linear hazard rate distribution
Figure 1/2: HR, CHR, and MRL of the linear hazard rate distribution with a = 0 and b = 1
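For the linear hazard rate distribution defined in Example 1/1 we have
$$h(x) = a + b\,x,$$
$$H(x) = a\,x + \frac{b}{2}\,x^2,$$
$$\mu(x) = \frac{\exp\left\{\dfrac{a^2}{2\,b}\right\} \sqrt{\dfrac{\pi}{2\,b}}\, \left[1 - \operatorname{erf}\left(\sqrt{\dfrac{(a + b\,x)^2}{2\,b}}\right)\right]}{\exp\left\{-a\,x - \dfrac{b}{2}\,x^2\right\}}$$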
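The closed form of µ(x) can be cross-checked against its description as the area beneath the survival function to the right of x divided by S(x). The Python sketch below does so with a plain trapezoidal rule. The parameter values, the integration cut-off and the function names are assumptions of this illustration.

```python
import math

def sf(x, a, b):
    """Survival function S(x) = exp(-a*x - b*x^2/2)."""
    return math.exp(-a * x - 0.5 * b * x * x)

def mrl_closed_form(x, a, b):
    """MRL of the linear hazard rate distribution via the error function."""
    numerator = (math.exp(a * a / (2.0 * b)) * math.sqrt(math.pi / (2.0 * b))
                 * (1.0 - math.erf((a + b * x) / math.sqrt(2.0 * b))))
    return numerator / sf(x, a, b)

def mrl_numeric(x, a, b, upper=15.0, n=30000):
    """Area under S to the right of x (truncated at `upper`) divided by S(x)."""
    step = (upper - x) / n
    total = 0.5 * (sf(x, a, b) + sf(upper, a, b))
    for i in range(1, n):
        total += sf(x + i * step, a, b)
    return total * step / sf(x, a, b)

a, b = 0.5, 2.0  # illustrative parameters
closed = [mrl_closed_form(x, a, b) for x in (0.0, 1.0)]
numeric = [mrl_numeric(x, a, b) for x in (0.0, 1.0)]
```

Since the hazard rate a + b x is increasing, the mean residual life at x = 1 comes out smaller than at x = 0.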
The following theorem gives the properties of MRL, see SWARTZ (1973).
Theorem 6: If µ(x) is the MRL of a survival function S(x) with finite mean E(X) = µ then:
1) $\mu(x) \ge 0 \;\; \forall\, x \ge 0$, (1.17a)
2) $\mu(0) = E(X)$, (1.17b)
3) if S(x) is absolutely continuous, then $\mu'(x) \ge -1$, (1.17c)
4) $\int_0^\infty \frac{1}{\mu(x)}\,dx$ diverges, (1.17d)
5) $S(x) = \frac{\mu(0)}{\mu(x)} \exp\left\{-\int_0^x \frac{1}{\mu(u)}\,du\right\}$. (1.17e)
Proof of Theorem 6: It is fairly obvious why property 1) would be necessary since MRL is a
conditional expectation of a non–negative variate. Part 1) further follows from (1.15b) because
S(x) ≥ 0 ∀ x ≥ 0.
2) follows from (1.15b) with (1.5c) observing S(0) = 1.
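To prove 3) we take a closer look at the derivative of µ(x), starting with (1.15b):⁹
$$\mu'(x) = \frac{-S^2(x) + f(x) \int_x^\infty S(u)\,du}{S^2(x)} = h(x)\,\mu(x) - 1. \tag{1.18}$$
$$-\frac{1}{\mu(x)} = -\frac{S(x)}{\int_x^\infty S(u)\,du} = \frac{\dfrac{d}{dx} \int_x^\infty S(u)\,du}{\int_x^\infty S(u)\,du} = \frac{d}{dx} \ln\left\{\int_x^\infty S(u)\,du\right\}. \tag{1.19a}$$
⁹ By differentiating the numerator and denominator of $\mu(x) = \int_x^\infty S(u)\,du \big/ S(x)$ it can be shown that
$\lim_{x \to \infty} \mu(x) = \lim_{x \to \infty} \left[-\frac{d}{dx} \ln f(x)\right]^{-1}$.
$$\begin{aligned}
\lim_{z \to \infty} \int_0^z \frac{1}{\mu(x)}\,dx &= \lim_{z \to \infty} \left[\ln \mu(0) - \ln \int_z^\infty S(u)\,du\right] \\
&= \ln \mu(0) - \ln 0, \quad \text{as } S(\infty) = 0 \\
&= \ln \mu(0) + \infty.
\end{aligned}$$
$$\exp\left\{-\int_0^x \frac{1}{\mu(u)}\,du\right\} = \frac{\int_x^\infty S(u)\,du}{\mu(0)} = \frac{\mu(x)}{\mu(0)}\,S(x). \tag{1.19e}$$
Finally, 5) follows from multiplication of each side of (1.19e) by µ(0)/µ(x).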
(1.17e) is known as the inversion formula¹⁰ which serves as a starting point in expressing the
other four representatives of a lifetime distribution in terms of µ(x):
$$f(x) = \mu(0)\,\frac{1 + \mu'(x)}{\mu^2(x)} \exp\left\{-\int_0^x \frac{1}{\mu(u)}\,du\right\}, \tag{1.20a}$$
$$= \frac{1 + \mu'(x)}{\mu(x)} \exp\left\{-\int_0^x \frac{1 + \mu'(u)}{\mu(u)}\,du\right\}, \tag{1.20b}$$
$$F(x) = 1 - \frac{\mu(0)}{\mu(x)} \exp\left\{-\int_0^x \frac{1}{\mu(u)}\,du\right\}, \tag{1.20c}$$
$$h(x) = \frac{1 + \mu'(x)}{\mu(x)}, \tag{1.20d}$$
$$H(x) = \ln \frac{\mu(x)}{\mu(0)} + \int_0^x \frac{1}{\mu(u)}\,du, \tag{1.20e}$$
$$= \int_0^x \frac{1 + \mu'(u)}{\mu(u)}\,du. \tag{1.20f}$$
¹⁰ MEILIJSON (1972) gives another proof of this formula based on the LAPLACE transform.
Example 1/3: Finding PDF, CDF, CCDF, HR, and CHR from a given MRL
What are the five representatives of a lifetime distribution when its MRL is given by
µ(x) = a + b x; x ≥ 0, a > 0, b > 0?
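The resulting distribution may be called linear mean residual lifetime distribution.¹¹ Applying (1.17e)
and (1.20a–f) we find after some manipulation:
$$h(x) = \frac{1 + b}{a + b\,x},$$
$$H(x) = \frac{1 + b}{b}\,\ln \frac{a + b\,x}{a},$$
$$S(x) = \frac{a}{a + b\,x} \left(\frac{a}{a + b\,x}\right)^{1/b},$$
$$F(x) = 1 - \frac{a}{a + b\,x} \left(\frac{a}{a + b\,x}\right)^{1/b},$$
$$f(x) = \frac{a\,(1 + b) \left(\dfrac{a}{a + b\,x}\right)^{1/b}}{(a + b\,x)^2}.$$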
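The result for S(x) in Example 1/3 can be reproduced numerically from the inversion formula (1.17e), which expresses S(x) through µ(0), µ(x) and the integral of 1/µ(u) over [0, x]. The Python sketch below evaluates that integral with a trapezoidal rule for the linear MRL µ(x) = a + b x. The function name and the parameter values are illustrative assumptions.

```python
import math

def survival_from_mrl(mrl, x, n=20000):
    """S(x) = mrl(0)/mrl(x) * exp(-integral of 1/mrl over [0, x]), cf. (1.17e)."""
    if x == 0:
        return 1.0
    step = x / n
    integral = 0.5 * (1.0 / mrl(0.0) + 1.0 / mrl(x))
    for i in range(1, n):
        integral += 1.0 / mrl(i * step)
    integral *= step
    return mrl(0.0) / mrl(x) * math.exp(-integral)

a, b = 2.0, 0.5  # illustrative parameters of the linear MRL

def linear_mrl(u):
    return a + b * u

x = 3.0
numeric = survival_from_mrl(linear_mrl, x)

# Closed form from Example 1/3: S(x) = (a / (a + b*x)) ** (1 + 1/b).
closed_form = (a / (a + b * x)) ** (1.0 + 1.0 / b)
```

The same helper works for any positive MRL function that satisfies the conditions of Theorem 6, not only for the linear case.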
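MRL may also be written in terms of PDF, CDF, CCDF, HR, and CHR:¹²
$$\mu(x) = \frac{\int_0^\infty u\,f(x + u)\,du}{\int_0^\infty f(x + u)\,du}, \tag{1.21a}$$
$$= \frac{\int_x^\infty u\,f(u)\,du}{\int_x^\infty f(u)\,du} - x, \tag{1.21b}$$
$$\mu(x) = \frac{\int_x^\infty \bigl[1 - F(u)\bigr]\,du}{1 - F(x)}, \tag{1.21c}$$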
¹¹ BARLOW/PROSCHAN (1975) have shown that any mixture of exponential distributions yields a distribution with
decreasing HR which, see Sect. 2.3, is equivalent to an increasing MRL. Based on this result MORRISON
(1978) proved that when taking the gamma as the mixing distribution the result is a distribution with a linearly
increasing MRL which can be identified as the PARETO distribution of the second kind.
¹² Note that it is possible for the MRL to exist but for the hazard rate function not to exist and
vice versa. If, e.g., we modify the CAUCHY distribution to a half–CAUCHY distribution having
$f(x) = 2 \big/ \bigl[\pi\,(1 + x^2)\bigr]$, x ≥ 0, the MRL does not exist whereas $h(x) = 2 \big/ \bigl[(1 + x^2)\,(\pi - 2 \arctan x)\bigr]$.
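\[ \mu(x) = \frac{\int_x^\infty S(u)\,du}{S(x)}, \tag{1.21d} \]
\[ \mu(x) = \frac{\int_x^\infty \exp\left\{-\int_0^z h(u)\,du\right\} dz}{\exp\left\{-\int_0^x h(u)\,du\right\}}, \tag{1.21e} \]
\[ \mu(x) = \frac{\int_x^\infty \exp\{-H(u)\}\,du}{\exp\{-H(x)\}} = \int_0^\infty \exp\{H(x) - H(u+x)\}\,du. \tag{1.21f} \]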
GUESS/PROSCHAN (1978) stated several bounds for MRL depending on the moments, the CDF
and the percentile function of X. GUPTA (1981) showed how to express the moments of X in
terms of the mean residual lifetime. He also stated that MRL is the reciprocal of the hazard rate
of the asymptotic forward and backward recurrence times of a renewal process. Both recurrence
times have the same asymptotic distribution with PDF
\[ f^*(x) = \frac{1 - F(x)}{E(X)} \tag{1.22a} \]
and CCDF
\[ S^*(x) = \frac{1}{E(X)} \int_x^\infty S(u)\,du. \tag{1.22b} \]
The corresponding HR is
\[ h^*(x) = \frac{f^*(x)}{S^*(x)} \]
which upon inserting (1.22a,b) and taking the reciprocal gives
\[ \frac{S^*(x)}{f^*(x)} = \frac{1}{S(x)} \int_x^\infty S(u)\,du = \mu(x). \tag{1.22c} \]
Two survival functions S0 (x) and S1 (x) are said to have proportional mean residual life if
µ1 (x) = Θ µ0 (x) ∀ x ≥ 0 and Θ > 0, (1.23a)
where µ0 (x) and µ1 (x) are the respective mean residual lives at time x. It can be shown that if
S0 (x) and S1 (x) have proportional mean residual life, then
\[ S_1(x) = S_0(x) \left[ \frac{\int_x^\infty S_0(u)\,du}{\mu_0(0)} \right]^{1/\Theta - 1}. \tag{1.23b} \]
The hazard rate and the mean residual life are conditional concepts; both are conditioned on
survival to time x. An essential difference between HR and MRL is that the former accounts only
for the immediate future in assessing the event ‘unit failure’, whereas the latter accounts for the
whole future. This is readily seen if we multiply both h(x) and µ(x) by S(x):
\[ h(x)\,S(x) = -\frac{dS(x)}{dx}, \tag{1.24a} \]
\[ \mu(x)\,S(x) = \int_x^\infty S(u)\,du. \tag{1.24b} \]
The right–hand side of (1.24a) depends on the probability law of X at the point x only, whereas the
right–hand side of (1.24b) depends on the probability law of X at all points in (x, ∞). This
explains the difference between the two. Both MRL and HR are needed in practice. In
theory we define classes of distributions depending on the behavior of MRL and HR, see Chapter 2.
The MRL function has a wide range of applications. For example, WATSON/WELLS
(1961) use MRL in studying burn–in. Actuaries apply MRL to setting rates of benefits for life
insurance. Distributions with increasing MRL have been found useful as models in the social
sciences for the duration of wars and strikes or of jobs, a phenomenon called ‘inertia’.
In Tab. 1/1 we have summarized the most important relationships between the six representatives
of the lifetime distribution scattered in this and the preceding sections. The table has been
arranged as an input–output table showing how to switch over from one representative to another
one.
Suppose we have a continuous random variable X with known representatives of its distribution,
and we consider a new random variable Y which is some function of X, i.e., let
\[ y = g(x) \tag{1.25a} \]
be such that the inverse function
\[ x = g^{-1}(y) \tag{1.25b} \]
exists. When seeking the representatives of the Y–distribution in terms of those of the X–distribution
we have to distinguish between two cases.
In the first case let y = g(x) be a strictly increasing function. Then, if X is less than or equal
to x it follows that Y is less than or equal to the unique value of y that corresponds to the given
value of x. Thus, if X ≤ x, then Y ≤ g(x). Conversely, if Y ≤ y, then $X \le g^{-1}(y)$, and the
probabilities of these events are equal, i.e.,
\[ \Pr(Y \le y) = \Pr\left[X \le g^{-1}(y)\right], \qquad F(y) = F\left[g^{-1}(y)\right]. \tag{1.26a} \]
(1.26a) can be confusing since the CDFs on opposite sides of the equation are not the same
functions. The one on the left–hand side is the CDF of Y, whereas the one on the right–hand side
is the CDF of the random variable X. To clarify this, we write (1.26a) as
\[ F_Y(y) = F_X\left[g^{-1}(y)\right]. \tag{1.26b} \]
From (1.26b), which relates the CDFs of X and Y, we can derive relationships for the PDFs,
CCDFs, HRs, and CHRs as well.$^{13}$ Since the CCDF is the complement of the CDF it follows
from (1.26b) that $1 - S_Y(y) = 1 - S_X\left[g^{-1}(y)\right]$ or that
\[ S_Y(y) = S_X\left[g^{-1}(y)\right]. \tag{1.26c} \]
13
Generally, the MRL of Y cannot be given easily and as an exact function of the X–MRL, see the excursus at the
end of this section.
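Table 1/1: Relations among the six functions describing a continuously distributed stochastic lifetime
(rows: given function ‘from’; columns: derived function ‘to’)
\[
\begin{array}{c|cccccc}
\text{from} \backslash \text{to} & f(x) & F(x) & S(x) & h(x) & H(x) & \mu(x) \\ \hline
f(x) & - & \int_0^x f(u)\,du & \int_x^\infty f(u)\,du & \dfrac{f(x)}{\int_x^\infty f(u)\,du} & -\ln\left\{\int_x^\infty f(u)\,du\right\} & \dfrac{\int_0^\infty u\,f(x+u)\,du}{\int_x^\infty f(u)\,du} \\
F(x) & F'(x) & - & 1 - F(x) & \dfrac{F'(x)}{1 - F(x)} & -\ln\{1 - F(x)\} & \dfrac{\int_x^\infty [1 - F(u)]\,du}{1 - F(x)} \\
S(x) & -S'(x) & 1 - S(x) & - & -\dfrac{S'(x)}{S(x)} & -\ln[S(x)] & \dfrac{\int_x^\infty S(u)\,du}{S(x)} \\
h(x) & h(x)\exp\left\{-\int_0^x h(u)\,du\right\} & 1 - \exp\left\{-\int_0^x h(u)\,du\right\} & \exp\left\{-\int_0^x h(u)\,du\right\} & - & \int_0^x h(u)\,du & \dfrac{\int_x^\infty \exp\left\{-\int_0^u h(v)\,dv\right\} du}{\exp\left\{-\int_0^x h(u)\,du\right\}} \\
H(x) & -\dfrac{d\{\exp[-H(x)]\}}{dx} & 1 - \exp\{-H(x)\} & \exp\{-H(x)\} & H'(x) & - & \dfrac{\int_x^\infty \exp\{-H(u)\}\,du}{\exp\{-H(x)\}} \\
\mu(x) & \dfrac{1+\mu'(x)}{\mu^2(x)}\,\mu(0)\exp\left\{-\int_0^x \dfrac{du}{\mu(u)}\right\} & 1 - \dfrac{\mu(0)}{\mu(x)}\exp\left\{-\int_0^x \dfrac{du}{\mu(u)}\right\} & \dfrac{\mu(0)}{\mu(x)}\exp\left\{-\int_0^x \dfrac{du}{\mu(u)}\right\} & \dfrac{1+\mu'(x)}{\mu(x)} & \ln\dfrac{\mu(x)}{\mu(0)} + \int_0^x \dfrac{du}{\mu(u)} & -
\end{array}
\]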
Next, the PDF is the derivative of the CDF, so we differentiate both sides of (1.26b) with respect
to y, obtaining
\[ f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X\left[g^{-1}(y)\right] = f_X\left[g^{-1}(y)\right] \frac{d}{dy}\,g^{-1}(y), \tag{1.26d} \]
using the chain rule of differentiation. Since $g^{-1}(y)$ is simply x we can write
\[ f_Y(y) = f_X\left[g^{-1}(y)\right] \frac{dx}{dy}. \tag{1.26e} \]
Furthermore, HR is the ratio of the PDF and the CCDF. Thus,
\[ h_Y(y) = \frac{f_Y(y)}{S_Y(y)} = \frac{f_X\left[g^{-1}(y)\right] \dfrac{dx}{dy}}{S_X\left[g^{-1}(y)\right]} = h_X\left[g^{-1}(y)\right] \frac{dx}{dy}. \tag{1.26f} \]
Finally, the CHR is the negative of the ln–transformed CCDF, see (1.11f). So we have
\[ H_Y(y) = -\ln S_Y(y) = -\ln S_X\left[g^{-1}(y)\right] = H_X\left[g^{-1}(y)\right]. \tag{1.26g} \]
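As an illustration of (1.26c), (1.26f) and (1.26g), the following sketch transforms a reduced exponential variate X by the strictly increasing function $y = x^{1/c}$, which yields a WEIBULL variate with shape c. The helper names, the choice of transformation and the value c = 2.5 are assumptions made for this example only; the transformed hazard rate is compared with a finite–difference derivative of $-\ln S_Y(y)$.

```python
import math

def transform_increasing(S_X, h_X, g_inv, dg_inv):
    """Return (S_Y, h_Y, H_Y) for Y = g(X) with g strictly increasing."""
    def S_Y(y):
        return S_X(g_inv(y))                      # (1.26c)
    def h_Y(y):
        return h_X(g_inv(y)) * dg_inv(y)          # (1.26f)
    def H_Y(y):
        return -math.log(S_X(g_inv(y)))           # (1.26g)
    return S_Y, h_Y, H_Y

def numeric_hazard(S, y, eps=1e-6):
    """Central finite difference of -ln S(y)."""
    return -(math.log(S(y + eps)) - math.log(S(y - eps))) / (2.0 * eps)

C = 2.5  # illustrative shape parameter, not taken from the text

# X reduced exponential: S_X(x) = exp(-x), h_X(x) = 1; y = x**(1/C), x = y**C
S_Y, h_Y, H_Y = transform_increasing(
    S_X=lambda x: math.exp(-x),
    h_X=lambda x: 1.0,
    g_inv=lambda y: y ** C,
    dg_inv=lambda y: C * y ** (C - 1.0),
)

if __name__ == "__main__":
    for y in (0.5, 1.0, 1.3):
        print(y, h_Y(y), numeric_hazard(S_Y, y), H_Y(y))
```

Under these assumptions $h_Y(y) = c\,y^{c-1}$ and $H_Y(y) = y^c$, the hazard rate and cumulative hazard rate of a reduced WEIBULL distribution.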
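In the second case y = g(x) is a strictly decreasing function, and both the reasoning and the results
change slightly. In this case we see that if X is less than x, then Y will be greater than the value of
y which corresponds to the given value of x; conversely, if Y > y, then $X < g^{-1}(y)$. In terms of
probabilities we have
\[ \Pr(Y > y) = \Pr\left[X < g^{-1}(y)\right], \quad \text{i.e.,} \quad S_Y(y) = F_X\left[g^{-1}(y)\right]. \tag{1.27a} \]
Then
\[ F_Y(y) = 1 - S_Y(y) = 1 - F_X\left[g^{-1}(y)\right] = S_X\left[g^{-1}(y)\right] \tag{1.27b} \]
and
\[ f_Y(y) = \frac{d}{dy} F_Y(y) = -\frac{d}{dy} F_X\left[g^{-1}(y)\right] = -f_X\left[g^{-1}(y)\right] \frac{dx}{dy} \tag{1.27c} \]
by the chain rule. Since $x = g^{-1}(y)$ is a decreasing function the derivative $dx/dy$ in (1.27c) will
be negative, so that $f_Y(y)$ is non–negative.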
In general, the CHR of Y cannot be written in terms of the X–CHR, but it can be expressed in
terms of the X–CDF as
\[ x = g^{-1}(y) = y^{-1}, \qquad \frac{dx}{dy} = -y^{-2}. \]
From the representatives of the X–distribution in Example 1/4 and using (1.27a–e) we find
\[ S_Y(y) = 1 - e^{-y^{-1}}, \]
\[ F_Y(y) = e^{-y^{-1}}, \]
\[ f_Y(y) = y^{-2}\,e^{-y^{-1}}, \]
\[ h_Y(y) = y^{-2}\,\frac{e^{-y^{-1}}}{1 - e^{-y^{-1}}} = \frac{y^{-2}}{e^{y^{-1}} - 1}, \]
\[ H_Y(y) = -\ln\left[1 - e^{-y^{-1}}\right]. \]
The distribution of Y is recognized as the type–II maximum extreme value distribution, also known as
the inverse WEIBULL distribution.
We now explore two special transformations. The first one is the linear transformation:
\[ y = g(x) = a + b\,x, \quad b \ne 0. \tag{1.28a} \]
where $F_X(x)$ is the increasing CDF of X. We note that $x = g^{-1}(y) = F_X^{-1}(y)$, 0 ≤ y ≤ 1, i.e.,
x is given by the percentile function of X. From (1.26b) we have
\[ F_Y(y) = F_X\left[g^{-1}(y)\right] = F_X\left[F_X^{-1}(y)\right] = y, \tag{1.29b} \]
therefore
\[ f_Y(y) = \frac{d}{dy} F_Y(y) = 1. \tag{1.29c} \]
Thus, Y has the reduced uniform distribution with
\[ S_Y(y) = 1 - y, \tag{1.29e} \]
\[ h_Y(y) = \frac{1}{1 - y}, \tag{1.29f} \]
\[ H_Y(y) = -\ln(1 - y). \tag{1.29g} \]
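The statement that $Y = F_X(X)$ is uniformly distributed on [0, 1] can be illustrated by simulation. The sketch below is an illustration only: it assumes an exponential X with CDF $F_X(x) = 1 - e^{-\lambda x}$, and the function names, sample size and seed are arbitrary choices.

```python
import math
import random

def pit_sample(n, rate=1.0, seed=12345):
    """Draw n exponential variates and map them through their own CDF."""
    rng = random.Random(seed)
    xs = [rng.expovariate(rate) for _ in range(n)]
    return [1.0 - math.exp(-rate * x) for x in xs]

def empirical_cdf(sample, y):
    """Share of sample values not exceeding y."""
    return sum(1 for v in sample if v <= y) / len(sample)

if __name__ == "__main__":
    ys = pit_sample(20000)
    # For a reduced uniform variate F_Y(y) = y, see (1.29b)
    for y in (0.1, 0.3, 0.5, 0.9):
        print(y, empirical_cdf(ys, y))
```

The empirical CDF of the transformed sample should lie close to the diagonal $F_Y(y) = y$, up to sampling error.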
In some situations, units may not come from a homogeneous population. A demographer who is
to construct a nation’s life table might encounter several ethnic groups having different patterns of
mortality. A reliability engineer, for instance, might have a component that has been manufactured
in one of two facilities, but is not certain which one the unit comes from. In finite mixture
models, a unit is assumed to be from one of m populations. The case m = ∞ is called countable
mixture. When there is a single population that is mixed by a continuous parameter Θ (for
example, the amount of impurities present in a raw material or the temperature of solder applied
in a circuit board), a stochastic parameter model (= continuous mixture model) is appropriate.
Suppose that F (x | θ) represents the lifetime CDF given that Θ = θ and that G(θ) represents the
CDF of the random parameter Θ. The function F (x), defined by
\[ F(x) = \int_{\text{all } \theta} F(x \mid \theta)\,dG(\theta), \tag{1.31a} \]
which is the marginal CDF of X, is called the compound distribution of F(·) and G(·). F(x | θ) is
known as the kernel and G(·) is the mixing (or compounding) distribution. If the entire mass
of the corresponding measure of G(·) is confined to a countable number of points θ1, θ2, . . . and
the masses at θj; j = 1, 2, . . . ; are G(θj), then (1.31a) takes the form
\[ F(x) = \sum_{j=1}^{\infty} F(x \mid \theta_j)\,G(\theta_j), \tag{1.31b} \]
which is a countable mixture CDF.$^{15}$ If the entire mass of the corresponding measure of G(·) is
confined to only a finite number of points θ1, θ2, . . . , θm, then (1.31a) becomes a finite
mixture of m components whose CDF is given by
\[ F(x) = \sum_{j=1}^{m} F(x \mid \theta_j)\,G(\theta_j). \tag{1.31c} \]
\[ p_j := G(\theta_j) \quad \text{and} \quad F_j(x) := F(x \mid \theta_j), \]
$^{14}$ Suggested reading for this section: AL–HUSSAINI/SULTAN (2001).
$^{15}$ For example, the non–central $\chi^2$–distribution is a countable mixture of POISSON and $\chi^2$–distributions.
Let the lifetime X have an exponential distribution with a positive parameter (= scaling factor) θ:
\[ f(x \mid \theta) = \theta\,e^{-\theta x}, \quad x \ge 0. \]
Now suppose that θ > 0 is a realization of a random variable Θ which also has an exponential distribution,
but with parameter λ:
\[ g(\theta) = \lambda\,e^{-\lambda \theta}, \quad \lambda > 0. \]
In this case the compound distribution of X is found using integration by parts:
\[
\begin{aligned}
f(x) &= \int_0^\infty \theta\,e^{-\theta x}\,\lambda\,e^{-\lambda \theta}\,d\theta \\
     &= \lambda \int_0^\infty \theta\,e^{-\theta (x+\lambda)}\,d\theta \\
     &= \left[ -\frac{\lambda\,\theta\,e^{-\theta (x+\lambda)}}{x+\lambda} \right]_0^\infty + \frac{\lambda}{x+\lambda} \int_0^\infty e^{-\theta (x+\lambda)}\,d\theta \\
     &= \left[ -\frac{\lambda\,\theta\,e^{-\theta (x+\lambda)}}{x+\lambda} - \frac{\lambda}{(x+\lambda)^2}\,e^{-\theta (x+\lambda)} \right]_0^\infty \\
     &= \frac{\lambda}{(x+\lambda)^2}, \quad x \ge 0.
\end{aligned}
\]
This is recognized as a special case of the log–logistic distribution, see Sect. 3.1. The corresponding CDF
and CCDF are
\[ F(x) = 1 - \frac{\lambda}{x+\lambda}, \]
\[ S(x) = \frac{\lambda}{x+\lambda}. \]
So the HR and CHR follow as
\[ h(x) = \frac{1}{x+\lambda}, \]
\[ H(x) = \ln\frac{x+\lambda}{\lambda}. \]
The MRL µ(x) does not exist. We notice that, while the HR of each f(x | θ) is a constant, namely h(x | θ) =
θ, the HR of the mixture is decreasing. This result also holds for finite mixtures of exponential distributions,
see below.
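The compound density of this example can be confirmed by numerical integration over θ. The following sketch is illustrative only (function names, truncation point and grid size are assumptions): it integrates $\theta e^{-\theta x} \lambda e^{-\lambda \theta}$ with the midpoint rule and compares the result with $\lambda/(x+\lambda)^2$.

```python
import math

def compound_pdf_numeric(x, lam, n=20000):
    """Midpoint-rule integral of theta*exp(-theta*x) * lam*exp(-lam*theta)."""
    rate = x + lam
    upper = 50.0 / rate          # integrand is negligible beyond this point
    step = upper / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * step
        total += lam * theta * math.exp(-theta * rate)
    return total * step

def compound_pdf_closed(x, lam):
    """Closed form obtained in the example: lam / (x + lam)**2."""
    return lam / (x + lam) ** 2

def compound_hazard(x, lam):
    """HR of the mixture, f(x)/S(x) with S(x) = lam / (x + lam)."""
    return compound_pdf_closed(x, lam) / (lam / (x + lam))

if __name__ == "__main__":
    lam = 2.0  # illustrative value
    for x in (0.0, 1.0, 5.0):
        print(x, compound_pdf_numeric(x, lam), compound_pdf_closed(x, lam),
              compound_hazard(x, lam))
```

The last column shows the decreasing hazard rate $1/(x+\lambda)$ of the mixture, in contrast to the constant hazard rate θ of each kernel.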
It can be shown that in the case of a finite mixture the HR and the MRL of the compound
distribution may be written in terms of the HRs $h_j(x)$ and of the MRLs $\mu_j(x)$ of the m mixed
distributions:
\[ h(x) = \frac{\sum_{j=1}^{m} p_j\,S_j(x)\,h_j(x)}{\sum_{j=1}^{m} p_j\,S_j(x)}, \tag{1.34a} \]
\[ \mu(x) = \frac{\sum_{j=1}^{m} p_j\,S_j(x)\,\mu_j(x)}{\sum_{j=1}^{m} p_j\,S_j(x)}. \tag{1.34b} \]
Thus, the HR and MRL of the mixed model may be considered as weighted averages of the HRs
and MRLs of the individual populations, the weights being $p_j\,S_j(x)$. One interesting property of
a mixed exponential model is that it has a decreasing HR. Suppose $X_j$; j = 1, 2, . . . , m; are
exponentially distributed with scale parameters $b_j$, $b_j > 0$, respectively, then
\[ h(x) = \frac{f(x)}{S(x)} = \frac{\sum_{j=1}^{m} p_j\,\dfrac{1}{b_j}\,\exp\left(-\dfrac{x}{b_j}\right)}{\sum_{j=1}^{m} p_j\,\exp\left(-\dfrac{x}{b_j}\right)}, \quad x \ge 0. \]
It can be shown that this HR is a decreasing function, decreasing from the average of the failure
rates, $\sum_{j=1}^{m} p_j / b_j$, at x = 0, to the minimum of the failure rates, $1/\max(b_j)$, as x → ∞. This
suggests one possible justification for a decreasing HR model.
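The behavior of the mixed exponential hazard rate can be inspected numerically. The sketch below evaluates the formula above for an assumed three–component mixture; the weights and scale parameters are illustrative and not taken from the text.

```python
import math

def mixture_hazard(x, p, b):
    """HR of a finite mixture of exponentials with weights p and scales b."""
    num = sum(pj / bj * math.exp(-x / bj) for pj, bj in zip(p, b))
    den = sum(pj * math.exp(-x / bj) for pj, bj in zip(p, b))
    return num / den

P_WEIGHTS = [0.3, 0.5, 0.2]   # mixing proportions p_j (illustrative)
B_SCALES = [1.0, 2.0, 5.0]    # scale parameters b_j (illustrative)

if __name__ == "__main__":
    start = sum(pj / bj for pj, bj in zip(P_WEIGHTS, B_SCALES))
    print("h(0) =", mixture_hazard(0.0, P_WEIGHTS, B_SCALES), "average rate =", start)
    print("h(200) =", mixture_hazard(200.0, P_WEIGHTS, B_SCALES),
          "1/max(b) =", 1.0 / max(B_SCALES))
```

For these values the hazard rate starts at the weighted average of the component rates and falls towards $1/\max(b_j)$, the rate of the longest–lived component.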
Up to now we have considered modeling the lifetime of single units, components, people etc. by using
the hazard rate and its relatives. However, it is especially true in the engineering sciences that pieces of
equipment consist of many (possibly different) interacting components. The term ‘reliability’
is commonly used to describe the ‘survival’ of such components and of such a system. Essentially,
the reliability of a component is the probability that it is operational. The primary concern of
engineers when looking at a system of components is its reliability and how the reliabilities of
individual components affect the reliability of the entire system. Once the system reliability
has been found we can calculate the system hazard rate by applying (1.8c). We will only give a short
introduction to the theory of reliability of systems; more details may be found in the suggested
readings.
$^{17}$ Suggested reading for this section: BARLOW/PROSCHAN (1975), CROWDER et al. (1991, Chapter 9), LEEMIS
(1995, Chapter 2), MEEKER/ESCOBAR (1998, Chapter 15), SMITH (2002, Chapter 3).
It is certainly true that the reliability of components may change with time. However, initially we
make the assumption that at some instant in time we are able to observe the components and know
whether they are functioning or not. Let Ci ; i = 1, 2, . . . , m; denote component i and suppose
that each component has one of two operational states: ‘functioning’ and ‘not functioning’. For
each i the indicator zi associated with Ci is defined by
\[ z_i = \begin{cases} 1 & \text{if } C_i \text{ is functioning} \\ 0 & \text{if } C_i \text{ is not functioning} \end{cases}; \quad i = 1, 2, \ldots, m. \tag{1.35a} \]
A structure function is a useful tool in describing the way m components are related to form a
system. The structure function defines the system state as a function of the component states and
is given by
\[ \phi(z_1, \ldots, z_m) = \begin{cases} 1 & \text{if the system is functioning} \\ 0 & \text{if the system is not functioning.} \end{cases} \tag{1.35b} \]
Since there are m components there are $2^m$ different values that the system state vector
\[ z = (z_1, z_2, \ldots, z_m) \]
can assume.
\[ \phi_S(z) = \min(z_1, z_2, \ldots, z_m), \tag{1.36b} \]
\[ \phantom{\phi_S(z)} = \prod_{i=1}^{m} z_i. \tag{1.36c} \]
These three different ways of expressing the value of the structure function are equivalent,
although (1.36c) is preferred because of its compactness. The block–diagram in Fig. 1/3 visualizes
a series system of m components.$^{18}$ Systems that function only when all their components
function should be modeled as series systems.
18
A block–diagram is a graphic device for expressing the arrangement of the components to form a system. If
a path can be traced through functioning components from left to right on a block–diagram, then the system
functions. The boxes represent the components, and either component numbers i or probabilities Pi are placed
inside the boxes.
A parallel system functions when one or more of its components function. Its structure function
$\phi_P(z)$ assumes the value 0 when $z_1 = z_2 = \ldots = z_m = 0$, and 1 otherwise. Therefore,
\[ \phi_P(z) = \begin{cases} 0 & \text{if } z_i = 0\ \forall\, i, \\ 1 & \text{if there exists an } i \text{ such that } z_i = 1 \end{cases} \tag{1.37a} \]
\[ \phantom{\phi_P(z)} = \max(z_1, z_2, \ldots, z_m), \tag{1.37b} \]
\[ \phantom{\phi_P(z)} = 1 - \prod_{i=1}^{m} (1 - z_i). \tag{1.37c} \]
See Fig. 1/4 for a block–diagram of a parallel arrangement of m components. Such an arrangement
is appropriate when all components must fail for the system to fail.
Figure 1/4: Parallel system block–diagram
To avoid studying structure functions that are unreasonable, a subset of all possible systems of m
components, that is, coherent systems, has been defined. A system is coherent if
and
2. there are no irrelevant components, i.e., component $C_i$ is irrelevant if, for all states of the
other components in the system (that is, for all values of $z_j$ for j ≠ i)
The structure function of a coherent system may be quite difficult to describe in simple terms.$^{19}$
However, it can be shown that the structure function of any coherent system is bounded above and
below by the structure functions of parallel and series systems, which leads to bounds on
the reliability of coherent systems, see (1.44c).
$^{19}$ Some techniques in this context are the formation of path vectors, minimal path vectors, cut vectors, and minimal
cut vectors.
Theorem 7: If $\phi_C(z)$ is the structure function of a coherent system of m components in the state
vector $z = (z_1, z_2, \ldots, z_m)$, then
\[ \phi_S(z) = \prod_{i=1}^{m} z_i \le \phi_C(z) \le \phi_P(z) = 1 - \prod_{i=1}^{m} (1 - z_i). \tag{1.39} \]
Series and parallel systems are coherent systems. Before showing how the structure function is
related to the system reliability we present some other types of coherent systems.
if the components are independent. The structure function is the product of the structure
functions of the k parallel subsystems each consisting of r components. Fig. 1/6
shows such a 2 × 2 structure where (1.40b) results in
\[ \phi(z) = \left[1 - (1 - z_{11})(1 - z_{12})\right] \left[1 - (1 - z_{21})(1 - z_{22})\right], \]
and there are nine system state vectors, out of 16 possible vectors, leading to
φ(z) = 1.
Figure 1/6: Block–diagram of a 2 × 2 series–parallel system with
component–level redundancy
2. k–out–of–m system
Another way of increasing system reliability consists in supplying more components than
are necessary for functioning. By a k–out–of–m system (k ≤ m) we mean a system of m
components which will function provided at least k of its components are functioning. This
means that the structure function is
\[ \phi(z) = \begin{cases} 1 & \text{if } \sum_{i=1}^{m} z_i \ge k, \\ 0 & \text{if } \sum_{i=1}^{m} z_i < k. \end{cases} \tag{1.41a} \]
Fig. 1/7 shows the block–diagram of a 2–out–of–3 system which looks like that of a series–parallel
system with system–level redundancy. Note that this diagram does not reflect the
physical layout, but rather the paths through the system that will allow operation of the
system.
Figure 1/7: Block–diagram of a 2–out–of–3 system
3. Bridge system
Bridge–structure systems provide another useful way of improving the reliability of certain
systems. Fig. 1/8 illustrates a simple bridge system where component 3 is the bridge.
If component 3 is working (not working), this system has the same structure as Fig. 1/6
(Fig. 1/5).
Figure 1/8: Block–diagram of a bridge system
A moment’s reflection on the diagram in Fig. 1/8 reveals that the system functions if any
one of the following sets of components functions:
These sets are referred to as minimal path sets. Since one or more of these sets of components
must function for the system to function, the block–diagram may be written as
a parallel arrangement of these sets, each set being a series arrangement of its members.
Thus, the structure function corresponding to Fig. 1/8 is
\[ \phi(z) = 1 - (1 - z_1 z_4)\,(1 - z_1 z_3 z_5)\,(1 - z_2 z_5)\,(1 - z_2 z_3 z_4). \tag{1.42} \]
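The structure functions discussed so far can be enumerated over all $2^m$ state vectors. The following sketch is an illustration under stated assumptions: the function names are hypothetical, and the bridge system is assumed to have the minimal path sets {1, 4}, {2, 5}, {1, 3, 5} and {2, 3, 4}, which is the layout consistent with component 3 acting as the bridge between two two–component paths.

```python
from itertools import product

def phi_series(z):
    """Series system: min form (1.36b)."""
    return min(z)

def phi_parallel(z):
    """Parallel system: max form (1.37b)."""
    return max(z)

def phi_k_out_of_m(z, k):
    """k-out-of-m system: functions if at least k components function."""
    return 1 if sum(z) >= k else 0

def phi_from_path_sets(z, path_sets):
    """1 - prod_j (1 - prod_{i in P_j} z_i); components are numbered from 1."""
    all_paths_fail = 1
    for path in path_sets:
        path_works = 1
        for i in path:
            path_works *= z[i - 1]
        all_paths_fail *= 1 - path_works
    return 1 - all_paths_fail

# Assumed minimal path sets of the bridge system (component 3 is the bridge).
BRIDGE_PATHS = [(1, 4), (2, 5), (1, 3, 5), (2, 3, 4)]

def phi_bridge(z):
    return phi_from_path_sets(z, BRIDGE_PATHS)

def phi_2x2(z):
    """2 x 2 series-parallel system with component-level redundancy."""
    z11, z12, z21, z22 = z
    return (1 - (1 - z11) * (1 - z12)) * (1 - (1 - z21) * (1 - z22))

def count_functioning(phi, m):
    """Number of state vectors (out of 2**m) for which the system functions."""
    return sum(phi(z) for z in product((0, 1), repeat=m))

if __name__ == "__main__":
    print("2 x 2 series-parallel:", count_functioning(phi_2x2, 4), "of 16")
    print("2-out-of-3:", count_functioning(lambda z: phi_k_out_of_m(z, 2), 3), "of 8")
    print("bridge:", count_functioning(phi_bridge, 5), "of 32")
```

The enumeration reproduces the nine functioning states of the 2 × 2 structure and lets the bounds of Theorem 7 be checked state by state.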
These m variates can be written as a random state vector Z. The probability that Ci is functioning
at a certain time is given by
Pi = Pr(Zi = 1). (1.43a)
These m values can be written as a reliability vector:
P = (P1 , P2 , . . . , Pm ). (1.43b)
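For instance, applying (1.43d) to the structure function $\phi_S(\cdot)$ of a series system, see (1.36c),
we find
\[
\begin{aligned}
R_S(P) &= E\left[\phi_S(Z)\right] \\
       &= E\left[\prod_{i=1}^{m} Z_i\right] \\
       &= \prod_{i=1}^{m} E(Z_i), \quad \text{because of independence} \\
       &= \prod_{i=1}^{m} P_i.
\end{aligned}
\tag{1.44a}
\]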
We can state the following rule to find the reliability of a coherent system with independent
components:
Applying this rule to (1.39) we can state that for any coherent system with structure function
$\phi_C(Z)$, $Z = (Z_1, Z_2, \ldots, Z_m)$, its reliability R(P) is bounded below and above by the
reliabilities of a series system and a parallel system, each having the same components:
\[ R_S(P) = \prod_{i=1}^{m} P_i \le R(P) \le R_P(P) = 1 - \prod_{i=1}^{m} (1 - P_i). \tag{1.44c} \]
This inequality is not especially sharp. For instance, having m = 5 identical components acting
independently with $P_1 = P_2 = \ldots = P_5 = 0.9$ we find
\[ 0.9^5 = 0.59049 \le R(P) \le 1 - 0.1^5 = 0.99999. \]
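For small m the reliability of a system with independent components can be obtained by brute force as the expectation of the structure function over all $2^m$ states. The sketch below is an illustration, not the text's procedure; the function names and the 3–out–of–5 system used as an intermediate case are assumptions.

```python
from itertools import product

def system_reliability(phi, P):
    """R(P) = E[phi(Z)] for independent component states with Pr(Z_i = 1) = P_i."""
    total = 0.0
    for z in product((0, 1), repeat=len(P)):
        prob = 1.0
        for zi, pi in zip(z, P):
            prob *= pi if zi else 1.0 - pi
        total += phi(z) * prob
    return total

def phi_3_out_of_5(z):
    """Hypothetical intermediate system: at least 3 of 5 components function."""
    return 1 if sum(z) >= 3 else 0

if __name__ == "__main__":
    P = [0.9] * 5
    print("series     :", system_reliability(min, P))
    print("3-out-of-5 :", system_reliability(phi_3_out_of_5, P))
    print("parallel   :", system_reliability(max, P))
```

With five components of reliability 0.9 the coherent 3–out–of–5 system lies between the series and the parallel bound, which shows how wide the interval in (1.44c) can be.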
In order to introduce time dependency into the reliability function we have to substitute $P_i$ by the
survival function $S_i(x)$. Let
\[ \mathbf{S}(x) = \left(S_1(x), S_2(x), \ldots, S_m(x)\right), \]
then the time dependent system reliability function is denoted $R[\mathbf{S}(x)]$ and the system hazard
rate will be
\[ h(x) = -\frac{dR[\mathbf{S}(x)]/dx}{R[\mathbf{S}(x)]}. \tag{1.45} \]
In general, (1.45) will not result in a simple formula, even when we assume identical components
so that $S_i(x) = S(x)\ \forall\, i$. In most cases $dR[\mathbf{S}(x)]/dx$ has to be determined by numerical
differentiation. Therefore, we only take a look at the hazard rate functions of the two simplest
systems, i.e., the series and the parallel systems.
The reliability function of a series system is
\[ R_S[\mathbf{S}(x)] = \prod_{i=1}^{m} S_i(x). \tag{1.46a} \]
Now, by
\[ S_i(x) = \exp\left\{-\int_0^x h_i(u)\,du\right\}, \tag{1.46b} \]
see (1.9b), where $h_i(\cdot)$ is the HR of component $C_i$, we first have
\[ R_S[\mathbf{S}(x)] = \exp\left\{-\sum_{i=1}^{m} \int_0^x h_i(u)\,du\right\} \tag{1.46c} \]
must be the hazard rate of the series system which is the sum of the component hazard rates.
Thus, the more components are linked together, the higher the series system hazard rate. This is
in accordance with the fact that the series system reliability is a decreasing function of the number
of components. When the components have identical lifetime distributions we have $h_i(x) = h(x)\ \forall\, i$
and (1.46e) turns into
\[ h_S(x) = m\,h(x). \tag{1.46f} \]
Because of (1.36b) this is the hazard rate of the minimum order statistic.
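That the series system hazard rate is the sum of the component hazard rates can be verified numerically from (1.45) and (1.46a). The sketch below assumes three WEIBULL components with illustrative parameters and compares the sum of the component hazard rates with a finite–difference derivative of $-\ln R_S$; names and parameter values are not taken from the text.

```python
import math

# (scale b, shape c) of three hypothetical WEIBULL components
COMPONENTS = [(1.0, 0.8), (2.0, 1.5), (3.0, 2.0)]

def weibull_survival(x, b, c):
    return math.exp(-((x / b) ** c))

def weibull_hazard(x, b, c):
    return (c / b) * (x / b) ** (c - 1.0)

def series_reliability(x):
    """(1.46a): product of the component survival functions."""
    r = 1.0
    for b, c in COMPONENTS:
        r *= weibull_survival(x, b, c)
    return r

def series_hazard(x):
    """Sum of the component hazard rates."""
    return sum(weibull_hazard(x, b, c) for b, c in COMPONENTS)

def numeric_hazard(R, x, eps=1e-6):
    """(1.45) evaluated by a central finite difference of -ln R(x)."""
    return -(math.log(R(x + eps)) - math.log(R(x - eps))) / (2.0 * eps)

if __name__ == "__main__":
    for x in (0.5, 1.2, 2.5):
        print(x, series_hazard(x), numeric_hazard(series_reliability, x))
```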
The reliability function of a parallel system is
\[ R_P[\mathbf{S}(x)] = 1 - \prod_{i=1}^{m} F_i(x). \tag{1.47a} \]
which, because of (1.37b), is nothing but the hazard rate of the maximum order statistic.
Let
\[ h(x) = \frac{f(x)}{1 - F(x)} \]
be the hazard rate of a component, then (1.47c) can be transformed into
\[ h_P(x) = \frac{m\,[F(x)]^{m-1}}{\sum_{i=0}^{m-1} [F(x)]^{i}}\,h(x), \quad [F(x)]^0 = 1, \tag{1.47d} \]
\[ \phantom{h_P(x)} = \frac{m}{\sum_{i=0}^{m-1} [F(x)]^{-i}}\,h(x). \tag{1.47e} \]
(1.47e) follows from (1.47d) when dividing the numerator and the denominator on the right–hand
side by $[F(x)]^{m-1}$. The factor $m \big/ \sum_{i=0}^{m-1} [F(x)]^{-i}$ goes to 0 as x → 0, and it goes to 1 as
x → ∞, thus the hazard rate of the parallel system is always less than the hazard rate of an
individual component. This is in accordance with the fact that the reliability of a parallel system
is always higher than that of an individual component.
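Formula (1.47d) can be compared with a direct numerical evaluation of (1.45) for $R_P = 1 - [F(x)]^m$. The sketch below assumes m = 3 identical reduced exponential components; the function names and parameter choices are illustrative.

```python
import math

def exp_cdf(x):
    """CDF of the reduced exponential distribution."""
    return 1.0 - math.exp(-x)

def exp_hazard(x):
    """Constant hazard rate of the reduced exponential distribution."""
    return 1.0

def parallel_factor(x, m, cdf):
    """Factor m F^(m-1) / sum_{i=0}^{m-1} F^i multiplying h(x) in (1.47d)."""
    F = cdf(x)
    return m * F ** (m - 1) / sum(F ** i for i in range(m))

def parallel_hazard(x, m, cdf, hazard):
    """(1.47d) for m identical and independent components."""
    return parallel_factor(x, m, cdf) * hazard(x)

def parallel_reliability(x, m, cdf):
    """(1.47a) with identical components: 1 - F(x)**m."""
    return 1.0 - cdf(x) ** m

def numeric_hazard(R, x, eps=1e-6):
    """(1.45) evaluated by a central finite difference of -ln R(x)."""
    return -(math.log(R(x + eps)) - math.log(R(x - eps))) / (2.0 * eps)

if __name__ == "__main__":
    m = 3
    for x in (0.1, 0.7, 3.0):
        direct = numeric_hazard(lambda t: parallel_reliability(t, m, exp_cdf), x)
        print(x, parallel_hazard(x, m, exp_cdf, exp_hazard), direct)
```

The factor stays strictly between 0 and 1, so the system hazard rate remains below the component hazard rate at every finite x > 0.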
The accelerated life model and the proportional hazards model are designed to include a vector
z of covariates (= explanatory variables) zi ; i = 1, . . . , k; in a lifetime model. zi influences
the lifetime X of the unit under study, and the zi are non–random. Covariates may account
for the fact that the population of units is not truly homogeneous. Other possibilities for the
elements of z include cumulative load applied, time–varying stress, and environmental factors.
The difference between accelerated life models and proportional hazards models is that in the
first case the covariates affect the rate at which the unit ages, and in the second case the covariates
increase or decrease the hazard rate. So, in accelerated life models the survival function has to be
modeled and in proportional hazards models the hazard rate has to be modified.
We first give a short introduction to accelerated life models. The question here is how to link the
covariates to the survival function. One approach is to define one lifetime model when z = 0,
called the baseline model, and other models for z ≠ 0. Analysis is simplified when there is only
a single model appropriate for all values of z. The survivor function of X in the accelerated life
model is
\[ S(x) = S_0\left[x\,\psi(z)\right], \quad x \ge 0, \tag{1.48a} \]
where $S_0(\cdot)$ is a baseline survival function and ψ(z) is a link function. The covariates are
linked to the lifetime by ψ(z), satisfying
With these attributes of ψ(z), z = 0 implies $S_0(x) = S(x)$. A very popular choice for ψ(z) is
the log–linear link function
\[ \psi(z) = \exp\left(\beta' z\right). \tag{1.48c} \]
The vector β represents regression coefficients and z is a vector of non–random regressors. With
(1.48c) the covariates accelerate ($\beta' z > 0$) or decelerate ($\beta' z < 0$) the rate at which a unit
moves through time with respect to the baseline case.$^{21}$ Other, less popular choices for the link
function are
\[ \psi(z) = \beta' z \quad \text{and} \quad \psi(z) = \left(\beta' z\right)^{-1}. \]
With these two specifications it may happen that ψ(z) < 0 for some value of β, resulting in a
negative lifetime. The other lifetime distribution representatives for an accelerated lifetime model
with $S(x) = S_0\left[x\,\psi(z)\right]$ are
\[ F(x) = 1 - S(x) = 1 - S_0\left[x\,\psi(z)\right] = F_0\left[x\,\psi(z)\right], \tag{1.48d} \]
\[ f(x) = \psi(z)\,f_0\left[x\,\psi(z)\right], \tag{1.48e} \]
\[ h(x) = \frac{f(x)}{S(x)} = \psi(z)\,\frac{f_0\left[x\,\psi(z)\right]}{S_0\left[x\,\psi(z)\right]} = \psi(z)\,h_0\left[x\,\psi(z)\right], \tag{1.48f} \]
\[ H(x) = -\ln\left\{S_0\left[x\,\psi(z)\right]\right\} = H_0\left[x\,\psi(z)\right]. \tag{1.48g} \]
We recognize that these formulas resemble those of the variable transformation of
Sect. 1.1.2.1.
$^{21}$ Suggested reading for ‘acceleration’: NELSON (1990), and for ‘proportional hazards’: COX/OAKES (1984).
Whereas accelerated lifetime models modify the rate at which the unit moves through time,
proportional hazards models modify the hazard rate by the factor ψ(z):
\[ h(x) = \psi(z)\,h_0(x). \tag{1.49a} \]
$h_0(x)$ is called the baseline hazard, representing the hazard rate for a unit having ψ(z) = 1. As
before, a popular choice for the link function here is the log–linear form (1.48c), and the hazard
rate increases when $\beta' z > 0$ and decreases when $\beta' z < 0$. The ‘proportional’ terminology arises
in a natural way. If two units 1 and 2 have lifetimes depending on respective vectors of
covariate values $z_1$ and $z_2$, then
\[ \frac{h_1(x)}{h_2(x)} = \frac{h_0(x)\,\psi(z_1)}{h_0(x)\,\psi(z_2)} = \frac{\psi(z_1)}{\psi(z_2)}, \]
showing clearly how the baseline hazards cancel from this ratio, so that the hazard ratio for the
two units does not depend on lifetime x. The other lifetime distribution representatives can be
determined from (1.49a):
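\[ H(x) = \int_0^x h(u)\,du = \psi(z) \int_0^x h_0(u)\,du = \psi(z)\,H_0(x), \tag{1.49b} \]
\[
\begin{aligned}
S(x) &= \exp\left\{-\int_0^x \psi(z)\,h_0(u)\,du\right\} \\
     &= \exp\left\{-\psi(z) \int_0^x h_0(u)\,du\right\} \\
     &= \left[\exp\left\{-\int_0^x h_0(u)\,du\right\}\right]^{\psi(z)} \\
     &= \left[S_0(x)\right]^{\psi(z)},
\end{aligned}
\tag{1.49c}
\]
\[ F(x) = 1 - \left[S_0(x)\right]^{\psi(z)}, \tag{1.49d} \]
\[ f(x) = f_0(x)\,\psi(z)\,\left[S_0(x)\right]^{\psi(z)-1}. \tag{1.49e} \]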
Example 1/7: Accelerated life model and proportional hazards model for WEIBULL baseline
The WEIBULL baseline survival function for the accelerated life model is
\[ S_0(x) = \exp\left[-\left(\frac{x}{b}\right)^{c}\right]; \quad x \ge 0;\ b, c > 0, \]
where b is a scale parameter and c is a shape parameter. Introducing a link function ψ(z) into $S_0(\cdot)$
according to (1.48a) gives
\[ S_A(x) = S_0\left[x\,\psi(z)\right] = \exp\left\{-\left[\frac{x\,\psi(z)}{b}\right]^{c}\right\}. \tag{1.50a} \]
Thus, the accelerated lifetime has a WEIBULL distribution as well, but the scale parameter has changed
from b to $b_A = b/\psi(z)$. The hazard rate belonging to (1.50a) is
\[
\begin{aligned}
h_A(x) &= \psi(z)\,h_0\left[x\,\psi(z)\right] \\
       &= \psi(z)\,\frac{c}{b} \left[\frac{x\,\psi(z)}{b}\right]^{c-1} \\
       &= \frac{c}{b/\psi(z)} \left[\frac{x\,\psi(z)}{b}\right]^{c-1}.
\end{aligned}
\tag{1.50b}
\]
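The two covariate models can be compared numerically for a WEIBULL baseline. The sketch below is an illustration: it evaluates (1.48a) and (1.49c) with the log–linear link (1.48c) and checks that both results are again WEIBULL survival functions with the same shape c. The scale $b\,\psi(z)^{-1/c}$ used for the proportional hazards case is derived here from (1.49c) and is not quoted from the text; all names and numerical values are assumptions.

```python
import math

def weibull_survival(x, b, c):
    """Baseline S_0(x) = exp(-(x/b)**c)."""
    return math.exp(-((x / b) ** c))

def log_linear_link(beta, z):
    """(1.48c): psi(z) = exp(beta' z)."""
    return math.exp(sum(bi * zi for bi, zi in zip(beta, z)))

def aft_survival(x, b, c, psi):
    """Accelerated life model (1.48a): S(x) = S_0(x * psi)."""
    return weibull_survival(x * psi, b, c)

def ph_survival(x, b, c, psi):
    """Proportional hazards model (1.49c): S(x) = S_0(x)**psi."""
    return weibull_survival(x, b, c) ** psi

if __name__ == "__main__":
    b, c = 2.0, 1.5                                   # illustrative baseline
    psi = log_linear_link(beta=(0.5, -0.2), z=(1.0, 2.0))
    for x in (0.5, 1.0, 3.0):
        print(x,
              aft_survival(x, b, c, psi), weibull_survival(x, b / psi, c),
              ph_survival(x, b, c, psi), weibull_survival(x, b * psi ** (-1.0 / c), c))
```

Both models leave the shape parameter unchanged for a WEIBULL baseline; they differ only in how ψ(z) rescales the scale parameter.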
Truncation and censoring are two operations which are often confused in the statistical literature, but
there is a clear distinction between these two concepts. Truncation is confined to the distribution
or to the population, whereas censoring is related to samples. So we talk about truncated
distributions and censored samples, and we may have a censored sample from a truncated distribution.
Truncation means that the original or natural support of a distribution has been shrunk so that
the portion of the population in the truncated part can never be observed and a certain part of the
original probability mass is cut off. Thus, truncation modifies the distribution and leads to
conditional distributions. Censoring, either from an unmodified distribution or from a truncated
distribution, modifies the selection of the random variables and is thus related to the sampling
process. Censoring means that, for one reason or another, the statistician refrains from
measuring the exact value of a unit’s characteristic when this value falls inside a certain region,
e.g., is greater or smaller than a given threshold. Censored samples produce two types of
observations: those with a known and complete value of the characteristic under study, and those which
fall into a special region of the characteristic and are thus only known by their frequency and not
by their value. A censored observation is distinct from a missing observation in that the order of
the censored observation relative to some of the uncensored observations is known and conveys
information regarding the distribution being sampled.
$^{22}$ Suggested reading for this section: COHEN (1991).
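There are three common types of truncation:
• left truncation, also known as truncation from below,
• right truncation, also known as truncation from above,
• double truncation, also known as truncation on both sides or truncation from below
and above.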
Regarding censoring we have substantially more types, which are fully described and discussed in
RINNE (2009, p. 291–312) and in Sect. 4. Truncation of a lifetime distribution leads to different
types of lifetime:
• Left truncation gives future lifetime or remaining lifetime, i.e., lifetime greater than $x_l$, the
lower point of truncation. Left truncation may be realized by burn–in or by preselecting
apparently weak units (freaks).
• Right truncation gives early lifetime or young lifetime, i.e., lifetime less than $x_u$, the upper
point of truncation. Right truncation is encountered when, for economic or safety reasons,
items will be in operation for at most $x_u$ units of time.
• Double truncation gives interim lifetime, i.e., lifetime which is less than $x_u$, but greater
than $x_l$ and thus is in the interval $[x_l, x_u]$. We may think of interim lifetime as either future
lifetime truncated on the right or as early lifetime truncated on the left.
We first present results for the future lifetime which has already been introduced in Sect. 1.1.1.4
in conjunction with the introduction of the hazard rate. Future lifetime beyond $x_l$ will be denoted
\[ X - x_l \mid X \ge x_l =: Y \mid x_l \]
and its distributional representatives, expressed in terms of the original distribution, are
\[ f_l(y \mid x_l) = \frac{f(x_l + y)}{1 - F(x_l)} = \frac{f(x_l + y)}{S(x_l)}, \quad x_l \ge 0,\ y \ge 0, \tag{1.52a} \]
\[ F_l(y \mid x_l) = \frac{F(x_l + y) - F(x_l)}{1 - F(x_l)} = \frac{S(x_l) - S(x_l + y)}{S(x_l)}, \tag{1.52b} \]
\[ S_l(y \mid x_l) = \frac{S(x_l + y)}{S(x_l)} = \frac{1 - F(x_l + y)}{1 - F(x_l)}, \tag{1.52c} \]
\[ h_l(y \mid x_l) = \frac{f_l(y \mid x_l)}{S_l(y \mid x_l)} = \frac{f(x_l + y)}{S(x_l + y)} = h(x_l + y). \tag{1.52d} \]
The hazard rate for the future lifetime Y of a distribution truncated on the left at $x_l$ is identical
to that of the original distribution at $x = x_l + y$, i.e., the courses of these two hazard rates only
differ by a translation.
\[
\begin{aligned}
H_l(y \mid x_l) &= \int_0^y h_l(u \mid x_l)\,du \\
                &= \int_0^y h(x_l + u)\,du \\
                &= \int_{x_l}^{x_l + y} h(v)\,dv, \quad v = x_l + u, \\
                &= H(x_l + y) - H(x_l).
\end{aligned}
\tag{1.52e}
\]
which (in Sect. 1.1.1.e) was denoted $\mu(x_l)$ and called the MRL of X. The MRL of the future
lifetime, after some manipulations, is
\[ \mu_l(y) = \frac{1}{S(x_l + y)} \int_y^\infty S(x_l + v)\,dv = \mu(x_l + y), \tag{1.52h} \]
i.e., the MRL of the left–truncated variable, truncated at $x_l$, after y units of time is the same as
that of the original variable at age $x_l + y$. Like the HR, the MRL is translated. The percentile
function of $Y \mid x_l$ follows from
\[ F_l(y_P \mid x_l) = \frac{F(x_l + y_P) - F(x_l)}{1 - F(x_l)} = P, \quad 0 \le P \le 1, \]
as
\[ y_P = F^{-1}\left[P + (1 - P)\,F(x_l)\right] - x_l, \tag{1.52i} \]
i.e., $y_P$ is equal to the percentile of order $P + (1 - P)\,F(x_l)$ of the original distribution, but
reduced by the value of the truncation point $x_l$. Sometimes (particularly when the distribution is
highly skewed) the median is preferred to the mean, in which case the quantity ‘median residual
life’ at $x_l$ is preferred to the mean residual lifetime. The median residual lifetime at $x_l$ is the
length of the interval from $x_l$ to that time where one–half of the units alive at $x_l$ will still be alive.
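The percentile formula (1.52i) and the median residual life can be evaluated for any distribution with an explicit percentile function. The sketch below uses a WEIBULL distribution with illustrative parameters; all function names and numerical values are assumptions.

```python
import math

def weibull_cdf(x, b, c):
    return 1.0 - math.exp(-((x / b) ** c))

def weibull_percentile(p, b, c):
    """Percentile function F^{-1}(p) of the WEIBULL distribution."""
    return b * (-math.log(1.0 - p)) ** (1.0 / c)

def residual_cdf(y, x_l, b, c):
    """(1.52b): CDF of the future lifetime beyond x_l."""
    F_l = weibull_cdf(x_l, b, c)
    return (weibull_cdf(x_l + y, b, c) - F_l) / (1.0 - F_l)

def residual_percentile(P, x_l, b, c):
    """(1.52i): y_P = F^{-1}[P + (1 - P) F(x_l)] - x_l."""
    F_l = weibull_cdf(x_l, b, c)
    return weibull_percentile(P + (1.0 - P) * F_l, b, c) - x_l

if __name__ == "__main__":
    b, c, x_l = 2.0, 1.5, 1.0          # illustrative values
    median_residual = residual_percentile(0.5, x_l, b, c)
    print("median residual life at x_l:", median_residual)
    print("check F_l(y_0.5 | x_l):", residual_cdf(median_residual, x_l, b, c))
```

Inserting the computed percentile back into (1.52b) should return P, which gives a simple consistency check of the formula.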
Now we give results for the early lifetime, denoted
xu − X | X ≤ xu =: Y | xu .
\[ f_u(y \mid x_u) = \frac{f(y)}{F(x_u)} = \frac{f(y)}{1 - S(x_u)}, \quad x_u > 0,\ 0 \le y \le x_u, \tag{1.53a} \]
\[ F_u(y \mid x_u) = \frac{F(y)}{F(x_u)} = \frac{1 - S(y)}{1 - S(x_u)}, \tag{1.53b} \]
\[ S_u(y \mid x_u) = \frac{S(y) - S(x_u)}{1 - S(x_u)} = \frac{F(x_u) - F(y)}{F(x_u)}, \tag{1.53c} \]
\[ h_u(y \mid x_u) = \frac{f_u(y \mid x_u)}{S_u(y \mid x_u)} = \frac{f(y)}{S(y) - S(x_u)} = h(y)\,\frac{S(y)}{S(y) - S(x_u)}. \tag{1.53d} \]
Since $S(y) > S(y) - S(x_u)$ for $y < x_u$ the hazard rate of the right–truncated distribution is
greater than that of the original distribution. (1.53d) shows that truncation from above markedly
affects the course of the HR, whereas truncation from below only shifts the HR, see (1.52d) and
Fig. 1/9. Since the HR at y is conditional on survival to y, truncation below y is immaterial.
However, truncation above y has an effect on the HR at y, since the time interval remaining for
failing is shortened. It should be clear that as $y \to x_u$ from below, $h_u(y \mid x_u)$ becomes indefinitely
large since the interval for failing approaches zero.
\[ H_u(y \mid x_u) = -\ln S_u(y \mid x_u) = \ln F(x_u) - \ln\left[S(y) - S(x_u)\right]. \tag{1.53e} \]
For the mean of $Y \mid x_u$ we find
\[ E(Y \mid x_u) = \int_0^{x_u} S_u(v \mid x_u)\,dv = \frac{1}{1 - S(x_u)} \left[ \int_0^{x_u} S(v)\,dv - x_u\,S(x_u) \right], \tag{1.53f} \]
We notice that $\mu_u(y)$ approaches zero as $y \to x_u$. The percentile function of $Y \mid x_u$ follows
from
\[ F_u(y_P \mid x_u) = \frac{F(y_P)}{F(x_u)} = P, \quad 0 \le P \le 1, \]
as
\[ y_P = F^{-1}\left[P\,F(x_u)\right], \tag{1.53h} \]
i.e., $y_P$ is equal to the percentile of order $P\,F(x_u)$ of the original distribution.
The results for the interim lifetime
\[ (x_u - x_l) - (x_u - X) \mid x_l \le X \le x_u \;=\; X - x_l \mid x_l \le X \le x_u \;=:\; Y \mid x_l; x_u \]
are the following
\[ f_{l,u}(y \mid x_l; x_u) = \frac{f(x_l + y)}{F(x_u) - F(x_l)} = \frac{f(x_l + y)}{S(x_l) - S(x_u)}, \quad 0 \le x_l < x_u,\ 0 \le y \le x_u - x_l, \tag{1.54a} \]
\[ F_{l,u}(y_P \mid x_l; x_u) = \frac{F(x_l + y_P) - F(x_l)}{F(x_u) - F(x_l)} = P \]
as
\[ y_P = F^{-1}\bigl\{P\,[F(x_u) - F(x_l)] + F(x_l)\bigr\} - x_l. \tag{1.54h} \]
We notice that
• by setting xu = ∞ the formulas (1.54a–h) turn into those for the case of the left–truncation,
i.e., into formulas (1.52a–i),
• by setting xl = 0 the formulas (1.54a–h) turn into those for the case of right–truncation i.e.,
into formulas (1.53a–h) and
The reduced RAYLEIGH distribution, equivalent to the reduced WEIBULL distribution with shape parameter equal to 2, has
\[ f(x) = 2x \exp(-x^2), \quad x \ge 0, \]
\[ F(x) = 1 - \exp(-x^2), \]
\[ S(x) = \exp(-x^2), \]
\[ h(x) = 2x, \]
\[ H(x) = x^2, \]
\[ E(X^k) = \Gamma\!\left(1 + \frac{k}{2}\right), \]
\[ E(X) = \Gamma\!\left(\frac{3}{2}\right) = \frac{\sqrt{\pi}}{2} \approx 0.88623, \]
\[ \mu(x) = \frac{1}{2}\sqrt{\pi}\,\exp(x^2)\,\operatorname{erfc}(x).^{23} \]
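Truncation from below at $x_l$ gives
\[ f_l(y \mid x_l) = 2\,(x_l + y) \exp\bigl[-y\,(2x_l + y)\bigr]; \quad x_l > 0,\ y \ge 0; \]
\[ F_l(y \mid x_l) = 1 - \exp\bigl[-y\,(2x_l + y)\bigr], \]
\[ S_l(y \mid x_l) = \exp\bigl[-y\,(2x_l + y)\bigr], \]
\[ h_l(y \mid x_l) = 2\,(x_l + y), \]
\[ H_l(y \mid x_l) = y\,(2x_l + y), \]
\[ E(Y \mid x_l) = \frac{1}{2}\sqrt{\pi}\,\exp(x_l^2)\,\operatorname{erfc}(x_l), \]
\[ \mu_l(y) = \mu(x_l + y) = \frac{1}{2}\sqrt{\pi}\,\exp\bigl[(x_l + y)^2\bigr]\operatorname{erfc}(x_l + y), \]
\[ y_P = \sqrt{x_l^2 - \ln(1 - P)} - x_l. \]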
23
$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-u^2)\,du$ is the error function and $\operatorname{erfc} = 1 - \operatorname{erf}$ the complementary error function.
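\[ h_{l,u}(y \mid x_l; x_u) = \frac{2\,(x_l + y)}{1 - \exp\bigl[(x_l + y)^2 - x_u^2\bigr]}, \]
\[ H_{l,u}(y \mid x_l; x_u) = -\ln\left\{\frac{\exp\bigl[-(x_l + y)^2\bigr] - \exp(-x_u^2)}{\exp(-x_l^2) - \exp(-x_u^2)}\right\}, \]
\[ E(Y \mid x_l; x_u) = \frac{\exp(x_l^2)\,\bigl\{2\,(x_u - x_l) + \exp(x_u^2)\,\sqrt{\pi}\,\bigl[\operatorname{erf}(x_l) - \operatorname{erf}(x_u)\bigr]\bigr\}}{2\,\bigl[\exp(x_l^2) - \exp(x_u^2)\bigr]}, \]
\[ \mu_{l,u}(y) = \frac{\exp\bigl[(x_l + y)^2\bigr]\,\bigl\{2\,(x_u - x_l - y) + \exp(x_u^2)\,\sqrt{\pi}\,\bigl[\operatorname{erf}(x_l + y) - \operatorname{erf}(x_u)\bigr]\bigr\}}{2\,\bigl\{\exp\bigl[(x_l + y)^2\bigr] - \exp(x_u^2)\bigr\}}, \]
\[ y_P = \sqrt{-\ln\bigl[P \exp(-x_u^2) + (1 - P) \exp(-x_l^2)\bigr]} - x_l. \]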
Fig. 1/9 shows the hazard rates for the original distribution together with those of the three types of truncation where the truncation points are $x_l = 1$ and $x_u = 3$, respectively.
Figure 1/9: Hazard rates of the reduced RAYLEIGH distribution, non–truncated and truncated at $x_l = 1$
and/or $x_u = 3$
The integral
\[ \Pi(x) = \int_x^{\infty} S(u)\,du \tag{1.55a} \]
\[ \phantom{\Pi(x)} = \mu(x)\,S(x) \tag{1.55b} \]
is the life potential, i.e., the total number of expected time units to be spent by the fraction of
units in the population which survive the age x. Let N be the number of persons at age x = 0,
i.e., N is the size of the population; then in life tables and in the actuarial sciences the product
N × Π(x) is known as 'the expected total number of years lived beyond age x by persons alive
at age x'. There, this quantity is denoted $T_x$ and is measured in 'population units × time units',
i.e., person–years. When the population is mechanical equipment the quantity N × Π(x) will be
expressed in units such as machine–hours. In the engineering sciences the complement of Π(x) to µ = E(X):
\[ \mu - \Pi(x) = \int_0^x S(u)\,du \tag{1.55c} \]
tells what fraction of the original life potential Π(0) = µ is still available at age x. Whereas
S(x) says what fraction of the initial size of population units has survived the age x, $\tilde{S}(x)$ tells
what fraction of the initial number of lifetime units has not been 'consumed' up to x. $\tilde{S}(x)$ can
be regarded as a survival function, not of population units, however, but of lifetime units.
Thus, we may call $\tilde{S}(x)$ the potential survival function. Looking at (1.56a) we see that
\[ \tilde{S}(x) \ge S(x) \quad \text{for increasing MRL} \]
and
\[ \tilde{S}(x) \le S(x) \quad \text{for decreasing MRL.} \]
$\tilde{S}(x)$ is, like S(x), a monotone and decreasing function with
\[ \tilde{S}(0) = 1 \quad \text{and} \quad \tilde{S}(\infty) = 0. \]
Besides $\tilde{S}(x)$ we can define the following representatives of the life potential distribution:
Example 1/9: Life potential of the exponential and the reduced RAYLEIGH distributions
The exponential distribution with $f(x) = \frac{1}{b} \exp\left(-\frac{x}{b}\right)$ is the only continuous distribution where $S(x) = \tilde{S}(x)\ \forall\ x \ge 0$. This is a consequence of its constant MRL: $\mu(x) = b\ \forall\ x \ge 0$.
For the reduced RAYLEIGH distribution of Example 1/8 we have
\[ \Pi(x) = \frac{1}{2}\sqrt{\pi}\,\operatorname{erfc}(x),\ x \ge 0, \qquad \tilde{S}(x) = \frac{\Pi(x)}{E(X)} = \operatorname{erfc}(x), \qquad \tilde{h}(x) = \frac{2 \exp(-x^2)}{\sqrt{\pi}\,\operatorname{erfc}(x)}. \]
From Fig. 1/10 we see that $\tilde{S}(x) < S(x)$; thus, the stock of lifetime units decreases faster than the stock of
population units. We further see that $\tilde{h}(x) > h(x)$, but both rates approach one another with $x \to \infty$.
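The closed forms of this example can be cross–checked numerically. The following sketch assumes that $\tilde{h}(x)$ is the hazard rate belonging to $\tilde{S}(x)$, i.e. $S(x)/\Pi(x)$, which agrees with the expression given above; the midpoint rule, its upper limit and all function names are illustrative choices.

```python
import math

SQRT_PI = math.sqrt(math.pi)

def S(x):
    """Survival function of the reduced Rayleigh distribution."""
    return math.exp(-x * x)

def potential(x):
    """Life potential Pi(x) in closed form, see the example above."""
    return 0.5 * SQRT_PI * math.erfc(x)

def potential_numeric(x, upper=8.0, n=20_000):
    """Pi(x) by a midpoint rule on [x, upper]; the tail beyond 'upper' is negligible here."""
    step = (upper - x) / n
    return step * sum(S(x + (k + 0.5) * step) for k in range(n))

def S_tilde(x):
    """Potential survival function Pi(x) / E(X), with E(X) = Pi(0)."""
    return potential(x) / potential(0.0)

def h_tilde(x):
    """Hazard rate of the life potential distribution, S(x) / Pi(x)."""
    return S(x) / potential(x)

for x in (0.5, 1.0, 2.0):
    print(x, S(x), S_tilde(x), 2 * x, h_tilde(x))
```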
25
In general, this natural rate of depreciation will be different from the rate of depreciation applied in accounting
because there we have to allow for economic and fiscal reasons.
1.2 The Univariate Discrete Case
Often, we have $P_0 = 0$, but, for generality, we assume that $P_0$ is not necessarily equal to zero.
The case $P_0 \ne 0$ corresponds to a non–zero portion of dud units in a reliability context or to a
non–zero probability that a fetus dies at birth or to an egg failing to hatch in a biostatistics context.
Different representatives for such discrete lifetime distributions will be described next.
The most basic representative of a discrete lifetime distribution is its PMF
Pr(X = i) = Pi ; i = 0, 1, 2, . . . ; (1.57)
where X is the random time to failure or to death and i is the observed number of time units to
this event.
The survival or reliability function is
The reliability function is a unique function of the probabilities Pi , and similarly the Pi are deter-
mined uniquely by the Si :
Pi = Si − Si+1 ; i = 0, 1, 2, . . . (1.58b)
26
Suggested reading for the section: KEMP (2004), LAI (2013), SALVIA/BOLLINGER (1982), SHAKED et al.
(1995).
27
This definition is different from that of continuous variate, see (1.4a).
The distribution function or failure function is the complement to the survival function
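\[ 0 \le h_i \le 1, \]
whereas in the continuous case we have h(x) ≥ 0. We may write $h_i$ in terms of $P_i$ and/or $S_i$:
\[ h_i = \frac{P_i}{\sum_{j \ge i} P_j}, \tag{1.60b} \]
\[ \phantom{h_i} = \frac{P_i}{S_i}, \tag{1.60c} \]
\[ \phantom{h_i} = 1 - \frac{S_{i+1}}{S_i}. \tag{1.60d} \]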
Conversely, we may express $P_i$ and $S_i$ in terms of $h_i$. Because $S_0 = 1$ we may write $S_i$ for
i = 1, 2, . . . as the telescoping product
\[ S_i = \frac{S_i}{S_{i-1}}\,\frac{S_{i-1}}{S_{i-2}} \cdots \frac{S_2}{S_1}\,\frac{S_1}{S_0}. \tag{1.61a} \]
From (1.60d) we have
\[ 1 - h_i = \frac{S_{i+1}}{S_i}; \quad i = 0, 1, 2, \ldots \tag{1.61b} \]
Combining (1.61a) and (1.61b) we find²⁹
\[ S_i = \prod_{j=0}^{i-1} (1 - h_j); \quad i = 0, 1, 2, \ldots \tag{1.61c} \]
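The relations (1.58b), (1.60c) and (1.61c) translate directly into a few lines of code. The sketch below is only an illustration: it assumes a PMF with finite support whose last listed mass is positive, and the function names are hypothetical.

```python
def survival_from_pmf(pmf):
    """S_i = sum of P_j over j >= i, for i = 0, ..., len(pmf); the last entry is 0."""
    surv = [0.0] * (len(pmf) + 1)
    for i in range(len(pmf) - 1, -1, -1):
        surv[i] = surv[i + 1] + pmf[i]
    return surv

def hazard_from_pmf(pmf):
    """h_i = P_i / S_i, see (1.60c)."""
    surv = survival_from_pmf(pmf)
    return [p / s for p, s in zip(pmf, surv)]

def survival_from_hazard(hazard):
    """S_i as the product of (1 - h_j) over j < i, see (1.61c); the empty product is 1."""
    surv = [1.0]
    for h_j in hazard:
        surv.append(surv[-1] * (1.0 - h_j))
    return surv

pmf = [0.1, 0.2, 0.3, 0.4]           # illustrative PMF on i = 0, 1, 2, 3
hazard = hazard_from_pmf(pmf)
print(hazard)
print(survival_from_hazard(hazard))  # reproduces survival_from_pmf(pmf) up to rounding
```

Going from the PMF to the hazard rate and back through the product formula returns the original survival function, which is the uniqueness statement made in the text.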
In the continuous case we have with respect to the cumulative hazard function
\[ H(x) = \int_0^x h(u)\,du = \int_0^x \frac{f(u)}{S(u)}\,du = -\ln S(x). \]
28
Some authors define the discrete hazard rate as $\Pr(X = i) / \Pr(X > i)$.
29
Remember: $\prod_{i=j}^{k} a_i = 1$ for $k < j$.
In the discrete case we have a dilemma when defining the cumulative hazard function, because, in
general, summing the hazard rate values hi — as analogue to integrating h(x) in the continuous
case — is not equal to taking the negative of the logarithm of the survival function Si . Thus,
two possible, but different choices for the discrete case exist, which give rise to two estimators in
Sect. 6.2:
\[ {}_1H_i := -\ln S_i, \tag{1.62a} \]
called cumulative hazard function by KEMP (2004), and
\[ {}_2H_i := \sum_{j=0}^{i} h_j, \tag{1.62b} \]
The pseudo–hazard rate h∗i and the hazard rate hi are linked as
COX/OAKES (1984, p. 15) prefer to define the cumulative hazard rate for discrete lifetimes as
\[ H(x) = -\sum_{x_j < x} \ln\bigl[1 - h(x_j)\bigr], \]
because the relationship $S(x) = \exp[-H(x)]$ is then preserved for discrete lifetimes. If the $h(x_j)$ are
small, we have for the COX/OAKES definition
\[ H(x) \approx \sum_{x_j < x} h(x_j). \]
${}_1H_i$ and ${}_2H_i$ are linked as follows: From (1.62a) using (1.58a) and (1.61c) we find
\[ {}_1H_i = -\ln \sum_{j \ge i} P_j = -\ln \prod_{j=0}^{i-1} (1 - h_j) = -\sum_{j=0}^{i-1} \ln\bigl[1 - {}_2H_j + {}_2H_{j-1}\bigr]. \tag{1.63a} \]
For a discrete distribution with increasing hazard rate the condition $h_0 \le h_1 \le \ldots$ applied to
(1.61c) yields the inequality
\[ S_i \le (1 - h_0)^i = (1 - P_0)^i \approx \exp(-i\,h_0) = \exp(-i\,P_0). \tag{1.66} \]
This inequality is reversed for a decreasing hazard rate distribution.
The mean residual life function is defined by KALBFLEISCH/PRENTICE (1980, p. 7), LAWLESS
(1982, p. 44) and LEEMIS (1995, p. 57) as
\[ L_i := E(X - i \mid X \ge i), \quad i \ge 0. \tag{1.67a} \]
Therefore, in the discrete case we have³⁰
\[ L_i = \frac{\sum_{j \ge i} j\,P_j}{\sum_{j \ge i} P_j} - i, \tag{1.67b} \]
\[ \phantom{L_i} = \frac{\sum_{j > i} S_j}{S_i}, \tag{1.67c} \]
\[ \phantom{L_i} = \sum_{j \ge i} \prod_{k=i}^{j} (1 - h_k), \tag{1.67d} \]
\[ \phantom{L_i} = \sum_{j > i} \exp\bigl({}_1H_i - {}_1H_j\bigr), \tag{1.67e} \]
\[ \phantom{L_i} = \sum_{j \ge i} \prod_{k=i}^{j} \bigl(1 - {}_2H_k + {}_2H_{k-1}\bigr). \tag{1.67f} \]
30
In (1.67b) the term $\sum_{j \ge i} j\,P_j \big/ \sum_{j \ge i} P_j$ is the mean age at death of an i–survivor.
From (1.67b) we see that, since the MRL function is defined for all i ≥ 0, and everything in this
expression except i is constant between mass function values, MRL decreases with a slope of −1
at all time values for which there is no mass.
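The equivalence of (1.67b) and (1.67c), and the slope of −1 on stretches without probability mass, can be checked with a short sketch. The two PMFs are made–up illustrations with finite support, and the function names are hypothetical.

```python
def mrl_from_pmf(pmf):
    """L_i via (1.67b): mean age at death of an i-survivor minus i."""
    out = []
    for i in range(len(pmf)):
        tail = sum(pmf[i:])
        mean_age = sum(j * pmf[j] for j in range(i, len(pmf))) / tail
        out.append(mean_age - i)
    return out

def mrl_from_survival(pmf):
    """L_i via (1.67c): sum of S_j over j > i, divided by S_i."""
    surv = [sum(pmf[i:]) for i in range(len(pmf) + 1)]
    return [sum(surv[i + 1:]) / surv[i] for i in range(len(pmf))]

print(mrl_from_pmf([0.1, 0.2, 0.3, 0.4]))
print(mrl_from_survival([0.1, 0.2, 0.3, 0.4]))

# No mass at i = 1 and i = 2: the MRL drops by exactly one per time unit there.
print(mrl_from_pmf([0.5, 0.0, 0.0, 0.5]))
```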
Reverting to (1.66), which holds for an increasing hazard rate distribution, we find from
$E(X) = \sum_{j > 0} S_j$, see (1.72a), the inequality
\[ E(X) \le \frac{1 - h_0}{h_0} = \frac{1 - P_0}{P_0}. \tag{1.68} \]
This inequality is reversed for a decreasing hazard rate distribution.
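What can be said about $L = \lim_{i \to \infty} L_i$? Let
\[ h = \lim_{i \to \infty} h_i, \quad P = \lim_{i \to \infty} \frac{P_{i+1}}{P_i}, \quad S = \lim_{i \to \infty} \frac{S_i}{S_{i+1}}, \]
• $L = (S - 1)^{-1}$,
• $P = S^{-1}$,
• $h = (L + 1)^{-1}$.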
For h = 0 we find
S = P = 1 and L = ∞,
and for h = 1 we have
P = L = 0 and S = ∞.
These extremes do in fact occur. For example, the YULE distribution
\[ P_i = \rho\,B(i, \rho + 1) = \frac{\rho\,\Gamma(\rho + 1)\,\Gamma(i)}{\Gamma(i + \rho + 1)}; \quad i = 1, 2, \ldots;\ \rho \in \mathbb{R}^+; \]
has h = 0, and the POISSON distribution
\[ P_i = \frac{\lambda^i}{i!}\,e^{-\lambda}; \quad i = 0, 1, \ldots \]
has h = 1, see Example 1/10.
The discrete lifetime distribution representatives $P_i$, $S_i$, $h_i$, ${}_1H_i$, and ${}_2H_i$ may be expressed in
terms of $L_i$. From (1.67c) we have
\[ L_i\,S_i = S_{i+1} + S_{i+2} + \ldots \]
and furthermore
\[ L_{i-1}\,S_{i-1} = S_i + S_{i+1} + \ldots \]
with difference $L_{i-1} S_{i-1} - L_i S_i = S_i$. Solving for $S_i$ gives
\[ S_i = S_{i-1}\,\frac{L_{i-1}}{1 + L_i}, \tag{1.69a} \]
which upon substituting $S_{i-1}$ by $S_{i-2}\,L_{i-2} / (1 + L_{i-1})$, and so forth until $S_1 = S_0\,L_0 / (1 + L_1) =
L_0 / (1 + L_1)$, results in
\[ S_i = \prod_{j=0}^{i-1} \frac{L_j}{1 + L_{j+1}}. \tag{1.69b} \]
In the continuous case it does not matter whether we define MRL by E(X − x | X > x) or by
E(X − x | X ≥ x), but in the discrete case there is a difference. MRL defined as
With (1.67a) the MRL function at time i = 0 is the mean of the lifetime distribution:
\[ L_0 = \sum_{j \ge 0} j\,P_j = \sum_{j > 0} S_j = E(X) \tag{1.72a} \]
whilst
\[ \mu(0) = L_1 + 1 = \frac{E(X)}{1 - P_0}. \tag{1.72b} \]
For this reason Li should be preferred over µ(i).
In Tab. 1/2 we have collected all the relations between the six representatives of a discrete lifetime
distribution.
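Table 1/2: Relations among the six functions describing a discrete stochastic lifetime

From $P_i$ to
  $S_i = \sum_{j \ge i} P_j$
  $h_i = P_i \big/ \sum_{j \ge i} P_j$
  ${}_1H_i = -\ln \sum_{j \ge i} P_j$
  ${}_2H_i = \sum_{j=0}^{i} \bigl( P_j \big/ \sum_{k \ge j} P_k \bigr)$
  $L_i = \sum_{j \ge i} j\,P_j \big/ \sum_{j \ge i} P_j - i$

From $S_i$ to
  $P_i = S_i - S_{i+1}$
  $h_i = 1 - S_{i+1} / S_i$
  ${}_1H_i = -\ln S_i$
  ${}_2H_i = \sum_{j=0}^{i} (1 - S_{j+1} / S_j)$
  $L_i = \sum_{j > i} S_j \big/ S_i$

From $h_i$ to
  $P_i = h_i \prod_{j=0}^{i-1} (1 - h_j)$
  $S_i = \prod_{j=0}^{i-1} (1 - h_j)$
  ${}_1H_i = -\sum_{j=0}^{i-1} \ln(1 - h_j)$
  ${}_2H_i = \sum_{j=0}^{i} h_j$
  $L_i = \sum_{j \ge i} \prod_{k=i}^{j} (1 - h_k)$

From ${}_1H_i$ to
  $P_i = \exp(-{}_1H_i) - \exp(-{}_1H_{i+1})$
  $S_i = \exp(-{}_1H_i)$
  $h_i = 1 - \exp({}_1H_i - {}_1H_{i+1})$
  ${}_2H_i = \sum_{j=0}^{i} \bigl[1 - \exp({}_1H_j - {}_1H_{j+1})\bigr]$
  $L_i = \sum_{j > i} \exp({}_1H_i - {}_1H_j)$

From ${}_2H_i$ to
  $P_i = ({}_2H_i - {}_2H_{i-1}) \prod_{j=0}^{i-1} (1 - {}_2H_j + {}_2H_{j-1})$
  $S_i = \prod_{j=0}^{i-1} (1 - {}_2H_j + {}_2H_{j-1})$
  $h_i = {}_2H_i - {}_2H_{i-1}$
  ${}_1H_i = -\sum_{j=0}^{i-1} \ln(1 - {}_2H_j + {}_2H_{j-1})$
  $L_i = \sum_{j \ge i} \prod_{k=i}^{j} (1 - {}_2H_k + {}_2H_{k-1})$

From $L_i$ to
  $P_i = \bigl(1 - \frac{L_i}{1 + L_{i+1}}\bigr) \prod_{j=0}^{i-1} \frac{L_j}{1 + L_{j+1}}$
  $S_i = \prod_{j=0}^{i-1} \frac{L_j}{1 + L_{j+1}}$
  $h_i = 1 - \frac{L_i}{1 + L_{i+1}}$
  ${}_1H_i = -\sum_{j=0}^{i-1} \ln \frac{L_j}{1 + L_{j+1}}$
  ${}_2H_i = (i + 1) - \sum_{j=0}^{i} \frac{L_j}{1 + L_{j+1}}$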
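\[ P_i = \frac{\lambda^i}{i!}\,e^{-\lambda}; \quad i = 0, 1, \ldots;\ \lambda > 0. \tag{1.73a} \]
Applying (1.60b) to (1.73a) we find
\[ h_i = \left[1 + \frac{\lambda}{i+1} + \frac{\lambda^2}{(i+1)(i+2)} + \ldots\right]^{-1}; \quad i = 0, 1, \ldots, \tag{1.73b} \]
so
\[ \lim_{i \to \infty} h_i = 1 \tag{1.73c} \]
and
\[ e^{-\lambda} \le h_i \le 1. \tag{1.73d} \]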
Fig. 1/11 shows the probabilities Pi , the hazard rate values hi and the mean residual life values Li for
λ = 5. We have an increasing HR and a decreasing MRL, a result which does not depend on the value of
λ. For computing MRL we have used the recursion (1.70) starting with L0 = E(X) = λ.
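The statements of this example can be reproduced with a small sketch. It does not use the recursion (1.70); instead it evaluates (1.60c) and (1.67c) directly on a POISSON PMF cut off after 120 terms, a truncation point chosen here as an assumption that is harmless for λ = 5.

```python
import math

def poisson_pmf(lam, n_terms):
    """P_0, ..., P_{n_terms - 1} of the Poisson distribution, built recursively."""
    pmf = [math.exp(-lam)]
    for i in range(1, n_terms):
        pmf.append(pmf[-1] * lam / i)
    return pmf

lam = 5.0
pmf = poisson_pmf(lam, 120)

# S_i = sum of P_j over j >= i, accumulated from the tail for numerical stability.
surv = [0.0] * (len(pmf) + 1)
for i in range(len(pmf) - 1, -1, -1):
    surv[i] = surv[i + 1] + pmf[i]

n_show = 30
hazard = [pmf[i] / surv[i] for i in range(n_show)]            # (1.60c)
mrl = [sum(surv[i + 1:]) / surv[i] for i in range(n_show)]    # (1.67c)

for i in (0, 5, 10, 20):
    print(i, round(hazard[i], 5), round(mrl[i], 5))
```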
Excursus: Results for a distribution having both discrete and continuous components
A lifetime distribution may have both discrete and continuous components, e.g., a device has a non–zero
probability of failing on switching on and/or off whereas otherwise the chance of failing has a continuous
density. In the mixed case the density is a sum of discrete and continuous parts. Specifically, if hc (x)
denotes the hazard rate for the continuous part and mass points occur at xi ; i = 0, 1, . . . ; then the overall
survivor function can be written by the so–called 'pseudo–integral' as
\[ S(x) = \exp\left\{-\int_0^x h_c(u)\,du\right\} \prod_{j=0}^{i-1} (1 - h_j). \tag{1.74a} \]
The corresponding HR is
\[ h(x)\,dx = h_c(x)\,dx + \sum_j h_j\,\delta(x - x_j) \tag{1.74b} \]
Otherwise, the PMF first increases and then decreases. For α = 1 we have the original model of
SALVIA/BOLLINGER, and for α = 0 we are back to the geometric distribution, but with hazard
rate $h_i = 1 - c\ \forall\ i$.
There are several discrete models which, depending on the value of a certain parameter, have
a constant, an increasing or a decreasing hazard rate, respectively, like the continuous WEIBULL
distribution with hazard rate $h(x) = \frac{c}{b} \left(\frac{x}{b}\right)^{c-1}$; $x \ge 0$, $b > 0$, $c > 0$; see RINNE (2009). The
type–I discrete WEIBULL model of NAKAGAWA/OSAKI (1975) has
\[ h_i = 1 - q^{(i+1)^\beta - i^\beta}; \quad i = 0, 1, \ldots;\ 0 < q < 1;\ \beta > 0; \tag{1.78a} \]
with corresponding
\[ P_i = q^{i^\beta} - q^{(i+1)^\beta}, \tag{1.78b} \]
\[ S_i = q^{i^\beta}. \tag{1.78c} \]
The parameter β plays the same role as the shape parameter c in the continuous WEIBULL distribution.
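A brief sketch illustrates the role of β in the type–I discrete WEIBULL model. It evaluates (1.78a–c) for an arbitrarily chosen q = 0.8 and three values of β; the function names are hypothetical.

```python
def dw1_survival(i, q, beta):
    """S_i = q^(i^beta), see (1.78c)."""
    return q ** (i ** beta)

def dw1_pmf(i, q, beta):
    """P_i = S_i - S_{i+1}, see (1.78b)."""
    return dw1_survival(i, q, beta) - dw1_survival(i + 1, q, beta)

def dw1_hazard(i, q, beta):
    """h_i = 1 - q^((i+1)^beta - i^beta), see (1.78a)."""
    return 1.0 - q ** ((i + 1) ** beta - i ** beta)

q = 0.8
for beta in (0.5, 1.0, 2.0):
    rates = [round(dw1_hazard(i, q, beta), 4) for i in range(6)]
    print(f"beta={beta}: {rates}")
```

For β = 1 the hazard rate is the constant 1 − q of the geometric distribution, for β > 1 it increases and for β < 1 it decreases with i.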
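The type–II discrete WEIBULL distribution of STEIN/DATTERO (1984) directly mimics the
continuous hazard rate $\frac{c}{b} \left(\frac{x}{b}\right)^{c-1}$ by
\[ h_i = \begin{cases} \alpha\,i^{\beta-1} & \text{for } i = 1, 2, \ldots, m \\ 0 & \text{for } i = 0 \text{ and } i > m \end{cases}, \quad \alpha > 0,\ \beta > 0. \tag{1.79a} \]
\[ P_i = \alpha\,i^{\beta-1} \prod_{j=1}^{i-1} \bigl(1 - \alpha\,j^{\beta-1}\bigr); \quad i = 1, 2, \ldots \tag{1.79c} \]
\[ S_i = \prod_{j=1}^{i-1} \bigl(1 - \alpha\,j^{\beta-1}\bigr). \tag{1.79d} \]
with corresponding
\[ P_i = \exp\Bigl\{-d \sum_{j=1}^{i} j^\beta\Bigr\} \Bigl\{1 - \exp\bigl[-d\,(i+1)^\beta\bigr]\Bigr\}, \tag{1.80b} \]
\[ S_i = \exp\Bigl\{-d \sum_{j=1}^{i} j^\beta\Bigr\}. \tag{1.80c} \]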
2. The hazard rate concept and the mean residual life concept are somewhat difficult to extend
to the multivariate situation.
3. A third difficulty is that often the sample is not big enough in relation to the dimension of
the model in order to find ‘good’ estimates of the model and its parameters. Furthermore,
many data are censored in such a way that one cannot determine whether or not the variates
are independent.
Sometimes two or more lifetime variables $X_1, \ldots, X_m$ are of interest simultaneously and a multivariate
model is required. For example, a device may have two or more integral parts and it may
be desired to model the joint lifetime distribution of these parts. Let
\[ X = (X_1, \ldots, X_m)'; \quad m = 2, 3, \ldots \]
be the vector of lifetime variates and
\[ x = (x_1, \ldots, x_m)' \]
a vector of its realizations. Then, a multivariate distribution can be specified either in terms of the
joint survival function
\[ f(x) := \frac{\partial^m F(x)}{\partial x_1 \ldots \partial x_m} = (-1)^m\,\frac{\partial^m S(x)}{\partial x_1 \ldots \partial x_m} \tag{1.81c} \]
where f (·), S(·), h(·) are the marginal functions, respectively. (1.82a) may easily be extended
to the case of more than two variates.
Some authors like JOHNSON/KOTZ (1975) take the point of view that, for a concept such as
'multivariate hazard rate', it is unreasonable to expect a single value to represent this aspect of
a multivariate distribution. The basic idea underlying the univariate definition is that of rate of
decrease of 'survivors' with increase in value x of X as, e.g., in a life table where the hazard rate
is in fact the force of mortality. When there are two or more variates this rate depends on which
variate is changed and we need a different rate for each variate. So, JOHNSON/KOTZ defined the
joint multivariate hazard rate of m absolutely continuous variables $X_1, \ldots, X_m$ as the vector
\[ h_X(x) := \Bigl(-\frac{\partial}{\partial x_1}, \ldots, -\frac{\partial}{\partial x_m}\Bigr) \ln S(x) = -\operatorname{grad} \ln S(x). \tag{1.83a} \]
Sometimes $h_X(x)$ is called the hazard gradient of X. For convenience we will write a component
of the vector $h_X(x)$ as
\[ h_i(x) := -\frac{\partial}{\partial x_i} \ln S(x); \quad i = 1, \ldots, m. \tag{1.83b} \]
(1.83a,b) are motivated by the fact that in the univariate case we have
\[ h(x) = \frac{dH(x)}{dx} = -\frac{d \ln S(x)}{dx}, \]
see (1.11a,f).
If the multivariate hazard rate (1.83a) is constant, i.e., does not vary with any of $x_1, \ldots, x_m$, so
that $h_X(x) = c$, this means that, whenever the hazard rate exists, we have $\partial \ln S(x) / \partial x_i =$
Thus, the $X_i$ are mutually independent exponential variables if and only if the multivariate
hazard rate is constant. We may distinguish between strictly constant vector hazard rates $h_X(x) =
c$ as defined above, and locally constant vector hazard rates for which $h_i(x)$ does not depend
on $x_i$, though it may depend on other x's.
A bivariate exponential distribution, BED for short, has both marginal distributions as exponential. As
KOTZ/BALAKRISHNAN/JOHNSON (2000) show, many such BEDs exist. We will present these BEDs
in their standard form, but location and scale parameters can easily be introduced, if needed, through
appropriate linear transformations.
GUMBEL's BED has the joint survival function
The latter is not exponential whereas the marginal distributions of $X_1$ and $X_2$ are each standard exponential.
If θ = 0, then $X_1$ and $X_2$ are mutually independent.
The joint multivariate hazard rate of GUMBEL's BED is
\[ h_X(x) = \begin{pmatrix} h_1(x) \\ h_2(x) \end{pmatrix} = \begin{pmatrix} 1 + \theta\,x_2 \\ 1 + \theta\,x_1 \end{pmatrix}. \tag{1.84d} \]
The components in (1.84d) are constant with respect to variation in the corresponding variable, i.e., $h_1(x)$
does not depend on $x_1$ nor does $h_2(x)$ on $x_2$, but not with respect to variation in the other variable. So, the
distribution of $X = (X_1, X_2)$ has a locally, but not strictly constant bivariate hazard rate, see the graphs
on the right–hand side of Fig. 1/12.
The scalar multivariate hazard rate (1.82a) for GUMBEL's BED using
results in
\[ h(x_1, x_2) = (1 + \theta\,x_1)\,(1 + \theta\,x_2) - \theta. \tag{1.84e} \]
Fig. 1/12 displays four functions of GUMBEL's BED with parameter θ = 1. In the upper left corner we
have the PDF of (1.84b) and in the lower left corner the scalar multivariate HR of (1.84e). On the right–
hand side we see the first (upper graph) and the second (lower graph) component of the joint multivariate
HR of (1.84d).
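The two hazard rate concepts for GUMBEL's BED can be checked by numerical differentiation. The sketch below assumes the joint survival function $S(x_1, x_2) = \exp(-x_1 - x_2 - \theta\,x_1 x_2)$, which is the standard form consistent with (1.84d) and (1.84e); the step sizes and function names are illustrative choices.

```python
import math

def log_S(x1, x2, theta):
    """ln S(x1, x2) for Gumbel's BED (assumed form of the joint survival function)."""
    return -(x1 + x2 + theta * x1 * x2)

def hazard_gradient(x1, x2, theta):
    """Closed form (1.84d)."""
    return 1.0 + theta * x2, 1.0 + theta * x1

def hazard_gradient_numeric(x1, x2, theta, eps=1e-6):
    """-grad ln S(x) by central differences, see (1.83a)."""
    h1 = -(log_S(x1 + eps, x2, theta) - log_S(x1 - eps, x2, theta)) / (2 * eps)
    h2 = -(log_S(x1, x2 + eps, theta) - log_S(x1, x2 - eps, theta)) / (2 * eps)
    return h1, h2

def scalar_hazard(x1, x2, theta):
    """Closed form (1.84e)."""
    return (1.0 + theta * x1) * (1.0 + theta * x2) - theta

def scalar_hazard_numeric(x1, x2, theta, eps=1e-4):
    """f(x) / S(x) with the joint density f taken as the mixed second difference of S."""
    S = lambda a, b: math.exp(log_S(a, b, theta))
    mixed = (S(x1 + eps, x2 + eps) - S(x1 + eps, x2 - eps)
             - S(x1 - eps, x2 + eps) + S(x1 - eps, x2 - eps)) / (4 * eps * eps)
    return mixed / S(x1, x2)

print(hazard_gradient(0.5, 1.2, 1.0), hazard_gradient_numeric(0.5, 1.2, 1.0))
print(scalar_hazard(0.5, 1.2, 1.0), scalar_hazard_numeric(0.5, 1.2, 1.0))
```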
33
Results for the multivariate normal distribution can be found in GUPTA/GUPTA (1997), MA (2000), MCGILL
(1992), NAVARRO/RUIZ (2004).
Another BED is that of MARSHALL/OLKIN (2007). The physical model consists of two components, subjected
to shocks that are always fatal. These shocks are assumed to be governed by independent POISSON
processes with parameters $\lambda_1$, $\lambda_2$ and $\lambda_{12}$, according as the shock applies to component 1 only, component
2 only, or both components, respectively. The joint survival function of the lifetimes $X_1$ and $X_2$ of
the two components is
\[ S(x) = \exp\bigl[-\lambda_1 x_1 - \lambda_2 x_2 - \lambda_{12} \max(x_1, x_2)\bigr]; \quad \lambda_1, \lambda_2, \lambda_{12} > 0;\ x_1, x_2 \ge 0 \tag{1.85a} \]
\[ \phantom{S(x)} = \begin{cases} \exp\bigl[-\lambda_1 x_1 - (\lambda_2 + \lambda_{12})\,x_2\bigr] & \text{for } 0 \le x_1 \le x_2 \\ \exp\bigl[-(\lambda_1 + \lambda_{12})\,x_1 - \lambda_2 x_2\bigr] & \text{for } 0 \le x_2 \le x_1. \end{cases} \tag{1.85b} \]
\[ \Pr(X_i < X_j) = \frac{\lambda_i}{\lambda_1 + \lambda_2 + \lambda_{12}}; \quad i, j = 1, 2;\ i \ne j; \tag{1.85d} \]
\[ \Pr(X_1 = X_2) = \frac{\lambda_{12}}{\lambda_1 + \lambda_2 + \lambda_{12}}. \tag{1.85e} \]
(1.85e) also is the correlation coefficient between $X_1$ and $X_2$. The joint PDF of this BED, using S(x) of
(1.85a), is
\[ f(x) = \begin{cases} \lambda_2\,(\lambda_1 + \lambda_{12})\,S(x) & \text{for } 0 < x_2 < x_1 \\ \lambda_1\,(\lambda_2 + \lambda_{12})\,S(x) & \text{for } 0 < x_1 < x_2 \\ \lambda_{12}\,S(x) & \text{for } x_1 = x_2 > 0. \end{cases} \tag{1.85f} \]
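The fatal–shock construction lends itself to a quick simulation check of (1.85d) and (1.85e). The sketch below draws the first arrival times of the three POISSON shock processes as independent exponential variates; the rates, the sample size and the seed are arbitrary illustrative choices.

```python
import random

def simulate_marshall_olkin(lam1, lam2, lam12, n, seed=12345):
    """Draw n lifetime pairs (X1, X2) from the fatal-shock model."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z1 = rng.expovariate(lam1)     # first shock hitting component 1 only
        z2 = rng.expovariate(lam2)     # first shock hitting component 2 only
        z12 = rng.expovariate(lam12)   # first shock hitting both components
        draws.append((min(z1, z12), min(z2, z12)))
    return draws

lam1, lam2, lam12 = 1.0, 2.0, 0.5
total = lam1 + lam2 + lam12
draws = simulate_marshall_olkin(lam1, lam2, lam12, 100_000)

share_equal = sum(x1 == x2 for x1, x2 in draws) / len(draws)
share_first = sum(x1 < x2 for x1, x2 in draws) / len(draws)
print(share_equal, lam12 / total)   # empirical vs. theoretical Pr(X1 = X2)
print(share_first, lam1 / total)    # empirical vs. theoretical Pr(X1 < X2)
```

The positive share of exact ties reflects the singular component of the distribution on the line $x_1 = x_2$.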
These hazard rates are strictly increasing, but for x2 (x1 ) fixed, the first (second) component is a non–
decreasing function of x1 (x2 ).
RINNE (2009, p. 173 ff.) has extended the two models above and further BEDs by power transformation
to bivariate WEIBULL distributions.
The joint multivariate mean residual life, defined by ARNOLD/ZAHEDI (1988), is the vector
\[ \mu_X(x) = \begin{pmatrix} \mu_1(x) \\ \vdots \\ \mu_m(x) \end{pmatrix} = E(X - x \mid X \ge x) \tag{1.86a} \]
where
\[ \mu_i(x) = E(X_i - x_i \mid X \ge x) = \frac{\int_0^\infty S(x_1, \ldots, x_{i-1}, x_i + u, x_{i+1}, \ldots, x_m)\,du}{S(x)}; \quad i = 1, 2, \ldots, m \tag{1.86b} \]
whenever S(x) > 0, see also (1.15b). It can be shown easily that the following relationship holds
between the $h_i(x)$'s of $h_X(x)$, see (1.83b), and the $\mu_i(x)$'s of $\mu_X(x)$:
\[ \frac{\partial}{\partial x_i}\,\mu_i(x) = \mu_i(x)\,h_i(x) - 1; \quad i = 1, 2, \ldots, m. \tag{1.86c} \]
Looking at GUMBEL's BED of Example 1/11 we find
\[ \mu_X(x) = \begin{pmatrix} \dfrac{1}{1 + \theta\,x_2} \\[2ex] \dfrac{1}{1 + \theta\,x_1} \end{pmatrix}, \]
showing that $\mu_i(x)$ is constant with respect to $x_i$, but decreases with respect to the other variable.
For GUMBEL's BED we have
Another multivariate hazard rate concept has been proposed by COX (1972) viewing the multivariate
lifetime as a point process. In the bivariate case we have the following four components
of the hazard rate vector:
\[ \lambda_i(x) = \lim_{\Delta x \to 0} \frac{\Pr(x \le X_i < x + \Delta x \mid X_1 \ge x, X_2 \ge x)}{\Delta x}; \quad i = 1, 2; \tag{1.87a} \]
\[ \lambda_{12}(x_1 \mid x_2) = \lim_{\Delta x \to 0} \frac{\Pr(x_1 \le X_1 < x_1 + \Delta x \mid X_1 \ge x_1, X_2 = x_2)}{\Delta x}; \quad x_1 > x_2; \tag{1.87b} \]
\[ \lambda_{21}(x_2 \mid x_1) = \lim_{\Delta x \to 0} \frac{\Pr(x_2 \le X_2 < x_2 + \Delta x \mid X_1 = x_1, X_2 \ge x_2)}{\Delta x}; \quad x_1 < x_2. \tag{1.87c} \]
In terms of the joint survivor function $S(x_1, x_2)$ for $X_1$ and $X_2$ it is readily seen that
\[ \lambda_1(x) = -\left[\frac{\partial S(x_1, x_2) / \partial x_1}{S(x_1, x_2)}\right]_{x_1 = x_2 = x}, \tag{1.87d} \]
\[ \lambda_{12}(x_1 \mid x_2) = -\frac{\partial^2 S(x_1, x_2) / \partial x_1 \partial x_2}{\partial S(x_1, x_2) / \partial x_2}, \quad x_1 > x_2, \tag{1.87e} \]
with similar expressions for $\lambda_2(x)$ and $\lambda_{21}(x_2 \mid x_1)$. The functions (1.87a–c) completely specify
the joint distribution of $X_1$ and $X_2$. The joint PDF of $X_1$ and $X_2$ can be shown to be
\[ f(x_1, x_2) = \begin{cases} \lambda_2(x_2)\,\lambda_{12}(x_1 \mid x_2) \exp\Bigl\{-\int_0^{x_2} \bigl[\lambda_1(u) + \lambda_2(u)\bigr]\,du - \int_{x_2}^{x_1} \lambda_{12}(u \mid x_2)\,du\Bigr\}, & x_1 \ge x_2 \\[1ex] \lambda_1(x_1)\,\lambda_{21}(x_2 \mid x_1) \exp\Bigl\{-\int_0^{x_1} \bigl[\lambda_1(u) + \lambda_2(u)\bigr]\,du - \int_{x_1}^{x_2} \lambda_{21}(u \mid x_1)\,du\Bigr\}, & x_1 \le x_2. \end{cases} \tag{1.87f} \]
This can be verified by viewing the process as a point process. For example, with $x_1 \ge x_2$, the
probability of having no failures in $[0, x_2)$ and then the event $X_2 \in [x_2, x_2 + \Delta x_2)$ is
\[ \lambda_2(x_2)\,\Delta x_2 \exp\Bigl\{-\int_0^{x_2} \bigl[\lambda_1(u) + \lambda_2(u)\bigr]\,du\Bigr\}. \]
Conditional on this, the probability of no further failures in $[x_2, x_1)$ and the event $X_1 \in [x_1, x_1 + \Delta x_1)$ is
\[ \lambda_{12}(x_1 \mid x_2)\,\Delta x_1 \exp\Bigl\{-\int_{x_2}^{x_1} \lambda_{12}(u \mid x_2)\,du\Bigr\}. \]
Remember that in the discrete case the hazard rate is a conditional probability. The discrete
multivariate conditional hazard rate functions of $(X_1, X_2)$ are defined as
provided the conditions in the above conditional probabilities have positive probabilities. Other-
wise, these functions are set to 1.
The meaning of these functions is as follows. The functions λ1 (x), λ2 (x) and λ12 (x) describe
the initial hazard rates, i.e., the hazard rates before a failure of any component. Given that no
failure has occurred before time x, then, at time x one of the following four events must occur:
Now suppose that one component failed at x1 (or x2 ) and that the other component stayed alive
at that time. Then, conditional on X1 = x1 (or X2 = x2 ), the hazard rate of the live component
at time x > x1 (or x > x2 ) is given by λ2 (x | x1 ) [or λ1 (x | x2 )].
The hazard rates given in (1.88b–f) are the discrete analogs of the bivariate hazard rate functions
described in COX (1972), see (1.87a–c) for the absolutely continuous case. But in COX there is
no analog of (1.88d) since absolute continuity of the distribution of $(X_1, X_2)$ is assumed there.
Failure of both components at the same time has zero probability in the absolutely continuous case,
but in the discrete case it may be positive and is given by $\lambda_{12}(x)$.
From (1.88b–f) we see that the joint distribution of $X_1$ and $X_2$ determines the conditional hazard
rate functions. But the converse is also true, i.e., (1.88b–f) determine $\Pr(x_1, x_2)$ of (1.88a). For
more details see SHAKED et al. (1995) who also give necessary and sufficient conditions on
the functions (1.88b–f) which ensure that they are hazard rate functions of some random vector
$(X_1, X_2)$.
2 Aging Criteria and Classes of
Univariate Lifetime Distributions1
It is quite natural and obvious to classify lifetime distributions by using so–called aging criteria.
In the context of lifetime analysis aging does not mean that a statistical unit becomes older in
the sense of chronological calendar time; rather, aging is a notion pertaining to the behavior of
residual life. Aging is thus the phenomenon that a chronologically older unit has a shorter residual
life in some statistical sense than a newer or chronologically younger unit. We may distinguish
between
• positive (true or adverse) aging, indicating a decline, in some way or other, of
residual life with growing age x,
• negative (inverse or beneficial) aging, when residual life is increasing with x in some way
or other.
Lifetime distributions are mostly characterized with respect to aging by the behavior of
Hazard rate classes will be discussed in Sections 2.1 and 2.2. Mean residual life classes are the
topic of Section 2.3. But there are more statistical concepts used in classifying lifetime distribu-
tions. These will be presented in Section 2.4 where we will also show how all the aging criteria
are linked.
Classes of lifetime distributions based on notions of aging afford statisticians an opportunity to
consider problems of a character somewhat different from the usual. Instead of assuming that
he knows nothing about the underlying lifetime distribution, the statistician assumes that he does
not know the parametric form of the distribution, but that he does know, for example, that the
hazard rate is increasing. More generally, he knows that some type of aging property holds for
the lifetime distribution; this aging property gives rise to a corresponding geometric property of the
distribution. Knowing that a lifetime distribution belongs to a certain class, it is possible by using
certain additional information to give approximations and bounds of the percentiles, moments
and survival probabilities of this distribution. Of course, it is possible to test whether certain
hypotheses on aging hold or not, see Sect. 10.3.
This chapter only presents results for univariate distributions. Readers interested in aging criteria
for multivariate distributions are referred to BLOCK/SAVITS (1982, 1988), HARRIS (1970) or
SHAKED/SHANTHIKUMAR (1987, 1988).
distributions with decreasing hazard rate also of some interest. Here, the terms ‘increasing’ and
‘decreasing’ are not used in the strict sense, but increasing (decreasing) stands for non–decreasing
(non–increasing). Note that with this convention the continuous exponential distribution and the
discrete geometric distribution with constant hazard rates belong to both classes. There are, of
course, examples such as dynamic loading of structures, where a non–monotonic hazard rate
function would be appropriate. Structures undergoing adjustment and modification also tend to
have a non–monotonic hazard rate.
The assumption that a lifetime distribution has a monotone hazard rate is quite strong as we
shall show, but such distributions possess many useful and interesting properties. Most results
on monotone hazard rates hold for the continuous as well as for the discrete case, but there are
some differences, especially in how to detect whether the distribution's hazard rate is
increasing or decreasing. So we have decided to present the continuous and the discrete cases in
two separate Sections 2.1.1 and 2.1.2. In Section 2.1.3 we will introduce the related concept of
the hazard rate average and see when this is increasing or decreasing.
\[ h(x_1) \underset{(\ge)}{\le} h(x_2) \]
implies
\[ \int_0^x h(x_1 + u)\,du \underset{(\ge)}{\le} \int_0^x h(x_2 + u)\,du. \]
2
Suggested reading for this section: BARLOW/MARSHALL (1964), BARLOW/MARSHALL/PROSCHAN (1963),
BARLOW/PROSCHAN (1965, 1975).
That is,
\[ \exp\Bigl\{-\int_{x_2}^{x_2 + x} h(u)\,du\Bigr\} \underset{(\ge)}{\le} \exp\Bigl\{-\int_{x_1}^{x_1 + x} h(u)\,du\Bigr\} \]
implying
We will now show how the IHR (DHR) property is related to the future lifetime of x–survivors.
Let x be the age of a statistical unit, then its survival function of future life $Y \mid x$ is given, see
(1.52c), as
\[ S(y \mid x) = \frac{S(x + y)}{S(x)} \tag{2.2a} \]
which can be written in terms of the hazard rate h(x):
\[ S(y \mid x) = \frac{\exp\Bigl\{-\int_0^{x+y} h(u)\,du\Bigr\}}{\exp\Bigl\{-\int_0^{x} h(u)\,du\Bigr\}} = \exp\Bigl\{-\int_x^{x+y} h(u)\,du\Bigr\}. \tag{2.2b} \]
From (2.2b) we see that the conditional survival probability is an increasing (decreasing) function
of x, the age reached, if and only if the hazard rate is decreasing (increasing). Thus, complemen-
tary to Definition 1 of IHR and DHR given above, we can state the following
Definition 2: F (·) is IHR (DHR) if and only if S(y | x) is decreasing (increasing) in x for any
y > 0, x ≥ 0 such that S(x) > 0.
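Definition 2 can be illustrated with the reduced WEIBULL distribution, $S(x) = \exp(-x^c)$, whose hazard rate is increasing for c > 1, constant for c = 1 and decreasing for c < 1. The following sketch evaluates (2.2a) for a fixed y on a grid of ages; the grid and the shape values are illustrative choices.

```python
import math

def S(x, c):
    """Survival function of the reduced Weibull distribution."""
    return math.exp(-x ** c)

def cond_survival(y, x, c):
    """S(y | x) = S(x + y) / S(x), see (2.2a)."""
    return S(x + y, c) / S(x, c)

ages = [0.0, 0.5, 1.0, 1.5, 2.0]
y = 0.5
ihr = [cond_survival(y, x, 2.0) for x in ages]       # c = 2: increasing hazard rate
constant = [cond_survival(y, x, 1.0) for x in ages]  # c = 1: exponential distribution
dhr = [cond_survival(y, x, 0.5) for x in ages]       # c = 0.5: decreasing hazard rate

print([round(v, 4) for v in ihr])
print([round(v, 4) for v in constant])
print([round(v, 4) for v in dhr])
```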
Introducing the notions stochastically larger (smaller), we may characterize IHR (DHR) in still
another way. A variate $X_1$ with distribution $F_1(x)$ is called stochastically smaller (larger) than a
variate $X_2$ with $F_2(x)$, abbreviated
\[ X_1 \underset{(\ge)}{\overset{st}{\le}} X_2 \tag{2.3a} \]
if
\[ F_1(x) \underset{(\le)}{\ge} F_2(x)\ \forall\ x. \tag{2.3b} \]
Evidently we have
\[ Y \mid x_1 \underset{(\le)}{\overset{st}{\ge}} Y \mid x_2 \quad \text{for } x_1 \le x_2 \tag{2.3c} \]
if the underlying lifetime distribution F(·) is IHR (DHR), i.e., the future lifetimes $Y \mid x$ become
stochastically smaller (larger) with growing x.
The most important geometric properties of the IHR (DHR) lifetime distributions are stated in the
following two Theorems 10 and 11.
Theorem 10: F(·) is IHR (DHR) if and only if its logarithmic survival function ln S(x) is concave
(convex).
Because H(x) = − ln S(x), see (1.11f), we may express Theorem 10 in terms of the cumulative
hazard rate as well:
F(·) is IHR (DHR) if and only if its CHR $H(x) = \int_0^x h(u)\,du$ is convex (concave).
We will first give a proof of Theorem 10 based on Definition 1 not assuming the existence of a
density function.
Proof 1: Let $S(x) = 1 - F(x) = \exp[-H(x)]$. Then
\[ \frac{F(x + y) - F(x)}{1 - F(x)} = 1 - \exp\bigl\{-\bigl[H(x + y) - H(x)\bigr]\bigr\}, \]
and F(·) is IHR (DHR) if and only if H(x + y) − H(x) is increasing (decreasing) in x for all
y > 0. Thus F(·) is IHR (DHR) if and only if H(x) is convex (concave).
Assuming a continuous distribution with existing density we have
Proof 2: F(·) is IHR (DHR) if its hazard rate has a non–negative (non–positive) first derivative,
i.e.,
\[ \frac{dh(x)}{dx} = \frac{d}{dx}\,\frac{f(x)}{S(x)} = \frac{S(x)\,f'(x) + \bigl[f(x)\bigr]^2}{\bigl[S(x)\bigr]^2} \underset{(\le)}{\ge} 0. \tag{2.4a} \]
For ln S(x) to be concave (convex) its second derivative has to be non–positive (non–negative),
i.e.,
\[ \frac{d^2 \ln S(x)}{dx^2} = -\frac{d}{dx}\,\frac{f(x)}{S(x)} = -\frac{S(x)\,f'(x) + \bigl[f(x)\bigr]^2}{\bigl[S(x)\bigr]^2} \underset{(\ge)}{\le} 0. \tag{2.4b} \]
1. F(·) is IHR if and only if S(x) = 1 − F(x) is a PÓLYA frequency function of order 2.
2. F(·) is DHR if and only if S(x + y) is totally positive of order 2 in x and y for x + y ≥ 0.
\[ S(x) = \exp(-x^c), \]
\[ h(x) = c\,x^{c-1}, \]
\[ H(x) = x^c. \]
3
Two more geometrical properties of these classes are:
1. The density function of a DHR distribution is a decreasing function.
2. The density function of an IHR distribution need not be unimodal.
4
The reduced WEIBULL distribution has a location parameter set to 0 and a scale parameter set to 1.
when S(x) < 1 and r ≥ 0. Hence, IHR distributions have finite moments of all orders. DHR
distributions need not have finite moments. For example, $F(x) = 1 - \frac{1}{1 + x}$, $x \ge 0$,
is DHR, but the mean does not exist.
BARLOW/PROSCHAN (1965) have given and proved many theorems on IHR (DHR) distributions
which rest on the fact that the exponential distribution is the boundary distribution between
these two classes. We only cite four of their results giving bounds for the survival probability and
for the moments.
1. If F(·) is IHR (DHR) with known percentile $x_P$ of order P, 0 < P < 1, i.e., $F(x_P) = P$,
then
\[ S(x) \begin{cases} \underset{(\le)}{\ge} \exp(-\alpha\,x) & \text{for } x \le x_P, \\[1ex] \underset{(\ge)}{\le} \exp(-\alpha\,x) & \text{for } x \ge x_P, \end{cases} \tag{2.5a} \]
where
\[ \alpha = -\frac{\ln(1 - P)}{x_P}. \tag{2.5b} \]
2. If F(·) is IHR with known mean µ, a sharp lower bound for S(·) is
\[ S(x) \ge \begin{cases} \exp(-x/\mu) & \text{for } x < \mu \\ 0 & \text{for } x \ge \mu, \end{cases} \tag{2.6a} \]
and a sharp upper bound is
\[ S(x) \le \begin{cases} 1 & \text{for } x \le \mu \\ \exp(-\omega\,x) & \text{for } x > \mu, \end{cases} \tag{2.6b} \]
3. If F(·) is DHR with known mean µ, a sharp upper bound for S(·) is
\[ S(x) \le \begin{cases} \exp(-x/\mu) & \text{for } x \le \mu, \\[0.5ex] \dfrac{\mu}{x\,e} & \text{for } x \ge \mu. \end{cases} \tag{2.7} \]
The sharp lower bound for DHR distributions is zero.
4. For an IHR distribution with µ = E(X) we have the following inequality on moments:
\[ \mu_r = E(X^r) \le r!\,\mu^r; \quad r = 1, 2, \ldots \tag{2.8a} \]
For a DHR distribution the inequality is reversed:
\[ \mu_r = E(X^r) \ge r!\,\mu^r; \quad r = 1, 2, \ldots \tag{2.8b} \]
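Two of these bounds can be checked on a concrete IHR distribution. The sketch below uses the reduced RAYLEIGH distribution of Chapter 1, with $S(x) = \exp(-x^2)$, $E(X^r) = \Gamma(1 + r/2)$ and $\mu = \sqrt{\pi}/2$, and verifies the lower bound (2.6a) on a grid below µ and the moment inequality (2.8a) for r = 2, …, 10; for r = 1 the inequality holds with equality and is omitted to avoid a comparison at rounding level.

```python
import math

mu = math.sqrt(math.pi) / 2.0        # mean of the reduced Rayleigh distribution

def S(x):
    """Survival function of the reduced Rayleigh distribution (an IHR distribution)."""
    return math.exp(-x * x)

def raw_moment(r):
    """E(X^r) = Gamma(1 + r/2) for the reduced Rayleigh distribution."""
    return math.gamma(1.0 + r / 2.0)

grid = [k * mu / 50 for k in range(50)]                  # points in [0, mu)
lower_bound_ok = all(S(x) >= math.exp(-x / mu) for x in grid)
moment_bound_ok = all(raw_moment(r) <= math.factorial(r) * mu ** r
                      for r in range(2, 11))

print(lower_bound_ok, moment_bound_ok)
```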
An interesting question is whether or not the monotone hazard rate is preserved under certain op-
eration with variates. We assume that the involved random variables are statistically independent.
1. A mixture
$$F(x) = \sum_{i=1}^{m} \alpha_i\,F_i(x);\quad \alpha_i \ge 0,\ \sum_{i=1}^{m} \alpha_i = 1; \tag{2.9}$$
of m DHR distributions is a DHR distribution. Mixing of IHR distributions does not necessarily result in an IHR distribution.
2. A convolution of IHR distributions is also IHR. In particular, if X1 and X2 are IHR with hazard rates $h_1(x)$ and $h_2(x)$, respectively, then Y = X1 + X2 has hazard rate $h_Y(x) \le \min\{h_1(x),\, h_2(x)\}$. The sum of DHR variates is not necessarily DHR.
3. A coherent structure of IHR (DHR) components does not necessarily have an IHR (DHR)
lifetime distribution, i.e., the IHR (DHR) class is not closed under formation of coherent
systems. But, parallel and series systems of identical IHR components are IHR. For series
systems the components do not have to be identical.
4. Order statistics from IHR distributions also have IHR distributions. However, this is not
true for spacings from an IHR distribution. Order statistics from a DHR distribution are
not necessarily DHR. However, spacings from a DHR distribution are DHR.
Tab. 2/1 in Sect. 2.4 summarizes preservation results for other classes of lifetime distributions.
Pi = Pr(X = i); i = 0, 1, . . .
5
Suggested reading for this section: G UPTA et al. (1997), K EMP (2004), L ANGBERG et al. (1980), S HAKED et
al. (1995).
$$h_i = \frac{P_i}{\sum_{j=i}^{\infty} P_j};\quad i = 0, 1, \ldots$$
is non–decreasing (non–increasing). Notice that $h_i \le 1$. As the survival function $\sum_{j \ge i} P_j$ rarely can
be given in closed form it is not easy to determine the monotonicity of the hazard rate by using the
difference hi+1 − hi . In Theorem 11 we have seen that the IHR (DHR) property of a continuous
distribution can be determined by the curvature of ln f (x). Because the PMF Pi plays the same
role for the CDF (CCDF) as the PDF f (x) in the continuous case, i.e., giving the increment in
CDF (decrement in CCDF), a simple criterion for determining the monotonicity can be based on
the curvature of the PMF, see G UPTA et al. (1997).
Define
$$\eta_i := \frac{P_i - P_{i+1}}{P_i} = 1 - \frac{P_{i+1}}{P_i} \tag{2.10a}$$
and
$$\Delta\eta_i = \eta_{i+1} - \eta_i = \frac{P_{i+1}}{P_i} - \frac{P_{i+2}}{P_{i+1}}. \tag{2.10b}$$
Recalling that a PMF is log convex if
$$P_i\,P_{i+2} > P_{i+1}^{2}\quad \forall\, i$$
b) $P_i = P_0 = 1/(m+1)$; i = 0, 1, . . . , m
c) $P_i = \dfrac{c^{i}}{1 + c + c^{2} + \ldots + c^{m}}$; i = 0, 1, . . . , m
This distribution is IHR, too.
74 2 Aging Criteria and Classes of Univariate Lifetime Distributions
Thus, in order to find out whether a discrete distribution is IHR or DHR or not monotone we just
have to study the behavior of the ratio of two adjacent probabilities.
Binomial distribution
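$$P_i = \binom{n}{i}\,P^{i}\,(1-P)^{n-i};\quad i = 0, 1, \ldots, n,\ n \in \mathbb{N},\ 0 < P < 1;$$
$$\frac{P_{i+1}}{P_i} = \frac{n-i}{i+1}\,\frac{P}{1-P}$$
$$\Delta\eta_i = \frac{n+1}{(i+1)(i+2)}\,\frac{P}{1-P} > 0 \;\Rightarrow\; \text{IHR}$$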
$$P_i = \frac{a\,P^{i}}{i};\quad i = 1, 2, \ldots,\ 0 < P < 1,\ a = -\left[\ln(1-P)\right]^{-1};$$
$$\frac{P_{i+1}}{P_i} = \frac{i}{i+1}\,P$$
$$\Delta\eta_i = -\frac{P}{(i+1)(i+2)} < 0 \;\Rightarrow\; \text{DHR}$$
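$$P_i = \binom{k+i-1}{i}\,q^{k}\,(1-q)^{i};\quad i = 0, 1, \ldots,\ 0 < q < 1,\ k > 0;$$
$$\frac{P_{i+1}}{P_i} = \frac{k+i}{i+1}\,(1-q)$$
$$\Delta\eta_i = (1-q)\,\frac{k-1}{(i+1)(i+2)}\;\begin{cases}< 0 & \text{for } 0 < k < 1 \;\Rightarrow\; \text{DHR},\\ = 0 & \text{for } k = 1 \;\Rightarrow\; \text{geometric distribution},\\ > 0 & \text{for } k > 1 \;\Rightarrow\; \text{IHR}.\end{cases}$$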
P OISSON distribution
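$$P_i = \frac{\lambda^{i}}{i!}\,e^{-\lambda};\quad i = 0, 1, \ldots,\ \lambda > 0;$$
$$\frac{P_{i+1}}{P_i} = \frac{\lambda}{i+1}$$
$$\Delta\eta_i = \frac{\lambda}{(i+1)(i+2)} > 0 \;\Rightarrow\; \text{IHR}$$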
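The ratio criterion used in the examples above can be sketched in a few lines of code. The sketch evaluates Δη_i of (2.10b) on a finite range of i and classifies the distribution by the sign, as in the worked examples. The function names, the tolerance and the parameter values are illustrative assumptions, and a finite range can of course only suggest, not prove, monotonicity.

```python
import math

def delta_eta(pmf, i):
    """Delta eta_i = P_{i+1}/P_i - P_{i+2}/P_{i+1}, cf. (2.10b)."""
    return pmf(i + 1) / pmf(i) - pmf(i + 2) / pmf(i + 1)

def classify(pmf, start, stop, tol=1e-12):
    """Classify by the sign of Delta eta_i for i = start, ..., stop - 1."""
    d = [delta_eta(pmf, i) for i in range(start, stop)]
    if all(v > tol for v in d):
        return "IHR"
    if all(v < -tol for v in d):
        return "DHR"
    if all(abs(v) <= tol for v in d):
        return "constant hazard"
    return "not monotone"

def binomial(n, p):
    return lambda i: math.comb(n, i) * p**i * (1 - p)**(n - i)

def poisson(lam):
    return lambda i: math.exp(-lam) * lam**i / math.factorial(i)

def logarithmic(p):
    a = -1.0 / math.log(1.0 - p)
    return lambda i: a * p**i / i

def negative_binomial(k, q):
    return lambda i: (math.gamma(k + i) / (math.gamma(k) * math.factorial(i))
                      * q**k * (1 - q)**i)

print(classify(binomial(10, 0.3), 0, 9))
print(classify(poisson(2.0), 0, 20))
print(classify(logarithmic(0.5), 1, 20))
for k in (0.5, 1.0, 3.0):
    print(k, classify(negative_binomial(k, 0.4), 0, 20))
```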
We have just seen that the ratio of two adjacent probabilities serves to investigate the behavior of
the hazard rate. G UPTA et al. (1997) also show that it is possible to compute the hazard rate when
these ratios are known. The fundamental equation, which has to be evaluated, gives the reciprocal
6
For more discrete distributions see Sect. 3.2.
$$\mathrm{HRA}(x) = -\frac{1}{x}\,\ln S(x) \tag{2.12a}$$
is increasing (decreasing) in x, x ≥ 0, or, equivalently, if
$$S(x_1)^{1/x_1} \underset{(\le)}{\ge} S(x_2)^{1/x_2}\quad \text{for } 0 \le x_1 \le x_2. \tag{2.12b}$$
Recall from (1.11f) that −ln S(x) represents the cumulative hazard rate $H(x) = \int_0^x h(u)\,du$ when the hazard rate exists, and we then have
$$\mathrm{HRA}(x) = \frac{1}{x}\int_0^x h(u)\,du = \frac{H(x)}{x}. \tag{2.12c}$$
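$$\frac{H(x_2)}{x_2} \underset{(\le)}{\ge} \frac{H(x_1)}{x_1}\quad \text{for } 0 \le x_1 \le x_2. \tag{2.12d}$$
or
$$h(x) \underset{(\le)}{\ge} \frac{H(x)}{x}\quad \forall\, x \ge 0, \tag{2.13b}$$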
7
Suggested reading for this section: BARLOW /P ROSCHAN (1975), L ANGBERG et al. (1982).
i.e., the increment h(x) of H(x) has to be greater (smaller) than the hazard rate average HRA(x)
for all x. For example, looking at the reduced W EIBULL distribution we have
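$$H(x) = x^{c},\quad h(x) = c\,x^{c-1},\quad \frac{H(x)}{x} = \mathrm{HRA}(x) = x^{c-1}$$
and
$$h(x) = c\,x^{c-1}\;\begin{cases}\le \dfrac{H(x)}{x} = x^{c-1} & \text{for } 0 < c \le 1 \;\Rightarrow\; \text{DHRA and DHR},\\[6pt] \ge \dfrac{H(x)}{x} = x^{c-1} & \text{for } c \ge 1 \;\Rightarrow\; \text{IHRA and IHR}.\end{cases}$$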
It is obvious that an IHRA distribution is characterized by decreasing $S(x)^{1/x}$ on [0, ∞), while a DHRA distribution is characterized by increasing $S(x)^{1/x}$ on [0, ∞). Hence, we can formulate
Theorem 12: A distribution is IHRA (DHRA) if and only if
$$S(a\,x)^{1/a} \underset{(\le)}{\ge} S(x),\quad 0 < a < 1,\ x \ge 0. \tag{2.14}$$
BARLOW /P ROSCHAN (1975, pp. 91 ff.) give the following stochastic model leading to an IHRA
lifetime distribution. A device is subject to shocks occurring randomly in time according to a
P OISSON process, each shock independently causes random damage to the device. The damages
accumulate until a critical threshold or capacity is exceeded, at which time the device fails. This
time to failure is governed by an IHRA distribution. For closure and inheritance of IHRA (DHRA)
distributions see Tab. 2/1 in Sect. 2.4.
In the discrete case we have — see (1.62a,b) — two different CHRs, the cumulative hazard rate
function
$${}_1H_i = -\ln S_i = -\ln \sum_{j \ge i} P_j;\quad i = 0, 1, \ldots$$
Definition: A discrete distribution is IHRA (DHRA) in the sense of the cumulative hazard rate if
or equivalently if
$$S_i^{1/(i+1)} \underset{(\ge)}{\le} S_{i-1}^{1/i};\quad i = 1, 2, \ldots \tag{2.17b}$$
or equivalently if
$$\frac{i}{i+1} \underset{(\le)}{\ge} \frac{{}_2H_{i-1}}{{}_2H_i};\quad i = 1, 2, \ldots \tag{2.17d}$$
Theorem 15: If a discrete lifetime distribution is IHR (DHR) then it is IHRA (DHRA) in the sense
of the cumulative hazard rate as well as in the sense of the accumulated hazard rate.
h0 ≤ h1 ≤ . . . ≤ hi−1 ≤ hi ≤ . . .
then, because of 0 ≤ hi ≤ 1 ∀ i,
and
$${}_1\mathrm{HRA}_i - {}_1\mathrm{HRA}_{i-1} = -\frac{1}{i+1}\sum_{j=0}^{i-1}\ln(1-h_j) + \frac{1}{i}\sum_{j=0}^{i-2}\ln(1-h_j)$$
$$= \frac{(i+1)\sum_{j=0}^{i-2}\ln(1-h_j) - i\sum_{j=0}^{i-1}\ln(1-h_j)}{i\,(i+1)}$$
$$= \frac{\sum_{j=0}^{i-2}\left[\ln(1-h_j) - \ln(1-h_{i-1})\right] - \ln(1-h_{i-1})}{i\,(i+1)} \ge 0$$
and the distribution is IHRA. Similarly, if
h0 ≥ h1 ≥ . . . ≥ hi−1 ≥ hi ≥ . . .
then
ln(1 − h0 ) ≤ ln(1 − h1 ) ≤ . . . ≤ ln(1 − hi−1 ) ≤ ln(1 − hi ) ≤ . . .
and
$${}_1\mathrm{HRA}_i - {}_1\mathrm{HRA}_{i-1} \le 0$$
and the distribution is DHRA.
We now look at 2 HRAi . If
h0 ≤ h1 ≤ . . . ≤ hi−1 ≤ hi ≤ . . .
then
$$i\;{}_2H_i - (i+1)\;{}_2H_{i-1} = i\sum_{j=0}^{i} h_j - (i+1)\sum_{j=0}^{i-1} h_j = \sum_{j=0}^{i-1}\left(h_i - h_j\right) \ge 0$$
h0 ≥ h1 ≥ . . . ≥ hi−1 ≥ hi ≥ . . .
then
$$i\;{}_2H_i - (i+1)\;{}_2H_{i-1} \le 0$$
and the distribution is DHRA.
When the distribution is not IHR (DHR), Theorem 15 need not hold. We recommend expressing the behavior of the hazard rate average in the discrete case by means of ${}_2\mathrm{HRA}_i$, because (2.16b) is an average by construction, i.e., a sum divided by the number of its summands.
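The accumulated-hazard part of the argument can be illustrated numerically. The sketch below assumes, as read off from the proof above, that the accumulated hazard rate is the sum h_0 + ... + h_i and that its average divides this sum by the number i + 1 of summands; the two hazard sequences are arbitrary illustrative choices, not taken from the text.

```python
def accumulated_hra(h):
    """Running averages (h_0 + ... + h_i) / (i + 1) for i = 0, 1, ...
    (assumed form of the hazard rate average in the accumulated sense)."""
    out, total = [], 0.0
    for i, hi in enumerate(h):
        total += hi
        out.append(total / (i + 1))
    return out

def is_monotone(seq, increasing=True, tol=1e-12):
    """True if seq is non-decreasing (or non-increasing) up to tol."""
    pairs = list(zip(seq, seq[1:]))
    if increasing:
        return all(b >= a - tol for a, b in pairs)
    return all(b <= a + tol for a, b in pairs)

# Illustrative hazard sequences with values in (0, 1)
ihr_hazards = [1.0 - 0.9 ** (i + 1) for i in range(30)]   # increasing
dhr_hazards = [0.5 / (i + 1) for i in range(30)]          # decreasing

print(is_monotone(accumulated_hra(ihr_hazards), increasing=True))
print(is_monotone(accumulated_hra(dhr_hazards), increasing=False))
```

Because a running mean of a monotone sequence is itself monotone in the same direction, both checks are expected to succeed, in line with the second half of the proof.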
Whether a hazard rate is DIHR or IDHR can be found out by investigating its first derivative in
the case of a continuous variate or its first difference in the case of a discrete variate. A quite
general definition which extends the idea of DIHR and IDHR to situations where the hazard rate
itself does not exist is
Definition: A lifetime distribution F(x) with x ∈ [0, ∞) is said to be DIHR (IDHR) if there exists an $x_0 > 0$ such that H(x) = − ln[1 − F(x)] is concave (convex) on $[0, x_0)$ and convex (concave) on $[x_0, \infty)$.
G LASER (1980) has given sufficient conditions to characterize a given lifetime distribution as
being IHR, DHR, IDHR, and DIHR, assuming that its PDF f (x) is continuous and twice differ-
entiable on [0, ∞). These conditions rest upon the reciprocal of the hazard rate
$$g(x) := \frac{1}{h(x)} = \frac{S(x)}{f(x)} \tag{2.18a}$$
GLASER (1980) has supplemented Theorem 16 by the following lemma that helps to avoid finding a root $y_0$ of $g'(\cdot)$.
Lemma: Suppose (2.19a) or (2.19b) holds in Theorem 16.
Among the popular and classical distributions we seldom find one which is DIHR or IDHR. Two
prominent exceptions are the log–normal distribution with10
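$$f(x) = \frac{1}{b\,\sqrt{2\pi}\;x}\,\exp\left\{-\frac{(\ln x - a)^{2}}{2\,b^{2}}\right\} = \frac{1}{b\,x}\,\phi\!\left(\frac{\ln x - a}{b}\right),\quad x > 0,\ a = E(\ln X) \in \mathbb{R},\ b = \sqrt{\operatorname{Var}(\ln X)}, \tag{2.20a}$$
$$F(x) = \Phi\!\left(\frac{\ln x - a}{b}\right), \tag{2.20b}$$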
and the inverse G AUSSIAN distribution, also called WALD distribution with
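$$f(x) = \sqrt{\frac{b}{2\,\pi\,x^{3}}}\;\exp\left\{-\frac{b\,(x-a)^{2}}{2\,a^{2}\,x}\right\},\quad x > 0,\ a = E(X) > 0,\ b = \frac{a^{3}}{\operatorname{Var}(X)} > 0, \tag{2.21a}$$
$$F(x) = \Phi\!\left[\sqrt{\frac{b}{x}}\left(\frac{x}{a} - 1\right)\right] + \exp\!\left(\frac{2\,b}{a}\right)\Phi\!\left[-\sqrt{\frac{b}{x}}\left(\frac{x}{a} + 1\right)\right]. \tag{2.21b}$$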
These two distributions are IDHR irrespective of their parameter values. Figures 2/1 and 2/2
depict the hazard rates of these distributions for several combinations of parameter values.
There are several approaches to construct DIHR and IDHR distributions. We only mention:11
1. directly specifying a hazard rate that has a bathtub or inverted bathtub shape and then re-
covering its CDF and PDF using (1.9c,d),
10
$\Phi(u) = \int_{-\infty}^{u} \phi(z)\,dz$ is the CDF of the standardized normal distribution, which cannot be given in closed form.
11
For more approaches see L AI et al. (2001).
2.2 Non–monotone Hazard Rate Distributions 81
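$$h(x) = \delta\,x + \frac{\theta}{1+\beta\,x};\quad x \ge 0,\ \delta, \theta, \beta \ge 0. \tag{2.23a}$$
δ x is an increasing term and θ/(1 + β x) is a decreasing term. For β > 0 (2.23a) results in
$$f(x) = \frac{\theta + \delta\,x\,(1+\beta\,x)}{(1+\beta\,x)^{\theta/\beta+1}}\,\exp\left(-\frac{\delta\,x^{2}}{2}\right), \tag{2.23b}$$
$$F(x) = 1 - \frac{\exp\left(-\delta\,x^{2}/2\right)}{(1+\beta\,x)^{\theta/\beta}}, \tag{2.23c}$$
$$H(x) = \frac{\delta\,x^{2}}{2} + \frac{\theta\,\ln(1+\beta\,x)}{\beta}. \tag{2.23d}$$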
For β = 0 we have to take the limit of f (x) and F (x) as β → 0. Special cases of (2.23a) — see
Fig. 2/3 — are:
An example for a compound distribution with possible bathtub–shaped hazard rate is the expo-
nential power distribution of D HILLON (1981), called D HILLON–I distribution. The hazard rate
reads:
$$h(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}\exp\left\{\left(\frac{x-a}{b}\right)^{c}\right\};\quad x \ge a,\ a \in \mathbb{R},\ b, c > 0. \tag{2.24a}$$
This hazard rate is displayed in Fig. 2/5 further down. The following functions belong to (2.24a):
When b = 1 we have a log–W EIBULL distribution, also known as type–I extreme value distri-
bution of the minimum, and the hazard is increasing for b ≥ 1. The bathtub–shaped hazard rate
comes up for 0 < b < 1 with a change point (minimum) at $x = \dfrac{1}{a}\left(\dfrac{1-b}{b}\right)^{1/b}$.
A first example for a generalized distribution is S TACY’s (1962) generalized gamma distribution13
with14
$$f(x) = \frac{d\,x^{d\,c-1}}{b^{d\,c}\,\Gamma(c)}\,\exp\left\{-\left(\frac{x}{b}\right)^{d}\right\};\quad x \ge 0;\ b, c, d > 0. \tag{2.25}$$
This distribution includes many other distributions as special cases, see R INNE (2009, pp. 111 ff.).
The behavior of the hazard rate does not depend on the scaling parameter b, but it depends on
c d − 1 as follows:
• c d − 1 < 0
  ⋆ d ≤ 1 ⇒ DHR
  ⋆ d > 1 ⇒ DIHR
• c d − 1 > 0
  ⋆ d ≥ 1 ⇒ IHR
  ⋆ d < 1 ⇒ IDHR
• cd − 1 = 0
13
The ordinary gamma distribution with $f(x) = \dfrac{1}{b\,\Gamma(d)}\left(\dfrac{x}{b}\right)^{d-1}\exp\left(-\dfrac{x}{b}\right)$; x ≥ 0, b, d > 0; is IHR for d ≥ 1 and DHR for 0 < d ≤ 1.
14
$\Gamma(z) = \int_0^{\infty} u^{z-1}\,e^{-u}\,du$ is the complete gamma function.
A second example is the generalized exponential geometric distribution given by S ILVA et al.
(2010) with
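$$F(x) = \left[\frac{1-\exp(-b\,x)}{1-p\,\exp(-b\,x)}\right]^{c};\quad x \ge 0,\ b, c > 0,\ p \in (0, 1), \tag{2.26a}$$
$$f(x) = \frac{c\,b\,(1-p)\,\exp(-b\,x)\left[1-\exp(-b\,x)\right]^{c-1}}{\left[1-p\,\exp(-b\,x)\right]^{c+1}}, \tag{2.26b}$$
$$h(x) = \frac{c\,b\,(1-p)\,\exp(-b\,x)\left[1-\exp(-b\,x)\right]^{c-1}\left[1-p\,\exp(-b\,x)\right]^{-1}}{\left[1-p\,\exp(-b\,x)\right]^{c} - \left[1-\exp(-b\,x)\right]^{c}} \tag{2.26c}$$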
• $\mu(x) = E(X - x \mid X \ge x) = \dfrac{1}{S(x)}\displaystyle\int_x^{\infty} S(u)\,du$ in the continuous case and
• $L_i = E(X - i \mid X \ge i) = \dfrac{1}{S_i}\displaystyle\sum_{j>i} S_j$ in the discrete case
may also be used to characterize aging and to classify lifetime distributions. While the hazard
rate function at x provides information about a small interval just after x, the MRL function at
x considers information about the whole interval after x (all after x). This intuition explains the
difference between the two.
When MRL is monotone and increasing (decreasing), abbreviated IMRL (DMRL), we have beneficial (adverse) aging. But there are also distributions with non–monotone MRL. Of special interest in this case are the DIMRL class, where MRL has a bathtub shape (first decreasing, afterwards increasing), and the IDMRL class with an upside–down bathtub shape (first increasing, afterwards decreasing).
What functions are MRL functions? — Several characterizations are possible which answer this.
We cite the characterization given by G UESS /P ROSCHAN (1988, p. 217).16
Theorem 17: Consider the following conditions:
A function µ(x) satisfies (i) – (v) if and only if µ(x) is the MRL function of a lifetime distribution that is not degenerate at x = 0.
Note that condition (ii) rules out the distribution degenerate at x = 0. (iv) is a statement on
the expected time of death given that a unit has survived to time x, see (1.16a,b). Theorem 17
delineates which functions can serve as MRL functions, and hence, it provides models for life–
lengths. For recovering the other representatives of a lifetime distribution see (1.20a–f).
We now look at the relationship between the HR classes and the MRL classes.
Theorem 18: A lifetime distribution that is IHR (DHR) has a decreasing (increasing) MRL func-
tion, i.e., the IHR (DHR) class is contained in the DMRL (IMRL) class.
The implication IHR ⇒ DMRL (DHR ⇒ IMRL) of Theorem 18 cannot be reversed in general.
We give a proof of Theorem 18 for a discrete lifetime distribution:17
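$$L_i - L_{i+1} = \sum_{j>i}\left[\frac{S_j}{S_i} - \frac{S_{j+1}}{S_{i+1}}\right] = \sum_{j>i}\left[(h_j - h_i)\prod_{k=i+1}^{j-1}(1-h_k)\right] \underset{(\le)}{\ge} 0, \tag{2.27}$$
accordingly as $h_j \underset{(\le)}{\ge} h_i$, j > i. Therefore, when $h_i < h_{i+1}\ \forall\, i$ (= IHR), the MRL function decreases and when $h_i > h_{i+1}\ \forall\, i$ (= DHR), the MRL function increases.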
$$S(x) = \exp\left(-x^{c}\right)$$
where
$$\gamma(a, z) = \int_z^{\infty} u^{a-1}\,e^{-u}\,du$$
17
For a proof in the continuous case see B RYSON /S IDDIQUI (1969).
Fig. 2/4 shows the hazard rate functions and the corresponding mean residual life functions.
A discrete distribution analyzed by K EMP (2004) is the zero–inflated geometric distribution with
0 < λ < 1 and 0 < α < 1. We have
h0 = 1 − α λ, hi = 1 − λ for i ≥ 1, and (2.29a)
Thus
$$L_0 = \frac{\sum_{j>0} S_j}{S_0} = \frac{\alpha\,\lambda}{1-\lambda},\qquad L_i = \frac{\sum_{j>i} S_j}{S_i} = \frac{\lambda}{1-\lambda},\quad i > 0. \tag{2.29c}$$
Now
h0 > h1 = h2 = . . . and L0 < L1 = L2 = . . .
Therefore, the zero–inflated geometric distribution is DHR and IMRL.
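The values of the mean residual life in (2.29c) can be reproduced from the hazard rates in (2.29a). The sketch below builds the survival function as the product of the factors (1 − h_k) for k < j, as in the product representation used elsewhere in this chapter, and truncates the infinite sum. The function names, the truncation length and the parameter values α = 0.5, λ = 0.6 are illustrative assumptions.

```python
def survival(j, alpha, lam):
    """S_j = Pr(X >= j), built from h_0 = 1 - alpha*lam and h_i = 1 - lam, i >= 1."""
    s = 1.0
    for k in range(j):
        h_k = 1.0 - alpha * lam if k == 0 else 1.0 - lam
        s *= 1.0 - h_k
    return s

def mean_residual_life(i, alpha, lam, terms=400):
    """L_i = sum_{j>i} S_j / S_i, truncated after `terms` summands."""
    tail = sum(survival(j, alpha, lam) for j in range(i + 1, i + 1 + terms))
    return tail / survival(i, alpha, lam)

alpha, lam = 0.5, 0.6
print(mean_residual_life(0, alpha, lam), alpha * lam / (1.0 - lam))
print(mean_residual_life(3, alpha, lam), lam / (1.0 - lam))
```

With these values the first line compares L_0 with αλ/(1 − λ) and the second compares L_3 with λ/(1 − λ), so that L_0 < L_i for i > 0 whenever α < 1.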
Figure 2/4: HR and corresponding MRL of two reduced W EIBULL distributions
From the fact that IHR ⇒ DMRL and DHR ⇒ IMRL one may conjecture that DIHR implies IDMRL, i.e., a bathtub–shaped hazard rate implies an upside–down bathtub–shaped MRL. The following Theorem 19 was stated and proved by MI (1995).
Theorem 19: If the hazard rate function h(x) has a bathtub shape, then the associated MRL has
an upside–down bathtub shape.
Fig. 2/5 demonstrates Theorem 19 for the exponential power distribution of DHILLON (1981) with the hazard rate given in (2.24a), which is DIHR for 0 < b < 1.
18
$\Gamma(a, z) = \int_0^{z} u^{a-1}\,e^{-u}\,du$ is the incomplete gamma function.
2.3 MRL classes of Distributions 87
The converse of Theorem 19 is not necessarily true as demonstrated by the following example of
M I (1995):
$$\mu(x) = \begin{cases}x^{2} + 1 & \text{for } 0 \le x < 1,\\ 2\,x & \text{for } 1 \le x < 2,\\ 4\,\exp\left[-0.25\,(x-2)\right] & \text{for } x \ge 2.\end{cases}$$
This µ(x) has an upside–down bathtub shape and is a MRL function of a certain lifetime distri-
bution. Applying (1.20d) we find the corresponding hazard rate function:
$$h(x) = \begin{cases}\dfrac{2\,x + 1}{x^{2} + 1} & \text{for } 0 \le x < 1,\\[8pt] \dfrac{3}{2\,x} & \text{for } 1 \le x < 2,\\[8pt] 0.25\left\{\exp\left[0.25\,(x-2)\right] - 1\right\} & \text{for } x \ge 2.\end{cases}$$
This hazard rate is not bathtub–shaped. Instead it first increases and then decreases over [0, 2), drops down to 0 at x = 2 and increases to infinity over (2, ∞).
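The hazard rate belonging to this µ(x) can be recomputed piece by piece. The sketch assumes that the relation referred to as (1.20d) is the standard identity h(x) = [1 + µ′(x)]/µ(x), which follows from differentiating µ(x) S(x) = ∫ S(u) du over (x, ∞); the function names are illustrative.

```python
import math

def mrl(x):
    """Piecewise MRL function of the example."""
    if x < 1.0:
        return x * x + 1.0
    if x < 2.0:
        return 2.0 * x
    return 4.0 * math.exp(-0.25 * (x - 2.0))

def mrl_derivative(x):
    """Derivative of mrl on each piece (right derivative at the break points)."""
    if x < 1.0:
        return 2.0 * x
    if x < 2.0:
        return 2.0
    return -math.exp(-0.25 * (x - 2.0))

def hazard(x):
    """h(x) = (1 + mu'(x)) / mu(x); assumed to be the content of (1.20d)."""
    return (1.0 + mrl_derivative(x)) / mrl(x)

for x in (0.0, 0.5, 0.9, 1.0, 1.9, 2.0, 4.0, 8.0):
    print(x, round(hazard(x), 4))
```

Under this identity the third branch evaluates to 0.25 {exp[0.25 (x − 2)] − 1}, which is zero at x = 2 and increases without bound afterwards.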
From Theorem 19 we may conjecture that a distribution with an upside–down bathtub–shaped hazard rate might have a bathtub–shaped mean residual life function. As Fig. 2/6 shows, this conjecture holds at least for the log–normal distribution.
The reader interested in DIHR and IDHR residual life functions of discrete distributions should
consult G UESS /PARK (1988).
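[Figure 2/7: implication diagram of the aging classes; the two–dimensional layout was lost in extraction.
Upper chain: PF2 ⇒ IHR ⇒ IHRA ⇒ NBU ⇒ NBUE ⇒ HNBUE and IHR ⇒ DMRL ⇒ NBUE, with IAF ⇐⇒ IHR ⇐⇒ IIHRA; the classes NBUHR ⇒ NBUHRA also belong to this chain.
Lower chain: DHR ⇒ DHRA ⇒ NWU ⇒ NWUE ⇒ HNWUE and DHR ⇒ IMRL ⇒ NWUE, with DAF ⇐⇒ DHR ⇐⇒ DIHRA; the classes NWUHR ⇒ NWUHRA also belong to this chain.]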
PF2 means P ÓLYA density of order 2 and is the strongest aging criterion.
We now present and discuss those criteria in Fig. 2/7 which have not been described in the pre-
ceding sections and we start on the left–hand side and move to the right. The criteria IAF and
DAF have been introduced by B RYSON /S IDDIQUI (1969), and IAF (DAF) stands for increasing
(decreasing) specific aging factor. The specific aging factor is defined as
$$A(x, y) := \frac{S(x)\,S(y)}{S(x+y)};\quad x, y \ge 0. \tag{2.30a}$$
19
Suggested reading for the section: BARLOW /P ROSCHAN (1975), B RYSON /S IDDIQUI (1969), J OHNSON /
KOTZ /BALAKRISHNAN (1995, pp. 663 ff.), K EMP (2004), K LEFSJ Ö (1982a,b), M ARSHALL /P ROSCHAN
(1972).
2.4 Classification According to other Aging Criteria 89
Notice the interchangeability of the arguments x and y and the relationship to NBU and NWU in (2.32). If a distribution is NBU (NWU), its specific aging factor results as $A(x, y) \underset{(\le)}{\ge} 1$. The motivation for A(x, y) may be seen as follows: Consider two units with lifetimes described by one and the same distribution F(·), and let x denote the chronological age of one of them. The other unit is ‘new’, i.e., has a chronological age of zero. Then S(y) is the probability that the new unit will survive for at least a duration y. Correspondingly, the ratio S(x + y)/S(x) is the probability that the older unit will survive for that same duration, given its prior survival up to time x. The specific aging factor is a comparison of these two survival probabilities. It will be strictly greater than unity if and only if the older unit has ‘aged’ in that it has less chance of surviving for duration y than does a new unit. The range of A(x, y) is the extended positive real line; however, it is undefined if either numerator factor vanishes.
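As a small numerical illustration of (2.30a), the specific aging factor can be evaluated for the reduced WEIBULL distribution with S(x) = exp(−x^c). This is only a sketch; the function names and the chosen arguments are illustrative.

```python
import math

def survival(x, c):
    """Survival function of the reduced Weibull distribution."""
    return math.exp(-x ** c)

def aging_factor(x, y, c):
    """Specific aging factor A(x, y) = S(x) S(y) / S(x + y), cf. (2.30a)."""
    return survival(x, c) * survival(y, c) / survival(x + y, c)

for c in (0.5, 1.0, 2.0):
    print(c, round(aging_factor(1.0, 1.0, c), 4))
```

For c > 1 the factor exceeds one (the used unit has aged), for c < 1 it falls below one, and for c = 1 (exponential distribution) it equals one for all x and y.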
Definition: A distribution function F (·) is called IAF if
and DAF if
We have
$$\mathrm{HRA}(x, 0) = \mathrm{HRA}(x) = \frac{1}{x}\int_0^{x} h(u)\,du. \tag{2.31b}$$
To prove the equivalence of criteria IHR and IIHRA we need the following
Lemma: Let h(x) be integrable with no more than finitely many discontinuities in any finite
interval. Then h(x) is monotone increasing for all x > 0 if and only if
$$h(y) \le \frac{1}{x}\int_y^{y+x} h(u)\,du \le h(y+x)\quad \forall\, y \ge 0,\ x > 0.$$
Proof of IHR ⇔ IIHRA: In accordance with IHR, h(x2 ) ≥ h(x1 ) ∀ x2 ≥ x1 ≥ 0, let h(·) be
monotone increasing, and choose x2 ≥ x1 . Then
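$$\mathrm{HRA}(x_2, y) - \mathrm{HRA}(x_1, y) = \left(\frac{1}{x_2} - \frac{1}{x_1}\right)\int_y^{y+x_1} h(u)\,du + \frac{1}{x_2}\int_{y+x_1}^{y+x_2} h(u)\,du$$
$$= \frac{x_2 - x_1}{x_2}\left[\frac{1}{x_2 - x_1}\int_{y+x_1}^{y+x_2} h(u)\,du - \frac{1}{x_1}\int_y^{y+x_1} h(u)\,du\right]$$
$$\ge \frac{x_2 - x_1}{x_2}\left[h(y+x_1) - h(y+x_1)\right]$$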
$$\frac{1}{x_2}\int_y^{y+x_2} h(u)\,du \ge \frac{1}{x_1}\int_y^{y+x_1} h(u)\,du.$$
$$\lim_{x \to 0}\frac{1}{x}\int_y^{y+x} h(u)\,du = h(y^{+}),$$
Since y and x2 are arbitrary, the above Lemma applies to prove the monotonicity of h(·).
The equivalence of DHR and DIHRA in the lower chain of Fig. 2/7 can be proved along the same
lines.
The probability of an x–survivor living another y units of time is
$$S(y \mid x) = \Pr(Y > y \mid X \ge x) = \frac{S(x+y)}{S(x)},$$
whereas the probability of a new unit living more than y units of time is
Definition:20 A lifetime distribution is said to be NBU (new better than used) or NWU (new worse than used) accordingly as the conditional survival probability S(y | x) is less (or greater) than the unconditional survival probability S(y), i.e.,
$$S(y) \underset{(\le)}{\ge} \frac{S(x+y)}{S(x)}\quad \forall\, x \ge 0,\ y \ge 0. \tag{2.32}$$
We will prove the implications IHR ⇒ NBU and DHR ⇒ NWU for a discrete distribution, where
NBU (NWU) means
$$\frac{S_{i+j}}{S_i} \underset{(\ge)}{\le} S_j;\quad i, j = 0, 1, \ldots$$
Proof: From (1.61c) we have
$$\frac{S_{i+j}}{S_i\,S_j} = \frac{\prod_{k=0}^{i+j-1}(1-h_k)}{\prod_{k=0}^{i-1}(1-h_k)\;\prod_{k=0}^{j-1}(1-h_k)} = \prod_{k=0}^{j-1}\frac{1-h_{i+k}}{1-h_k}.$$
For a continuous distribution it is easily shown that NBU (NWU) can be equivalently stated as
$$S(y) \underset{(\le)}{\ge} \frac{S(x+y)}{S(x)}$$
20
HOLLANDER/PARK/PROSCHAN (1986) introduced subclasses of the NBU (NWU) distributions, called new better (worse) than used of age x0. For these classes the survival probability at age 0 is greater (smaller) than or equal to the conditional survival probability at specified age x0 > 0 :
They presented preservation and non–preservation properties of the two classes under various reliability operations and also showed how to test whether or not a distribution is new better than used at age x0.
and
$$\int_0^{x} h(u)\,du \underset{(\ge)}{\le} \int_y^{x+y} h(u)\,du;\quad x, y > 0.$$
But for discrete NBU (NWU) distributions the corresponding equivalence of
$$S_j \underset{(\le)}{\ge} \frac{S_{i+j}}{S_i};\quad i, j = 0, 1, \ldots$$
and
$$\sum_{k=0}^{j} h_k \underset{(\ge)}{\le} \sum_{k=i}^{i+j} h_k$$
does not hold.
From the mean residual life of an x–survivor
$$\mu(x) = E(Y \mid X \ge x) = \frac{1}{S(x)}\int_x^{\infty} S(u)\,du,$$
We will prove the implication NBU ⇒ NBUE (NWU ⇒ NWUE) for a discrete distribution.
Proof: If
$$S_{i+j} \underset{(\ge)}{\le} S_i\,S_j;\quad j = 0, 1, \ldots$$
then
$$\sum_{j \ge 0} S_{i+j} \underset{(\ge)}{\le} S_i \sum_{j \ge 0} S_j.$$
Definition: A continuous distribution F(·) with finite mean $\mu = \int_0^{\infty} S(u)\,du$ is said to be HNBUE (harmonic new better than used in expectation) if
$$\int_x^{\infty} S(u)\,du \le \mu\,\exp\left(-\frac{x}{\mu}\right)\quad \text{for } x \ge 0. \tag{2.34a}$$
If the reversed inequality is true, F (·) is said to be HNWUE (harmonic new worse than used
in expectation). This gives a dual class to the HNBUE class of distributions in the same way as
the IHR, IHRA, NBU, NBUE, and DMRL classes have their duals. The HNBUE and HNWUE
classes have been introduced by ROLSKI (1975). The names HNBUE resp. HNWUE are to be
explained as follows. Starting from the mean residual life
$$\mu(x) = \frac{1}{S(x)}\int_x^{\infty} S(u)\,du$$
This inequality says that the integral harmonic mean of µ(u), 0 ≤ u ≤ x, is less than or equal to
the integral harmonic mean of µ. For HNBUE distributions we have
$$S(x) \le \begin{cases}1 & \text{for } x \le \mu,\\[4pt] \exp\left(\dfrac{\mu - x}{\mu}\right) & \text{for } x > \mu\end{cases} \tag{2.34c}$$
holds.
Definition: For an absolutely continuous distribution F (·) with hazard rate h(·), we say that F (·)
is NBUHR (NWUHR) — new better (worse) than used in hazard rate — if
Definition: For an absolutely continuous distribution F (·), we say that F (·) is NBUHRA
(NWUHRA) — new better (worse) than used in hazard rate average — if
$$h(0) \underset{(\ge)}{\le} \frac{1}{x}\int_0^{x} h(u)\,du = -\frac{\ln\left[1 - F(x)\right]}{x}\quad \forall\, x \ge 0. \tag{2.36}$$
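21
Notice that from (2.34a) we first have
$$\frac{\mu(x)}{\mu}\,S(x) \le \exp\left(-\frac{x}{\mu}\right).$$
Using (1.19e) we then find
$$\exp\left\{-\int_0^{x}\frac{1}{\mu(u)}\,du\right\} \le \exp\left(-\frac{x}{\mu}\right)$$
and
$$-\int_0^{x}\frac{1}{\mu(u)}\,du \le -\frac{x}{\mu}.$$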
In the following Table 2/1 we have compiled from H OLLANDER /P ROSCHAN (1984) and
K LEFSJ Ö (1982a) the results pertaining to the closure of classes of lifetime distributions under
three reliability operations: mixture and convolution of distributions and formation of coherent
systems.
Table 2/1: Closure and inheritance of classes of lifetime distributions
under reliability operations
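$$B(P, c, d) = \int_0^{P} t^{c-1}\,(1-t)^{d-1}\,dt \quad \text{(incomplete beta function)}$$
$$\Gamma(a) = \int_0^{\infty} t^{a-1}\,e^{-t}\,dt \quad \text{(complete gamma function)}$$
$$\Gamma(a, u) = \int_0^{u} t^{a-1}\,e^{-t}\,dt \quad \text{(incomplete gamma function)}$$
$$\gamma(a, u) = \int_u^{\infty} t^{a-1}\,e^{-t}\,dt \quad \text{(complementary incomplete gamma function)}$$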
1
Suggested reading for this section: J OHNSON /KOTZ /BALAKRISHNAN (1994, 1995), L EEMIS(1995),
M EEKER /E SCOBAR (1998), PATEL (1973), R INNE (2009, 2010). The interactive program ContDist which
is written in MATLAB and which is included in the accompanying file ‘Distributions.zip’ displays for all dis-
tributions presented here a graph of the functions f (x), S(x), h(x) and — if existing — µ(x) for any set of
user–chosen parameter values.
96 3 Presentation of Univariate Parametric Distributions
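$$\Phi(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} \exp\left[-t^{2}/2\right]dt \quad \text{(CDF of the standardized normal distribution)}$$
$$\phi(u) = \frac{1}{\sqrt{2\pi}}\,\exp\left[-u^{2}/2\right] \quad \text{(PDF of the standardized normal distribution)}$$
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^{x} \exp\left(-u^{2}\right)du \quad \text{(error function)}$$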
Alpha distribution
This distribution is related to the normal distribution in the following way: Consider Y ∼
N o(µ; σ), truncated to the left of y = 0. Then, X = 1/Y has an alpha distribution with pa-
rameters α = µ/σ and b = 1/σ. The alpha distribution has been applied to tool wear and has also
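been suggested in modeling lifetimes under accelerated life testing, see SALVIA (1985):
$$f(x) = \frac{b\,\exp\left[-0.5\left(\alpha - \dfrac{b}{x}\right)^{2}\right]}{\sqrt{2\pi}\;\Phi(\alpha)\;x^{2}};\quad x \ge 0,\ \alpha \in \mathbb{R},\ b > 0$$
$$S(x) = 1 - \frac{\Phi\left(\alpha - \dfrac{b}{x}\right)}{\Phi(\alpha)}$$
$$h(x) = \frac{b\,\exp\left[-0.5\left(\alpha - \dfrac{b}{x}\right)^{2}\right]}{\sqrt{2\pi}\left[\Phi(\alpha) - \Phi\left(\alpha - \dfrac{b}{x}\right)\right]x^{2}} \;\Rightarrow\; \text{IDHR}$$
µ(x) − does not exist
The PDF has its mode at $x = b\left(\sqrt{\alpha^{2}+8} - \alpha\right)/4$, which moves to the left (right) as α (b) increases.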
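The mode of the alpha PDF can be cross–checked by a crude grid search. The sketch works with the PDF up to its normalising constant, which does not affect the location of the maximum, and compares the grid maximiser with b (√(α² + 8) − α)/4. The function names, the grid limits and the parameter values are illustrative assumptions.

```python
import math

def alpha_pdf_kernel(x, alpha, b):
    """Alpha PDF up to its normalising constant: exp(-0.5 (alpha - b/x)^2) / x^2."""
    return math.exp(-0.5 * (alpha - b / x) ** 2) / (x * x)

def mode_closed_form(alpha, b):
    """Mode b (sqrt(alpha^2 + 8) - alpha) / 4."""
    return b * (math.sqrt(alpha * alpha + 8.0) - alpha) / 4.0

def mode_by_grid(alpha, b, lo=1e-3, hi=5.0, n=200000):
    """Maximiser of the kernel over an equidistant grid on [lo, hi]."""
    xs = (lo + (hi - lo) * k / n for k in range(n + 1))
    return max(xs, key=lambda x: alpha_pdf_kernel(x, alpha, b))

for alpha, b in ((2.0, 1.0), (0.5, 2.0)):
    print(alpha, b, mode_by_grid(alpha, b), mode_closed_form(alpha, b))
```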
Arcsine distribution
This distribution is a special case of the beta distribution (see below) with shape parameters c =
d = 0.5.2 The name is derived from the fact that its CDF and CCDF are written in terms of the
arcsine function, the inverse of the sine function. The arcsine distribution with a = 0 and b > 0
having support [−b; b] gives the position at random time of a particle engaged in simple harmonic
motion with amplitude b > 0.
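$$f(x) = \frac{1}{b\,\pi\,\sqrt{1 - \left(\dfrac{x-a}{b}\right)^{2}}};\quad a - b \le x \le a + b;\ a \in \mathbb{R},\ b > 0$$
$$S(x) = \frac{1}{2} - \frac{1}{\pi}\,\arcsin\left(\frac{x-a}{b}\right)$$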
2
A beta distribution with c + d = 1, but c ≠ 0.5 is sometimes called a generalized arcsine distribution.
3.1 Continuous Distributions 97
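$$h(x) = \frac{1}{b\,\sqrt{1 - \left(\dfrac{x-a}{b}\right)^{2}}\;\arccos\left(\dfrac{x-a}{b}\right)} \;\Rightarrow\; \text{DIHR}$$
$$\mu(x) = (a - x) + \frac{\sqrt{(a + b - x)\,(x + b - a)}}{\arccos\left(\dfrac{x-a}{b}\right)} \;\Rightarrow\; \text{IDMRL}$$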
The arcsine distribution is a location–scale distribution. The PDF is symmetric and U–shaped, its minimum is at x = a with $f(a) = (b\,\pi)^{-1}$. The bathtub–shaped hazard rate has its minimum at $x^{*} \approx a - 0.4421\,b$ with $h(x^{*}) \approx (1.8197\,b)^{-1}$. The upside–down bathtub–shaped MRL has its maximum at $x^{+}$, being the solution of
$$\sqrt{(a + b - x^{+})\,(x^{+} + b - a)} + (a - x^{+})\,\arccos\left(\frac{x^{+}-a}{b}\right) = \sqrt{(a + b - x^{+})\,(x^{+} + b - a)}\,\left[\arccos\left(\frac{x^{+}-a}{b}\right)\right]^{2}.$$
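The location of the hazard rate minimum and of the MRL maximum can be checked numerically for the standardised case a = 0, b = 1. The sketch uses a simple grid search on the closed forms of h(x) and µ(x) given above. The stationarity residual in the last function is the condition obtained by setting the derivative of µ(x) to zero, namely √(1 − z²) + (−z) arccos z = √(1 − z²) (arccos z)²; this derivation, the function names and the grid are assumptions of the sketch.

```python
import math

def hazard(z):
    """Arcsine hazard rate for a = 0, b = 1."""
    return 1.0 / (math.sqrt(1.0 - z * z) * math.acos(z))

def mrl(z):
    """Arcsine mean residual life for a = 0, b = 1."""
    return -z + math.sqrt(1.0 - z * z) / math.acos(z)

def stationarity_residual(z):
    """R + (-z) A - R A^2 with R = sqrt(1 - z^2), A = arccos(z); zero where mu'(z) = 0."""
    r, a = math.sqrt(1.0 - z * z), math.acos(z)
    return r - z * a - r * a * a

grid = [-0.999 + 1.998 * k / 200000 for k in range(200001)]
z_min_hazard = min(grid, key=hazard)
z_max_mrl = max(grid, key=mrl)

print(z_min_hazard, 1.0 / hazard(z_min_hazard))
print(z_max_mrl, mrl(z_max_mrl), stationarity_residual(z_max_mrl))
```

For general a and b the locations follow from x = a + b z, since the arcsine distribution is a location–scale family.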
Beta distribution
The name of this distribution has its origin in the complete beta function B(c, d) which is part of
the formulas for the PDF and other distribution representatives. The PDF of a beta distribution
can take on a great variety of shapes depending on its two shape parameters c, d > 0:
• symmetric for c = d,
• unimodal for c > 1 and d > 1,
• U–shaped for c < 1 and d < 1,
• J–shaped for d ≤ 1 ≤ c, but c ≠ d,
• inversely (or reflected) J–shaped for c ≤ 1 ≤ d, but c ≠ d.
The beta distribution also includes several other distributions as special cases, e.g.,
• the uniform distribution for c = d = 1,
• the right–angled negatively (positively) skewed triangular distribution for d = 1 and c =
2 (c = 1 and d = 2),
• the arcsine distribution for c = d = 0.5,
• the power function distribution for d = 1, c > 0.
$$f(x) = \frac{1}{B(c, d)}\,\frac{(x-a)^{c-1}\,(a + b - x)^{d-1}}{b^{c+d-1}};\quad a \le x \le a + b;\ a \in \mathbb{R};\ b, c, d > 0$$
$$S(x) = 1 - \frac{1}{B(c, d)}\int_0^{(x-a)/b} u^{c-1}\,(1-u)^{d-1}\,du = 1 - I_{\frac{x-a}{b}}(c, d) = I_{1-\frac{x-a}{b}}(d, c)$$
The function Iz (c, d) is the incomplete beta function ratio which has to be evaluated numerically.
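$$h(x)\ \text{(no closed form)} \;\Rightarrow\; \begin{cases}\text{DIHR} & \text{for } 0 < c \lesssim 0.8 \text{ and arbitrary } d,\\ \text{IHR} & \text{for all other combinations of } c \text{ and } d\end{cases}$$
$$\mu(x)\ \text{(no closed form)} \;\Rightarrow\; \begin{cases}\text{IDMRL} & \text{for } 0 < c \lesssim 0.8 \text{ and arbitrary } d,\\ \text{DMRL} & \text{for all other combinations of } c \text{ and } d\end{cases}$$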
BIRNBAUM–SAUNDERS distribution
This distribution has been suggested by B IRNBAUM /S AUNDERS (1968, 1969) as a lifetime model
for materials subject to cyclic patterns of stress where the ultimate failure comes from the growth
of prominent flaws.
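$$f(x) = \frac{\sqrt{x/b} + \sqrt{b/x}}{2\,c\,x\,\sqrt{2\pi}}\;\exp\left\{-\frac{1}{2\,c^{2}}\left(\sqrt{\frac{x}{b}} - \sqrt{\frac{b}{x}}\right)^{2}\right\};\quad x \ge 0;\ b, c > 0$$
The variate $Y = \left(\sqrt{x/b} - \sqrt{b/x}\right)/c$ has a standard normal distribution, so
$$S(x) = \Phi\left[-\frac{1}{c}\left(\sqrt{\frac{x}{b}} - \sqrt{\frac{b}{x}}\right)\right].$$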
h(x) = f(x)/S(x) has no closed form, but it is IDHR with a maximum at $x^{*} \approx b/(-0.4604 + 1.8417\,c)^{2}$, which is close to zero for b small and c large, so that the upside–down bathtub shape does not come up clearly and the hazard rate seems to be DHR. The IDHR pattern is best seen for b and c around 1. For h(x) we further notice:
• h(0) = 0 and
• $\lim\limits_{x \to \infty} h(x) = \dfrac{1}{2\,b\,c^{2}}$.
µ(x) has no closed form and it is DIMRL for b and c around 1, otherwise IMRL.
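$$\mu(x)\ \text{(no closed form)} \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } 0 < d \le 1,\\ \text{DIMRL} & \text{for } d > 1,\\ \text{does not exist} & \text{for } c\,d \le 1\end{cases}$$
For d > 1 the hazard rate has a maximum at $x^{*} = a + b\,(d-1)^{1/d}$ with $h(x^{*}) = (c/b)\,(d-1)^{1-1/d}$.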
C AUCHY distribution
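The CAUCHY distribution is a symmetric distribution defined on R. Moments and thus MRL do not exist.
$$f(x) = \left\{\pi\,b\left[1 + \left(\frac{x-a}{b}\right)^{2}\right]\right\}^{-1};\quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$
$$S(x) = \frac{1}{2} - \frac{1}{\pi}\,\arctan\left(\frac{x-a}{b}\right)$$
$$h(x) = \frac{1}{b\left[1 + \left(\dfrac{x-a}{b}\right)^{2}\right]\left[\dfrac{\pi}{2} - \arctan\left(\dfrac{x-a}{b}\right)\right]} \;\Rightarrow\; \text{IDHR}$$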
The hazard rate has its maximum at x∗ ≈ a + 0.428978 b with h(x∗ ) ≈ 0.7246/b.
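The stated location and height of the hazard rate maximum can be reproduced with a plain grid search on the standardised hazard rate (a = 0, b = 1). The grid limits and resolution are arbitrary choices of this sketch.

```python
import math

def cauchy_hazard(z):
    """Hazard rate of the Cauchy distribution with a = 0 and b = 1."""
    return 1.0 / ((1.0 + z * z) * (math.pi / 2.0 - math.atan(z)))

# Equidistant grid on [-5, 5] with step 5e-5
grid = [-5.0 + 10.0 * k / 200000 for k in range(200001)]
z_star = max(grid, key=cauchy_hazard)

print(z_star, cauchy_hazard(z_star))
```

For general a and b the maximum is located at a + b z* and its height is h(z*)/b.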
χ distribution
The χ distribution is related to the χ2 distribution. For ν = 2 the χ distribution is equal to the
R AYLEIGH distribution.
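$$f(x) = \frac{x^{\nu-1}\,\exp\left(-\dfrac{x^{2}}{2}\right)}{2^{\nu/2-1}\;\Gamma\left(\dfrac{\nu}{2}\right)};\quad x \ge 0;\ \nu > 0$$
$$S(x) = 1 - \frac{\Gamma(\nu/2,\ x^{2}/2)}{\Gamma\left(\dfrac{\nu}{2}\right)}$$
$$h(x) = \frac{x^{\nu-1}\,\exp\left(-\dfrac{x^{2}}{2}\right)}{2^{\nu/2-1}\left[\Gamma\left(\dfrac{\nu}{2}\right) - \Gamma(\nu/2,\ x^{2}/2)\right]} \;\Rightarrow\; \begin{cases}\text{DIHR} & \text{for } 0 < \nu < 1,\\ \text{IHR} & \text{for } \nu \ge 1\end{cases}$$
$$\mu(x)\ \text{(no closed form)} \;\Rightarrow\; \begin{cases}\text{IDMRL} & \text{for } 0 < \nu < 1,\\ \text{DMRL} & \text{for } \nu \ge 1\end{cases}$$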
χ2 distribution
The χ2 distribution is a special case of the gamma distribution (see below) with b = 2.
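$$f(x) = \frac{x^{\nu/2-1}\,\exp\left(-\dfrac{x}{2}\right)}{2^{\nu/2}\;\Gamma\left(\dfrac{\nu}{2}\right)};\quad x \ge 0;\ \nu > 0$$
$$S(x) = 1 - \frac{\Gamma(\nu/2,\ x/2)}{\Gamma\left(\dfrac{\nu}{2}\right)}$$
$$h(x) = \frac{x^{\nu/2-1}\,\exp\left(-\dfrac{x}{2}\right)}{2^{\nu/2}\left[\Gamma\left(\dfrac{\nu}{2}\right) - \Gamma(\nu/2,\ x/2)\right]} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } 0 < \nu < 2,\\ 0.5 & \text{for } \nu = 2 \text{ (exponential distribution)},\\ \text{IHR} & \text{for } \nu > 2\end{cases}$$
$$\mu(x)\ \text{(no closed form)} \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } 0 < \nu < 2,\\ 2 & \text{for } \nu = 2,\\ \text{DMRL} & \text{for } \nu > 2\end{cases}$$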
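$$\mu(x) \approx b\;\frac{0.199339 - \dfrac{x-a}{2\,b} + \left(\dfrac{x-a}{2\,b}\right)^{2} - 0.0506606\,\cos\left(\pi\,\dfrac{x-a}{b}\right)}{0.5\left[1 - \dfrac{x-a}{b} - \dfrac{1}{\pi}\,\sin\left(\pi\,\dfrac{x-a}{b}\right)\right]} \;\Rightarrow\; \text{DMRL}$$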
D HILLON–I distribution
D HILLON (1979) has proposed the following hazard rate
$$h(x) = k\,\lambda\,c\,x^{c-1} + (1-k)\,\beta\,x^{\beta-1}\,b\,\exp\left(b\,x^{\beta}\right)$$
which is a linear combination of two components, 0 ≤ k ≤ 1 being the combining linear factor.
λ and b are the scale factors in the first and second component, respectively, while c and β are
shape parameters in the two components, where λ, b, c, β > 0. This model includes several
other distributions, e.g., the G OMPERTZ –M AKEHAM distribution for c = β = 1, the W EIBULL
distribution for k = 1 and the Log–W EIBULL distribution (= extreme value distribution of type
I for the minimum) for k = 0, β = 1, and is capable of representing different courses of the
hazard rate. The D HILLON–I distribution, proposed in D HILLON (1981) results when k = 0 and
thus is less complicated, but still models increasing and bathtub–shaped hazard rates. Introducing
a location parameter a, a ∈ R, the D HILLON–I distribution has:
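$$f(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}\exp\left\{1 - \exp\left[\left(\frac{x-a}{b}\right)^{c}\right] + \left(\frac{x-a}{b}\right)^{c}\right\};\quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$
$$S(x) = \exp\left\{1 - \exp\left[\left(\frac{x-a}{b}\right)^{c}\right]\right\}$$
$$h(x) = \frac{c}{b}\left(\frac{x-a}{b}\right)^{c-1}\exp\left[\left(\frac{x-a}{b}\right)^{c}\right] \;\Rightarrow\; \begin{cases}\text{DIHR} & \text{for } 0 < c < 1,\\ \text{IHR} & \text{for } c \ge 1\end{cases}$$
$$\mu(x)\ \text{(no closed form)} \;\Rightarrow\; \begin{cases}\text{IDMRL} & \text{for } 0 < c < 1,\\ \text{DMRL} & \text{for } c \ge 1\end{cases}$$
For 0 < c < 1 the hazard rate has its minimum at $x^{*} = a + b\,[(1-c)/c]^{1/c}$ with $h(x^{*}) = (c/b)\,[(1-c)/c]^{(c-1)/c}\,\exp[(1-c)/c]$.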
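The turning point of the DHILLON–I hazard rate for 0 < c < 1 can be checked numerically. Setting the derivative of ln h(x) to zero gives ((x − a)/b)^c = (1 − c)/c, and the sketch below evaluates the hazard rate at this point and at its neighbours. The function names and the parameter values a = 0, b = 2, c = 0.5 are illustrative assumptions.

```python
import math

def dhillon1_hazard(x, a, b, c):
    """h(x) = (c/b) z^(c-1) exp(z^c) with z = (x - a)/b, for x > a."""
    z = (x - a) / b
    return (c / b) * z ** (c - 1.0) * math.exp(z ** c)

def stationary_point(a, b, c):
    """Point where ((x - a)/b)^c = (1 - c)/c; meaningful for 0 < c < 1."""
    return a + b * ((1.0 - c) / c) ** (1.0 / c)

def hazard_at_stationary_point(b, c):
    """(c/b) [(1-c)/c]^((c-1)/c) exp[(1-c)/c]."""
    q = (1.0 - c) / c
    return (c / b) * q ** ((c - 1.0) / c) * math.exp(q)

a, b, c = 0.0, 2.0, 0.5
x_star = stationary_point(a, b, c)
print(x_star, dhillon1_hazard(x_star, a, b, c), hazard_at_stationary_point(b, c))
print(dhillon1_hazard(x_star - 0.1, a, b, c), dhillon1_hazard(x_star + 0.1, a, b, c))
```

Both neighbouring values exceed the value at the stationary point, so the point is a minimum, as expected for a bathtub–shaped (DIHR) hazard rate.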
D HILLON–II distribution
This distribution of D HILLON (1981) is capable of generating decreasing or upside–down
bathtub–shaped hazard rates.
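$$f(x) = \frac{c+1}{x-a+b}\left[\ln\left(\frac{x-a}{b} + 1\right)\right]^{c}\exp\left\{-\left[\ln\left(\frac{x-a}{b} + 1\right)\right]^{c+1}\right\};\quad x \ge a;\ a \in \mathbb{R};\ b > 0,\ c \ge 0$$
$$S(x) = \exp\left\{-\left[\ln\left(\frac{x-a}{b} + 1\right)\right]^{c+1}\right\}$$
$$h(x) = \frac{c+1}{x-a+b}\left[\ln\left(\frac{x-a}{b} + 1\right)\right]^{c} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } c = 0,\\ \text{IDHR} & \text{for } c > 0\end{cases}$$
$$\mu(x)\ \text{(no closed form)} \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } c = 0,\\ \text{DIMRL} & \text{for } 0 < c \lesssim 2,\\ \text{DMRL} & \text{for } c \gtrsim 2\end{cases}$$
For c > 0 the hazard rate has its maximum at $x^{*} = a + b\,(e^{c} - 1)$ with $h(x^{*}) = (c/e)^{c}\,(c+1)/b$.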
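The maximum of the DHILLON–II hazard rate can be verified in the same way. With u = ln((x − a)/b + 1) the hazard rate becomes (c + 1) u^c e^(−u)/b, which is maximal at u = c, i.e., at x = a + b (e^c − 1). This derivation, the function names and the parameter values a = 1, b = 2, c = 1.5 are assumptions of the sketch.

```python
import math

def dhillon2_hazard(x, a, b, c):
    """h(x) = (c + 1)/(x - a + b) * [ln((x - a)/b + 1)]^c, for x > a."""
    u = math.log((x - a) / b + 1.0)
    return (c + 1.0) / (x - a + b) * u ** c

def maximum_location(a, b, c):
    """x* = a + b (e^c - 1), obtained from u = c."""
    return a + b * (math.exp(c) - 1.0)

def maximum_height(b, c):
    """h(x*) = (c/e)^c (c + 1) / b."""
    return (c / math.e) ** c * (c + 1.0) / b

a, b, c = 1.0, 2.0, 1.5
x_star = maximum_location(a, b, c)
print(x_star, dhillon2_hazard(x_star, a, b, c), maximum_height(b, c))
```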
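$$f(x) = \frac{c}{2\,b}\left|\frac{x-a}{b}\right|^{c-1}\exp\left\{-\left|\frac{x-a}{b}\right|^{c}\right\};\quad x \in \mathbb{R};\ a \in \mathbb{R};\ b, c > 0$$
$$S(x) = \begin{cases}1 - 0.5\,\exp\left\{-\left(\dfrac{a-x}{b}\right)^{c}\right\} & \text{for } x \le a,\\[8pt] 0.5\,\exp\left\{-\left(\dfrac{x-a}{b}\right)^{c}\right\} & \text{for } x \ge a\end{cases}$$
$$h(x) = \begin{cases}\dfrac{c\left(\dfrac{a-x}{b}\right)^{c-1}\exp\left\{-\left(\dfrac{a-x}{b}\right)^{c}\right\}}{b\left[2 - \exp\left\{-\left(\dfrac{a-x}{b}\right)^{c}\right\}\right]} & \text{for } x \le a,\\[14pt] \dfrac{c\left(\dfrac{x-a}{b}\right)^{c-1}\exp\left\{-\left(\dfrac{x-a}{b}\right)^{c}\right\}}{b\,\exp\left\{-\left(\dfrac{x-a}{b}\right)^{c}\right\}} & \text{for } x \ge a\end{cases}$$
The hazard rate is, except for c = 2, far from being monotone; it is asymmetric around x = a in any case. The mean residual life function generally decreases, but for c ≳ 3 it is not monotone.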
Exponential distribution
Formerly, the exponential distribution was regarded as the prototype of a lifetime distribution.
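$$f(x) = \frac{1}{b}\,\exp\left(-\frac{x-a}{b}\right);\quad x \ge a;\ a \in \mathbb{R},\ b > 0$$
$$S(x) = \exp\left(-\frac{x-a}{b}\right)$$
$$h(x) = \frac{1}{b} \;\Rightarrow\; \text{IHR and DHR}$$
$$\mu(x) = b \;\Rightarrow\; \text{IMRL and DMRL}$$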
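$$f(x) = \frac{c}{b}\,\exp\left(-\frac{x-a}{b}\right)\left[1 - \exp\left(-\frac{x-a}{b}\right)\right]^{c-1};\quad x \ge a;\ a \in \mathbb{R},\ b, c > 0$$
$$S(x) = 1 - \left[1 - \exp\left(-\frac{x-a}{b}\right)\right]^{c}$$
$$h(x)\ \text{(no closed form)} \;\Rightarrow\; \begin{cases}\text{DHR} & \text{for } 0 < c \le 1,\\ \text{IHR} & \text{for } c \ge 1\end{cases}$$
$$\mu(x)\ \text{(no closed form)} \;\Rightarrow\; \begin{cases}\text{IMRL} & \text{for } 0 < c \le 1,\\ \text{DMRL} & \text{for } c \ge 1\end{cases}$$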
F distribution
This distribution, also known as FISHER distribution after the statistician who discovered it in the context of variance analysis, is the distribution of the ratio of two independently distributed χ² variables, each divided by its degrees of freedom. More precisely, if $X_1 \sim \chi^{2}_{\nu_1}$ and $X_2 \sim \chi^{2}_{\nu_2}$, then $X = \dfrac{X_1/\nu_1}{X_2/\nu_2} \sim F_{\nu_1,\nu_2}$. The parameters ν1, ν2 are called degrees of freedom, but nevertheless they are not restricted to integer values.
$$f(x) = \frac{\Gamma\left(\dfrac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\dfrac{\nu_1}{2}\right)\Gamma\left(\dfrac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}\frac{x^{(\nu_1-2)/2}}{\left(1 + \dfrac{\nu_1}{\nu_2}\,x\right)^{(\nu_1+\nu_2)/2}};\quad x \ge 0;\ \nu_1, \nu_2 > 0$$
There are no closed formulas for S(x), h(x) and µ(x). As $E(X) = \nu_2/(\nu_2 - 2)$ only exists for ν2 > 2, MRL also exists only for ν2 > 2. The hazard rate is either IDHR or IHR and the mean residual life function is either DIMRL or DMRL, depending on the ν1–ν2–combination.
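FRÉCHET distribution
This distribution is also known as the extreme value distribution of type II for the minimum.
$$f(x) = \frac{c}{b} \left(\frac{a-x}{b}\right)^{-c-1} \exp\left[-\left(\frac{a-x}{b}\right)^{-c}\right]; \quad x \le a;\ a \le 0;\ b, c > 0$$
$$S(x) = \exp\left[-\left(\frac{a-x}{b}\right)^{-c}\right]$$
$$h(x) = \frac{c}{b} \left(\frac{a-x}{b}\right)^{-c-1} \;\Rightarrow\; \text{IHR}$$
$$\mu(x) - \text{no closed form} \;\Rightarrow\; \text{DMRL}$$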
Gamma distribution
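For $c \in \mathbb{N}$ this distribution is called ERLANG distribution.
$$f(x) = \frac{(x-a)^{c-1} \exp\left(-\dfrac{x-a}{b}\right)}{b^{c}\,\Gamma(c)}; \quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$
$$S(x) = \frac{\gamma\left(c, \dfrac{x-a}{b}\right)}{\Gamma(c)}$$
$$h(x) = \frac{(x-a)^{c-1} \exp\left(-\dfrac{x-a}{b}\right)}{b^{c}\,\gamma\left(c, \dfrac{x-a}{b}\right)} \;\Rightarrow\; \begin{cases} \text{DHR} & \text{for } 0 < c \le 1 \\ \text{IHR} & \text{for } c \ge 1 \end{cases}$$
$$\mu(x) - \text{no closed form} \;\Rightarrow\; \begin{cases} \text{IMRL} & \text{for } 0 < c \le 1 \\ \text{DMRL} & \text{for } c \ge 1 \end{cases}$$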
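$$f(x) = \frac{c\,(1-p) \exp\left(-\dfrac{x-a}{b}\right) \left[1 - \exp\left(-\dfrac{x-a}{b}\right)\right]^{c-1}}{b \left[1 - p \exp\left(-\dfrac{x-a}{b}\right)\right]^{c+1}}; \quad x \ge a;\ a \in \mathbb{R};\ b, c > 0,\ p \in (0, 1)$$
$$S(x) = 1 - \left[\frac{1 - \exp\left(-\dfrac{x-a}{b}\right)}{1 - p \exp\left(-\dfrac{x-a}{b}\right)}\right]^{c}$$
$$h(x) = \frac{c\,(1-p) \exp\left(-\dfrac{x-a}{b}\right) \left[1 - \exp\left(-\dfrac{x-a}{b}\right)\right]^{c-1}}{b \left\{\left[1 - p \exp\left(-\dfrac{x-a}{b}\right)\right]^{c+1} - \left[1 - p \exp\left(-\dfrac{x-a}{b}\right)\right] \left[1 - \exp\left(-\dfrac{x-a}{b}\right)\right]^{c}\right\}}$$
We have
$$h(x) = \begin{cases} \text{IDHR} & \text{for } p \in \left(\dfrac{c-1}{c+1}, 1\right) \text{ and } c > 1, \\[2ex] \text{IHR} & \text{for } p \in \left(0, \dfrac{c-1}{c+1}\right) \text{ and } c > 1, \\[2ex] \text{DHR} & \text{otherwise.} \end{cases}$$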
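$$S(x) = 1 - I_{\frac{1}{1+\exp(-x/b)}}(c, d), \quad \text{see Beta distribution}$$
$$h(x) - \text{no closed form} \;\Rightarrow\; \text{IHR with } \lim_{x \to \infty} h(x) = \frac{d}{b}$$
$$\mu(x) - \text{no closed form} \;\Rightarrow\; \text{DMRL with } \lim_{x \to \infty} \mu(x) = \frac{b}{d}$$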
Interchanging c and d gives the type IV generalized logistic distribution for −X. For c = d we
have the type III generalized logistic distribution. For d = 1 we have the type I generalization,
and type II results for d = 1 and −X.
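GOMPERTZ distribution
This distribution has been suggested by GOMPERTZ to smooth the course of mortality rates in human life tables for higher ages.
$$f(x) = \alpha \exp(\beta x) \exp\left\{\frac{\alpha}{\beta} \left[1 - \exp(\beta x)\right]\right\}; \quad x \ge 0;\ \alpha, \beta > 0$$
$$S(x) = \exp\left\{\frac{\alpha}{\beta} \left[1 - \exp(\beta x)\right]\right\}$$
$$h(x) = \alpha \exp(\beta x) \;\Rightarrow\; \text{IHR}$$
$$\mu(x) - \text{no closed form} \;\Rightarrow\; \text{DMRL}$$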
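GUMBEL distribution
This is one of the extreme value distributions, namely, type I for the maximum. Because of the PDF formula it is also known as double exponential distribution.
$$f(x) = \frac{1}{b} \exp\left[-\frac{x-a}{b} - \exp\left(-\frac{x-a}{b}\right)\right]; \quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$
$$S(x) = 1 - \exp\left[-\exp\left(-\frac{x-a}{b}\right)\right]$$
$$h(x) = \frac{\exp\left(-\dfrac{x-a}{b}\right)}{b \left\{\exp\left[\exp\left(-\dfrac{x-a}{b}\right)\right] - 1\right\}} \;\Rightarrow\; \text{IHR with } \lim_{x \to \infty} h(x) = \frac{1}{b}$$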
$h(x)$ has a maximum at $x^*$ being the solution of $b - \pi\,(x^* - a) + 2\,(x^* - a) \arctan\left(\dfrac{x^* - a}{b}\right) = 0$.
µ(x) does not exist.
Half–logistic distribution
Left truncation of the logistic distribution at its mean (= mode = median) x = a gives the half–
logistic distribution.
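$$f(x) = \frac{2 \exp\left(\dfrac{x-a}{b}\right)}{b \left[1 + \exp\left(\dfrac{x-a}{b}\right)\right]^{2}}; \quad x \ge a;\ a \in \mathbb{R},\ b > 0$$
$$S(x) = \frac{2}{1 + \exp\left(\dfrac{x-a}{b}\right)}$$
$$h(x) = \frac{1}{b \left[1 + \exp\left(-\dfrac{x-a}{b}\right)\right]} \;\Rightarrow\; \text{IHR}$$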
Half–normal distribution
Left truncation of the normal distribution at its mean (= mode = median) x = a gives the half–
normal distribution.
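$$f(x) = \frac{1}{b} \sqrt{\frac{2}{\pi}} \exp\left[-\frac{(x-a)^2}{2\,b^2}\right]; \quad x \ge a;\ a \in \mathbb{R},\ b > 0$$
$$S(x) = 2\,\Phi\left(\frac{a-x}{b}\right)$$
$$h(x) = \frac{1}{b \sqrt{2\pi}}\, \frac{\exp\left[-\dfrac{(x-a)^2}{2\,b^2}\right]}{\Phi\left(\dfrac{a-x}{b}\right)} \;\Rightarrow\; \text{IHR}$$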
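HJORTH distribution
This distribution is capable of showing different types of aging.
$$f(x) = \frac{\theta + \delta x\,(1 + \beta x)}{(1 + \beta x)^{\theta/\beta + 1}} \exp\left(-\delta\,\frac{x^2}{2}\right); \quad x \ge 0;\ \beta, \delta, \theta > 0$$
$$S(x) = \frac{\exp\left(-\delta\,\dfrac{x^2}{2}\right)}{(1 + \beta x)^{\theta/\beta}}$$
For $\beta = 0$ we have to take the limits of $f(x)$ and $S(x)$ leading to
$$f(x) = \exp\left(-\theta x - \frac{\delta}{2}\,x^2\right) (\theta + \delta x),$$
$$S(x) = \exp\left(-\theta x - \frac{\delta}{2}\,x^2\right).$$
$$h(x) = \delta x + \frac{\theta}{1 + \beta x} \;\Rightarrow\; \begin{cases} \text{IHR} & \text{for } \theta = 0 \\ \text{DHR} & \text{for } \delta = 0 \\ \text{DIHR} & \text{for } 0 < \delta < \theta\beta \\ \text{IHR} & \text{for } \delta \ge \theta\beta \\ \text{constant} & \text{for } \beta = \delta = 0 \\ \text{does not exist} & \text{for } \delta = \theta = 0 \end{cases}$$
$$\mu(x) - \text{no closed form} \;\Rightarrow\; \begin{cases} \text{DMRL} & \text{for } \theta = 0 \\ \text{IMRL} & \text{for } \delta = 0 \\ \text{IDMRL or DMRL} & \text{for } 0 < \delta < \theta\beta \\ \text{DMRL} & \text{for } \delta \ge \theta\beta \\ \text{constant} & \text{for } \beta = \delta = 0 \end{cases}$$
In case of DIHR the hazard rate has its minimum at $x^* = \left(\sqrt{\theta\beta/\delta} - 1\right)/\beta$ with $h(x^*) = 2\sqrt{\theta\delta/\beta} - \delta/\beta$.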
h(x) has a maximum at x∗ which is the solution of 2 exp b (x∗ − a)2 2 b − 3 (x∗ − a)2 +
6 (x∗ − a)2 = 0.
$h(x)$ has a maximum at $x^*$ which is the solution of
$$\left(\frac{b}{x^*-a}\right)^{c} \Big/ \left\{1 - \exp\left[-\left(\frac{b}{x^*-a}\right)^{c}\right]\right\} = \frac{c+1}{c}.$$
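LAPLACE distribution
This distribution is also known as double, bilateral or two–tailed exponential distribution.
$$f(x) = \frac{1}{2b} \exp\left(-\frac{|x-a|}{b}\right); \quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$
$$S(x) = 1 - \frac{1}{2} \left\{1 + \operatorname{sign}(x-a) \left[1 - \exp\left(-\frac{|x-a|}{b}\right)\right]\right\}$$
$$h(x) = \frac{\exp\left(-\dfrac{|x-a|}{b}\right)}{b \left\{2 - \left[1 + \operatorname{sign}(x-a) \left(1 - \exp\left(-\dfrac{|x-a|}{b}\right)\right)\right]\right\}}$$
$$\mu(x) - \text{no closed form}$$
$h(x)$ is increasing over $(-\infty, a)$ and constant with $h(x) = 1/b$ over $[a, \infty)$. $\mu(x)$ is decreasing over $(-\infty, a)$ and constant with $\mu(x) = b$ over $[a, \infty)$.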
Logistic distribution
The logistic distribution shares many properties with the normal distribution.
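$$f(x) = \frac{1}{b}\, \frac{\exp\left(\dfrac{x-a}{b}\right)}{\left[1 + \exp\left(\dfrac{x-a}{b}\right)\right]^{2}}; \quad x \in \mathbb{R};\ a \in \mathbb{R},\ b > 0$$
$$S(x) = \left[1 + \exp\left(\frac{x-a}{b}\right)\right]^{-1}$$
$$h(x) = \frac{1}{b} \left[1 + \exp\left(-\frac{x-a}{b}\right)\right]^{-1} \;\Rightarrow\; \text{IHR with } \lim_{x \to \infty} h(x) = \frac{1}{b}$$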
Log–gamma distribution
We present the version of the log–gamma distribution given in JOHNSON/KOTZ/BALAKRISHNAN (1995, pp. 89ff.),³ which, for $c \ne 1$, is a generalization of the extreme value distribution of type I for the minimum (= log–WEIBULL distribution).
$$f(x) = \frac{1}{b\,\Gamma(c)} \exp\left[c\,\frac{x-a}{b} - \exp\left(\frac{x-a}{b}\right)\right]; \quad x \in \mathbb{R};\ a \in \mathbb{R},\ b, c > 0$$
$$S(x) = \frac{\gamma\left(c, \exp\left(\dfrac{x-a}{b}\right)\right)}{\Gamma(c)}$$
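$$h(x) = \begin{cases} \dfrac{\dfrac{c\,d}{b} \left(\dfrac{x}{b}\right)^{c-1}}{c + d - d \left(\dfrac{x}{b}\right)^{c}} & \text{for } 0 \le x \le b \\[4ex] \dfrac{\dfrac{c\,d}{b} \left(\dfrac{b}{x}\right)^{d+1}}{c \left(\dfrac{b}{x}\right)^{d}} & \text{for } x \ge b \end{cases}$$
$$\mu(x) - \text{complicated closed form}$$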
HR and MRL have rather different courses depending on the special c–d–value combination, e.g.,
for c > 1 and d > 1 we have IHR over [0, b) and DHR over [b, ∞) with DIMRL or IMRL over
[0, ∞). MRL does not exist for d ≤ 1.
Log–logistic distribution
In economics this distribution is known as F ISK distribution where it describes the distribution of
income.
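$$f(x) = \frac{\dfrac{c}{b} \left(\dfrac{x-a}{b}\right)^{c-1}}{\left[1 + \left(\dfrac{x-a}{b}\right)^{c}\right]^{2}}; \quad x \ge a;\ a \in \mathbb{R},\ b, c > 0$$
$$S(x) = \frac{1}{1 + \left(\dfrac{x-a}{b}\right)^{c}}$$
$$h(x) = \frac{\dfrac{c}{b} \left(\dfrac{x-a}{b}\right)^{c-1}}{1 + \left(\dfrac{x-a}{b}\right)^{c}} \;\Rightarrow\; \begin{cases} \text{DHR} & \text{for } 0 < c \le 1 \\ \text{IDHR} & \text{for } c > 1 \end{cases}$$
$$\mu(x) - \text{no closed form} \;\Rightarrow\; \begin{cases} \text{does not exist} & \text{for } 0 < c \le 1 \\ \text{IMRL or DIMRL} & \text{for } c > 1 \end{cases}$$
In case of IDHR the hazard rate has its maximum at $x^* = a + b\,(c-1)^{1/c}$ with $h(x^*) = (1/b)\,(c-1)^{(c-1)/c}$.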
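$$f(x) = \frac{1}{\beta\,(x-a) \sqrt{2\pi}} \exp\left\{-\frac{[\ln(x-a) - \alpha]^2}{2\,\beta^2}\right\}; \quad x \ge a;\ a \in \mathbb{R},\ \alpha \in \mathbb{R},\ \beta > 0$$
$$S(x) = \Phi\left(-\frac{\ln(x-a) - \alpha}{\beta}\right)$$
$$h(x) = \frac{\varphi\left(-\dfrac{\ln(x-a) - \alpha}{\beta}\right)}{\beta\,(x-a)\, \Phi\left(-\dfrac{\ln(x-a) - \alpha}{\beta}\right)} \;\Rightarrow\; \text{IDHR}$$
$$f(x) = \frac{1}{\beta\,(a-x) \sqrt{2\pi}} \exp\left\{-\frac{[\ln(a-x) - \alpha]^2}{2\,\beta^2}\right\}; \quad x < a;\ a \in \mathbb{R},\ \alpha \in \mathbb{R},\ \beta > 0$$
$$S(x) = \Phi\left(-\frac{\ln(a-x) - \alpha}{\beta}\right)$$
$$h(x) = \frac{\varphi\left(-\dfrac{\ln(a-x) - \alpha}{\beta}\right)}{\beta\,(a-x)\, \Phi\left(-\dfrac{\ln(a-x) - \alpha}{\beta}\right)} \;\Rightarrow\; \text{IHR}$$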
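LOMAX distribution
This distribution is also known as PARETO distribution of the second kind.
$$f(x) = \frac{c}{b} \left(1 + \frac{x-a}{b}\right)^{-c-1}; \quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$
$$S(x) = \left(1 + \frac{x-a}{b}\right)^{-c}$$
$$h(x) = \frac{c}{x-a+b} \;\Rightarrow\; \text{DHR}$$
$$\mu(x) = \begin{cases} \text{does not exist} & \text{for } 0 < c \le 1 \\[1ex] \dfrac{x-a+b}{c-1} & \text{for } c > 1 \;\Rightarrow\; \text{IMRL} \end{cases}$$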
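MUTH distribution
$$f(x) = \frac{1}{b} \left[\exp\left(c\,\frac{x-a}{b}\right) - c\right] \exp\left[-\frac{1}{c} \exp\left(c\,\frac{x-a}{b}\right) + c\,\frac{x-a}{b} + \frac{1}{c}\right];$$
$$x \ge a;\ a \in \mathbb{R},\ b > 0,\ 0 < c \le 1$$
$$S(x) = \exp\left[-\frac{1}{c} \exp\left(c\,\frac{x-a}{b}\right) + c\,\frac{x-a}{b} + \frac{1}{c}\right]$$
$$h(x) = \frac{1}{b} \left[\exp\left(c\,\frac{x-a}{b}\right) - c\right] \;\Rightarrow\; \text{IHR with } h(a) = \frac{1-c}{b}\ \ \forall\, c$$
$$\mu(x) - \text{no closed form} \;\Rightarrow\; \text{DMRL}$$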
Normal distribution
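The normal or GAUSS distribution is of utmost importance in statistics. We have parameterized this distribution by $a$ which is the mean $\mu = E(X)$ and by $b$ which is the standard deviation $\sigma = \sqrt{\operatorname{Var}(X)}$.
$$f(x) = \frac{1}{b \sqrt{2\pi}} \exp\left[-\frac{(x-a)^2}{2\,b^2}\right] = \frac{1}{b}\, \varphi\left(\frac{x-a}{b}\right); \quad x \in \mathbb{R},\ a \in \mathbb{R},\ b > 0$$
$$S(x) = 1 - \Phi\left(\frac{x-a}{b}\right) = \Phi\left(\frac{a-x}{b}\right)$$
$$h(x) = \frac{1}{b}\, \frac{\varphi\left(\dfrac{x-a}{b}\right)}{\Phi\left(\dfrac{a-x}{b}\right)} \;\Rightarrow\; \text{IHR}$$
$$\mu(x) = a + b^2\, h(x) - x \;\Rightarrow\; \text{DMRL}$$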
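The closed-form relation $\mu(x) = a + b^2 h(x) - x$ for the normal distribution can be checked against the defining integral $\mu(x) = \int_x^\infty S(t)\,dt \,/\, S(x)$. The sketch below uses only the Python standard library; the function names, the truncation point of the integral and the step count are illustrative assumptions.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF, written via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def normal_hazard(x, a, b):
    """h(x) = (1/b) * phi((x-a)/b) / Phi((a-x)/b)."""
    return phi((x - a) / b) / (b * Phi((a - x) / b))

def normal_mrl(x, a, b):
    """Closed form mu(x) = a + b^2 h(x) - x."""
    return a + b * b * normal_hazard(x, a, b) - x

def mrl_by_integration(x, a, b, upper_sd=12.0, steps=20000):
    """Trapezoidal approximation of (1/S(x)) * integral_x^upper S(t) dt.

    The upper limit max(x, a) + upper_sd * b is an assumed cut-off beyond
    which the survival function is numerically negligible.
    """
    S = lambda t: Phi((a - t) / b)
    upper = max(x, a) + upper_sd * b
    width = (upper - x) / steps
    total = 0.5 * (S(x) + S(upper))
    total += sum(S(x + k * width) for k in range(1, steps))
    return total * width / S(x)

mrl_closed = normal_mrl(1.0, 0.5, 2.0)
mrl_numeric = mrl_by_integration(1.0, 0.5, 2.0)
```

At $x = a$ the closed form reduces to $\mu(a) = b\sqrt{2/\pi}$, the mean of the half–normal distribution measured from $a$, which offers a second quick check.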
3 (a − x)2
h(x) = ⇒ DIHR with minimum at x∗ = a and h(x∗ ) = 0
b3 + (a − x)3
0.75 b4 − b3 x + 0.25 x4
µ(x) = ⇒ IDMRL with maximum at x∗ ≈ a − 0.596072 b
b3 − x3
x−a 2 x−a 4
0.75 − 2 − 0.25
b b
µ(x) = b 2 ⇒ DMRL
x−a x−a
2+ −1
b b
The DIHR hazard rate has its minimum at $x^* = a + b\,(1-c)^{1/c}$ with $h(x^*) = (1/b)\,(1-c)^{(c-1)/c}$.
RAYLEIGH distribution
This distribution is a special case of the $\chi$–distribution when $\nu = 2$, a special case of the WEIBULL distribution when $c = 2$ and a special case of the generalized gamma distribution when $c = 1$ and $d = 2$. It is also a linear hazard rate distribution.
$$f(x) = \frac{x-a}{b^2} \exp\left[-\frac{1}{2} \left(\frac{x-a}{b}\right)^{2}\right]; \quad x \ge a;\ a \in \mathbb{R},\ b > 0$$
$$S(x) = \exp\left[-\frac{1}{2} \left(\frac{x-a}{b}\right)^{2}\right]$$
$$h(x) = \frac{x-a}{b^2} \;\Rightarrow\; \text{IHR}$$
$$\mu(x) - \text{no closed form} \;\Rightarrow\; \text{DMRL}$$
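$$f(x) = \frac{c}{b} \left(\frac{a-x}{b}\right)^{c-1} \exp\left[-\left(\frac{a-x}{b}\right)^{c}\right]; \quad x \le a;\ a \in \mathbb{R},\ b > 0$$
$$S(x) = 1 - \exp\left[-\left(\frac{a-x}{b}\right)^{c}\right]$$
$$h(x) = \frac{c \left(\dfrac{a-x}{b}\right)^{c-1}}{b \left\{\exp\left[\left(\dfrac{a-x}{b}\right)^{c}\right] - 1\right\}} \;\Rightarrow\; \text{IHR}$$
$$\mu(x) - \text{no closed form} \;\Rightarrow\; \text{DMRL}$$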
Semi–elliptical distribution
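This distribution is also known as WIGNER’s semi–circle distribution. For $b = \sqrt{2/\pi} \approx 0.7979$ the graph of $f(x)$ is a semi–circle, otherwise a semi–ellipse.
$$f(x) = \frac{2}{b\,\pi} \sqrt{1 - \left(\frac{x-a}{b}\right)^{2}}; \quad a - b \le x \le a + b;\ a \in \mathbb{R},\ b > 0$$
$$S(x) = \frac{1}{2} - \frac{1}{\pi} \left[\frac{x-a}{b} \sqrt{1 - \left(\frac{x-a}{b}\right)^{2}} + \arcsin\left(\frac{x-a}{b}\right)\right]$$
$$h(x) = \frac{4 \sqrt{1 - \left(\dfrac{x-a}{b}\right)^{2}}}{b \left\{\pi - 2 \left[\dfrac{x-a}{b} \sqrt{1 - \left(\dfrac{x-a}{b}\right)^{2}} + \arcsin\left(\dfrac{x-a}{b}\right)\right]\right\}} \;\Rightarrow\; \text{IHR}$$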
t distribution
This distribution is also known as STUDENT’s distribution, the pseudonym of W. S. GOSSET, its discoverer.
$$f(x) = \frac{\Gamma\left(\dfrac{\nu+1}{2}\right)}{\sqrt{\pi\,\nu}\; \Gamma\left(\dfrac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}; \quad x \in \mathbb{R};\ \nu > 0$$
There are no closed formulas for $S(x)$, $h(x)$ and $\mu(x)$. The latter does not exist for $\nu \le 1$. For smaller $\nu$ we have IDHR and DIMRL. With $\nu \to \infty$ the t distribution goes to a normal distribution, so then we have IHR and DMRL.
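TEISSIER distribution
This distribution, suggested by TEISSIER (1934), is characterized by an exponentially declining mean residual life function.
$$f(x) = \frac{1}{b} \left[\exp\left(\frac{x-a}{b}\right) - 1\right] \exp\left[1 + \frac{x-a}{b} - \exp\left(\frac{x-a}{b}\right)\right]; \quad x \ge a;\ a \in \mathbb{R},\ b > 0$$
$$S(x) = \exp\left[1 + \frac{x-a}{b} - \exp\left(\frac{x-a}{b}\right)\right]$$
$$h(x) = \frac{1}{b} \left[\exp\left(\frac{x-a}{b}\right) - 1\right] \;\Rightarrow\; \text{IHR}$$
$$\mu(x) = b \exp\left(-\frac{x-a}{b}\right) \;\Rightarrow\; \text{DMRL}$$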
V–shaped distribution
We present the symmetric V–shaped distribution which may be regarded as a linear approximation
to the U–shaped parabolic distribution.
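$$f(x) = \begin{cases} \dfrac{2\,(2a + b - 2x)}{b^2} & \text{for } a \le x \le a + b/2 \\[2ex] \dfrac{2\,(2x - 2a - b)}{b^2} & \text{for } a + b/2 \le x \le a + b \end{cases} \qquad a \in \mathbb{R},\ b > 0$$
$$S(x) = \begin{cases} 1 - \dfrac{2\,(x-a)\,(a+b-x)}{b^2} & \text{for } a \le x \le a + b/2 \\[2ex] 0.5 - \dfrac{(2x - 2a - b)^2}{2\,b^2} & \text{for } a + b/2 \le x \le a + b \end{cases}$$
$$h(x) = \begin{cases} \dfrac{2\,(2a + b - 2x)}{b^2 + 2\,(a-x)\,(a+b-x)} & \text{for } a \le x \le a + b/2 \\[2ex] \dfrac{2a + b - 2x}{(a-x)\,(a+b-x)} & \text{for } a + b/2 \le x \le a + b \end{cases} \;\Rightarrow\; \text{DIHR}$$
$$\mu(y) = \begin{cases} b\, \dfrac{3 - 2y\,[3 + y\,(2y - 3)]}{6\,[1 + 2y\,(y-1)]} & \text{for } 0 \le y \le 0.5 \\[2ex] b\, \dfrac{(y-1)^2\,(y + 0.5)}{3\,y\,(1-y)} & \text{for } 0.5 \le y \le 1 \end{cases} \quad \text{with } y = \frac{x-a}{b} \;\Rightarrow\; \text{DMRL}$$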
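WEIBULL distribution
This distribution is also known as extreme value distribution of type III for the minimum.
$$f(x) = \frac{c}{b} \left(\frac{x-a}{b}\right)^{c-1} \exp\left[-\left(\frac{x-a}{b}\right)^{c}\right]; \quad x \ge a;\ a \in \mathbb{R};\ b, c > 0$$
$$S(x) = \exp\left[-\left(\frac{x-a}{b}\right)^{c}\right]$$
$$h(x) = \frac{c}{b} \left(\frac{x-a}{b}\right)^{c-1} \;\Rightarrow\; \begin{cases} \text{DHR} & \text{for } 0 < c \le 1 \\ \text{IHR} & \text{for } c \ge 1 \end{cases}$$
$$\mu(y) = \frac{b}{c} \exp(y^{c})\; \gamma\left(\frac{1}{c}, y^{c}\right) \;\Rightarrow\; \begin{cases} \text{IMRL} & \text{for } 0 < c \le 1 \\ \text{DMRL} & \text{for } c \ge 1 \end{cases} \quad \text{with } y = \frac{x-a}{b}$$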
We conclude this section on continuous distributions by showing a typical output of the program ContDist. After choosing one of the 62 distributions implemented in that program (here the DHILLON–II distribution) the program shows, as a reminder, the PDF–formula of this distribution. Then the user is asked to input a value for each of the pertaining parameters. The program checks these values for admissibility. The chosen parameter values are displayed together with the graphs of $f(x)$, $S(x)$, $h(x)$ and $\mu(x)$.
Figure 3/1: PDF-formula display of the DHILLON–II distribution by the program ContDist
Figure 3/2: Display of the functions of a DHILLON–II distribution by the program ContDist
seldom exists in closed form and must be found numerically by summing the $P_i$’s. Consequently, the hazard rate
$$h_i = \frac{P_i}{S_i}$$
mostly has no closed form either. This statement also holds with respect to the mean residual life function
$$L_i = E(X - i \mid X \ge i)$$
when the support is finite. For an infinite support $i = i_{\min}, i_{\min}+1, \ldots, \infty$ we avoid the evaluation of the sum $\sum_{j=i+1}^{\infty} S_j$ with an infinite number of summands by remembering that, for existing $E(X)$, we have $L_{i_{\min}} = \sum_{j > i_{\min}} S_j = E(X) - i_{\min}$ as $S_{i_{\min}} = 1$. Thus, we can evaluate $L_i$ for $i > i_{\min}$ as
$$L_i = \left[E(X) - i_{\min} - \sum_{j=i_{\min}+1}^{i} S_j\right] \Big/ S_i; \quad i = i_{\min}+1,\ i_{\min}+2,\ \ldots$$
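The recursion behind this formula can be sketched in a few lines of Python. The sketch assumes $S_i = \Pr(X \ge i)$, so that $S_{i+1} = S_i - P_i$, and it keeps a running value of the tail sum $\sum_{j>i} S_j$ that starts at $E(X) - i_{\min}$. The function names and the POISSON example are illustrative assumptions and are not taken from the program DiscDist.

```python
import math

def poisson_pmf(i, lam):
    """P_i = lam^i exp(-lam) / i!."""
    return math.exp(-lam) * lam ** i / math.factorial(i)

def discrete_life_functions(pmf, mean, i_min, i_max):
    """Return lists S, h, L for i = i_min, ..., i_max.

    S_i = Pr(X >= i), h_i = P_i / S_i and
    L_i = [E(X) - i_min - sum_{j=i_min+1}^{i} S_j] / S_i.
    """
    S, h, L = [], [], []
    s = 1.0                 # S at i_min
    tail = mean - i_min     # sum of S_j over j > i, initially E(X) - i_min
    for i in range(i_min, i_max + 1):
        p = pmf(i)
        S.append(s)
        h.append(p / s)
        L.append(tail / s)
        s -= p              # S_{i+1} = S_i - P_i
        tail -= s           # drop S_{i+1} from the tail sum
    return S, h, L

lam = 2.0
S, h, L = discrete_life_functions(lambda i: poisson_pmf(i, lam), lam, 0, 10)
```

Because both `s` and `tail` are obtained by repeated subtraction, they lose relative accuracy far out in the tail, so the sketch is only meant for moderate values of `i_max`.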
Together with $P_i$ and, if existing in closed form, $S_i$, $h_i$, $L_i$ we will give the ratio
$$q_i = \frac{P_{i+1}}{P_i}.$$
The difference
$$\Delta\eta_i = \frac{P_{i+1}}{P_i} - \frac{P_{i+2}}{P_{i+1}} = q_i - q_{i+1},$$
which has been defined in (2/10b), will be given, too, as it serves in identifying the type of monotonicity of the hazard rate, i.e.:
Many univariate discrete distributions can be explained by the so–called urn model. An urn (= population) contains either a finite number $N$ or an infinite number of balls, of which either a finite number $M$ or a fraction $P$ is red. The color ‘red’ stands for any attribute. Sampling from this urn may be either with or without replacement of each ball drawn before drawing the next ball. Then the PMF gives the probability of having a certain number of red balls in the sample or of the number of balls to be drawn until the first, the second and so on red ball is found in the sample.
There is a caveat for the numerical evaluation of discrete distributions: severe rounding errors may distort the result for extreme values of the parameters or of the variable of the distribution.
Binomial distribution
The binomial PMF gives the probability of having i red balls in a sample of size n, drawn with
replacement from an urn with fraction P of red balls.
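$$P_i = \binom{n}{i} P^{\,i}\,(1-P)^{n-i}; \quad i = 0, 1, \ldots, n;\ n \ge 1,\ 0 < P < 1$$
$$q_i = \frac{n-i}{i+1}\, \frac{P}{1-P}$$
$$\Delta\eta_i = \frac{n+1}{(i+1)\,(i+2)}\, \frac{P}{1-P} > 0\ \ \forall\, i \;\Rightarrow\; \text{IHR}$$
$$L_i - \text{no closed form} \;\Rightarrow\; \text{DMRL}$$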
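The ratio $q_i = P_{i+1}/P_i$ and the difference $\Delta\eta_i = q_i - q_{i+1}$ can also be computed directly from a table of PMF values, which is useful when no closed form is at hand. The Python sketch below does this for a binomial distribution and compares the result with the closed forms obtained by differencing $q_i = \frac{n-i}{i+1}\frac{P}{1-P}$; the function names and the parameter values $n = 8$, $P = 0.3$ are illustrative assumptions.

```python
from math import comb

def binomial_pmf(i, n, P):
    """P_i = C(n, i) P^i (1 - P)^(n - i)."""
    return comb(n, i) * P ** i * (1.0 - P) ** (n - i)

def ratio_and_delta(pmf_values):
    """Return q_i = P_{i+1}/P_i and delta_eta_i = q_i - q_{i+1}."""
    q = [pmf_values[i + 1] / pmf_values[i] for i in range(len(pmf_values) - 1)]
    d_eta = [q[i] - q[i + 1] for i in range(len(q) - 1)]
    return q, d_eta

n, P = 8, 0.3
pmf_values = [binomial_pmf(i, n, P) for i in range(n + 1)]
q, d_eta = ratio_and_delta(pmf_values)

# Closed forms for comparison.
odds = P / (1.0 - P)
q_closed = [(n - i) / (i + 1) * odds for i in range(n)]
d_closed = [(n + 1) / ((i + 1) * (i + 2)) * odds for i in range(n - 1)]
```

All entries of `d_eta` are positive here, in line with the IHR classification of the binomial distribution.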
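$$S_i = (1-P)^{i}$$
$$q_i = (1-P)$$
$$\Delta\eta_i = 0$$
$$q_i = 1-P$$
$$\Delta\eta_i = 0$$
$$\left.\begin{array}{l} P_0 = 1 - \alpha\lambda \\ P_i = \alpha\,(1-\lambda)\,\lambda^{i};\ i = 1, 2, 3, \ldots \end{array}\right\} \quad 0 < \lambda < 1,\ 0 < \alpha < 1$$
$$S_0 = 1$$
$$S_i = \alpha\,\lambda^{i}; \quad i = 1, 2, 3, \ldots$$
$$\left.\begin{array}{l} h_0 = 1 - \alpha\lambda \\ h_i = 1 - \lambda;\ i = 1, 2, 3, \ldots \end{array}\right\} \;\Rightarrow\; \text{DHR}$$
$$\left.\begin{array}{l} L_0 = \dfrac{\alpha\lambda}{1-\lambda} \\[2ex] L_i = \dfrac{\lambda}{1-\lambda};\ i = 1, 2, 3, \ldots \end{array}\right\} \;\Rightarrow\; \text{IMRL}$$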
Hypergeometric distribution
The hypergeometric PMF gives the probability of having i red balls in a sample of size n, drawn
without replacement from a population of size N containing M red balls.
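$$P_i = \frac{\dbinom{M}{i} \dbinom{N-M}{n-i}}{\dbinom{N}{n}}; \quad \max(0, M+n-N) \le i \le \min(n, M);\ \ n, N, M \in \mathbb{N}^{+};\ n < N,\ M < N$$
$$q_i = \frac{(M-i)\,(n-i)}{(i+1)\,(N-M-n+i+1)}$$
$$\Delta\eta_i > 0 \;\Rightarrow\; \text{IHR}$$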
qi , ∆ηi are as with the ordinary hypergeometric distribution. We have IHR and DMRL. For i =
1, 2, ..., min(n, M ) the hazard rates of the ordinary and the positive hypergeometric distributions
are identical.
Logarithmic distribution
This distribution, also known as logarithmic series distribution, is derived from the MACLAURIN series expansion
$$-\ln(1-P) = P + \frac{P^2}{2} + \frac{P^3}{3} + \ldots.$$
It is the limit as $m \to 0$ of the zero–truncated negative binomial distribution.
$$P_i = \frac{a\,P^{\,i}}{i}; \quad i = 1, 2, 3, \ldots;\ 0 < P < 1,\ a = -\frac{1}{\ln(1-P)}$$
$$E(X) = a\,\frac{P}{1-P}$$
$$q_i = \frac{i}{i+1}\,P$$
$$\Delta\eta_i = -\frac{P}{(i+1)\,(i+2)} < 0 \;\Rightarrow\; \text{DHR}$$
j=1 j
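$$h_i \;\Rightarrow\; \begin{cases} \text{IHR} & \text{for } P \text{ large and } r \text{ small} \\ \text{DIHR} & \text{otherwise} \end{cases}$$
$$L_i \;\Rightarrow\; \begin{cases} \text{DMRL} & \text{for } P \text{ large and } r \text{ small} \\ \text{IDMRL} & \text{otherwise} \end{cases}$$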
3.2 Discrete Distributions 125
Matching distribution
There is a population of size N ∈ N+ and the entities of this population are numbered 1, 2, ..., N.
In the classical matching model the entities are arranged in a random order. Let $X$ be the number of entities whose position in the random order is the same as the number assigned to them.
$$P_i = \frac{1}{i!} \sum_{j=0}^{N-i} \frac{(-1)^{j}}{j!}; \quad i = 0, 1, \ldots, N;\ N \in \mathbb{N}^{+},$$
where $P_N = 1/N!$ and $P_{N-1} = 0$. $h_i$ and $L_i$ are not monotone.
Negative binomial distribution
We look at successive random trials, each having a constant probability P of success (= drawing
of a red ball). The number of extra trials to perform in order to observe a given number m of
successes has a negative binomial distribution. For integer m it is called PASCAL distribution
and for m = 1 we have the geometric distribution.
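$$P_i = \binom{m+i-1}{i} P^{\,m}\,(1-P)^{i}; \quad i = 0, 1, 2, \ldots;\ 0 < P < 1;\ m = 1, 2, \ldots$$
More generally, $m$ may be any positive real number. Then we write:
$$P_i = \frac{\Gamma(m+i)}{\Gamma(i+1)\,\Gamma(m)}\, P^{\,m}\,(1-P)^{i}; \quad i = 0, 1, 2, \ldots;\ 0 < P < 1;\ m > 0.$$
As $\Gamma(z+1) = z!$ for integer $z$ the latter version of the negative binomial distribution is more general than the first one.
$$E(X) = m\,\frac{1-P}{P}$$
$$q_i = \frac{m+i}{i+1}\,(1-P)$$
$$\Delta\eta_i = \frac{m-1}{(i+1)\,(i+2)}\,(1-P) \quad \begin{cases} < 0 & \text{for } 0 < m < 1 \;\Rightarrow\; \text{DHR with IMRL} \\ = 0 & \text{for } m = 1 \;\Rightarrow\; h_i = P \text{ with } L_i = (1-P)/P \\ > 0 & \text{for } m > 1 \;\Rightarrow\; \text{IHR with DMRL} \end{cases}$$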
Occupancy distribution
We have $N$ distinct objects, e.g., balls, and $m$ distinct boxes or cells. Now consider the placement of these objects into the $m$ boxes. The number of ways to do this is clearly $m^{N}$. Each of these ways is considered equiprobable. We are interested in the distribution of $X$, the number of empty boxes in a placement.
$$P_i = \binom{m}{i} \sum_{j=0}^{m-i} (-1)^{j} \binom{m-i}{j} \left(1 - \frac{i+j}{m}\right)^{N}; \quad i = 0, 1, \ldots, m-1;\ m, N \in \mathbb{N}^{+}$$
We have:
• DMRL.
P OISSON distribution
This distribution is for the number of occurrences of an event in an interval of given length when
the intensity of event–occurrence in this interval is λ.
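$$P_i = \frac{\lambda^{i}}{i!} \exp(-\lambda); \quad i = 0, 1, 2, \ldots;\ \lambda > 0$$
$$E(X) = \lambda$$
$$q_i = \frac{\lambda}{i+1}$$
$$\Delta\eta_i = \frac{\lambda}{(i+1)\,(i+2)} > 0 \;\Rightarrow\; \text{IHR}$$
$$P_i = \frac{\lambda^{i}}{i!\,(e^{\lambda} - 1)}; \quad i = 1, 2, 3, \ldots;\ \lambda > 0$$
$$E(X) = \frac{\lambda}{1 - e^{-\lambda}}$$
$$q_i = \frac{\lambda}{i+1}$$
$$\Delta\eta_i = \frac{\lambda}{(i+1)\,(i+2)} > 0 \;\Rightarrow\; \text{IHR}$$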
PÓLYA distribution
This distribution gives the probabilities of the number of successes (= red balls) in a rather general urn model. An urn contains $N$ balls, $M$ being red. A sample of size $n$ is to be drawn from this urn. After each draw the ball is returned to the urn together with $K$ balls of the color just drawn; $K$ may be any integer $\ldots, -2, -1, 0, 1, 2, \ldots$. $K < 0$ means that a number $|K|$ of balls of the color just drawn is eliminated from the urn, but the ball just drawn is laid back anyway. Three special values of $K$ lead to special distributions:
$$P_i = \binom{n}{i} \frac{\prod\limits_{j=1}^{i} [M + (j-1)\,K]\ \prod\limits_{j=1}^{n-i} [N - M + (j-1)\,K]}{\prod\limits_{j=1}^{n} [N + (j-1)\,K]};$$
$$i = 0, 1, \ldots, n \text{ for } K \ge 0; \quad \max\left(0,\ n + \frac{N-M}{K}\right) \le i \le \min\left(n,\ -\frac{M}{K}\right) \text{ for } K < 0$$
$$n, N, M \in \mathbb{N}^{+};\ M < N;\ K = \ldots, -2, -1, 0, 1, 2, \ldots;\ N + K\,(n-1) > 0$$
$$q_i = \frac{n-i}{i+1}\, \frac{M + i\,K}{N - M + (n-i-1)\,K}$$
$$\Delta\eta_i > 0 \;\Rightarrow\; \text{IHR}$$
Runs distribution
There are $N_0$ balls labeled 0 and $N_1$ balls labeled 1, arranged in random order. Let $r_{0j}$ be the number of runs of $j$ consecutive 0’s and $r_{1j}$ that of $j$ consecutive 1’s. Then we have
$$\sum_{j} j\,r_{0j} = N_0 \quad \text{and} \quad \sum_{j} j\,r_{1j} = N_1.$$
SALVIA–BOLLINGER’s DHR distribution
This and the following distribution of SALVIA/BOLLINGER (1982) have their origin in looking for a rather simple form for the hazard rate, namely some modification of a harmonic series.
$$h_i = \frac{c}{i+1}; \quad i = 0, 1, 2, \ldots;\ 0 < c < 1$$
$$P_0 = h_0 = c$$
$$P_i = \frac{c}{i+1} \prod_{j=0}^{i-1} \frac{j+1-c}{j+1}; \quad i = 1, 2, \ldots$$
$$S_0 = 1$$
$$S_i = \prod_{j=0}^{i-1} \frac{j+1-c}{j+1}; \quad i = 1, 2, \ldots$$
$$q_i = \frac{i+1-c}{i+2}$$
$$\Delta\eta_i = -\frac{c+1}{(i+2)\,(i+3)} < 0 \;\Rightarrow\; \text{DHR with } \lim_{i \to \infty} h_i = 0$$
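When a discrete distribution is specified by its hazard rate, as is the case here, the PMF follows from $P_i = h_i \prod_{j<i} (1 - h_j)$. The Python sketch below builds the PMF in this way and compares the resulting ratios $P_{i+1}/P_i$ with the closed form $q_i = (i+1-c)/(i+2)$; the function name and the value $c = 0.4$ are illustrative assumptions.

```python
def pmf_from_hazard(hazard, n_terms):
    """Return P_0, ..., P_{n_terms-1} from P_i = h_i * prod_{j<i} (1 - h_j)."""
    pmf = []
    survival = 1.0              # prod_{j<i} (1 - h_j), i.e. Pr(X >= i)
    for i in range(n_terms):
        h = hazard(i)
        pmf.append(h * survival)
        survival *= 1.0 - h
    return pmf

c = 0.4
P = pmf_from_hazard(lambda i: c / (i + 1), 50)
q = [P[i + 1] / P[i] for i in range(len(P) - 1)]
q_closed = [(i + 1 - c) / (i + 2) for i in range(len(P) - 1)]
```

The same helper works for any hazard sequence with values in $(0, 1)$, so it can be reused for the other distributions of this section that are defined through $h_i$.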
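$$q_i = \frac{\alpha\,i + 1 - c}{\alpha\,(i+1) + 1}$$
$$\Delta\eta_i = -\frac{\alpha\,c + \alpha^2}{[\alpha\,(i+1) + 1]\,[\alpha\,(i+2) + 1]} < 0 \;\Rightarrow\; \text{DHR with } \lim_{i \to \infty} h_i = 0$$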
$$h_i = \frac{2}{m-i+2}$$
$$q_i = \frac{m-i}{m-i+1}$$
$$\Delta\eta_i = \frac{1}{(m-i)\,(m-i+1)} > 0 \;\Rightarrow\; \text{IHR}$$
$$L_i = \frac{m-i}{3} \;\Rightarrow\; \text{DMRL}$$
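• $m = 2k$; $k = 1, 2, \ldots$
• $m = 2k+1$; $k = 1, 2, \ldots$
We require $m > 3$. For $m = 2$ we will have the uniform distribution over two points $i = 1$ and $i = 2$.
Case 1: $m = 2k$; $k = 1, 2, \ldots$
$$P_i = \begin{cases} \dfrac{i}{k\,(k+1)} & \text{for } i = 1, 2, \ldots, k \\[2ex] \dfrac{m-i+1}{k\,(k+1)} & \text{for } i = k+1, k+2, \ldots, 2k\ (= m) \end{cases}$$
$$S_i = \begin{cases} 1 + \dfrac{i\,(1-i)}{2\,k\,(k+1)} & \text{for } i = 1, 2, \ldots, k \\[2ex] \dfrac{(i-2k-1)\,[i-2\,(k+1)]}{2\,k\,(k+1)} & \text{for } i = k+1, k+2, \ldots, 2k\ (= m) \end{cases}$$
$$h_i = \begin{cases} \dfrac{2\,i}{i\,(1-i) + 2\,k\,(k+1)} & \text{for } i = 1, 2, \ldots, k \\[2ex] \dfrac{2}{2-i+2k} & \text{for } i = k+1, k+2, \ldots, 2k\ (= m) \end{cases}$$
$$q_i = \begin{cases} \dfrac{i+1}{i} & \text{for } i = 1, 2, \ldots, k \\[2ex] \dfrac{2k-i}{2k-i+1} & \text{for } i = k+1, k+2, \ldots, 2k-1 \end{cases}$$
$$\Delta\eta_i > 0 \;\Rightarrow\; \text{IHR}$$
$$L_i - \text{complicated form} \;\Rightarrow\; \text{DMRL}$$
Case 2: $m = 2k+1$; $k = 1, 2, \ldots$
$$P_i = \begin{cases} \dfrac{i}{(k+1)^2} & \text{for } i = 1, 2, \ldots, k+1 \\[2ex] \dfrac{m-i+1}{(k+1)^2} & \text{for } i = k+2, k+3, \ldots, 2k+1\ (= m) \end{cases}$$
$$S_i = \begin{cases} 1 + \dfrac{i\,(1-i)}{2\,(k+1)^2} & \text{for } i = 1, 2, \ldots, k+1 \\[2ex] \dfrac{(i-2k-3)\,[i-2\,(k+1)]}{2\,(k+1)^2} & \text{for } i = k+2, k+3, \ldots, 2k+1\ (= m) \end{cases}$$
$$h_i = \begin{cases} \dfrac{2\,i}{i\,(1-i) + 2\,(k+1)^2} & \text{for } i = 1, 2, \ldots, k+1 \\[2ex] \dfrac{2}{3-i+2k} & \text{for } i = k+2, k+3, \ldots, 2k+1\ (= m) \end{cases}$$
$$q_i = \begin{cases} \dfrac{i+1}{i} & \text{for } i = 1, 2, \ldots, k+1 \\[2ex] \dfrac{2k+1-i}{2k+2-i} & \text{for } i = k+2, k+3, \ldots, 2k \end{cases}$$
$$\Delta\eta_i > 0 \;\Rightarrow\; \text{IHR}$$
$$L_i - \text{complicated form} \;\Rightarrow\; \text{DMRL}$$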
Uniform distribution
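$$P_i = \frac{1}{m}; \quad i = 1, 2, \ldots, m;\ m \in \mathbb{N}^{+},\ m \ge 2$$
$$S_i = 1 - \frac{i-1}{m}$$
$$h_i = \frac{1}{m-i+1}$$
$$q_i = 1$$
$$\Delta\eta_i = 0 \;\Rightarrow\; \text{IHR}$$
$$L_i = \frac{m-i}{2} \;\Rightarrow\; \text{DMRL}$$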
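$$P_1 = \alpha$$
$$P_i = \alpha\,i^{\beta-1} \prod_{j=1}^{i-1} \left(1 - \alpha\,j^{\beta-1}\right); \quad i = 2, 3, \ldots, m;\ 0 < \alpha < 1,\ \beta > 0$$
$$m = \begin{cases} \infty & \text{for } 0 < \beta \le 1 \\ \operatorname{int}\left[\alpha^{-1/(\beta-1)}\right] & \text{for } \beta > 1 \end{cases}$$
$$S_1 = 1$$
$$S_i = \prod_{j=1}^{i-1} \left(1 - \alpha\,j^{\beta-1}\right); \quad i = 2, 3, \ldots, m$$
$$h_i = \alpha\,i^{\beta-1} \;\Rightarrow\; \begin{cases} \text{DHR} & \text{for } 0 < \beta < 1 \\ \text{constant (IHR and DHR)} & \text{for } \beta = 1 \text{ with } h_i = \alpha \\ \text{IHR} & \text{for } \beta > 1 \end{cases}; \quad i = 1, 2, \ldots, m$$
$$L_i - \text{no closed form} \;\Rightarrow\; \begin{cases} \text{IMRL} & \text{for } 0 < \beta < 1 \\ \text{constant} & \text{for } \beta = 1 \text{ with } L_i = \dfrac{1-\alpha}{\alpha} \\ \text{DMRL} & \text{for } \beta > 1 \end{cases}; \quad i = 1, 2, \ldots, m$$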
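$$P_0 = 1 - \exp(-d)$$
$$P_i = \left\{1 - \exp\left[-d\,(i+1)^{\beta}\right]\right\} \exp\left(-d \sum_{j=1}^{i} j^{\beta}\right); \quad i = 1, 2, \ldots;\ \beta \in \mathbb{R},\ d > 0$$
$$S_0 = 1$$
$$S_i = \exp\left(-d \sum_{j=1}^{i} j^{\beta}\right); \quad i = 1, 2, \ldots$$
$$h_i = 1 - \exp\left[-d\,(i+1)^{\beta}\right] \;\Rightarrow\; \begin{cases} \text{DHR} & \text{for } \beta < 0 \\ \text{constant (IHR and DHR)} & \text{for } \beta = 0 \text{ with } h_i = 1 - \exp(-d) \\ \text{IHR} & \text{for } \beta > 0 \end{cases}; \quad i = 0, 1, \ldots$$
$$L_i - \text{no closed form} \;\Rightarrow\; \begin{cases} \text{IMRL} & \text{for } \beta < 0 \\ \text{constant} & \text{for } \beta = 0 \text{ with } L_i = \dfrac{1}{\exp(d) - 1} \\ \text{DMRL} & \text{for } \beta > 0 \end{cases}; \quad i = 0, 1, \ldots$$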
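YULE distribution
The YULE distribution plays a role in biostatistics where it is the distribution for the number of species of biological organisms per family, see also XEKALAKI (1983a, b).
$$P_i = \rho\,B(i, \rho+1) = \rho\,\frac{\Gamma(i)\,\Gamma(\rho+1)}{\Gamma(i+\rho+1)}; \quad i = 1, 2, \ldots;\ \rho > 0$$
$$S_1 = 1$$
$$q_i = \frac{i}{1+\rho+i}$$
$$\Delta\eta_i = \frac{-\rho-1}{(1+\rho+i)\,(2+\rho+i)} < 0 \;\Rightarrow\; \text{DHR}$$
$$L_i - \text{no closed form} \;\Rightarrow\; \begin{cases} \text{does not exist} & \text{for } \rho \le 1 \\ \text{IMRL (linear)} & \text{for } \rho > 1 \end{cases}$$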
hi − no closed form
[2 i + 1]−α − [2 i + 3]−α
qi =
[2 i − 1]−α − [2 i + 1]−α
h α α i
2 4
(2 i − 1)α 1 − 2 1 − 3+2 i + 1 − 5+2 i
∆ηi = − α α
< 0 ⇒ DHR
(2 i − 1) − (2 i + 1)
does not exist for α ≤ 1
Li − no closed form
IMRL for α > 1
We conclude this section on discrete distributions by showing a typical output of the program DiscDist. After choosing one of the 31 distributions implemented in that program (here the discrete WEIBULL distribution of type I) the program shows, as a reminder, the PMF–formula of this distribution. Then the user is asked to input a value for each of the pertaining parameters. The program checks these values for admissibility. The chosen parameter values are displayed together with the graphs of $P_i$, $S_i$, $h_i$ and $L_i$.
Figure 3/3: PMF-formula display of the WEIBULL type I distribution by the program DiscDist
Figure 3/4: Display of the functions of a WEIBULL type I distribution by the program DiscDist
Part II
Inferential Aspects
4 Sampling Lifetime Data1
The estimation approach as well as the testing approach for the hazard rate and for any other lifetime representative have to take account of the type of the sampled data, i.e., of the way the data set has been generated and of the form in which the data are handed over to the statistician, either as individual observations (non–grouped) or as frequency counts per interval of time (grouped). We will revert to the latter aspect at the end of this chapter when commenting on Figures 4/1 and 4/2.
The true problem of sampling lifetime data is the fact that the characteristic to be measured is time itself. We might have long–lasting life–testing experiments unless we shorten them in one way or the other, e.g. by acceleration, see Sect. 1.1.2.4, or by censoring. Thus, in a lifetime data set we will find observations of complete lifetimes from birth to death (called failure data hereafter) and incomplete lifetimes ending before death or failure (called censored data hereafter). When the data set consists of the entire time spans ranging from birth or start to death or failure of each sampled unit, the data set is said to be complete or uncensored. In practice, uncensored lifetime data sets will be the rare exception. Clinical studies and biological trials or technical life testing will seldom lead to complete lifetimes of all sampled units for several reasons. A sample is called incomplete or censored when it consists of time spans covering the whole period of the unit’s existence as well as of time spans with missing early and/or late lifetime. The latter type of time spans are called censored times. In most cases the late lifetime is missing because the observation of an individual is not terminated by its death or its failure but by some other event. Thus, in clinical trials there may be a loss to follow–up of a person or the person’s death due to a risk other than that under study, and in technical life testing we meet planned withdrawal alive or stopping of the experiment before all units have failed.
We may distinguish between random censoring and non–random censoring. Most of the results
presented in Part II are valid for random censoring only, but in practice the findings are also used
when censoring is deterministic and planned. An assumption concerning random as well as non–
random censoring is that censoring is non–informative. This means that the failure mechanism
and the censoring mechanism are assumed to act independently. Stated otherwise, for each unit
the censoring must not be predictive for the future and unobserved failure. Specifically, it must
be true for each unit at each lifetime x that
$$\Pr\big[X \in [x, x+dx) \mid X \ge x\big] = \Pr\big[X \in [x, x+dx) \mid X \ge x,\ Z \ge x\big], \quad dx \text{ small}, \tag{4.1}$$
$Z$ being the censoring variate. (4.1) means that the probability of failing shortly after $x$, given survival up to and including $x$, is unchanged by the added condition that censoring has not occurred up to and including time $x$. Unfortunately, the truth of (4.1) cannot be tested from the censored sample alone. In practice, a judgement about the truth of (4.1) should be based on the best available understanding of the nature of the censoring applied.
A very simple random censoring process that is often realistic is one in which each unit is assumed
to be endowed with two random variables, a lifetime X and a censoring time Z, X and Z being
independent and continuous variates, having CDFs F (x) and G(z), respectively. For example, Z
may be the time associated with the happening of a competing risk. With $n$ being the sample size let $(X_i, Z_i)$; $i = 1, 2, \ldots, n$; be independent and define
$$Y_i = \min(X_i, Z_i), \tag{4.2a}$$
1
Suggested reading for this chapter: R INNE (2009, Chapter 8).
The data from observing n units now consists of the pairs (yi , δi ), i.e., it is known which obser-
vation is a failure time and which is a censored time. The joint probability of (yi , δi ) is obtained
using f (x) and g(z), the PDFs of X and Z, respectively. We have
and
$$\Pr(Y_i = y, \delta_i = 1) = \Pr(Z_i > y, X_i = y) = [1 - G(y)]\, f(y). \tag{4.2d}$$
If $G(y)$ and $g(y)$ do not involve any parameters of $F(y)$ and $f(y)$, then the first factor on the right–hand side of (4.2f) can be neglected and the resulting expression taken to be proportional to the likelihood function of the data:
$$L \propto \prod_{i=1}^{n} f(y_i)^{\delta_i} \left[1 - F(y_i)\right]^{1-\delta_i}, \tag{4.2g}$$
which constitutes the basis for maximum likelihood estimation, see Sections 7.1 and 7.2.
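To illustrate (4.2g), the following Python sketch simulates randomly censored data and evaluates the log–likelihood for one particular model. It assumes, purely for illustration, exponential lifetimes with $a = 0$ and scale $b$, an independent exponential censoring time, and $\delta_i = 1$ for a failure ($X_i \le Z_i$) and $\delta_i = 0$ for a censored time. For this model the logarithm of (4.2g) is $-\sum_i \delta_i \ln b - \sum_i y_i / b$, which is maximized by $\hat{b} = \sum_i y_i / \sum_i \delta_i$. The function names and the parameter values are hypothetical.

```python
import math
import random

def simulate_random_censoring(n, b_life, b_cens, seed=0):
    """Return pairs (y_i, delta_i) with y_i = min(x_i, z_i).

    Lifetimes x_i and censoring times z_i are independent exponential
    variates with means b_life and b_cens (an assumed toy model).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.expovariate(1.0 / b_life)
        z = rng.expovariate(1.0 / b_cens)
        data.append((min(x, z), 1 if x <= z else 0))
    return data

def exp_log_likelihood(b, data):
    """Logarithm of (4.2g) with f(y) = exp(-y/b)/b and 1 - F(y) = exp(-y/b)."""
    return sum(d * (-math.log(b) - y / b) + (1 - d) * (-y / b) for y, d in data)

def exp_mle(data):
    """Maximizer of the log-likelihood: total time on test / number of failures."""
    total_time = sum(y for y, _ in data)
    failures = sum(d for _, d in data)
    return total_time / failures

data = simulate_random_censoring(n=200, b_life=10.0, b_cens=15.0, seed=1)
b_hat = exp_mle(data)
```

Note that the censored observations contribute to the numerator of `b_hat` but not to the denominator; ignoring them altogether would bias the estimate downwards.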
Random censoring prevails in clinical studies, e.g., a person who is a member of a special cancer survival study dies of a stroke or has a fatal traffic accident. Non–random censoring prevails in life testing of technical units where the times of removing non–failed units are scheduled at the beginning of the experiment. Non–random censoring takes different forms.²
According to what part of a lifetime is cut off and not reported by the sampling process, we distinguish between
• censoring from above (on the right) when we do not observe the failure of a unit, i.e., the last part of a lifetime is missing,
• censoring from below (on the left) when we do not know the ‘date of birth’ of a unit, i.e., observation starts at an unknown age of the unit and the first part of its lifetime is missing,
• censoring on both sides, which is a combination of censoring from above and below.
2
A detailed description of life test plans, their motivation and their economic as well as their statistical advantages
and drawbacks is given in R INNE (2009, Chapter 8).
Censoring from above is most common with clinical studies and with technical life testing, and
that is why we will assume this type of censoring throughout Part II unless stated otherwise.
We further distinguish between:
• type–I censoring (time–dependent censoring) when testing and observing of lifetimes are suspended when a fixed time $x_{\text{end}}$ has been reached. (The maximum lifetime to be observed is $x_{\text{end}}$. The number of failures in $x_{\text{end}}$ is random. Sometimes, depending on further censoring prescriptions, the number of censored lifetimes in $x_{\text{end}}$ is random, too.)
• type–II censoring (failure–dependent censoring) when testing and observing are suspended by reaching a fixed number $k$ of failures, $k < n$, $n$ being the sample size. (The maximum observable lifetime, which also is the length or the duration of the experiment, is the $k$-th order statistic $X_{k:n}$, which is random.³)
• a combination of type–I and type–II censoring, meaning that testing and observation stop at $\min(x_{\text{end}}, X_{k:n})$.
• single censoring, when all units that have not failed up to and including a certain time $x_{\text{end}}$, $X_{k:n}$ or $\min(x_{\text{end}}, X_{k:n})$ are withdrawn from the test so that all censored times are of equal length, or
The following two figures show the quantities and data appearing in lifetime sampling. Fig. 4/1 is related to non–grouped data and depicts the most general case, i.e., multiple or random censoring with possibly tied observations. Other types of sampling with non–grouped data result from Fig. 4/1 when assigning special values to the quantities $c_i$ and $d_i$ which represent counts. Let $x_1 < x_2 < \ldots < x_k$, $k \le n$, be the observed distinct times of a failure. We admit three possibilities of ties:
As time is a continuous variable ties of type 2) and 3) are theoretically impossible, but in practice time is nearly always counted in some unit, for instance in minutes, hours and so on, so that two or more events may happen ‘simultaneously’. In order to avoid difficulties with case 2), censored times tied with failure times, we adopt the convention of moving such censored times in a tie a small amount to the right so that censoring is assumed to occur a little bit later than failure. This convention is sensible, since a unit observed alive at time $x_i$ certainly survives past $x_i$.
Beside $d_i$, which is attached to a certain point of time $x_i$, we have $c_i$, $c_i \ge 0$, the observed number of censored lifetimes between failure times $x_i$ and $x_{i+1}$. Thus, $c_i$ is attached to an interval of time, more precisely, to an interval of random length, which is right–opened: $[x_i, x_{i+1})$; $i = 0, 1, \ldots, k$; $x_0 = 0$ and $x_{k+1} = \infty$. This interval is in accordance with the convention above and any censoring happening at $x_{i+1}$ is counted in the number $c_{i+1}$ of the following interval.
3
Commonly, $x_{1:n} \le x_{2:n} \le \ldots \le x_{n:n}$ denotes the ordered sample values. Sometimes, when there is no danger of confusion and in order to keep the mathematical notation as lean as possible we will refrain from using this special notation and $x_i$ will stand for the $i$–th longest lifetime.
Figure 4/1: Illustration of the numbers ci , di , ni and the failure times xi on the time axis (non–
grouped data)
The numbers ni ; i = 0, 1, ..., k; are attached to a point just prior to xi . ni is called the number
of units at risk at xi , and it counts the number of units which are alive and not censored and
which are exposed to the risk of failure at xi . The quantities ni , ci , di are linked as
where
n0 = n, d0 = 0.
The sample size n can be expressed as
i.e., it is divided into the number of units with complete lifetimes and of incomplete lifetimes,
respectively. The number of censored lifetimes in the right–opened interval [xi , xi+1 ) is
$$c_i = n_i - d_i - n_{i+1}. \tag{4.3c}$$
and especially
$$d_k = n_k = n - \sum_{i=1}^{k-1} d_i.$$
2. When there are no tied failure times (di = 1 ∀ i) and no censoring we have
a) k = n and
b) ni = n − i + 1; i = 1, 2, ..., n.
3. For single censoring of type–II with $\ell$ as given number of failures to be observed and possibly tied failure times we have $k$ so that
$$\sum_{i=1}^{k-1} d_i < \ell \le \sum_{i=1}^{k} d_i$$
and $c_0 = c_1 = \ldots = c_{k-1} = 0$ and $c_k = n - \sum\limits_{i=1}^{k} d_i$.
4. For single censoring of type–I with the single censoring time xend and possibly tied failure
times we may have
a) no failures before xend , so that neither d1 , d2 , ..., dk nor c1 , c2 , ..., ck exist and c0 =
n0 = n, or
b) k ≥ 1 failure times x1 < x2 < ... < xk < xend so that ck = nk − dk ≥ 0 lifetimes
will be censored at xend somewhere behind the last failure time xk . It might happen
that ck = 0 when all units have failed before xend .
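The bookkeeping of ni , di and ci for non–grouped data can be sketched in a few lines of Python. This is only an illustrative sketch and not one of the MATLAB programs mentioned in the preface: the function name `risk_set_table` is hypothetical, and the input is the sample of 21 remission times used in the examples of Chapter 5. Censored times are assigned to the right–opened interval [xi , xi+1 ), so a censoring tied with a failure time is counted in the interval starting at that failure time.

```python
from collections import Counter

# Remission times (weeks) and indicators (1 = failure, 0 = censored)
y = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
delta = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]


def risk_set_table(times, events):
    """Return rows (x_i, n_i, d_i, c_i) for i = 0, 1, ..., k (hypothetical helper)."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    censored = [t for t, e in zip(times, events) if e == 0]
    xs = [0] + sorted(deaths)          # x_0 = 0 followed by the distinct failure times
    rows, n_i = [], len(times)         # n_0 = n
    for i, x in enumerate(xs):
        upper = xs[i + 1] if i + 1 < len(xs) else float("inf")   # x_{k+1} = infinity
        d_i = deaths[x] if i > 0 else 0                          # d_0 = 0
        c_i = sum(1 for t in censored if x <= t < upper)         # censored in [x_i, x_{i+1})
        rows.append((x, n_i, d_i, c_i))
        n_i = n_i - d_i - c_i          # units at risk just prior to x_{i+1}
    return rows


table = risk_set_table(y, delta)
for row in table:
    print(row)
```

The last lines print one row per failure time; summing the di and ci columns recovers the sample size n, which is the decomposition into complete and incomplete lifetimes described above.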
Figure 4/2: Illustration of the numbers cj , dj , nj for a divided time axis (grouped data)
Fig. 4/2 depicts the situation for grouped sampling data. The time axis is divided into m + 1
intervals, not necessarily of equal length:
where t0 = 0, tm = tend and tm+1 = ∞, and tend is an upper limit on observation. These
intervals are fixed and have non–random width
The analysis of grouped lifetime data is done by actuarial methods within a so–called life table. In
life tables for human populations, i.e., in demography, tend generally is 100 years and the interval
width has a constant length of one year for the first m = 100 intervals. But in general, the widths
need not be constant. Each member in a sample of n units whose lifetime starts at t0 either has
a failure time or a censoring time. These times are counted per interval. We now define the
following quantities:
where n1 = n is the sample size. In the last interval Im+1 = [tm , tm+1 ) = [tm , ∞) it can be
considered that only uncensored lifetimes are in this interval since the nm+1 units not failed by
tm = tend must fail somewhere in Im+1 . Thus we have
$$h_i = \Pr(X = x_i \mid X \ge x_i) = \frac{\Pr(X = x_i)}{\Pr(X \ge x_i)} .$$
As has been shown in (1.61c), the survival function can be written in terms of the hazard rate:
$$S(x) = \prod_{i:\, x_i \le x} (1 - h_i), \tag{5.1a}$$
which reduces the problem of estimating the survival function to that of estimating the hazard
rate at each observed failure time xi . Choosing the maximum likelihood procedure we have an
appropriate element for the likelihood function as3
$$L_i = h_i^{d_i} (1 - h_i)^{n_i - d_i}; \quad i = 1, 2, \dots, k. \tag{5.2a}$$
This expression is correct since
2. ni −di is the number of units on test not failing at xi with 1−hi as the probability of failing
after xi , conditioned on survival to time xi .
which is a so–called product–limit estimator (PLE). This version is commonly known as the
KAPLAN/MEIER estimator (KME), see KAPLAN/MEIER (1958).4 This estimator is a maximum
likelihood estimator, too, as it is a function of the maximum likelihood estimates of the hi . In
(5.3a) the censored observations are not forgotten: they have been allowed for in ni , the number
of units at risk just before xi , and the effect of censored observations on the survival function
estimator is a larger downward step compared to the step size if there had been no censoring.
One problem that arises with the KME is that it is not defined past the last observed failure time
xk . The usual way to handle this problem is to cut off the estimator at xk . But there are other
suggestions. Some authors define
• $\hat S(x) = 0$ for $x > x_k$ when $d_k = n_k$, i.e., when no sample units survived past $x_k$,
• $\hat S(x) = \prod_{i=1}^{k} (1 - d_i/n_i)$ for $x > x_k$ when $d_k < n_k$, i.e., when there are sample units
surviving past $x_k$.
4
Another PLE has been suggested by H ERD (1960) and J OHNSON (1964), see the Excursus further down.
144 5 Hazard Rate Estimation and the K APLAN /M EIER and N ELSON /A ALEN Approaches
The first suggestion means that in the sampled population no individual would survive age xk ,
whereas the second suggestion supposes an everlasting lifetime for some individuals in the sampled
population.
We now mention two special variants of the KME–formula (5.3a).
1. When there are neither multiple failures nor censored observations before xk , the longest
recorded failure time, we will have
which is the familiar staircase empirical survival function with a downward step of size 1/n
at each xi . The hazard rate estimator (5.2e) turns into
$$\hat h_i = \hat h(x_i) = \frac{1}{n - i + 1}; \quad i = 1, \dots, n \text{ or } k. \tag{5.3c}$$
2. When we have records on all observed times — failure times as well as censored times —
which are given as pairs (yi , δi ), see (4.2a,b), the KME of (5.3a) may be written as
$$\hat S(x) = \prod_{i:\, y_i \le x} \left( \frac{n - i}{n - i + 1} \right)^{\delta_i}, \tag{5.3d}$$
when there are no tied failure times. yi is any observation, either censored or not. The
hazard rate estimator for this case is
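$$\hat h_i = \hat h(y_i) = \begin{cases} \dfrac{1}{n - i + 1} & \text{for } \delta_i = 1, \\[1ex] \text{not defined} & \text{for } \delta_i = 0. \end{cases} \tag{5.3e}$$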
Excursus: The KAPLAN/MEIER estimator and the HERD/JOHNSON estimator written in terms of
reverse ranks
Besides the KME we have another PLE which has been proposed by HERD (1960) and JOHNSON (1964)
and which will be called HERD/JOHNSON estimator, abbreviated HJE. When there are no tied
observations both estimators can be defined recursively using the reverse ranks ri of y1 < y2 < ... < yn :
ri = n − i + 1, i = 1, 2, ..., n. (5.4)
Using ri the KME of (5.3d) turns into
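$$\hat S(x) = {}_{\mathrm{KME}}\hat P_i = \left( \frac{r_i - 1}{r_i} \right)^{\delta_i} {}_{\mathrm{KME}}\hat P_{i-1}; \quad y_i \le x < y_{i+1}; \; i = 1, \dots, n; \tag{5.5a}$$
with starting value
$${}_{\mathrm{KME}}\hat P_0 = 1. \tag{5.5b}$$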
The HJE is defined as
$$\hat S(x) = {}_{\mathrm{HJE}}\hat P_i = \left( \frac{r_i}{r_i + 1} \right)^{\delta_i} {}_{\mathrm{HJE}}\hat P_{i-1}; \quad y_i \le x < y_{i+1}; \; i = 1, \dots, n; \tag{5.6a}$$
5.1 Estimating the Hazard Rate and the Survival Function 145
$${}_{\mathrm{HJE}}\hat P_i = \frac{n + 1 - i}{n + 1}$$
while the KME will be
$${}_{\mathrm{KME}}\hat P_i = \frac{n - i}{n} .$$
We conclude this excursus by stating that the KME being a MLE is more popular than the HJE.
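The two recursions (5.5a) and (5.6a) differ only in the factor applied at an uncensored observation. The following Python sketch illustrates this under the excursus' assumption of no tied observations; the function name `product_limit` and its `kind` argument are hypothetical, and exact fractions are used so that the closed forms for uncensored samples can be checked.

```python
from fractions import Fraction


def product_limit(deltas, kind="KME"):
    """Recursive PLE for ordered, untied observations y_1 < ... < y_n.

    deltas[i-1] is the indicator delta_i (1 = failure, 0 = censored).
    kind is "KME" for (5.5a) or "HJE" for (5.6a); both names are illustrative.
    """
    n = len(deltas)
    p = Fraction(1)                       # starting value P_0 = 1
    path = []
    for i, delta_i in enumerate(deltas, start=1):
        r = n - i + 1                     # reverse rank (5.4)
        factor = Fraction(r - 1, r) if kind == "KME" else Fraction(r, r + 1)
        p *= factor ** delta_i            # a censored observation leaves P unchanged
        path.append(p)
    return path


kme = product_limit([1, 1, 1, 1, 1], "KME")        # uncensored sample, n = 5
hje = product_limit([1, 1, 1, 1, 1], "HJE")
mixed_kme = product_limit([1, 0, 1, 1, 0], "KME")  # hypothetical censoring pattern
print(kme, hje, mixed_kme)
```

For the uncensored sample the KME reaches zero at the largest observation, whereas the HJE stays strictly positive, which is the practical difference between the two estimators.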
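$$-\frac{\partial^2 \mathcal{L}(h_1, \dots, h_k)}{\partial h_i \, \partial h_\ell} = \begin{cases} \dfrac{d_i}{h_i^2} + \dfrac{n_i - d_i}{(1 - h_i)^2} & \text{for } i = \ell \\[1ex] 0 & \text{for } i \ne \ell \end{cases}; \quad i, \ell = 1, \dots, k. \tag{5.7a}$$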
Substituting $h_i$ by its MLE the diagonal elements, which are not equal to zero, read
$$-\left. \frac{\partial^2 \mathcal{L}(h_1, \dots, h_k)}{\partial h_i \, \partial h_i} \right|_{h_i = d_i/n_i} = \frac{n_i^3}{d_i (n_i - d_i)} . \tag{5.7b}$$
Thus, the estimated variance of $\hat h_i$ is
$$\widehat{\mathrm{Var}}\big(\hat h_i\big) = \frac{d_i (n_i - d_i)}{n_i^3} . \tag{5.7c}$$
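When the data set is given as pairs $(y_i, \delta_i)$ the variance of $\hat h_i$ in (5.3e) and of $1 - \hat h_i$ is
$$\widehat{\mathrm{Var}}\big(\hat h_i\big) = \widehat{\mathrm{Var}}\big(1 - \hat h_i\big) = \begin{cases} \dfrac{n - i}{(n - i + 1)^3} & \text{for } \delta_i = 1 \\[1ex] \text{not defined} & \text{for } \delta_i = 0 \end{cases} \tag{5.7e}$$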
Pointwise confidence intervals for hi can now be obtained via the normal approximation. A two–
sided (1 − α)–confidence interval such as
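$$\hat h_i - \tau_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}\big(\hat h_i\big)} \;\le\; h_i \;\le\; \hat h_i + \tau_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}\big(\hat h_i\big)} \tag{5.7f}$$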
refers to a single observation. A larger multiplier than the standard normal percentile τ1−α/2
would be needed for a simultaneous confidence interval over more than one lifetime.
and then
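$$\widehat{\mathrm{Var}}\big[\ln \hat S(x)\big] \approx \sum_{i:\, x_i \le x} \widehat{\mathrm{Var}}\big[\ln(1 - \hat h_i)\big] = \sum_{i:\, x_i \le x} \frac{d_i}{n_i (n_i - d_i)} . \tag{5.8c}$$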
$$\widehat{\mathrm{Var}}\big[\hat S(x)\big] \approx \big[\hat S(x)\big]^2 \sum_{i:\, x_i \le x} \frac{d_i}{n_i (n_i - d_i)}, \tag{5.8d}$$
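which is known as GREENWOOD's formula, GREENWOOD (1926). When the data are given as
pairs $(y_i, \delta_i)$ so that (5.3d,e) and (5.7e) hold, we have instead of (5.8b-d)
$$\widehat{\mathrm{Var}}\big[\ln(1 - \hat h_i)\big] \approx \begin{cases} \dfrac{1}{(n - i)(n - i + 1)} & \text{for } \delta_i = 1 \\[1ex] \text{not defined} & \text{for } \delta_i = 0 \end{cases}, \tag{5.8e}$$
$$\widehat{\mathrm{Var}}\big[\ln \hat S(x)\big] \approx \sum_{i:\, y_i \le x} \frac{\delta_i}{(n - i)(n - i + 1)}, \tag{5.8f}$$
$$\widehat{\mathrm{Var}}\big[\hat S(x)\big] \approx \big[\hat S(x)\big]^2 \sum_{i:\, y_i \le x} \frac{\delta_i}{(n - i)(n - i + 1)} . \tag{5.8g}$$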
A pointwise confidence interval for S(x) can be obtained analogously to (5.7f). GREENWOOD's
formula may be unstable in the right tail of the distribution,5 so some authors have proposed
an alternative and simpler estimator originating in the binomial distribution, namely
$$\widehat{\mathrm{Var}}\big[\hat S(x_i)\big] \approx \frac{\big[\hat S(x_i)\big]^2 \big[1 - \hat S(x_i)\big]}{n_i} . \tag{5.9}$$
A rationale for (5.9) is given in C OX /OAKES (1984, p. 51) who also suggest likelihood based
confidence intervals resting upon a χ2 –distribution.
5
We see that (5.8d) grows with xi because di comes closer to ni . (5.8d) would even become ∞ when for the last
observed failure time xk we would have dk = nk .
The following data have often been used in the literature on lifetime analysis, e.g., see COX/OAKES (1984)
or LEEMIS (1986). The following observations (yi , δi ) with time measured in weeks
yi 6 6 6 6 7 9 10 10 11 13 16 17 19 20 22 23 25 32 32 34 35
δi 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 0
are the ordered times of remission, i.e., freedom from symptoms in a precisely defined medical sense, of 21
leukaemia patients who have been treated with the drug 6–MP (= 6-mercapto-purine). 12 patients have
been lost to follow–up (δi = 0). From the data above we have extracted and displayed in Tab. 5/1 the
‘failure times’ xi , where, in this example, failure means the end of remission.
Table 5/1: Estimates of hi and S(xi ) for the 21 leukaemia patients’ data
$i$   $x_i$   $n_i$   $d_i$   $\hat h_i$   $\widehat{\mathrm{Var}}(\hat h_i)$   $\hat S(x_i)$   $\widehat{\mathrm{Var}}[\hat S(x_i)]$
Figure 5/1: Estimated hazard rate and survival function with pointwise 95%–confidence intervals for the
21 leukaemia–patients’ data
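The quantities listed in the heading of Tab. 5/1 can be recomputed directly from the pairs (yi , δi ) given above. The following Python sketch is an illustration only: the function name `kaplan_meier_table` and the dictionary keys are hypothetical, the hazard estimate is di /ni , its variance follows (5.7c), and the variance of the survival estimate follows GREENWOOD's formula (5.8d).

```python
import math

# Remission times (weeks) and indicators (1 = failure, 0 = censored)
y = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
delta = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]


def kaplan_meier_table(times, events):
    """One row per distinct failure time x_i (hypothetical helper)."""
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    rows, surv, gw_sum = [], 1.0, 0.0
    for x in failure_times:
        n_i = sum(1 for t in times if t >= x)                     # at risk just before x_i
        d_i = sum(1 for t, e in zip(times, events) if t == x and e == 1)
        h_i = d_i / n_i                                           # MLE of the hazard rate
        var_h = d_i * (n_i - d_i) / n_i ** 3                      # (5.7c)
        surv *= 1.0 - h_i                                         # product-limit step
        if d_i < n_i:
            gw_sum += d_i / (n_i * (n_i - d_i))
            var_s = surv ** 2 * gw_sum                            # Greenwood (5.8d)
        else:
            var_s = math.nan                                      # (5.8d) breaks down for d_k = n_k
        rows.append({"x": x, "n": n_i, "d": d_i, "h": h_i,
                     "var_h": var_h, "S": surv, "var_S": var_s})
    return rows


km = kaplan_meier_table(y, delta)
for i, r in enumerate(km, start=1):
    print(f"{i:2d} {r['x']:3d} {r['n']:3d} {r['d']:2d} "
          f"{r['h']:.4f} {r['var_h']:.6f} {r['S']:.4f} {r['var_S']:.6f}")
```

Pointwise intervals as in (5.7f) would follow by adding and subtracting a normal percentile times the square root of the variance columns.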
Under the non–predictive censoring assumption the KME can be motivated in several ways. This
estimator is
1. the generalized MLE in the same sense that the empirical distribution function is in the
case of uncensored data, see M ILLER (1981, pp. 57 ff.),
2. the limit of the life–table estimators, i.e., for data grouped in time intervals (see Sec. 8.2),
as the intervals increase in number and decrease in length,
Example 5/2: The redistribution–to–the-right algorithm applied to the 21 leukemia patients’ data
The algorithm starts with an empirical distribution that puts mass 1/n at each observation yi ; i =
1, 2, ..., n; and then eliminates and moves the mass of each censored observation (yi , 0) by distribut-
ing it equally to all observations to the right of it. After the last redistribution the estimated survival
function at each (yi , 1) is unity minus the sum of the redistributed masses up to and including (yi , 1).
When this process is continued through all the observed data and the resulting mass function is processed
as described in the beginning of this example the KME results. To check this for one special value, e.g.,
for x5 = 16, note that
$$\hat S(16) = 1 - \left[ \tfrac{1}{7} + \tfrac{6}{119} + \tfrac{32}{595} + \tfrac{16}{255} + \tfrac{16}{255} \right] = 0.6275$$
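The redistribution steps of this example can be traced with exact fractions. The sketch below is an illustration under two assumptions that are not spelled out in the text: at tied times failures are ordered before censored observations, and a censored observation with nothing to its right keeps its mass. The function names are hypothetical.

```python
from fractions import Fraction

# Remission times (weeks) and indicators (1 = failure, 0 = censored)
y = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
delta = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]


def redistribute_to_the_right(times, events):
    """Return ordered times, indicators and the final probability masses."""
    order = sorted(range(len(times)), key=lambda i: (times[i], -events[i]))
    t = [times[i] for i in order]
    e = [events[i] for i in order]
    n = len(t)
    mass = [Fraction(1, n)] * n              # start with mass 1/n on every observation
    for i in range(n):
        n_right = n - i - 1
        if e[i] == 0 and n_right > 0:
            share = mass[i] / n_right        # spread the censored mass equally ...
            for j in range(i + 1, n):        # ... over all observations to the right
                mass[j] += share
            mass[i] = Fraction(0)
    return t, e, mass


def survival_at(x, t, e, mass):
    """Unity minus the mass accumulated at failures up to and including x."""
    return 1 - sum(m for ti, ei, m in zip(t, e, mass) if ei == 1 and ti <= x)


t, e, mass = redistribute_to_the_right(y, delta)
s16 = survival_at(16, t, e, mass)
print(s16, float(s16))
```

The masses at the failure times 6, 7, 10, 13 and 16 are 1/7 (three tied failures), 6/119, 32/595, 16/255 and 16/255, which are the terms in the bracket above.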
Of course, both estimators have differing statistical properties and the generated estimates will
not be identical to each other.
In Chapter 1, see (1.11f), we have seen that the cumulative hazard rate H(x) and the survival
function S(x) are related as
H(x) = − ln S(x).
So, the indirect estimator of H(x), sometimes called the natural estimator of H(x), is
$$\tilde H(x) = -\ln \hat S(x). \tag{5.10}$$
6
Suggested reading for this section: A ALEN (1978), E LANDT–J OHNSON /J OHNSON (1980, Chapter 2),
G ROSS /C LARK (1975, Sect. 4.7), L AWLESS (1982, Sect. 2.4), M ILLER (1981, Chapter 3), N ELSON (1969,
1970, 1972, 1982), S MITH (2002, Chapter 6).
7
When we have grouped data we have to take the life table estimator of S(x), see Sect. 6.2, which resembles the
formula for tied failure times.
8
For the definition of ni and di see (4.31) and for that of yi and δi see (4.2a,b).
The direct estimator of H(x) comes with different names, either as empirical accumulated
hazard function, see (1.62b), or as NELSON/AALEN estimator. It has first been suggested
by NELSON (1972) in a reliability context and has been rediscovered by AALEN (1978) using
modern counting process techniques. Using the MLEs of hi in (5.2e) or (5.3e) the empirical
accumulated hazard rate function results as9
$$\hat H(x) = \begin{cases} \displaystyle\sum_{i:\, x_i \le x} \frac{d_i}{n_i} & \text{for tied failure times,} \\[2ex] \displaystyle\sum_{i:\, y_i \le x} \frac{\delta_i}{n - i + 1} & \text{for untied failure times,} \end{cases} \tag{5.12a}$$
AALEN (1978) gives an approximation: $\widehat{\mathrm{Var}}\big[\hat H(x_j)\big] \approx \sum_{i=1}^{j} d_i / n_i^2$.
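Since $\tilde H(x) = -\ln \hat S(x)$ we have from (5.11a)
$$\tilde H(x) = -\ln \prod_{i:\, x_i \le x} \left( 1 - \frac{d_i}{n_i} \right) = -\sum_{i:\, x_i \le x} \ln \left( 1 - \frac{d_i}{n_i} \right) = \sum_{i:\, x_i \le x} \left( \frac{d_i}{n_i} + \frac{d_i^2}{2\, n_i^2} + \dots \right).$$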
Thus, $\hat H(x)$ in (5.12a) can be viewed as first–order approximation to $\tilde H(x)$. Comparing the
estimators $\hat H(x)$ and $\tilde H(x)$ we see that for a given set of failure data
• $\hat H(x)$ produces smaller estimates than $\tilde H(x)$ and
where
$$t_j^* = \frac{t_{j-1} + t_j}{2} \quad \text{and} \quad n_j^* = \frac{n_j + n_{j+1}}{2} .$$
5.2 Estimating the Cumulative Hazard rate 151
see Example 5/3 and Fig. 5/2. Under certain regularity conditions one can show that both
estimators are
• non–parametric MLEs,
• consistent,
• weakly convergent to GAUSSIAN processes, see MILLER (1981, pp. 67 ff.), meaning that
for fixed x, the estimators are approximately normally distributed.
Example 5/3: Estimation of the CHR for the 21 leukaemia patients’ data
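The two estimators of the cumulative hazard rate can be compared numerically for the 21 leukaemia patients' data. The sketch below is illustrative only (the function name `cumulative_hazard_estimates` is hypothetical): it evaluates the NELSON/AALEN estimator (5.12a) for possibly tied failure times and the natural estimator (5.10) at each distinct failure time.

```python
import math

# Remission times (weeks) and indicators (1 = failure, 0 = censored)
y = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
delta = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]


def cumulative_hazard_estimates(times, events):
    """Return rows (x_i, H_hat, H_tilde) at the distinct failure times."""
    rows, h_hat, surv = [], 0.0, 1.0
    for x in sorted({t for t, e in zip(times, events) if e == 1}):
        n_i = sum(1 for t in times if t >= x)
        d_i = sum(1 for t, e in zip(times, events) if t == x and e == 1)
        h_hat += d_i / n_i                 # Nelson/Aalen: running sum of d_i / n_i
        surv *= 1.0 - d_i / n_i            # Kaplan/Meier survival estimate
        h_tilde = -math.log(surv)          # natural estimator (5.10); needs surv > 0
        rows.append((x, h_hat, h_tilde))
    return rows


chr_rows = cumulative_hazard_estimates(y, delta)
for x, h_hat, h_tilde in chr_rows:
    print(f"{x:3d} {h_hat:.4f} {h_tilde:.4f}")
```

In line with the series expansion above, the direct estimate lies below the natural estimate at every failure time of this sample, and the gap widens as the ratios di /ni grow in the right tail.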
• In Sect. 6.1 we will give basic definitions of life table functions and show how they are
related.
• Sect. 6.2 is the core of this chapter where we see how life table functions including the
hazard rate are to be estimated.
• In Sect.6.3 we review some further estimators of the hazard rate which are related to the
life table approach.
Ij = [tj−1 , tj ); j = 1, 2, ..., m + 1;
wj = tj − tj−1 ; j = 1, 2, ..., m + 1.
tend is an upper limit of observation, and for life tables we usually have tend = 100 years.4 For
those life tables the interval width, generally, is constant and amounts to one year. In order to
document the high infant mortality and its fast decline we sometimes have a finer division for the
1
The first life tables have been compiled by J. G RAUNT in 1666 and E. H ALLEY in 1693. Another pioneer is
the Prussian statistician and demographer J. P. S ÜSSMILCH (1707 – 1767). In the 19th century B. G OMPERTZ
(1779 – 1865) and W. M AKEHAM (1829 – 1871) were interested in the graduation of crude death rates for older
people.
2
Suggested reading for this section: E LANDT–J OHNSON /J OHNSON (1980), S HYROCK /S IEGEL (1976),
S PIEGELMAN (1968).
3
Another, but not so popular way of grouping is to construct the intervals so that they will contain fixed numbers
of failures. By this way, the interval limits and their widths are random. Hazard rate estimation for this kind of
grouping will be presented in Sect. 6.3
4
As human life expectancy is growing we nowadays may find tend = 105 years or even tend = 110 years,
especially for economically developed countries.
6.1 Life Table Functions 153
first and second years of life, i.e., days for the first week after birth, weeks and months thereafter.
So–called abridged life tables on the other hand, have a width of five or even ten years.
A life table combines two types of quantities:
The updating formula shows that there are flows into the stock as well as out of the stock. Looking
at a demographic life table we will only see flows out of the stock, where the stock is the number
of persons $\ell_x$ alive at exact age $x$ and the outgoing flow is the number ${}_1d_x$ of persons dying in
$[x, x + 1)$.
We will first look at the quantities and functions coming up in a demographic life table. Later on,
when describing hazard rate estimation, we will change and simplify the notation and introduce
some more quantities. The key quantities of a demographic life table are ${}_1q_x$; $x = 0, 1, \dots, x_{\mathrm{end}}$;
the conditional probabilities of dying in $[x, x + 1)$, given an individual is alive at exact age $x$.
Roughly speaking, this ratio is obtained by dividing the number of persons dying in $[x, x + 1)$,
taken from a nation's mortality statistics, by the number of persons entering the age of $x$,
taken from a nation's population census. Then, a number $\ell_0$ of starters at age $x = 0$ is
taken to derive all the other quantities. $\ell_0$ is called the radix of the life table, and it is
conventionally taken as 100,000 or 10,000. The $\ell_x$, the expected numbers of persons alive at exact
age $x$, are found recursively by applying
$${}_1p_x = 1 - {}_1q_x \tag{6.1a}$$
to $\ell_0$ with ${}_1p_x$ as an individual's conditional probability of not dying in $[x, x + 1)$, given being
alive at exact age $x$. We thus find
$$\ell_x = (1 - {}_1q_{x-1})\, \ell_{x-1} = {}_1p_{x-1}\, \ell_{x-1} = \ell_0 \prod_{y=0}^{x-1} {}_1p_y . \tag{6.1b}$$
The quantity
$${}_1d_x = {}_1q_x\, \ell_x = \ell_x - \ell_{x+1} \tag{6.1c}$$
is the expected number of deaths in $[x, x + 1)$ from where we may write
$${}_1q_x = \frac{{}_1d_x}{\ell_x} = \frac{\ell_x - \ell_{x+1}}{\ell_x} \tag{6.1d}$$
and
$${}_1p_x = 1 - {}_1q_x = \frac{\ell_{x+1}}{\ell_x} . \tag{6.1e}$$
Whereas $\ell_x$ is the expected number of survivors out of $\ell_0$ at exact age $x$, the expected
proportion of survivors out of $\ell_0$ is
$$\Pi_x = \frac{\ell_x}{\ell_0} . \tag{6.1f}$$
154 6 Estimating the Hazard Rate from Life Tables
This is in fact the survival function S(x). With (6.1a,b) we may write $\Pi_x$ as a telescope product:
$$\Pi_x = \prod_{y=0}^{x-1} {}_1p_y = \prod_{y=0}^{x-1} (1 - {}_1q_y), \tag{6.1g}$$
which parallels the product–limit estimator (PLE) of (5.3a). Also, the expected proportion
surviving for $k$ years, given alive at exact age $x$, is
$${}_kp_x = \prod_{y=x}^{x+k-1} {}_1p_y = \prod_{y=x}^{x+k-1} (1 - {}_1q_y) = \frac{\ell_{x+1}}{\ell_x} \cdot \frac{\ell_{x+2}}{\ell_{x+1}} \cdots \frac{\ell_{x+k}}{\ell_{x+k-1}} = \frac{\ell_{x+k}}{\ell_x} . \tag{6.1h}$$
There are some more basic functions. The first one is $L_x$, the expected total number of years
lived in $[x, x + 1)$. $L_x$ is nothing but the number of ‘person × years’ that $\ell_x$ persons, aged $x$
exactly, are expected to live through $[x, x + 1)$ and is recognized as a contribution to what is
called total–time–on–test statistic in life testing. Each member in the group who survives the full
year $x$ to $x + 1$ contributes exactly one year to $L_x$, whereas each member who dies in $[x, x + 1)$
only contributes a fraction of a year to $L_x$. Formally, we have
$$L_x = \int_x^{x+1} \ell_y \, dy = \int_0^1 \ell_{x+u} \, du . \tag{6.2a}$$
This integral may be evaluated exactly when the age at death of each member is known. An
approximation to $L_x$ is
$$L_x \approx \frac{\ell_x + \ell_{x+1}}{2} = \ell_x - 0.5 \; {}_1d_x, \tag{6.2b}$$
assuming the deaths to be equally distributed within $[x, x + 1)$. If the interval width is other than
one year, $L_x$ of (6.2b) has to be multiplied by this width. The approximation (6.2b) tends to
overestimate $L_x$ for younger ages and to underestimate it for older ages.
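$T_x$ is the expected total number of years lived beyond age $x$ by the $\ell_x$ persons alive at that age:5
$$T_x = L_x + L_{x+1} + \dots + L_{x_{\mathrm{end}}-1} = \sum_{u=0}^{x_{\mathrm{end}}-x-1} L_{x+u} . \tag{6.2c}$$
Of course,
$$T_x = T_{x+1} + L_x \tag{6.2d}$$
and using the approximation (6.2b)
$$T_x \approx \frac{\ell_x}{2} + \sum_{u=x+1}^{x_{\mathrm{end}}} \ell_u , \quad \text{in particular} \quad T_0 \approx \sum_{x=0}^{x_{\mathrm{end}}-1} (x + 0.5) \; {}_1d_x = \frac{\ell_0}{2} + \sum_{x=0}^{x_{\mathrm{end}}-1} x \; {}_1d_x . \tag{6.2e}$$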
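5
Remember that $x_{\mathrm{end}}$ is the oldest age reported, which is assumed not to be survived so that $\ell_{x_{\mathrm{end}}} = 0$ and we have
${}_1d_{x_{\mathrm{end}}-1} = \ell_{x_{\mathrm{end}}-1} - \ell_{x_{\mathrm{end}}} = \ell_{x_{\mathrm{end}}-1}$. Furthermore, since all persons entering life at $x = 0$ will die before $x_{\mathrm{end}}$
we have
$$\ell_0 = \sum_{x=0}^{x_{\mathrm{end}}-1} {}_1d_x .$$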
6.2 Estimators for Life Table Functions Including the Hazard Rate 155
The basic functions ${}_1q_x$, $\ell_x$, ${}_1d_x$, $L_x$, $T_x$ and $\mathring{e}_x$ are usually tabulated in a standard format as
in Tab. 6/1, which is the life table 2000 – 2002 for German males. It has been constructed using
updated results from the 1987 German population census and death statistics for the years 2000
through 2002. It is common practice to take the death statistics for more than one year to allow
for years of over–mortality and of sub–mortality.
Table 6/1: Extraction from the German life table 2000 – 2002 for males
  x    1qx           ℓx        1dx    Lx        Tx           e̊x
  0    0.00451281    100,000   451    99,605    7,537,995    75.38
  1    0.00043340     99,549    43    99,527    7,438,390    74.72
  2    0.00024513     99,506    24    99,493    7,338,863    73.75
  3    0.00022031     99,481    22    99,479    7,239,370    72.77
  4    0.00013878     99,459    14    99,452    7,139,899    71.79
  ...  ...           ...       ...    ...       ...          ...
 96    0.30296858      2,242   679     1,902        5,541     2.47
 97    0.32184621      1,563   503     1,311        3,639     2.33
 98    0.34097780      1,060   361       879        2,328     2.20
 99    0.36031243        698   252       573        1,448     2.07
100    0.37979995        447   170       362          876     1.96
Source: Statistisches Bundesamt, ed., (2004)
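The recursions (6.1b) and (6.1c) and the approximation (6.2b) can be checked against the first rows of Tab. 6/1. The following Python sketch is illustrative only; the function name `life_table_columns` is hypothetical, and only the five tabulated death probabilities for ages 0 to 4 are used as input.

```python
# 1qx for ages 0..4 as tabulated in Tab. 6/1
q = [0.00451281, 0.00043340, 0.00024513, 0.00022031, 0.00013878]


def life_table_columns(q, radix=100_000):
    """Return (l, d, L_approx) built from the conditional death probabilities."""
    l = [float(radix)]                                          # l_0 is the radix
    for q_x in q:
        l.append(l[-1] * (1.0 - q_x))                           # (6.1b): l_{x+1} = 1px * l_x
    d = [l[x] - l[x + 1] for x in range(len(q))]                # (6.1c): 1dx = l_x - l_{x+1}
    L_approx = [(l[x] + l[x + 1]) / 2 for x in range(len(q))]   # (6.2b), unit width
    return l, d, L_approx


l, d, L_approx = life_table_columns(q)
for x in range(len(q)):
    print(f"{x} {q[x]:.8f} {round(l[x]):7d} {round(d[x]):4d} {L_approx[x]:10.1f}")
```

For age 0 the approximation (6.2b) gives about 99,774 person–years, which is above the tabulated value 99,605. This agrees with the remark that (6.2b) tends to overestimate Lx at young ages, where deaths are concentrated early in the interval.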
6.2 Estimators for Life Table Functions Including the Hazard Rate6
In this section we assume strictly grouped data, i.e., we only know how many sample units either
failed or had censored lifetimes in each interval. The case, where we have for each interval
individually recorded failure times and censored times, will be treated in Sect. 6.3.
We now revert to the notation for grouped data as has been introduced in Fig. 4/2, i.e., we drop
the two indices in n qx , n px and n dx , which indicate flows out of the interval [x, x + n), and
we switch from x and y to i and j, conventionally used in counting. The demographic life table
6
Suggested reading for this section: E LANDT–J OHNSON /J OHNSON (1980), G EHAN (1969), K IMBALL (1960),
L ONDON (1988), M ILLER (1981), M ÜLLER et al. (1997), S INGPURWALLA /W ONG (1983), S MITH (2002).
does not know censored observations which usually might be encountered in life tables used in
displaying and evaluating data from clinical and biological survival studies or from life testing.
The type of life table used now is outlined in Tab. 6/2. We start with commenting upon the seven
columns 2 – 8 which either reflect observations ($n_j$, $n'_j$, $c_j$, $d_j$) or are quantities defined on the
time axis ($t_j$, $t_j^*$, $w_j$). The last five columns show estimates, and their estimators are the core of
this section.
Table 6/2: Lay–out of a non–demographic life table
1.
$$I_j = [t_{j-1}, t_j); \quad j = 1, 2, \dots, m + 1; \quad t_{m+1} = \infty, \; t_0 = 0; \tag{6.4a}$$
is the half–open time interval. The last interval is infinite in length.
2.
$$w_j = t_j - t_{j-1}; \quad j = 1, 2, \dots, m; \tag{6.4b}$$
is the width of the interval $j$. A constant width will be denoted $w$. The widths are required
to estimate rates such as PDF and HR. Since the width of the last interval is infinite, no
estimate of either PDF or HR can be given for this interval.
3.
$$t_j^* = \frac{t_{j-1} + t_j}{2}; \quad j = 1, 2, \dots, m; \tag{6.4c}$$
is the midpoint of interval $j$. The midpoints are used as points of reference for the estimated
PDF and HR which are assumed to be constant within $I_j$.
5. dj ; j = 1, 2, ..., m + 1; is the total number of units who die or fail in the j–th interval.
n1 = n, (6.4d)
n being the sample size. nj is the number of units exposed to the risk of either dying
(failing) or being censored in Ij . The updating formula linking the nj ’s is
$$\pi_j = \int_{t_{j-1}}^{t_j} f(u) \, du = F(t_j) - F(t_{j-1}) = S(t_{j-1}) - S(t_j) = \Pi_j - \Pi_{j+1}; \quad j = 1, 2, \dots, m + 1; \text{ with } \Pi_{m+2} = 0. \tag{6.5b}$$
Combining $\pi_j$ and $\Pi_j$ we have the following conditional probability of failing in $I_j$, given survival
up to the start of $I_j$:
$$q_j = \frac{\pi_j}{\Pi_j} = 1 - \frac{\Pi_{j+1}}{\Pi_j} . \tag{6.5c}$$
The complement
$$p_j = 1 - q_j = \frac{\Pi_{j+1}}{\Pi_j} \tag{6.5d}$$
is the conditional probability of surviving $I_j$. We immediately see that
Thus, $\Pi_j$ is the cumulated product of the conditional survival probabilities for the first $j - 1$
intervals. (6.5e) is the life table analogue of the product limit estimator (5.3a). Upon combining
(6.5c,e) we may write the unconditional failure probability in $I_j$ as8
$$\pi_j = \Pi_j \, q_j = q_j \prod_{i=1}^{j-1} p_i , \tag{6.5f}$$
8
Remember $\prod_{i=1}^{k} a_i = 1$ for $k < 1$.
and we have
$$\pi_1 = q_1, \qquad p_1 = 1 - q_1 = \Pi_2 .$$
i.e., all members of the sample will be observed dying or failing. The set {d1 , d2 , ..., dm+1 } has
a multinomial distribution:
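$$\Pr(d_1, d_2, \dots, d_{m+1}) = n! \prod_{j=1}^{m+1} \frac{\pi_j^{d_j}}{d_j!} \tag{6.6b}$$
with
$$E(d_j) = n \, \pi_j , \tag{6.6c}$$
$$\mathrm{Var}(d_j) = n \, \pi_j (1 - \pi_j), \tag{6.6d}$$
$$\mathrm{Cov}(d_j, d_k) = -n \, \pi_j \pi_k , \quad j \ne k. \tag{6.6e}$$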
$$\hat\Pi_j = \frac{n_j}{n}; \quad j = 2, 3, \dots, m + 1; \tag{6.7c}$$
which is also binomially distributed and has
$$E(\hat\Pi_j) = \Pi_j , \tag{6.7d}$$
$$\mathrm{Var}(\hat\Pi_j) = \frac{\Pi_j (1 - \Pi_j)}{n} . \tag{6.7e}$$
We mention that $\hat\Pi_i$ and $\hat\Pi_j$ with $i < j$ are positively correlated:
$$\mathrm{Cor}(\hat\Pi_i, \hat\Pi_j) = \sqrt{ \frac{1 - \Pi_i}{\Pi_i} \bigg/ \frac{1 - \Pi_j}{\Pi_j} } . \tag{6.7f}$$
With $n \to \infty$ the distribution of $\hat\Pi_j$ goes to a normal distribution, and its mean and variance are
estimated by (6.7d,e) upon substituting $\Pi_j$ by its estimator $\hat\Pi_j$. Thus, we easily have approximate
confidence intervals for $\Pi_j$.
In (6.5c) we have seen that the conditional probability of dying or failing in Ij = [tj−1 , tj ),
conditional on survival up to tj−1 , is qj . So, the proportion of deaths or failures in Ij ,
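$$\hat q_j = \frac{d_j}{n_j} \tag{6.8a}$$
$$E(\hat q_j \mid n_j) = q_j , \tag{6.8b}$$
$$E(\hat p_j \mid n_j) = p_j \quad \text{with} \quad \hat p_j = 1 - \hat q_j = \frac{n_j - d_j}{n_j} , \tag{6.8c}$$
$$\mathrm{Var}(\hat q_j \mid n_j) = \mathrm{Var}(\hat p_j \mid n_j) = \frac{p_j \, q_j}{n_j} . \tag{6.8d}$$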
Conditional on $(n_1 = n, n_2, \dots, n_j)$ the random variables $\hat q_1, \dots, \hat q_j$ are mutually independent.
Besides the estimator of $\Pi_j$ in (6.7c) there is another estimator which rests upon (6.5e):
$$\hat\Pi_j = \hat p_1 \cdot \hat p_2 \cdot \ldots \cdot \hat p_{j-1}; \quad j = 1, 2, \dots, m + 1; \quad \hat p_0 = 1. \tag{6.9a}$$
Because of the mutual independence of the $\hat q_j$'s and hence of the $\hat p_j$'s the estimator (6.9a) is
unbiased and has the approximate conditional variance9
$$\mathrm{Var}(\hat\Pi_j \mid n_1, \dots, n_j) = \Pi_j^2 \sum_{i=1}^{j-1} \frac{q_i}{n_i \, p_i}; \quad j = 2, 3, \dots, m + 1; \tag{6.9b}$$
With censoring, (6.7a) no longer holds, and the number nj of units entering Ij = [tj−1 , tj ) is
not the number of survivors up to tj−1 because some of the lifetimes censored before tj−1 may
last longer than tj−1 . Consequently, Πj cannot be estimated by (6.7c), and we have to revert to
9
For a proof see E LANDT–J OHNSON /J OHNSON (1980, p. 140).
(6.5e) and its conditional survival probabilities in search of an estimator $\hat\Pi_j$. Thus, we have to
look for an estimator of $q_j = 1 - p_j$ under censoring. The estimator $d_j / n_j$ in (6.8a) might be
expected to underestimate $q_j$, since it is possible that some of the units censored in $I_j$ might have
failed or died before the end of $I_j$, had they not been censored first. It is therefore desirable to
make some adjustments for the censored units. The most commonly used procedure is to estimate
$q_j$ by the so–called actuarial estimator, also called standard life table estimator, which is
$$\hat q_j = \frac{d_j}{n_j - c_j/2} = \frac{d_j}{n'_j} , \tag{6.10a}$$
i.e., we replace $n_j$ in (6.8a) by $n'_j$, the average number of units exposed to risk of failing or dying,
see (6.4g). This adjustment is arbitrary, but sensible in many situations. Its appropriateness
depends, of course, on the failure and censoring process. Once estimates $\hat q_j$ and $\hat p_j = 1 - \hat q_j$ have
been calculated, $\Pi_j$ can be estimated using (6.9a).
Conditioned on $n_j$ and $c_j$ and assuming that $\hat q_j$ of (6.10a) is approximately a binomial proportion
we have
$$\mathrm{Var}(\hat q_j \mid n_j, c_j) = \mathrm{Var}(\hat p_j \mid n_j, c_j) \approx \frac{p_j \, q_j}{n'_j} = \frac{p_j \, q_j}{n_j - c_j/2} . \tag{6.10b}$$
An estimator $\widehat{\mathrm{Var}}(\hat q_j \mid n_j, c_j)$ is found by replacing $p_j$ and $q_j$ by their estimators. Conditional on
the sets $\{n_j\} = (n_1, \dots, n_j)$ and $\{c_j\} = (c_1, \dots, c_j)$, GREENWOOD's formula for the conditional
variance of $\hat\Pi_j$, derived for the uncensored data in (6.9b,c), is approximately valid here when $n_j$
is replaced with $n'_j = n_j - c_j/2$. So we have
$$\widehat{\mathrm{Var}}(\hat\Pi_j \mid \{n_j\}, \{c_j\}) \approx \hat\Pi_j^2 \sum_{i=1}^{j-1} \frac{\hat q_i}{n'_i \, \hat p_i}; \quad j = 2, 3, \dots, m + 1. \tag{6.10c}$$
This estimator is reasonable provided $E(n'_j)$ is not too small, though, when there is a lot of
censoring, $m + 1$, the number of intervals, should not be too small. (6.10c) sometimes tends to
underestimate the variance of $\hat\Pi_j$ for intervals in the right–hand tail of the lifetime distribution,
essentially when $E(n'_j)$ is quite small. However, in such instances the distribution of $\hat\Pi_j$ is
typically highly skewed, and its variance is not a particularly good indicator of estimator precision
anyway.
Estimating f (t∗j )
The PDF $f(x)$ of a lifetime distribution is also called curve of deaths in the context of life table
analysis. With $t_j^* = (t_{j-1} + t_j)/2$ as midpoint and $w_j = t_j - t_{j-1}$ as width of the j–th interval10
we have
$$f(t_j^*) \approx \frac{S(t_{j-1}) - S(t_j)}{w_j} = \frac{\Pi_j - \Pi_{j+1}}{w_j} = \frac{\Pi_j - p_j \Pi_j}{w_j} = \frac{\Pi_j \, q_j}{w_j} = \frac{\pi_j}{w_j}; \quad j = 1, 2, \dots, m; \text{ see (6.5f)}; \tag{6.11}$$
i.e., $f(t_j^*)$ is the unconditional probability $\pi_j$ of failing in $I_j$ per unit width, the definition of the
PDF.
10
As the last interval $I_{m+1}$ has length ∞, in all the following formulas $j$ ranges from 1 to $m$.
To arrive at an estimator we insert estimators of $\Pi_j$ and $q_j$. When the sample has not been censored
we take $\hat q_j = d_j / n_j$ and $\hat\Pi_j = n_j / n$ and have the following estimator of the PDF at the midpoint
$t_j^*$ of $I_j$:
$$\hat f(t_j^*) = \frac{\hat\Pi_j \, \hat q_j}{w_j} = \frac{d_j}{n \, w_j} . \tag{6.12a}$$
$\hat f(t_j^*)$ is taken to construct the histogram–estimator of the PDF. From (6.6c,d) and (6.5c) we have
$$E(d_j) = n \, \pi_j = n \, q_j \Pi_j , \tag{6.12b}$$
$$\mathrm{Var}(d_j) = n \, \pi_j (1 - \pi_j) = n \, q_j \Pi_j (1 - q_j \Pi_j), \tag{6.12c}$$
The estimated version of (6.12d) results when $q_j$ and $\Pi_j$ are replaced by their estimators $\hat q_j = d_j / n_j$
and $\hat\Pi_j = n_j / n$, respectively:
$$\widehat{\mathrm{Var}}\big[\hat f(t_j^*)\big] = \frac{d_j (n - d_j)}{n^3 \, w_j^2} . \tag{6.12e}$$
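$$\hat f(t_j^*) = \frac{\hat\Pi_j \, \hat q_j}{w_j} = \frac{\hat p_1 \hat p_2 \cdots \hat p_{j-1} \, \hat q_j}{w_j} =: g(\hat p_1, \hat p_2, \dots, \hat p_{j-1}, \hat p_j), \tag{6.13a}$$
where $\hat q_j = d_j / n'_j$ and $\hat p_j = 1 - \hat q_j$. For this estimator we can only give a conditional variance.11
The problem now is to find this variance as the function $g(\hat p_1, \hat p_2, \dots, \hat p_{j-1}, \hat p_j)$ of $j$ variates. Based on
the large–sample–approximation formula
$$\mathrm{Var}\big[ g(\hat p_1, \hat p_2, \dots, \hat p_{j-1}, \hat p_j) \big] \approx \sum_{i,k=1}^{j} \frac{\partial g}{\partial \hat p_i} \, \frac{\partial g}{\partial \hat p_k} \, \mathrm{Cov}(\hat p_i, \hat p_k),$$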
Estimating h(t∗j )
As is well known the hazard rate at any point of time is defined as $h(t) = f(t) / S(t)$. For $t = t_j^*$
$$\hat\Pi(t_j^*) = \frac{\hat\Pi_j + \hat\Pi_{j+1}}{2} = \frac{\hat\Pi_j (1 + \hat p_j)}{2} . \tag{6.14a}$$
Upon combining (6.13a) and (6.14a) we have
$$\hat h^{(1)}(t_j^*) = \frac{\hat f(t_j^*)}{\hat\Pi(t_j^*)} = \frac{2 \, \hat q_j}{w_j (1 + \hat p_j)} . \tag{6.14b}$$
$\hat h^{(1)}(t_j^*)$ is the most popular estimator, sometimes called classical or actuarial estimator of $h(t_j^*)$.
(6.14b) can be transformed by inserting $\hat q_j = d_j / n'_j$ into
$$\hat h^{(1)}(t_j^*) = \frac{d_j}{w_j (n'_j - d_j/2)} \tag{6.14c}$$
which is known by the name central death rate. The denominator $w_j (n'_j - d_j/2)$ estimates the
When $n'_j$ is small, (6.14d) loses accuracy. This implies that if there are very few survivors in the
later intervals, the computation of $\widehat{\mathrm{Var}}\big[\hat h^{(1)}(t_j^*) \mid \{n_j\}\big]$ is not worthwhile for these later stages.
Another estimator for $h(t_j^*)$ is found by converting the conditional failure probability $\hat q_j = d_j / n'_j$
into a rate:
$$\hat h^{(2)}(t_j^*) = \frac{\hat q_j}{w_j} = \frac{d_j}{w_j \, n'_j} , \tag{6.15a}$$
called death rate. Its estimated approximate variance, conditioned on $\{n'_j\}$, follows from (6.10b)
as
$$\widehat{\mathrm{Var}}\big[\hat h^{(2)}(t_j^*) \mid \{n'_j\}\big] \approx \frac{1}{w_j^2} \, \frac{\hat p_j \, \hat q_j}{n'_j} . \tag{6.15b}$$
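$$\lim_{n \to \infty} \hat h^{(1)}(t_j^*) = \frac{S(t_j^* - w/2) - S(t_j^* + w/2)}{\dfrac{w}{2} \big[ S(t_j^* - w/2) + S(t_j^* + w/2) \big]} = \tilde h^{(1)}(t_j^*), \tag{6.16a}$$
$$\lim_{n \to \infty} \hat h^{(2)}(t_j^*) = \frac{S(t_j^* - w/2) - S(t_j^* + w/2)}{w \, S(t_j^* - w/2)} = \tilde h^{(2)}(t_j^*). \tag{6.16b}$$
We see that
$$\tilde h^{(1)}(t_j^*) \le \frac{2}{w} \quad \text{and} \quad \tilde h^{(2)}(t_j^*) \le \frac{1}{w} ,$$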
12
See M ÜLLER et al. (1997).
so that neither statistic can approximate $h(t_j^*)$ whenever $h(t_j^*) > 2/w$. This is the underlying
$$1 - \exp\big( -\widehat{h_j w_j} \big) = \frac{d_j}{n_j} . \tag{6.17b}$$
Observing $d_j = n_j - n_{j+1}$ and $\hat p_j = n_{j+1} / n_j$ we find, after some manipulations, SACHER's
estimator13
$$\hat h_j^{(3)} = -\frac{\ln \hat p_j}{w_j} . \tag{6.17c}$$
This is a MLE too, and as it is a natural log–function of the binomial variate $d_j$ its variance can be
approximated by the method of statistical differentials using (1.30b). The result for the estimated
version of the conditional variance then is
$$\widehat{\mathrm{Var}}\big[\hat h_j^{(3)} \mid \{n_j\}\big] \approx \frac{1}{w_j^2} \, \frac{\hat q_j}{n_j \, \hat p_j} . \tag{6.17d}$$
Often, (6.17c,d) are applied when there is censoring, so that $n_j$ is replaced with $n'_j$. This procedure
is satisfactory as long as the numbers of censored lifetimes per interval do not differ much. As
GEHAN (1969) reports, Monte Carlo studies have shown that $\hat h^{(1)}(t_j^*)$ is less biased than $\hat h_j^{(3)}$,
and SINGPURWALLA/WONG (1983) report that $\hat h_j^{(3)}$ is almost positively biased whereas $\hat h^{(1)}(t_j^*)$
tends to be negatively biased as $t_j^*$ increases.
n = 200 pieces of equipment had been put on a life test for a certain type of failure. The test was scheduled
to last at most tend = 240 hours. By the end of the test 5 pieces had not failed. The reason for censoring
in this life test is failure due to another reason than that under study and due to withdrawal of non–failed
units for special further investigations. Table 6/3 gives the data and Table 6/4 displays estimates together
with their estimated variances.
13
An estimator similar to (6.17c) is
$$\hat h_j^{(4)} = -\frac{1}{2} \big[ \ln \hat p_{j-1} + \ln \hat p_j \big],$$
and from the series presentation of $\ln \hat p_j = \ln(1 - \hat q_j) = -\hat q_j - 0.5 \, \hat q_j^2 - \dots$ we are, for $h_j$ and $q_j$ very small,
back to (6.15a).
Table 6/3: Data for life table estimation
  j   I_j          w_j   t*_j   d_j   c_j   n_j   n'_j
  1   0 − 24        24     12     1     0   200   200
  2   24 − 48       24     36     4     1   199   198.5
  3   48 − 72       24     60     6     1   194   193.5
  4   72 − 96       24     84    14     2   187   186
  5   96 − 120      24    108    24     3   171   169.5
  6   120 − 144     24    132    27     1   144   143.5
  7   144 − 168     24    156    40     3   116   114.5
  8   168 − 192     24    180    36     1    73    72.5
  9   192 − 216     24    204    17     2    36    35
 10   216 − 240     24    228    11     1    17    16.5
 11   240 − ∞        −      −     5     0     5     5

Table 6/4: Estimates (with variances) of life table quantities
Columns: (1) $j$; (2) $\hat q_j$; (3) $\hat p_j$; (4) $\widehat{\mathrm{Var}}(\hat q_j \mid n'_j)$; (5) $\hat\Pi_j$; (6) $\widehat{\mathrm{Var}}(\hat\Pi_j \mid \{n'_j\})$; (7) $\hat f(t_j^*)$; (8) $\widehat{\mathrm{Var}}[\hat f(t_j^*) \mid \{n'_j\}]$; (9) $\hat h^{(1)}(t_j^*)$; (10) $\widehat{\mathrm{Var}}[\hat h^{(1)}(t_j^*) \mid \{n'_j\}]$; (11) $\hat h^{(3)}(t_j^*)$; (12) $\widehat{\mathrm{Var}}[\hat h^{(3)}(t_j^*) \mid \{n'_j\}]$.
Equation references printed with the column heads: (7.10a) (7.10b) (7.9a) (7.10c) (7.13a) (7.13b) (7.14c) (7.14d) (7.17c) (7.17d)
 (1)   (2)      (3)      (4)       (5)      (6)      (7)     (8)      (9)     (10)      (11)    (12)
  1   0.0050   0.9950    0.249    1         0        0.208    0.010   0.021     0.044   0.021     0.044
  2   0.0202   0.9798    0.995    0.9950    0.249    0.835    0.208   0.085     0.180   0.085     0.180
  3   0.0310   0.9690    1.553    0.9749    1.224    1.260    0.534   0.131     0.287   0.131     0.287
  4   0.0753   0.9247    3.742    0.9447    2.625    2.963    3.093   0.326     0.757   0.326     0.760
  5   0.1416   0.8584    7.171    0.8736    5.584    5.154    9.595   0.635     1.670   0.636     1.689
  6   0.1882   0.8118   10.645    0.7499    9.588    5.879   12.747   0.865     2.744   0.869     2.804
  7   0.3493   0.6507   19.852    0.6088   12.305    8.861   29.358   1.764     7.428   1.791     8.141
  8   0.4966   0.5034   34.481    0.3961   12.568    8.196   25.555   2.752    18.747   2.859    23.618
  9   0.4857   0.5143   71.370    0.1994    8.596    4.036    6.421   2.673    37.704   2.771    46.847
 10   0.6667   0.3333  134.680    0.1026    5.112    2.849    3.495   4.167   118.371   4.578   210.438
 11   1        0         0        0.0342    1.985    −        −       −         −       −         −
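The first columns of Table 6/4 can be retraced from the data of Table 6/3. The following is only a minimal sketch: it assumes the actuarial adjustment $n'_j = n_j - c_j/2$ (which reproduces the $n'_j$ values of Table 6/3), $\hat q_j = d_j / n'_j$ and $\hat\Pi_j$ as the running product of the preceding $\hat p_i$; the function name `life_table` and the dictionary keys are illustrative choices, not part of the text.

```python
# Failures d_j and censored units c_j for the 11 intervals of Table 6/3.
d = [1, 4, 6, 14, 24, 27, 40, 36, 17, 11, 5]
c = [0, 1, 1, 2, 3, 1, 3, 1, 2, 1, 0]
n_start = 200  # units on test at time 0


def life_table(d, c, n_start):
    """Return one dict per interval with n_j, n'_j, q_j, p_j and Pi_j.

    Assumption: n'_j = n_j - c_j / 2 (censored units count as half exposed).
    """
    rows = []
    n_j = n_start
    surv = 1.0  # estimated probability of surviving to the start of I_j
    for d_j, c_j in zip(d, c):
        n_eff = n_j - c_j / 2
        q = d_j / n_eff
        rows.append({"n": n_j, "n_eff": n_eff, "q": q, "p": 1 - q, "Pi": surv})
        surv *= 1 - q          # survival to the start of the next interval
        n_j -= d_j + c_j       # units entering the next interval
    return rows


rows = life_table(d, c, n_start)
for j, r in enumerate(rows, start=1):
    print(j, r["n"], r["n_eff"], round(r["q"], 4), round(r["Pi"], 4))
```

Comparing the printed values with columns $\hat q_j$ and $\hat\Pi_j$ of Table 6/4 is a quick plausibility check of a transcribed life table.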
6 Estimating the Hazard Rate from Life Tables
6.3 Related Hazard Rate estimators 165
The $t_{ji}^*$–s can be processed to give the amount of ‘item × time units’ spent in $I_j$:
\[ L_j = \sum_{i=1}^{n_j} (t_{ji}^* - t_{j-1}) = \sum_i (t_{ji}^+ - t_{j-1}) + \sum_i (t_{ji}^- - t_{j-1}) + n_{j+1}\,w_j. \tag{6.18b} \]
$L_j$ is the sample equivalent of $L_x$ in (6.2a), called the amount exposed to risk. The central death rate for $I_j$, which is taken as an estimator of the hazard rate in $I_j$, results as
\[ \hat h^{(5)}(t_j^*) = \frac{d_j}{L_j}, \tag{6.18c} \]
expressing the number of failures per item per unit of time. We note that when we assume that over $I_j$
• the time at failure has mean $(t_{j-1} + t_j)/2$ and
• the time of censoring has mean $(t_{j-1} + t_j)/2$, too,
\[ \hat h_j^{(6)} = \frac{d_j - 1}{\sum\limits_{i=0}^{d_j - 1} (n_j - i)\,\bigl(t_{j,i+1}^+ - t_{ji}^+\bigr)}, \quad t_{j-1} \le t < t_j, \tag{6.19b} \]
where $t_{ji}^+$ is the time of failure of item $i$ in $I_j$. The denominator in (6.19b) is nothing but the number of time units lived in $I_j$ by those items that failed in $I_j$. The estimated variance of $\hat h_j^{(6)}$ is
\[ \widehat{\mathrm{Var}}\bigl(\hat h_j^{(6)}\bigr) \approx \frac{\bigl(\hat h_j^{(6)}\bigr)^2}{d_j - 2}. \tag{6.19c} \]
Assumption (6.19a), although suitable for some purposes, is not consistent with most data on mortality and failure intensities which indicate that $h(t)$ usually is a non–decreasing function of $t$. KIMBALL (1960) has given hazard rate estimators when (6.19a) does not hold, but instead it is assumed that $h(t)$ increases linearly with τ over the interval for which $h(t)$ is being estimated. The estimates and their variances are obtained recursively.
7 Maximum Likelihood Estimation of
Monotone Hazard Rates1
The direct non–parametric ML estimation of the hazard rate began as early as 1956 with papers of GRENANDER (1956) and KIEFER/WOLFOWITZ (1956). They assumed a continuous distribution with increasing hazard rate and a sample with uncensored data. Later papers generalized their ideas to decreasing as well as to U–shaped hazard rates, to discrete distributions, to censored samples and to the discovery of the statistical properties of the resulting estimators. A general drawback of the ML–estimated hazard rate is its non–smoothness, i.e., the hazard rate is assumed constant between observed failure times. Another drawback is that for U–shaped hazard rates the change point has to be known and that for monotone hazard rates one first has to test whether the distribution is IHR or DHR.^2 We will only discuss the estimation of monotone hazard rates (for U–shaped hazard rates see MYKYTYN/SANTNER (1981)), and we will first present results for complete sample data (Sect. 7.1) and then give results for censored samples (Sect. 7.2).
ML methods are especially apt for processing non–grouped data of smaller sample sizes. When
we have a continuous lifetime distribution, the estimates found here subsequently should be
smoothed in one way or the other, whereas for a discrete lifetime distribution the estimates found
here are definitive.
\[ \mathcal{L} = \ln L = \sum_{i=1}^{n} \ln h(x_i) - \sum_{i=1}^{n} \int_0^{x_i} h(u)\,du. \tag{7.1b} \]
1 Suggested reading for this chapter: BARLOW et al. (1972), GRENANDER (1956), KIEFER/WOLFOWITZ (1956), MARSHALL/PROSCHAN (1965), MYKYTYN/SANTNER (1981), PADGETT (1988), PADGETT/WEI (1980), PRAKASA RAO (1970).
2 Such a test will be presented in Sect. 10.2.2.
We first consider the case of a continuous IHR distribution. It is not possible to obtain a MLE directly by maximizing either $L$ or $\mathcal{L}$, since $h(x)$ can be arbitrarily large. It follows from argumentation of MARSHALL/PROSCHAN (1965) that
\[ \mathcal{L} \le \sum_{i=1}^{n} \ln h(x_i) - \sum_{i=1}^{n-1} (n - i)\,(x_{i+1} - x_i)\,h(x_i) =: \mathcal{L}^*. \tag{7.1c} \]
and
\[ \hat h(x_n) = \infty. \tag{7.1e} \]
For the remaining values of $x$, $\hat h(x)$ is determined as
\[ \hat h(x) = \begin{cases} 0 & \text{for } x < x_1 \\ \hat h(x_i) & \text{for } x_i \le x < x_{i+1};\ i = 1, 2, \ldots, n-1 \\ \infty & \text{for } x \ge x_n \end{cases}, \tag{7.1f} \]
so that $\hat h(x)$ is a monotone increasing step–function, see Fig. 7/1. The estimator (7.1d) is consistent and its (not simple looking) asymptotic distribution has been found by PRAKASA RAO (1970). The corresponding estimators of $S(x)$ and $f(x)$ are obtained using $\hat h(x)$:
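\[ \hat S(x) = \exp\Bigl\{-\int_0^x \hat h(u)\,du\Bigr\} = \exp\Bigl\{-\sum_{i:\,x_i \le x} \hat h(x_i)\,\bigl[\min(x, x_{i+1}) - x_i\bigr]\Bigr\} \tag{7.1g} \]
\[ \hat f(x) = \hat h(x)\,\hat S(x). \tag{7.1h} \]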
where x0 = 0. Ti is nothing but the total–time–on–test spent by the xi –survivors in the interval
[xi , xi+1 ), i.e., between the i–th and the (i + 1)–st failure times. Another estimator of h(x),
called naive estimator by some authors, is the reciprocal of Ti :
\[ \tilde h(x_i) = \begin{cases} \dfrac{1}{(n - i)\,(x_{i+1} - x_i)} & \text{for } x_i \le x < x_{i+1};\ i = 0, 1, \ldots, n-1 \\[1ex] 0 & \text{for } x \ge x_n. \end{cases} \tag{7.2b} \]
7.1 The Case of Complete Samples 169
The naive estimator is asymptotically unbiased, but it is not consistent since it has a limiting non–degenerate distribution. Furthermore, since for any $n$ distinct time points $x_1, \ldots, x_n$ the estimators $\tilde h(x_1), \ldots, \tilde h(x_n)$ are asymptotically independent, the graph of $\tilde h(x)$, $x \ge 0$, will exhibit wild fluctuations, prohibiting its use as an estimator for a monotone hazard rate. For this reason SINGPURWALLA/WONG (1983) have proposed a smoothed version of $\tilde h(x)$, smoothed by kernel methods.
The estimator $\hat h(x_i)$ in (7.1d) can be interpreted as the result of averaging naive estimators until an increasing sequence $\hat h(x_1) \le \hat h(x_2) \le \ldots \le \hat h(x_n)$ has been found. First, the maximum of $\mathcal{L}^*$ in (7.1c) is found, giving $\tilde h(x_i)$ in (7.2b). If there is a reversal, say $\tilde h(x_i) > \tilde h(x_{i+1})$, then set $h(x_i) = h(x_{i+1})$ in (7.1c) and repeat the procedure. After at most $n$ steps of this kind, a monotone estimator is obtained. The maximum of $\mathcal{L}^*$ derived with $h(x_i) = h(x_{i+1})$ can be directly obtained by replacing $\tilde h(x_i)$ and $\tilde h(x_{i+1})$ by their harmonic mean $\bigl\{\bigl[\tilde h(x_i)^{-1} + \tilde h(x_{i+1})^{-1}\bigr]/2\bigr\}^{-1}$. Succeeding steps amount to further such harmonic averaging which is extended just to the point necessary to eliminate all reversals.
The following $n = 10$ observations $x_i$ in Tab. 7/1 have been simulated from a WEIBULL distribution (see Sect. 3.1) with parameters $a = 0$, $b = 4$ and $c = 2$ and thus come from an IHR distribution. Tab. 7/1 gives the ML–estimated hazard rate values according to (7.1d) together with the naive estimates according to (7.2b).
Table 7/1: ML–estimates and naive estimates of an increasing hazard rate
  i    x_{i−1}   x_i     ML–estimate   naive estimate
  1    0         0.69    0             0.1449
  2    0.69      1.20    0.1720        0.2179
  3    1.20      2.08    0.1720        0.1420
  4    2.08      2.21    0.3110        1.0989
  5    2.21      3.13    0.3110        0.1811
  6    3.13      3.28    0.5894        1.3333
  7    3.28      3.72    0.5894        0.5681
  8    3.72      4.58    0.5894        0.3876
  9    4.58      5.09    0.8230        0.9804
 10    5.09      6.50    0.8230        0.7092
Figure 7/1: ML–estimates for a continuous distribution with increasing hazard rate
We now show how to find $\hat h(x)$ from $\tilde h(x)$. There is a first reversal of $\tilde h(x)$ between 0.2179 and 0.1420 ($i = 2$ and $i = 3$), so we replace both values by $\bigl[(0.2179^{-1} + 0.1420^{-1})/2\bigr]^{-1} = 0.1720$. The next reversal is between 1.0989 and 0.1811 and both values are replaced by $\bigl[(1.0989^{-1} + 0.1811^{-1})/2\bigr]^{-1} = 0.3110$. The next reversal is between 1.3333, 0.5681 and 0.3876 which are replaced by $\bigl[(1.3333^{-1} + 0.5681^{-1} + 0.3876^{-1})/3\bigr]^{-1} = 0.5894$. The last reversal between 0.9804 and 0.7092 is replaced by $\bigl[(0.9804^{-1} + 0.7092^{-1})/2\bigr]^{-1} = 0.8230$. Fig. 7/1 depicts the estimated hazard rate, the estimated density function and the estimated survival function (solid lines), each supplemented by their true curves (dashed lines).
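The harmonic averaging of this example can be sketched in a few lines of code. The sketch below is not the author's program; it assumes distinct observations and uses a pool–adjacent–violators loop in which each block stores the number of pooled intervals and their pooled total time on test, so that the block estimate (count divided by time) equals the harmonic mean of the naive estimates in the block. The function name `ihr_mle` is hypothetical.

```python
def ihr_mle(x):
    """Step heights of the increasing hazard rate estimate on
    [x_i, x_{i+1}), i = 1, ..., n-1, for distinct complete data."""
    x = sorted(x)
    n = len(x)
    # Total time on test between the i-th and (i+1)-st failure, i = 1..n-1;
    # the naive estimate on that interval is its reciprocal.
    ttt = [(n - i) * (x[i] - x[i - 1]) for i in range(1, n)]
    blocks = []  # each block: [number of pooled intervals, pooled time on test]
    for t in ttt:
        blocks.append([1, t])
        # Pool while the previous block has a larger rate than the last one
        # (rates compared by cross-multiplication; all times are positive).
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            m, t_last = blocks.pop()
            blocks[-1][0] += m
            blocks[-1][1] += t_last
    estimate = []
    for m, t in blocks:
        estimate.extend([m / t] * m)  # harmonic mean of the pooled naive values
    return estimate


x = [0.69, 1.20, 2.08, 2.21, 3.13, 3.28, 3.72, 4.58, 5.09, 6.50]
h_hat = ihr_mle(x)
print([round(h, 4) for h in h_hat])
```

For the ten observations of Tab. 7/1 the nine returned values correspond to rows 2 to 10 of the ML column; the value 0 for $x < x_1$ and the value ∞ at $x_n$ are fixed by definition and therefore not computed.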
We now turn to a continuous DHR distribution. Estimation in the DHR case parallels that of the
preceding IHR case with some obvious modifications:
1. As the hazard rate is assumed decreasing there is no trivial estimate for x < x1 .
2. For the same reason the estimator is defined only for x ≤ xn , but it may be extended
beyond xn in any manner that preserves the DHR property.
where
\[ \hat h(x_i) = \max_{\nu \ge i}\ \min_{\kappa \le i-1}\ \frac{\nu - \kappa}{\sum\limits_{j=\kappa}^{\nu-1} (n - j)\,(x_{j+1} - x_j)};\quad i = 2, 3, \ldots, n. \tag{7.3b} \]
As in the IHR case the estimator (7.3b) results from harmonically averaging the naive estimators until a decreasing sequence has been reached.
The $n = 6$ observations $x_i$ in Tab. 7/2 come from a WEIBULL distribution with parameters $a = 0$, $b = 4$ and $c = 0.8$, so the sampled population has a decreasing hazard rate. Tab. 7/2 shows the estimated hazard rate $\hat h(x)$ according to (7.3a,b) and the naive estimates. There is only one reversal for $i = 4$ and $i = 5$. Replacing the values 0.1494 and 1.0000 by their harmonic mean yields $\bigl[(0.1494^{-1} + 1^{-1})/2\bigr]^{-1} = 0.2600$. Fig. 7/2 shows the estimated hazard rate, density function and survival function (solid lines) together with the true curves (dashed lines).
Figure 7/2: ML–estimates for a continuous distribution with decreasing hazard rate
A related problem of interest occurs in the case of a discrete distribution. Let $F(\cdot)$ be a discrete distribution with probability mass $P_i$ at $x_i$, the $x_i$ ordered increasingly. Then, for convenience we encode $x_i =: i$; $i = 1, 2, \ldots$. The ratio
\[ h_i = \frac{P_i}{S_i};\quad i = 1, 2, \ldots \tag{7.4a} \]
with survival function
\[ S_i = \Pr(X \ge x_i) = \sum_{j=i}^{\infty} P_j \tag{7.4b} \]
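\[ \tilde h_i = \begin{cases} 0 & \text{for } i < 1 \\[0.5ex] \dfrac{k_i}{k_i + \ldots + k_m} & \text{for } i = 1, 2, \ldots, m \\[1.5ex] 1 & \text{for } i > m. \end{cases} \tag{7.5c} \]
$\tilde h_i = k_i / (k_i + \ldots + k_m)$ is nothing but the estimator of the conditional probability $P_i / S_i = \Pr(X = x_i \mid X \ge x_i)$ where
\[ \hat P_i = \frac{k_i}{n} \quad\text{and}\quad \hat S_i = \sum_{j=i}^{m} k_j \Big/ n. \]
In Sect. 1.2.1 we have shown that for a discrete distribution the hazard rate corresponds to this conditional probability, see (1.60c).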
An increasing hazard rate of a discrete distribution is found by maximizing (7.5b) subject to $h_1 \le h_2 \le \ldots \le h_m$. The result is
\[ \hat h_i = \min_{i \le \nu \le m}\ \max_{\kappa \le i}\ \frac{k_\kappa + k_{\kappa+1} + \ldots + k_\nu}{\sum\limits_{j=\kappa}^{\nu} (k_j + \ldots + k_m)};\quad i = 1, 2, \ldots, m. \tag{7.6} \]
\[ \hat h_i = \max_{i \le \nu \le m}\ \min_{\kappa \le i}\ \frac{k_\kappa + k_{\kappa+1} + \ldots + k_\nu}{\sum\limits_{j=\kappa}^{\nu} (k_j + \ldots + k_m)};\quad i = 1, 2, \ldots, m. \tag{7.7} \]
Estimates of the probability mass function and the survival function can be found by evaluating (7.4c) and (7.4b), respectively, with the $\hat h_i$ of (7.6) or (7.7).
Tab. 7/3 contains the counts $k_i$ of $n = 100$ replicates of a binomial distribution with parameters $N = 10$ and $P = 0.3$. The binomial distribution always has an increasing hazard rate, see Sect. 3.2. In Tab. 7/3 we also give, besides the $\hat h_i$ according to (7.6), the naive estimates $\tilde h_i$ according to (7.5c).
We have a reversal between $\tilde h_6$ and $\tilde h_7$. The result of averaging through adding the pertinent numerators and denominators is $\tilde h_6^{\mathrm{new}} = \tilde h_7^{\mathrm{new}} = (8 + 1)/(12 + 4) = 0.5625$. Now, we have a reversal between $\tilde h_5$, $\tilde h_6^{\mathrm{new}}$, $\tilde h_7^{\mathrm{new}}$ and $\tilde h_8$. Thus, we replace these estimates by $(23 + 8 + 1 + 1)/(35 + 12 + 4 + 3) = 0.6111$.
Fig. 7/3 displays the estimated hazard rate together with the estimated PMF and survival function.
7.2 The case of Randomly Censored Samples 173
Table 7/3: ML–estimates and naive estimates of the increasing hazard rate of a discrete distribution
  i    k_i   ML–estimate   naive estimate
  1     2    0.0200        0.0200
  2    12    0.1224        0.1224
  3    30    0.3488        0.3488
  4    21    0.3750        0.3750
  5    23    0.6111        0.6571
  6     8    0.6111        0.6667
  7     1    0.6111        0.2500
  8     1    0.6111        0.3333
  9     2    1.0000        1.0000
 10     0    1.0000        1.0000
 11     0    1.0000        1.0000
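The pooling of numerators and denominators used in this example can be sketched as follows. This is an illustrative pool–adjacent–violators loop, not the author's program; the function name `discrete_ihr` is hypothetical, and $m$ is taken as the largest index with a positive count, so that only the values for $i = 1, \ldots, m$ are returned (for $i > m$ the estimate is 1 by definition).

```python
def discrete_ihr(k):
    """Naive and increasing hazard rate estimates from counts k_1, ..., k_m."""
    # m = largest index with a positive count; trailing zeros are dropped.
    m = max(i for i, k_i in enumerate(k) if k_i > 0) + 1
    k = k[:m]
    at_risk = [sum(k[i:]) for i in range(m)]        # k_i + ... + k_m
    naive = [k_i / r for k_i, r in zip(k, at_risk)]
    blocks = []  # each block: [sum of numerators, sum of denominators, length]
    for k_i, r in zip(k, at_risk):
        blocks.append([k_i, r, 1])
        # Pool while the previous block's ratio exceeds the last block's ratio.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            num, den, length = blocks.pop()
            blocks[-1][0] += num
            blocks[-1][1] += den
            blocks[-1][2] += length
    ml = []
    for num, den, length in blocks:
        ml.extend([num / den] * length)
    return naive, ml


counts = [2, 12, 30, 21, 23, 8, 1, 1, 2, 0, 0]   # k_i of Tab. 7/3
naive, ml = discrete_ihr(counts)
print([round(v, 4) for v in naive])
print([round(v, 4) for v in ml])
```

The order in which adjacent violators are pooled differs from the order used in the text, but the final pooled ratio $(23 + 8 + 1 + 1)/(35 + 12 + 4 + 3)$ is the same.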
\[ L = \prod_{i=1}^{n} f(y_i)^{\delta_i}\,S(y_i)^{1-\delta_i} \tag{7.8a} \]
turns into
\[ L = \prod_{i=1}^{n} h(y_i)^{\delta_i}\,S(y_i). \tag{7.8b} \]
From $S(y) = \exp\bigl[-\int_0^y h(u)\,du\bigr]$ we finally have the log–likelihood function
\[ \mathcal{L} = \sum_{i=1}^{n} \delta_i \ln h(y_i) - \sum_{i=1}^{n} \int_0^{y_i} h(u)\,du. \tag{7.8c} \]
Suppose that $h(x)$ is increasing and, without loss of generality, further assume that $y_1 \le y_2 \le \ldots \le y_n$. It follows from (7.1b) that
\[ \mathcal{L} \le \sum_{i=1}^{n} \delta_i \ln h(y_i) - \sum_{i=1}^{n-1} (n - i)\,(y_{i+1} - y_i)\,h(y_i) =: \mathcal{L}^*, \tag{7.8d} \]
and the problem of maximizing $\mathcal{L}$ is equivalent to that of maximizing $\mathcal{L}^*$. The following results are due to PADGETT/WEI (1980).
We denote the distinct uncensored failure times by $x_1 < x_2 < \ldots < x_k$ and let $d_j$ be the number of uncensored failure times exactly at $x_j$; $j = 1, 2, \ldots, k$. Also, let $c_j$ denote the number of losses (due to censoring) which occur in the interval $[x_j, x_{j+1})$ for $j = 0, 1, \ldots, k$, where $x_0 = 0$ and $x_{k+1} = \infty$. Furthermore, let the times of the $c_j$ losses be denoted by $\ell_\iota^{(j)}$; $\iota = 1, 2, \ldots, \lambda_j$. The quantities just defined, without $\ell_\iota^{(j)}$, are illustrated in Fig. 4/1.
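Now, for any given increasing $h(x)$, we can define
\[ h^*(x) = \begin{cases} 0 & \text{for } x < x_1 \\ h(x_j) & \text{for } x_j \le x < x_{j+1};\ j = 1, 2, \ldots, k-1 \\ h(x_k) & \text{for } x \ge x_k. \end{cases} \]
where
\[ a_j = \sum_{i=0}^{j-1} c_i + \sum_{i=1}^{j} d_i, \tag{7.9b} \]
\[ b_j = \sum_{i=0}^{j} c_i + \sum_{i=1}^{j} d_i. \tag{7.9c} \]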
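Replacing $h\bigl(\ell_\iota^{(0)}\bigr)$ by zero for $\iota = 1, 2, \ldots, \lambda_0$, we have that
\[ \mathcal{L}^* \le \sum_{j=1}^{k} d_j \ln h(x_j) - \sum_{j=1}^{k} \alpha_j\,h(x_j) =: \mathcal{L}^{**}, \tag{7.10a} \]
say, where
\[ \alpha_j = \begin{cases} \sum\limits_{\iota=1}^{c_j} \ell_\iota^{(j)} + (n - b_j)\,x_{j+1} - (n - a_j)\,x_j; & j = 1, 2, \ldots, k-1 \\[1.5ex] \sum\limits_{\iota=1}^{c_k} \ell_\iota^{(k)} - c_k\,x_k; & j = k. \end{cases} \tag{7.10b} \]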
Since $h^*(x)$ is increasing, it follows that the maximization of $\mathcal{L}$ is equivalent to that of $\mathcal{L}^{**}$. Note that only $\alpha_k$ can be zero and this happens when there are no censored observations strictly larger than $x_k$, the largest uncensored lifetime observed. The problem of obtaining an estimator of $h(x)$ subject to the constraint that it is increasing is reduced to that of maximizing $\mathcal{L}^{**}$ subject to $h(x_1) \le h(x_2) \le \ldots \le h(x_k)$.
In maximizing $\mathcal{L}^{**}$ we have to distinguish two cases.
1. The last observation $y_n$ is uncensored so that $\alpha_k = 0$. In this case $\mathcal{L}^{**}$ is unbounded, and it is not possible to find MLEs of $h(x)$ directly from $\mathcal{L}^{**}$. Following the argumentation of MARSHALL/PROSCHAN (1965) we estimate $h(x)$ by
\[ \hat h(x) = \begin{cases} 0 & \text{for } x < x_1, \\ \hat h(x_j) & \text{for } x_j \le x < x_{j+1};\ j = 1, 2, \ldots, k-1; \\ \hat h(x_k) & \text{for } x \ge x_k, \end{cases} \tag{7.11a} \]
where
\[ \hat h(x_j) = \min_{j \le \nu \le k-1}\ \max_{1 \le \kappa \le j}\ \frac{\sum_{\mu=\kappa}^{\nu} d_\mu}{\sum_{\mu=\kappa}^{\nu} \alpha_\mu};\ j = 1, \ldots, k-1; \qquad \hat h(x_k) = \infty. \tag{7.11b} \]
$\hat h(x)$ truly is not the MLE of $h(x)$, but can be considered as the limit of a sequence of MLEs in the sense of MARSHALL/PROSCHAN (1965).
2. The last observation $y_n$ is censored so that $\alpha_k \ne 0$. In this case the MLE of $h(x)$ is given by (7.11a) with
\[ \hat h(x_j) = \min_{j \le \nu \le k}\ \max_{1 \le \kappa \le j}\ \frac{\sum_{\mu=\kappa}^{\nu} d_\mu}{\sum_{\mu=\kappa}^{\nu} \alpha_\mu};\quad j = 1, \ldots, k. \tag{7.11c} \]
The ML–estimation of a decreasing hazard rate with sample data randomly censored on the right
follows along the same lines as above. But there are some evident minor modifications which we
already encountered in Sect. 7.1 for the change–over from the IHR case to the DHR case.
• As h(x) is now assumed decreasing there is no trivial estimate of h(x) for x < x1 .
• For the same reason the estimate is defined only for x < xk , but it may be extended beyond
xk in any manner that preserves the DHR property.
So we have
\[ \hat h(x) = \hat h(x_j) \quad \text{for } x_j \le x < x_{j+1};\ j = 1, 2, \ldots, k-1; \]
with
\[ \hat h(x_j) = \max_{j \le \nu \le k-1}\ \min_{1 \le \kappa \le j}\ \frac{\sum_{\mu=\kappa}^{\nu} d_\mu}{\sum_{\mu=\kappa}^{\nu} \alpha_\mu};\quad j = 1, \ldots, k-1. \tag{7.12} \]
Formulas (7.11a-c) and (7.12) do not only process multiply censored data sets, where the censored
and uncensored observations are mixed, but they also cope with complete data sets as well as with
data sets which are singly censored whether of type–I or of type–II. For these situations the input
has to be organized properly. First of all the observed times yi together with their indicators δi
have to be in ascending order with respect to y.
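• When there are no censored observations within the sample of size $n$ the input has to look like:
\[ \begin{array}{l} y_1 \le y_2 \le \ldots \le y_{n-1} \le y_n \\ 1 \quad\ 1 \quad \ldots \quad\ 1 \qquad\ 1 \end{array}. \]
• When the sample of size $n$ is singly censored of type–I with censoring time $y_\ell$, $\ell \le n$, the input has to be
\[ \begin{array}{l} y_1 \le y_2 \le \ldots \le y_{\ell-1} \le y_\ell = y_{\ell+1} = \ldots = y_n \\ 1 \quad\ 1 \quad \ldots \quad\ 1 \qquad 0 \qquad 0 \quad\ \ldots \quad 0 \end{array}. \]
• When the sample of size $n$ is singly censored of type–II with censoring at the $k$–th failure the input should read
\[ \begin{array}{l} y_1 \le y_2 \le \ldots \le y_{k-1} \le y_k = y_{k+1} = \ldots = y_n \\ 1 \quad\ 1 \quad \ldots \quad\ 1 \qquad 1 \qquad 0 \quad\ \ldots \quad 0 \end{array}. \]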
Example 7/4: ML–estimation of an increasing hazard rate with multiply censored data
The following observations in Tab. 7/4 have been taken from KAPLAN/MEIER (1958, p. 464). The hazard rate has been found by (7.11a,c) because the last observation $y_8$ is censored.
Table 7/4: Hazard rate estimate for KAPLAN/MEIER’s data set
• non–parametric regression.
In this Section 8.1 we will only present PDF and HR kernel smoothing for non–grouped data. Kernel estimators for the PDF and the HR have the same structure, similar formulas, and share the same set of problems and nearly the same set of approaches to solve these problems. We start this section by looking at PDF smoothers, which were the first field of kernel estimation, starting with papers of ROSENBLATT (1956) and PARZEN (1962), whereas HR estimation by kernels started a little later with papers of WATSON/LEADBETTER (1964a,b).
1 Suggested reading for this section: IZENMAN (1991), PRAKASA RAO (1983), WAND/JONES (1995).
2 CACOULOS (1966) appears to be the first to call this smoothing function a kernel function. Previously it was referred to as a weight function or as a window.
178 8 Smooth Hazard Rate Estimators
The simplest kernel estimator of a PDF for an uncensored set of distinct and ordered observations $x_1 < x_2 < \ldots < x_n$ is given by
\[ \hat f_n(x) = \frac{1}{n\,b} \sum_{i=1}^{n} K\Bigl(\frac{x - x_i}{b}\Bigr). \tag{8.1} \]
The idea of this estimator is the following: The empirical distribution function $\hat F_n(x) = i/n$, where $i$ is the number of sample observations less than or equal to $x$, or the empirical survival function $\hat S_n(x) = (n - i)/n$ are discrete functions each placing mass $1/n$ at each of the observations $x_i$, giving the rough empirical function. By formula (8.1) this probability mass is smeared out continuously, according to the choice of the kernel. The kernel is a smooth function^3 that determines the pattern of how the mass $1/n$ is redistributed around the observation $x_i$, and $b$, the bandwidth or window width,^4 is responsible for ‘how far’ the kernel stretches out to either side of $x_i$ when the kernel is symmetric. Stated in another way, one may say that all observations $x_i$ that are within a distance $b$ on either side of a given point $x$ contribute to the density estimate at this point $x$.
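A direct transcription of (8.1) may help to see how the mass $1/n$ is spread. The sketch below is illustrative only: the EPANECHNIKOV kernel, the nine data values, the bandwidth and the function names `epanechnikov` and `kde` are assumptions made for the example.

```python
def epanechnikov(u):
    """Kernel with support [-1, 1]: K(u) = 3/4 (1 - u^2)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x, data, b, kernel=epanechnikov):
    """Kernel density estimate (8.1) at the point x with bandwidth b."""
    return sum(kernel((x - x_i) / b) for x_i in data) / (len(data) * b)


data = [12, 14, 15, 16, 18, 23, 35, 42, 50]  # illustrative observations
b = 10.0                                     # illustrative bandwidth

# Evaluate on a grid covering [min - b, max + b] and check the total mass
# with a simple Riemann sum.
step = 0.1
grid = [i * step for i in range(0, 701)]     # 0.0, 0.1, ..., 70.0
area = sum(kde(x, data, b) for x in grid) * step
print(round(area, 3))
```

With a kernel of support $[-1, 1]$ only observations within distance $b$ of $x$ enter the sum, so the estimate vanishes outside $[x_1 - b,\ x_n + b]$.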
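(8.1) can be motivated by generalizing the sliding histogram:
\[ \frac{\hat F_n(x + b) - \hat F_n(x - b)}{2\,b} = \frac{1}{2\,b} \int_{x-b}^{x+b} d\hat F_n(u) \tag{8.2a} \]
where $d\hat F_n(\cdot)$ is the empirical measure. (8.2a) is a special case of (8.1) when $d\hat F_n(\cdot) = 1/n$ and
\[ K\Bigl(\frac{x - x_i}{b}\Bigr) = \begin{cases} 1/2 & \text{for } |x - x_i| \le b \\ 0 & \text{else.} \end{cases} \tag{8.2b} \]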
3
More on the properties and types of kernels is found in Sect. 8.1.1.3.
4
More on bandwidths is found in Sect. 8.1.1.4.
5
Censored observations may be tied among themselves or with uncensored data. In the latter case censored
lifetimes are moved a little amount to the right of the uncensored lifetime so that censoring is assumed to happen
later.
8.1 Kernel Smoothing 179
where $x_1 = \min_i\,(y_i, 1)$, i.e., $x_1$ is the shortest uncensored lifetime observed. Let
In the special case where all observations are uncensored (δi = 1 ∀ i), Sbn (x) is a staircase with
n steps and 1/n as constant step height and (8.3d) reduces to (8.1). When there are censored
observations we have fewer steps and the step heights are greater than 1/n.
In the case of tied observations, di ≥ 1 being the size of the tie at xi with at least one di > 1, we
have k < n distinct uncensored lifetimes xi . In the interval [xi , xi+1 ); i = 0, 1, ..., k; between
two uncensored lifetimes, where x0 = 0 and xk+1 = ∞, there may be ci , ci ≥ 0, censored
lifetimes. The number of sample units at risk just before xi is
where
n0 = n and d0 = 0.
For an illustration of these quantities see Fig. 4/1. The KME of the survival function in this case of tied uncensored lifetimes, see (5.3a), reads
\[ \hat S_n(x) = \prod_{i:\,x_i \le x} \Bigl(1 - \frac{d_i}{n_i}\Bigr);\quad i = 1, 2, \ldots, k; \tag{8.4a} \]
The formula of the PDF kernel estimator again is (8.3d), but the summation now goes from $i = 1$ to $i = k$ and $y_i$ is replaced by $x_i$:
\[ \hat f_n(x) = \frac{1}{b} \sum_{i=1}^{k} {}_n\Delta_i\, K\Bigl(\frac{x - x_i}{b}\Bigr). \tag{8.5} \]
We now turn to the kernel estimator of the hazard rate. The rough empirical estimator to be convolved with a smooth kernel is given by the increments of an estimated CHR. In Sect. 5.2 we have found two such estimators, the indirect or natural estimator, see (5.10), and the direct estimator, known as NELSON/AALEN estimator, see (5.12a), which is nothing but the cumulation of the empirical hazard rate values. In kernel estimation the NELSON/AALEN estimator is preferred over the indirect estimator because it avoids taking logarithms and has a smaller variance, compare (5.11c) to (5.12b). So we have when there are no tied observations
\[ \hat H_n(x) = \begin{cases} 0 & \text{for } x < x_1 = \min_i\,(y_i, 1) \\[0.5ex] \sum\limits_{i:\,y_i \le x} \dfrac{\delta_i}{n - i + 1} & \text{else,} \end{cases} \tag{8.6a} \]
\[ {}_nD_i = \hat H_n(y_i) - \hat H_n(y_{i-1});\quad i = 1, 2, \ldots, n;\quad \hat H_n(y_0) = 0. \tag{8.6b} \]
and
\[ {}_nD_i = \begin{cases} 0 & \text{for } (y_i, 0) \\[0.5ex] \dfrac{1}{n - i + 1} & \text{for } (y_i, 1). \end{cases} \tag{8.6c} \]
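The cumulation of the increments $1/(n - i + 1)$ at uncensored observations can be sketched as follows. The function name `nelson_aalen` and the four data points are hypothetical; the sketch assumes untied observations and an indicator $\delta_i = 1$ for an uncensored and $\delta_i = 0$ for a censored lifetime.

```python
def nelson_aalen(y, delta):
    """Return (y_i, H_n(y_i)) for the ordered observations of an untied sample.

    delta[i] is 1 for an uncensored and 0 for a censored observation.
    """
    pairs = sorted(zip(y, delta))
    n = len(pairs)
    cumulative, result = 0.0, []
    for i, (y_i, d_i) in enumerate(pairs, start=1):
        # n - i + 1 units are at risk just before the i-th ordered observation;
        # a censored observation contributes an increment of zero.
        cumulative += d_i / (n - i + 1)
        result.append((y_i, cumulative))
    return result


# Hypothetical sample: the second ordered observation is censored.
print(nelson_aalen([3.0, 1.0, 4.0, 2.0], [1, 1, 1, 0]))
```

The differences of successive values in the returned list are the increments that are subsequently smoothed by the kernel.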
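For tied observations we have
\[ \hat H_n(x) = \begin{cases} 0 & \text{for } x < x_1 = \min_i\,(y_i, 1) \\[0.5ex] \sum\limits_{i:\,x_i \le x} \dfrac{d_i}{n_i} & \text{else,} \end{cases} \tag{8.7a} \]
\[ {}_nD_i = \hat H_n(x_i) - \hat H_n(x_{i-1});\quad i = 1, 2, \ldots, k;\quad \hat H_n(x_0) = 0 \tag{8.7b} \]
and
\[ {}_nD_i \begin{cases} = d_i/n & \text{when there is no censoring in the sample} \\ \ge d_i/n & \text{with censoring somewhere in the sample.} \end{cases} \tag{8.7c} \]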
Like any statistical procedure, kernel estimators are recommended only if they possess desirable
properties. These properties depend on — besides the sample size — the chosen kernel and the
chosen size of the bandwidth, but the greatest influence comes from the bandwidth. Finite–sample
properties are available for special situations, but, in general, research emphasis is and has been
on large–sample properties.
Let ψ(x) denote the continuous curve to be estimated by kernel techniques, e.g.,
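\[ \mathrm{MSE}\bigl[\hat\psi(x)\bigr] = E\bigl[\hat\psi(x) - \psi(x)\bigr]^2 = \mathrm{Var}\bigl[\hat\psi(x)\bigr] + \bigl\{\mathrm{Bias}\bigl[\hat\psi(x)\bigr]\bigr\}^2 \tag{8.9a} \]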
When $L_2$–approaches are used in PDF kernel estimation the tail behavior of the density becomes less important, possibly resulting in peculiarities in the tails of the density estimates. For this and other reasons some authors prefer $L_1$–approaches like the integrated absolute error
\[ \mathrm{IAE}\bigl[\hat\psi(\cdot)\bigr] = \int \bigl|\hat\psi(x) - \psi(x)\bigr|\,dx \tag{8.12a} \]
which is invariant under monotone transformations with $0 \le \mathrm{IAE} \le 2$ for $\psi(x) = f(x)$. The expectation of (8.12a) over all $\hat\psi(\cdot)$ yields the mean integrated absolute error
\[ \mathrm{MIAE}\bigl[\hat\psi(\cdot)\bigr] = E\bigl\{\mathrm{IAE}\bigl[\hat\psi(\cdot)\bigr]\bigr\}. \tag{8.12b} \]
Obtaining $L_1$–results is more laborious than obtaining analogous $L_2$–results. It should be realized that the MIAE and the MISE do not necessarily conform to the human perception of closeness of a curve estimate to its target.
We now take a closer look at the $L_2$–criteria MSE and MISE when a PDF is to be estimated. These results for $f(x)$ are needed in Sect. 8.1.2 on indirect hazard rate smoothing which is based on $\hat f_n(x)$. Furthermore, these results can, more or less easily, be transferred to and generalised for direct hazard rate smoothing, i.e., smoothing the increments of the empirical cumulative hazard rate as given in (8.8). To keep things easy we assume an uncensored sample with untied observations so that the estimator to be studied reads
\[ \hat f_n(x) = \frac{1}{n\,b_n} \sum_{i=1}^{n} K\Bigl(\frac{x - X_i}{b_n}\Bigr). \tag{8.13} \]
(i) $f(x)$ is such that its second derivative $f''(x)$, which measures the curvature of $f(x)$, is continuous, square integrable and ultimately monotone.^7
(ii) The bandwidth b := bn is a non–random sequence of positive numbers, where the depen-
dence on the sample size n will be suppressed in the following formulas in order to keep
the notation as lean as possible, but we assume
(iii) The kernel is a bounded PDF which is symmetric about Xi and has a finite second moment
about the origin.
We first look at the expectation of (8.13) at a given $x \in \mathbb{R}$. The $X_i$’s in (8.13) are iid (= independently and identically distributed) variables with the same PDF as given by the target function $f(\cdot)$. So we have
\[ E\bigl[\hat f_n(x; b)\bigr] = E\Bigl[\frac{1}{n\,b} \sum_{i=1}^{n} K\Bigl(\frac{x - X_i}{b}\Bigr)\Bigr] = \frac{1}{n\,b} \sum_{i=1}^{n} \int K\Bigl(\frac{x - v}{b}\Bigr) f(v)\,dv = \frac{1}{b} \int K\Bigl(\frac{x - v}{b}\Bigr) f(v)\,dv. \]
7
An ultimately monotone function is one that is monotone over both (−∞, x∗ ) and (x∗ , ∞) for some x∗ > 0.
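We now look at the variance of (8.13) which, after some manipulations like those for finding the expectation, reads
\[ \begin{aligned} \mathrm{Var}\bigl[\hat f_n(x; b)\bigr] &= \frac{1}{n\,b} \int K(z)^2 f(x - b\,z)\,dz - \frac{1}{n} \bigl\{E\bigl[\hat f_n(x; b)\bigr]\bigr\}^2 \\ &= \frac{1}{n\,b} \int K(z)^2 \bigl\{f(x) + o(1)\bigr\}\,dz - \frac{1}{n} \bigl\{f(x) + o(1)\bigr\}^2 \\ &= \frac{1}{n\,b}\, f(x) \int K(z)^2\,dz + o\bigl((n\,b)^{-1}\bigr) \\ &= \frac{1}{n\,b}\, f(x)\,R(K) + o\bigl((n\,b)^{-1}\bigr) \end{aligned} \tag{8.17a} \]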
8 Remember that, because of $\mu_1(K) = \int z\,K(z)\,dz = 0$, $\mu_2(K)$ is equal to the variance. Tab. 8/1 shows this variance for different kernels.
and in general
\[ R(\varphi) := \int \varphi(u)^2\,du \tag{8.18} \]
for any square–integrable function $\varphi(\cdot)$. Since the variance is of order $(n\,b)^{-1}$, assumption (ii) above assures that $\mathrm{Var}\bigl[\hat f_n(x; b)\bigr]$ converges to zero.
Adding (8.17a) and the square of (8.16) gives the mean squared error of $\hat f_n(x; b)$:
\[ \mathrm{MSE}\bigl[\hat f_n(x; b)\bigr] = \frac{1}{n\,b}\, f(x)\,R(K) + \frac{1}{4}\, b^4 f''(x)^2 \mu_2(K)^2 + o\bigl((n\,b)^{-1} + b^4\bigr). \tag{8.19} \]
Integrating (8.19) under the integrability assumptions in (i) above we obtain
\[ \mathrm{MISE}\bigl[\hat f_n(\cdot; b)\bigr] = \frac{1}{n\,b}\, R(K) + \frac{1}{4}\, b^4 \mu_2(K)^2 R(f'') + o\bigl((n\,b)^{-1} + b^4\bigr). \tag{8.20} \]
The first two terms on the right–hand side constitute AMISE, the asymptotic mean integrated squared error:
\[ \mathrm{AMISE}\bigl[\hat f_n(\cdot; b)\bigr] = \frac{1}{n\,b}\, R(K) + \frac{1}{4}\, b^4 \mu_2(K)^2 R(f''), \tag{8.21} \]
which is a large–sample–approximation to the MISE. AMISE is — besides n — influenced by
• the bandwidth b,
We see that the second term of AMISE, the integrated squared bias, is proportional to $b^4$, so for this term to decrease one needs to take $b$ to be small. However, taking $b$ small means an increase in the leading factor of the first term, the integrated variance, which is proportional to $(n\,b)^{-1}$. Therefore, as $n$ increases $b$ should vary in such a way that each of the two terms of AMISE becomes smaller. This is known as the variance–bias trade–off and is in accordance with the intuitive role of $b$ demonstrated in Fig. 8/1 below. For very small $b$, $\hat f_n(\cdot; b)$ is very spiky and hence very variable in the sense that, over repeated sampling from $f(\cdot)$, the spikes would appear in different places. There is, however, very little bias.
(8.21) lends itself to finding the optimal bandwidth with respect to this criterion. The bandwidth minimizing AMISE can be given in closed form as
\[ b_{\mathrm{AMISE}} = \Bigl[\frac{R(K)}{n\,R(f'')\,\mu_2(K)^2}\Bigr]^{1/5}. \tag{8.22} \]
Aside from its dependence on the known $R(K)$, $\mu_2(K)$ and $n$, (8.22) shows that $b_{\mathrm{AMISE}}$ is inversely proportional to the unknown $R(f'')^{1/5}$. The functional $R(f'') = \int f''(x)^2\,dx$ measures the total curvature of $f(\cdot)$. Thus, for a PDF with little curvature, $R(f'')$ will be small and a large bandwidth is called for; on the other hand, when $R(f'')$ is large, little smoothing with a smaller bandwidth will be optimal. Unfortunately, direct use of (8.22) to choose a good bandwidth in practice is impossible since $R(f'')$ is not known. Some proposals for estimating $R(f'')$ and then selecting $b$ will be presented in Sect. 8.1.1.4.
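To see (8.21) and (8.22) at work, one can plug in a case where $R(f'')$ is known. The sketch below assumes, purely for illustration, a GAUSS kernel ($R(K) = 1/(2\sqrt{\pi})$, $\mu_2(K) = 1$) and a normal target density with standard deviation $\sigma$, for which $R(f'') = 3/(8\sqrt{\pi}\,\sigma^5)$; the function names `amise` and `b_amise` are hypothetical.

```python
import math


def amise(b, n, r_k, mu2, r_f2):
    """AMISE (8.21) for bandwidth b and sample size n."""
    return r_k / (n * b) + 0.25 * b**4 * mu2**2 * r_f2


def b_amise(n, r_k, mu2, r_f2):
    """AMISE-minimizing bandwidth (8.22)."""
    return (r_k / (n * r_f2 * mu2**2)) ** 0.2


# Assumptions for the illustration: GAUSS kernel, N(0, sigma^2) target density.
sigma, n = 1.0, 80
r_k = 1.0 / (2.0 * math.sqrt(math.pi))
mu2 = 1.0
r_f2 = 3.0 / (8.0 * math.sqrt(math.pi) * sigma**5)

b_opt = b_amise(n, r_k, mu2, r_f2)
print(round(b_opt, 4), round(amise(b_opt, n, r_k, mu2, r_f2), 5))
```

Under these assumptions (8.22) simplifies to $b_{\mathrm{AMISE}} = (4/(3n))^{1/5}\,\sigma$, i.e. a bandwidth that shrinks only at the slow rate $n^{-1/5}$.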
9
Tab. 8/1 shows R(K) for different kernels.
Inserting (8.22) into (8.21) leads to the smallest possible AMISE of $\hat f_n(\cdot; b)$ using a given kernel $K$:
\[ \inf_{b > 0} \mathrm{AMISE}\bigl[\hat f_n(\cdot; b)\bigr] = \frac{5}{4} \bigl[\mu_2(K)^2 R(K)^4 R(f'')\bigr]^{1/5} n^{-4/5}. \tag{8.23} \]
This expression gives the rate of convergence of the minimum AMISE to zero as $n \to \infty$. Under the stated assumptions, the best obtainable rate is of order $n^{-4/5}$. This rate is slower than the typical parametric rate of order $n^{-1}$, e.g., $\widehat{E(X)} = \bar X$ with $\mathrm{Var}(\bar X) = \mathrm{Var}(X)/n$. To arrive at a higher order of convergence one has to choose special kernels, the so–called higher–order kernels, see Sect. 8.1.1.3.
We have randomly generated $n = 80$ realizations of the following mixture of two normal distributions:
Fig. 8/1 displays the true density as a solid line. Using the biweight kernel, see Tab. 8/1, with three different bandwidths we have smoothed the data. Smoothing with $b = 0.2$ gives a very rugged curve because the kernel is very narrow and the averaging process only covers relatively few observations. This estimate pays too much attention to the particular data set at hand and does not allow for the variation across the sample and thus is undersmoothed. Using $b = 0.9$ results in a much smoother estimate of $f(\cdot)$ which is really too smooth since the true bimodality has been smoothed away, so this is an oversmoothed estimate. The graph in the middle of Fig. 8/1 is a compromise with $b = 0.6$. This kernel estimate is not overly noisy and the structure of the true density, i.e., its bimodality, has been recovered.
(1) $K(u) \ge 0\ \forall\ u \in \mathbb{R}$,
(2) $K(u) = K(-u) \Longrightarrow \int u\,K(u)\,du = 0$,
(3) $\int K(u)\,du = 1$,
Because of (4) such a kernel is called of order 2 or second order kernel. The argument of the kernel is the scaled variable
\[ u = \frac{x - x_i}{b}. \]
Second order kernels with an infinite support are, e.g.:
For more details of these kernels see Tab. 8/1. More popular, especially in HR and PDF estimation of lifetime data, are kernels with finite support which mostly are polynomial functions related to the beta distribution, more precisely, they are symmetric beta distributions on the interval $[-1, 1]$. Their generating formula is
\[ K(u) = \kappa_{r,s} \bigl(1 - |u|^r\bigr)^s\, I_{|u| \le 1} \tag{8.24a} \]
with
\[ \kappa_{r,s} = \frac{r}{2\,B(s + 1, 1/r)},\quad r > 0,\ s \ge 0, \tag{8.24b} \]
and the beta function
\[ B(p, q) = \int_0^1 v^{p-1} (1 - v)^{q-1}\,dv = \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p + q)}. \tag{8.24c} \]
Among these kernels — some of which come with different names — the most popular are:
s = 0, r = 1 =⇒ κ1,0 = 1/2;
s = 1, r = 1 =⇒ κ1,1 = 1;
s = 1, r = 2 =⇒ κ2,1 = 3/4;
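The normalizing constants listed above follow directly from (8.24b,c). The following sketch evaluates them with the gamma function of the Python standard library; the function names `beta`, `kappa` and `kernel` are illustrative.

```python
import math


def beta(p, q):
    """Beta function via (8.24c): B(p, q) = Gamma(p) Gamma(q) / Gamma(p + q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def kappa(r, s):
    """Normalizing constant (8.24b)."""
    return r / (2.0 * beta(s + 1.0, 1.0 / r))


def kernel(u, r, s):
    """Kernel (8.24a): kappa_{r,s} (1 - |u|^r)^s on [-1, 1], zero elsewhere."""
    return kappa(r, s) * (1.0 - abs(u) ** r) ** s if abs(u) <= 1.0 else 0.0


for r, s in [(1, 0), (1, 1), (2, 1), (2, 2), (2, 3), (3, 3)]:
    print(r, s, round(kappa(r, s), 6))
```

The pairs $(r, s) = (2, 2)$, $(2, 3)$ and $(3, 3)$ give the constants $15/16$, $35/32$ and $70/81$ that appear in the formulas of the biweight, triweight and tricube kernels in Tab. 8/1.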
10
Asymmetric kernels will be needed when estimating near the boundaries, see further down.
After a suitable rescaling the GAUSS kernel is seen to be of the above type with $r = 2$, $s = \infty$. Two other kernels with finite support but not of polynomial type are the cosine kernel
\[ K(u) = \frac{\pi}{4} \cos\Bigl(\frac{\pi}{2}\,u\Bigr)\, I_{|u| \le 1} \tag{8.25} \]
and the semi–elliptical kernel
\[ K(u) = \frac{2}{\pi} \sqrt{1 - u^2}\; I_{|u| \le 1}. \tag{8.26} \]
Tab. 8/1 summarizes the kernels mentioned above and displays them together with the pertaining $\mu_2(K) = \int u^2 K(u)\,du$ and $R(K) = \int K(u)^2\,du$. $I_{|u| \le 1}$ is the indicator function:
\[ I_{|u| \le 1} = \begin{cases} 1 & \text{for } u \in [-1, 1] \\ 0 & \text{for } u \text{ else.} \end{cases} \tag{8.27} \]
Fig. 8/2 displays all the kernels mentioned above.
Table 8/1: Common kernels
Name                              Formula                                            μ2(K)                   R(K)
uniform (rectangular)             K(u) = (1/2) I_{|u|≤1}                             1/3 ≈ 0.3333            1/2 = 0.5
triangular                        K(u) = (1 − |u|) I_{|u|≤1}                         1/6 ≈ 0.1667            2/3 ≈ 0.6667
EPANECHNIKOV (quadratic)          K(u) = (3/4) (1 − u²) I_{|u|≤1}                    1/5 = 0.2               3/5 = 0.6
biweight (quartic, biquadratic)   K(u) = (15/16) (1 − u²)² I_{|u|≤1}                 1/7 ≈ 0.1429            5/7 ≈ 0.7143
triweight (triquadratic)          K(u) = (35/32) (1 − u²)³ I_{|u|≤1}                 1/9 ≈ 0.1111            350/429 ≈ 0.8159
tricube                           K(u) = (70/81) (1 − |u|³)³ I_{|u|≤1}               35/243 ≈ 0.1440         175/247 ≈ 0.7085
cosine                            K(u) = (π/4) cos(π u/2) I_{|u|≤1}                  1 − 8/π² ≈ 0.1894       π²/16 ≈ 0.6169
semi–elliptical                   K(u) = (2/π) √(1 − u²) I_{|u|≤1}                   1/4 = 0.25              16/(3π²) ≈ 0.5404
GAUSS                             K(u) = (1/√(2π)) exp(−u²/2), u ∈ R                 1                       1/(2√π) ≈ 0.2821
CAUCHY                            K(u) = [π (1 + u²)]⁻¹, u ∈ R                       non–existent            1/(2π) ≈ 0.1592
LAPLACE                           K(u) = (1/2) exp(−|u|), u ∈ R                      2                       1/4 = 0.25
logistic                          K(u) = exp(−u) / [1 + exp(−u)]², u ∈ R             π²/3 ≈ 3.2899           1/6 ≈ 0.1667
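Entries of Tab. 8/1 can be checked numerically from the definitions $\mu_2(K) = \int u^2 K(u)\,du$ and $R(K) = \int K(u)^2\,du$. The sketch below uses a plain midpoint rule on $[-1, 1]$ and only covers two finite–support kernels; the helper names `integrate`, `mu2` and `roughness` are hypothetical.

```python
import math


def integrate(g, a, b, steps=20000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def mu2(kernel, a=-1.0, b=1.0):
    """Second moment of the kernel over its support [a, b]."""
    return integrate(lambda u: u * u * kernel(u), a, b)


def roughness(kernel, a=-1.0, b=1.0):
    """R(K), the integral of the squared kernel over [a, b]."""
    return integrate(lambda u: kernel(u) ** 2, a, b)


def epanechnikov(u):
    return 0.75 * (1.0 - u * u)


def cosine(u):
    return math.pi / 4.0 * math.cos(math.pi * u / 2.0)


for name, k in [("EPANECHNIKOV", epanechnikov), ("cosine", cosine)]:
    print(name, round(mu2(k), 4), round(roughness(k), 4))
```

For kernels with infinite support the integration limits would have to be widened until the neglected tail mass is negligible.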
Optimizing AMISE (8.21) with respect to the kernel is not an easy problem since the scaling of $K$ is coupled with the bandwidth $b$. EPANECHNIKOV (1969) has found the AMISE–optimizing kernel to be
\[ K(u) = \frac{3}{4} \bigl(1 - u^2\bigr)\, I_{|u| \le 1}. \tag{8.28} \]
Investigations have revealed that using another ‘suboptimal’ kernel does not cause great loss of efficiency, i.e., seldom more than 5%. Indeed, these results suggest that most unimodal densities perform about the same as each other when used as a kernel. Thus, the choice between kernels can be made on other grounds such as computational efficiency. The kernel affects the local smoothness whereas the bandwidth is responsible for the global smoothness of the estimate. The smaller the sample size the greater the effect of the kernel, and when $n = 1$ the kernel wholly determines the graph of the estimate.
Fig. 8/3 shows the effect of the kernel. The data used in this figure are: 12, 14, 15, 16, 18, 23, 35, 42, 50, and the bandwidth has been set to $b = 10$. The uniform kernel gives an estimate of the density that is piecewise constant, the triangular kernel and the LAPLACE kernel have a kink which is reflected in the estimated densities. Even the EPANECHNIKOV kernel gives an estimate having a discontinuous first derivative which sometimes can be unattractive because of its kinks. Very smooth estimates are produced by the GAUSS and the CAUCHY kernels, respectively. The use of the triweight kernel seems to be a good compromise.
We have seen in (8.23) that the best obtainable rate of convergence of the kernel estimator
considered there is of order $n^{-4/5}$. But it is possible to obtain a better rate of convergence at the price of
relaxing the restriction that the kernel be a density with $K(u) \ge 0\ \forall\, u$. Such kernels are of higher
order than two. We say that $K(u)$ is an $\ell$-th order kernel if
\[ \mu_0(K) = \int_{\mathbb{R}} u^0\, K(u)\, du = 1, \]
\[ \mu_j(K) = \int_{\mathbb{R}} u^j\, K(u)\, du = 0 \quad \text{for } j = 1, 2, \ldots, \ell - 1 \text{ and} \]
\[ \mu_\ell(K) = \int_{\mathbb{R}} u^\ell\, K(u)\, du \ne 0. \]
Still requiring that K(u) be symmetric we see that $\ell$ must be even. With $\ell \to \infty$ the convergence
rate can be made arbitrarily close to $n^{-1}$, the parametric convergence rate.
There are several rules to construct higher–order kernels. Let $K_\ell(u)$ denote the $\ell$-th order kernel
which is assumed to be differentiable, then the formula
\[ K_{\ell+2}(u) = \frac{3}{2}\, K_\ell(u) + \frac{1}{2}\, u\, K_\ell'(u) \tag{8.29} \]
can be used to generate higher–order kernels. Taking the GAUSS kernel
\[ K_2(u) = \frac{1}{\sqrt{2\pi}}\, \exp(-u^2/2) \]
we find from (8.29)
\[ K_4(u) = 0.5\,(3 - u^2)\, \frac{1}{\sqrt{2\pi}}\, \exp(-u^2/2), \]
and taking the triweight kernel
\[ K_2(u) = \frac{35}{32}\,(1 - u^2)^3\; I_{|u| \le 1} \]
we have
\[ K_4(u) = \frac{105}{64}\,(1 - u^2)^2\,(1 - 3\,u^2)\; I_{|u| \le 1}. \]
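Rule (8.29) and the moment conditions of a fourth–order kernel can be verified numerically. The sketch below applies (8.29) to the triweight kernel, compares the result with the closed form $\frac{105}{64}(1-u^2)^2(1-3u^2)$ and computes the moments $\mu_0, \ldots, \mu_4$ by a midpoint rule. The function names are hypothetical and the code is an illustration only.

```python
def integrate(g, a, b, m=4000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

def triweight(u):
    """Second-order triweight kernel K_2."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def triweight_prime(u):
    """Derivative of the triweight kernel."""
    return -105.0 / 16.0 * u * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def k4_from_rule(u):
    """Fourth-order kernel generated by rule (8.29)."""
    return 1.5 * triweight(u) + 0.5 * u * triweight_prime(u)

def k4_closed(u):
    """Closed form of the same fourth-order kernel."""
    if abs(u) > 1.0:
        return 0.0
    return 105.0 / 64.0 * (1.0 - u * u) ** 2 * (1.0 - 3.0 * u * u)

def moment(K, j):
    """j-th moment of a kernel supported on [-1, 1]."""
    return integrate(lambda u: u ** j * K(u), -1.0, 1.0)

for j in range(5):
    print(f"mu_{j}(K4) = {moment(k4_from_rule, j): .6f}")
```

The moments of order 1, 2 and 3 vanish while the fourth moment does not, which is exactly the definition of a fourth–order kernel.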
Fig. 8/4 shows these kernels. Notice the negative lobes of K4 (u) which entail that the resulting
smoothed density will not be a density itself. Higher–order kernels necessarily take on negative
values, so there is a price to be paid in interpretability and plausibility.
Figure 8/4: Fourth–order kernels based on GAUSS and triweight kernels, respectively
There are some situations where there is scope for improvement of the basic kernel estimator presented up
to here. Some of the modifications will be needed in Sect. 8.1.3, and we will briefly present their
ideas here. A first modification is the local kernel estimator. Given that the optimal amount of
smoothing varies across the real line, an obvious extension of
\[ \hat f_n(x; b) = \frac{1}{n\, b} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{b}\right) \]
is to allow a different bandwidth b(x), say, for each x where f(·) is to be estimated. This
leads to the local kernel estimator
\[ \hat f_n[x; b(x)] = \frac{1}{n\, b(x)} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{b(x)}\right), \tag{8.30} \]
where a different basic kernel estimator is employed at each point. A popular method which fits
into the framework of (8.30) is the nearest neighbor kernel estimator that uses distances from
x to the data point being the k-th nearest to x.
A quite different idea from local kernel estimation is that of variable kernel estimation where
the single b is replaced by n values $b(x_i)$; i = 1, 2, ..., n; rather than by b(x). The estimator has
the form
\[ \hat f_n[x; b(x_i)] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{b(x_i)}\, K\!\left(\frac{x - x_i}{b(x_i)}\right), \tag{8.31} \]
so that the kernel centered on xi has associated with it its own scale parameter b(xi ) allowing
different degrees of smoothing depending on where xi is in relation to other data points. The aim
is to smooth out the mass associated with data values that are in sparse regions much more than
those situated in the main body of the data. Variable kernel estimation can also be realized by a
nearest neighbor approach, but using the distance from xi to the data point being the k-th nearest
to xi .
A special situation arises when the function to be estimated has a bounded support. For the
lifetime variable we have a natural lower bound equal to zero. When the point x at which
f(·) or h(·) is to be estimated is smaller than the bandwidth, a symmetric kernel is not appropriate because
no lifetimes less than zero are observable. In the region 0 ≤ x < b the use of an asymmetric
kernel is suggested. This is also true when there is a right endpoint xend of the support where
we have to consider an asymmetric kernel for xend − b < x ≤ xend . The right endpoint is often
taken to be the greatest observed lifetime in the sample. The asymmetric kernels needed near the
boundaries and which are different for each x within a distance of b to the boundary are called
boundary kernels. In Sect. 8.1.3.1 we will present different types of boundary kernels.
The implementation of a kernel estimator requires the specification of a bandwidth b. One pos-
sibility is to choose the bandwidth subjectively by eye and on aesthetic grounds. This would
involve looking at several PDF or HR estimates over a range of bandwidths and selecting the
estimate that is the ‘most pleasing’ in some sense. One such strategy is to begin with a large
bandwidth and to decrease the amount of smoothing until fluctuations that are more random than
structural start to appear. However, there are also many circumstances where it is very beneficial
to have the bandwidth automatically selected from the data.
A method that uses the data x1 , x2 , ..., xn to produce a bandwidth b is called a bandwidth selec-
tor. A look into the professional journals reveals that work on constructing bandwidth selectors
is still going on. Available selectors can be roughly divided into two classes:
• quick and simple selectors, sometimes called ‘quick and dirty methods’, and
• sophisticated selectors.
The first class consists of formulas which are easy to evaluate, but without any mathematical
guarantee of being close to the optimal bandwidth. These selectors often provide a starting point
for the subjective choice of the smoothing parameter. The two methods falling into the category
‘quick and dirty’ are the rule of thumb, sometimes called normal scale bandwidth selector, and
the maximal smoothing or oversmoothing principle. Both are based on the optimal bandwidth
minimizing the AMISE:
\[ b_{AMISE} = \left[ \frac{R(K)}{n\, R(f'')\, \mu_2(K)^2} \right]^{1/5}. \]
Note that only the term $R(f'')$ is unknown in this expression.
The rule of thumb replaces the unknown PDF f(·) in this functional by a reference distribution.
The reference distribution is rescaled to have variance equal to the sample variance. If,
e.g., we take K as the GAUSS kernel and the standard normal distribution as reference distribution
the rule of thumb yields the bandwidth
\[ \hat b_{RoT} = 1.0592\; \hat\sigma\; n^{-1/5} \tag{8.32a} \]
where $\hat\sigma^2 = \sum (x_i - \bar x)^2 / (n - 1)$ is the sample variance. A version which is more robust against
outliers in the sample uses the interquartile range $r_q$ as a measure of spread instead of the variance
giving the modified estimator
\[ \hat b_{RoT} = 1.0592\, \min\!\left[ \hat\sigma,\ \frac{r_q}{1.34} \right] n^{-1/5}. \tag{8.32b} \]
The maximal smoothing principle is due to TERRELL (1990). He showed that there is a lower
bound for the functional $R(f'')$ for all densities having standard deviation σ, and this bound is
attained by the triweight density, see Tab. 8/1. Thus we have an upper bound for $b_{AMISE}$ leading
to the oversmoothed bandwidth
\[ \hat b_{os} = \left[ \frac{243\, R(K)}{35\, \mu_2(K)^2\, n} \right]^{1/5} \hat\sigma. \tag{8.33} \]
While $\hat b_{os}$ will give too large a bandwidth for optimal estimation of a general density f(·), it
provides an excellent starting point for subjective choice of the bandwidth. A graphical strategy
is to plot an estimate with the bandwidth $\hat b_{os}$ and then successively look at plots based on fractions
of $\hat b_{os}$ to see what features are present in the data.
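Both quick selectors are one–line formulas once $\hat\sigma$ is available. The sketch below evaluates (8.32a), (8.32b) and (8.33) for the nine data points used in Fig. 8/3, taking the GAUSS kernel with $R(K) = 1/(2\sqrt{\pi})$ and $\mu_2(K) = 1$ from Tab. 8/1. The function names are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method, which is an assumption because the text does not say how $r_q$ is computed.

```python
import math
import statistics

def rule_of_thumb(data, robust=False):
    """Rule-of-thumb bandwidth (8.32a); with robust=True the version (8.32b)."""
    n = len(data)
    spread = statistics.stdev(data)            # sample standard deviation (n - 1)
    if robust:
        q1, _, q3 = statistics.quantiles(data, n=4)
        spread = min(spread, (q3 - q1) / 1.34)
    return 1.0592 * spread * n ** (-0.2)

def oversmoothed(data, r_k=1.0 / (2.0 * math.sqrt(math.pi)), mu2_k=1.0):
    """Oversmoothed bandwidth (8.33); the defaults are the GAUSS kernel constants."""
    n = len(data)
    return (243.0 * r_k / (35.0 * mu2_k ** 2 * n)) ** 0.2 * statistics.stdev(data)

data = [12, 14, 15, 16, 18, 23, 35, 42, 50]   # data of Fig. 8/3
print("rule of thumb        :", round(rule_of_thumb(data), 3))
print("robust rule of thumb :", round(rule_of_thumb(data, robust=True), 3))
print("oversmoothed         :", round(oversmoothed(data), 3))
```

For the GAUSS kernel the oversmoothed bandwidth exceeds the rule–of–thumb bandwidth (8.32a) by a fixed factor of about 1.08, whatever the data.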
There are two fundamental approaches in the class of sophisticated selectors:
The right–hand side of (8.34b) is unknown since it depends on f(x). Using a method of moments
to estimate this term results in the least–squares cross–validation function
\[ LSCV(b) = \int \hat f_n(x; b)^2\, dx - \frac{2}{n} \sum_{i=1}^{n} \hat f_{-i}(x_i; b) \tag{8.35a} \]
where
\[ \hat f_{-i}(x_i; b) = \frac{1}{n - 1} \sum_{j \ne i} \frac{1}{b}\, K\!\left(\frac{x_i - x_j}{b}\right) \tag{8.35b} \]
is the density estimate based on the sample with $x_i$ deleted, often called the leave–one–out
density estimator. This is the reason for the name ‘cross–validation’ which refers to the use of part
of a sample to obtain information about another part. It therefore seems reasonable to choose b
to minimize LSCV(b); the bandwidth chosen in this way is denoted $\hat b_{LSCV}$. This estimate of
b is highly sensitive to sampling variation, i.e., for different samples from the same distribution the
estimated bandwidths have a large variance. Another drawback of LSCV is that it often has several
minima. Simulation studies have shown that this problem can be fixed by selecting the largest
value of b for which a local minimum occurs.
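A minimal sketch of least–squares cross–validation, assuming the GAUSS kernel: in that case the integral of $\hat f_n^2$ in (8.35a) has a closed form because the convolution of two GAUSS kernels of scale b is a GAUSS kernel of scale $b\sqrt{2}$. The grid search follows the recommendation of taking the largest b at which a local minimum occurs. The function names and the candidate grid are hypothetical choices made for this illustration.

```python
import math

def phi(u, s):
    """Density of the normal distribution with mean 0 and standard deviation s."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def lscv(b, data):
    """LSCV(b) of (8.35a) for the GAUSS kernel."""
    n = len(data)
    # Integral of fhat^2 in closed form (scale b * sqrt(2)).
    int_f2 = sum(phi(xi - xj, b * math.sqrt(2.0))
                 for xi in data for xj in data) / n ** 2
    # Sum over i of the leave-one-out estimates (8.35b) at x_i.
    loo = sum(phi(xi - xj, b)
              for i, xi in enumerate(data)
              for j, xj in enumerate(data) if i != j) / (n - 1)
    return int_f2 - 2.0 * loo / n

def lscv_bandwidth(data, grid):
    """Largest grid value at which LSCV has a local minimum (grid ascending)."""
    values = [lscv(b, data) for b in grid]
    minima = [i for i in range(1, len(grid) - 1)
              if values[i] < values[i - 1] and values[i] <= values[i + 1]]
    if not minima:                       # no interior minimum: take the global one
        return grid[min(range(len(grid)), key=values.__getitem__)]
    return grid[max(minima)]

data = [12, 14, 15, 16, 18, 23, 35, 42, 50]     # data of Fig. 8/3
grid = [1.0 + 0.5 * i for i in range(60)]       # hypothetical candidates 1.0 ... 30.5
print("LSCV bandwidth:", lscv_bandwidth(data, grid))
```

With another kernel the first term of (8.35a) would have to be integrated numerically.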
The idea of plug–in methods goes back to WOODROOFE (1970). These methods are based on the
asymptotically best choice of b given by $b_{AMISE}$ in (8.22). The only unknown quantity in $b_{AMISE}$
is the functional $R(f'')$. WOODROOFE proposed to use
a first bandwidth $b_1$ to calculate $\hat f_n(x; b_1)$, to
take this estimate to calculate $\hat R(f'') = R\big[\hat f_n''(x; b_1)\big]$ and to plug $\hat R(f'')$ into (8.22) to obtain $b_2$,
the final bandwidth. This direct plug–in rule may be generalized by iterating the process, i.e.,
calculating $\hat R(f'') = R\big[\hat f_n''(x; b_2)\big]$, plugging $\hat R(f'')$ into (8.22) to obtain $b_3$ etc., until $b_i$ converges.
The second variant, called smoothed indirect estimator, is based on the integrated smoothed
density estimator
\[ \hat S_n(x) = 1 - \int_0^x \hat f_n(u)\, du \tag{8.36b} \]
where $\hat f_n(\cdot)$ is a kernel estimator of f(·) with a kernel that has to be a PDF itself. The results
from using (8.36a) or (8.36b) do not differ much, but as (8.36a) is a stair–case function the simple
indirect estimator will generate hazard rate courses which are less smooth than those coming from
the smoothed indirect estimator.
11 Suggested reading for this section: LO et al. (1989), RICE/ROSENBLATT (1976), WATSON/LEADBETTER
(1964a,b).
12 SARDA/VIEU (1990) show how to find the ISE–minimizing bandwidth by cross–validation.
To show this effect we have plotted in Fig. 8/5 both versions of the ratio–type estimator. This
figure rests upon the data of Example 5/1 (leukaemia patients’ data). We have used the function
‘ksdensity’ of MATLAB with a positive support, the EPANECHNIKOV kernel and bandwidth 10.
Figure 8/5: Ratio–type estimates of the hazard rate for the leukaemia patients’ data
The smoothed indirect estimator was first studied by WATSON/LEADBETTER (1964a,b) for
uncensored samples. The paper of LO et al. (1989) investigates this estimator for samples with
censored observations. We briefly repeat the results of WATSON/LEADBETTER, but will not go
further into the details of indirect smoothing as it is of no great importance in practice.
WATSON/LEADBETTER use a sequence $\{\delta_n(x)\}$ of smoothing functions tending, as n → ∞, to a
DIRAC delta–function.¹³ This delta–sequence method is quite general and covers several types
of smoothing methods, including the kernel method with $\delta_n(u) = (1/b)\, K(u/b)$. The smoothed
indirect estimator of WATSON/LEADBETTER reads
\[ \hat h_n(x) = \frac{\hat f_n(x)}{1 - \hat F_n(x)} \tag{8.37a} \]
with
\[ \hat f_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(x - x_i), \tag{8.37b} \]
\[ \hat F_n(x) = \int_0^x \hat f_n(u)\, du. \tag{8.37c} \]
13 The δ–function is (informally) a generalized function on $\mathbb{R}$ that is zero everywhere except at zero, with an integral
of unity over $\mathbb{R}$. This unit impulse function may be written as
\[ \delta(x) = \begin{cases} +\infty & \text{for } x = 0 \\ 0 & \text{for } x \ne 0 \end{cases} \]
with $\int_{-\infty}^{+\infty} \delta(x)\, dx = 1$. Rigorously defined the δ–function is a distribution or a measure.
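The ratio (8.37a) is easy to evaluate when the delta sequence is a GAUSS kernel, $\delta_n(u) = (1/b)\,K(u/b)$, because the integral (8.37c) then reduces to differences of normal CDF values. The following sketch assumes exactly this choice; the function names, the data and the bandwidth are hypothetical and serve as an illustration only.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ratio_hazard(x, data, b):
    """Smoothed indirect estimator (8.37a) with a GAUSS kernel of bandwidth b."""
    n = len(data)
    # (8.37b): kernel density estimate at x
    f_hat = sum(norm_pdf((x - xi) / b) for xi in data) / (n * b)
    # (8.37c): integral of the density estimate from 0 to x
    F_hat = sum(norm_cdf((x - xi) / b) - norm_cdf(-xi / b) for xi in data) / n
    return f_hat / (1.0 - F_hat)

data = [12, 14, 15, 16, 18, 23, 35, 42, 50]   # data of Fig. 8/3, used as lifetimes
for x in (10.0, 20.0, 30.0, 40.0):
    print(x, round(ratio_hazard(x, data, 10.0), 5))
```

Since the integral starts at zero, the part of the kernel mass that falls below zero is not counted in $\hat F_n(x)$; this is one reason why boundary corrections matter for lifetime data.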
$\delta_i$ is the censoring indicator corresponding to the i-th ordered observation $y_i$: $\delta_i = 1$ stands for an
uncensored observation and $\delta_i = 0$ for a censored observation. $b_n(i, x)$ represents the bandwidth
function. The bandwidth will depend inversely on the sample size n, but we can also make it
dependent on the point x for which h(·) is to be estimated and/or on the data point $y_i$ processed
by the kernel. The numerous ways of specifying the bandwidth have been developed in order to
optimize the estimator, and here they serve as a guideline for the organization of this section.
The bandwidth function leads to different properties of the resulting hazard rate estimator whereas
the influence of the particular kernel function $K_x(\cdot)$ is only marginal except for the behavior
near the boundaries. We thus may have different kernels near the boundaries, i.e., for
$x \in [0;\, b_n(i, x))$ and $x \in (x_{end} - b_n(i, x);\, x_{end}]$, and some other kernel in the interior of the data body, i.e., for
$x \in [b_n(i, x);\, x_{end} - b_n(i, x)]$.
We will first present boundary kernels (Sect. 8.1.3.1) before we turn to different possibilities of
choosing a bandwidth (Sect. 8.1.3.2 – 8.1.3.3).
Uniform kernel
\[ {}_{U}K(u) = 0.5 \quad \text{for } -1 \le u \le 1 \]
\[ {}^{L}_{U}K_q(u) = {}_{U}K(u)\; \frac{4\,\big[\,2\,(1 - q + q^2) + 3\,(1 - q)\,u\,\big]}{(1 + q)^3} \quad \text{for } -1 \le u \le q \tag{8.41a} \]
EPANECHNIKOV kernel
\[ {}_{E}K(u) = 0.75\,(1 - u^2) \quad \text{for } -1 \le u \le 1 \]
\[ {}^{L}_{E}K_q(u) = {}_{E}K(u)\; \frac{64\,(2 - 4q + 6q^2 - 3q^3) + 240\,(1 - q)^2\, u}{(1 + q)^4\,(19 - 18q + 3q^2)} \quad \text{for } -1 \le u \le q \tag{8.41b} \]
Biweight kernel
\[ {}_{B}K(u) = \frac{15}{16}\,(1 - u^2)^2 \quad \text{for } -1 \le u \le 1 \]
\[ {}^{L}_{B}K_q(u) = {}_{B}K(u)\; \frac{64\,(8 - 24q + 48q^2 - 45q^3 + 15q^4) + 1120\,(1 - q)^3\, u}{(1 + q)^5\,(81 - 168q + 126q^2 - 40q^3 + 5q^4)} \quad \text{for } -1 \le u \le q \tag{8.41c} \]
Triweight kernel
\[ {}_{T}K(u) = \frac{35}{32}\,(1 - u^2)^3 \quad \text{for } -1 \le u \le 1 \]
\[ {}^{L}_{T}K_q(u) = {}_{T}K(u)\; \frac{256\,\big(-8\,(-16 + q\,(64 + 5q\,(-32 + q\,(43 + 7\,(-4 + q)\,q))))\big) + 80640\,(1 - q)^4\, u}{(1 + q)^6\,\big(5359 + 5q\,(-3550 + q\,(4909 + q\,(-3620 + q\,(1517 + 35\,(-10 + q)\,q))))\big)} \quad \text{for } -1 \le u \le q \tag{8.41d} \]
14 Suggested reading for this section: BOUEZMARNI/ROMBOUTS (2008), GASSER/MÜLLER (1979),
GASSER/MÜLLER/MAMMITZSCH (1985), KARUNAMUNI/ALBERTS (2005), MESSER/GOLDSTEIN (1993),
MÜLLER (1991, 1993), MÜLLER/WANG (1994).
Two other classes have been suggested by MÜLLER (1991) and MÜLLER/WANG (1994). Both
classes result as solutions of a variational problem under asymmetric support and lead to classes
of compactly supported polynomial kernel functions. The class proposed in MÜLLER/WANG
(1994) gives rise to smaller leading constants of the asymptotic MSE than the previously
suggested class. When the basic kernel is the uniform kernel the corresponding boundary kernels in
both classes are the same as (8.41a), but for other types of basic kernels the formulas are different
from formulas (8.41b–d).
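The left boundary versions of the 1991–class are
EPANECHNIKOV kernel
\[ {}^{91}_{E}K_q(u) = \frac{6\,(1 + u)(q - u)}{(1 + q)^3} \left\{ 1 + 5 \left(\frac{1 - q}{1 + q}\right)^{2} + 10\, \frac{1 - q}{(1 + q)^2}\, u \right\} \quad \text{for } -1 \le u \le q, \tag{8.42a} \]
Biweight kernel
\[ {}^{91}_{B}K_q(u) = \frac{30\,(1 + u)^2 (q - u)^2}{(1 + q)^5} \left\{ 1 + 7 \left(\frac{1 - q}{1 + q}\right)^{2} + 14\, \frac{1 - q}{(1 + q)^2}\, u \right\} \quad \text{for } -1 \le u \le q, \tag{8.42b} \]
Triweight kernel
\[ {}^{91}_{T}K_q(u) = \frac{140\,(1 + u)^3 (q - u)^3}{(1 + q)^7} \left\{ 1 + 9 \left(\frac{1 - q}{1 + q}\right)^{2} + 18\, \frac{1 - q}{(1 + q)^2}\, u \right\} \quad \text{for } -1 \le u \le q, \tag{8.42c} \]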
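The left boundary versions of the 1994–class are
EPANECHNIKOV kernel
\[ {}^{94}_{E}K_q(u) = \frac{12\,(u + 1)}{(1 + q)^4} \left[ \frac{3q^2 - 2q + 1}{2} + (1 - 2q)\, u \right] \quad \text{for } -1 \le u \le q, \tag{8.43a} \]
Biweight kernel
\[ {}^{94}_{B}K_q(u) = \frac{15\,(u + 1)^2 (q - u)}{(1 + q)^5} \left[ 2u \left( 5\, \frac{1 - q}{1 + q} - 1 \right) + (3q - 1) + \frac{5\,(1 - q)^2}{1 + q} \right] \quad \text{for } -1 \le u \le q, \tag{8.43b} \]
Triweight kernel
\[ {}^{94}_{T}K_q(u) = \frac{70\,(u + 1)^3 (q - u)^2}{(1 + q)^7} \left[ 2u \left( 7\, \frac{1 - q}{1 + q} - 1 \right) + (3q - 1) + \frac{7\,(1 - q)^2}{1 + q} \right] \quad \text{for } -1 \le u \le q, \tag{8.43c} \]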
Fig. 8/6 displays all three variants of the triweight boundary kernel.
Figure 8/6: Triweight boundary kernels
(left: linear multiple, center: MÜLLER–91, right: MÜLLER/WANG–94)
For all three classes of boundary kernels we can state the following results:
1. All kernels conform to the moment conditions of second order kernels:
\[ \int_{-1}^{q} K_q(u)\, du = 1, \qquad \int_{-1}^{q} u\, K_q(u)\, du = 0, \qquad \int_{-1}^{q} u^2\, K_q(u)\, du \ne 0. \]
2. For the right boundary the formulas are the same, but with −u instead of u.
3. For q → 1 the boundary kernel approaches the basic kernel.
4. For q → 0 the boundary kernel takes on negative values. This might lead to negative
hazard rate estimates which should be replaced by zero values.
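The moment conditions of result 1 can be checked numerically for any of the boundary kernels. The sketch below does so for the two EPANECHNIKOV versions (8.41b) and (8.43a), using a midpoint rule on [−1, q]. The function names are hypothetical and the code is an illustration only.

```python
def integrate(g, a, b, m=4000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

def epa(u):
    """Basic EPANECHNIKOV kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u)

def epa_linear(u, q):
    """Left boundary kernel (8.41b), a linear multiple of the basic kernel."""
    num = 64.0 * (2 - 4 * q + 6 * q ** 2 - 3 * q ** 3) + 240.0 * (1 - q) ** 2 * u
    den = (1 + q) ** 4 * (19 - 18 * q + 3 * q ** 2)
    return epa(u) * num / den

def epa_mw94(u, q):
    """Left boundary kernel (8.43a) of the 1994 class."""
    return 12.0 * (u + 1) / (1 + q) ** 4 * ((3 * q * q - 2 * q + 1) / 2.0
                                            + (1 - 2 * q) * u)

def moments(Kq, q):
    """Moments of order 0, 1, 2 of a boundary kernel on [-1, q]."""
    return [integrate(lambda u: u ** j * Kq(u, q), -1.0, q) for j in range(3)]

for q in (0.0, 0.5, 1.0):
    print(q, [round(v, 6) for v in moments(epa_linear, q)],
          [round(v, 6) for v in moments(epa_mw94, q)])
```

For q = 1 both functions reduce to the basic kernel, in line with result 3, and the second moment then equals the value 0.2 of Tab. 8/1.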
15 Suggested reading for this section: DIEHL/STUTE (1988), LO et al. (1989), RAMLAU–HANSEN (1983),
RICE/ROSENBLATT (1976), TANNER/WONG (1983), UZUNOGULLARI/WANG (1992), WATSON/LEADBETTER
(1964a,b), YANDELL (1983).
In (8.39) we have introduced the general form of the direct kernel estimator with a bandwidth
depending on the sample size n, the data to be processed $\{y_i\}$, and the point x for which the
hazard rate is to be estimated. The simplest case of kernel estimation is given by a bandwidth
which is independent of $\{y_i\}$ as well as of x and is fixed for a given sample size:
\[ b_n = b_n(i, x) \quad \forall\, i \text{ and } \forall\, x. \]
Sometimes this bandwidth is called global and is determined by one of the methods presented
in Sect. 8.1.1.4. Of course, the optimal, i.e., the MISE or AMISE minimizing, global bandwidth
depends on the kernel chosen and on the distributions of the lifetime variable X and of
the censoring variable Z, see (8.52b). The fixed–bandwidth kernel estimator to be
presented in this section reads
\[ \hat h_n(x) = \frac{1}{b_n} \sum_{i=1}^{n} \frac{\delta_i}{n - i + 1}\, K\!\left(\frac{x - y_i}{b_n}\right). \tag{8.44} \]
This estimator has been discussed extensively and analyzed with different techniques.
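A minimal sketch of the fixed–bandwidth estimator (8.44), assuming the EPANECHNIKOV kernel and a sample without ties. The function names and the small data set are hypothetical; the book's own computations were done with MATLAB programs that are not reproduced here.

```python
def epanechnikov(u):
    """EPANECHNIKOV kernel of Tab. 8/1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_hazard(x, times, status, b):
    """Estimator (8.44): status is 1 for a failure and 0 for a censored time."""
    pairs = sorted(zip(times, status))       # increasing order, no ties assumed
    n = len(pairs)
    total = 0.0
    for i, (y, delta) in enumerate(pairs):   # i = 0 is the text's i = 1
        total += delta / (n - i) * epanechnikov((x - y) / b)
    return total / b

# Hypothetical sample: five observed times, two of them censored.
times = [2.0, 3.0, 5.0, 7.0, 11.0]
status = [1, 0, 1, 1, 0]
for x in (2.0, 4.0, 6.0, 8.0):
    print(x, round(kernel_hazard(x, times, status, b=2.0), 4))
```

Integrating the estimate over the whole real line gives the sum of the weights $\delta_i/(n-i+1)$, so the estimator spreads each jump of the cumulated hazard rate estimate over a window of width $2\,b_n$.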
We first look at the properties of the estimator when there is no censoring, i.e., we have $\delta_i = 1\ \forall\, i$
and each observation $y_i$ is a failure time $x_i$. This case has already been studied by
WATSON/LEADBETTER (1964a,b), and subsequently by RICE/ROSENBLATT (1976) and
RAMLAU–HANSEN (1983). The estimator in this case of no censoring reads
\[ \hat h_n(x) = \frac{1}{b_n} \sum_{i=1}^{n} \frac{1}{n - i + 1}\, K\!\left(\frac{x - x_i}{b_n}\right). \tag{8.45a} \]
resulting in
\[ \hat h_n(x) = \sum_{i=1}^{n} \frac{1}{n - i + 1}\, K_b(x - x_i). \tag{8.45c} \]
The failure times are assumed to be in increasing order $x_1 < x_2 < \ldots < x_n$. Thus, the mean or
expectation of $\hat h_n(x)$ is
\[ E\big[\hat h_n(x)\big] = \sum_{i=1}^{n} \frac{1}{n - i + 1} \int_0^\infty K_b(x - u)\, f_{i:n}(u)\, du \tag{8.46a} \]
where
\[ f_{i:n}(u) = \frac{n!}{(i - 1)!\,(n - i)!}\, F(u)^{i-1}\, \big[1 - F(u)\big]^{n-i} f(u) \tag{8.46b} \]
is the PDF of the i-th order statistic. Upon inserting (8.46b) into (8.46a) and changing the order
of summation and integration we have
\[ E\big[\hat h_n(x)\big] = \int_0^\infty \sum_{i=1}^{n} \binom{n}{i-1} F(u)^{i-1}\, \big[1 - F(u)\big]^{n-i} K_b(x - u)\, f(u)\, du. \tag{8.46c} \]
As
\[ \sum_{i=1}^{n} \binom{n}{i-1} F(u)^{i-1}\, \big[1 - F(u)\big]^{n-i} = \frac{1 - F(u)^n}{1 - F(u)} \]
and observing $h(x) = f(x) / [1 - F(x)]$ we finally arrive at
\[
\begin{aligned}
E\big[\hat h_n(x)\big] &= \int_0^\infty \big[1 - F(u)^n\big]\, h(u)\, K_b(x - u)\, du \\
&= \int_0^\infty K_b(x - u)\, h(u)\, du - \int_0^\infty F(u)^n\, h(u)\, K_b(x - u)\, du.
\end{aligned} \tag{8.46d}
\]
If x is such that F(x) < 1, the second term tends to zero geometrically as n → ∞. If f(x) is
twice continuously differentiable the first term of (8.46d) can be expanded in a TAYLOR series
analogous to (8.14a,b):
\[
\begin{aligned}
\int_0^\infty \frac{1}{b_n}\, K\!\left(\frac{x - u}{b_n}\right) \frac{f(u)}{1 - F(u)}\, du &= \int K(v)\, \frac{f(x - b_n v)}{1 - F(x - b_n v)}\, dv \\
&= \frac{f(x)}{1 - F(x)} + \frac{b_n^2}{2} \left[ \frac{f(x)}{1 - F(x)} \right]'' \int K(v)\, v^2\, dv + o\big(b_n^2\big).
\end{aligned} \tag{8.46e}
\]
In (8.46e) we see that the bias of $\hat h_n(x)$ depends on the second derivative of $h(x) = f(x) / [1 - F(x)]$,
but $\hat h_n(x)$ is asymptotically unbiased.
WATSON/LEADBETTER (1964a,b) give the exact variance of $\hat h_n(x)$ as
\[
\begin{aligned}
\operatorname{Var}\big[\hat h_n(x)\big] ={}& \int_0^\infty \frac{K^2(x - u)}{1 - F(u)}\; I_n\big[F(u)\big]\, dF(u) \\
&+ 2 \iint_{0 \le u_1 \le u_2 < \infty} \frac{K(x - u_1)\, K(x - u_2)}{1 - F(u_2)} \left[ F(u_2)^n\, \frac{1 - F(u_1)^n}{1 - F(u_1)} - \frac{F(u_2)^n - F(u_1)^n}{F(u_2) - F(u_1)} \right] dF(u_1)\, dF(u_2)
\end{aligned} \tag{8.46f}
\]
where
\[ I_n(F) = \int_0^{1-F} \frac{(F + B)^n - F^n}{B}\, dB. \]
Formula (8.46f) is difficult to appraise for finite n, but as n → ∞ only the first term needs to be
considered with the result that
\[ \operatorname{Var}\big[\hat h_n(x)\big] \approx \frac{1}{n\, b_n}\, \frac{h(x)}{1 - F(x)} \int K^2(u)\, du. \tag{8.46g} \]
This variance formula is similar in construction to that of the PDF kernel estimator given in
(8.17a). Furthermore, at each continuity point of h(x) the estimator is asymptotically normally
distributed.
We now look at the properties of $\hat h_n(x)$ when there is censoring. We assume a random censorship
model and observe $Y_i = \min(X_i, Z_i)$ together with $\delta_i = I_{(X_i \le Z_i)}$. The $X_i$; i = 1, 2, . . . , n; are
i.i.d. lifetimes with CDF $F_X(\cdot)$ and PDF $f_X(\cdot)$. The $X_i$ are independent of the censoring variables
$Z_1, \ldots, Z_n$ each having CDF $F_Z(\cdot)$ and PDF $f_Z(\cdot)$. The CDF of $Y_i$ will be denoted $F_Y(\cdot)$ and is
given via
\[ 1 - F_Y(t) = \big[1 - F_X(t)\big]\, \big[1 - F_Z(t)\big] \tag{8.47a} \]
and the PDF of the $Y_i$ follows as
\[ f_Y(t) = f_X(t)\, \big[1 - F_Z(t)\big] + f_Z(t)\, \big[1 - F_X(t)\big]. \tag{8.47b} \]
The observations $y_i$ are assumed increasingly ordered together with the corresponding indicator
$\delta_i$.
Let
\[ m(t) = \frac{f_X(t)\, \big[1 - F_Z(t)\big]}{f_Y(t)} \quad \text{if } f_Y(t) > 0, \tag{8.48a} \]
then, see TANNER/WONG (1983),
\[ E(\delta_i \mid Y_i = t) = m(t) \tag{8.48b} \]
\[ E(\delta_i\, \delta_j \mid Y_i = t,\, Y_j = s) = m(t)\, m(s) \quad \forall\ i < j,\ t < s. \tag{8.48c} \]
Using (8.48b,c) the derivation of the formulas for $E\big[\hat h_n(x)\big]$ and for $\operatorname{Var}\big[\hat h_n(x)\big]$ proceeds in
essentially the same way as in WATSON/LEADBETTER (1964a,b) for the uncensored case. To
illustrate the idea,
\[
\begin{aligned}
E\big[\hat h_n(x)\big] &= \sum_{i=1}^{n} \int_0^\infty \frac{E(\delta_i \mid Y_i = u)}{n - i + 1}\, K_b(x - u)\, f^{Y}_{i:n}(u)\, du \\
&= \int_0^\infty \left[ \sum_{i=1}^{n} \binom{n}{i-1} F_Y(u)^{i-1}\, \big[1 - F_Y(u)\big]^{n-i} \right] f_Y(u)\, m(u)\, K_b(x - u)\, du \\
&= \int_0^\infty \frac{1 - F_Y(u)^n}{1 - F_Y(u)}\, f_X(u)\, \big[1 - F_Z(u)\big]\, K_b(x - u)\, du \\
&= \int_0^\infty \big[1 - F_Y(u)^n\big]\, h_X(u)\, K_b(x - u)\, du,
\end{aligned} \tag{8.49}
\]
observing (8.47a), (8.48a) and $h_X(u) = f_X(u) / [1 - F_X(u)]$. The only difference between (8.49)
and (8.46d) is that we have to use the CDF of Y instead of the CDF of X in the censoring case.
The asymptotic variance (n → ∞) in the censoring case is similar to (8.46g):
\[ \operatorname{Var}\big[\hat h_n(x)\big] \approx \frac{1}{n\, b_n}\, \frac{h_X(x)}{\big[1 - F_X(x)\big]\, \big[1 - F_Z(x)\big]} \int K^2(u)\, du. \tag{8.50} \]
depends on the order of the kernel, the bandwidth and the differentiability of the hazard rate.
Typically, the order of the kernel is chosen to be an even number with k = 2 being the standard
choice. The resulting bias and variance are
\[ \operatorname{Bias}\big[\hat h_n(x)\big] = b_n^k\, h^{(k)}(x)\, \frac{(-1)^k}{k!} \int u^k\, K(u)\, du + o(1) \tag{8.51a} \]
\[ \operatorname{Var}\big[\hat h_n(x)\big] = \frac{1}{n\, b_n}\, \frac{h(x)}{\big[1 - F_X(x)\big]\, \big[1 - F_Z(x)\big]} \int K^2(u)\, du + o(1) \tag{8.51b} \]
The influence of the bandwidth $b_n$ and the trade–off between the bias and the variance is seen
from (8.51a) and (8.51b). The optimal rate for the MSE of $\hat h_n(x)$ is attained when the squared bias
and the variance are of the same order. This results in an optimal MSE rate of convergence of
$n^{2k/(2k+1)}$, which is $n^{4/5}$ for the standard choice of k = 2. This rate is slower than the usual rate
of n regardless of the order k. For the asymptotic distribution we further assume that
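\[ d = \lim_{n \to \infty} n\, b_n^{2k+1} \]
exists. Then
\[ \sqrt{n\, b_n}\; \big[\hat h_n(x) - h(x)\big] \xrightarrow{\ D\ } No\!\left( \sqrt{d}\; h^{(k)}(x)\, \frac{(-1)^k}{k!} \int u^k\, K(u)\, du\, ;\ \frac{h(x)}{\big[1 - F_X(x)\big]\, \big[1 - F_Z(x)\big]} \int K^2(u)\, du \right). \tag{8.51c} \]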
Extensions to the estimation of derivatives $h^{(k)}(x)$ of the hazard function can be found in
MÜLLER/WANG (1990b). These essentially involve a change in the kernel. Derivatives are of
interest to detect rapid changes in the hazard rate or for data based bandwidth choice.
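To find the optimal global bandwidth, we have to restrict the range of x to a compact interval
[0, τ] with $F_X(\tau) < 1$ and $F_Z(\tau) < 1$. The global optimal bandwidth which minimizes the
leading term of
\[ \operatorname{MISE}\big[\hat h_n(x)\big] = E \int_0^\tau \big[\hat h_n(u) - h(u)\big]^2\, du \tag{8.52a} \]
is
\[ b_{opt} = \left\{ \frac{1}{2\, n\, k}\; \frac{\displaystyle \int_0^\tau \frac{h(u)}{\big[1 - F_X(u)\big]\, \big[1 - F_Z(u)\big]}\, du\; \int K^2(u)\, du}{\displaystyle \left[ \frac{(-1)^k}{k!} \int u^k\, K(u)\, du \right]^2 \int_0^\tau \big[h^{(k)}(u)\big]^2\, du} \right\}^{1/(2k+1)}. \tag{8.52b} \]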
This optimal global bandwidth involves unknown quantities, so that in practice one has to find
alternatives. There is an extensive literature on bandwidth selection, see Sect. 8.1.1.4.
One way to pick a good bandwidth is to use a cross–validation technique for determining the
bandwidth that minimizes some measure of how well the estimate performs. One such measure
is the mean integrated squared error (MISE) of $\hat h_n(x)$ over the range 0 to τ, see (8.52a), defined
by
\[
\begin{aligned}
\operatorname{MISE}\big[\hat h_n(x)\big] &= E \int_0^\tau \big[\hat h_n(u) - h(u)\big]^2\, du \\
&= E \int_0^\tau \hat h_n^2(u)\, du - 2\, E \int_0^\tau \hat h_n(u)\, h(u)\, du + E \int_0^\tau h^2(u)\, du.
\end{aligned} \tag{8.53a}
\]
This function depends both on the kernel used to estimate h(·) and on the bandwidth $b_n$. Note
that, although the last term in (8.53a) depends on the unknown hazard rate, it is independent of
the choice of the kernel and of the bandwidth and can be ignored when finding the best value of
$b_n$. The first term of (8.53a) can be estimated by $\int_0^\tau \hat h_n^2(u)\, du$. If we evaluate $\hat h_n(\cdot)$ at a grid of
points $0 < u_1 < \ldots < u_\ell = \tau$, then we find an approximation to this integral by some formula
of numerical integration over $\hat h_n^2(\cdot)$, e.g., by the trapezoid rule as
\[ \int_0^\tau \hat h_n^2(u)\, du \approx \sum_{i=1}^{\ell-1} \frac{u_{i+1} - u_i}{2}\, \Big[ \hat h_n^2(u_i) + \hat h_n^2(u_{i+1}) \Big]. \tag{8.53b} \]
where the sum is over all observed times between 0 and τ. Thus, to find the best value of $b_n$ which
minimizes the MISE for a fixed kernel, we find $b_n$ which minimizes the function
\[ g(b_n) = \sum_{i=1}^{\ell-1} \frac{u_{i+1} - u_i}{2}\, \Big[ \hat h_n^2(u_i) + \hat h_n^2(u_{i+1}) \Big] - \frac{2}{b_n} \sum_{i \ne j} K\!\left(\frac{y_i - y_j}{b_n}\right) \frac{\delta_i}{n - i + 1}\, \frac{\delta_j}{n - j + 1}. \tag{8.53d} \]
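The criterion (8.53d) can be evaluated over a list of candidate bandwidths and the minimizer picked. The sketch below assumes the EPANECHNIKOV kernel, untied observations and the estimator (8.44); the helper names, the data, the grid and the candidate bandwidths are all hypothetical.

```python
def epanechnikov(u):
    """EPANECHNIKOV kernel of Tab. 8/1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def hazard_weights(times, status):
    """Ordered times y_i and weights delta_i / (n - i + 1)."""
    pairs = sorted(zip(times, status))
    n = len(pairs)
    return [y for y, _ in pairs], [d / (n - i) for i, (_, d) in enumerate(pairs)]

def kernel_hazard(x, ys, w, b):
    """Fixed-bandwidth estimator (8.44) written with precomputed weights."""
    return sum(wi * epanechnikov((x - yi) / b) for yi, wi in zip(ys, w)) / b

def cv_criterion(b, times, status, grid):
    """Cross-validation function g(b_n) of (8.53d)."""
    ys, w = hazard_weights(times, status)
    h2 = [kernel_hazard(u, ys, w, b) ** 2 for u in grid]
    # First term: trapezoid rule (8.53b) over the grid.
    first = sum((grid[i + 1] - grid[i]) / 2.0 * (h2[i] + h2[i + 1])
                for i in range(len(grid) - 1))
    # Second term: double sum over all pairs i != j.
    second = sum(epanechnikov((ys[i] - ys[j]) / b) * w[i] * w[j]
                 for i in range(len(ys)) for j in range(len(ys)) if i != j)
    return first - 2.0 / b * second

times = [2.0, 3.0, 5.0, 7.0, 11.0, 12.0, 14.0]   # hypothetical sample
status = [1, 0, 1, 1, 0, 1, 1]
grid = [0.1 * i for i in range(1, 141)]          # 0.1, ..., 14.0 = tau
candidates = [1.0, 2.0, 3.0, 4.0, 5.0]
best = min(candidates, key=lambda b: cv_criterion(b, times, status, grid))
print("bandwidth minimizing g:", best)
```

In practice the candidate list would be finer and the grid would match the one used for plotting the estimate.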
One also has to find alternatives to calculate the variance (8.51b) which contains unknown
quantities. One possibility is to use $\hat h_n(x)$ for h(x) and $\exp\!\big[-\int_0^x \hat h_n(u)\, du\big]$ for $1 - F_X(x)$, where the
integral may be evaluated by the trapezoid rule, and to neglect $1 - F_Z(x)$, which will lead to an
underestimation of $\operatorname{Var}\big[\hat h_n(x)\big]$ for censored samples. Another possibility has been suggested
by KLEIN/MOESCHBERGER (1997, p. 153). Since the kernel–smoothed estimator is a linear
combination of the increments of the cumulated hazard rate
\[ \hat h_n(x) = \frac{1}{b_n} \sum_{i=1}^{n} K\!\left(\frac{x - y_i}{b_n}\right) \Delta \hat H_n(y_i) \]
with
\[ \Delta \hat H_n(y_i) = \frac{\delta_i}{n - i + 1} \]
a crude estimator of $\operatorname{Var}\big[\hat h_n(x)\big]$ follows as
\[ \widehat{\operatorname{Var}}\big[\hat h_n(x)\big] = \frac{1}{b_n^2} \sum_{i=1}^{n} K^2\!\left(\frac{x - y_i}{b_n}\right) \Delta \widehat{\operatorname{Var}}\big[\hat H_n(y_i)\big] \tag{8.54a} \]
with $\widehat{\operatorname{Var}}\big[\hat H_n(x)\big]$ given in (5.12b) so that
\[ \Delta \widehat{\operatorname{Var}}\big[\hat H_n(y_i)\big] = \frac{\delta_i\, (n - i)}{(n - i + 1)^3}. \tag{8.54b} \]
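The crude variance estimator (8.54a) with increments (8.54b) takes only a few lines. The sketch assumes the EPANECHNIKOV kernel and untied observations; the function names and the data are hypothetical.

```python
import math

def epanechnikov(u):
    """EPANECHNIKOV kernel of Tab. 8/1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def hazard_variance(x, times, status, b):
    """Crude variance estimator (8.54a) using the increments (8.54b)."""
    pairs = sorted(zip(times, status))
    n = len(pairs)
    total = 0.0
    for i, (y, delta) in enumerate(pairs):             # i = 0 is the text's i = 1
        increment = delta * (n - i - 1) / (n - i) ** 3  # delta_i (n-i) / (n-i+1)^3
        total += epanechnikov((x - y) / b) ** 2 * increment
    return total / b ** 2

times = [2.0, 3.0, 5.0, 7.0, 11.0]    # hypothetical sample
status = [1, 0, 1, 1, 0]
for x in (3.0, 5.0, 7.0):
    var = hazard_variance(x, times, status, b=3.0)
    print(x, round(var, 5), "standard error:", round(math.sqrt(var), 5))
```

The largest observation never contributes, because its increment (8.54b) is zero.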
The following data are from BRYSON/SIDDIQUI (1961) and represent the ordered times (in days) at death
of n = 43 patients suffering from chronic granulocytic leukemia with x = 0 taken as the patient’s date of
diagnosis. This is an uncensored sample.
Fig. 8/7 shows the hazard rate estimated with four different kernels. The grid has 100 evenly spaced points
between 0 and 2509 in each case and the bandwidth is bn = 250 for all cases. The resulting hazard rate
is nearly constant for 0 < x < 1600 and is increasing for x ≥ 1600. With the exception of the uniform
kernel the other three kernels produce nearly the same hazard rates.
In Fig. 8/8 we find four estimated hazard rates, each using the EPANECHNIKOV kernel with a grid of 100
evenly spaced points between 0 and 2509, and bandwidths 600, 300, 200, 100, respectively. $b_n = 600$ gives
an oversmoothed estimate whereas $b_n = 100$ produces a rather erratic course. $b_n = 300$ and $b_n = 200$
seem to be good compromises. The computations for this example have been done by using the MATLAB
program Hazard 04 of the appended software package Inference.zip.
Figure 8/7: Hazard rate estimates for the survival time of 43 patients having granulocytic leukemia;
different kernels and common bandwidth $b_n = 250$
Figure 8/8: Hazard rate estimates for the survival time of 43 patients having granulocytic leukemia;
different bandwidths and common EPANECHNIKOV kernel
Up to now we have presented results and formulas for samples with untied observations. The key
formulas have to be modified properly when the sample has tied observations where the ties may
occur among the uncensored as well as among the censored observations. We have the following
notation, also see Fig. 4/1:
$d_j \ge 1$: number of failures at $x_j$,
and the crude estimator of $\operatorname{Var}\big[\hat h_n(x)\big]$ in (8.54a) now reads
\[ \widehat{\operatorname{Var}}\big[\hat h_n(x)\big] = \frac{1}{b_n^2} \sum_{j=1}^{k} K^2\!\left(\frac{x - x_j}{b_n}\right) \frac{d_j\, (n_j - d_j)}{n_j^3} \tag{8.55b} \]
and the function $g(b_n)$ of (8.53d) becomes
\[ g(b_n) = \sum_{i=1}^{\ell-1} \frac{u_{i+1} - u_i}{2}\, \Big[ \hat h_n^2(u_i) + \hat h_n^2(u_{i+1}) \Big] - \frac{2}{b_n} \sum_{i \ne j} K\!\left(\frac{x_i - x_j}{b_n}\right) \frac{d_i}{n_i}\, \frac{d_j}{n_j}. \tag{8.55c} \]
The fixed–bandwidth kernel estimator of the hazard rate has many good properties, e.g.,
asymptotic unbiasedness, mean square error consistency, asymptotic normality, but in practical
application it has been observed that a globally constant bandwidth leads to undesirable effects whenever
the data are not evenly distributed over the whole range of interest. The fixed–bandwidth kernel
estimator cannot adapt to unevenness in the distribution of the data and thus tends to oversmooth
in regions with many observations and to undersmooth in regions with few observations, revealing
many misleading peaks. One approach to overcome these problems of the fixed–bandwidth
estimator is to incorporate the idea of nearest neighbor into the definition of the bandwidth. The
resulting estimator has a bandwidth which adapts to the configuration of the data. There are two
estimators emerging from this idea, depending on the point from which the k-th
nearest neighbor is sought.
1. The local kernel estimator takes the distance from x, the point of interest where to estimate
h(x), to its k-th nearest neighbor among the observations as bandwidth.
2. The variable kernel estimator defines the bandwidth as distance from $X_i$, the i-th
uncensored observation in ascending order, to its k-th nearest neighbor among the observations.
• How to find the k-th nearest neighbor when there are censored observations? This
question has been answered in different ways as will be shown further down.
16 Suggested reading for this section: BAGKAVOS/PATIL (2009), CHENG (1987), DETTE/GEFELLER (1995),
GEFELLER/DETTE (1991, 1992), GEFELLER/MICHELS (1992), HESS et al. (1999), MÜLLER/WANG (1990b,
1994), NIELSEN (2003), SCHÄFER (1985, 1986), TANNER (1983, 1984), TANNER/WONG (1984).
We first turn to the local kernel estimator for a sample of n uncensored and distinct observations
$X_1, X_2, \ldots, X_n$. In this case, the kernel estimator of h(x) is defined as
\[ \hat h(x) = \sum_{i=1}^{n} \frac{1}{n - i + 1}\, \frac{1}{D(k_n, x)}\, K\!\left(\frac{x - X_i}{D(k_n, x)}\right), \tag{8.57} \]
where $D(k_n, x)$ is the distance of x to its k-th nearest neighbor among $X_1, X_2, \ldots, X_n$.
DETTE/GEFELLER (1995) have derived the asymptotic MISE of this estimator. TANNER (1983)
has shown that the estimator (8.57) converges (almost surely) to h(·) at each point of continuity of
h(·), provided that the sequence $k_n$ fulfills the condition $k_n = n^\alpha$, α ∈ (0.5, 1). Other versions of
the estimator $\hat h(x)$ have been considered by LIU/VAN RYZIN (1985) and CHENG (1987). These
authors have proved asymptotic normality and strong consistency. Mathematical disadvantages
of (8.57) are due to the fact that the bandwidth function defined by the k-th nearest neighbor distance is
not differentiable at all x > 0 and that the integral from 0 to ∞ over (8.57) is not bounded in
general.
The problem of transferring appropriately the definition of D(kn , x) from the uncensored to the
censored setting has been solved in different ways.
SCHÄFER (1985) points out that these distances are biased by the censoring distribution in
the sense that they adapt to the conditional density of $X_i$ under the condition $X_i \le Z_i$ of
being uncensored rather than to the density function f(·) or the hazard rate to be estimated.
\[ D_2(k_n, x) = \sup\left\{ d > 0 \ \Big|\ \hat H_n(x + d) - \hat H_n(x - d) \le \frac{k_n - 1}{n} \right\} \tag{8.59a} \]
resulting in
\[ \hat h(x) = \sum_{i=1}^{n} \frac{\delta_i}{n - i + 1}\, \frac{1}{D_2(k_n, x)}\, K\!\left(\frac{x - Y_i}{D_2(k_n, x)}\right). \tag{8.59b} \]
17 The paper of TANNER gives a FORTRAN–code for the variable kernel estimator of the hazard rate.
a) “Even if no censored data were observed in the sample, $D_2(k_n, x)$ is not identical to
$D(k_n, x)$ of (8.57), i.e., $D_2(k_n, x)$ does not reveal the natural definition of nearest
neighbor distances in the uncensored setting.”
b) “In addition, one inherent property of $\hat H_n(\cdot)$ has an awkward effect on the definition
of nearest neighbor distances: the heights of the steps in $\hat H_n(\cdot)$ increase automatically
by definition as $x \to Y_n$. Consequently, in the right tail of the lifetime distribution
this effect dominates the value $(k_n - 1)/n$ used in the definition of $D_2(\cdot, \cdot)$.”
3. To avoid the problems mentioned in a) and b) above, the authors DETTE and GEFELLER
propose a modification of SCHÄFER’s idea defining
\[ D_3(k_n, x) = \sup\left\{ d > 0 \ \Big|\ \hat S_n(x - d) - \hat S_n(x + d - 0) \le \frac{k_n - 1}{n} \right\} \tag{8.60} \]
where $\hat S_n(x + d - 0)$ denotes the limit from the left of the KAPLAN/MEIER estimator of
S(·), see (5.3d), at the point x + d. Using $\hat S_n(\cdot)$ instead of $\hat H_n(\cdot)$ resolves the drawbacks
of $D_2(\cdot, \cdot)$.
In an intensive simulation study GEFELLER/DETTE (1992) found that the kernel hazard rate
estimator based on $D_3(\cdot, \cdot)$ always had a smaller MISE than that based on $D_1(\cdot, \cdot)$ or $D_2(\cdot, \cdot)$.
We now turn to the variable kernel estimator of the hazard rate which has been proposed by
TANNER/WONG (1984) using a bandwidth which is the distance $D(k_n, X_i)$ between the observation
$X_i$ and its k-th nearest neighbor among the remaining uncensored observations:
\[ \hat h_n(x) = \sum_{i=1}^{n} \frac{1}{n - i + 1}\, \frac{1}{D(k_n, X_i)}\, K\!\left(\frac{x - X_i}{D(k_n, X_i)}\right), \tag{8.61} \]
assuming a sample with uncensored data. This definition of the bandwidth is independent of the
points of interest x and adapts only to the configuration of the data. The kernel estimator (8.61)
is differentiable for appropriate kernel functions K(·) at all x > 0 and its integral is bounded.
We mention that contrary to the hazard rate estimator with fixed bandwidth, the bandwidth here
is not globally constant and contrary to the local kernel estimators (8.57) – (8.59) the number
of observations influencing $\hat{h}(x)$ is not fixed either. Statistical properties of (8.61), e.g., uniform
convergence to h(x), have been derived by SCHÄFER (1985).
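The variable kernel estimator (8.61) can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the FORTRAN code of TANNER or the MATLAB programs of the software package: it assumes an uncensored sample without ties (a tie would give a zero bandwidth), it uses the biweight kernel, and the function names `biweight`, `knn_distance` and `variable_kernel_hazard` are hypothetical.

```python
def biweight(u):
    """Biweight kernel K(u) = (15/16) (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def knn_distance(data, i, k):
    """Distance from data[i] to its k-th nearest neighbor among the other points."""
    dists = sorted(abs(data[i] - xj) for j, xj in enumerate(data) if j != i)
    return dists[k - 1]


def variable_kernel_hazard(x, sample, k):
    """Evaluate the variable kernel hazard estimate (8.61) at the point x.

    Assumes an uncensored sample with distinct values.
    """
    data = sorted(sample)
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data, start=1):      # i is the rank of X_i
        d = knn_distance(data, i - 1, k)        # bandwidth D(k, X_i) attached to X_i
        total += biweight((x - xi) / d) / ((n - i + 1) * d)
    return total


# Toy sample: every point has nearest-neighbor distance 1 when k = 1.
h_at_2 = variable_kernel_hazard(2.0, [1.0, 2.0, 3.0, 4.0], k=1)
print(h_at_2)
```

In the toy call only the observation at 2 contributes, with weight 1/(n − i + 1) = 1/3 and kernel value K(0) = 15/16. Note that the bandwidth depends on the observation, not on the point x at which the estimate is evaluated.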
Example 8/3: Local and variable kernel estimators of the hazard rate
Using the data of the preceding Example 8/2 Fig. 8/9 shows the smoothed hazard estimated by the local and
the variable kernel method, respectively, taking the biweight kernel with k = 13 and 100 gridpoints. Evi-
dently and ceteris paribus the variable kernel method produces a smoother curve. The computations have
been done with the help of the MATLAB programs Hazard 05 and Hazard 06 of the appended software
package Inference.zip.
Figure 8/9: Hazard rate estimates for the survival time of 43 patients having granulocytic leukemia using
a local and a variable biweight kernel with k = 13 and 100 gridpoints
A kernel–based estimator with varying bandwidths can be found by optimizing the bandwidth for
a given point of interest x. For kernel estimators, the bandwidth regulates the trade–off between
local bias and local variance. Thus, another way to overcome the non–adaptive behavior of the
fixed–bandwidth estimator is to vary the local bandwidth in order to balance local variance and
local bias. A natural objective function is the local MSE expressing the estimation error as a
function of local bias and local variance. An optimal local bandwidth kernel estimator can be
found by minimizing an estimate of the local MSE with respect to the bandwidth. This problem
has been solved by MÜLLER/WANG (1990b, 1994) who also give a MATLAB program named
HADES for doing this job.18
These places are called knots and in hazard rate estimation they — generally — are the observed
lifetimes Xi . The most commonly used splines are cubic splines, i.e., they consist of polynomials
of order 3 and their first derivatives coincide at the knots so that there are no sharp edges in the
curvature.
There are several types of spline methods; the most widely investigated one for hazard
smoothing is the penalized likelihood approach. Let η(x) = ln h(x) be the log hazard rate. Then
the log likelihood function for the censored data is
\[
L(\eta) = \sum_{i=1}^{n} \left[ \delta_i\, \eta(Y_i) - \int_0^{Y_i} e^{\eta(x)}\, dx \right], \tag{8.62a}
\]
which is unbounded if no shape restriction on η(·) is imposed. A penalty P (η), measuring the
roughness of η(·), is therefore introduced in (8.62a). The penalized estimator ηb(·) of η(·) is the
maximum of the penalized log likelihood
\[
L(\eta) = \frac{1}{n} \sum_{i=1}^{n} \left[ \delta_i\, \eta(Y_i) - \int_0^{Y_i} e^{\eta(x)}\, dx \right] - \frac{\alpha}{2}\, P(\eta) \tag{8.62b}
\]
among all η(·) in a HILBERT space. α is a smoothing parameter playing the same role as the
bandwidth b in kernel estimation. A smaller α yields a better fit but a rougher curve. JARJOURA
(1988) describes how to determine the smoothing parameter α by a cross–validation likelihood
approach.
The penalty function P(η) determines the kind of spline resulting from (8.62b). With
\[
P(\eta) = \int \left[ \eta^{(2)}(x) \right]^2 dx \tag{8.62c}
\]
we find an estimator which is twice continuously differentiable and a piecewise cubic polynomial
between two consecutive $X_i$'s. For computing details see the papers of O’SULLIVAN (1988a,b)
and for asymptotic results see GU (1996). ANDERSON/SENTHILSELVAN (1980) take
\[
P(\eta) = \int \left[ h'(x) \right]^2 dx \tag{8.62d}
\]
which leads to $\hat{h}(x)$ as a piecewise quadratic spline that may result in negative values under heavy
censoring. Another type of spline method is regression splines or B–splines, which adopt a fixed
number of knots and basis functions. ROSENBERG (1995) describes how to select the number
and the location of the knots.
Wavelet–based hazard rate estimation has been treated by many authors for a long time. A
wavelet is a wave–like oscillation with an amplitude that begins at zero, increases, and then de-
creases back to zero. For instance, it can be visualized as a brief oscillation like one might see
recorded by a seismograph or heart monitor.
Let φ be the so–called ‘father wavelet’ and ψ the ‘mother wavelet’.21 We assume that the functions
• A plot provides a complete and easy–to–grasp picture of data according to an old Chinese
proverb saying that one picture is worth a thousand words, or in the present context, a
thousand numbers. A plot is particularly useful in presenting data, since it aids in convincing
others of conclusions drawn from data by numerical methods.
• A plot provides a convenient means of fitting a theoretical distribution to data. This can
be done by drawing a straight line by eye through the plotted data points on specialized
graph paper. This line is used to smooth, interpolate and extrapolate data. Estimates of
distribution parameters, percentiles, and predictions of number of failed and unfailed units
in specified periods of time are easily obtained from this straight line.
• A plot allows one to assess whether a chosen theoretical distribution provides an adequate
fit to the data or not. The data points will tend to plot as a straight line on the plotting paper
for a satisfactory distribution. Non–random departures of the plotted data from a straight
line can provide useful information to the statistician. Such departures may indicate that
the chosen distribution is incorrect, that there is more than one failure mode, or that certain
data points are outliers that do not fit in with the rest of the data.
The object of plotting in this chapter is the cumulative hazard function H(x). For this purpose
we first have to find estimates of H(x), see Sect. 5.2., which then will be displayed either on
normal (= naturally or linearly scaled) graph paper or on specialized graph paper as presented
in Sect. 9.2. The plot on normal graph paper serves to make non–parametric inference whereas
the plot on special hazard paper aims at making inference on a hypothetical parametric lifetime
distribution. We close this chapter by a section on how to find the appropriate hazard–scale for
distributions belonging to the location–scale family.
Hazard plotting — like probability plotting — can be successfully applied to location–scale dis-
tributions2 and to those distributions that after suitable transformation can be converted into a
location–scale type. Hazard plots and probability plots are closely related to one another, the
main difference is the scaling of the ordinate where to lay down the CHR–values instead of the
CDF–values and the choice of the plotting position, i.e., the ordinate–value to be plotted against
the ordered sample values xi on the abscissa. Each member of the location–scale family has an
ordinate–scaling of its own, distorted in such a way that, when the sample comes from the per-
tinent distribution, the plotted points on the graph paper will randomly scatter around a straight
line, thus giving a graphical and informal goodness–of–fit test. The fitted line — either fitted by
eye or by regression — enables the statistician to read off estimates of the location parameter and
of the scale parameter, respectively, either as a point or as an interval on the abscissa.
We will first demonstrate how to construct a hazard paper and how to find estimates of the location
and scale parameters for a genuine location–scale distribution and for distributions transformed
to location–scale type. Then we comment on the choice of the plotting position.
A random variable X is said to belong to the location–scale family when its CDF
FX (x | a, b) = Pr(X ≤ x | a, b) (9.1a)
SX (x | a, b) = 1 − FX (x | a, b) (9.1e)
SY (y) = 1 − FY (y). (9.1f)
Let Λ, Λ ≥ 0, be a value of the CHR, then the hazard quantile of order Λ in the reduced case is
and consequently
xΛ = a + b yΛ . (9.1i)
The hazard paper for a location–scale distribution is now constructed by taking the horizontal
axis (abscissa) for x or xΛ and the vertical axis (ordinate) for y or yΛ 4 where the labeling of this
axis is according to the corresponding CHR–value Λ. This procedure gives a scaling with respect
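to Λ which is non–linear; the only exception is the exponential distribution,5 where Λ and $y_\Lambda$
coincide:
\[
\begin{aligned}
F_X(x \mid a, b) &= 1 - \exp\left( -\frac{x - a}{b} \right) = F_Y(y) \\
H_X(x \mid a, b) &= \frac{x - a}{b} = y = H_Y(y) = \Lambda \\
y_\Lambda &= H_Y^{-1}(\Lambda) = \Lambda.
\end{aligned}
\]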
The probability grid on a probability paper and the hazard grid on a hazard paper for one and the
same distribution are related to one another because
Λ = − ln(1 − P ) (9.1j)
P = 1 − exp(−Λ), (9.1k)
where P is a given value of the CDF. Thus, a probability grid may be used for hazard plotting
when the P –scaling on the ordinate is supplemented by a Λ–scaling, see Fig. 9/1. Conversely, a
hazard paper may be used for probability plotting.
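The relations (9.1j) and (9.1k) are easy to tabulate when a Λ–scale has to be added to an existing probability grid. The following Python fragment is a minimal sketch of that conversion; the function names `prob_to_hazard` and `hazard_to_prob` are hypothetical and not part of the appended software.

```python
import math


def prob_to_hazard(p):
    """Lambda = -ln(1 - P), see (9.1j); requires 0 <= P < 1."""
    return -math.log1p(-p)


def hazard_to_prob(lam):
    """P = 1 - exp(-Lambda), see (9.1k); requires Lambda >= 0."""
    return -math.expm1(-lam)


# Supplement a few P-labels of a probability grid with their Lambda-labels.
for p in (0.10, 0.50, 0.90, 0.99):
    print(f"P = {p:4.2f}  ->  Lambda = {prob_to_hazard(p):.4f}")
```

`log1p` and `expm1` are used only for numerical accuracy near P = 0 and Λ = 0; mathematically they are the formulas (9.1j) and (9.1k).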
The extreme value distribution of type I for the minimum, the log–W EIBULL distribution, is a
genuine location–scale distribution:
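\[
F_X(x \mid a, b) = 1 - \exp\left\{ -\exp\left( \frac{x - a}{b} \right) \right\}; \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R}, \tag{9.2a}
\]
\[
F_Y(y) = 1 - \exp\left\{ -\exp(y) \right\}, \quad y \in \mathbb{R}. \tag{9.2b}
\]
\[
y_\Lambda = \ln \Lambda \tag{9.2d}
\]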
4
Some authors construct hazard paper by interchanging the axes. This approach has some justification when
looking at (9.1i), where the dependent variable is xΛ which normally is laid down on the ordinate, and yΛ is the
independent variable, to be displayed on the abscissa.
5
For a discrete distribution the exception is the geometric distribution with
\[
\Lambda = H_Y(y) = P\, y; \quad y = 0, 1, 2, \ldots, \ 0 \le P < 1
\]
and
\[
y_\Lambda = H_Y^{-1}(\Lambda) = \frac{\Lambda}{P}.
\]
xΛ = a + b yΛ = a + b ln Λ. (9.2e)
The hazard paper of the log–W EIBULL distribution has a logarithmic scale on the ordinate and a
linear scale on the abscissa. For didactic reasons we have given three scalings in Fig. 9/1, the Λ–,
the P – and the y–scaling, and only the last one is linear.
When the straight line xΛ = a + b yΛ is given, we may find a and b by suitably chosen values of
Λ. For Λ = 1 we have
x1 = a + b ln 1 = a (9.2f)
and the distance between the x–hazard quantiles of order Λ = 1 and Λ = exp(1) = e leads to b :
xe − x1 = (a + b ln e) − (a + b ln 1) = b. (9.2g)
\[
F_X(x \mid a, b, c) = 1 - \exp\left\{ -\left( \frac{x - a}{b} \right)^{c} \right\}; \quad a \in \mathbb{R},\ b, c > 0,\ x \ge a. \tag{9.3a}
\]
a, b, c are the original location, scale and shape parameters. When a, the lower threshold of X,
is known (mostly a = 0) or has been estimated in some way or the other, the transformed variable
X ∗ = ln(X − a) (9.3b)
where
\[
a^* = \ln b, \tag{9.3d}
\]
\[
b^* = 1/c. \tag{9.3e}
\]
So, the hazard paper of the W EIBULL distribution has the same ordinate as the log–W EIBULL
distribution, but a logarithmic scale on the abscissa.
We now turn to how to find an estimate of H(x) = Λ, i.e., how to find a plotting position on
the hazard grid. Hazard plots are not based on the PLE of S(x), which would give
$\tilde{H}(x_i) = -\ln \hat{S}(x_i)$, but they rest upon the empirical cumulative hazard function (5.12a).
When the data set is singly type–II censored, the observed distinct lifetimes $x_1 < x_2 < \ldots < x_k$ are the first k
lifetimes in a sample of size n,6 and the number of units at risk at $x_i$ is $n_i = n - i + 1$. This gives
\[
\hat{H}(x_j) = \hat{\Lambda}_j = \sum_{i=1}^{j} \frac{1}{n - i + 1}; \quad j = 1, 2, \ldots, k. \tag{9.4}
\]
The quantity ni = n − i + 1 in (9.4) is nothing but the reverse rank which results in the case
of random censoring when all observations — censored as well as uncensored — would be or-
dered, but then the summation is only over those reciprocal reverse ranks belonging to uncensored
observations, see Example 9/1.
One argument in support of (9.4) is that with singly type–II censoring it can be shown that the
estimator (9.4) is unbiased:
\[
E\left[ \hat{H}(x_j) \right] = H(x_j). \tag{9.5}
\]
To prove this suppose X has survivor function S(x). As is well known, the random variable
U = S(X) has a uniform distribution on [0, 1], and hence W = − ln U = H(X) has the
reduced exponential distribution with PDF f(w) = exp(−w), w ≥ 0. Therefore, if $X_1 < X_2 < \ldots < X_n$
are the ordered random observations in a sample of size n, the random variable
$W_j = H(X_j)$ is the j-th ordered observation in a random sample of size n from the reduced
exponential distribution. As is known too, see RINNE (2010, p. 121), the mean of $W_j$ is
\[
E(W_j) = \sum_{i=1}^{j} \frac{1}{n - i + 1}, \tag{9.6}
\]
and thus the stated result follows from (9.4). It is more tedious, see NELSON (1972), to show that
if the data are progressively (= multiply) type–II censored, the result (9.5) still holds, where $X_j$
represents the j-th smallest uncensored observation. NELSON (1982) suggests modified plotting
positions obtained by averaging the hazard step function at the jumps $x_j$. The modified position
of the earliest failure $x_1$ is half its regular hazard value $\hat{\Lambda}_1 = 1/n$. The modified positions agree
better with a distribution fitted by maximum likelihood.
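The mean formula (9.6), on which the unbiasedness argument rests, can be checked by a small Monte Carlo experiment. The following Python sketch is an illustration only; the sample size, the number of replications, the seed and the function names are arbitrary assumptions.

```python
import random


def expected_hazard_position(j, n):
    """Right-hand side of (9.6): sum of reciprocal reverse ranks up to j."""
    return sum(1.0 / (n - i + 1) for i in range(1, j + 1))


def simulated_mean_w(j, n, reps=20000, seed=1):
    """Monte Carlo mean of the j-th order statistic of n reduced exponential variates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        w = sorted(rng.expovariate(1.0) for _ in range(n))
        total += w[j - 1]
    return total / reps


exact = expected_hazard_position(3, 10)   # 1/10 + 1/9 + 1/8
approx = simulated_mean_w(3, 10)
print(exact, approx)
```

The two numbers should agree up to simulation noise, which is of the order of a few thousandths for the settings chosen here.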
How can estimates for the location–scale parameters a and b be found? A first possibility is
least–squares estimation of (9.1i), interpreted as a regression of $X_\Lambda$ on $Y_\Lambda$, where the regressand is
taken as the uncensored observation of order j and the regressor is
\[
Y_{\hat{\Lambda}_j} = H_Y^{-1}\left( \hat{\Lambda}_j \right), \tag{9.7a}
\]
6
The reasoning will be the same for an uncensored sample where k = n.
matrices
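\[
x = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{pmatrix}; \quad
\mathbf{1} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}; \quad
\hat{y} = \begin{pmatrix} Y_{\hat{\Lambda}_1} \\ Y_{\hat{\Lambda}_2} \\ \vdots \\ Y_{\hat{\Lambda}_k} \end{pmatrix}; \quad
\theta = \begin{pmatrix} a \\ b \end{pmatrix}; \quad
A = \begin{pmatrix} \mathbf{1} & \hat{y} \end{pmatrix} \tag{9.7b}
\]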
This estimator is not statistically optimal as the regressor variables $Y_{\hat{\Lambda}_j}$ are neither homoscedastic
nor free of autocorrelation.7
A second possibility is an eye–fitted straight line to the point–cluster on the hazard paper
whereby the user should keep in mind that the horizontal distances between the xj ’s and the
straight line have to be rendered as small as possible. If we look at (9.1i) we see that
Example 9/1: Hazard plotting and parameter estimation for a logistic distribution
The following table gives a simulated data set (n = 15) from a logistic distribution having a = 20 and
b = 2. The sample has been randomly censored. The logistic distribution has
\[
S_X(x) = \left[ 1 + \exp\left( \frac{x - a}{b} \right) \right]^{-1}, \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R}, \tag{9.9a}
\]
giving the cumulated hazard rate
\[
H_X(x) = -\ln S_X(x) = \ln\left[ 1 + \exp\left( \frac{x - a}{b} \right) \right]. \tag{9.9b}
\]
From the reduced distribution we find
\[
\Lambda = H_Y(y) = \ln\left[ 1 + \exp(y) \right], \quad y \in \mathbb{R}. \tag{9.9c}
\]
The hazard paper in Fig. 9/2 has an ordinate scaled according to (9.9c). The figure displays the plotted data
and the OLS–fitted straight line with parameter estimates
\[
\hat{a} = 21.1580, \quad \hat{b} = 2.1428
\]
7
See R INNE (2010, Chapter 4) for the alternative general least–squares estimator, which is implemented in the
MATLAB program LEPP appended to the monograph.
8
We have location–scale distributions where we have to choose other reduced quantiles than y = 0 and/or y = 1
to find hazard–quantile estimators for a and b, see, e.g., the arc–sine distribution in Sect. 9.3.
which do not differ much from the input parameters a = 20 and b = 2. In order to find the hazard–quantile
estimates we need
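\[
\begin{aligned}
\Lambda_0 &= \ln\left[ 1 + \exp(0) \right] = \ln 2 \approx 0.6931, \\
\Lambda_1 &= \ln\left[ 1 + \exp(1) \right] \approx \ln 3.7183 \approx 1.3133.
\end{aligned}
\]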
Fig. 9/2 then shows the way to read–off the hazard–quantile estimates which — apart from random er-
rors — agree with the OLS–estimates.
 j    x_j    δ_j   n_j   δ_j/n_j   Λ̂_j = Σ_{i=1}^{j} 1/n_i
 1   16.3    1     15    0.0667    0.0667
 2   16.5    0     14    0         −
 3   17.5    1     13    0.0769    0.1436
 4   17.9    1     12    0.0833    0.2269
 5   17.9    0     11    0         −
 6   18.7    1     10    0.1000    0.3269
 7   20.0    1      9    0.1111    0.4380
 8   20.1    1      8    0.1250    0.5630
 9   20.4    1      7    0.1429    0.7059
10   20.7    0      6    0         −
11   21.7    1      5    0.2000    0.9059
12   22.2    1      4    0.2500    1.1559
13   24.2    1      3    0.3333    1.4892
14   26.3    1      2    0.5000    1.9892
15   26.3    0      1    0         −
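The last column of the table and an ordinary least–squares line on the logistic hazard paper can be recomputed with a few lines of Python. This is only a sketch and not the MATLAB code of the software package: the variable names are hypothetical, and the line is fitted by regressing the uncensored $x_j$ on the reduced hazard quantiles $y_{\hat\Lambda_j} = \ln[\exp(\hat\Lambda_j) - 1]$ of the logistic distribution.

```python
import math

# Data of Example 9/1: ordered observations and censoring indicators (1 = uncensored).
x = [16.3, 16.5, 17.5, 17.9, 17.9, 18.7, 20.0, 20.1,
     20.4, 20.7, 21.7, 22.2, 24.2, 26.3, 26.3]
delta = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
n = len(x)

# Plotting positions: cumulate reciprocal reverse ranks of uncensored observations.
points = []          # tuples (x_j, Lambda_hat_j, y_j)
cum_hazard = 0.0
for j, (xj, dj) in enumerate(zip(x, delta), start=1):
    if dj == 1:
        cum_hazard += 1.0 / (n - j + 1)               # reverse rank n_j = n - j + 1
        y = math.log(math.exp(cum_hazard) - 1.0)      # logistic hazard quantile
        points.append((xj, cum_hazard, y))

# OLS fit of x = a + b * y through the plotted points.
k = len(points)
mean_x = sum(p[0] for p in points) / k
mean_y = sum(p[2] for p in points) / k
s_xy = sum((p[2] - mean_y) * (p[0] - mean_x) for p in points)
s_yy = sum((p[2] - mean_y) ** 2 for p in points)
b_hat = s_xy / s_yy
a_hat = mean_x - b_hat * mean_y

for xj, lam, y in points:
    print(f"{xj:5.1f}  {lam:.4f}  {y:8.4f}")
print(f"a_hat = {a_hat:.4f}, b_hat = {b_hat:.4f}")
```

The fitted intercept and slope should come out close to the estimates quoted in the example; small differences in the last digits may arise from rounding.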
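\[
\begin{aligned}
S(x) &= \frac{\pi - 2 \arcsin\left( \frac{x - a}{b} \right)}{2\pi}; \quad a \in \mathbb{R},\ b > 0,\ a - b \le x \le a + b \\
\Lambda = H_Y(y) &= \ln(2\pi) - \ln\left[ \pi - 2 \arcsin(y) \right]; \quad -1 \le y \le 1 \\
y_\Lambda &= \cos\left[ \pi \exp(-\Lambda) \right]; \quad \Lambda \ge 0 \\
a &= x_{-\ln(1/2)} \approx x_{0.6931} \\
b &= x_{-\ln(1/3)} - x_{-\ln(2/3)} \approx x_{1.0986} - x_{0.4055}
\end{aligned}
\]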
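CAUCHY distribution
\[
\begin{aligned}
S(x) &= \frac{1}{2} - \frac{1}{\pi} \arctan\left( \frac{x - a}{b} \right); \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R} \\
\Lambda = H_Y(y) &= \ln(2\pi) - \ln\left[ \pi - 2 \arctan(y) \right]; \quad y \in \mathbb{R} \\
y_\Lambda &= \cot\left[ \pi \exp(-\Lambda) \right]; \quad \Lambda \ge 0 \\
a &= x_{-\ln(1/2)} \approx x_{0.6931} \\
b &= \tfrac{1}{2} \left( x_{-\ln(1/4)} - x_{-\ln(3/4)} \right) \approx \tfrac{1}{2} \left( x_{1.3863} - x_{0.2877} \right)
\end{aligned}
\]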
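\[
\begin{aligned}
S(x) &= 0.5 \left[ 1 - \sin\left( \frac{x - a}{b} \right) \right]; \quad a \in \mathbb{R},\ b > 0,\ a - \frac{\pi}{2}\, b \le x \le a + \frac{\pi}{2}\, b \\
\Lambda = H_Y(y) &= \ln 2 - \ln\left[ 1 - \sin(y) \right]; \quad -\frac{\pi}{2} \le y \le \frac{\pi}{2} \\
y_\Lambda &= \arcsin\left[ 1 - 2 \exp(-\Lambda) \right]; \quad \Lambda \ge 0 \\
a &= x_{-\ln(1/2)} \approx x_{0.6931} \\
b &= x_{\ln\{2/[1 - \sin(0.5)]\}} - x_{\ln\{2/[1 - \sin(-0.5)]\}} \approx x_{1.3460} - x_{0.3015}
\end{aligned}
\]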
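\[
\begin{aligned}
S(x) &= \frac{1}{2} \left[ 1 - \frac{x - a}{b} - \frac{1}{\pi} \sin\left( \pi\, \frac{x - a}{b} \right) \right]; \quad a \in \mathbb{R},\ b > 0,\ a - b \le x \le a + b \\
\Lambda = H_Y(y) &= \ln(2\pi) - \ln\left[ \pi (1 - y) - \sin(\pi y) \right]; \quad -1 \le y \le 1 \\
y_\Lambda &\ \text{cannot be given in closed form} \\
a &= x_{-\ln(1/2)} \approx x_{0.6931} \\
b &\approx x_{-\ln 0.0908} - x_{-\ln 0.9092} \approx x_{2.3991} - x_{0.0952}
\end{aligned}
\]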
Exponential distribution
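\[
\begin{aligned}
S(x) &= 1 - \exp\left( \frac{x - a}{b} \right); \quad a \in \mathbb{R},\ b > 0,\ x \le a \\
\Lambda = H_Y(y) &= -\ln\left[ 1 - \exp(y) \right]; \quad y \le 0 \\
y_\Lambda &= \ln\left[ 1 - \exp(-\Lambda) \right]; \quad \Lambda \ge 0
\end{aligned}
\]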
a is the upper threshold of the reflected exponential distribution where Λ = HX (a) = ∞. Thus,
a cannot be read off as a point on the abscissa. Therefore, we propose the following procedure.
First, from the difference of x(y = −1) = a − b and x(y = −2) = a − 2 b we find
we have
\[
a = 2\, x_{\Lambda(-1)} - x_{\Lambda(-2)} \approx 2\, x_{0.4587} - x_{0.1454}.
\]
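Extreme value distribution of type I for the minimum (Log–WEIBULL distribution)
\[
\begin{aligned}
S(x) &= \exp\left[ -\exp\left( \frac{x - a}{b} \right) \right]; \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R} \\
\Lambda = H_Y(y) &= \exp(y); \quad y \in \mathbb{R} \\
y_\Lambda &= \ln(\Lambda); \quad \Lambda \ge 0 \\
a &= x_{\exp(0)} = x_1 \\
b &= x_{\exp(1)} - x_{\exp(0)} \approx x_{2.7183} - x_1
\end{aligned}
\]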
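\[
\begin{aligned}
S(x) &= 1 - \frac{2}{\pi} \arctan\left( \frac{x - a}{b} \right); \quad a \in \mathbb{R},\ b > 0,\ x \ge a \\
\Lambda = H_Y(y) &= -\ln\left[ 1 - \frac{2}{\pi} \arctan(y) \right]; \quad y \ge 0 \\
y_\Lambda &= \tan\left\{ \frac{\pi}{2} \left[ 1 - \exp(-\Lambda) \right] \right\}; \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= x_{-\ln[1 - (2/\pi) \arctan(1)]} - x_0 \approx x_{0.6931} - x_0
\end{aligned}
\]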
Half–logistic distribution
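\[
\begin{aligned}
S(x) &= \frac{2 \exp\left( -\frac{x - a}{b} \right)}{1 + \exp\left( -\frac{x - a}{b} \right)}; \quad a \in \mathbb{R},\ b > 0,\ x \ge a \\
\Lambda = H_Y(y) &= -\ln\left[ \frac{2}{1 + \exp(y)} \right]; \quad y \ge 0 \\
y_\Lambda &= \ln\left[ 2 \exp(\Lambda) - 1 \right]; \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= x_{-\ln\{2/[1 + \exp(1)]\}} - x_0 \approx x_{0.6201} - x_0
\end{aligned}
\]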
Half–normal distribution
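\[
\begin{aligned}
S(x) &= 2 \left[ 1 - \Phi\left( \frac{x - a}{b} \right) \right]; \quad a \in \mathbb{R},\ b > 0,\ x \ge a \\
\Lambda = H_Y(y) &= -\ln\left\{ 2 \left[ 1 - \Phi(y) \right] \right\}; \quad y \ge 0 \\
y_\Lambda &= \Phi^{-1}\left[ 1 - \frac{\exp(-\Lambda)}{2} \right]; \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= x_{-\ln\{2 [1 - \Phi(1)]\}} - x_0 \approx x_{1.1479} - x_0
\end{aligned}
\]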
Φ(·) is the CDF of the standardized normal distribution and Φ−1 (·) is its percentile function.
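\[
\begin{aligned}
S(x) &= 1 - \frac{2}{\pi} \arctan\left[ \exp\left( \frac{x - a}{b} \right) \right]; \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R} \\
\Lambda = H_Y(y) &= -\ln\left\{ 1 - \frac{2}{\pi} \arctan\left[ \exp(y) \right] \right\}; \quad y \in \mathbb{R} \\
y_\Lambda &= \ln \tan\left\{ \frac{\pi}{2} \left[ 1 - \exp(-\Lambda) \right] \right\}; \quad \Lambda \ge 0 \\
a &= x_{-\ln\{1 - (2/\pi) \arctan[\exp(0)]\}} \approx x_{0.6931} \\
b &= x_{-\ln\{1 - (2/\pi) \arctan[\exp(1)]\}} - x_{-\ln\{1 - (2/\pi) \arctan[\exp(0)]\}} \approx x_{1.4942} - x_{0.6931}
\end{aligned}
\]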
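LAPLACE distribution
\[
\begin{aligned}
S(x) &= \begin{cases} 1 - 0.5 \exp\left( -\frac{a - x}{b} \right) & \text{for } x \le a \\ 0.5 \exp\left( -\frac{x - a}{b} \right) & \text{for } x \ge a \end{cases}; \quad a \in \mathbb{R},\ b > 0 \\
\Lambda = H_Y(y) &= \begin{cases} -\ln\left[ 1 - 0.5 \exp(y) \right] & \text{for } y \le 0 \\ -\ln\left[ 0.5 \exp(-y) \right] & \text{for } y \ge 0 \end{cases} \\
y_\Lambda &= \begin{cases} \ln\left\{ 2 \left[ 1 - \exp(-\Lambda) \right] \right\} & \text{for } \Lambda \le -\ln 0.5 \approx 0.6931 \\ -\ln\left[ 2 \exp(-\Lambda) \right] & \text{for } \Lambda \ge -\ln 0.5 \approx 0.6931 \end{cases}
\end{aligned}
\]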
Logistic distribution
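\[
\begin{aligned}
S(x) &= \left[ 1 + \exp\left( \frac{x - a}{b} \right) \right]^{-1}; \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R} \\
\Lambda = H_Y(y) &= \ln\left[ 1 + \exp(y) \right]; \quad y \in \mathbb{R} \\
y_\Lambda &= \ln\left[ \exp(\Lambda) - 1 \right]; \quad \Lambda \ge 0 \\
a &= x_{\ln[1 + \exp(0)]} \approx x_{0.6931} \\
b &= x_{\ln[1 + \exp(1)]} - x_{\ln[1 + \exp(0)]} \approx x_{1.3133} - x_{0.6931}
\end{aligned}
\]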
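\[
\begin{aligned}
S(x) &= 1 - \Phi\left( \frac{x - a}{b} \right); \quad a \in \mathbb{R},\ b > 0,\ x \in \mathbb{R} \\
\Lambda = H_Y(y) &= -\ln\left[ 1 - \Phi(y) \right]; \quad y \in \mathbb{R} \\
y_\Lambda &= \Phi^{-1}\left[ 1 - \exp(-\Lambda) \right]; \quad \Lambda \ge 0 \\
a &= x_{-\ln[1 - \Phi(0)]} \approx x_{0.6931} \\
b &= x_{-\ln[1 - \Phi(1)]} - x_{-\ln[1 - \Phi(0)]} \approx x_{1.8410} - x_{0.6931}
\end{aligned}
\]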
Φ(·) is the CDF of the standardized normal distribution and Φ−1 (·) is its percentile function.
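RAYLEIGH distribution
\[
\begin{aligned}
S(x) &= \exp\left[ -\frac{1}{2} \left( \frac{x - a}{b} \right)^2 \right]; \quad a \in \mathbb{R},\ b > 0,\ x \ge a \\
\Lambda = H_Y(y) &= y^2 / 2; \quad y \ge 0 \\
y_\Lambda &= \sqrt{2 \Lambda}; \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= x_{0.5} - x_0
\end{aligned}
\]
\[
\begin{aligned}
a &= x_0 \\
b &= x_{-\ln[1 - \exp(-1)]} - x_0 \approx x_{0.4587} - x_0
\end{aligned}
\]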
Semi–elliptical distribution
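\[
\begin{aligned}
S(x) &= \frac{1}{2} - \frac{1}{\pi} \left[ \frac{x - a}{b} \sqrt{1 - \left( \frac{x - a}{b} \right)^2} + \arcsin\left( \frac{x - a}{b} \right) \right]; \quad a \in \mathbb{R},\ b > 0,\ a - b \le x \le a + b \\
\Lambda = H_Y(y) &= -\ln\left\{ \frac{1}{2} - \frac{1}{\pi} \left[ y \sqrt{1 - y^2} + \arcsin(y) \right] \right\}; \quad -1 \le y \le 1
\end{aligned}
\]
$y_\Lambda$ is the admissible solution, i.e., $-1 \le y_\Lambda \le 1$, of
\[
y_\Lambda \sqrt{1 - y_\Lambda^2} + \arcsin(y_\Lambda) = \pi \left[ 0.5 - \exp(-\Lambda) \right]
\]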
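TEISSIER distribution
\[
\begin{aligned}
S(x) &= \exp\left[ 1 + \frac{x - a}{b} - \exp\left( \frac{x - a}{b} \right) \right]; \quad a \in \mathbb{R},\ b > 0,\ x \ge a \\
\Lambda = H_Y(y) &= \exp(y) - y - 1; \quad y \ge 0 \\
y_\Lambda &\ \text{is the admissible solution, i.e., } y_\Lambda \ge 0, \text{ of } \exp(y_\Lambda) - y_\Lambda = 1 + \Lambda \\
a &= x_0 \\
b &= x_{\exp(1) - 2} - x_0 \approx x_{0.7183} - x_0
\end{aligned}
\]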
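\[
\begin{aligned}
S(x) &= \left( \frac{a + b - x}{b} \right)^2; \quad a \in \mathbb{R},\ b > 0,\ a \le x \le a + b \\
\Lambda = H_Y(y) &= -2 \ln(1 - y); \quad 0 \le y \le 1 \\
y_\Lambda &= 1 - \exp\left( -\frac{\Lambda}{2} \right); \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= 2\, (x_{1.3863} - x_0)
\end{aligned}
\]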
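\[
\begin{aligned}
S(x) &= \begin{cases} 1 - \frac{1}{2} \left( 1 + \frac{x - a}{b} \right)^2 & \text{for } a - b \le x \le a \\ \frac{1}{2} \left( 1 - \frac{x - a}{b} \right)^2 & \text{for } a \le x \le a + b \end{cases}; \quad a \in \mathbb{R},\ b > 0 \\
\Lambda = H_Y(y) &= \begin{cases} -\ln\left[ 1 - \frac{1}{2} (1 + y)^2 \right] & \text{for } -1 \le y \le 0 \\ \ln 2 - 2 \ln(1 - y) & \text{for } 0 \le y \le 1 \end{cases} \\
y_\Lambda &= \begin{cases} \sqrt{2 \left[ 1 - \exp(-\Lambda) \right]} - 1 & \text{for } \Lambda \le \ln 2 \approx 0.6931 \\ 1 - \sqrt{2 \exp(-\Lambda)} & \text{for } \Lambda \ge \ln 2 \approx 0.6931 \end{cases} \\
a &= x_{\ln 2} \approx x_{0.6931} \\
b &= x_{\ln 2 - 2 \ln[1 - 0.5]} - x_{-\ln[1 - 0.5 (1 - 0.5)^2]} \approx x_{2.0794} - x_{0.1335}
\end{aligned}
\]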
Uniform distribution
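\[
\begin{aligned}
S(x) &= 1 - \frac{x - a}{b}; \quad a \in \mathbb{R},\ b > 0,\ a \le x \le a + b \\
\Lambda = H_Y(y) &= -\ln(1 - y); \quad 0 \le y \le 1 \\
y_\Lambda &= 1 - \exp(-\Lambda); \quad \Lambda \ge 0 \\
a &= x_0 \\
b &= 2\, (x_{-\ln 0.5} - x_0) \approx 2\, (x_{0.6931} - x_0)
\end{aligned}
\]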
V–shaped distribution
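\[
\begin{aligned}
S(x) &= \begin{cases} \frac{1}{2} \left[ 1 + \left( \frac{a - x}{b} \right)^2 \right] & \text{for } a - b \le x \le a \\ \frac{1}{2} \left[ 1 - \left( \frac{a - x}{b} \right)^2 \right] & \text{for } a \le x \le a + b \end{cases}; \quad a \in \mathbb{R},\ b > 0 \\
\Lambda = H_Y(y) &= \begin{cases} \ln 2 - \ln(1 + y^2) & \text{for } -1 \le y \le 0 \\ \ln 2 - \ln(1 - y^2) & \text{for } 0 \le y \le 1 \end{cases} \\
y_\Lambda &= \begin{cases} -\sqrt{2 \exp(-\Lambda) - 1} & \text{for } \Lambda \le \ln 2 \approx 0.6931 \\ \sqrt{1 - 2 \exp(-\Lambda)} & \text{for } \Lambda \ge \ln 2 \approx 0.6931 \end{cases} \\
a &= x_{\ln 2} \approx x_{0.6931} \\
b &= x_{\ln 2 - \ln[1 - 0.5^2]} - x_{\ln 2 - \ln[1 + (-0.5)^2]} \approx x_{0.9808} - x_{0.4700}
\end{aligned}
\]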
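The pairs of reduced CHR $H_Y(y)$ and hazard quantile $y_\Lambda$ listed above can be checked against each other numerically, and a $y_\Lambda$ without closed form can be found by root search. The following Python sketch does this for a few of the listed distributions; the dictionary `PAPERS` and the function `teissier_quantile` are hypothetical names, and bisection is just one possible root finder.

```python
import math

# (H_Y, y_Lambda) pairs taken from the list of hazard papers.
PAPERS = {
    "logistic": (lambda y: math.log(1.0 + math.exp(y)),
                 lambda lam: math.log(math.exp(lam) - 1.0)),
    "Rayleigh": (lambda y: y * y / 2.0,
                 lambda lam: math.sqrt(2.0 * lam)),
    "uniform": (lambda y: -math.log(1.0 - y),
                lambda lam: 1.0 - math.exp(-lam)),
    "half-logistic": (lambda y: -math.log(2.0 / (1.0 + math.exp(y))),
                      lambda lam: math.log(2.0 * math.exp(lam) - 1.0)),
}


def teissier_quantile(lam, tol=1e-12):
    """Solve exp(y) - y = 1 + lam for y >= 0 by bisection (no closed form)."""
    lo, hi = 0.0, 1.0
    while math.exp(hi) - hi - 1.0 < lam:     # enlarge the bracket if necessary
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.exp(mid) - mid - 1.0 < lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip: H_Y(y_Lambda) should reproduce Lambda.
for name, (chr_fn, quantile_fn) in PAPERS.items():
    lam = 0.5
    print(name, quantile_fn(lam), chr_fn(quantile_fn(lam)))

# Order of the hazard quantile belonging to y = 1 for the Teissier distribution.
print(teissier_quantile(math.e - 2.0))
```

The same round trip is a quick safeguard whenever a new ordinate scaling is programmed, since a sign error in either function shows up immediately.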
We now turn to those distributions that by a ln–transformation can be converted into a location–
scale type. Generally, the original form of these distributions has three parameters, a location
(shift) parameter a, a scale parameter b and a shape parameter c. The location parameter a must
be known to make the ln–transformation. One way to find an estimate of a is by trial–and–error,
i.e., a is chosen so that the hazard plot of $\hat{\Lambda}_i$ over $\ln(x_i - a)$ is sufficiently linear.
Extreme value distribution of type II for the maximum (Inverse WEIBULL distribution)
\[
S(x) = 1 - \exp\left[ -\left( \frac{x - a}{b} \right)^{-c} \right]; \quad a \in \mathbb{R},\ b > 0,\ c > 0,\ x \ge a
\]
$X^* = \ln(X - a)$ has an extreme value distribution of type I for the maximum (GUMBEL
distribution) with parameters $a^* = \ln b$ and $b^* = 1/c$. Thus we find
\[
\begin{aligned}
a^* &\approx x^*_{0.4587} \\
b^* &\approx x^*_{0.4587} - x^*_{0.0683}
\end{aligned}
\]
on the ln(x − a)–scaled abscissa of a hazard paper with an ordinate scaled as
$\Lambda = -\ln\left\{ 1 - \exp\left[ -\exp(-y) \right] \right\}$, y ∈ R. b and c follow as re–transforms of $a^*$ and $b^*$.
\[
\begin{aligned}
a^* &\approx x^*_{1} \\
b^* &\approx x^*_{2.7183} - x^*_{1}
\end{aligned}
\]
on the − ln(a − x)–scaled abscissa of a hazard paper with an ordinate scaled as Λ = exp(y),
y ∈ R. b and c follow as re–transforms of $a^*$ and $b^*$.
Extreme value distribution of type III for the maximum (Reflected WEIBULL distribution)
\[
S(x) = 1 - \exp\left[ -\left( \frac{a - x}{b} \right)^{c} \right]; \quad a \in \mathbb{R},\ b > 0,\ c > 0,\ x \le a
\]
$X^* = -\ln(a - X)$ has an extreme value distribution of type I for the maximum (GUMBEL
distribution) with parameters $a^* = -\ln b$ and $b^* = 1/c$. Thus we find
\[
\begin{aligned}
a^* &\approx x^*_{0.4587} \\
b^* &\approx x^*_{0.4587} - x^*_{0.0683}
\end{aligned}
\]
on the − ln(a − x)–scaled abscissa of a hazard paper with an ordinate scaled as
$\Lambda = -\ln\left\{ 1 - \exp\left[ -\exp(-y) \right] \right\}$, y ∈ R. b and c follow as re–transforms of $a^*$ and $b^*$.
Extreme value distribution of type III for the minimum (WEIBULL distribution)
\[
S(x) = \exp\left[ -\left( \frac{x - a}{b} \right)^{c} \right]; \quad a \in \mathbb{R},\ b > 0,\ c > 0,\ x \ge a
\]
$X^* = \ln(X - a)$ has an extreme value distribution of type I for the minimum (log–GUMBEL
distribution) with parameters $a^* = \ln b$ and $b^* = 1/c$. Thus we find
\[
\begin{aligned}
a^* &\approx x^*_{1} \\
b^* &\approx x^*_{2.7183} - x^*_{1}
\end{aligned}
\]
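\[
S(x) = \left( \frac{x - a}{b} \right)^{-c}; \quad a \in \mathbb{R},\ b > 0,\ c > 0,\ x \ge a + b
\]
we find
\[
\begin{aligned}
a^* &= x^*_{0} \\
b^* &= x^*_{1} - x^*_{0}
\end{aligned}
\]
on the ln(x − a)–scaled abscissa of a hazard paper with an ordinate scaled as Λ = y, y ≥ 0.
b and c follow as re–transforms of $a^*$ and $b^*$.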
Power function distribution
\[
S(x) = 1 - \left( \frac{x - a}{b} \right)^{c}; \quad a \in \mathbb{R},\ b > 0,\ c > 0,\ a \le x \le a + b
\]
Thus we find
\[
\begin{aligned}
a^* &= x^*_{0} \\
b^* &\approx x^*_{0.9808} - x^*_{0.4700}
\end{aligned}
\]
on the ln(x − a)–scaled abscissa of a hazard paper with an ordinate scaled as Λ = − ln(1 − y),
0 ≤ y ≤ 1. b and c follow as re–transforms of $a^*$ and $b^*$.
10 Testing Hypotheses on Life Distributions
This chapter is mainly devoted to testing hypotheses concerning the hazard rate (Sect. 10.2), but
in Sect. 10.3 we will also look at tests deciding whether a life distribution has one or another of
the aging properties which have been introduced in Sect. 2.4. The approaches of this chapter are
non–parametric, the exception being Sect. 10.2.1 where we test the constancy of the hazard rate. As
the only continuous distribution with constant hazard rate is the exponential distribution, the tests
of Sect. 10.2.1 will be tests for an exponential distribution.
This chapter is organized as follows:
• Sect. 10.1 presents those statistical concepts which directly or indirectly give the test
statistics for most of the subsequent tests. These concepts are order statistics, which lead to
spacings, and the TTT–statistics, which are a function of spacings.
• Sect. 10.2 examines the behavior of the hazard rate, i.e., its course and curvature.
• Finally, the topic of Sect. 10.3 is tests for several classes of aging.
A topic not treated here is that of comparing hazard rates, see QIU/SHENG (2008), and of testing
the equality of two hazard rates, see CHENG (1985).
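\[
f_{X_{(\ell)}}(x) = \frac{n!}{(\ell - 1)!\, (n - \ell)!}\, f(x) \left[ F(x) \right]^{\ell - 1} \left[ 1 - F(x) \right]^{n - \ell}, \quad x \in \mathbb{R} \tag{10.2a}
\]
\[
F_{X_{(\ell)}}(x) = \sum_{i=\ell}^{n} \binom{n}{i} \left[ F(x) \right]^{i} \left[ 1 - F(x) \right]^{n - i}, \quad x \in \mathbb{R} \tag{10.2b}
\]
$F_{X_{(\ell)}}(x)$ is the CCDF of a binomial distribution with parameters n and P = F(x) and
may be expressed by an incomplete beta function ratio (= CDF of a beta distribution with
parameters ℓ and n − ℓ + 1).
Especially, we have:
• ℓ = 1 (sample minimum)
\[
f_{X_{(1)}}(x) = n\, f(x) \left[ 1 - F(x) \right]^{n - 1}, \quad x \in \mathbb{R} \tag{10.2c}
\]
\[
F_{X_{(1)}}(x) = 1 - \left[ 1 - F(x) \right]^{n}, \quad x \in \mathbb{R} \tag{10.2d}
\]
• ℓ = n (sample maximum)
\[
f_{X_{(n)}}(x) = n\, f(x) \left[ F(x) \right]^{n - 1}, \quad x \in \mathbb{R} \tag{10.2e}
\]
\[
F_{X_{(n)}}(x) = \left[ F(x) \right]^{n}, \quad x \in \mathbb{R}. \tag{10.2f}
\]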
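The binomial representation (10.2b) is convenient for computation, and the special cases (10.2d) and (10.2f) provide a simple check. The following Python fragment is a minimal sketch; the function name `order_stat_cdf` is hypothetical, and the argument is the value of the parent CDF F(x) rather than x itself.

```python
import math


def order_stat_cdf(F_x, ell, n):
    """CDF of the ell-th order statistic at a point x with parent CDF value F_x,
    computed as the binomial upper tail (10.2b)."""
    return sum(math.comb(n, i) * F_x ** i * (1.0 - F_x) ** (n - i)
               for i in range(ell, n + 1))


F_x, n = 0.3, 5
print(order_stat_cdf(F_x, 1, n), 1.0 - (1.0 - F_x) ** n)   # minimum, see (10.2d)
print(order_stat_cdf(F_x, n, n), F_x ** n)                 # maximum, see (10.2f)
```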
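\[
f_{X_{(k)}, X_{(\ell)}}(x, y) = \begin{cases}
\dfrac{n!}{(k - 1)!\, (\ell - k - 1)!\, (n - \ell)!}\, f(x)\, f(y) \left[ F(x) \right]^{k - 1} \left[ F(y) - F(x) \right]^{\ell - k - 1} \left[ 1 - F(y) \right]^{n - \ell} & \text{for } k < \ell,\ x < y \text{ and } x, y \in \mathbb{R} \\
0 & \text{otherwise}
\end{cases} \tag{10.3a}
\]
\[
F_{X_{(k)}, X_{(\ell)}}(x, y) = \frac{n!}{(k - 1)!\, (\ell - k - 1)!\, (n - \ell)!} \int_0^{F(x)} \int_u^{F(y)} u^{k - 1} (v - u)^{\ell - k - 1} (1 - v)^{n - \ell}\, dv\, du \quad \text{for } k < \ell,\ x < y \text{ and } x, y \in \mathbb{R} \tag{10.3b}
\]
Especially, we have:
\[
f_{X_{(i)}, X_{(i+1)}}(x, y) = \begin{cases}
\dfrac{n!}{(i - 1)!\, (n - i - 1)!}\, f(x)\, f(y) \left[ F(x) \right]^{i - 1} \left[ 1 - F(y) \right]^{n - i - 1} & \text{for } i = 1, 2, \ldots, n - 1,\ x < y \text{ and } x, y \in \mathbb{R} \\
0 & \text{otherwise}
\end{cases} \tag{10.3c}
\]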
Order statistics form a MARKOV process, more precisely: $\{X_{(i)},\ 1 \le i \le n\}$ is a
non-homogeneous, discrete-parameter, real-valued MARKOV process whose initial measure is given
by (10.2d):
\[
F_{X_{(1)}}(x) = 1 - \left[ 1 - F(x) \right]^{n},
\]
and whose transition-CDF $\Pr(X_{(i+1)} \le x \mid X_{(i)} = y)$ is the CDF of the minimum of n − i
independent observations of the CDF F(·) truncated at y, namely:
\[
\Pr(X_{(i+1)} \le x \mid X_{(i)} = y) = 1 - \left[ 1 - F(x) \right]^{n - i} \left[ 1 - F(y) \right]^{-n + i} \quad \text{for } x > y. \tag{10.5}
\]
The PDF and CDF of order statistics from most parametric distributions have no neat
closed–form expressions. Two exceptions are the uniform and the exponential distributions, which
play a dominant role in testing hypotheses on life distributions.
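\[
f_{X_{(\ell)}}(x) = \frac{n!}{(\ell - 1)!\, (n - \ell)!}\, x^{\ell - 1} (1 - x)^{n - \ell} \quad \text{for } \ell = 1, \ldots, n;\ 0 \le x \le 1 \tag{10.6c}
\]
\[
f_{X_{(k)}, X_{(\ell)}}(x, y) = \frac{n!}{(k - 1)!\, (\ell - k - 1)!\, (n - \ell)!}\, x^{k - 1} (y - x)^{\ell - k - 1} (1 - y)^{n - \ell} \quad \text{for } k < \ell,\ x < y \text{ and } x, y \in [0, 1] \tag{10.6d}
\]
\[
\mathrm{Var}(X_{(\ell)}) = \frac{\ell\, (n - \ell + 1)}{(n + 1)^2 (n + 2)}; \quad \ell = 1, 2, \ldots, n \tag{10.6h}
\]
\[
\mathrm{Cov}(X_{(k)}, X_{(\ell)}) = \frac{k\, (n - \ell + 1)}{(n + 1)^2 (n + 2)}; \quad k < \ell;\ k, \ell = 1, \ldots, n \tag{10.6i}
\]
\[
\mathrm{Cor}(X_{(k)}, X_{(\ell)}) = \sqrt{\frac{k\, (n - \ell + 1)}{\ell\, (n - k + 1)}}; \quad k < \ell;\ k, \ell = 1, \ldots, n \tag{10.6j}
\]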
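\[
f(x) = \begin{cases} \dfrac{1}{b} \exp\left( -\dfrac{x}{b} \right) & \text{for } x \ge 0,\ b > 0 \\ 0 & \text{otherwise} \end{cases} \tag{10.7a}
\]
\[
F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - \exp\left( -\dfrac{x}{b} \right) & \text{for } x \ge 0 \end{cases} \tag{10.7b}
\]
\[
f_{X_{(\ell)}}(x) = \frac{n!}{b\, (\ell - 1)!\, (n - \ell)!} \exp\left\{ -\frac{(n - \ell + 1)\, x}{b} \right\} \left[ 1 - \exp\left( -\frac{x}{b} \right) \right]^{\ell - 1}; \quad x \ge 0 \tag{10.7c}
\]
\[
f_{X_{(k)}, X_{(\ell)}}(x, y) = \frac{n!}{b^2\, (k - 1)!\, (\ell - k - 1)!\, (n - \ell)!} \exp\left\{ -\frac{x + (n - \ell + 1)\, y}{b} \right\} \left[ 1 - \exp\left( -\frac{x}{b} \right) \right]^{k - 1} \left[ \exp\left( -\frac{x}{b} \right) - \exp\left( -\frac{y}{b} \right) \right]^{\ell - k - 1}; \quad k < \ell,\ x < y \text{ and } x, y \ge 0 \tag{10.7d}
\]
\[
f_{X_n}(x_1, \ldots, x_n) = \frac{n!}{b^n} \exp\left( -\frac{1}{b} \sum_{i=1}^{n} x_i \right); \quad 0 \le x_1 < x_2 < \ldots < x_n < \infty \tag{10.7e}
\]
\[
E(X_{(\ell)}) = b \sum_{i=1}^{\ell} \frac{1}{n - i + 1}; \quad \ell = 1, 2, \ldots, n \tag{10.7f}
\]
\[
E\left( X_{(\ell)}^2 \right) = b^2 \left[ \sum_{i=1}^{\ell} \frac{1}{(n - i + 1)^2} + \left( \sum_{i=1}^{\ell} \frac{1}{n - i + 1} \right)^2 \right]; \quad \ell = 1, 2, \ldots, n \tag{10.7g}
\]
\[
\mathrm{Var}(X_{(\ell)}) = b^2 \sum_{i=1}^{\ell} \frac{1}{(n - i + 1)^2}; \quad \ell = 1, 2, \ldots, n \tag{10.7h}
\]
\[
\mathrm{Cov}(X_{(k)}, X_{(\ell)}) = \mathrm{Var}(X_{(k)}) = b^2 \sum_{i=1}^{k} \frac{1}{(n - i + 1)^2}; \quad k < \ell;\ k, \ell = 1, \ldots, n \tag{10.7i}
\]
\[
\mathrm{Cor}(X_{(k)}, X_{(\ell)}) = \sqrt{ \frac{ \sum_{i=1}^{k} \frac{1}{(n - i + 1)^2} }{ \sum_{i=1}^{\ell} \frac{1}{(n - i + 1)^2} } }; \quad k < \ell;\ k, \ell = 1, \ldots, n \tag{10.7j}
\]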
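The moment formulas (10.7f), (10.7h) and (10.7j) involve only sums of reciprocal reverse ranks and are therefore easily evaluated. A minimal Python sketch follows; the function names are hypothetical.

```python
import math


def exp_order_stat_mean(ell, n, b=1.0):
    """E(X_(ell)) for an exponential sample with scale b, see (10.7f)."""
    return b * sum(1.0 / (n - i + 1) for i in range(1, ell + 1))


def exp_order_stat_var(ell, n, b=1.0):
    """Var(X_(ell)) for an exponential sample with scale b, see (10.7h)."""
    return b * b * sum(1.0 / (n - i + 1) ** 2 for i in range(1, ell + 1))


def exp_order_stat_cor(k, ell, n):
    """Cor(X_(k), X_(ell)) for k < ell, see (10.7j); it does not depend on b."""
    return math.sqrt(exp_order_stat_var(k, n) / exp_order_stat_var(ell, n))


print(exp_order_stat_mean(3, 3))        # 1/3 + 1/2 + 1 = 11/6
print(exp_order_stat_var(3, 3, b=2.0))  # 4 * (1/9 + 1/4 + 1) = 49/9
print(exp_order_stat_cor(1, 3, 3))      # sqrt((1/9) / (49/36)) = 2/7
```

For n = 3 the largest order statistic has mean 11/6 (in units of b), which illustrates how strongly the maximum of even a small exponential sample exceeds the population mean b.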
The difference $S_i$ of two adjacent order statistics $X_{(i-1)}$, $X_{(i)}$ is called spacing:
For a lifetime variable the spacing is nothing but the waiting time (time elapsed) between the
(i − 1)-st and the i-th failure. Let
\[
S = (S_2, \ldots, S_n) \tag{10.8b}
\]
be the vector of all spacings in a sample of size n, then by means of a linear transformation of
(10.4) we find
\[
f_S(s_2, \ldots, s_n) = \begin{cases}
n! \displaystyle\int_{-\infty}^{\infty} f(x) \prod_{i=2}^{n} f(x + s_2 + \ldots + s_i)\, dx & \text{for } s_i > 0,\ 2 \le i \le n \\
0 & \text{otherwise.}
\end{cases} \tag{10.8c}
\]
As this expression indicates, the formulas for the PDFs of sets of arbitrary spacings are not
particularly simple, although they are derived straightforwardly. PYKE (1965, p. 400), based on the
MARKOV property of order statistics, gives the following PDF of $(S_i, S_j)$, $i \ne j$,
\[
f_{S_i, S_j}(u, v) = \frac{n!}{(i - 2)!\, (j - i - 2)!\, (n - j)!} \int_{-\infty}^{\infty} \int_{x + u}^{\infty} \left[ F(x) \right]^{i - 2} \left[ F(y) - F(x + u) \right]^{j - i - 2} \left[ 1 - F(y + v) \right]^{n - j} f(x)\, f(x + u)\, f(y)\, f(y + v)\, dy\, dx; \quad i \ne j;\ u, v > 0. \tag{10.8e}
\]
We now look at spacings from the uniform and the exponential distributions. These spacings are
of special interest in lifetime analysis.
Spacings of the reduced uniform distribution
Let Xn = (X(1) , X(2) , . . . , X(n) ) denote the order statistics in a sample of size n from the
reduced uniform distribution (10.6a,b). Set X(0) = 0 and X(n+1) = 1. The spacings are defined
by
\[
S_i = X_{(i)} - X_{(i-1)} \quad \text{for } 1 \le i \le n + 1. \tag{10.9a}
\]
Since $S_1 + S_2 + \ldots + S_{n+1} = 1$, the random vector $S = (S_1, S_2, \ldots, S_{n+1})$ has a singular
distribution, but when restricted to this hyperplane has the PDF
It follows from (10.9b) that the PDF of S remains unchanged under any permutation of its
coordinates, i.e., uniform spacings are interchangeable variates. This implies in particular, that the
PDF of any $S_i$ is equal to that of $S_1 = X_{(1)}$ and the joint PDF of any pair $(S_i, S_j)$, $i \ne j$, is the
same as that of $(S_1, S_2)$. So we have
Hence, the joint PDF of exponential spacings is the product of n marginal exponential densities
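\[
f_{S_i}(s) = \frac{n - i + 1}{b} \exp\left\{ -\frac{(n - i + 1)\, s}{b} \right\}; \quad s > 0;\ i = 1, \ldots, n; \tag{10.10b}
\]
\[
E(S_i) = \frac{b}{n - i + 1}, \tag{10.10c}
\]
\[
\mathrm{Var}(S_i) = \frac{b^2}{(n - i + 1)^2}. \tag{10.10d}
\]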
is simply called a normalized spacing. It plays a dominant role in testing hypotheses on the
hazard rate.
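The means (10.10c) can be illustrated by simulation. The following Python sketch averages the spacings of simulated exponential samples and then multiplies the i-th average by n − i + 1. It assumes that the normalized spacing is $(n - i + 1)\, S_i$, as suggested by (10.12d); under the exponential model each normalized spacing should then have mean b. The function name, the seed and the replication count are arbitrary choices.

```python
import random


def mean_spacings(n, b, reps=20000, seed=7):
    """Monte Carlo means of the spacings S_1, ..., S_n (with X_(0) = 0)
    for exponential samples with scale b."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(reps):
        xs = sorted(rng.expovariate(1.0 / b) for _ in range(n))
        prev = 0.0
        for i, xi in enumerate(xs):
            sums[i] += xi - prev
            prev = xi
    return [s / reps for s in sums]


n, b = 5, 2.0
means = mean_spacings(n, b)
# enumerate counts from 0, so (n - i) here equals n - i + 1 for the 1-based index.
normalized = [(n - i) * m for i, m in enumerate(means)]
print(means)        # roughly b/5, b/4, b/3, b/2, b
print(normalized)   # all roughly equal to b
```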
Let $0 = X_{(0)} < X_{(1)} < X_{(2)} < \ldots < X_{(n)}$ denote an ordered sample of size n from a life
distribution with CDF F(·), $F(0^-) = 0$, survival function S(·) = 1 − F(·) and finite mean
$\mu = \int_0^\infty S(x)\, dx$. $T_n$, the total time spent on test by the n sample units until the failure of the
longest living unit, may be expressed in two different ways:
i.e., as the area given by n horizontal beams of length $X_{(i)}$, each having height equal to 1,
see the upper display in Fig. 10/1, or
i.e., as the area given by n vertical beams having length $X_{(i)} - X_{(i-1)}$ and corresponding
height n − i + 1. Such a vertical beam gives the time spent on test of those n − i + 1
units having lived in the interval $(X_{(i-1)}, X_{(i)})$, see the lower display in Fig. 10/1.
• according to (10.12a) as
\[
T_i = \sum_{j=1}^{i} X_{(j)} + (n - i)\, X_{(i)}, \tag{10.12c}
\]
• according to (10.12b) as
\[
T_i = \sum_{j=1}^{i} (n - j + 1) \left( X_{(j)} - X_{(j-1)} \right) = \sum_{j=1}^{i} D_j. \tag{10.12d}
\]
Figure 10/1: Two ways of expressing the total time spent on test
By plotting Ti∗ on the ordinate against the empirical CDF Fn (x(i) ) = i/n on the abscissa and
then connecting these points by straight lines we obtain a curve within the unit square of the
(x, y)–plane, called TTT–plot. The message of the TTT–plot is easy to understand: The shortest
living 100 (i/n)% of the sample units contribute 100 (Ti∗ )% of the total time lived by all sample
units.2
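The two expressions (10.12c) and (10.12d) and the coordinates of the TTT–plot can be computed as follows. This Python fragment is a sketch under one stated assumption: the scaled statistic is taken as $T_i^* = T_i / T_n$, which is what a curve inside the unit square requires. The function names are hypothetical.

```python
def ttt_statistics(sample):
    """T_1, ..., T_n via cumulated normalized spacings, see (10.12d)."""
    xs = sorted(sample)
    n = len(xs)
    ttt, running, prev = [], 0.0, 0.0
    for j, xj in enumerate(xs, start=1):
        running += (n - j + 1) * (xj - prev)
        prev = xj
        ttt.append(running)
    return ttt


def ttt_via_sums(sample):
    """T_1, ..., T_n via failed plus surviving units, see (10.12c)."""
    xs = sorted(sample)
    n = len(xs)
    return [sum(xs[:i]) + (n - i) * xs[i - 1] for i in range(1, n + 1)]


def ttt_plot_points(sample):
    """Points (i/n, T_i / T_n) of the TTT-plot, starting at the origin."""
    t = ttt_statistics(sample)
    n = len(t)
    return [(0.0, 0.0)] + [(i / n, ti / t[-1]) for i, ti in enumerate(t, start=1)]


data = [3.0, 1.0, 2.0]
print(ttt_statistics(data))    # [3.0, 5.0, 6.0]
print(ttt_via_sums(data))      # [3.0, 5.0, 6.0]
print(ttt_plot_points(data))
```

For the toy data the shortest living third of the units contributes one half of the total time on test, and the shortest living two thirds contribute five sixths.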
TTT–statistics were first used by EPSTEIN/SOBEL (1953) to make inference about the
exponential distribution. Starting with a paper by BARLOW/CAMPO (1975), researchers have studied
generalizations of the original TTT–concept that have proven to be very useful in a great number
of applications, e.g., for model identifications, as a basis for the characterization of life distribu-
tion classes, for hypothesis testing, and to determine optimal age replacement intervals.
The empirical quantities defined in (10.12a-e) have theoretical counterparts.
\[ H_F^{-1}(P) = \int_0^{F^{-1}(P)} S(x)\,dx, \quad 0 \le P \le 1, \tag{10.13a} \]
2 The TTT–plot resembles the LORENZ–curve which is used in economics to describe the inequality in income distributions. Contrary to the TTT–plot, the LORENZ–curve is never situated above the 45°–line.
10.1 Prerequisites: Order Statistics, Spacings, TTT–statistics 235
\[ \varphi_F(P) = \frac{H_F^{-1}(P)}{H_F^{-1}(1)} = \frac{H_F^{-1}(P)}{\mu}, \tag{10.13c} \]
the scaled TTT–transform of F (·); for examples see Fig. 10/4 and 10/5.
We realize (see, e.g., Fig. 10/2) that the TTT–plot of a sample from a population with F(·) will approach the graph of the scaled TTT–transform φF(P) of F(·) as n, the sample size, increases. This is so because
\[ \frac{T_i}{n} = \frac{1}{n}\sum_{j=1}^{i} (n-j+1)\,\bigl(X_{(j)} - X_{(j-1)}\bigr) = \int_0^{F_n^{-1}(i/n)} \bigl[1 - F_n(x)\bigr]\,dx, \]
Fn(·) being the empirical CDF, and because, by the GLIVENKO–CANTELLI theorem and the strong law of large numbers, with probability one
\[ \frac{T_i}{n} \longrightarrow \int_0^{F^{-1}(P)} \bigl[1 - F(x)\bigr]\,dx \quad \text{uniformly in } P \in [0, 1] \]
as n → ∞ and i/n → P.
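For the exponential distribution, F(x) = 1 − exp(−x/b), we get
\[ H_F^{-1}(P) = \int_0^{-b\,\ln(1-P)} \exp(-x/b)\,dx = b\,P, \quad 0 \le P \le 1, \tag{10.14a} \]
and
\[ \varphi_F(P) = \frac{b\,P}{b} = P, \quad 0 \le P \le 1. \tag{10.14b} \]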
So, the TTT–plot will be a 45◦ –line running from (0, 0) to (1, 1), see Fig. 10/2. The notation
of the TTT–transform as an inverse CDF indicates that the inverse of HF−1 (P ) will be a CDF of
some variate Y with support [0, µ]. In case of (10.14a) the corresponding CDF is
Figure 10/2: TTT–plots based on simulated exponential data (b = 5; n = 10, 50, 100) and the
scaled TTT–transform of the exponential distribution
2. If F (·) is strictly increasing or, equivalently, if F −1 (·) is continuous, then HF−1 (·) is con-
tinuous.
3. If F (·) is absolutely continuous and strictly increasing the derivative of HF−1 (P ) is found
to be
\[ \frac{dH_F^{-1}(P)}{dP} = \left.\frac{1 - F(x)}{f(x)}\right|_{x = F^{-1}(P)} = \left.\frac{1}{h(x)}\right|_{x = F^{-1}(P)}, \tag{10.15} \]
for almost all P ∈ [0, 1], where h(·) is the hazard rate. The property, that the derivative
of the TTT–transform φF (P ) = HF−1 (P )/µ is proportional to the reciprocal of the hazard
rate, is of utmost importance in finding test statistics for hypotheses on the hazard rate in
later sections.
amounts to deciding whether the sample comes from an exponential distribution or not.5 For this purpose we may revert to one of the numerous existing goodness–of–fit tests, e.g., the tests of KOLMOGOROV–SMIRNOV, ANDERSON–DARLING, CRAMÉR–VON MISES or WATSON.
Here, we will only recommend informal procedures which are based on graphs to be judged
personally. These approaches have neither a test statistic nor a calculable error probability. The
accompanying MATLAB–program HAZARD 09 produces the following graphs.
2) TTT–plot
The scaled TTT–transform of an exponential distribution is φF (P ) = P, i.e., a 45◦ –line
running from (0, 0) to (1, 1) and the scaled TTT–statistic Ti∗ from an exponential sample
will deviate randomly from this 45◦ –line.
2.1) For an uncensored sample of size n we plot Ti∗ on the ordinate against i/n on the
abscissa.
2.2) For a singly censored type–I or type–II sample we plot
\[ T_i^{*} = \frac{T_i^{**}}{T_k^{**}};\quad i = 1, 2, \dots, k; \tag{10.17a} \]
against i/k with k as the total number of failed items observed and
\[ T_i^{**} = \sum_{j=1}^{i} (k-j+1)\,\bigl(x_{(j)} - x_{(j-1)}\bigr);\quad i = 1, 2, \dots, k. \tag{10.17b} \]
2.3) For a multiply censored or randomly censored sample we take the KAPLAN/MEIER estimator of (10.16e) and plot
\[ T_i^{*} = \frac{T_i^{***}}{T_k^{***}};\quad i = 1, 2, \dots, k; \tag{10.18a} \]
against $\widehat{F}_n(x_{(i)})$ for the uncensored observations (x(i), 1). k is the total number of failed items observed and
\[ T_i^{***} = \sum_{j=1}^{i} \frac{\widehat{S}_n(x_{(j)}) + \widehat{S}_n(x_{(j-1)})}{2}\,\bigl(x_{(j)} - x_{(j-1)}\bigr);\quad i = 1, \dots, k;\ x_{(0)} = 0;\ \widehat{S}_n(x_{(0)}) = 1. \tag{10.18b} \]
We mention that the TTT–graphs resulting from (10.17a,b) and (10.18a,b) would lie
below those that would result had the sample been uncensored.
\[ \frac{f(x)\,\Delta x}{1 - F(x)} = \frac{\frac{1}{b}\,\exp(-x/b)\,\Delta x}{\exp(-x/b)} = \frac{\Delta x}{b}. \tag{10.19} \]
So, if we start with a large number n of items, we divide the x–axis into intervals
(0, ∆), (∆, 2 ∆), (2 ∆, 3 ∆), . . . where ∆ is suitably chosen, and if n1 , n2 , n3 , . . . are
the numbers of items failing in these intervals, then
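\[ \frac{n_1}{n},\quad \frac{n_2}{n - n_1},\quad \frac{n_3}{n - n_1 - n_2},\quad \dots \]
should fluctuate within reasonable limits about a constant, namely (for small ∆) approximately ∆/b, i.e., the hazard rate 1/b multiplied by the interval width ∆.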
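The interval check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated with the standard library, and the scale b = 5, the width ∆ = 0.5, the seed and the function name are all hypothetical choices.

```python
import random


def conditional_failure_fractions(lifetimes, delta):
    """Return n_1/n, n_2/(n - n_1), n_3/(n - n_1 - n_2), ... for width delta."""
    n_cells = int(max(lifetimes) // delta) + 1
    counts = [0] * n_cells
    for x in lifetimes:
        counts[int(x // delta)] += 1      # cell k covers [k*delta, (k+1)*delta)
    fractions, at_risk = [], len(lifetimes)
    for n_k in counts:
        fractions.append(n_k / at_risk)   # failures among the items still at risk
        at_risk -= n_k
        if at_risk == 0:
            break
    return fractions


random.seed(1)                            # arbitrary seed, for reproducibility
b, delta = 5.0, 0.5                       # hypothetical scale and interval width
sample = [random.expovariate(1.0 / b) for _ in range(20000)]
first_cells = conditional_failure_fractions(sample, delta)[:5]
```

For exponential data the first fractions should all lie close to ∆/b = 0.1; a clear trend in these fractions would speak against a constant hazard rate.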
10.2 Testing Hazard Rate Properties 239
The following n = 20 observations are randomly censored. They have been generated from a WEIBULL distribution with scale parameter b = 1 and shape parameter c = 2.5, so the hazard rate is increasing more than linearly.
x(i) 0.2727 0.3877 0.4556 0.4565 0.5271 0.5487 0.5789 0.7846 0.8142 0.9329
δi 1 1 1 1 1 1 1 0 1 1
x(i) 0.9659 1.1217 1.1267 1.2378 1.3001 1.3012 1.3181 1.3803 1.4362 1.9594
δi 1 0 1 0 1 1 1 0 1 1
Fig. 10/3 shows the probability plot on the left where the points significantly deviate from the straight OLS–fitted line. The same is true for the TTT–plot on the right where the points lie on a concave curve above the 45°–line, which is typical for IHR distributions.
Figure 10/3: Graphs for judging exponentiality
We will only present a few of these tests that can be found in the suggested reading for this section.
We first mention a graphical approach that starts from (10.13c) in connection with (10.15):
\[ \frac{d\,\varphi_F(P)}{dP} = \frac{d}{dP}\left[\frac{1}{\mu}\,H_F^{-1}(P)\right] = \frac{1}{\mu}\,\frac{1}{h(x)}. \tag{10.20} \]
From (10.20) we see that the graph of the scaled TTT–transform φF (P ) will be
As the graph of φF (P ) runs from (0, 0) to (1, 1) it is clear that this graph will have
• a decreasing slope and lie above the 45◦ –line for an IHR distribution,
• an increasing slope and lie below the 45◦ –line for a DHR distribution,
see Fig. 10/4 which shows φF(P) for three WEIBULL distributions having $h(x) = c\,x^{c-1}$: c = 0.5 gives DHR, c = 1 gives the exponential distribution and c = 2 gives IHR. The graph of the scaled TTT–statistic Ti∗ approaches that of φF(P) as n → ∞ and we can reject H0 in favor of HA ($H_A^*$) when the Ti∗–graph wholly lies above (below) the 45°–line of the exponential distribution and is concave (convex). In case of an uncensored sample of size n the level of significance will be α = 1/n.
Figure 10/4: Scaled TTT–transforms of three WEIBULL distributions
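The position of the scaled TTT–transform relative to the 45°–line can be checked numerically. The following Python sketch evaluates φF(P) of (10.13c) for a WEIBULL distribution with scale 1 and shape c, i.e. S(x) = exp(−x^c), by a simple midpoint rule; the function name and the number of integration steps are illustrative assumptions.

```python
import math


def scaled_ttt_weibull(p, c, steps=20000):
    """phi_F(P) for a Weibull distribution with scale 1 and shape c, 0 <= P < 1."""
    if p <= 0.0:
        return 0.0
    upper = (-math.log(1.0 - p)) ** (1.0 / c)       # F^{-1}(P)
    h = upper / steps
    # midpoint rule for the integral of S(x) = exp(-x^c) over [0, F^{-1}(P)]
    integral = h * sum(math.exp(-((k + 0.5) * h) ** c) for k in range(steps))
    return integral / math.gamma(1.0 + 1.0 / c)     # divide by the mean mu


curve_ihr = [scaled_ttt_weibull(p / 10, 2.0) for p in range(10)]   # c = 2
curve_dhr = [scaled_ttt_weibull(p / 10, 0.5) for p in range(10)]   # c = 0.5
```

At P = 0.5 the sketch gives a value above 0.5 for c = 2 and below 0.5 for c = 0.5, while c = 1 reproduces φF(P) = P.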
A numerical testing approach attached to the curvature of the TTT–transform goes back to KLEFSJÖ (1983a). This test needs an uncensored sample! Suppose that the φF(P)–graph is concave (convex). Since the graph of the scaled TTT–statistic Ti∗ converges to that of φF(P), it is reasonable to expect the TTT–plot based on a sample from an IHR (DHR) distribution to behave concavely (convexly), too, i.e.:
\[ 2\,T_i^{*} - T_{i-1}^{*} - T_{i+1}^{*} \underset{(<)}{>} 0;\quad i = 1, \dots, n-1;\ T_0^{*} = 0,\ T_n^{*} = 1. \tag{10.21a} \]
and we expect a positive (negative) value of A1 if F(·) is IHR (DHR), but not exponential. Using the normalized spacings Di of (10.11), we immediately see that A1 can be written as
\[ A_1 = T_1^{*} + T_{n-1}^{*} - 1 = \frac{D_1 - D_n}{T_n}. \tag{10.21c} \]
KLEFSJÖ gives the asymptotic distribution of A1 under H0 as a LAPLACE distribution and remarks that a test based on A1 is not consistent against the whole IHR (DHR) class, because the numerator (D1 − Dn) of A1 is independent of D2, D3, . . . , Dn−1. For this reason KLEFSJÖ suggests a second test based on the idea that, when φF(P) is concave (convex), we would not only expect (10.21a) to hold, but we also expect, for j = 1, 2, . . . , n − 2 and k = 2, 3, . . . , n − j, that the chord between the plot positions j/n and (j + k)/n lies below (above) the TTT–plot:
\[ T_j^{*} + \frac{T_{j+k}^{*} - T_j^{*}}{k/n}\,\frac{\nu}{n} \underset{(>)}{<} T_{j+\nu}^{*} \quad \text{for } \nu = 1, 2, \dots, k-1 \tag{10.22a} \]
or
\[ k\,\bigl(T_{j+\nu}^{*} - T_j^{*}\bigr) \underset{(<)}{>} \nu\,\bigl(T_{j+k}^{*} - T_j^{*}\bigr). \tag{10.22b} \]
For F (·) to be IHR (DHR), but not exponential, we expect A2 to be positive (negative). A2 can
be written more comfortably as
\[ A_2 = \sum_{j=1}^{n} \alpha_j\,\frac{D_j}{T_n} \tag{10.22d} \]
with
\[ \alpha_j = \frac{1}{6}\,\bigl[(n+1)^3\,j - 3\,(n+1)^2\,j^2 + 2\,(n+1)\,j^3\bigr]. \tag{10.22e} \]
KLEFSJÖ then considers the slightly modified test statistic
\[ A = A_2\,\sqrt{\frac{7560}{n^7}} \tag{10.22f} \]
which is asymptotically No(0, 1). So, the asymptotic critical values for A are the percentiles τγ
of No(0, 1) :
H0 is rejected in favor of HA ($H_A^*$) at level α when A is greater (smaller) than the critical value. Exact critical values have also been calculated by KLEFSJÖ, and an extract of his table is given in the following Tab. 10/1.
Table 10/1: Critical values of $A = A_2\,\sqrt{7560/n^7}$
upper tail†
n α = 0.10 α = 0.05 α = 0.01
5 2.739 3.396 4.402
10 1.912 2.412 3.270
15 1.680 2.133 2.936
20 1.573 2.002 2.777
25 1.511 1.927 2.683
30 1.470 1.878 2.622
35 1.442 1.843 2.578
40 1.421 1.817 2.546
45 1.405 1.797 2.521
50 1.392 1.782 2.501
55 1.382 1.769 2.485
60 1.373 1.758 2.472
65 1.366 1.749 2.460
70 1.361 1.742 2.450
75 1.358 1.736 2.442
∞ 1.282 1.645 2.326
Source: KLEFSJÖ (1983a, p. 922)
† For lower tail change the sign!
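How the statistic A of (10.22d-f) is evaluated from an uncensored sample may be sketched in Python as follows. This is an illustrative sketch, not the program HAZARD 10; the function names and the two artificial samples are assumptions.

```python
import math


def normalized_spacings(sample):
    """D_i = (n - i + 1) (X_(i) - X_(i-1)), i = 1, ..., n, with X_(0) = 0."""
    x = sorted(sample)
    n = len(x)
    # enumerate counts from 0, so the weight n - i + 1 becomes n - i0
    return [(n - i0) * (xi - prev) for i0, (prev, xi) in enumerate(zip([0.0] + x, x))]


def klefsjo_A(sample):
    """A = A_2 * sqrt(7560 / n^7) with A_2 and alpha_j as in (10.22d, e)."""
    d = normalized_spacings(sample)
    n, m = len(d), len(d) + 1
    alpha = [(m**3 * j - 3 * m**2 * j**2 + 2 * m * j**3) / 6 for j in range(1, n + 1)]
    a2 = sum(a * dj for a, dj in zip(alpha, d)) / sum(d)
    return a2 * math.sqrt(7560 / n**7)


ihr_like = list(range(1, 11))             # ten equally spaced lifetimes (made up)
gaps = [j / (10 - j + 1) for j in range(1, 11)]
dhr_like = [sum(gaps[:j]) for j in range(1, 11)]   # spacings D_j = j grow with j
A_ihr, A_dhr = klefsjo_A(ihr_like), klefsjo_A(dhr_like)
```

For the ten equally spaced lifetimes the sketch gives A ≈ 2.36, which exceeds the upper critical value 1.912 (α = 0.10, n = 10) of Tab. 10/1 but not the value 2.412 for α = 0.05; the second artificial sample gives a negative A.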
Many tests of H0 versus HA or $H_A^*$ are based on the normalized spacings Di = (n − i + 1) (X(i) − X(i−1)) while other tests only utilize the ranks of the normalized spacings. One of the oldest tests based on normalized spacings is the cttot–test (cumulative total–time–on–test) of EPSTEIN (1960), see also BARLOW (1968). This test can be used for uncensored samples as well as for singly censored type–I and type–II samples. The test rests upon the fact that under H0 the total lives (= successive TTT–statistics) $T_i = \sum_{j=1}^{i} D_j = \sum_{j=1}^{i} (n-j+1)\,(X_{(j)} - X_{(j-1)})$ are uniformly distributed over [0, Tr] where 1 ≤ r ≤ n is the number of failures in a sample of size n; more precisely, T1, . . . , Tr−1 behave like an ordered sample of r − 1 uniform variates on [0, Tr]. The test statistic considered is
\[ K_r = \frac{\sum_{j=1}^{r-1}\sum_{i=1}^{j} D_i}{\sum_{i=1}^{r} D_i} = \frac{\sum_{j=1}^{r-1} (r-j)\,D_j}{\sum_{i=1}^{r} D_i}. \tag{10.23a} \]
BARLOW (1968) has given the exact percentage points kr,γ (critical values) for this test statistic, see Tab. 10/2. H0 is rejected in favor of HA ($H_A^*$) at level α when
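Even for small r we can use a normal approximation of Kr under H0. The approximate critical values are
\[ k_{r,\gamma} \approx \frac{r-1}{2} + \tau_\gamma\,\sqrt{\frac{r-1}{12}}, \tag{10.23c} \]
where τγ is the percentile of order γ of No(0, 1).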
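A small Python sketch may show how Kr of (10.23a) and the approximate critical values (10.23c) are obtained. It is written for uncensored and type–II censored samples only (the r smallest of n lifetimes are observed); the function names and the four lifetimes are illustrative assumptions.

```python
import math
from statistics import NormalDist


def cttot_statistic(failure_times, n):
    """K_r of (10.23a) from the r observed failure times of n items on test."""
    x = sorted(failure_times)
    r = len(x)
    d, prev = [], 0.0
    for i, xi in enumerate(x, start=1):
        d.append((n - i + 1) * (xi - prev))        # normalized spacing D_i
        prev = xi
    return sum((r - j) * dj for j, dj in enumerate(d, start=1)) / sum(d)


def cttot_critical_value(r, gamma):
    """Normal approximation (10.23c) to the percentage point k_{r,gamma}."""
    tau = NormalDist().inv_cdf(gamma)              # percentile of No(0, 1)
    return (r - 1) / 2 + tau * math.sqrt((r - 1) / 12)


# Hypothetical sample whose normalized spacings are all equal to 1
times = [1 / 4, 1 / 4 + 1 / 3, 1 / 4 + 1 / 3 + 1 / 2, 1 / 4 + 1 / 3 + 1 / 2 + 1]
k_value = cttot_statistic(times, n=4)              # equals (r - 1)/2 = 1.5
k_low = cttot_critical_value(15, 0.10)
```

For r = 15 and γ = 0.10 the approximation gives about 5.62, so an observed value such as Kr = 5.58 lies just inside the lower 10% tail.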
BICKEL/DOKSUM (1969) extensively studied tests of H0 versus HA ($H_A^*$) based on the ranks of the normalized spacings. Their test statistics (see below) are partially motivated by the test of PROSCHAN/PYKE (1967) which is as follows. Let
\[ V_{ij} = \begin{cases} 1 & \text{if } D_i \ge D_j \\ 0 & \text{otherwise} \end{cases} \quad \text{for } i, j = 1, 2, \dots, n. \tag{10.24a} \]
\[ \Pr(V_n = k \mid H_0) = \frac{P_n(k)}{n!}, \tag{10.24e} \]
where Pn(k) is the number of orderings of D1, D2, . . . , Dn with exactly k inversions of indices. An inversion of indices i < j occurs when Di > Dj. Vn is asymptotically normal with
\[ E(V_n) = \frac{n\,(n-1)}{4} \quad \text{and} \quad \operatorname{Var}(V_n) = \frac{(2n+5)\,(n-1)\,n}{72}. \tag{10.24f} \]
So, we have
\[ v_{n,\gamma} \approx \frac{n\,(n-1)}{4} + \tau_\gamma\,\sqrt{\frac{(2n+5)\,(n-1)\,n}{72}}. \tag{10.24g} \]
This test is justified as follows: Under H0 the normalized spacings D1, D2, . . . , Dn are iid, each with PDF exp(−x/b)/b, so that Pr(Vij = 1) = 0.5 for i, j = 1, 2, . . . , n; i ≠ j. However, under HA we have Pr(Vij = 1) > 0.5 for i, j = 1, 2, . . . , n; i < j. Thus, each Vij and consequently Vn tend to be large under HA, so that rejection of H0 in favor of HA occurs for large values of Vn. We finally mention that the asymptotic relative efficiency of the cttot–test is higher than that of the PROSCHAN/PYKE–test.
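Let Ri be the rank of Di. Based on these ranks BICKEL/DOKSUM (1969) suggested a great number of test statistics, e.g.:
\[ W_0 = \sum_{i=1}^{n} \frac{i}{n+1}\,\frac{R_i}{n+1}, \]
\[ W_1 = -\sum_{i=1}^{n} \frac{i}{n+1}\,\ln\!\left(1 - \frac{R_i}{n+1}\right), \]
\[ W_2 = \sum_{i=1}^{n} \ln\!\left(1 - \frac{i}{n+1}\right)\ln\!\left(1 - \frac{R_i}{n+1}\right), \]
\[ W_3 = \sum_{i=1}^{n} \ln\!\left[-\ln\!\left(1 - \frac{i}{n+1}\right)\right]\ln\!\left(1 - \frac{R_i}{n+1}\right). \]
Large (small) values of these test statistics are significant for HA ($H_A^*$).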
The cttot–test and the tests of KLEFSJÖ and PROSCHAN/PYKE have been implemented in the accompanying MATLAB–program HAZARD 10.
A first uncensored data set of n = r = 15 from a WEIBULL distribution with scale parameter b = 10 and shape parameter c = 0.7 is:
So, this sample comes from a DHR–distribution. The cttot–test gives Kr = 5.58, so that H0 is rejected in favor of $H_A^*$ (DHR); the level of significance is about 0.10. The KLEFSJÖ–test gives A = −1.766, so that H0 is rejected in favor of $H_A^*$ with a level of significance of about 0.10. The PROSCHAN/PYKE–test gives Vn = 41, so that H0 is rejected in favor of $H_A^*$ with a level of significance of approximately 0.12. A second uncensored sample of n = r = 20 from a WEIBULL distribution with b = 20, c = 2 is:
This sample comes from an IHR–distribution. The cttot–test gives Kr = 12.57 and H0 is rejected in favor of HA (IHR) with α < 0.01. The KLEFSJÖ–test with A = 3.260 rejects H0 in favor of HA with α < 0.01. The PROSCHAN/PYKE–test with Vn = 122 rejects H0 in favor of HA with α ≈ 0.04.
• the bathtub shape (DIHR = decreasing–increasing hazard rate) where the hazard rate initially is decreasing during the so–called ‘infant mortality’ phase, then constant during the ‘useful life’ phase, and finally increasing during the ‘wear–out’ phase, and
• the inverted bathtub shape (IDHR = increasing–decreasing hazard rate) where the three phases mentioned above occur in reversed order.
7 Suggested reading for the section: AARSET (1985, 1987), BERGMAN (1979), KUNITZ (1989).
From the behavior of the scaled TTT–transform φF (P ) of F (·) with respect to the hazard rate as
given in (10.20) we expect for the DIHR case that
• φF (P ) is convex and lies below the 45◦ –line for P being small, i.e., in the leftmost part of
the plot, and
• φF (P ) is concave and lies above the 45◦ –line for P being large, i.e., in the rightmost part
of the plot.
In the IDHR case the order of these phases of φF(P) is reversed. Fig. 10/5 shows the scaled TTT–transforms of a lognormal distribution, which is an IDHR distribution, and of a power function distribution, which is DIHR.
based on the TTT-plot with i/n on the abscissa and Ti∗ on the ordinate. This test asks for uncen-
sored samples! We introduce
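\[ V_n^{*} = \begin{cases} \min\{\,i \ge 1 : T_i^{*} \ge i/n\,\} \\ n \quad \text{if } T_i^{*} < i/n \text{ for } i = 1, 2, \dots, n-1; \end{cases} \tag{10.25a} \]
\[ K_n^{*} = \begin{cases} \max\{\,i \le n-1 : T_i^{*} \le i/n\,\} \\ 0 \quad \text{if } T_i^{*} > i/n \text{ for } i = 1, 2, \dots, n-1; \end{cases} \tag{10.25b} \]
\[ G_n^{*} = V_n^{*} + n - K_n^{*}, \tag{10.25c} \]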
and reject H0 in favor of HA when G∗n is large. The motivation for this test is that when the
distribution is DIHR, then we may expect Vn∗ as well as (n − Kn∗ ) to be large. G∗n obviously takes
on integer values in [2, n + 1].
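The statistic of (10.25a-c) is easily computed from the scaled TTT–statistics. The following Python sketch assumes Ti∗ = Ti/Tn; the function names are illustrative, and the data are the n = 50 observations of the DIHR example discussed in this section.

```python
def scaled_ttt(sample):
    """Scaled TTT-statistics T_1*, ..., T_n* of an uncensored sample."""
    x = sorted(sample)
    n = len(x)
    totals, running, prev = [], 0.0, 0.0
    for i, xi in enumerate(x, start=1):
        running += (n - i + 1) * (xi - prev)
        totals.append(running)
        prev = xi
    return [t / running for t in totals]


def bergman_G(sample):
    """G_n* = V_n* + n - K_n* as in (10.25a-c)."""
    t = scaled_ttt(sample)                       # t[i-1] holds T_i*
    n = len(t)
    v = next((i for i in range(1, n) if t[i - 1] >= i / n), n)
    k = max((i for i in range(1, n) if t[i - 1] <= i / n), default=0)
    return v + n - k


data_n50 = [0.2, 0.4, 2, 2, 2, 2, 2, 4, 6, 12,
            14, 22, 24, 36, 36, 36, 36, 36, 43, 64,
            72, 80, 90, 92, 94, 100, 110, 120, 126, 126,
            134, 134, 134, 134, 144, 150, 158, 164, 164, 166,
            168, 168, 168, 170, 170, 170, 170, 170, 172, 172]
g_50 = bergman_G(data_n50)                       # 14 + 50 - 19 = 45
```

For these data the TTT–plot first reaches the 45°–line at i = 14 and lies above it for all i > 19, which gives the value 45. A plot lying wholly above the 45°–line, as for an IHR sample, yields the largest possible value n + 1.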
The graph of φF (P ) for an IDHR distribution (inverted bathtub shape) behaves like the reflection
of φF (P ) for a DIHR distribution, see Fig. 10/5, the line of reflection being the 45◦ –line of the
exponential distribution. Thus, we can modify (10.25a-c) to test for IDHR in the following way:
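\[ V_n^{**} = \begin{cases} \min\{\,i \ge 1 : T_i^{*} \le i/n\,\} \\ n \quad \text{if } T_i^{*} > i/n \text{ for } i = 1, 2, \dots, n-1; \end{cases} \tag{10.26a} \]
\[ K_n^{**} = \begin{cases} \max\{\,i \le n-1 : T_i^{*} \ge i/n\,\} \\ 0 \quad \text{if } T_i^{*} < i/n \text{ for } i = 1, 2, \dots, n-1; \end{cases} \tag{10.26b} \]
\[ G_n^{**} = V_n^{**} + n - K_n^{**}. \tag{10.26c} \]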
i.e., it rests upon the squared distances between the empirical CDF Fn(x) and the hypothetical F(x). AARSET now suggests the test statistic
\[ R_n = \int_0^1 \Delta_n^2(u)\,du \tag{10.29a} \]
where
\[ \Delta_n(u) = \begin{cases} \sqrt{n}\,\bigl[T_i^{*} - \varphi_F(u)\bigr] & \text{for } \frac{i-1}{n} < u \le \frac{i}{n},\ 1 \le i \le n \\ 0 & \text{for } u = 0. \end{cases} \tag{10.29b} \]
Rn rests upon the squared distances between the scaled TTT–statistics Ti∗ and the TTT–transform of F(·). Under H0, F(·) is given by the exponential distribution and the test statistic turns into
\[ R_n = \sum_{i=1}^{n} T_i^{*}\left(T_i^{*} - \frac{2i-1}{n}\right) + \frac{n}{3}. \tag{10.29c} \]
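That the closed form (10.29c) agrees with the integral (10.29a, b) for φF(u) = u can be checked numerically. The Python sketch below is only such a check; the function names, the midpoint rule and the six lifetimes are illustrative assumptions.

```python
def scaled_ttt(sample):
    """Scaled TTT-statistics T_1*, ..., T_n* of an uncensored sample."""
    x = sorted(sample)
    n = len(x)
    totals, running, prev = [], 0.0, 0.0
    for i, xi in enumerate(x, start=1):
        running += (n - i + 1) * (xi - prev)
        totals.append(running)
        prev = xi
    return [t / running for t in totals]


def aarset_R(sample):
    """Closed form (10.29c), valid for the exponential null hypothesis."""
    t = scaled_ttt(sample)
    n = len(t)
    return sum(ti * (ti - (2 * i - 1) / n) for i, ti in enumerate(t, start=1)) + n / 3


def aarset_R_numeric(sample, steps_per_cell=200):
    """Midpoint-rule evaluation of (10.29a, b) with phi_F(u) = u."""
    t = scaled_ttt(sample)
    n = len(t)
    h = 1.0 / (n * steps_per_cell)
    total = 0.0
    for i, ti in enumerate(t, start=1):          # cell ((i-1)/n, i/n]
        for s in range(steps_per_cell):
            u = (i - 1) / n + (s + 0.5) * h
            total += n * (ti - u) ** 2 * h
    return total


lifetimes = [0.7, 1.9, 2.4, 3.8, 4.1, 9.6]       # hypothetical data
r_closed, r_numeric = aarset_R(lifetimes), aarset_R_numeric(lifetimes)
```

Both evaluations agree up to the discretization error of the midpoint rule.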
According to the invariance principle Rn has the same asymptotic distribution as the CRAMÉR–VON MISES statistic $W_n^2$ with the following percentage points:
The reason for a large value of Rn is any great discrepancy between Ti∗ and φF(P) of the exponential distribution. Thus, we cannot rely on Rn alone to decide for DIHR (IDHR). We have to take into account other evidence like the TTT–plot and the statistics $G_n^{*}$ or $G_n^{**}$.
The tests of BERGMAN and AARSET are implemented in the accompanying MATLAB–program HAZARD 11.
We want to know whether the following data set (n = 50) comes from a DIHR distribution or not:
0.2 0.4 2 2 2 2 2 4 6 12
14 22 24 36 36 36 36 36 43 64
72 80 90 92 94 100 110 120 126 126
134 134 134 134 144 150 158 164 164 166
168 168 168 170 170 170 170 170 172 172
Fig. 10/6 clearly indicates DIHR. The BERGMAN–test gives G∗50 = 45. As, according to Tab. 10/3, Pr(G∗50 ≥ 45) = 0.13927, this test rejects exponentiality against DIHR only at a level of significance of about 0.14. The AARSET–test gives R50 = 1.2922 and we can reject exponentiality in favor of DIHR at α < 0.001.
Figure 10/6: TTT–plot for a data set coming from a DIHR distribution
When we submit the second data set (n = 20) of Example 10/2 to the BERGMAN–test and to the AARSET–test we find G∗20 = 21, which is insignificant for DIHR as well as for IDHR, and R20 = 0.8750. The latter statistic alone would be significant (α < 0.01) for deviation from exponentiality, but the TTT–plot in Fig. 10/7 clearly indicates IHR and neither DIHR nor IDHR.
Figure 10/7: TTT–plot of a data set coming from an IHR distribution
When there is evidence for an IDHR or a DIHR distribution we often want to know that special
lifetime where the hazard rate changes from increasing to decreasing or vice versa. This special
10.3 Testing for Aging Classes 249
lifetime is called change point.8 There exists an extensive literature on turning points of the hazard rate. Most papers deal with the estimation of the change point, see ANTONIADIS et al. (2000), BEBBINGTON et al. (2008), JOSHI/MCEACHERN (1997), LAI et al. (2001), LOADER (1991), MÜLLER/WANG (1990a), NGUYEN et al. (1984) or PATRA/DEY (2002). The paper of HENDERSON (1990) tests for the existence of a change point.
is increasing (decreasing). BARLOW/CAMPO (1975) and BARLOW (1979) give the following theorem relating IHRA (DHRA) to the TTT–transform φF(P):
‘If F(·) is a life distribution which is IHRA (DHRA) then φF(P)/P is decreasing (increasing) for 0 < P < 1.’
Thus, since φF(P)/P being decreasing is a necessary (but not sufficient) condition for F(·) to be IHRA, KLEFSJÖ (1983a) proposes a statistic which investigates whether the analogous property holds for the TTT–plot. If φF(P)/P is decreasing we expect the corresponding property to hold for the TTT–plot. This means that
\[ \frac{T_i^{*}}{i/n} > \frac{T_j^{*}}{j/n} \quad \text{for } j > i,\ i = 1, 2, \dots, n-1. \tag{10.30a} \]
Multiplication by (i j)/n and summing over i and j gives the test statistic
\[ B = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \bigl(j\,T_i^{*} - i\,T_j^{*}\bigr). \tag{10.30b} \]
H0 : ‘F (·) is exponential.’
in favor of
HA : ‘F (·) is IHRA, but not exponential.’
(in favor of
$H_A^*$ : ‘F(·) is DHRA, but not exponential’.)
8 In statistics the notion ‘change point’ generally has another meaning: it is that realization of a variate where there is a change of the distribution function, see CSÖRGÖ/HORVÁTH (1968) or KRISHNAIAH/MIAO (1988).
9 Suggested reading for this section: HOLLANDER/PROSCHAN (1984), KLEFSJÖ (1982b, 1983a).
\[ B = \frac{1}{T_n}\sum_{j=1}^{n} b_j\,D_j \tag{10.30c} \]
where
\[ b_j = \frac{1}{6}\,\bigl[2\,j^3 - 3\,j^2 + j\,(1 - 3n - 3n^2) + 2n + 3n^2 + n^3\bigr]. \tag{10.30d} \]
KLEFSJÖ (1983a) has found the exact null distribution of the slightly modified test statistic
\[ B^{*} = B\,\sqrt{\frac{210}{n^5}}. \tag{10.30e} \]
The upper 0.01, 0.05, 0.10 percentiles are in Tab. 10/4. The asymptotic distribution of B∗ under H0 is No(0, 1). Example 10/4 further down shows the working of this test which has been implemented in the accompanying MATLAB–program HAZARD 12.
upper tail†
n α = 0.10 α = 0.05 α = 0.01
5 1.703 2.257 3.227
10 1.508 2.003 2.951
15 1.441 1.909 2.815
20 1.406 1.858 2.736
25 1.385 1.827 2.684
30 1.371 1.804 2.646
35 1.360 1.788 2.618
40 1.352 1.755 2.595
45 1.346 1.765 2.577
50 1.341 1.757 2.562
55 1.337 1.750 2.550
60 1.333 1.744 2.538
65 1.330 1.739 2.528
70 1.328 1.734 2.520
75 1.325 1.730 2.512
∞ 1.282 1.645 2.326
Source: KLEFSJÖ (1983a, p. 923)
† For lower tail change the sign!
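The equivalence of the double sum (10.30b) and the spacing form (10.30c, d) can be verified with a short Python sketch. The function names and the six lifetimes are illustrative assumptions, and the scaled TTT–statistic is taken as Ti∗ = Ti/Tn.

```python
import math


def spacings_and_scaled_ttt(sample):
    """Return the normalized spacings D_j and the scaled TTT-statistics T_i*."""
    x = sorted(sample)
    n = len(x)
    d, prev = [], 0.0
    for i, xi in enumerate(x, start=1):
        d.append((n - i + 1) * (xi - prev))
        prev = xi
    t_n = sum(d)
    return d, [sum(d[:i]) / t_n for i in range(1, n + 1)]


def B_double_sum(t_star):
    """B as the double sum (10.30b)."""
    n = len(t_star)
    return sum(j * t_star[i - 1] - i * t_star[j - 1]
               for i in range(1, n) for j in range(i + 1, n + 1))


def B_spacings(d):
    """B as the weighted sum of normalized spacings (10.30c, d)."""
    n = len(d)
    b = [(2 * j**3 - 3 * j**2 + j * (1 - 3 * n - 3 * n**2) + 2 * n + 3 * n**2 + n**3) / 6
         for j in range(1, n + 1)]
    return sum(bj * dj for bj, dj in zip(b, d)) / sum(d)


d, t_star = spacings_and_scaled_ttt([0.7, 1.9, 2.4, 3.8, 4.1, 9.6])   # made-up data
B = B_double_sum(t_star)
B_modified = B * math.sqrt(210 / len(d) ** 5)                         # (10.30e)
```

The value B_modified is then compared with the percentiles of Tab. 10/4.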
There are other tests for IHRA (DHRA). BARLOW (1968) derives a likelihood ratio test statistic, lower percentiles of which are intended for testing IHRA against DHRA. He also suggests using the cttot–test of EPSTEIN (1960), see (10.23a-c), for testing H0 versus HA or $H_A^*$. BARLOW/CAMPO (1975) proposed to take the number of crossings between the TTT–graph and the 45°–line as a test statistic and to reject H0 when this number is small. BERGMAN (1977) gives the exact and the asymptotic distribution of this test statistic under H0 together with a table of its CCDF.
and NWU (new worse than used) if the reversed inequality holds. The following NBU/NWU test of HOLLANDER/PROSCHAN (1972) is motivated by considering the parameter
\[ \gamma = \int_0^\infty\!\!\int_0^\infty \bigl[S(x)\,S(y) - S(x+y)\bigr]\,dF(x)\,dF(y) = \frac{1}{4} - \int_0^\infty\!\!\int_0^\infty S(x+y)\,dF(x)\,dF(y) = \frac{1}{4} - \Delta(F) \tag{10.31a} \]
with
\[ \Delta(F) = \int_0^\infty\!\!\int_0^\infty S(x+y)\,dF(x)\,dF(y) = \Pr(X_1 > X_2 + X_3). \tag{10.31b} \]
H0 : ‘F (·) is exponential.’
in favor of
HA : ‘F (·) is NBU, but not exponential.’
(in favor of
$H_A^*$ : ‘F(·) is NWU, but not exponential.’)
if $\iint \bigl[1 - F_n(x+y)\bigr]\,dF_n(x)\,dF_n(y)$ is too small (large). HOLLANDER/PROSCHAN (1972) found it more convenient to reject H0 for small (large) values of the asymptotically equivalent statistic
\[ J = \frac{2}{n\,(n-1)\,(n-2)}\,\sum\nolimits^{*}\,\Psi\bigl(X_{a_1},\,X_{a_2} + X_{a_3}\bigr) \tag{10.31c} \]
where
\[ \Psi(a, b) = \begin{cases} 1 & \text{for } a > b \\ 0 & \text{for } a \le b \end{cases} \tag{10.31d} \]
and $\sum^{*}$ as the sum over all n (n − 1) (n − 2)/2 triples (a1, a2, a3) of three integers such that 1 ≤ ai ≤ n, a1 ≠ a2, a1 ≠ a3 and a2 < a3. Defining
\[ M_n = \frac{n\,(n-1)\,(n-2)}{2}\,J = \sum\nolimits^{*}\,\Psi\bigl(X_{a_1},\,X_{a_2} + X_{a_3}\bigr) \tag{10.31e} \]
and denoting X(1) ≤ X(2) ≤ . . . ≤ X(n) as the ordered X’s and since i ≤ max(j, k) implies Ψ(X(i), X(j) + X(k)) = 0 we can rewrite Mn as
\[ M_n = \sum_{i>j>k} \Psi\bigl(X_{(i)},\,X_{(j)} + X_{(k)}\bigr) \tag{10.31f} \]
with Mn = 0, 1, 2, . . . , n (n − 1) (n − 2)/6. The following Tab. 10/5, based on the critical values of Mn to be found as table 4.1 in HOLLANDER/PROSCHAN (1972), gives critical values jn,γ of the test statistic J in (10.31c). H0 has to be rejected in favor of HA ($H_A^*$) at level α = 0.05 or 0.10 if J ≤ jn,α (J ≥ jn,1−α). The normal approximation treats
\[ J^{*} = \left(J - \frac{1}{4}\right)\sqrt{\frac{432\,n}{5}} \tag{10.31g} \]
as No(0, 1)–distributed.
Table 10/5: Critical values jn, γ of J
An application of this test — worked out with the MATLAB–program HAZARD 12 — is found
further down in Example 10/4.
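The counting in (10.31c-g) may be illustrated by the following Python sketch, which evaluates Mn once over the ordered sample as in (10.31f) and once by brute force over all admissible triples. The function names and the six lifetimes are hypothetical, and the lifetimes are assumed to be positive.

```python
import math


def hollander_proschan(sample):
    """Return (M_n, J, J*) following (10.31c), (10.31f) and (10.31g)."""
    x = sorted(sample)
    n = len(x)
    m_n = sum(1 for i in range(n) for j in range(i) for k in range(j)
              if x[i] > x[j] + x[k])                  # sum over i > j > k
    j_stat = 2 * m_n / (n * (n - 1) * (n - 2))
    j_star = (j_stat - 0.25) * math.sqrt(432 * n / 5)
    return m_n, j_stat, j_star


def m_brute_force(sample):
    """Sum of Psi over all triples with a1 != a2, a1 != a3 and a2 < a3."""
    n = len(sample)
    return sum(1 for a1 in range(n) for a2 in range(n) for a3 in range(a2 + 1, n)
               if a1 != a2 and a1 != a3 and sample[a1] > sample[a2] + sample[a3])


lifetimes = [4.1, 0.5, 9.0, 2.0, 7.7, 1.2]            # made-up data
m_n, j_stat, j_star = hollander_proschan(lifetimes)   # m_n is 18 here
```

For these six values 18 of the 20 ordered triples satisfy X(i) > X(j) + X(k), so that J = 0.3.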
is decreasing (increasing). KLEFSJÖ (1982b) proved the following theorem connecting this aging property with the TTT–transform φF(P):
\[ Q(P) = \frac{1 - \varphi_F(P)}{1 - P} \]
is decreasing (increasing) for 0 ≤ P < 1.’
Based on this theorem and on the same idea as in Sect. 10.3.1 we expect that, if F(·) is DMRL (IMRL), the following holds:
\[ \frac{1 - T_j^{*}}{1 - j/n} \underset{(>)}{<} \frac{1 - T_i^{*}}{1 - i/n} \quad \text{for } j > i \text{ and } i = 0, 1, 2, \dots, n-1. \tag{10.32a} \]
After multiplication by (n − i) (n − j)/n and summation we get the test statistic
\[ K = \sum_{i=0}^{n-1}\sum_{j=i+1}^{n} \bigl[(n-j)\,(1 - T_i^{*}) - (n-i)\,(1 - T_j^{*})\bigr]. \tag{10.32b} \]
or equivalently
\[ \int_x^\infty S(u)\,du < \mu(0)\,S(x) \quad \left[\int_x^\infty S(u)\,du > \mu(0)\,S(x)\right] \quad \forall\ x > 0. \]
The following theorem of KLEFSJÖ (1983a) relates these properties to the TTT–transform φF(P):
H0 : ‘F (·) is exponential.’
in favor of
HA : ‘F (·) is NBUE, but not exponential.’
(in favor of
$H_A^*$ : ‘F(·) is NWUE, but not exponential.’)
Their statistic rests upon the weighted difference between µ(0) and µ(x) and reads
\[ K^{*} = \frac{\sum_{j=1}^{n} \left(\frac{3n+1}{2} - 2j\right) X_{(j)}}{n\,\sum_{j=1}^{n} D_j}. \tag{10.34d} \]
We note that
\[ C = V - \frac{n-1}{2} = n\,K^{*}. \tag{10.34e} \]
Hence, C, V and K∗ are equivalent test statistics which can be traced back to the test statistic
\[ K_n = \frac{\sum_{j=1}^{n-1} T_j}{T_n} \]
of (10.23a). We have
\[ K_n = n\,K^{*} + \frac{n-1}{2}. \tag{10.34f} \]
Significantly large (small) values of K∗ suggest NBUE (NWUE). We do not need to furnish critical values of K∗ because, based on (10.34f), we can use the percentage points kn,γ of Tab. 10/2 in the following way: “Reject H0 in favor of HA ($H_A^*$) if
\[ K^{*} \ge \frac{1}{n}\,k_{n,1-\alpha} - \frac{n-1}{2n} \quad \left(K^{*} \le \frac{1}{n}\,k_{n,\alpha} - \frac{n-1}{2n}\right). \]”
Under H0 we have $K^{*}\sqrt{n} \to \text{No}(0,\,1/12)$ so that for large n we reject H0 in favor of HA ($H_A^*$) with level α if
\[ K^{*} \ge \tau_{1-\alpha}\,\sqrt{\frac{1}{12\,n}} \quad \left(K^{*} \le \tau_{\alpha}\,\sqrt{\frac{1}{12\,n}}\right). \]
This test has been implemented in the MATLAB–program HAZARD 12. For an application see Example 10/4.
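The relation (10.34f) between K∗ and the cttot–statistic Kn, together with the large–sample rejection limit, can be illustrated by a short Python sketch. The function names and the five lifetimes are illustrative assumptions and not part of HAZARD 12.

```python
import math
from statistics import NormalDist


def nbue_K_star(sample):
    """K* of (10.34d); the denominator uses T_n = sum of all lifetimes."""
    x = sorted(sample)
    n = len(x)
    numerator = sum(((3 * n + 1) / 2 - 2 * j) * xj for j, xj in enumerate(x, start=1))
    return numerator / (n * sum(x))


def cttot_K_n(sample):
    """K_n = (T_1 + ... + T_{n-1}) / T_n for an uncensored sample."""
    x = sorted(sample)
    n = len(x)
    totals = [sum(x[:i]) + (n - i) * x[i - 1] for i in range(1, n + 1)]
    return sum(totals[:-1]) / totals[-1]


def asymptotic_limit(n, alpha):
    """Upper rejection limit tau_{1-alpha} * sqrt(1 / (12 n)) for K*."""
    return NormalDist().inv_cdf(1 - alpha) * math.sqrt(1 / (12 * n))


lifetimes = [0.5, 1.2, 2.0, 4.1, 7.7]               # made-up data
k_star, k_n = nbue_K_star(lifetimes), cttot_K_n(lifetimes)
```

With n = 5 the two statistics satisfy Kn = 5 K∗ + 2, as (10.34f) requires.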
H0 : ‘F (·) is exponential.’
in favor of
HA : ‘F (·) is HNBUE, but not exponential.’
(in favor of
$H_A^*$ : ‘F(·) is HNWUE, but not exponential.’)
He recommends
\[ Q_1 = \sum_{j=1}^{n} \left[3\left(1 - \frac{j}{n}\right)^{2} - \frac{1}{3}\right] \frac{X_{(j)}}{T_n} \tag{10.35a} \]
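with
\[ \alpha_j = -\frac{1}{3} + \left(1 - \frac{j}{n}\right)^{2} + \frac{1 - j/n}{2n} \tag{10.35c} \]
and
\[ \vartheta_j = \begin{cases} 1 & \text{for } \alpha_j < x \\ 0 & \text{otherwise.} \end{cases} \tag{10.35d} \]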
\[ Q_1\,\sqrt{\frac{45\,n}{4}} > q_{n,1-\alpha} \quad \left(Q_1\,\sqrt{\frac{45\,n}{4}} < q_{n,\alpha}\right). \]
We take the data set of Example 8/8: survival times (in days) of the n = 43 patients suffering from granulocytic leukemia. x = 0 is taken as the patient’s date of diagnosis and beginning of treatment. We want to test whether this sample comes from a distribution belonging to one or the other of the aging classes discussed in Sect. 10.3.
The TTT–plot of Fig. 10/8 clearly indicates an IHR distribution. This is also confirmed by the three IHR tests of Sect. 10.2.2 with EPSTEIN’s cttot–test statistic Kr = 24.11, KLEFSJÖ’s test statistic A = 1.079 and PROSCHAN/PYKE’s test statistic Vn = 512.
For the test of the aging classes we get the following results:
These results are in accordance with the chain of implications in Fig. 2/7. As we have evidence for IHR
we expect to have evidence for IHRA, NBU, DMRL, NBUE and HNBUE.
Appendices
Bibliography
A-A-A-A-A
AALEN, O. O. (1978): Nonparametric inference for a family of counting processes; Ann. Statist. 6,
701–726
AARSET, M. V. (1985): The null distribution for a test of constant versus bathtub–failure rate; Scand. Jour.
Statist. 12, 55–62
AARSET, M. V. (1987): How to identify a bathtub hazard rate; IEEE Trans. Rel. 36, 106–108
AL–HUSSAINI, E. K. / SULTAN, K. S. (2001): Reliability and hazard based on finite mixture models; in
BALAKRISHNAN/RAO (eds.): Handbook of Statistics, Vol. 20 (Advances in Reliability), 139–183, North–
Holland, Amsterdam etc.
ANDERSON, J. / SENTHILSELVAN, A. (1980): Smooth estimates for the hazard function; Jour. Roy.
Statist. Soc. B 42, 322–327
ANDERSON, T. W. / DARLING, D. A. (1952): Asymptotic theory of certain goodness–of–fit criteria based
on stochastic processes; Ann. Math. Statist. 23, 193–213
ANTONIADIS, A. / GIJBELS, I. / MCGIBBON, B. (2000): Non–parametric estimation of the location of
a change point in an otherwise smooth hazard rate function under random censoring; Scand. Jour. Statist.
27, 501–519
ANTONIADIS, A. / GRÉGOIRE, G. / MCKEAGUE, I. W. (1994): Wavelet methods for curve estimation;
Jour. Amer. Statist. Ass. 89, 1340–1353
ANTONIADIS, A. / GRÉGOIRE, G. / NASON, G. (1999): Density and hazard rate estimation for right–
censored data using wavelet methods; Jour. Roy. Statist. Soc. B 61, 63–84
ARNOLD, B. C. / ZAHEDI, H. (1988): On multivariate mean remaining life functions; Jour. Multivar.
Analysis 25, 1–9
ASADI, M. (1999): Multivariate distributions characterized by a relationship between mean residual life
and hazard rate; Metrika 49, 121–126
B-B-B-B-B
BAGKAVOS, D. / PATIL, P. (2009): Variable bandwidth for nonparametric hazard rate estimation; Comm.
Statist. – Theory & Methods 38, 1055–1078
BAIN, L. J. / ENGELHARDT, M. (1991): Statistical Analysis of Reliability and Life–Testing Models –
Theory and Methods, 2nd ed.; Marcel Dekker, New York etc.
BALAKRISHNAN, N. / COHEN, A. C. (1991): Order Statistics and Inference: Estimation Methods; Academic
Press, San Diego
BARLOW, R. E. (1968): Likelihood ratio tests for restricted families of probability distributions; Ann.
Math. Statist. 39, 547–566
BARLOW, R. E. (1979): Geometry of the total time on test transforms; Naval Res. Log. Quart. 26, 393–402
BARLOW, R. E. / BARTHOLOMEW, D. J. / BREMNER, J. M. / BRUNK, H. D. (1972): Statistical Inference
under Order Restrictions; Wiley, New York etc.
BARLOW, R. E. / CAMPO, R. (1975): Total time on test processes and applications to failure analysis; in
BARLOW/FUSSEL/SINGPURWALLA (eds.): Reliability and Fault Tree Analysis, SIAM, Philadelphia
BARLOW, R. E. / MARSHALL, A. W. (1964): Bounds for distributions with monotone hazard rate I, II;
Ann. Math. Statist. 35, 1234–1257, 1258–1274
BARLOW, R. E. / MARSHALL, A. W. / PROSCHAN, F. (1963): Properties of probability distributions with
monotone hazard rate; Ann. Math. Statist. 34, 375–389
BARLOW, R. E. / PROSCHAN, F. (1965): Mathematical Theory of Reliability; Wiley, New York etc.
BARLOW, R. E. / PROSCHAN, F. (1969): A note on tests for monotone failure rate based on incomplete
data; Ann. Math. Statist. 40, 595–600
262 Bibliography
BARLOW, R. E. / PROSCHAN, F. (1975): Statistical Theory of Reliability and Life Testing; Holt, Rinehart
and Winston, New York etc.
BASU, A. P. (1971): Bivariate failure rate; Jour. Amer. Statist. Ass. 66, 103–104
BEBBINGTON, M. / LAI, C. D. / ZITIKIS, R. (2008): Estimating the turning point of a bathtub–shaped
failure distribution; Jour. Statist. Plan. Inf. 138, 1157–1166
BERGMAN, B. (1977): Crossings in the total–time–on–test plot; Scand. Jour. Statist. 4, 171–177
BERGMAN, B. (1979): On age replacement and the total time on test concept; Scand. Jour. Statist. 6,
161–168
BERGMAN, B. / KLEFSJÖ, B. (1984): The total time on test concept and its use in reliability theory;
Operations Research 32, 596–606
BÉZANDRY, D. H. / BONNEY, G. E. / GANNOUN, A. (2005): Consistent estimation of the density and
hazard rate functions for censored data via the wavelet method; Statistics and Probability Letters 74,
366–372
BICKEL, P. J. (1969): Tests for monotone failure rate II; Ann. Math. Statist. 40, 1250–1260
BICKEL, P. J. / DOKSUM, K. A. (1969): Tests for monotone failure rate based on normalized spacings;
Ann. Math. Statist. 40, 1216–1235
BIRNBAUM, Z. W. / SAUNDERS, S. C. (1968): A probabilistic interpretation of Miner’s rule; SIAM
Jour. Appl. Math. 16, 637–652
BIRNBAUM, Z. W. / SAUNDERS, S. C. (1969): A new family of life distributions; Jour. Appl.
Prob. 6, 319–327
BLOCK, H. W. / SAVITS, T. H. (1982): The class of MIFRA lifetimes and its relation to other classes;
Nav. Res. Log. Quart. 29, 55–61
BLOCK, H. W. / SAVITS, T. H. (1984): Multivariate nonparametric classes in reliability; in
KRISHNAIAH/RAO (eds.): Handbook of Statistics, Vol. 7 (Quality Control and Reliability), 121–129, North–
Holland, Amsterdam etc.
BOUEZMARNI, T. / ROMBOUTS, J. V. K. (2008): Density and hazard rate estimation for censored and
α–mixing data using gamma kernels; Jour. Nonpar. Statist. 20, 627–643
BOWMAN, A. (1984): An alternative method of cross–validation for the smoothing of density estimates;
Biometrika 71, 353–366
BRINDLEY, E. C. JR. / THOMPSON, W. A. JR. (1972): Dependence and aging aspects of multivariate
survival; Jour. Amer. Statist. Ass. 67, 822–830
BRYSON, M. C. / SIDDIQUI, M. M. (1969): Some criteria for ageing; Jour. Amer. Statist. Ass. 64,
1472–1483
BURR, I. W. (1942): Cumulative frequency functions; Ann. Math. Statist. 13, 215–232
C-C-C-C-C
CACOULOS, T. (1966): Estimation of a multivariate density; Ann. Inst. Statist. Math. 18, 178–189
CHANDRA, N. K. / ROY, D. (2001): Some results on reversed hazard rate; Probability in the Engineering
and Informational Sciences 15, 95–102
CHANDRA, N. K. / ROY, D. (2005): Classification of distribution based on reversed hazard rate; Calcutta
Statist. Ass. Bull. 56, 231–249
CHENG, K. F. (1985): Tests for the equality of failure rates; Biometrika 72, 211–215
CHENG, P. E. (1987): A nearest neighbour hazard rate estimator for randomly censored data; Comm.
Statist. – Theory & Methods 16, 613–625
COHEN, A. C. (1991): Truncated and Censored Samples; Marcel Dekker, New York etc.
COX, D. R. (1972): Regression models and life tables; Jour. Roy. Statist. Soc. B 34, 187–220
COX, D. R. / OAKES, D. (1984): Analysis of Survival Data; Chapman & Hall, London
ELANDT–JOHNSON, R. C. / JOHNSON, N. L. (1980): Survival Models and Data Analysis; Wiley, New
York etc.
EPANECHNIKOV, V. A. (1969): Non–parametric estimation of a multivariate probability density; Theory
Probab. Appl. 14, 153–158
EPSTEIN, B. (1960): Testing for the validity of the assumption that the underlying distribution of life is
exponential; Technometrics 2, 83–101, 167–183
EPSTEIN, B. / SOBEL, M. (1953): Life testing; Jour. Amer. Statist. Ass. 48, 486–502
F-F-F-F-F
FAILING, K. (1984): Neue Methoden zur nicht–parametrischen Schätzung von Dichte- und
Hazardfunktionen bei zensierten Daten mit Anwendungen in klinischen Studien; in KÖHLER/TAUTU/WAGNER (eds.):
Der Beitrag der Informationsverarbeitung zum Fortschritt der Medizin, 92–99, Springer, Berlin etc.
G-G-G-G-G
GASSER, T. / MÜLLER, H.–G. (1979): Kernel estimation for regression functions; in GASSER/ROSENBLATT
(eds.): Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics, 23–68, Springer,
Berlin/Heidelberg
GASSER, T. / MÜLLER, H.–G. / MAMMITZSCH, V. (1985): Kernels for nonparametric curve
estimation; Jour. Roy. Statist. Soc. B 47, 238–252
GEFELLER, O. / DETTE, H. (1991): A comparative study on hazard function estimators
employing nearest neighbour distances as bandwidths; in ADLASSNIG/GRABNER/BENGTSON/HANSEN
(eds.): Medical Information Europe 1991, 963–987, Springer, Berlin etc.
GEFELLER, O. / DETTE, H. (1992): Nearest neighbour kernel estimation of the hazard function
from censored data; Jour. Statist. Comp. Simul. 43, 93–101
GEFELLER, O. / MICHELS, P. (1992): A review on smoothing methods for the estimation of the
hazard rate based on kernel functions; in DODGE/WHITTAKER (eds.): Proceedings of the 10th
Symposium on Computational Statistics, 459–464
GEHAN, E. A. (1969): Estimating survival functions from the life table; Jour. Chronic Disease
21, 629–644
GLASER, R. E. (1980): Bathtub and related failure rate characterizations; Jour. Amer. Statist.
Ass. 75, 667–672
GREENWOOD, M. (1926): The natural duration of cancer; Report on Public Health and Medical
Subjects; His Majesty’s Stationery Office, London, Vol. 33, 1–26
GRENANDER, U. (1956): On the theory of mortality measurement, Part II; Skand. Aktuarietidskr.
39, 125–153
GRIFFITH, W. S. (1982): Representations of distributions having monotone or bathtub–shaped
failure rate; IEEE Trans. Rel. 31, 95–96
GROSS, A. J. / CLARK, V. A. (1975): Survival Distributions: Reliability Applications in the
Biomedical Sciences; Wiley, New York etc.
GU, C. (1996): Penalized likelihood hazard estimation: A general procedure; Statistica Sinica 6,
861–876
GUESS, F. M. / PARK, D. H. (1988): Modeling discrete bathtub and upside–down bathtub mean
residual life functions; IEEE Trans. Rel. 37, 545–549
GUESS, F. M. / PROSCHAN, F. (1988): Mean residual life: Theory and application; in
KRISHNAIAH/RAO (eds.): Handbook of Statistics, Vol. 7 (Quality Control and Reliability), 215–224,
North–Holland, Amsterdam etc.
GUPTA, P. L. / GUPTA, R. C. (1997): On the multivariate normal hazard; Jour. Multivar.
Analysis 62, 64–73
GUPTA, P. L. / GUPTA, R. C. / TRIPATHI, R. C. (1997): On the monotonic properties of discrete
failure rates; Jour. Statist. Plan. Inf. 65, 255–268
GUPTA, R. C. (1981): Moments in terms of the mean residual life function; IEEE Trans. Rel. 30,
450–451
GUPTA, R. D. / NANDA, A. K. (2001): Some results on reversed hazard rate ordering; Comm.
Statist. — Theory & Methods 30, 2447–2457
H-H-H-H-H
HÄRDLE, W. / KERKYACHARIAN, G. / PICARD, D. / TSYBAKOV, A. (1998): Wavelets,
Approximations and Statistical Applications; Springer, Berlin etc.
HALL, P. / VAN KEILEGOM, I. (2005): Testing for monotone increasing hazard rate; Ann. Statist.
33, 1109–1137
HARRIS, R. (1970): A multivariate definition for increasing hazard rate distribution functions;
Ann. Math. Statist. 41, 713–717
HENDERSON, R. (1990): A problem with the likelihood ratio test for a change–point hazard rate
model; Biometrika 77, 835–843
HERD, G. R. (1960): Estimation of reliability from incomplete data; Proc. of the Sixth National
Symposium on Reliability and Quality Control, 202–217
HESS, K. R. / SERACHITOPOL, D. M. / BROWN, B. W. (1999): Hazard function estimators: A
comparative simulation study; Statistics in Medicine 18, 3075–3088
HJORTH, U. (1980): A reliability distribution with increasing, decreasing, constant, and
bathtub–shaped failure rates; Technometrics 22, 99–107
HOLLANDER, M. / PARK, D. H. / PROSCHAN, F. (1986): A class of life distributions for aging;
Jour. Amer. Statist. Ass. 81, 91–95
HOLLANDER, M. / PROSCHAN, F. (1972): Testing whether new is better than used; Ann. Math.
Statist. 43, 1136–1146
HOLLANDER, M. / PROSCHAN, F. (1975): Tests for mean residual life; Biometrika 62, 585–593
HOLLANDER, M. / PROSCHAN, F. (1984): Nonparametric concepts and methods in reliability; in
KRISHNAIAH/SEN (eds.): Handbook of Statistics, Vol. 4 (Nonparametric Methods), 613–655,
Elsevier, Amsterdam etc.
I-I-I-I-I
IZENMAN, A. I. (1991): Recent developments in nonparametric curve estimation; Jour. Amer.
Statist. Ass. 86, 205–224
J-J-J-J-J
JARJOURA, D. (1988): Smoothing hazard rates with cubic splines; Comm. Statist. — Simul. &
Comp. 17, 377–392
JOHNSON, L. G. (1964): The Statistical Treatment of Fatigue Experiments; Amsterdam
JOHNSON, N. L. / KOTZ, S. (1975): A vector multivariate hazard rate; Jour. Multivar. Analysis
5, 53–66, 498
JOHNSON, N. L. / KOTZ, S. / BALAKRISHNAN, N. (1994): Continuous Univariate
Distributions, Vol. I, 2nd ed.; Wiley, New York etc.
JOHNSON, N. L. / KOTZ, S. / BALAKRISHNAN, N. (1995): Continuous Univariate
Distributions, Vol. II, 2nd ed.; Wiley, New York etc.
JOHNSON, N. L. / KOTZ, S. / KEMP, A. W. (1992): Univariate Discrete Distributions; Wiley,
New York etc.
JOSHI, S. N. / MCEACHERN, S. N. (1997): Isotonic maximum likelihood estimation for the
change point of a hazard rate; Sankhya A 59, 392–407
K-K-K-K-K
KALBFLEISCH, J. D. / PRENTICE, R. L. (1980): The Statistical Analysis of Failure Time Data;
Wiley, New York etc.
KAPLAN, E. L. / MEIER, P. (1958): Non–parametric estimation from incomplete data; Jour.
Amer. Statist. Ass. 53, 457–481
KARUNAMUNI, R. J. / ALBERTS, T. (2005): On boundary correction in kernel density
estimation; www.cims.nyu.edu alberts/pub/SM2005.pdf
KEMP, A. W. (2004): Classes of discrete lifetime distributions; Comm. Statist. — Theory &
Methods 33, 3069–3093
KIEFER, J. / WOLFOWITZ, J. (1956): Consistency of the maximum likelihood estimator in the
presence of infinitely many incidental parameters; Ann. Math. Statist. 27, 897–906
KIMBALL, A. W. (1960): Estimation of mortality intensities in animal experiments; Biometrics
16, 505–521
KLEFSJÖ, B. (1982a): The HNBUE and HNWUE classes of life distributions; Nav. Res. Log.
Quart. 29, 331–344
KLEFSJÖ, B. (1982b): On ageing properties and total time on test transforms; Scand. Jour.
Statist. 9, 37–41
KLEFSJÖ, B. (1983a): Some tests against ageing based on the total time on test transform; Comm.
Statist. — Theory & Methods 12, 907–927
KLEFSJÖ, B. (1983b): Testing exponentiality against HNBUE; Scand. Jour. Statist. 10, 65–75
KLEIN, J. P. / MOESCHBERGER, M. L. (1997): Survival Analysis; Springer, New York etc.
KOTZ, S. / BALAKRISHNAN, N. / JOHNSON, N. L. (2000): Continuous Multivariate
Distributions, Vol. I; Wiley, New York etc.
KRISHNAIAH, P. R. / MIAO, B. Q. (1988): Review about estimation of change points; in
KRISHNAIAH/RAO (eds.): Handbook of Statistics, Vol. 7 (Quality Control and Reliability),
375–402, North–Holland, Amsterdam etc.
KUNDU, C. / NANDA, A. K. / HU, T. (2009): A note on the reversed hazard rate of order
statistics and record values; Jour. Statist. Plan. Inf. 139, 1257–1265
KUNITZ, H. (1989): A new class of bathtub–shaped hazard rates and its application in a
comparison of two test–statistics; IEEE Trans. Rel. 38, 351–354
L-L-L-L-L
LAGAKOS, S. W. / BARRAJ, L. M. / DE GRUTTOLA, V. (1988): Nonparametric analysis of
truncated survival data with application to AIDS; Biometrika 75, 515–523
LAI, C. D. (2013): Issues concerning constructions of discrete lifetime models; Quality
Technology and Quantitative Management 10, 251–262
LAI, C. D. / XIE, M. / MURTHY, D. N. P. (2001): Bathtub shaped failure rate life distributions; in
BALAKRISHNAN/RAO (eds.): Handbook of Statistics, Vol. 20 (Advances in Reliability), 69–104,
Elsevier, Amsterdam
LANGBERG, N. A. / LEON, R. V. / LYNCH, J. / PROSCHAN, F. (1980): Extreme points of the
class of discrete decreasing failure rate life distributions; Math. Oper. Res. 5, 35–42
LANGBERG, N. A. / LEON, R. V. / LYNCH, J. / PROSCHAN, F. (1982): Extreme points of
the class of discrete decreasing failure rate average life distributions; TIMS — Studies in the
Management Sciences 19, 297–304
LAWLESS, J. F. (1982): Statistical Models and Methods for Lifetime Data; Wiley, New York etc.
LEEMIS, L. M. (1986): Lifetime distribution identities; IEEE Trans. Rel. 35, 170–174
LEEMIS, L. M. (1995): Reliability: Probabilistic Models and Statistical Methods; Prentice–Hall,
New Jersey
LI, L. (2002): Hazard rate estimation for censored data by wavelet methods; Comm. Statist. —
Theory & Methods 31, 943–960
LIU, Y. C. / VAN RYZIN, J. (1985): A histogram estimator of the hazard rate with censored data;
Ann. Statist. 13, 592–605
LO, S. H. / MACK, Y. P. / WANG, J.–L. (1989): Density and hazard rate estimation for censored
data via strong representation of Kaplan/Meier estimate; Prob. Theo. Rel. Fields 80, 461–473
LOADER, C. R. (1991): Inference for a hazard rate change point; Biometrika 78, 749–757
LONDON, D. (1988): Survival Models and Their Estimation, 2nd ed.; ACTEX Publications,
Winsted & Avon, Conn.
M-M-M-M-M
MA, C. (2000): A note on the multivariate normal hazard; Jour. Multivar. Analysis 73, 282–283
NANDA, A. K. / GUPTA, R. D. (2001): Some properties of reversed hazard rate function; Statist.
Methods 3, 108–124
NANDA, A. K. / GUPTA, R. D. (2004): Some properties of reversed hazard rate function —
Corrections; Statist. Methods 6, 90–91
NAVARRO, J. / RUIZ, J. M. (2004): A characterization of the multivariate normal distribution by
using the hazard gradient; Ann. Inst. Statist. Math. 56, 361–367
NELSON, W. (1969): Hazard plotting for incomplete data; Journal of Quality Technology 1,
27–52
NELSON, W. (1970): Hazard plotting methods for analysis of life data with different failure
modes; Journal of Quality Technology 2, 126–149
NELSON, W. (1972): Theory and applications of hazard plotting for censored failure data;
Technometrics 14, 945–966
NELSON, W. (1982): Applied Life Data Analysis; Wiley, New York etc.
NELSON, W. (1984): Accelerated Testing — Statistical Models, Plans, and Data Analysis; Wiley,
New York etc.
NGUYEN, H. T. / ROGERS, G. S. / WALKER, E. A. (1984): Estimation in change–point hazard
rate models; Biometrika 71, 299–304
NIELSEN, J. P. (2003): Variable bandwidth kernel hazard estimators; Jour. Nonpar. Statist. 15,
355–376
O-O-O-O-O
OAKES, D. / DASU, T. (1980): A note on residual life; Biometrika 71, 409–410
O’SULLIVAN, F. (1988a): Nonparametric estimation of relative risk using splines and
cross–validation; SIAM — Jour. Sci. Statist. Comp. 9, 531–542
O’SULLIVAN, F. (1988b): Fast computation of fully automated log–density and log–hazard
estimators; SIAM — Jour. Sci. Statist. Comp. 9, 363–379
P-P-P-P-P
PADGETT, W. J. (1988): Nonparametric estimation of density and hazard rate function when
samples are censored; in KRISHNAIAH/RAO (eds.): Handbook of Statistics, Vol. 7 (Quality Control
and Reliability), 313–333, North–Holland, Amsterdam etc.
PADGETT, W. J. / SPURRIER, J. D. (1985): On discrete failure models; IEEE Trans. Rel. 34,
253–256
PADGETT, W. J. / WEI, L. J. (1980): Maximum likelihood estimation of a distribution with
increasing failure rate based on censored observations; Biometrika 67, 470–474
PARZEN, E. (1962): On estimation of a probability density function and mode; Ann. Math.
Statist. 33, 1065–1076
PATEL, J. (1973): A catalogue of failure distributions; Comm. Statist. 1, 281–284
PATIL, P. (1997): Nonparametric hazard rate estimation by orthogonal wavelet methods; Jour.
Statist. Plan. Inf. 60, 153–168
PATRA, K. / DEY, D. K. (2002): A general class of change point and change curve models for
lifetime data; Ann. Inst. Statist. Math. 54, 517–530
PRAKASA RAO, B. L. S. (1970): Estimation for distributions with monotone failure rate; Ann.
Math. Statist. 41, 507–529
SCHIFFMAN, D. A. (1986): The score statistic in constancy testing for a discrete hazard rate;
IEEE Trans. Rel. 35, 590–594
SEAL, H. L. (1954): The estimation of mortality and other decrement probabilities; Skand.
Aktuarietidskr. 37, 137–162
SHAKED, M. / SHANTHIKUMAR, J. G. (1987): Multivariate hazard rates and stochastic ordering;
Adv. Appl. Probab. 19, 123–137
SHAKED, M. / SHANTHIKUMAR, J. G. (1989): Multivariate conditional hazard rate and the
MIFRA and MIFR properties; Jour. Appl. Probab. 25, 150–168
SHAKED, M. / SHANTHIKUMAR, J. G. / VALDEZ–TORRES, J. B. (1995): Discrete hazard rate
functions; Comput. Oper. Res. 22, 391–402
SHRYOCK, H. S. / SIEGEL, J. S. (1976): The Methods and Materials of Demography; Academic
Press, San Diego
SILVA, R. B. / BARRETO–SOUZA, W. / CORDEIRO, G. M. (2010): A new distribution with
decreasing, increasing and upside–down bathtub failure rate; Comp. Statist. & Data Analysis 54,
935–944
SINGPURWALLA, N. D. (2006): The hazard potential: Introduction and overview; Jour. Amer.
Statist. Ass. 101, 1705–1717
SINGPURWALLA, N. D. / WONG, M. Y. (1983): Estimation of the failure rate — A survey of
nonparametric methods (Part I – Non–Bayesian methods); Comm. Statist. — Theory & Methods
12, 559–588
SMITH, P. J. (2002): Analysis of Failure and Survival Data; Chapman & Hall / CRC Press, Boca
Raton
SPIEGELMAN, M. (1968): Introduction to Demography; rev. ed., Harvard University Press,
Cambridge/Mass.
STACY, E. W. (1962): A generalization of the gamma distribution; Ann. Math. Statist. 33,
1187–1192
STATISTISCHES BUNDESAMT (ed.) (2004): Perioden–Sterbetafeln für Deutschland (1871/1881
bis 2001/2003); Wiesbaden
STEIN, W. E. / DATTERO, R. (1984): A new discrete Weibull distribution; IEEE Trans. Rel. 33,
196–197
SWARTZ, G. B. (1973): The mean residual life function; IEEE Trans. Rel. 22, 108–109
SWEET, A. L. (1990): On the hazard rate of the lognormal distribution; IEEE Trans. Rel. 39,
325–328
T-T-T-T-T
TANNER, M. A. (1983): A note on the variable kernel estimator of the hazard function from
randomly censored data; Ann. Statist. 11, 994–998
TANNER, M. A. (1984): Data–based nonparametric hazard estimation (Algorithm AS 202); Jour.
Roy. Statist. Soc. (Applied Statistics) C 33, 248–258
TANNER, M. A. / WONG, W. H. (1983): The estimation of the hazard rate function from
randomly censored data by the kernel method; Ann. Statist. 11, 989–993
First ZIP–file
Distributions.zip, which should be extracted into a new directory — perhaps named ‘Distributions’ —
contains the programs creating plots of the density function (or the probability mass function), the survival
function, the hazard rate and the mean residual life function of 62 continuous and 32 discrete distributions.
The programs are menu–driven. To invoke continuous (discrete) distributions type ContDist (DiscDist) into
the Command Window after you have switched to the directory mentioned above. After having chosen
a distribution you will see a picture with the formula of the PDF (PMF) together with the domain of
its parameters. After your parameter–input has been checked you will see a plot of the four functions
mentioned above. You can repeat with another set of parameter values for the same distribution or you can
go to another distribution.
Second ZIP–file
Inference.zip, which should be extracted into a new directory — perhaps named ‘Inference’ — contains
12 programs HAZARD xx intended to do estimation and testing as described in Part II of this monograph.
Here is information on the Hazard–programs.
HAZARD 01 — This program computes the pointwise hazard rate by maximum–likelihood, the survival
function according to KAPLAN/MEIER and the cumulative hazard rate according to NELSON/AALEN, all
functions with 95%–confidence limits. The relevant formulas are in Chapter 5 of the monograph.
Input: A sample of non–grouped data stored in the Workspace as a (n × 2)–matrix named y. The first
column is for the observations in arbitrary order, the second column is for the corresponding censoring
indicator: 1 for an uncensored observation, 0 for a censored observation.
Output: A table with the numerical results and a figure showing the three estimated functions with their
95%–confidence limits.
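To make the input convention concrete, the following Python lines give a minimal sketch of the three point estimates named above. It is not the MATLAB program HAZARD 01: the function name km_nelson_aalen and the layout of the returned rows are assumptions made for illustration, and the 95%–confidence limits are left out.

```python
import numpy as np

def km_nelson_aalen(y):
    """Hypothetical helper, not the book's program.

    y: (n x 2) array-like, column 0 = observations in arbitrary order,
       column 1 = censoring indicator (1 = uncensored, 0 = censored).
    Returns one row per distinct failure time:
    (time, number at risk, failures, hazard, survival, cumulative hazard).
    """
    y = np.asarray(y, dtype=float)
    times, delta = y[:, 0], y[:, 1]
    event_times = np.unique(times[delta == 1])   # sorted distinct failure times
    surv, cum_haz = 1.0, 0.0
    rows = []
    for t in event_times:
        at_risk = int(np.sum(times >= t))                    # still under observation at t
        failures = int(np.sum((times == t) & (delta == 1)))  # failures exactly at t
        hazard = failures / at_risk      # pointwise hazard estimate at t
        surv *= 1.0 - hazard             # product-limit (Kaplan/Meier) update
        cum_haz += hazard                # Nelson/Aalen update
        rows.append((float(t), at_risk, failures, hazard, surv, cum_haz))
    return rows

# Example input in the (n x 2) layout described above
for row in km_nelson_aalen([[6, 1], [2, 1], [4, 0], [8, 0], [5, 1]]):
    print(row)
```

Censored observations enter only through the number at risk, which is why the second column of y is needed even though no estimate is reported at a censored time.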
HAZARD 02 — This program estimates a life table and all its functions according to the formulas in
Chapter 6 of the monograph.
Input: A (k × 3)–matrix named y has to be stored in the Workspace. The first column is for the lower class
limits in ascending order, the second column for the number of censored lifetimes in the corresponding class,
the third column for the number of uncensored lifetimes in the corresponding class. The program asks you for the
sample size n and the number k of classes.
Output: A life table with 12 columns and k rows and a figure displaying the histogram, the survival function
and the hazard rate.
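As an illustration of how grouped input of this kind can be turned into life–table functions, here is a short Python sketch. It assumes the common actuarial convention (censored units count as exposed for half a class) and a midpoint hazard estimate; whether HAZARD 02 uses exactly these formulas is not stated here, so Chapter 6 remains the reference. The function name life_table, the dictionary keys and the treatment of the last class as open–ended are assumptions.

```python
import numpy as np

def life_table(y, n):
    """Hypothetical helper, not the book's program.

    y: (k x 3) array-like with columns
       lower class limit, number censored, number uncensored (failed).
    n: sample size.
    """
    y = np.asarray(y, dtype=float)
    lower, cens, failed = y[:, 0], y[:, 1], y[:, 2]
    # Class widths; the last class has no upper limit in the input (assumption: open-ended)
    width = np.append(np.diff(lower), np.nan)
    leaving = cens + failed
    # Units entering each class = n minus everything that left in earlier classes
    entering = n - np.concatenate(([0.0], np.cumsum(leaving)[:-1]))
    exposed = entering - cens / 2.0          # actuarial adjustment (assumption)
    q = failed / exposed                     # conditional failure probability per class
    p = 1.0 - q
    surv = np.concatenate(([1.0], np.cumprod(p)[:-1]))   # survival at the lower class limit
    hazard = 2.0 * q / (width * (1.0 + p))   # midpoint hazard estimate, NaN for the open class
    return {"entering": entering, "exposed": exposed, "q": q,
            "survival": surv, "hazard": hazard}

table = life_table([[0, 0, 2], [1, 2, 3], [2, 1, 2]], n=10)
print(table["survival"], table["hazard"])
```

The sketch returns only five of the columns; a full life table such as the 12–column one mentioned above would add further derived quantities.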
HAZARD 03 — Maximum likelihood estimation of an increasing hazard rate for a continuous distribution
according to Chapter 7.
Input: A (n×2)–matrix named y has to be stored in the Workspace. The first column is for the observations
in ascending order, the second column for the corresponding censoring indicator: 1 for an uncensored
observation, 0 for a censored observation.
Output: A table showing the uncensored observations and the hazard rate, which is constant between any
two successive uncensored observations, and a figure displaying the graphs of the estimated hazard rate, the density function
and the survival function.
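The step–function character of such an estimate can be illustrated with a small Python sketch. It covers only the uncensored case with distinct observations and uses the pool–adjacent–violators idea: the raw rate between two successive order statistics is one failure divided by the total time on test in that interval, and neighbouring intervals are pooled until the rates are non–decreasing. The function name increasing_hazard_mle is an assumption, and the handling of censored observations described in Chapter 7 is not reproduced.

```python
import numpy as np

def increasing_hazard_mle(x):
    """Hypothetical helper: monotone (non-decreasing) step hazard, uncensored data.

    Returns the left end points x_(1), ..., x_(n-1) and the hazard level that
    holds on [x_(j), x_(j+1)). Assumes no ties among the observations.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # Total time on test in [x_(j), x_(j+1)): n - j units are still alive there
    exposure = (n - np.arange(1, n)) * np.diff(x)
    blocks = []                      # each block: [failures, exposure, number of intervals]
    for t in exposure:
        blocks.append([1.0, t, 1])
        # Pool while the previous block has a larger rate than the current one;
        # rates are compared by cross-multiplication to avoid divisions
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            f, e, m = blocks.pop()
            blocks[-1][0] += f
            blocks[-1][1] += e
            blocks[-1][2] += m
    rates = np.concatenate([np.full(m, f / e) for f, e, m in blocks])
    return x[:-1], rates

knots, rates = increasing_hazard_mle([5, 1, 9, 2, 4])
print(knots, rates)
```

For a decreasing hazard the same pooling can be run with the inequality reversed; the estimate is again constant between successive observations.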
HAZARD 04 — Maximum likelihood estimation of a decreasing hazard rate for a continuous distribution
according to Chapter 7.
Input: same as in HAZARD 03
Output: same as in HAZARD 03
284 Included MATLAB-Programs
HAZARD 05 — Maximum likelihood estimation of the hazard rate for a discrete distribution with
realizations x = 1, 2, 3, ... according to Chapter 7.
Input: You are asked whether the hazard has to be increasing or decreasing. A vector y to be stored in the
Workspace with the counts for each realization.
Output: A table showing the estimated hazard rate, the probability mass function and the survival function
and a figure displaying the graphs of the three functions.
HAZARD 06 — User–supplied fixed–bandwidth kernel estimation of the hazard rate with one out of four
kernels and corresponding boundary kernel
Input: A (n × 2)–matrix named y to be stored in the Workspace. The first column is for the — not
necessarily ordered — observations, the second column for the corresponding censoring indicator: 1 for
an uncensored observation, 0 for a censored observation. The program asks for the number of gridpoints
in the plot of the smoothed hazard rate, for the kernel to be used (uniform, EPANECHNIKOV, biweight or
triweight) and for a bandwidth. (The first bandwidth is set automatically.)
Output: List of all bandwidths chosen (maximum number: 20), plot of the pointwise cumulative hazard
rate and plot of each smoothed hazard rate with 95%–confidence limits.
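The idea of fixed–bandwidth kernel smoothing can be sketched in a few lines of Python: the jumps of the cumulative hazard estimate are spread out by a kernel of width b. The sketch below is an illustration only and not the program HAZARD 06. It assumes distinct observations, uses the EPANECHNIKOV kernel throughout, and omits both the boundary kernel and the confidence limits; the names epanechnikov and kernel_hazard are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kernel_hazard(y, grid, bandwidth):
    """Hypothetical helper: smoothed hazard on the points of `grid`.

    y: (n x 2) array-like, column 0 = observations (any order, no ties assumed),
       column 1 = censoring indicator (1 = uncensored, 0 = censored).
    """
    y = np.asarray(y, dtype=float)
    order = np.argsort(y[:, 0], kind="stable")
    times, delta = y[order, 0], y[order, 1]
    n = len(times)
    at_risk = n - np.arange(n)           # n, n-1, ..., 1 units at risk
    jumps = delta / at_risk              # increments of the cumulative hazard estimate
    grid = np.asarray(grid, dtype=float)
    # Kernel weight of every observation at every grid point
    u = (grid[:, None] - times[None, :]) / bandwidth
    return (epanechnikov(u) * jumps[None, :]).sum(axis=1) / bandwidth

smoothed = kernel_hazard([[1, 1], [2, 0], [3, 1]], grid=[1.0, 3.0], bandwidth=1.0)
print(smoothed)
```

A larger bandwidth gives a smoother but flatter curve, which is why trying several bandwidths on the same data is useful. Near zero the symmetric kernel loses mass, and that is the effect a boundary kernel is meant to correct.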
HAZARD 07 — Local kernel estimation of the hazard rate with one out of four kernels and corresponding
boundary kernel
Input: A (n × 2)–matrix named y to be stored in the Workspace. The first column is for the observations
in ascending order, the second column for the corresponding censoring indicator: 1 for an uncensored
observation, 0 for a censored observation. Ties are not allowed among the uncensored observations or
among the censored observations, but a tie between an uncensored and a censored observation is allowed; in
this case the uncensored observation precedes the censored one. The program asks for the number
of gridpoints in the plot of the smoothed hazard rate, for the kernel to be used (uniform, EPANECHNIKOV,
biweight or triweight) and for a parameter specifying the k–nearest neighbor.
Output: Plot of the pointwise cumulative hazard rate and plot of the smoothed hazard rate with 95%–
confidence limits.
HAZARD 08 — Variable kernel estimation of the hazard rate with one out of four kernels and
corresponding boundary kernel
Input: A (n × 2)–matrix named y to be stored in the Workspace. The first column is for the observations in
ascending order, the second column for the corresponding censoring indicator: 1 for an uncensored
observation, 0 for a censored observation. Ties are not allowed among the uncensored observations or among the
censored observations, but a tie between an uncensored and a censored observation is allowed; in this case the
uncensored observation precedes the censored one. The program asks for the number of gridpoints
in the plot of the smoothed hazard rate, for the kernel to be used (uniform, EPANECHNIKOV, biweight or
triweight) and for a parameter specifying the neighborhood.
Output: Plot of the pointwise cumulative hazard rate and plot of the smoothed hazard rate with 95%–
confidence limits.
HAZARD 10 — This program tests for IHR or DHR using the procedures of KLEFSJÖ, EPSTEIN and
PROSCHAN/PYKE depending on whether the sample is singly censored or uncensored.
Input: A column vector y to be stored in the Workspace with the observations in ascending order. The
program asks you to enter r, the number of observations, and n, the sample size. For r = n we have an uncensored
sample, for r < n a singly censored sample.
Output: TTT–plot and the test statistics with critical values.
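For orientation, the coordinates of a TTT–plot and one simple statistic based on them can be sketched in Python as follows. The sketch handles only the uncensored case (r = n). The statistic shown is the cumulative total–time–on–test statistic with its normal approximation under exponentiality, chosen here as a familiar example; it is not claimed to be identical to any of the test statistics computed by HAZARD 10, and the function names scaled_ttt and cumulative_ttt_test are hypothetical.

```python
import numpy as np
from math import sqrt

def scaled_ttt(x):
    """Coordinates (i/n, u_i) of the TTT-plot for an uncensored sample x."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    spacings = np.diff(np.concatenate(([0.0], x)))      # x_(i) - x_(i-1), with x_(0) = 0
    ttt = np.cumsum((n - np.arange(n)) * spacings)      # total time on test up to x_(i)
    return np.arange(1, n + 1) / n, ttt / ttt[-1]

def cumulative_ttt_test(x):
    """Sum of the first n-1 scaled TTT values and its standardized version.

    Under exponentiality the sum has mean (n-1)/2 and variance (n-1)/12;
    large positive z points towards an increasing hazard rate, large
    negative z towards a decreasing one (normal approximation, assumption).
    """
    _, u = scaled_ttt(x)
    n = len(u)
    v = float(u[:-1].sum())
    z = (v - (n - 1) / 2.0) / sqrt((n - 1) / 12.0)
    return v, z

print(scaled_ttt([1, 2, 4, 5, 9]))
print(cumulative_ttt_test([1, 2, 4, 5, 9]))
```

A TTT–plot lying above the diagonal suggests an increasing hazard rate and one lying below it a decreasing hazard rate, which is what the standardized statistic summarizes in a single number.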
HAZARD 11 — This program tests for bathtub–shape or inverted bathtub–shape of the hazard rate using
the procedures of BERGMAN and AARSET.
Input: A column vector y to be stored in the Workspace with n ascendingly ordered and uncensored
observations. The program asks whether you want to test for bathtub–shape or for inverted bathtub–shape.
Output: TTT–plot and test statistics of BERGMAN and AARSET together with critical values.
HAZARD 12 — This program tests for aging classes other than IHR and DHR.
Input: A column vector y to be stored in the Workspace with n ascendingly ordered and uncensored
observations. The program asks you what aging class you want to test for.
Output: TTT–plot and test statistic for the chosen class together with critical values.