Reliability and Survival Analysis
Md. Rezaul Karim · M. Ataharul Islam
Md. Rezaul Karim
Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh

M. Ataharul Islam
Institute of Statistical Research and Training, University of Dhaka, Dhaka, Bangladesh
Dedicated to
My elder sister Sayeda Begom, wife Tahmina Karim Bhuiyan, and daughters Nafisa Tarannum and Raisa Tabassum
Md. Rezaul Karim
Preface

Reliability and survival analysis are important techniques for analyzing lifetime and other time-to-event data and have been used in various disciplines for a long time. Survival analysis constitutes the core methodology of biostatistical science, dealing with lifetimes of living organisms such as humans, animals, patients, and plants. A parallel development has taken place in engineering for the survival of products or machines, in general nonliving things. The history of survival analysis is quite old; it initially dealt with biometrical problems but later converged to more general developments under biostatistical science. The parallel development in engineering, known as reliability, can be traced back, in a more formal sense, to World War II. Although the developments in reliability initially appeared very different from those of survival analysis, over time there has been a growing recognition that the two fields share a large area of overlapping interests, in terms of techniques, that can be studied by users and researchers of both reliability and survival analysis without difficulty. This will benefit large groups of researchers and users of reliability and survival analysis techniques. This book aims to address these areas of common interest with examples.

Because the statistical modeling of lifetimes and various other times to events is used extensively in many fields, such as medical statistics, epidemiology, community health, environmental studies, engineering, social sciences, actuarial science, and economics, this book provides a general background applicable to these various fields.
This book includes 12 chapters covering a wide range of topics. Chapter 1 introduces the concepts and definitions used in both reliability and survival analyses. Chapter 2 discusses the important functions, along with their relationships, keeping in mind the needs of users of both reliability and survival analyses. Emphasis is given to terms used in both fields as well as to terms used under different names, such as reliability function or survival function, and hazard function or failure rate function. Chapter 3 includes some probability distributions, such as the exponential, Weibull, extreme value, normal, and lognormal distributions. The estimation of parameters and some important properties for uncensored data are discussed.
The objective of the book is to present and unify fundamental and basic statistical models and methods applied to both reliability and survival data analyses in one place, from both applied and theoretical points of view.
We have attempted to keep the book simple for undergraduate and graduate students in courses on applied statistics, reliability engineering, survival analysis, biostatistics, and biomedical sciences. The book will also be of interest to researchers (engineers, doctors, and statisticians) and practitioners (engineers, applied statisticians, and managers) involved with reliability and survival analyses.
We are grateful to our colleagues and students in the Department of Statistics
of the University of Rajshahi, ISRT of the University of Dhaka, Universiti Sains
Malaysia, The University of Electro-Communications, Luleå University of
Technology, King Saud University, and East West University. The idea of writing a
book on reliability and survival analyses has stemmed from teaching and supervising
research students on reliability and survival analyses in different universities
for many years.
We want to thank D. N. Prabhakar Murthy, Kazuyuki Suzuki, Alireza Ahmadi,
N. Balakrishnan, D. Mitra, Shahariar Huda, and Rafiqul Islam Chowdhury for their
continued support to our work. We extend our deepest gratitude to Tahmina Sultana
Bhuiyan, Nafisa Tarannum, Raisa Tabassum, Tahmina Khatun, Jayati Atahar,
Amiya Atahar, Shainur Ahsan, and Adhip Rahman for their unconditional support
during the preparation of this manuscript. Further, we acknowledge gratefully
M. A. Basher Mian, M. Asaduzzaman Shah, M. Ayub Ali, M. Monsur Rahman,
M. Mesbahul Alam, Sabba Ruhi, Syed Shahadat Hossain, Azmeri Khan, Jahida
Gulshan, Israt Rayhan, Shafiqur Rahman, Mahfuza Begum, and Rosihan M. Ali for
their continued support.
We are grateful to the staff at Springer for their support. We would like to thank our Book Series Executive Editor William Achauer, Business & Economics, Springer Singapore. We especially want to thank Sagarika Ghosh for her early interest and encouragement, and Nupoor Singh, Jennifer Sweety Johnson, and Jayanthi Narayanaswamy, who provided helpful guidance in the preparation of the book and much patience and understanding during several unavoidable delays in its completion.
Md. Rezaul Karim obtained his Bachelor of Science and Master of Science
degrees in Statistics from the University of Rajshahi, Bangladesh, and his Doctor of
Engineering degree from the University of Electro-Communications, Tokyo, Japan.
For the last 24 years, he has been working at the Department of Statistics at the
University of Rajshahi, Bangladesh, where he is currently a Professor. He has also
served as visiting faculty at the Luleå University of Technology, Sweden. His
research interests include reliability analysis, warranty claim analysis, lifetime data
analysis, industrial statistics, biostatistics, and statistical computing. He has over 30
publications in statistics, reliability, warranty analysis, and related areas, and has
presented about 40 papers at numerous conferences and workshops in eight
countries. He is a coauthor of the book Warranty Data Collection and Analysis
(published by Springer in 2011) and has contributed chapters to several books. He
serves on the editorial boards of several journals including Communications in
Statistics, Journal of Statistical Research, International Journal of Statistical
Sciences, Journal of Scientific Research, and Rajshahi University Journal of
Science and Engineering. Further, he is a member of five professional associations.
Chapter 1
Reliability and Survival Analyses:
Concepts and Definitions
Abstract Both reliability and survival analyses are specialized fields of mathematical statistics developed to deal with special types of time-to-event random variables. Reliability analysis includes methods related to the assessment and prediction of successful operation or performance of products. Nowadays, products appear on the market with the assurance that they will perform satisfactorily over their designed useful life. This assurance depends on the reliability of the product. On the other hand, survival analysis includes statistical methods for analyzing the time until the occurrence of an event of interest, where the event can be death, disease occurrence, disease recurrence, recovery, or another experience of interest. This chapter introduces the basic concepts and definitions of some terms used extensively in reliability and survival analyses. It also discusses the importance of reliability and survival analyses and presents the outline of the book.
1.1 Introduction to Reliability and Survival Analyses

Both reliability and survival analyses are specialized fields of mathematical statistics developed to deal with special types of time-to-event random variables (lifetime, failure time, survival time, etc.).1 In the case of reliability, our concern is to address the characteristics of survival times of products (item, equipment, component, subsystem, system, etc.), whereas in the case of survival analysis, we address the characteristics of lifetimes arising from problems associated with living organisms (plant, animal, individual, person, patient, etc.). Hence, similar statistical techniques can be used in these two fields, because the random variables of interest in both have reasonable similarities in many respects. Nevertheless, the theoretical developments and applications in the two fields are based on quite different foundations that make little use of these parallel but overlapping areas of similarity. Researchers and practitioners of both areas have felt that they would benefit immensely if the statistical techniques of common interest could be shared conveniently.
1 Sections of this chapter draw from the co-author's (Md. Rezaul Karim) previously published work, reused here with permission (Blischke et al. 2011).
This is one of the compelling reasons to introduce reliability and survival analyses in a single book.
A salient feature of modern industrial societies is that new products are appearing
on the market at an ever-increasing pace. This is due to (i) rapid advances in technology and (ii) increasing demands of customers, with each a driver of the other (Blischke
et al. 2011). Customers need assurance that a product will perform satisfactorily over
its designed useful life. This depends on the reliability of the product, which, in turn,
depends on decisions made during the design, development, and production of the
product. One way that manufacturers can assure customers of satisfactory product
performance is through reliability.
Reliability of a product conveys the concept of dependability and successful operation or performance. It is a desirable property of great interest to both manufacturers
and consumers. Unreliability (or lack of reliability) conveys the opposite (Blischke
et al. 2011). According to ISO 8402 (1986), reliability is the ability of an item to
perform a required function, under given environmental and operational conditions
and for a stated period of time. More technical definitions of reliability are given in
the next chapter.
The time to failure or lifetime of an item is intimately linked to its reliability, and
this is a characteristic that will vary from system to system even if they are identical
in design and structure (Kenett and Baker 2010). For example, if we use the same
automobile component in different automobiles and observe their individual failure
times, we would not expect them all to have the same failure times. The times to failure
for the components used in different automobiles would differ and can be described
by a random variable. The behavior of the random variable can be modeled by a
probability distribution which is a mathematical description of a random phenomenon
consisting of a sample space and a way of assigning probabilities to events. The basis
of reliability analysis is to model the lifetime by a suitable probability distribution
and to characterize the life behavior through the selected distribution. As mentioned
in Kenett and Baker (2010), reliability analysis enables us to answer questions, such
as:
(i) What is the probability that a unit will fail before a given time?
(ii) What percentage of items will last longer than a certain time?
(iii) What is the expected lifetime of a component?
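As a rough illustration of how such questions are answered in practice, the short R sketch below (R being the language used for the programming examples in this book) evaluates them for a hypothetical component whose lifetime is assumed to follow an exponential distribution; the distribution choice and the failure rate value are illustrative assumptions only.

# Hypothetical example: lifetime T assumed Exponential with rate lambda
lambda <- 0.002                             # assumed failure rate (failures per hour)

pexp(500, rate = lambda)                    # (i) probability a unit fails before 500 hours
100 * (1 - pexp(1000, rate = lambda))       # (ii) percentage of items lasting longer than 1000 hours
1 / lambda                                  # (iii) expected lifetime (MTTF) of a component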
Survival analysis is a branch of statistics that includes a set of statistical methods
for analyzing survival data where the outcome variable is the time until the occurrence
of an event of interest among living organisms. The event can be death, the occurrence
of disease, recurrence of disease, recovery from disease, etc. The time to event
popularly denoted as failure time or survival time can be measured in hours, days,
weeks, months, years, etc. For example, if the event of interest is a heart attack, then
the time to the event can be the time (in years/months/days) until a person experiences
a heart attack. Survival analysis enables us to answer questions, such as:
(i) What is the proportion of a population that will survive beyond a given time?
(ii) Among those who survive, at what rate will they die or fail?
1.2 Definitions of Some Important Terms

This section defines some important terms used in reliability and survival analyses, which are referred to throughout the book.
Object In this book, by an object we mean an item, equipment, component, subsystem, system, etc., among products, and a plant, animal, individual, person, patient, etc., among living organisms in an experiment or study. Sometimes, the term object is also referred to as the unit of the experiment or study.
Event In statistics, an event means an outcome of an experiment or a subset of
the sample space. In reliability, by event, we mean failure, warranty claims, recovery
(e.g., repair, replace, return to work/service), etc., or any designated experience of
interest that may happen to the unit being considered in the experiment. In the case
of survival analysis, by event, we mean death, occurrence or recurrence of disease,
recovery from disease, etc.
Time In both reliability and survival analyses, we can define time by the following
categories:
(i) Study period—the whole period of experiment or investigation, or more specifically from the beginning to the end of an experiment or investigation,
3 Sometimes, the survival time refers to how long a specific object survived or will survive.
4 The detail on censored data is given in Chap. 4.
5 This excludes situations arising from preventive maintenance or any other intentional shutdown
period during which the system is unable to perform its required function.
Random variable A discrete random variable can take on values from a countable set of possible values (e.g., the nonnegative integers), and a continuous random variable can take on values from a set of possible values which is uncountable (e.g., values in the interval (−∞, ∞)). Because the
outcomes are uncertain, the value assumed by a random variable is uncertain before
the event occurs. Once the event occurs, it assumes a certain value. The standard
convention used is as follows: X or Y or Z or T (upper case) represents the random
variable before the event, and the value it assumes after the event is represented by
x or y or z or t (lower case). For example, if we are interested in evaluating whether
an object survives for more than 5 years after undergoing cancer therapy, then the
survival time in years can be represented by the random variable T and small t equals
5 years. In this case, we then ask whether capital T exceeds 5 or T > t (Kleinbaum
and Klein 2012).
Details on some of these terms can be found in books on survival and/or reliability
analyses; see, for example, Jewell et al. (1996), Klein and Moeschberger (2003),
Kleinbaum and Klein (2012), and Moore (2016).
According to ISO 8402 (1994), a product can be tangible (e.g., assemblies or processed materials) or intangible (e.g., knowledge or concepts), or a combination
thereof. A product can be either intended (e.g., offering to customers) or unintended
(e.g., pollutant or unwanted effects). A product can be classified in many different
ways. According to Blischke et al. (2011), common ways of classification can be as
follows:
• Consumer nondurables and durables products: These are products that are used
in households. Nondurables differ from durables in the sense that the life of a
nondurable item (e.g., food) is relatively short, and the item is less complex than
a durable item (e.g., television and automobile).
• Industrial and commercial products: These are products used in businesses for their
operations. The technical complexity of such products can vary considerably. The
products may be either complete units (e.g., trucks and pumps) or components
(e.g., batteries, bearings, and disk drives).
• Specialized products: Specialized products (e.g., military and commercial aircraft, ships, rockets) are usually complex and expensive, often involve state-of-the-art technology, and are usually designed and built to the specific needs of the
customer. An example of a more complex product is a large system that involves
several interlinked products, such as power stations, communication networks,
and chemical plants.
The complexity of products has been increasing with technological advances. As
a result, a product must be viewed as a system consisting of many elements and
capable of decomposition into a hierarchy of levels, with the system at the top level
and parts at the lowest level (Blischke et al. 2011). There are many ways of describing
this hierarchy.
In general, product performance is a measure of the functional aspects of a product.
It is a vector of variables, where each variable is a measurable property of the product
or its elements. The performance variables can be:
• Functional properties (e.g., power, throughput, and fuel consumption),
• Reliability-related properties (defined in terms of failure frequency, mean time to failure (MTTF), etc.).
Products are designed for a specified set of conditions such as the usage mode,
usage intensity, and operating environment. When the conditions differ significantly
from those specified, the performance of the product is affected. Product performance
is also influenced by the skills of the operator and other factors (see Blischke et al.
2011).
Product reliability is determined primarily by decisions made during the early
stages (design and development) of the product life cycle, and it has implications
for later stages (marketing and post-sale support) because of the impact of unreliability on sales and warranty costs. It is important for the manufacturers to assess the
product reliability prior to launch of the product on the market. This generally can
be done based on limited information, such as data supplied by vendors, subjective
judgment of design engineers during the design stage, and data collected during the
development stage. However, the data from the field failures are needed to assess the
actual reliability and compare it with the design reliability or predicted reliability. If
the actual reliability is significantly lower than the predicted value, it is essential that
the manufacturer identifies the cause or causes emerging from design, production,
materials, storage, or other factors. Once this is done, actions can be initiated to
improve reliability. On the other hand, if the actual reliability is significantly above
the predicted value, then this information can be used to make changes to the marketing strategy, such as increasing the warranty period and/or lowering the price, that
will likely result in an increase in total sales (Blischke et al. 2011).
In today's technological world, nearly everyone depends upon the continued functioning of a wide array of complex machinery and equipment for their everyday health, safety, mobility, and economic welfare (Dhillon 2007). Everyone expects products (cars, computers, electrical appliances, lights, televisions, etc.) to function properly for a specified period of time. The unexpected failure of a product can lead to unfavorable outcomes, such as financial loss, injury, loss of life, and/or costly lawsuits. More often, repeated failure leads to loss of customer satisfaction and of the company's goodwill. It takes a long time for a company to build up a reputation for reliability and only a short time to be branded as "unreliable" after shipping a flawed product (NIST 2019). Therefore, continual assessment of new product reliability and ongoing control of the reliability of a product are a prime necessity for engineers and managers in today's competitive business environment.
There are many possible reasons for collecting and analyzing reliability data
from both customer’s and manufacturer’s perspectives. Some of them as mentioned
in Meeker and Escobar (1998) are:
• Assessing characteristics of materials,
• Predicting product reliability in the design stage,
• Assessing the effect of a proposed design change,
• Comparing components from two or more different manufacturers, materials, production periods, operating environments, and so on,
• Assessing product reliability in the field,
• Checking the veracity of an advertising claim,
• Predicting product warranty claims and costs.
On the other hand, over the past few decades, the statistical analysis of survival data
has become a topic of considerable interest to statisticians and workers in medicine
and biological sciences. Some possible reasons for survival analysis are:
• Estimating the time to event for a group of individuals, such as time until second
heart attack for a group of myocardial infarction (MI) patients,
• Comparing time to event between two or more groups, such as treatment group
versus placebo group of patients,
• Assessing the relationship between the lifetime and covariates, for example, do treatment group and Eastern Cooperative Oncology Group (ECOG) performance status influence the lifetime of patients?
Therefore, data collection, data analysis, and data interpretation methods for reliability and survival data are important tools for those who are responsible for evaluating and improving the reliability of a product or system and analyzing survival data for living organisms.
There are many sources of reliability data, and some of them are:
• Historical data,
• Vendor data,
• Research/laboratory test data,
• Handbook data,
• Field failure data/field service data,
• Warranty data,
• Customer support data.
For further discussion on these and other related issues, see MIL-HDBK 217E
(1986), Klinger et al. (1990), Ireson (1996), Meeker and Escobar (1998), and Pisani
et al. (2002).
There are some special features of survival and reliability data that distinguish them
from other types of data. These features include:
• Data are rarely complete, accurate, or without errors.
• Data are typically censored (exact failure times are not known).
• Usually, data are nonnegative values representing time.
• Generally, data are modeled using distributions for nonnegative random variables.
• Distributions and analysis techniques that are commonly used are fairly specific.
• In many instances, there may be corrupt and/or noisy data.
• Sometimes, data are affected by missing entries, missing variables, too few observations, etc.
• If there are multiple sources of data, incompatible data, or data obtained at different levels, then the reliability or survival analysis is affected greatly.
• There are situations when all individuals do not enter the study or are not put on test at the same time. This feature is referred to as "staggered entry."
1.7 Objectives of the Book

As indicated in the previous section, reliability and survival data have a number of typical features. Therefore, extracting the maximum amount of information requires special statistical analysis techniques, and the use of this information to make proper and effective decisions requires building suitable models. The objective of this book is to present and unify fundamental and basic statistical models and methods applied to both reliability and survival data analyses in one place, from both applied and theoretical points of view. Almost all of the topics are covered by thoroughly
prepared examples using real data, with graphical illustrations and programming
codes. These examples deal with results of the analyses, interpretation of the results,
and illustrations of their usefulness.
likelihood functions under the schemes of different types of censoring and truncation
constructed in Chap. 4 will be applied in this chapter.
Chapter 7: Regression Models. In both reliability and survival analyses, regression
models are employed extensively for identifying factors associated with probability,
hazard, risk, or survival of units being studied. This chapter introduces some of
the regression models used in both reliability and survival analyses. The regression
models include logistic regression, proportional hazards, accelerated failure time,
and parametric regression models based on specific probability distributions.
Chapter 8: Generalized Linear Models. The concept of generalized linear
models has become increasingly useful in various fields including survival and reliability analyses. This chapter includes the generalized linear models for various types
of outcome data based on the underlying link functions. The estimation and test
procedures for different link functions are also highlighted.
Chapter 9: Basic Concepts of System Reliability. A system is a collection of components interconnected according to a specific design in order to perform a given task.
The reliability of a system depends on the types, quantities, and reliabilities of its
components. This chapter discusses some basic ideas behind the analysis of the
reliability of a system. It derives the distribution and reliability functions of the
lifetime of the system as a function of the distribution or reliability functions of the
individual component lifetimes.
Chapter 10: Quality Variation in Manufacturing and Maintenance Decision.
Quality variation in manufacturing is one of the main causes of the high infant (early) failure rate of a product. This chapter looks at the issues in modeling the
effect of quality variations in manufacturing. It models the effects of assembly errors
and component nonconformance. This chapter constructs the month of production—
month in service (MOP-MIS) diagram to characterize the claim rate as a function of
MOP and MIS. It also discusses the determination of optimum maintenance interval
of an object.
Chapter 11: Stochastic Models. In survival and reliability analyses, the role of
Markov chain models is quite useful in solving problems where transitions are
observed over time. It is very common in survival analysis that a subject suffering from a disease at one time point will recover at a later time. Similarly, in reliability, a machine may change state from nondefective to defective over time. This chapter
discusses the Markov chain model, Markov chain model with covariate dependence,
and Markov model for polytomous outcome data.
Chapter 12: Analysis of Big Data Using GLM. The application of the generalized
linear models (GLMs) to big data is discussed in this chapter using the divide and
recombine (D&R) framework. In this chapter, the exponential family of distributions
for binary, count, normal, and multinomial outcome variables and the corresponding
sufficient statistics for parameters are shown to have great potential in analyzing big
data where traditional statistical methods cannot be used for the entire data set.
In addition, an appendix provides the programming codes in R that are applied to
analyze data in different examples of the book.
Chapter 2
Some Important Functions and Their Relationships
Abstract There are a number of important basic functions extensively used in reliability and survival data analyses. This chapter defines some of these functions that will be applied in the later chapters. These include the probability density function, cumulative distribution function, reliability or survival function, hazard function, and mean life function. This chapter also derives the interrelationships among these functions.
2.1 Introduction
This chapter discusses some of the most important functions used in reliability and
survival data analyses.1 These functions can be used to draw inferences regarding
various probabilistic characteristics of lifetime variable, such as
• Estimation of the number of failures that occur in a given period of time,
• Estimation of the probability of success of an object in performing the required
function under certain conditions for a specified time period,
• Estimation of the probability that an object will survive or operate for a certain
period of time after survival for a given period of time,
• Determination of the number of failures occurring per unit time, and
• Determination of the average time of operation to a failure of an object.
Under the parametric setup, some of these functions can be applied to extrapolate
to the lower or upper tail of the distribution of a lifetime variable. Their properties
are investigated either exactly or by means of asymptotic results. These functions
are interrelated, and if any of them are known, the others can be derived easily from
their interrelationship.
The outline of this chapter is as follows. Section 2.2 discusses the summary statistics, including the measures of center, dispersion, and relationship. Section 2.3 defines
the density function and distribution function of a random variable. Section 2.4
defines reliability or survival function. Sections 2.5 and 2.6 discuss the conditional
reliability function and failure rate function, respectively. The mean life function
1 Sections of this chapter draw from the co-author's (Md. Rezaul Karim) previously published work, reused here with permission (Blischke et al. 2011).
and residual lifetime are presented, respectively, in Sects. 2.7 and 2.8. The fractiles
of a distribution are presented in Sect. 2.9. Section 2.10 deals with the relationship
among various functions.
2.2 Summary Statistics

The most common measures of the center of a sample (also called measures of location) are the sample mean (or average) and median. The sample mean of T, denoted as \bar{t}, is the simple arithmetic average given by
\bar{t} = \frac{1}{n} \sum_{i=1}^{n} t_i. \quad (2.1)
The sample mean is the preferred measure for many statistical purposes. It is the
basis for numerous statistical inference procedures and is a “best” measure for the
purpose of measuring the center value of a data set. However, this measure may be
affected by extreme values. In that case, we need to consider an alternative measure
of location.
For a finite set of observations, the sample median is the value that divides the
ordered observations into two equal parts. The observations belonging in the first part
are less than or equal to the median, and the observations belonging in the second
part are greater than or equal to the median (Islam and Al-Shiha 2018). The sample
median is the 0.50-fractile (t_{0.50}) or the second quartile (Q_2). Q_i denotes the ith (i = 1, 2, 3) quartile and is the value of the random variable such that 25 × i percent or fewer observations are less than Q_i and (100 − 25 × i) percent or fewer observations are greater than Q_i.2 Median is a natural measure of location since at least 50% of the
observations lie at or above the median and at least 50% lie at or below the median.
As mentioned, the mean is sensitive to extreme values (outliers) in the data and due
to the presence of outliers it can provide a somewhat distorted measure of location.
In such cases, the median provides a more meaningful measure of the location as it
is not affected by the extreme values. If the sample is perfectly symmetrical about its
center, the mean and median become the same. If the mean and median are different, this is evidence of skewness in the data. If the median is less than the mean, the
data are skewed to the right, and if the median is greater than the mean, the data are
skewed to the left.
An approach to deal with the data having outliers is to compute a trimmed mean,
which is obtained by removing a fixed proportion of both the smallest and the largest
observations from the data and calculating the average of the remaining observations.
A few other measures are sometimes used. These include the mode and various other
2 Q_i means the ith quartile, and t_p means the p-fractile of a sample. More on quartiles and fractiles can be found in Sect. 2.9.
measures that can be defined as functions of fractiles, e.g., (Q_3 − Q_1)/2, (t_{0.90} − t_{0.10})/2, and so forth.
The sample standard deviation is s = \sqrt{s^2} and is the preferred measure for most purposes since it is in units of the original data.
A measure of variability sometimes used for describing data is the interquartile
range, denoted by IQR and defined by IQR = Q3 − Q1 , where Q1 and Q3 are the first
and third quartiles of the data, respectively. An advantage of the interquartile range
is that it is not affected by extreme values. A disadvantage is that it is not readily
interpretable as is the standard deviation. If the sample data are free from outliers
or extreme values, then a preferred and simple measure of dispersion is the range, which is defined as R = t_{(n)} − t_{(1)}.
Another useful measure of dispersion in some applications is the coefficient of variation (CV), defined by CV = s/\bar{t}. This measure is unit free and tends to remain
relatively constant over measurements of different types, for example, weights of
individuals over different biological species and fuel consumption of engines of very
different sizes.
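The following R sketch shows how these measures of center and dispersion might be computed for a small hypothetical sample of failure times; the data values and the trimming fraction are assumptions made only for illustration.

failure_times <- c(16, 44, 78, 95, 101, 108, 150, 169, 220, 364)  # hypothetical times (days)

mean(failure_times)                      # sample mean
median(failure_times)                    # sample median
mean(failure_times, trim = 0.10)         # trimmed mean (10% removed from each end)
sd(failure_times)                        # sample standard deviation
IQR(failure_times)                       # interquartile range Q3 - Q1
diff(range(failure_times))               # range: largest minus smallest observation
sd(failure_times) / mean(failure_times)  # coefficient of variation (CV)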
When the data include two or more variables, measures of the relationship between
the variables are of interest. Here, we introduce two measures of strength of relation-
ship for two variables, the Pearson correlation coefficient r and a rank correlation coefficient, r_s.3
We assume a sample of bivariate data (x_i, y_i), i = 1, …, n. The sample correlation
coefficient is given by
3 The subscript s is for Charles Spearman, who devised the measure in 1904.
r = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y} = \frac{1}{(n-1)\, s_x s_y}\left[\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i\right], \quad (2.3)
where s_x and s_y denote, respectively, the standard deviations of the variables X and
Y. The numerator of (2.3), known as the sample covariance, can be used as a measure
of the relationship between two variables X and Y in certain applications.
The sample correlation coefficient, r, is the sample equivalent of the population
correlation coefficient, ρ, a parameter of the bivariate normal distribution, and as
such is a measure of the strength of linear relationship between the variables, with
ρ = 0 indicating no linear relationship. In the case of the bivariate normal distribution,
this is equivalent to the independence of the variables. Note that the correlation
coefficient is unit free. In fact, the ranges of ρ and r lie in the interval [−1, 1],
with the values −1 and +1 indicating that the variables are perfectly linearly related, with lines sloping downward and upward, respectively. The general interpretation is that values close to either extreme indicate a strong relationship and values close to zero indicate very little relationship between the variables.
An alternative measure of the strength of relationship is rank correlation. Rank
correlation coefficients are calculated by first separately ranking the two variables (giving tied observations the average rank) and then calculating a measure based
on the ranks. The advantage of this is that a rank correlation is applicable to data
down to the ordinal level and is not dependent on linearity. There are several such
coefficients. The most straightforward of these is the Spearman rank correlation r_s, which is simply the application of (2.3) to the ranks. Note that rank correlation can also be used to study trend in measurements taken sequentially through time. In this case, the measurements are ranked, and these ranks and the order in which observations are taken can be used in the calculation of r_s.
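As a brief illustration with made-up bivariate data (the values below are not from the book), both correlation coefficients are available directly in R.

x <- c(16, 44, 78, 95, 101, 108, 150, 169, 220, 364)                    # hypothetical ages (days)
y <- c(1.2, 3.5, 7.4, 9.0, 15.1, 16.0, 21.3, 39.0, 40.8, 90.2) * 1000   # hypothetical usage (km)

cov(x, y)                          # sample covariance, the numerator of (2.3)
cor(x, y)                          # Pearson correlation coefficient r
cor(x, y, method = "spearman")     # Spearman rank correlation coefficient r_s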
Another approach to the study of data relationships is linear regression analysis, in which the linear relationship between the variables is explicitly modeled and the data are used to estimate the parameters of the model. The approach is applicable to nonlinear models as well. Different regression models are discussed further in Chaps. 7 and 8.
Graphical representation of data is also an important part of the preliminary analysis of data. The graphical representation of reliability and survival data includes
histogram, Pareto chart, pie chart, stem-and-leaf plot, box plot, and probability plot.
Detailed descriptions on these graphs are not given here. Additional details on the
above topics can be found in introductory statistics texts such as Ryan (2007) and
Moore et al. (2007), and reliability and biostatistics books such as Blischke and
Murthy (2000), Meeker and Escobar (1998), and Islam and Al-Shiha (2018). There
are many other graphical methods of representing both qualitative and quantitative
data. These are discussed in detail in Schmid (1983) and Tufte (1983, 1989, 1997).
Example 2.1 Table 2.1 shows a part of the warranty claims data for an automobile
component (20 observations out of 498).4 The data are taken from Blischke et al.
(2011). For the purpose of illustration, the variables age (in days) and usage (in
km at failure) are considered here; however, the original data have more variables,
such as failure modes, type of automobile that used the component, and zone/region,
discussed in Chap. 7.
Let X and Y denote the variables age (in days) and usage (in km at failure), respectively. For the above data, we have n = 20, \sum_{i=1}^{n} x_i = 2759, \sum_{i=1}^{n} y_i = 429,987, \sum_{i=1}^{n} x_i^2 = 539,143, \sum_{i=1}^{n} y_i^2 = 14,889,443,757, and \sum_{i=1}^{n} x_i y_i = 80,879,839. The calculated descriptive (or summary) statistics for the variables age (X) and usage (Y) are shown in Table 2.2.
4 The information regarding the names of the component and manufacturing company is not disclosed to protect the proprietary nature of the information.
For both the variables, age and usage, the sample means (137.9 days and
21499 km) are greater than the respective medians (101.5 days and 16064 km), indi-
cating skewness to the right. The trimmed means for the variables, age and usage,
are obtained by removing the smallest 5% and the largest 5% of the observations
(rounded to the nearest integer) and then calculating the means of the remaining
observations for both variables. These trimmed means (132.2 days and 20592 km)
are still considerably larger than the medians, indicating real skewness, beyond the
influence of a few unusually large observations. Since the CV of usage (80.17%) is
greater than the CV of age (66.22%), the relative variability of the variable usage is
larger than the relative variability of the variable age. The correlation coefficient
between age and usage is 0.721, indicating a positive correlation between the two
variables. Note that these descriptive statistics are based on a small subsample of the
original data and hence need to be interpreted cautiously.
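As a check on the arithmetic, the correlation reported above can be reproduced in R directly from the sums listed in Example 2.1, using the computational form of (2.3); this is only a verification sketch.

# Sums reported in Example 2.1 (n = 20)
n   <- 20
sx  <- 2759;     sy  <- 429987           # sums of age and usage
sxx <- 539143;   syy <- 14889443757      # sums of squares
sxy <- 80879839                          # sum of cross-products

sx / n;  sy / n                              # sample means: 137.95 days and 21499.35 km
s_x <- sqrt((sxx - sx^2 / n) / (n - 1))      # standard deviation of age
s_y <- sqrt((syy - sy^2 / n) / (n - 1))      # standard deviation of usage
(sxy - sx * sy / n) / ((n - 1) * s_x * s_y)  # Pearson correlation, approximately 0.721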
2.3 Cumulative Distribution and Probability Density Functions

The cumulative distribution function (cdf) of the lifetime random variable T is defined as

F(t; \theta) = P\{T \le t\}, \quad t \ge 0, \quad (2.4)

where θ denotes the set of parameters of the distribution function. It may be noted here that, in the case of other variables, say X, if X lies in (−∞, ∞), then F(x; θ) = P{X ≤ x}, −∞ < x < ∞. However, since reliability and survival analyses use time, which is a nonnegative-valued variable, (2.4) will be used frequently in the subsequent chapters of the book. Often the parameters are omitted for notational ease, so that one uses F(t) instead of F(t; θ). F(t) has the following properties with respect to (2.4):
• 0 ≤ F(t) ≤ 1 for all t.
• F(t) is a nondecreasing function in t.
• \lim_{t \to 0} F(t) = 0 and \lim_{t \to \infty} F(t) = 1.
• For t_1 < t_2, P\{t_1 < T \le t_2\} = F(t_2) - F(t_1).
When T is a discrete random variable, it takes on at most a countable number of values in a set (t_1, t_2, …, t_n), with n finite or infinite, and the distribution function of T, F(t_i) = P\{T \le t_i\}, is a step function with steps of height p_i = P\{T = t_i\}, i = 1, 2, …, n, at each of the possible values t_i, i = 1, 2, …, n.5
5 As before, the parameters may be omitted for notational ease, so that p_i is often used instead of p_i(θ).
When T is a continuous random variable, the probability density function (pdf) of T is defined as

f(t) = \frac{dF(t)}{dt}, \quad (2.5)
and the probability in the interval (t, t + δt] can, for small δt, be shown as

P\{t < T \le t + \delta t\} \approx f(t)\,\delta t.
Example 2.2 Let X denote the number of customer complaints within a day for a
product, then X is a discrete random variable. Suppose that for a product, X takes
on the values 0, 1, 2, 3, 4, and 5 with respective probabilities 0.05, 0.15, 0.25, 0.30,
0.20, and 0.05. The probability mass function and the distribution function of X are
shown in Fig. 2.1.
In this example, the probability that the number of daily customer complaints is 3 or more equals P(X ≥ 3) = 0.30 + 0.20 + 0.05 = 0.55. Therefore, the probability that the number of complaints per day is 3 or more is 55%.
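The calculation in Example 2.2 can be reproduced with a few lines of R.

x    <- 0:5                                      # possible numbers of complaints per day
prob <- c(0.05, 0.15, 0.25, 0.30, 0.20, 0.05)    # probability mass function P(X = x)

sum(prob[x >= 3])      # P(X >= 3) = 0.55
cumsum(prob)           # distribution function F(x) at x = 0, 1, ..., 5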
Example 2.3 If T denotes the failure times (measured in 100 days) of an electronic
device, then T is a continuous random variable, where the original time variable is
divided by 100. Figure 2.2 shows the hypothetical probability density functions and cumulative distribution functions of T for three different types of devices, denoted by A, B, and C.
Fig. 2.1 Probability mass function (left side) and distribution function (right side) for the number
of customer complaints (X)
Fig. 2.2 Cumulative distribution functions (left side) and probability density functions (right side) for the failure times of devices A, B, and C
The graph of the cdf for device B shows that F(2) = 0.87. This indicates that 87% of device B units will fail within 200 days, whereas by the same age (200 days), approximately 99% and 76% of the units of devices A and C, respectively, will have failed. The graphs of the cdfs indicate that approximately the same percentage of cumulative failures will occur for the three devices by the age of about 113 days.
2.4 Reliability or Survival Function

The reliability of an object is the probability that the object will perform its intended function for a specified time period when operating under normal (or stated) environmental conditions (Blischke and Murthy 2000). In survival analysis, this probability is known as the survival probability.6
This definition contains four key components:
(i) Probability—The probability of the occurrence of an event. For example, a
timing chain might have a reliability goal of 0.9995 (Benbow and Broome
2008). This would mean that at least 99.95% are functioning at the end of the
stated time.
(ii) Intended function—This is stated or implied for defining the failure of an object.
For example, the intended function of the battery is to provide the required
current to the starter motor and the ignition system when cranking to start the
engine. The implied failure definition for the battery would be the failure to
supply the necessary current which prevents the car from starting.
(iii) Specified time period—This means the specified value of lifetime over the useful life of the object measured in minutes, days, months, kilometers, number of
cycles, etc. For example, a battery might be designed to function for 24 months.
Sometimes, it is more appropriate to use two-dimensional time period; e.g., the
warranty period for a tire of a light truck might be stated in terms of first 2/32
in. of usable tread wear or 12 months from the date of purchase, whichever
comes first.
(iv) Stated environmental conditions—These include environmental conditions, maintenance conditions, usage conditions, storage and moving conditions, and possibly other conditions. For example, a five-ton truck is designed to safely carry a maximum of five tons. This implies that a maximum load of five tons is a condition of the usage environment for that truck.
The reliability function (or survival function) of the lifetime variable T, denoted by R(t) (or S(t)), where

R(t) = S(t) = P\{T > t\} = 1 - F(t), \quad t \ge 0,

is the probability that an object survives to time t. It is the complement of the cumulative distribution function. It has the following basic properties:
• R(t) is a nonincreasing function in t, 0 ≤ t < ∞.
• R(0) = 1 and \lim_{t \to \infty} R(t) = 0, i.e., R(\infty) = 0.
• For t_1 < t_2, P\{t_1 < T \le t_2\} = F(t_2) - F(t_1) = R(t_1) - R(t_2).
The hypothetical reliability functions corresponding to the cumulative distribution functions for devices A, B, and C discussed in Example 2.3 are shown in Fig. 2.3. Figure 2.3 shows that the probability that device A survives more than 100 days is R(t = 1) = P\{T > 1\} = 0.5. That is, 50% of the device A units survive past 100 days.
6 This means the probability that an object (individual, person, patient, etc.) survives for a specified period of time.
This figure suggests that before the age of about 100 days, the reliability of device C is less than that of device B, and the reliability of device B is less than that of device A, but the order is reversed after the age of about 120 days.
The conditional probability that the item will fail in the interval (a, a + t], given that it has not failed prior to a, is given by

P\{a < T \le a + t \mid T > a\} = \frac{F(a + t) - F(a)}{S(a)} = \frac{S(a) - S(a + t)}{S(a)},

and the corresponding conditional reliability (survival) function is R(t \mid a) = S(a + t)/S(a).
2.6 Failure Rate Function

The failure rate function, which is popularly known as the hazard function, h(t), can be interpreted as the probability that the object will fail in (t, t + δt] for small δt, given that it has not failed prior to t. It is defined as

h(t) = \lim_{\delta t \to 0} \frac{P\{t < T \le t + \delta t \mid T > t\}}{\delta t} = \frac{f(t)}{S(t)},
which is the ratio of the probability density function to the survivor function. The
hazard function is also known as the instantaneous failure rate, failure rate function,
force of mortality, force of decrement, intensity function, age-specific death rate, and
its reciprocal is known as Mill’s ratio in economics (Islam and Al-Shiha 2018). It
indicates the “proneness to failure” or “risk” of an object after time t has elapsed. In
other words, it characterizes the effect of age on object failure more explicitly than
cdf or pdf. h(t) is the amount of risk of an object at time t. It is a special case of the
intensity function for a nonhomogeneous Poisson process (Blischke et al. 2011).
The hazard function satisfies
• h(t) ≥ 0 for all t,
• \int_{-\infty}^{\infty} h(t)\,dt = \infty.
Based on the hazard function, the lifetime distribution can be characterized as one of the following three types:
• Constant failure rate (CFR): Probability of breakdown is independent of the age
or usage of the unit. That is, the unit is equally likely to fail at any moment during
its lifetime, regardless of how old it is.
• Increasing failure rate (IFR): Unit becomes more likely to fail as it gets older.
• Decreasing failure rate (DFR): Unit gets less likely to fail as it gets older.
The cumulative hazard function of the random variable T, denoted by H(t), is
defined as
H(t) = \int_0^t h(x)\,dx. \quad (2.12)
H(t) is also called the cumulative failure rate function. Cumulative hazard function
must satisfy the following conditions:
• H(0) = 0.
• \lim_{t \to \infty} H(t) = \infty.
The average failure rate over an interval [t_1, t_2] is defined as

\bar{h}(t_1, t_2) = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} h(x)\,dx = \frac{H(t_2) - H(t_1)}{t_2 - t_1}, \quad t_2 \ge t_1. \quad (2.13)
It is a single number that can be used as a specification or target for the population failure rate over the interval [t_1, t_2] (NIST 2019).
The hazard functions and cumulative hazard functions corresponding to the cumulative distribution functions for devices A, B, and C discussed in Example 2.3 are shown
in Fig. 2.4. A hypothetical bathtub curve of hazard function is also inserted in the
plot of hazard functions (left side). The bathtub curve of hazard function comprises
three failure rate patterns, initially a DFR (known as infant mortality), followed by
a CFR (called the useful life or random failures), and a final pattern of IFR (known
as wear-out failures).
As illustrated in Fig. 2.4, the hazard functions for devices A, B, and C are, respec-
tively, initially increasing and then decreasing, constant, and decreasing. The figure
shows that for device A, the values of the cumulative hazard function at t = 2 and 3
are H(2) = 4.56 and H(3) = 8.99, respectively. Therefore, for the device A, h̄(2, 3)
= (8.99 − 4.56)/(3 − 2) = 4.43. This indicates that the average failure rate for the
device A over the interval [200, 300] days is 4.43.
Fig. 2.4 Hazard functions (left side) and cumulative hazard functions (right side) for the failure
times of devices A, B, and C
2.7 Mean Life Function

The mean life function, which is also often called the expected or average lifetime
or the mean time to failure (MTTF), is another widely used function that can be
derived directly from the pdf. Mean time to failure describes the expected time to
failure of nonrepairable identical products operating under identical conditions. That
is, MTTF is the average time that an object will perform its intended function before
it fails. The mean life is also denoted by the mean time between failures (MTBF) for
repairable products.
With censored data, the arithmetic average of the data does not provide a good
measure of the center because at least some of the failure times are unknown. The
MTTF is an estimate of the theoretical center of the distribution that considers censored observations (Minitab 2019). If f(t) is the pdf of the random variable T, then
the MTTF (denoted by μ or E(T )) can be mathematically calculated by
MTTF = E(T) = \mu = \int_0^{\infty} t f(t)\,dt. \quad (2.14)
Writing f(t) = -\frac{d}{dt}S(t) and integrating (2.14) by parts7 gives

MTTF = \left[-t\,S(t)\right]_0^{\infty} + \int_0^{\infty} S(t)\,dt.

In the above expression, the term t S(t) tends to zero as t tends to infinity (because S(t) tends to zero and the mean is assumed finite). Therefore, the first term on the right-hand side vanishes, yielding

MTTF = \int_0^{\infty} S(t)\,dt. \quad (2.15)
Equation (2.15) indicates that when the failure time random variable, T, is defined
on [0, ∞], the MTTF is the area between S(t) and the t-axis. This can be applied to
compare different survival functions.
If a distribution fits the data adequately, the MTTF can be used as a measure of
the center of the distribution. The MTTF can also be used to determine whether a
7 Integration by parts means, e.g., \int_a^b u\,dv = [u\,v]_a^b - \int_a^b v\,du.
redesigned system is better than the previous system in the demonstration test plans
(Minitab 2019).
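As an illustrative sketch of (2.15), the MTTF can be obtained in R by numerically integrating an assumed survival function and compared with the closed-form mean; the Weibull parameter values below are arbitrary choices, not values from the text.

# MTTF as the area under the survival function, Eq. (2.15)
shape <- 1.5; scale <- 1000                                       # assumed Weibull parameters
S <- function(t) 1 - pweibull(t, shape = shape, scale = scale)    # survival function S(t)

integrate(S, lower = 0, upper = Inf)$value    # numerical MTTF, about 902.7
scale * gamma(1 + 1 / shape)                  # closed-form Weibull mean, for comparison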
Given that a unit is of age t, the remaining life after time t is random. The expected
value of this random residual life is called the mean residual life (MRL) at time t
(Guess and Proschan 1988). MRL can be used in many fields, such as studying burn-
in, setting rates and benefits for life insurance, and analyzing survivorship studies in
biomedical research.
If X is a continuous random variable representing the lifetime of an object with survival function S(x) and finite mean μ, the MRL is defined as

m(t) = E(X - t \mid X \ge t) = \frac{E\left[(X - t)\, I(X \ge t)\right]}{P(X \ge t)} = \frac{1}{S(t)} \int_t^{\infty} (x - t) f(x)\,dx, \quad t > 0. \quad (2.16)
But

\int_t^{\infty} (x - t) f(x)\,dx = \int_t^{\infty} \left[\int_u^{\infty} f(x)\,dx\right] du = \int_t^{\infty} [1 - F(u)]\,du = \int_t^{\infty} S(u)\,du. \quad (2.17)
Therefore,

m(t) = \frac{1}{S(t)} \int_t^{\infty} S(u)\,du, \quad t \ge 0. \quad (2.18)
It implies that the MTTF (2.15) is a special case of (2.18) where t = 0. Note that
the MTTF is a constant value, but the MRL is a function of the lifetime t of the
object. See Guess and Proschan (1988) for more information about the MRL.
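A short R sketch of (2.18) for an assumed Weibull lifetime (parameter values chosen only for illustration) is given below; evaluating it at t = 0 recovers the MTTF.

# Mean residual life m(t) = (1/S(t)) * integral of S(u) from t to infinity
shape <- 1.5; scale <- 1000                                       # assumed Weibull parameters
S   <- function(t) 1 - pweibull(t, shape = shape, scale = scale)
mrl <- function(t) integrate(S, lower = t, upper = Inf)$value / S(t)

mrl(0)      # equals the MTTF
mrl(500)    # expected remaining life of a unit that has already survived 500 time units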
If the cdf F(y) is strictly increasing, then there is a unique value t_p that satisfies F(t_p) = p, and the estimating equation for t_p can be expressed as t_p = F^{-1}(p), where F^{-1}(\cdot) denotes the inverse function of the cumulative distribution function F(\cdot). This is illustrated in Fig. 2.5 as an example based on the cdf of device A. For p = 0.2, the figure shows the value t_p = t_{0.20} = 0.87, which means that 20% of the population of device A will fail by 87 days.
In descriptive statistics, specific interest is in the 0.25-, 0.50-, and 0.75-fractiles, called the quartiles and denoted Q_1, Q_2, and Q_3, respectively.
If the failure probability p (or reliability) is 0.5 (50%), the respective fractile (or
percentile) is called the median lifetime. Median is one of the popular measures of
reliability. If one has to choose between the mean time to failure and the median time
to failure (as the competing reliability measures), the latter might be a better choice,
because the median is easier to perceive using one’s common sense and the statistical
estimation of median is, to some extent, more robust (Kaminskiy 2013). Fractiles
also have important applications in reliability, where the interest is in fractiles for
small values of p. For example, if t denotes the lifetime of an item, t_{0.01} is the time beyond which 99% of the lifetimes will lie. In accordance with the American Bearing Manufacturers Association Std-9-1990, the tenth percentile is called the L_{10} life. Sometimes, it is called B_{10}, "B ten" life (Nelson 1990).
Example 2.4 For the variables age (in days) and usage (in km at failure) given in Table 2.1, we calculate the 0.25-, 0.50-, and 0.75-fractiles, denoted by Q_1, Q_2, and Q_3, respectively. Let us consider the variable age first and assume that the ordered values of this variable are denoted by t_{(1)}, t_{(2)}, …, t_{(20)}. Thus, t_{(1)} = 16, t_{(2)} = 44, and so forth, up to t_{(20)} = 364. For the 0.25-fractile, we have k = [0.25(20 + 1)] = 5 and d = 0.25, so t_{0.25} or Q_1 = 78 + 0.25(78 − 78) = 78 days. Similarly, Q_2 = 101.5 days and Q_3 = 169 days.
From the usage data of Table 2.1, we find Q_1 = 7473 km, Q_2 = 16,064 km, and Q_3 = 38,737 km. These calculated quartiles for both variables are also given in Table 2.2.10
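For data held in R, sample fractiles can be obtained with the built-in quantile function; the age values below are hypothetical, since the full Table 2.1 is not reproduced here.

age <- c(16, 44, 78, 78, 95, 101, 102, 120, 150, 169, 220, 364)   # hypothetical ages (days)

quantile(age, probs = c(0.25, 0.50, 0.75), type = 6)   # Q1, Q2, Q3; type = 6 uses the
                                                       # (n + 1)p interpolation rule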
This section derives relationships among the functions f (t), F(t), R(t), h(t), and H(t).
These relationships are very useful in survival and reliability analyses in the sense
that if any of these functions is known, the other functions can be found easily.
From the previous sections, the following basic relationships are already known,
where we assumed that the lifetime random variable is defined on [0, ∞]:
f(t) = \frac{d}{dt} F(t). \quad (2.20)
10 Note that these statistics are based on a small subsample where the censored observations are not
considered.
F(t) = 1 - S(t). \quad (2.21)

F(t) = \int_0^t f(x)\,dx. \quad (2.22)
h(t) = \frac{f(t)}{S(t)}. \quad (2.23)

H(t) = \int_0^t h(x)\,dx. \quad (2.24)
h(t) = \frac{d}{dt} H(t). \quad (2.25)
These relationships will be applied to derive other relationships as follows. Using
(2.20) and (2.21), we get
f(t) = \frac{d}{dt} F(t) = \frac{d}{dt}[1 - S(t)] = -\frac{d}{dt} S(t), \quad t \ge 0. \quad (2.26)
Equations (2.23) and (2.20) give
h(t) = \frac{f(t)}{S(t)} = \frac{1}{S(t)} \frac{d}{dt} F(t) = -\frac{1}{S(t)} \frac{d}{dt} S(t) = -\frac{d}{dt} \ln S(t). \quad (2.27)
This implies
\ln S(t) = -\int_0^t h(x)\,dx,

or

S(t) = \exp\left[-\int_0^t h(x)\,dx\right] = \exp[-H(t)]. \quad (2.28)
S(t) = 1 - F(t) = 1 - \int_0^t f(x)\,dx = \int_t^{\infty} f(x)\,dx, \quad t \ge 0. \quad (2.29)
Table 2.3 Relationships among f(t), F(t), S(t), h(t), and H(t), assuming that the random variable T is defined on [0, ∞]. For example, the probability density function can be expressed in terms of each of the other functions as

f(t) = \frac{d}{dt}F(t) = -\frac{d}{dt}S(t) = h(t)\exp\left[-\int_0^t h(x)\,dx\right] = \frac{d}{dt}H(t)\,\exp[-H(t)].
Similarly,

H(t) = -\ln S(t) = -\ln[1 - F(t)] = -\ln\left[1 - \int_0^t f(x)\,dx\right], \quad t \ge 0. \quad (2.34)
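These identities can also be verified numerically; the R sketch below does so for an assumed Weibull distribution evaluated at a single time point (the parameter values and the time point are arbitrary illustrative choices).

shape <- 2; scale <- 500; t0 <- 300          # assumed Weibull parameters and time point

f <- dweibull(t0, shape, scale)              # density f(t)
S <- 1 - pweibull(t0, shape, scale)          # survival function S(t)
h <- f / S                                   # hazard function, h(t) = f(t)/S(t)
H <- -log(S)                                 # cumulative hazard, H(t) = -ln S(t), Eq. (2.34)

c(S, exp(-H))                                # S(t) = exp[-H(t)], Eq. (2.28)
c(f, h * exp(-H))                            # f(t) = h(t) exp[-H(t)], from Table 2.3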
References
Benbow DW, Broome HW (2008) The certified reliability engineer handbook. American Society
for Quality, Quality Press
Blischke WR, Murthy DNP (2000) Reliability. Wiley, New York
Blischke WR, Karim MR, Murthy DNP (2011) Warranty data collection and analysis. Springer,
London
Guess F, Proschan F (1988) Mean residual life: theory and applications. In: Krishnaiah PR, Rao
CR (eds) Handbook of statistics 7: quality control and reliability. Elsevier Science Publishers,
Amsterdam
Islam MA, Al-Shiha A (2018) Foundations of biostatistics. Springer Nature Singapore Pte Ltd.
Kaminskiy MP (2013) Reliability models for engineers and scientists. CRC Press, Taylor & Francis
Group
Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data,
2nd edn. Springer, New York
Kleinbaum DG, Klein M (2012) Survival analysis: a self-learning text, 3rd edn. Springer, New York
Lawless JF (1982) Statistical models and methods for lifetime data. Wiley, New York
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, New York
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley Interscience, New
York
Minitab (2019) Minitab® support. https://support.minitab.com/en-us/minitab/18/. Accessed on 23
May 2019
Moore DF (2016) Applied survival analysis using R. Springer International Publishing
Moore DS, McCabe GP, Craig B (2007) Introduction to the practice of statistics. W H Freeman,
New York
Nelson W (1990) Accelerated testing: statistical models, test plans, and data analysis. Wiley, New
York
NIST (2019) Engineering statistics handbook, NIST/SEMATECH e-handbook of statistical meth-
ods. http://www.itl.nist.gov/div898/handbook/index.htm. Accessed on 23 May 2019
Ryan TP (2007) Modern engineering statistics. Wiley, New York
Schmid CF (1983) Statistical graphics. Wiley Interscience, New York
Tufte ER (1983) The visual display of quantitative information. Graphics Press, Cheshire, CT
Tufte ER (1989) Envisioning information. Graphics Press, Cheshire, CT
Tufte ER (1997) Visual explanations. Graphics Press, Cheshire, CT
Chapter 3
Probability Distribution of Lifetimes:
Uncensored
3.1 Introduction
the relationship F(αt; μ, σ1 ) = F(t; μ, σ2 ) for two values of the scale parameter σ 1
and σ 2 . A familiar example of a scale parameter is the failure rate (the reciprocal of the
mean) of the exponential distribution. The shape of a probability density function is
determined by the shape parameter and can be used to classify the probability density
function under a special type. A familiar example of a shape parameter is α (or β) of
the Weibull distribution, which determines whether the distribution follows the IFR,
DFR, or CFR property.
The outline of the chapter is as follows: Sect. 3.2 presents the exponential distri-
bution. Section 3.3 discusses the Weibull distribution, which can be applied to a wide
range of situations having monotonic failure rates commonly observed in survival
and reliability data analyses. Section 3.4 describes the extreme value distributions.
The normal and lognormal distributions are presented in Sect. 3.5.
The exponential distribution has been extensively used to model a wide range of
random variables including lifetimes of manufactured items, times between system
failures, arrivals in queue, interarrival times, and remission times. Just as the normal
distribution plays an important role in classical statistics, the exponential distribution
plays an important role in reliability and lifetime modeling since it is the only contin-
uous distribution with a constant hazard function. The exponential distribution has
often been used to model the lifetime of electronic components and is appropriate
when a used component that has not failed is statistically as good as a new component
(Ravindran 2009).
The probability density function of the exponential distribution is

f(t) = λe^{−λt}, t ≥ 0, (3.1)

where λ > 0 is a scale parameter (often called the failure rate). It is also known as the one-parameter exponential distribution. We can obtain the cumulative distribution function as
F(t) = ∫_0^t λe^{−λτ} dτ = λ[−e^{−λτ}/λ]_0^t = 1 − e^{−λt}, t ≥ 0. (3.2)

The reliability function is S(t) = 1 − F(t) = e^{−λt}, t ≥ 0, (3.3) and the hazard function is

h(t) = f(t)/S(t) = λe^{−λt}/e^{−λt} = λ. (3.4)
Fig. 3.1 pdf, cdf, reliability function, and hazard function of exponential distribution
The MTTF is the population average or mean time to failure. In other words, a brand
new unit has this expected lifetime until it fails (Tobias and Trindade 2012). Hence
by definition,
MTTF = ∫_0^∞ tλe^{−λt} dt = [−te^{−λt}]_0^∞ − ∫_0^∞ (−e^{−λt}) dt = [−e^{−λt}/λ]_0^∞ = 1/λ. (3.5)
For a population with a constant failure rate λ, the MTTF is the reciprocal of that
failure rate or 1/λ. For this distribution, it can be shown that
Var(T) = 1/λ². (3.6)
Even though 1/λ is the average time to failure, it is not equal to the time when half
of the population will fail. For the entire population, the median is defined to be the
point where the cumulative distribution function first reaches the value 0.5 (Tobias
and Trindade 2012). The pth quantile, t_p (discussed in Chap. 2), is the solution for t_p of the equation F(t_p) = p, which implies

1 − e^{−λt_p} = p or t_p = −(1/λ) ln(1 − p). (3.7)
The median time to failure, t 0.5 , is obtained by putting p = 0.5 in Eq. (3.7). That
is,
Median = t_{0.5} = −(1/λ) ln(1 − 0.5) = −(1/λ) ln(1/2) = ln(2)/λ = 0.693/λ. (3.8)
The median here is less than the MTTF, since the numerator is only 0.693 instead of 1. In fact, when the time has reached the MTTF, we have

F(MTTF) = F(1/λ) = 1 − e^{−λ(1/λ)} = 1 − e^{−1} ≈ 0.632,

so that about 63.2% of the population has already failed by the MTTF.
The constant failure rate is one of the characteristic properties of the exponential dis-
tribution, and closely related is another key property, the exponential lack of memory.
A component following an exponential life distribution does not “remember” how
long it has been operating. The probability that it will fail in the next hour of operation is the same as if it were new, one month old, or several years old. It does not age
or wear out or degrade with time or use. Failure is a chance happening, always at
the same constant rate and unrelated to accumulated power-on hours (Tobias and
Trindade 2012).
The equation that describes this property states that the conditional probability of failure in some interval of time of length h, given survival up to the start of that interval, is the same as the probability of a new unit failing in its first h units of time, that is,

P(T ≤ t + h | T > t) = P(T ≤ h) = F(h).

Proof We know that the cumulative distribution function is F(t) = 1 − e^{−λt}, and hence F(t + h) = 1 − e^{−λ(t+h)} and F(h) = 1 − e^{−λh}. Therefore,

P(T ≤ t + h | T > t) = [F(t + h) − F(t)] / [1 − F(t)] = [e^{−λt} − e^{−λ(t+h)}] / e^{−λt} = 1 − e^{−λh} = F(h).
This proves the memoryless property of the exponential distribution.
to follow exponential distribution, and the application becomes very simple because
the mean time of failure and mean time between failures both can be represented by
1/λ.
Although the constant hazard function may not be ideal in many applications, the exponential failure time distribution is still applied in many ways. For studying wear-out mechanisms, if the number of early failures is minimal, or if the early failures are treated separately, the exponential distribution can be a good initial choice because of its simplicity and the convenience of interpreting the results. In many instances, individual components of a system, or the product life itself, may follow a constant failure rate, in which case the exponential distribution provides very good insight into the possible choice of further strategies. Under the assumption of exponential failure times, decisions concerning sample size, confidence level, precision, etc., can be made in ways that may become intractable or very complex with other distributions; in that case, the exponential distribution may provide an ideal initial input for planning experiments. However, where the experiments are based on failure times with an increasing or decreasing hazard or failure rate, the limitation of the exponential distribution is obvious and an alternative lifetime distribution needs to be considered.
Let T be a random variable that follows exponential distribution with pdf f (t) =
λe−λt , t ≥ 0. Then, the likelihood function, which is the joint probability distribution
of the data, expressed as a function of the parameter (λ) of the distribution and the
sample observations of size n, t 1 , t 2 , …, t n , is
L = ∏_{i=1}^n λe^{−λt_i} = λ^n e^{−λ Σ_{i=1}^n t_i}. (3.9)
The log likelihood function is

ln L = n ln λ − λ Σ_{i=1}^n t_i, (3.10)
and differentiating log likelihood with respect to λ, we can show the likelihood
equation
∂ln L/∂λ = n/λ − Σ_{i=1}^n t_i = 0. (3.11)
Solving the above equation, the maximum likelihood estimate of the parameter, λ,
is
λ̂ = n / Σ_{i=1}^n t_i. (3.12)
If we denote T = Σ_{i=1}^n t_i, then T is a sufficient statistic for λ, and since the λt_i's are independent exponential variates, λT has a one-parameter gamma distribution with index parameter n. Equivalently, 2λT ∼ χ²_{(2n)}.
where χ²_{(2n),p} is the pth quantile of the χ²_{(2n)} distribution. Then,

P{χ²_{(2n),α/2}/(2T) ≤ λ ≤ χ²_{(2n),1−α/2}/(2T)} = 1 − α, (3.13)

which provides a two-sided 100(1 − α)% confidence interval for λ.
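A minimal R sketch of this interval, using a hypothetical complete sample of failure times, is shown below:

# 100(1 - alpha)% confidence interval for lambda based on Eq. (3.13)
t <- c(12, 25, 31, 47, 62, 80, 98, 120)    # hypothetical failure times
n <- length(t)
T_total <- sum(t)
alpha <- 0.05
c(lower = qchisq(alpha / 2, df = 2 * n) / (2 * T_total),
  upper = qchisq(1 - alpha / 2, df = 2 * n) / (2 * T_total))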
(i) What is the probability that the product will survive 10,000 h?
(ii) What is the probability that the product will survive the next 20,000 h?
(iii) What is the mean time to failure (MTTF)?
(iv) What is the median time to failure?
(v) At what point in time is it expected that 30% of the products will fail?
(vi) When will 63.2% fail?
(ii) The conditional survivor function for surviving another 20,000 h for a product
that has already survived 10,000 h can be obtained by using the conditional
survivor function (2.10):
S_{T|T≥a}(t) = S(a + t)/S(a).
(v) t_{0.3} = −ln(1 − 0.3)/0.00025 = 1427 h.
(vi) For this product, the mean time to failure is 4000 h, and we know that the probability of failure by the mean time to failure is

F(MTTF) = 1 − e^{−λ(1/λ)} = 1 − e^{−1} = 0.632.

This indicates that 63.2% of the products are expected to fail by the mean time to failure, 4000 h.
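The quantities asked for in this example can also be obtained with a few lines of R; the sketch below uses λ = 0.00025 per hour, as in the example:

lambda <- 0.00025
S <- function(t) exp(-lambda * t)     # reliability function of Eq. (3.3)
S(10000)                              # (i)   probability of surviving 10,000 h
S(10000 + 20000) / S(10000)           # (ii)  surviving a further 20,000 h
1 / lambda                            # (iii) MTTF = 4000 h
log(2) / lambda                       # (iv)  median time to failure
-log(1 - 0.3) / lambda                # (v)   t_0.3, about 1427 h
-log(1 - 0.632) / lambda              # (vi)  approximately the MTTF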
As mentioned in Murthy et al. (2004), the Weibull distribution is named after Waloddi Weibull (1887–1979), a Swedish engineer, scientist, and mathematician, who was the first to promote the usefulness of this distribution for modeling data sets of widely differing character. The initial study by Weibull (1939) appeared in a Scandinavian journal and dealt with the strength of materials. A subsequent study in English
(Weibull 1951) was a landmark work in which he modeled data sets from many differ-
ent disciplines and promoted the versatility of the model in terms of its applications
in different disciplines (Murthy et al. 2004).
The failure rate h(t) remains constant in an exponential model; however, in reality,
it may increase or decrease over time. In such situations, we need a model that may
take into account the failure rate as a function of time representing a change in failure
rate with respect to time. The exponential distribution fails to address this situation.
We can define a distribution where h(t) is monotonic and this type of distribution is
known as the Weibull distribution. The Weibull distribution can be applied to a wide
range of situations having monotonic failure rates commonly observed in survival
and reliability data analyses.
The failure time T is said to be Weibull distributed with parameters β (>0) and η (>0) if the probability density function is given by

f(t) = (β/η)(t/η)^{β−1} exp[−(t/η)^β], t ≥ 0. (3.14)

An alternative form of the Weibull pdf is

f(t) = αλ(λt)^{α−1} exp[−(λt)^α], t ≥ 0, (3.15)

with shape parameter α and scale parameter λ. The above two forms of the probability density function are related through the relationship between the parameters, α = β and λ = 1/η.
The cumulative distribution function can be obtained as follows:
F(t) = ∫_0^t f(τ) dτ = ∫_0^t (β/η)(τ/η)^{β−1} exp[−(τ/η)^β] dτ = 1 − exp[−(t/η)^β], t ≥ 0. (3.16)
Fig. 3.2 pdf, cdf, reliability function, and hazard function of Weibull distribution
The reliability function is

S(t) = 1 − F(t) = exp[−(t/η)^β], t ≥ 0, (3.17)

and the hazard function is

h(t) = f(t)/S(t) = (β/η)(t/η)^{β−1}, t ≥ 0. (3.18)

The cumulative hazard function is

H(t) = ∫_0^t h(τ) dτ = ∫_0^t (β/η)(τ/η)^{β−1} dτ = (t/η)^β, t ≥ 0. (3.19)
The pdf, cdf, reliability function, and hazard function of the Weibull distribution
are displayed graphically in Fig. 3.2 for the values of shape parameter β = 0.8, 1.0,
1.5 and scale parameter η = 1. The plot of hazard functions includes a DFR, CFR,
and IFR for the values of shape parameter, respectively, 0.8 (<1), 1.0 (=1.0), and 1.5
(>1.0).
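The functions plotted in Fig. 3.2 can be reproduced with the built-in Weibull routines in R, as in the following minimal sketch (the grid of t values is arbitrary):

beta <- 1.5; eta <- 1                           # shape and scale parameters
t <- seq(0.01, 3, by = 0.01)
f_t <- dweibull(t, shape = beta, scale = eta)   # pdf, Eq. (3.14)
F_t <- pweibull(t, shape = beta, scale = eta)   # cdf, Eq. (3.16)
S_t <- 1 - F_t                                  # reliability function
h_t <- f_t / S_t                                # hazard = (beta/eta)*(t/eta)^(beta-1)
plot(t, h_t, type = "l", ylab = "h(t)")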
Since the introduction of a statistical theory of the strength of materials in 1939 (Weibull 1939), and the more comprehensive summary provided in a landmark paper in 1951 (Weibull 1951), the Weibull distribution has been applied to data from many different disciplines.
For estimating the parameters of a Weibull distribution, we can use the likelihood
method. Let us consider a random sample of n failure times (T1 , . . . , Tn ) with
observed values (T1 = t1 , . . . , Tn = tn ). The likelihood function is
L = ∏_{i=1}^n (β/η)(t_i/η)^{β−1} exp[−(t_i/η)^β] = (β/η)^n ∏_{i=1}^n (t_i/η)^{β−1} exp[−(t_i/η)^β], (3.20)

and the log likelihood function is

ln L = n ln β − nβ ln η + (β − 1) Σ_{i=1}^n ln t_i − Σ_{i=1}^n (t_i/η)^β. (3.21)
The likelihood equations are obtained by differentiating the log likelihood function
with respect to the parameters, η and β, as shown below
∂ln L/∂η = −nβ/η + β Σ_{i=1}^n t_i^β/η^{β+1} = 0 (3.22)

and

∂ln L/∂β = n/β − n ln η + Σ_{i=1}^n ln t_i − Σ_{i=1}^n (t_i/η)^β ln(t_i/η) = 0. (3.23)
From Eq. (3.22), the MLE of the scale parameter η for a given β is

η̂ = (Σ_{i=1}^n t_i^β / n)^{1/β}. (3.24)

Substituting this into Eq. (3.23), the MLE of the shape parameter β can be obtained by solving the following equation:

Σ_{i=1}^n t_i^β̂ ln(t_i) / Σ_{i=1}^n t_i^β̂ − 1/β̂ − (1/n) Σ_{i=1}^n ln(t_i) = 0. (3.25)
The mean time to failure is

MTTF = E(T) = η Γ(1 + 1/β), (3.26)

and the variance is

Var(T) = η²[Γ(1 + 2/β) − {Γ(1 + 1/β)}²], (3.27)

where Γ(·) denotes the gamma function.
The variance is

Var(T) = η²[Γ(1 + 2/β) − {Γ(1 + 1/β)}²] = 4000²[Γ(1 + 2/1.5) − {Γ(1 + 1/1.5)}²] = 6,011,045. (3.29)
S(5000) = exp[−(5000/4000)^{1.5}] = exp(−1.3975) = 0.2472.
That is, the probability that the product will survive 5000 h is 0.2472.
(iii) The conditional survivor function for surviving another 2000 h for a product
that has already survived 5000 h can be obtained by using the conditional
survivor function (2.10):
S_{T|T≥a}(t) = S(a + t)/S(a).
Fig. 3.3 Weibull probability plot with MLEs of Weibull parameters for the variable usage of an
automobile component
The MTTF for the Weibull distribution can be estimated by substituting the MLEs of the parameters into the formula expressed in terms of the parameters of the distribution as given in Eq. (3.26). The estimated MTTF is 23,324.2 × Γ(1 + 1/1.291) = 21,572.2.
Note that the data in the Weibull probability paper (WPP) plot1 fall roughly along
a straight line. The roughly linear pattern of the data on Weibull probability paper
suggests that the Weibull distribution can be a reasonable choice (Blischke et al. 2011)
for modeling the usage variable in this application. As an alternative, the lognormal
distribution will be considered in analyzing this data set later.
The extreme value distribution is widely used in modeling lifetime data and is closely
related to the Weibull distribution. This distribution is extensively used for different
applications and referred to as the extreme value Type I or the Gumbel distribution.
There are two different forms of the extreme value Type I distribution based on: (i) the
smallest extreme value (minimum) and (ii) the largest extreme value (maximum). We
can show the extreme value distribution as a special case of the Weibull distribution.
1 The detail on probability plots can be found in Blischke et al. (2011) and Murthy et al. (2004).
In the Weibull pdf (3.14), if we let X = ln T with μ = ln(η) and σ = 1/β, then the
pdf for the general form of the extreme value Type I or the Gumbel distribution for
minimum (also known as smallest extreme value distribution) becomes
f(x; μ, σ) = (1/σ) exp[(x − μ)/σ − exp{(x − μ)/σ}], −∞ < x < ∞, (3.30)
where μ (−∞ < μ < ∞) is the location parameter and σ > 0 is the scale parameter.
It may be noted here that although the range includes negative lifetimes, if the choice
of location parameter is made such that μ is sufficiently large then the probability
of negative lifetimes becomes negligible. The standard Gumbel distribution for the
minimum is a special case where μ = 0 and σ = 1. The pdf of the standardized
Gumbel distribution for the minimum is
f(x) = exp[x − exp(x)], −∞ < x < ∞. (3.31)
Similarly, the general form of the Gumbel distribution for the maximum value
(also known as largest extreme value distribution) is
f(x; μ, σ) = (1/σ) exp[−(x − μ)/σ − exp{−(x − μ)/σ}], −∞ < x < ∞. (3.32)
In this case also, μ (−∞ < μ < ∞) and σ > 0 are location and scale parameters,
respectively. Then, we obtain the standard Gumbel distribution for maximum (μ = 0
and σ = 1) as follows
f(x) = exp[−x − exp(−x)], −∞ < x < ∞. (3.33)
The cumulative distribution functions for the general forms for minimum and maximum are shown below:

Minimum extreme value Type I: F(x) = 1 − exp[−exp{(x − μ)/σ}], −∞ < x < ∞.
Maximum extreme value Type I: F(x) = exp[−exp{−(x − μ)/σ}], −∞ < x < ∞.

The survival/reliability functions for minimum and maximum extreme values are:

Minimum extreme value Type I: R(x) = S(x) = exp[−exp{(x − μ)/σ}].
Maximum extreme value Type I: R(x) = S(x) = 1 − exp[−exp{−(x − μ)/σ}].

The cumulative distribution functions of the standard Gumbel distributions for minimum and maximum are:

Minimum extreme value Type I: F(x) = 1 − exp[−exp(x)], −∞ < x < ∞.
Maximum extreme value Type I: F(x) = exp[−exp(−x)], −∞ < x < ∞.

The survival/reliability functions of the standard Gumbel distributions for minimum and maximum extreme values are:

Minimum extreme value Type I: R(x) = S(x) = exp[−exp(x)], −∞ < x < ∞.
Maximum extreme value Type I: R(x) = S(x) = 1 − exp[−exp(−x)], −∞ < x < ∞.
The hazard functions for the general form of the Gumbel and standard Gumbel
distributions are shown below.
Gumbel (minimum): h(x) = (1/σ) exp[(x − μ)/σ], −∞ < x < ∞.
Gumbel (maximum): h(x) = exp[−(x − μ)/σ] / (σ{exp[exp(−(x − μ)/σ)] − 1}), −∞ < x < ∞.
Standard Gumbel (minimum): h(x) = exp(x), −∞ < x < ∞.
Standard Gumbel (maximum): h(x) = exp(−x) / {exp[exp(−x)] − 1}, −∞ < x < ∞.
The pdf, cdf, reliability function, and hazard function of the smallest extreme value
distribution are displayed in Fig. 3.4 for the values of scale parameter (σ = 5, 6, 7)
and location parameter (μ = 50). This figure shows that the pdf is skewed to the
left (although most failure time distributions are skewed to the right). The exponen-
tially increasing hazard function suggests that this distribution would be suitable for
modeling the life of a product that experiences very rapid wear-out after a certain
age/usage. The distributions of logarithms of failure times can often be modeled with
the smallest extreme value distribution (Meeker and Escobar 1998).
Figure 3.5 shows the pdf, cdf, reliability function, and hazard function of the
largest extreme value distribution for the values of scale parameter (σ = 5, 6, 7) and
location parameter (μ = 10). This figure shows that the pdf is skewed to the right and
the hazard function is increasing but it is bounded in the sense that lim x→∞ h(x) =
1/σ. The largest extreme value distribution could be used as a model for the lifetime
if σ is small relative to μ > 0 (Meeker and Escobar 1998).
Fig. 3.4 pdf, cdf, reliability function, and hazard function of smallest extreme value distribution
The likelihood function of the random variable with a Gumbel (minimum) probability
distribution is
L = ∏_{i=1}^n (1/σ) exp[(x_i − μ)/σ − exp{(x_i − μ)/σ}],

and the log likelihood function is

ln L = −n ln σ + Σ_{i=1}^n (x_i − μ)/σ − Σ_{i=1}^n exp[(x_i − μ)/σ]. (3.34)
Differentiating with respect to μ and solving ∂ln L/∂μ = 0 for μ, we obtain the maximum likelihood estimator of μ as

μ̂ = σ̂ ln[(1/n) Σ_{i=1}^n e^{x_i/σ̂}]. (3.35)

Fig. 3.5 pdf, cdf, reliability function, and hazard function of largest extreme value distribution
There is no closed-form solution for σ. The estimating equation for σ is ∂ln L/∂σ = 0, which can be simplified as shown below:

−σ − (1/n) Σ_{i=1}^n x_i + [Σ_{i=1}^n x_i exp(x_i/σ)] / [n exp(μ/σ)] = 0. (3.36)
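A minimal R sketch for computing μ̂ and σ̂ from Eqs. (3.35) and (3.36), with σ̂ obtained numerically by uniroot() and a small hypothetical data vector x, is shown below:

x <- c(4.8, 5.3, 5.9, 6.1, 6.4, 6.8, 7.0, 7.3)        # hypothetical data
g <- function(sigma) {                                 # Eq. (3.36), with mu from Eq. (3.35)
  mu <- sigma * log(mean(exp(x / sigma)))
  -sigma - mean(x) + sum(x * exp(x / sigma)) / (length(x) * exp(mu / sigma))
}
sigma_hat <- uniroot(g, interval = c(0.1, 10))$root
mu_hat    <- sigma_hat * log(mean(exp(x / sigma_hat)))
c(mu = mu_hat, sigma = sigma_hat)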
The mean and variance of the minimum extreme value Type I distribution are

E(X) = μ − σγ and V(X) = σ²π²/6, (3.37)

where γ = 0.5772 is Euler's constant. Similarly, the mean and variance of the maximum extreme value Type I distribution are

E(X) = μ + σγ and V(X) = σ²π²/6. (3.38)
The lognormal distribution has become one of the most popular lifetime models for
many high technology applications. In particular, it is very suitable for semiconductor
degradation failure mechanisms. It has also been used successfully for modeling
material fatigue failures and failures due to crack propagation (Tobias and Trindade
2012). It has been used in diverse situations, such as the analysis of failure times of
electrical insulation.
Many of the properties of the lognormal distribution can be investigated directly
from the properties of the normal distribution, since a simple logarithmic transfor-
mation transforms the lognormal data into normal data. So, we can directly use our
knowledge about the normal distribution and normal data to the study of lognormal
distribution and lognormal data as well.
The distribution is most easily specified by saying that the lifetime T is lognor-
mally distributed if the logarithm Y = ln T of the lifetime is normally distributed,
say with mean μ (−∞ < μ < ∞) and variance σ 2 > 0. The probability density
function of Y is therefore normal as shown below
f(y) = (1/√(2πσ²)) exp[−(y − μ)²/(2σ²)], −∞ < y < ∞, (3.39)

and from this, the probability density function of T = e^Y is lognormal and found to be

f(t) = (1/(t√(2πσ²))) exp[−(ln t − μ)²/(2σ²)], t > 0. (3.40)
The survivor and hazard functions for the lognormal distribution involve the stan-
dard normal distribution function (Lawless 2003), where the cumulative distribution
function is
F(t) = ∫_{−∞}^{ln t} (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)] dx = ∫_0^t (1/(u√(2πσ²))) exp[−(ln u − μ)²/(2σ²)] du, (3.41)
Fig. 3.6 pdf, cdf, reliability function, and hazard function of lognormal distribution
since T = e^Y, F(t) = P(T ≤ t) = P(e^Y ≤ t) = P(Y ≤ ln t). The lognormal survival function/reliability function is S(t) = 1 − F(t), and the hazard function is h(t) = f(t)/S(t), t > 0.
The pdf, cdf, reliability function, and hazard function of the lognormal distribution
are displayed in Fig. 3.6 for the values of scale parameter (σ = 0.3, 0.5, 0.8) and
location parameter (μ = 0). This figure shows that the pdf is skewed to the right.
The hazard function of the lognormal distribution starts at 0, increases to a point in
time, and then decreases eventually to zero.
For a complete sample t_1, t_2, …, t_n, the likelihood function is

L = ∏_{i=1}^n (1/(t_i√(2πσ²))) exp[−(ln t_i − μ)²/(2σ²)], (3.42)

and the log likelihood function is

ln L = −(n/2) ln(2π) − (n/2) ln σ² − Σ_{i=1}^n ln t_i − (1/(2σ²)) Σ_{i=1}^n (ln t_i − μ)². (3.43)
Maximizing the log likelihood gives the maximum likelihood estimators

μ̂ = (1/n) Σ_{i=1}^n ln t_i, (3.44)

and

σ̂² = (1/n) Σ_{i=1}^n (ln t_i − μ̂)². (3.45)

The sample variance of the log data, with divisor n − 1 rather than n, is

s² = (1/(n − 1)) Σ_{i=1}^n (ln t_i − μ̂)². (3.46)
In terms of the median t_M = e^μ, the lognormal pdf, cdf, survival function, and hazard function can be written as

f(t) = (1/(tσ√(2π))) exp[−{ln(t/t_M)}²/(2σ²)], t > 0,

F(t) = Φ[ln(t/t_M)/σ], t > 0,

S(t) = 1 − F(t) = 1 − Φ[ln(t/t_M)/σ] = Φ[ln(t_M/t)/σ], t > 0,

and

h(t) = f(t)/S(t) = (1/(tσ√(2π))) exp[−{ln(t/t_M)}²/(2σ²)] / Φ[ln(t_M/t)/σ], t > 0,

where Φ(·) denotes the standard normal cumulative distribution function. It is seen from the above expression that the hazard function tends to 0 as t → ∞. This restricts the use of the lognormal distribution for extremely large values of failure times. The mean time to failure is

MTTF = E(T) = exp(μ + σ²/2). (3.47)
Fig. 3.7 Lognormal probability plot with MLEs of lognormal parameters for the variable usage of
an automobile component
Example 3.4 For purposes of illustration, the lognormal distribution will be con-
sidered here for analyzing the variable usage of an automobile component failure
data of Table 2.1. Estimates of the parameters of the lognormal distribution can be
obtained by solving Eq. (3.44) for μ and Eq. (3.45) for σ 2 . Instead, we may use the
Minitab software, which provides the output given in Fig. 3.7. From this, we obtain the parameter estimates μ̂ = 9.62047 and σ̂ = 0.891818, which are, respectively, the sample mean and sample standard deviation (with divisor n rather than n − 1) of the data transformed to the log scale. The functions “mle()” and “survreg(Surv())” given in the R libraries stats4 and survival, respectively, can also be used to find the MLEs of the parameters. The relationship between the parameters and the MTTF for this distribution, given in Eq. (3.47), is used to estimate this quantity. The result is exp(μ̂ + σ̂²/2) = exp(9.62047 + 0.891818²/2) = 22,429.6, as shown in Fig. 3.7.
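For readers using R rather than Minitab, a minimal sketch along the lines indicated above might look as follows; the usage vector is hypothetical and stands in for the data of Table 2.1:

library(survival)
usage <- c(5200, 8100, 11000, 16000, 21000, 30000, 45000, 60000)   # hypothetical
fit <- survreg(Surv(usage, rep(1, length(usage))) ~ 1, dist = "lognormal")
mu_hat    <- coef(fit)          # estimate of mu (mean of log usage)
sigma_hat <- fit$scale          # ML estimate of sigma
exp(mu_hat + sigma_hat^2 / 2)   # estimated MTTF, as in Eq. (3.47)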
Note that the data appear to follow roughly a linear pattern in the lognormal plot.
It is noteworthy that the adjusted Anderson–Darling (AD*) value2 for the lognormal
distribution (0.998, in Fig. 3.7) is smaller than that of AD* value for the Weibull
distribution (1.115, in Fig. 3.3). Therefore, the AD* values indicate that the lognormal
distribution provides a better fit for the usage data of this example than the Weibull
distribution.
2 The detail on adjusted Anderson–Darling (AD*) can be found in Blischke et al. (2011) and Murthy
et al. (2004).
References
Blischke WR, Karim MR, Murthy DNP (2011) Warranty data collection and analysis. Springer,
London
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, New Jersey
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley, New York
Murthy DNP, Xie M, Jiang R (2004) Weibull models. Wiley, New York
Ravindran AR (ed) (2009) Operations research applications. CRC Press, Taylor & Francis Group,
LLC
Tobias PA, Trindade DC (2012) Applied reliability, 3rd edn. CRC Press, Taylor & Francis Group
Weibull W (1939) A statistical theory of the strength of material. Ingeniors Vetenskapa Acadamiens
Handligar 151:1–45
Weibull W (1951) A statistical distribution function of wide applicability. J Appl Mech 18:293–296
Wolstenholme LC (1999) Reliability modelling: a statistical approach. Chapman and Hall/CRC
Chapter 4
Censoring and Truncation Mechanisms
Abstract Censoring and truncation are the special types of characteristics of time
to event data. A censored observation arises when the value of the random variable
of interest is not known exactly, that is, only partial information about the value is
known. In the case of truncation, some of the subjects may be dropped from the study
due to the implementation of some conditions such that their presence or existence
cannot be known. In other words, the truncated subjects are subject to screening by
some conditions as an integral part of the study. This chapter presents the maximum
likelihood estimation method for analyzing the censored and truncated data.
4.1 Introduction
Time to event data present themselves in different ways which create special prob-
lems in analyzing such data (Klein and Moeschberger 2003). One peculiar feature,
generally present in time-to-event data, is known as censoring, which, broadly speak-
ing, occurs when in some cases, the exact time of occurrence of the desired event
is not known. In other words, the lifetime is known partially until the censoring
occurs in these cases. A censored observation arises when the value of the random
variable of interest is not known exactly, that is, only partial information about the
value is known. In addition to censoring, another source of incomplete lifetime data
is known as truncation. In the case of truncation, the observation is not considered
due to conditions implied in a study or an experiment.
The outline of the chapter is as follows: Sect. 4.2 defines various types of censor-
ing. Section 4.3 discusses the truncation of lifetime data. Construction of likelihood
functions for different types of censored data is explained in Sect. 4.4.
In order to handle censoring in the analysis, we need to consider the design which
was employed to obtain the reliability/survival data. Right censoring is very common
in lifetime data and left censoring is fairly rare.
If the exact value of an observation is not known but only known that it is greater
than or equal to time t c , then the observation is said to be right censored at t c . Right
censoring is more common in real-life situations. Generally, we observe the following
types of right-censored data:
(i) Type I censoring,
(ii) Type II censoring,
(iii) Progressive Type II censoring, and
(iv) Random censoring.
If we fix a predetermined time to end the study, then an individual’s lifetime will
be known exactly only if it is less than that predetermined value. In such situations,
the data are said to be Type I (or time) censored (Islam and Al-Shiha 2018). Type
I censoring arises in both survival and reliability analyses. Let T1 , . . . , Tn be inde-
pendently, identically distributed random variables each with distribution function
F. Let tc be some (preassigned) fixed number which we call the fixed censoring
time. Instead of observing T1 , . . . , Tn (the random variables of interest), we can only
observe t1 , . . . , tn where
t_i = T_i if T_i ≤ t_c, and t_i = t_c if T_i > t_c, i = 1, 2, …, n. (4.1)
Consider pairs (T_1, C_1), …, (T_n, C_n), where T is the failure time and C is the censoring time. We may now define a new pair of variables (t, δ) for each item with
t_i = T_i if T_i ≤ C_i, and t_i = C_i if T_i > C_i, (4.3)

and

δ_i = 1 if T_i ≤ C_i, and δ_i = 0 if T_i > C_i. (4.4)
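A minimal R sketch of the construction in Eqs. (4.3) and (4.4), with hypothetical failure and censoring times, is given below:

T_fail <- c(120, 340, 510, 700, 980)        # hypothetical failure times
C_cens <- c(400, 400, 400, 650, 1000)       # hypothetical censoring times
t_obs <- pmin(T_fail, C_cens)               # Eq. (4.3)
delta <- as.numeric(T_fail <= C_cens)       # Eq. (4.4): 1 = failure, 0 = censored
cbind(t_obs, delta)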
If all the items in the experiment are considered to start at the beginning, and the
endpoint of the study is a prefixed time, tc , then all the failures that occur before
tc provide complete observations (uncensored) and the items not failing before tc
provide incomplete observations (censored). In case of the incomplete observations,
failure time might be greater than the prefixed time tc . In some experiments, the
items may not start at the same time or at the time of the beginning of the study. In
that case, there is no single prefixed time that can be applied to each item; instead, each item may have its own endpoint, defined by its time of entry and time of exit. As the time of entry varies from item to item, the duration of stay in the experiment varies as well. Hence, we need to observe whether T_i ≤ C_i, indicating that the failure time is completely known, or T_i > C_i, i = 1, 2, …, n, indicating that only partial information about the failure time is known (it exceeds the censoring time).
It is observed that the lifetimes are complete for r items which are denoted by
t(1) , . . . , t(r ) but after the r-th failure, the lifetimes of (n − r) items are not known
except their time of censoring, T(r ) , as the experiment is terminated at that time.
Hence, the time obtained after the r-th failure can be shown as t(r +1) = t(r +2) =
· · · = t(n) = t(r ) . This experiment results in the smallest r complete and remaining
(n − r) incomplete observations. The complete observations are uncensored, and the
incomplete observations are termed as censored observations.
Example 4.1 This example is taken from Miller (1981). Both Type I and Type II
censoring arise in engineering applications. In such situations, there is a batch of transistors or tubes; we put them all on test at t = 0 and record their times to failure. Some transistors may take a long time to burn out, and we will not want to wait that long to end the experiment. Therefore, we might stop the experiment at a prespecified time t_c, in which case we have Type I censoring. If we do not know beforehand what value of the fixed censoring time is good, we may decide to wait until a prespecified fraction r/n of the transistors has burned out, in which case we have Type II censoring (Miller 1981).
and

δ_i = 1 if T_i ≤ C_i, and δ_i = 0 if T_i > C_i. (4.5)
Example 4.2 Random censoring arises in medical applications where the censoring
times are often random. In a medical trial, patients may enter the study in a more or
less random fashion, according to their time of diagnosis. We want to observe their
lifetimes. If the study is terminated at a prearranged date, then censoring times, that
is the lengths of time from an individual’s entry into the study until the termination
of the study, are random (Lawless 1982).
and

δ_i = 1 if T_i ≥ C_{li}, and δ_i = 0 if T_i < C_{li}. (4.7)
Example 4.3 In early childhood learning centers, interest often focuses upon testing
children to determine when a child learns to accomplish certain specified tasks (Klein
and Moeschberger 2003). The time to event would be considered as the age at which
a child learns the task. Assume that some children are already performing the task at
the beginning of their study. Such event times are considered as left-censored data.
Interval censoring occurs if the exact time of failure or event cannot be observed because observations are taken only in intervals (L_i, R_i), where L_i is the starting time point and R_i is the end time point of interval i. For example, suppose an item is observed to be functioning at the starting time point of interval i, which gives the value of L_i, and the status of the item (functioning/not functioning) is observed again at the end time point of the interval, denoted by R_i. In other words, failure occurs within an interval because observations are taken only at specified times, such as follow-ups at one-year intervals. In that case, at the last follow-up the item could still be functioning, but at the subsequent follow-up it could be found not functioning. The failure occurred within the interval (L_i, R_i), where the only information known is that the failure time lies between L_i and R_i, that is, L_i < T_i < R_i. Such interval censoring occurs when patients in a clinical trial or longitudinal study visit only at specified intervals, so that a patient's event time is only known to fall in the specified interval. Reliability studies in which observations are taken only at inspections carried out at specified intervals, as is common in industry, may also provide interval-censored data.
Example 4.4 In the Health and Retirement Study, the age at which a subject first
developed diabetes mellitus may not be known exactly due to the collection of data
after every two years. The incidence of the disease may occur any time between
the last follow-up when the subject was observed to be free from diabetes mellitus
and observed to be suffering from the disease for the first time at the subsequent
follow-up. The disease occurred during the interval of two years between the two
follow-ups. This observation is an example of interval censoring.
4.3 Truncation
Subjects with delayed entry might be exposed to the event of interest during the study but, due to their exclusion from the study, they are not considered; this causes left truncation.
Example 4.5 An example of left truncation is given in Balakrishnan and Mitra (2011), referring to the data collected by Hong et al. (2009). In that study, Hong et al. (2009) considered 1980 as the cutoff time for inclusion in the study on the lifetime of machines, because detailed record keeping on the lifetime of machines started in 1980 and detailed information on machine failures could be observed only after 1980, causing left truncation. The left-truncated machines had information on the date of installation, but no information was available on failures prior to 1980. Hence, if a machine was installed and failed prior to 1980, left truncation occurred because the failure time prior to 1980 cannot be known by the experimenter.
Example 4.6 Right truncation is particularly related to studies of acquired immune deficiency syndrome (AIDS). In a study on AIDS, if a subject is included in the
sample only after the diagnosis of AIDS, then the potential patient of AIDS who
was infected but had not developed or diagnosed with AIDS during the study period
results in right truncation. In this case, the subjects are included in the study only if
the subjects are diagnosed with AIDS before the end of the study period. Those who
were suffering from infection during the study period but would develop the disease
after the end of the study are right truncated. This may happen for diseases with long
duration of the incubation period.
A Type II censored sample is one for which only the r smallest observations in a
random sample of n (1 ≤ r ≤ n) items are observed (Lawless 1982). It should be
stressed here that with Type II censoring, the number of observations, r, is decided
before the data are collected. Let us consider a random sample of n observations,
(T1 , . . . , Tn ). The r smallest lifetimes are T(1) , . . . , T(r ) out of the random sample
of n lifetimes (T1 , . . . , Tn ). Let us consider that the failure times (T1 , . . . , Tn ) are
independently and identically distributed with probability density function f (t) and
survivor function S(t).
For the exponential distribution, the log likelihood function under Type II censoring is

ln L = r ln λ − λ Σ_{i=1}^r t_(i) − λ(n − r)t_(r).

Differentiating with respect to λ gives

∂ln L/∂λ = r/λ − Σ_{i=1}^r t_(i) − (n − r)t_(r) = 0.

Solving for λ, we obtain the maximum likelihood estimator under the Type II censoring scheme as

λ̂ = r / [Σ_{i=1}^r t_(i) + (n − r)t_(r)]. (4.10)
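A minimal R sketch of Eq. (4.10), using hypothetical values of n, r, and the ordered failure times, is shown below:

n <- 10; r <- 6                              # hypothetical experiment
t_ord <- c(35, 58, 72, 90, 111, 130)         # r smallest (ordered) failure times
lambda_hat <- r / (sum(t_ord) + (n - r) * t_ord[r])   # Eq. (4.10)
lambda_hat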
L = ∏_{i=1}^n L_i = ∏_{i=1}^n f(t_i)^{δ_i} S(t_i)^{1−δ_i}. (4.11)

For the exponential distribution, this becomes

L = ∏_{i=1}^n (λe^{−λt_i})^{δ_i} (e^{−λt_i})^{1−δ_i} = λ^r e^{−λ Σ_{i=1}^n t_i} = λ^r e^{−λt},

where r = Σ_{i=1}^n δ_i and t = Σ_{i=1}^n t_i. The log likelihood function is

ln L = r ln λ − λt,

and differentiating with respect to λ,

∂ln L/∂λ = r/λ − t,

and equating to 0, we find the maximum likelihood estimator of λ

λ̂ = r/t. (4.12)

The second derivative of the log likelihood function is

∂²ln L/∂λ² = −r/λ².

The observed information is

−∂²ln L/∂λ² = r/λ²,

and the Fisher information is defined as

E[−∂²ln L/∂λ²] = E(r)/λ². (4.13)
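These results are easily applied in R; the following minimal sketch, with hypothetical censored data, computes λ̂ of Eq. (4.12) and an approximate standard error based on the observed information:

t_obs <- c(120, 340, 400, 400, 650, 980)    # hypothetical observed times
delta <- c(1, 1, 0, 0, 1, 1)                # 1 = failure, 0 = censored
r <- sum(delta)
lambda_hat <- r / sum(t_obs)                # Eq. (4.12)
se_lambda  <- lambda_hat / sqrt(r)          # from the observed information r / lambda^2
c(lambda_hat = lambda_hat, se = se_lambda)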
In random censoring, we consider that each item may have both failure times
T1 , . . . , Tn with density function f (t) and survivor function S(t) and censoring times
C1 , . . . , Cn with probability density function f C (c) and survivor function SC (c). Let
us assume independence of failure time T and censoring time C and define the
following variables
t_i = T_i if T_i ≤ C_i, and t_i = C_i if T_i > C_i, (4.14)

and

δ_i = 1 if T_i ≤ C_i, and δ_i = 0 if T_i > C_i. (4.15)
The likelihood contribution of the ith item for a pair of observations (t_i, c_i) is

L_i = f(t_i) S_C(t_i) if T_i ≤ C_i, and L_i = f_C(c_i) S(c_i) if T_i > C_i,

where f(t) = h(t)S(t) when δ = 1 and f_C(t) = h_C(t)S_C(t) when δ = 0, respectively.
Hence, equivalently, the likelihood function can be expressed as shown below
L = ∏_{i=1}^n h(t_i)^{δ_i} S(t_i) ∏_{i=1}^n h_C(t_i)^{1−δ_i} S_C(t_i). (4.17)

The second product term of Eq. (4.17) does not involve any information about the event time or the corresponding parameters of the underlying distribution; hence, it can be ignored under the assumption of independence of event time and censoring time. If the second product term is ignored, then it reduces to the likelihood function of Type I censoring discussed earlier,

L = ∏_{i=1}^n f(t_i)^{δ_i} S(t_i)^{1−δ_i}, (4.18)
because f(t_i)^{δ_i} S(t_i)^{1−δ_i} = [h(t_i)S(t_i)]^{δ_i} S(t_i)^{1−δ_i} = h(t_i)^{δ_i} S(t_i). (4.19)
We have shown that, if we are interested in the parameters of the failure time only (not in the parameters of the censoring time) and the failure time and censoring time are independent, then the likelihood function from random censoring can be expressed as

L = ∏_{i=1}^n f(t_i)^{δ_i} S(t_i)^{1−δ_i} = ∏_{d∈D} f(t_d) ∏_{r∈R} S(t_r). (4.20)

In the above formulation, D is the set of failure times and R is the set of right-censored times. The contribution of a failure at time t_d is proportional to the probability of observing a failure at time t_d, while the only information about a right-censored time t_r is that the true survival time T_r exceeds t_r. If we include the other two sources of censoring, left censoring and interval censoring, then the above likelihood can be generalized in the following form:

L = ∏_{d∈D} f(t_d) ∏_{r∈R} S(t_r) ∏_{l∈L} [1 − S(t_l)] ∏_{i∈I} [S(L_i) − S(R_i)], (4.21)

where L on the right side of Eq. (4.21) denotes the set of left-censored times, for which we know only that the true survival time T_l is less than the left-censoring time t_l, and I is the set of interval-censored times, for which we know that L_i < T_i < R_i, that is, the event occurred between L_i and R_i.
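To illustrate how Eq. (4.21) is used in practice, the following minimal R sketch evaluates and maximizes the likelihood for an exponential model with exactly observed, right-, left-, and interval-censored observations; all data values are hypothetical:

d <- c(5, 9, 14)               # exact failure times
r <- c(12, 20)                 # right-censored times
l <- c(3)                      # left-censored times
L <- c(6, 10); R <- c(8, 15)   # interval-censored times (L_i, R_i)
negloglik <- function(lambda) {
  S <- function(t) exp(-lambda * t)
  -(sum(dexp(d, rate = lambda, log = TRUE)) + sum(log(S(r))) +
      sum(log(1 - S(l))) + sum(log(S(L) - S(R))))
}
optimize(negloglik, interval = c(1e-4, 1))$minimum   # MLE of lambda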
Odell et al. (1992) provided the construction of the likelihood for left-, right-, or interval-censored data using indicators δ_Ei, δ_li, δ_ri, and δ_Ii, which equal 1 when observation i is, respectively, exactly observed, left censored at t_li, right censored at t_ri, or interval censored in (t_li, t_ri), and 0 otherwise:

L = ∏_{i=1}^n f(t_i)^{δ_Ei} F(t_li)^{δ_li} [1 − F(t_ri)]^{δ_ri} [F(t_ri) − F(t_li)]^{δ_Ii},

or, equivalently,

L = ∏_{i=1}^n f(t_i)^{δ_Ei} [1 − S(t_li)]^{δ_li} [S(t_ri)]^{δ_ri} [S(t_li) − S(t_ri)]^{δ_Ii}.
For right-truncated data, the likelihood function is

L = ∏_{i=1}^n f(t_i) / [1 − S(T_{rti})],

where f(t_i)/[1 − S(T_{rti})] is the conditional density of the failure time T_i given that T_i is less than the right-truncation time T_{rti}; in this case, T_i is observable only if T_i < T_{rti}. We cannot observe the failure times that occur after the truncation time T_{rti}. In the case of interval truncation, T_i is observed only if T_{lti} < T_i < T_{rti}.
There are several studies on the use of left truncation and right censoring. Using
the general likelihood, we obtain (Balakrishnan and Mitra 2013)
L = ∏_{i∈S1} f(t_i)^{δ_i} [1 − F(t_i)]^{1−δ_i} × ∏_{i∈S2} [f(t_i)/{1 − F(t_{lti})}]^{δ_i} [{1 − F(t_i)}/{1 − F(t_{lti})}]^{1−δ_i}
  = ∏_{i∈S1} f(t_i)^{δ_i} [S(t_i)]^{1−δ_i} × ∏_{i∈S2} [f(t_i)/S(t_{lti})]^{δ_i} [S(t_i)/S(t_{lti})]^{1−δ_i}, (4.23)
where S 1 and S 2 denote the index sets corresponding to the units which are not
left truncated and left truncated, respectively. Balakrishnan and Mitra (2011, 2012,
2014) have discussed in detail the fitting of lognormal and Weibull distributions to
left truncated and right censored data through the Expectation–Maximization (EM)
algorithm.
Example 4.9 This example is taken from Balakrishnan and Mitra (2011). Let us consider a lifetime variable T that follows the lognormal distribution with parameters μ and σ. The probability density function is

f(t) = (1/(σt√(2π))) exp[−(ln t − μ)²/(2σ²)], t > 0.
With y_i = ln t_i, the likelihood function for the left-truncated and right-censored data is

L = ∏_{i∈S1} [(1/σ) f((y_i − μ)/σ)]^{δ_i} [1 − F((y_i − μ)/σ)]^{1−δ_i}
  × ∏_{i∈S2} [{(1/σ) f((y_i − μ)/σ)} / {1 − F((t_{lti} − μ)/σ)}]^{δ_i} [{1 − F((y_i − μ)/σ)} / {1 − F((t_{lti} − μ)/σ)}]^{1−δ_i}, (4.24)
and the log likelihood function is

ln L(μ, σ) = Σ_{i=1}^n [−δ_i ln σ − δ_i (y_i − μ)²/(2σ²) + (1 − δ_i) ln{1 − F((y_i − μ)/σ)}] − Σ_{i∈S2} ln{1 − F((t_{lti} − μ)/σ)}, (4.25)
where μ and σ are location and scale parameters, respectively, f (·) and F(·) are
probability density and cumulative distributions of the standard normal distribution,
respectively, δi = 0 for right censored and δi = 1 for uncensored, tlti is the left-
truncation time, S1 is the index set for not left truncated and S2 is the index set for
left truncated.
References
Balakrishnan N, Mitra D (2011) Likelihood inference for lognormal data with left truncation and
right censoring with an illustration. J Stat Plan Infer 141:3536–3553
Balakrishnan N, Mitra D (2012) Left truncated and right censored Weibull data and likelihood
inference with an illustration. Comput Stat Data Anal 56:4011–4025
Balakrishnan N, Mitra D (2013) Likelihood inference based on left truncated and right censored
data from a gamma distribution. IEEE Trans Reliab 62:679–688
Balakrishnan N, Mitra D (2014) Some further issues concerning likelihood inference for left trun-
cated and right censored lognormal data. Commun Stat Simul Comput 43:400–416
Hong Y, Meeker WQ, McCalley JD (2009) Prediction of remaining life of power transformers based
on left truncated and right censored lifetime data. Ann Appl Stat 3:857–879
Islam MA, Al-Shiha A (2018) Foundations of biostatistics. Springer Nature Singapore Pte Ltd
Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data,
2nd edn. Springer, New York
Lawless JF (1982) Statistical models and methods for lifetime data. Wiley, New Jersey
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, New Jersey
Miller RG Jr (1981) Survival analysis. Wiley, New York
Odell PM, Anderson KM, D’Agostino RB (1992) Maximum likelihood estimation for interval-
censored data using a Weibull-based accelerated failure time model. Biometrics 48(3):951–959
Chapter 5
Nonparametric Methods
Abstract This chapter discusses the nonparametric approach for analyzing relia-
bility and survival data. It explains the nonparametric approach to inference based
on the empirical distribution function, product-limit estimator of survival function,
warranty claims rate, etc. This chapter also deals with the hypothesis tests for com-
parison of two or more survival/reliability functions. Examples are given to illustrate
the methodology.
5.1 Introduction
Data analysis begins with the use of graphical and analytical approaches in order
to gain insights and draw inferences without making any assumption regarding the
underline probability distribution that is appropriate for modeling the data (Blis-
chke et al. 2011). Nonparametric methods play an important role for analyzing the
data. These methods provide an intermediate step toward building more structured
models that allow for more precise inferences with a degree of assurance about the
validity of model assumptions. As such, nonparametric methods are also referred to
as distribution-free methods. This is in contrast to parametric and semiparametric
methods (given in the next chapters), which begin with a probabilistic model and
then carry out the analyses as appropriate for that model.
The ability to analyze data without assuming an underlying life distribution avoids
some potential errors that may occur because of incorrect assumptions regarding the
distribution (Blischke et al. 2011). It is recommended that any set of reliability and
survival data first be subjected to a nonparametric analysis before moving on to
parametric analyses based on the assumption of a specific underlying distribution.
This chapter deals with some of the common methods used for the nonparametric
analysis of data. It includes a number of examples to illustrate the methods.1
The outline of this chapter is as follows: Sect. 5.2 discusses the empirical distribution function. Section 5.3 explains the product-limit estimator of the survival function. Section 5.4 deals with the nonparametric estimation of the age-based failure rate (warranty claims rate), and the chapter also describes hypothesis tests for comparing two or more survival/reliability functions.
1 The R language (http://cran.r-project.org/) will be used mainly in performing the analyses in this book.
One of the key tools for investigating the distribution underlying the data is the
sample equivalent of F(t), denoted by F̂(t), and called the empirical cumulative
distribution function (ecdf) or empirical distribution function (edf) (Blischke et al.
2011). Its value at a specified value of the measured variable is equal to the proportion
of sample observations that are less than or equal to that specified value. The ecdf
plots as a “step function,” with steps at observed values of the variable. The form of
the function depends on the type of population from which the sample is drawn. On
the other hand, the procedure is nonparametric in the sense that no specific form is
assumed in calculating the ecdf (Ben-Daya et al. 2016).
The calculation of ecdf depends on the type of available data as discussed below
(Blischke et al. 2011).
In this case, the data are given by t1 , t2 , . . . , tn which are observed values of inde-
pendent and identically distributed (iid) real-valued random variable. The ecdf is
obtained as follows:
1. Order the data from the smallest to the largest observations. Let the ordered observations be t_(1) ≤ t_(2) ≤ ⋯ ≤ t_(n).
2. Compute

F̂(t_(i)) = (number of observations ≤ t_(i)) / n = (1/n) Σ_{j=1}^n I(t_(j) ≤ t_(i)), i = 1, 2, …, n, (5.1)

where I is the indicator function, namely I(t_(j) ≤ t_(i)) is one if t_(j) ≤ t_(i) and zero otherwise.2
In other words, the value of the ecdf at a given point t (i) is obtained by dividing
the number of observations that are less than or equal to t (i) by the total number of
observations in the sample.
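In R, the ecdf of Eq. (5.1) is available directly through the ecdf() function, as in the following minimal sketch with a hypothetical sample:

x <- c(16, 44, 78, 78, 101, 102, 169, 364)   # hypothetical complete data
Fhat <- ecdf(x)                              # step function F-hat(t)
Fhat(101)                                    # proportion of observations <= 101
plot(Fhat, xlab = "t", ylab = "Empirical cdf")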
For any fixed real value t, it can be shown that the random variable nF̂(t) has a binomial distribution with parameters n and success probability F(t), where F(t) is the true cdf at t.
2 Sometimes (n + 1) is used as the divisor rather than n in Step 2 (Makkonen 2008).
Fig. 5.1 Empirical cdfs for age (left side) and usage (right side)
For censored data, we look only at right-censored case since that is the most common
type of censoring found in many reliability and survival applications. Detail discus-
sion about censored data is presented in Chap. 4. To calculate ecdf, the observations
are ordered, including both censored and uncensored values in the ordered array.
Suppose that m observations in the ordered array are uncensored. Denote these as
t1 , t2 , . . . , tm . These are the locations of the steps in the plot of the ecdf. To deter-
mine the heights of the steps, for i = 1, …, m, form the counts ni = number at risk
3 As before, the results for age and usage are based on a small subsample of the original data.
(the number of observations greater than or equal to ti in the original ordered array),
and d i = number of values tied at ti (=1 if the value is unique), then calculate the
“survival probabilities” (Blischke et al. 2011) as
Ŝ(t_1) = 1 − d_1/n_1 and Ŝ(t_i) = (1 − d_i/n_i) Ŝ(t_{i−1}), i = 2, …, m. (5.2)

Then, the corresponding ecdf becomes F̂(t_i) = 1 − Ŝ(t_i), i = 1, 2, …, m.
Note This procedure for censored data may also be applied to grouped data. Since
this is the sample version of F(t), it may be used to estimate the true cdf. In this
context, the ecdf is generally known as the Kaplan–Meier estimator (Meeker and
Escobar 1998, Sect. 3.7).
Kaplan and Meier (1958) derived the nonparametric estimator of the survival function
for censored data which is known as the product-limit (PL) estimator. This estimator
is also widely known as the Kaplan–Meier (KM) estimator of the survival function.
Nonparametric estimation of the survival function for both complete and censored
data is discussed in Lawless (1982).
Suppose that there are observations on n individuals and that there are k (k ≤ n) distinct times t_1 < t_2 < ⋯ < t_k at which deaths/failures occur. Or, suppose the time line (0, ∞) is partitioned into k + 1 intervals (t_0, t_1], (t_1, t_2], …, (t_{i−1}, t_i], …, (t_k, t_{k+1}], where t_0 = 0 and t_{k+1} = ∞. Let d_i denote the number of units that died/failed in the ith interval (t_{i−1}, t_i] and r_i represent the number of units that survive interval i and are right-censored at t_i, i = 1, 2, …, k. Then, the size of the risk set (number of units that are alive) at the beginning of interval i is

n_i = n − Σ_{j=0}^{i−1} d_j − Σ_{j=0}^{i−1} r_j, i = 1, 2, …, k, (5.3)
As for any two events A and B, P{A and B} = P{A|B} × P{B}; hence

Ŝ(t) = P{T > t_j | T > t_{j−1}} × P{T > t_{j−1}}
     = P{T > t_j | T > t_{j−1}} × P{T > t_{j−1} | T > t_{j−2}} × P{T > t_{j−2}}
     = P{T > t_j | T > t_{j−1}} × P{T > t_{j−1} | T > t_{j−2}} × ⋯ × P{T > t_2 | T > t_1} × P{T > t_1 | T > t_0} × P{T > t_0}
     = (1 − p̂_j)(1 − p̂_{j−1})(1 − p̂_{j−2}) ⋯ (1 − p̂_1)(1 − p̂_0),

where t_0 = 0, t_{k+1} = ∞, and p̂_0 = 0, so that

Ŝ(t) = ∏_{j: t_j < t} (1 − p̂_j) = ∏_{j: t_j < t} (1 − d_j/n_j). (5.4)

This is known as the Kaplan–Meier estimator of the survival function S(t).4 The nonparametric estimator of F(t) is obtained by using the Kaplan–Meier estimator of the survival function. The result is F̂(t) = 1 − Ŝ(t).
4 Kaplan and Meier (1958) allowed the width of the interval (t i−1 , t i ], i = 1, 2, …, k, to approach 0
and the number of intervals to approach ∞ (Meeker and Escobar 1998).
This is called Greenwood’s formula. This formula can also be obtained by using the
popular delta method.
Meeker and Escobar (1998, p. 54) discussed the estimation method for confidence intervals for F(t) using the normal approximation of the point estimator of F(t). By using the logit transformation, they showed that two-sided 100(1 − α)% confidence intervals for F(t) can be calculated as

[ F̂(t) / {F̂(t) + (1 − F̂(t)) w},  F̂(t) / {F̂(t) + (1 − F̂(t))/w} ], (5.9)

where w = exp{z_{(1−α/2)} ŝe[F̂(t)] / (F̂(t)[1 − F̂(t)])} and ŝe[F̂(t)] = √(V̂[F̂(t)]).
The PL estimate Eq. (5.4) possesses a number of important properties. It is a con-
sistent estimator of S(t) under quite general conditions and a nonparametric maxi-
mum likelihood estimator of S(t) (Lawless 1982; Meeker and Escobar 1998). The PL
estimate of survival function can be used to estimate the cumulative hazard function
for censored data. A nonparametric estimator of the cumulative hazard function was
derived by Nelson (1969, 1972, 1982) and Aalen (1976).
Example 5.2 The lifetime data comprising of both censored and uncensored obser-
vations on a sample of 54 batteries are given in Table 5.1. The data include failure
times (in days) for 39 items that failed under warranty and service times for 15 items
that had not failed at the time of observation. That is, the data are right-censored with
39 failures and 15 censored units.5
Table 5.2 illustrates numerical computations for obtaining nonparametric esti-
mates of S(t) and F(t). In this table, the column with heading “status” indicates
whether the corresponding t i s are failure or censored, where 1 indicates failure and
0 indicates censored. To make the table, short the calculations for 41 rows (from row
No. 8 to No. 48) are not shown in the table.
The function survfit,6 given in R- and S-plus software provides Kaplan–Meier
estimate of the survival function. Figure 5.2 shows the Kaplan–Meier estimate of
the survival function (solid line) for the battery failure data. The dashed lines are the
95% confidence intervals for the survival function. The Ŝ(t) values given in Table 5.2
coincide with the estimate shown in Fig. 5.2.
Table 5.2 and Fig. 5.2 show the decreasing step function of the estimated survival
function. The survival function drops at the values of the observed failure times and is
constant between observed failure times. We can see from Table 5.2 and Fig. 5.2 that
about 11.38% of the batteries are estimated to survive until 1100 days. The estimated
95% confidence interval for Ŝ(t) at 1100 days is (0.047, 0.276).
Table 5.2 Calculations for the nonparametric estimates of S(t i ) and F(t i ) for battery failure data
i Time (t i ) Status di ri ni p̂i 1 − p̂i Ŝ(ti ) F̂(ti ) V̂ Ŝ(ti )
Fig. 5.2 Plot of the nonparametric estimate of S(t) for battery failure data with 95% confidence
interval
Age-based (or age-specific) failure rate (or claim rate) estimation is used for assessing
the reliability of a product as a function of the age of the product. Many factors
contribute to product failures that result in warranty claims. One of the most important
factors is the age of the product (Karim and Suzuki 2005). Age is calculated by the
service time measured in terms of calendar time since the product is sold or entered
in service. The age-based analysis of product failure data has generated considerable
interest in the literature (Kalbfleisch et al. 1991; Kalbfleisch and Lawless 1996;
Lawless 1998; Karim et al. 2001; Suzuki et al. 2001), and a number of approaches
have been developed with regard to addressing the age-based analysis of warranty
claims data. Age-based analysis of claim data forms the basis for estimating and
predicting warranty claims rates and comparing claim rates among different product
groups and/or different production periods.
To estimate the age-based claim rate, we use the following notations. Let N i be
the number of products sold in the ith month for i = 1, 2, …, I, where i is the number
of months of sale (MOS). Let r ij be the number of products sold in the ith month
which failed in jth months, j = 1, 2, …, J, where j is the number of observed months
and J ≥ I. Also, let W be the warranty period and r j be the counts of claims occurring
in the jth month where
r_j = Σ_{i=max(1, j−W+1)}^{min(I, J)} r_{ij}, j = 1, 2, …, J. (5.10)
The structure of the monthly counted warranty claims for different MOS is shown
in Table 5.3.
Let n_it be the number of items from MOS i that fail at age t or month in service (MIS) t, t = 1, 2, …, min(W, J). The r_ij can be expressed in terms of n_it as n_{i, j−i+1} = r_{ij}, i = 1, 2, …, I, j = i, …, min(i + W − 1, J), and Table 5.3 can be rearranged as Table 5.4.
The age-based number of warranty claims, WC(t) or n_t, can be calculated as

WC(t) = n_t = Σ_{i=1}^{min(I, J−t+1)} n_{it}, t = 1, 2, …, min(W, J). (5.11)

The corresponding number of products at risk of producing a claim at age t is, for the one-dimensional free-replacement warranty (FRW) policy,

RC1(t) = Σ_{i=1}^{min(I, J−t+1)} N_i, t = 1, 2, …, min(W, J), (5.12)
and for the one-dimensional pro-rata warranty (PRW) policy (with refund)7 it is given
by
RC2(t) = Σ_{i=1}^{min(I, J−t+1)} N_i if t = 1, and
RC2(t) = Σ_{i=1}^{min(I, J−t+1)} [N_i − Σ_{u=1}^{min(t−1, J−i)} n_{iu}] if t > 1. (5.13)
7 In case of a free-replacement warranty (FRW) policy, the seller agrees to repair or provide replace-
ments for failed items free of charge up to a time W from the time of the initial purchase. In the
case of a pro-rata warranty (PRW) policy, the seller agrees to refund an amount α(T )C s if the item
fails at age T prior to time W from the time of purchase, where C s is the sale price and α(T ) is a
non-increasing function of T, with 0 < α(T ) < 1 (Murthy and Jack 2014).
Table 5.3 Monthly counted warranty claims {r_ij} for different MOS

MOS (i)   N_i    Warranty claims in calendar time (month, j)
                 1      2      …   W      W+1       …   I      I+1       …   J
1         N_1    r_11   r_12   …   r_1W
2         N_2           r_22   …   r_2W   r_2,W+1
…         …                    …
I         N_I                                       …   r_II   r_I,I+1   …   r_IJ
{r_j}            r_1    r_2    …   r_W    r_W+1     …   r_I    r_I+1     …   r_J

Note: I is the total number of months of sale; J is the number of observed months (I ≤ J); W is the length of the warranty period; in this table W < J; however, W ≥ J is also possible.
Table 5.4 Age-based count of warranty claims {n_it} for different MOS

MOS (i)   N_i    Warranty claims at age t (in month)
                 1      2      …   W
1         N_1    n_11   n_12   …   n_1W
2         N_2    n_21   n_22   …   n_2W
…         …      …      …      …   …
I         N_I    n_I1   n_I2   …   n_IW
{n_t}            n_1    n_2    …   n_W
Table 5.6 Age-based count of warranty claims {nit } for different MOS
MOS (i) Ni Warranty claims at age t (in month)
1 2 3 W =4
1 N 1 = 250 n11 = 2 n12 = 4 n13 = 4 n14 = 6
2 N 2 = 200 n21 = 1 n22 = 2 n23 = 3 n24 = 4
I =3 N 3 = 150 n31 = 0 n32 = 1 n33 = 2
{nt } n1 = 3 n2 = 7 n3 = 9 n4 = 10
For example, using Eq. (5.12),

RC1(1) = Σ_{i=1}^{min(3,5)} N_i = N_1 + N_2 + N_3 = 600,
RC1(4) = Σ_{i=1}^{min(3,2)} N_i = N_1 + N_2 = 450,

and, using Eq. (5.13),

RC2(2) = Σ_{i=1}^{min(3,4)} [N_i − Σ_{u=1}^{min(1,5−i)} n_{iu}] = (N_1 − n_11) + (N_2 − n_21) + (N_3 − n_31) = 597.
Using Eqs. (5.12) and (5.13), and Table 5.6, we obtain the first four columns of
Table 5.7.
Table 5.7 Estimation of WCR1 (t) and WCR2 (t) for each t
Age (t) RC1 (t) RC2 (t) nt WCR1 (t) WCR2 (t)
1 600 600 3 0.00500 0.00500
2 600 597 7 0.01167 0.01173
3 600 590 9 0.01500 0.01525
4 450 434 10 0.02222 0.02304
The age-based warranty claims rate (WCR) analysis examines claims as a fraction
of the units still under warranty. Here, we define WCR in two ways. The first is
WCR_1(t) = \frac{WC(t)}{RC_1(t)} = \frac{n_t}{RC_1(t)}, \quad t = 1, 2, \ldots, \min(W, J).   (5.14)
Note that WCR1 is the ratio of the total number of claims for period t and the
number of items under warranty prior to that period for FRW policy. The second
definition is
WCR_2(t) = \frac{WC(t)}{RC_2(t)} = \frac{n_t}{RC_2(t)}, \quad t = 1, 2, \ldots, \min(W, J),   (5.15)
which is the ratio of the total number of claims for period t and the number of items
under warranty prior to that period for PRW policy. The estimates of WCR1 (t) and
WCR2 (t) for the data shown in Table 5.5 are obtained by using Eqs. (5.14) and (5.15),
respectively, and are given in Table 5.7.
Table 5.7 indicates that WCR for FRW policy, WCR1 (t), is smaller than that of
PRW policy, WCR2 (t).
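These age-based calculations can be reproduced with a short script. The following R sketch uses the illustrative counts of Table 5.6 (I = 3, J = 5, W = 4); the object names (N, n.it, and so on) are chosen here only for illustration.

```r
# Sales and age-based claim counts from Table 5.6 (I = 3 MOS, J = 5 observed months, W = 4)
N    <- c(250, 200, 150)              # N_i, units sold in MOS i
n.it <- rbind(c(2, 4, 4, 6),          # n_it, claims at age t for MOS i
              c(1, 2, 3, 4),
              c(0, 1, 2, NA))         # MOS 3 observed for only 3 months
I <- 3; J <- 5; W <- 4

WC <- RC1 <- RC2 <- numeric(W)
for (t in 1:min(W, J)) {
  idx <- 1:min(I, J - t + 1)          # MOS that can contribute a claim of age t
  WC[t]  <- sum(n.it[idx, t], na.rm = TRUE)   # Eq. (5.11)
  RC1[t] <- sum(N[idx])                       # Eq. (5.12), FRW risk count
  RC2[t] <- if (t == 1) sum(N[idx]) else      # Eq. (5.13), PRW risk count
    sum(sapply(idx, function(i)
      N[i] - sum(n.it[i, 1:min(t - 1, J - i)], na.rm = TRUE)))
}
data.frame(t = 1:W, RC1, RC2, n.t = WC,
           WCR1 = WC / RC1, WCR2 = WC / RC2)  # Eqs. (5.14)-(5.15), cf. Table 5.7
```

Running this sketch reproduces the values shown in Table 5.7.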
5.5 Hypothesis Tests for Comparison of Survival/Reliability Functions

We are often interested in assessing whether there are differences in survival curves among different groups of individuals. For example, in a clinical trial with a survival outcome, we might be interested in comparing survival between participants receiving a new drug and participants receiving a placebo, a preparation made to resemble the drug but containing no active ingredient. In an observational study, one might be interested in comparing survival between men and women or between participants with and without a particular complication (e.g., hypertension or diabetes) (Sullivan 2019).
There are several tests available to compare survival among different independent groups of participants. We describe how to compare two or more groups of survival data based on their survival curves using the log-rank test of the null hypothesis of statistically equivalent KM survival curves. First, we apply the test for two groups only, and then we extend the procedure to several groups.
Let t_1 < t_2 < \cdots < t_m denote the distinct observed failure times in the combined sample. At time t_i, let n_{1i} and n_{2i} be the numbers at risk in groups G1 and G2, d_{1i} and d_{2i} the corresponding numbers of deaths, n_i = n_{1i} + n_{2i}, and d_i = d_{1i} + d_{2i}. Under the null hypothesis, given the margins n_{1i}, n_{2i}, and d_i, the number of deaths in G1 follows the hypergeometric distribution

P(D_{1i} = d_{1i}) = \frac{\binom{d_i}{d_{1i}} \binom{n_i - d_i}{n_{1i} - d_{1i}}}{\binom{n_i}{n_{1i}}}, \quad d_{1i} = 0, 1, \ldots, n_{1i},   (5.16)

where D_{1i} is the random variable representing the number of deaths in G1 at time t_i.
The mean and variance of D_{1i} are

E(D_{1i}) = \frac{d_i n_{1i}}{n_i}   (5.17)

and

V(D_{1i}) = \frac{n_{1i} d_i (n_i - d_i)(n_i - n_{1i})}{n_i^2 (n_i - 1)}.   (5.18)
Equation (5.16) implies that if the null hypothesis is true, the d_i deaths should be allocated proportionally between G1 and G2; hence the expected value of D_{1i} is simply the proportion n_{1i}/n_i multiplied by d_i.
Mantel and Haenszel (1959) proposed to compute the sum of the differences between the observed and the expected values of D_{1i} over all the observed survival times; let us denote this by \tilde{D}_1 = \sum_{i=1}^{m} [d_{1i} - E(D_{1i})], where m is the total number of distinct observed lifetimes. Similarly, the variance of \tilde{D}_1 is the sum of the variances of D_{1i} over the m survival times, V(\tilde{D}_1) = \sum_{i=1}^{m} V(D_{1i}). As the sample size increases, \tilde{D}_1 tends to be normally distributed with mean 0 and variance V(\tilde{D}_1). Therefore, \tilde{D}_1 / \sqrt{V(\tilde{D}_1)} \sim N(0, 1), and a z-score can be derived for testing the independence of survival and group (Liu 2012), with the test statistic defined by

z = \frac{\sum_{i=1}^{m} [d_{1i} - E(D_{1i})]}{\sqrt{\sum_{i=1}^{m} V(D_{1i})}} \sim N(0, 1).   (5.19)
Squaring this statistic gives the equivalent chi-square form

\chi^2 = \frac{\left\{ \sum_{i=1}^{m} [d_{1i} - E(D_{1i})] \right\}^2}{\sum_{i=1}^{m} V(D_{1i})} \sim \chi^2_1,   (5.20)

where \chi^2_1 indicates the chi-square distribution with one degree of freedom for two groups. This test is known as the log-rank test. The term "log-rank test" actually comes from Peto and Peto's formulation, in which the method uses the log transformation of the survival function to test a series of ranked survival times (Liu 2012).
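As a computational sketch of Eqs. (5.16)–(5.20), the following R fragment evaluates E(D_1i), V(D_1i), and the log-rank chi-square statistic from vectors of deaths and numbers at risk at the distinct failure times; the input vectors shown are hypothetical and serve only to illustrate the mechanics.

```r
# Hypothetical summary data at the distinct death times t_1 < ... < t_m
d1 <- c(1, 0, 2, 1)    # deaths in group G1 at each t_i
d2 <- c(0, 2, 1, 1)    # deaths in group G2 at each t_i
n1 <- c(10, 9, 8, 5)   # at risk in G1 just before t_i
n2 <- c(12, 11, 9, 7)  # at risk in G2 just before t_i

d <- d1 + d2; n <- n1 + n2
E1 <- d * n1 / n                                      # Eq. (5.17)
V1 <- n1 * d * (n - d) * (n - n1) / (n^2 * (n - 1))   # Eq. (5.18)

z    <- sum(d1 - E1) / sqrt(sum(V1))                  # Eq. (5.19)
chi2 <- z^2                                           # Eq. (5.20), ~ chi-square(1)
p    <- pchisq(chi2, df = 1, lower.tail = FALSE)
c(chisq = chi2, p.value = p)
```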
Example 5.3 The summarized data presented in Table 5.9 are the Instant Power
Supply (IPS) battery performance data consisting of both failure and censored life-
times of the batteries. The data are taken from Ruhi (2016). The IPS were used in
Table 5.9 IPS battery failure and censored data for maintained and nonmaintained groups
Time (in months) (t_i)   Maintained group: Failures (d_1i), Censored (c_1i)   Nonmaintained group: Failures (d_2i), Censored (c_2i)
2 0 0 0 1
3 0 2 0 0
6 1 1 1 2
7 0 2 1 2
8 1 2 1 1
9 0 1 0 1
10 0 3 2 1
11 0 0 3 0
12 2 9 3 2
13 0 0 3 1
14 0 2 0 0
15 0 1 3 0
18 5 2 1 0
20 2 0 0 1
21 0 1 0 0
22 0 1 0 0
23 1 1 0 0
24 8 14 0 0
25 0 1 0 0
29 1 0 0 0
30 13 6 0 0
31 2 0 0 0
33 0 1 0 0
34 1 0 0 0
36 15 4 0 0
42 12 2 0 0
44 1 0 0 0
45 1 0 0 0
52 1 0 0 0
53 2 0 0 0
Total 69 56 18 12
some offices and residences in the Rajshahi region of Bangladesh. Some batteries were maintained regularly with proper care and some were not maintained regularly. The names of the manufacturing companies are not disclosed here to protect the proprietary nature of the information.
The data set consists of 87 failure times and 68 censored times for the 155 observed batteries. The column "Time" indicates the age (in months) of the item at the data collection period. Among the 155 batteries, 125 were maintained regularly by the user and 30 were not maintained. The numbers of failures and censored observations corresponding to each lifetime under both the maintained and nonmaintained groups are given in the table.8
As an example of the application of the log-rank test, we consider the comparison
of maintained (Group 1) and nonmaintained (Group 2) batteries used in IPS.
The quantities needed to calculate the test statistic for equality of two survival
functions for the IPS battery performance data of Table 5.9 are given in Table 5.10.
According to Eq. (5.20), the test statistic for the log-rank test to compare the two groups is

\chi^2 = \frac{\left\{ \sum_{i=1}^{m} [d_{1i} - E(D_{1i})] \right\}^2}{\sum_{i=1}^{m} V(D_{1i})} = \frac{\left\{ \sum_{i=1}^{m} d_{1i} - \sum_{i=1}^{m} E(D_{1i}) \right\}^2}{\sum_{i=1}^{m} V(D_{1i})} = \frac{(69 - 84.3390)^2}{2.2771} = 103.327.   (5.21)
This test statistic is approximately distributed as chi-square with one degree of freedom. For this test, the decision rule at the 5% level of significance is to reject H_0 because \chi^2 > 3.84. Therefore, we have statistically significant evidence at α = 0.05 to conclude that the survival functions for the maintained and nonmaintained groups are different. There is evidence to suggest that the maintained group has much better survival than the nonmaintained group (see Fig. 5.3).
The function survdiff in R, which implements a family of tests indexed by the parameter rho, can be used for this test. The following description is from the R documentation on survdiff: "This function implements the G-rho family of Harrington and Fleming (1982), with weights on each death of S(t)^rho, where S is the Kaplan–Meier estimate of survival. With rho = 0, this is the log-rank or Mantel–Haenszel test, and with rho = 1 it is equivalent to the Peto and Peto modification of the Gehan–Wilcoxon test." For the IPS battery data, both tests (with rho = 0 and rho = 1) give very small p-values (less than 2 × 10^{-16}), indicating that there is a significant difference between the survival curves of the maintained and nonmaintained groups.
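A minimal sketch of such a comparison in R is given below. It assumes that the grouped counts of Table 5.9 have been expanded into a data frame ips with one row per battery and columns time, status (1 = failure, 0 = censored), and group; the data frame and variable names are assumptions made only for illustration.

```r
library(survival)

# ips: assumed data frame with one row per battery:
#   time   - age in months at failure or censoring
#   status - 1 = failure, 0 = censored
#   group  - "maintained" or "nonmaintained"
fit <- survfit(Surv(time, status) ~ group, data = ips)
plot(fit, lty = 1:2, xlab = "Time (months)", ylab = "S(t)")   # cf. Fig. 5.3

survdiff(Surv(time, status) ~ group, data = ips, rho = 0)  # log-rank (Mantel-Haenszel)
survdiff(Surv(time, status) ~ group, data = ips, rho = 1)  # Peto-Peto modification
```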
8 Note: Sometimes offices replace batteries batch-wise, assuming failure because of low performance, and so the number of failures becomes high; see, for example, the lifetimes 30, 36 and 42 months. Also, one office decided not to use IPS with a batch of good batteries at age 24 months, and these were treated as censored observations.
Table 5.10 Quantities needed to calculate the test statistic for equality of two survival functions
Fig. 5.3 Plots of the nonparametric estimates of S(t) for two groups for IPS battery data
The log-rank test can be extended to compare survival among K (> 2) groups by comparing the observed and the expected numbers of events (Liu 2012), extending the method described in the previous Sect. 5.5.1.
Here, we use matrix expressions of mathematical equations for comparison of K
groups. Let Oi = [d 1i , d 2i , … d (K −1)i ] be a vector for the observed number of events
in group 1 to group (K − 1) at lifetime t i , i = 1, 2, …, m. The distribution of counts in
Oi is assumed to follow a multivariate hypergeometric distribution for given [n1i , n2i ,
…, nKi ], conditional on both the row and the column totals (a detailed description can
be found in Liu 2012). This multivariate hypergeometric distribution is associated
with a mean vector

E_i = \left[ \frac{d_i n_{1i}}{n_i}, \frac{d_i n_{2i}}{n_i}, \ldots, \frac{d_i n_{(K-1)i}}{n_i} \right]   (5.22)
and a covariance matrix V_i whose diagonal and off-diagonal elements are

v_{kki} = \frac{n_{ki} d_i (n_i - d_i)(n_i - n_{ki})}{n_i^2 (n_i - 1)}   (5.24)

and

v_{kli} = -\frac{n_{ki} n_{li} d_i (n_i - d_i)}{n_i^2 (n_i - 1)}, \quad k \neq l; \; i = 1, 2, \ldots, m.   (5.25)
Then, letting O = \sum_{i=1}^{m} O_i, E = \sum_{i=1}^{m} E_i, and V = \sum_{i=1}^{m} V_i, we have the approximate chi-square test statistic for the log-rank test as

\chi^2 = (O - E)' V^{-1} (O - E) \sim \chi^2_{(K-1)}.   (5.26)
This test statistic does not assign any weights to the different failure times. At lifetime t_i, let the positive weight function for group k be denoted by w_k(t_i), with the property that w_k(t_i) is zero whenever n_i is zero (Klein and Moeschberger 2003). With this weight function, the test statistic in Eq. (5.26) can be expressed (Kalbfleisch and Prentice 1980; Liu 2012) as

\chi^2 = \left[ \sum_{i=1}^{m} w_i (O_i - E_i) \right]' \left[ \sum_{i=1}^{m} w_i V_i w_i \right]^{-1} \left[ \sum_{i=1}^{m} w_i (O_i - E_i) \right] \sim \chi^2_{(K-1)}.   (5.27)
References
Aalen O (1976) Nonparametric inference in connection with multiple decrement models. Scand J
Stat 3:15–27
Ben-Daya M, Kumar U, Murthy DNP (2016) Introduction to maintenance engineering: modelling,
optimization and management. Wiley
Blischke WR, Karim MR, Murthy DNP (2011) Warranty data collection and analysis. Springer,
London Limited
Fleming T, Harrington DP (1984) Nonparametric estimation of the survival distribution in censored
data. Commun Stat 13(20):2469–2486
Gehan EA (1965) A generalized Wilcoxon test for comparing arbitrarily singly censored samples. Biometrika 52:203–223
Gibbons JD, Chakraborti S (2003) Nonparametric statistical inference. Chapman and Hall/CRC
Harrington DP, Fleming TR (1982) A class of rank test procedures for censored survival data.
Biometrika 69:553–566
Hosmer DW, Lemeshow S (1999) Applied survival analysis: regression modeling of time-to-event
data. Wiley
Kalbfleisch JD, Lawless JF (1996) Statistical analysis of warranty claims data. In: Blischke WR,
Murthy DNP (eds) Product warranty handbook. M. Dekker, New York, pp 231–259
Kalbfleisch JD, Prentice RL (1980) The statistical analysis of failure time data. Wiley, New York
Kalbfleisch JD, Lawless JF, Robinson JA (1991) Methods for the analysis and prediction of warranty
claims. Technometrics 33:273–285
Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat
Assoc 53:457–481
Karim MR, Suzuki K (2005) Analysis of warranty claim data: a literature review. Int J Qual Reliab
Manag 22(7):667–686
Karim MR, Yamamoto W, Suzuki K (2001) Statistical analysis of marginal count failure data.
Lifetime Data Anal 7:173–186
Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data,
2nd edn. Springer
Lawless JF (1982) Statistical models and methods for lifetime data. Wiley, New York
Lawless JF (1998) Statistical analysis of product warranty data. Int Stat Rev 66:41–60
Liu X (2012) Survival analysis: models and applications. Wiley, UK
Makkonen L (2008) Bringing closure to the plotting position controversy. Commun Stat Theory
Methods 37:460–467
Mantel N, Haenszel W (1959) Statistical aspects of the analysis of data from retrospective studies
of disease. J Natl Cancer Inst 22(4):719–748
MathSoft, Inc (1998) S-PLUS 5 for UNIX guide to statistics, vol 2. Data Analysis Products Division
MathSoft, Inc, Seattle, Washington
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley Interscience, New
York
Murthy DNP, Jack N (2014) Extended warranties, maintenance service and lease contracts: modeling
and analysis for decision-making. Springer, London
Nelson W (1969) Hazard plotting for incomplete failure data. J Qual Technol 1:27–52
Nelson W (1972) Theory and application of hazard plotting for censored survival data. Technomet-
rics 14:945–966
Nelson W (1982) Applied life data analysis. Wiley, New York
Peto R, Peto J (1972) Asymptotically efficient rank invariant test procedures. J R Stat Soc A
135:185–207
Prentice RL (1978) Linear rank tests with right censored data. Biometrika 65:167–179
Ruhi S (2016) Application of complex lifetime models for analysis of product reliability data.
Unpublished doctoral dissertation, University of Rajshahi, Bangladesh
Sullivan L (2019) Survival analysis. Available at Boston University School of Public Health website
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Survival/. Accessed 28 May 2019
Suzuki K, Karim MR, Wang L (2001) Statistical analysis of reliability warranty data. In: Rao CR, Balakrishnan N (eds) Handbook of statistics: advances in reliability. Elsevier, Amsterdam
Tarone RE, Ware JH (1977) On distribution-free tests for equality of survival distributions. Biometrika 64:156–160
Chapter 6
Probability Distribution of Lifetimes:
Censored and Left Truncated
Abstract This chapter discusses the maximum likelihood estimation method for
analyzing the censored and truncated data using some common lifetime distribu-
tions. The likelihood functions under the schemes of different types of censoring and
truncation constructed in Chap. 4 will be applied in this chapter.
6.1 Introduction
In Chap. 4, we have discussed the failure time distributions for uncensored data.
Now, we discuss the failure time distributions for right-censored and left-truncated
data. The construction of likelihood functions is shown in the previous chapter under
the schemes of different types of censoring and truncation. In this chapter, two most
extensively used types of censoring, Types I and II, will be considered along with
left truncation. The left-truncated and right-censored observations are sometimes
found in the same data set and need to be taken into account jointly for estimating
parameters of a lifetime distribution. In this chapter, some examples are provided.
The outline of the chapter is as follows: Sect. 6.2 presents the exponential dis-
tribution with a discussion on the statistical inference on its parameter for censored
data. Section 6.3 explains the extreme value and Weibull distributions. The normal
and lognormal distributions are presented in Sect. 6.4. Section 6.5 deals with the
gamma distribution for censored and left-truncated data.
6.2 Exponential Distribution

Consider Type II censoring, in which a sample of n items is observed until the rth failure, so that the ordered failure times t_{(1)} < \cdots < t_{(r)} are observed and the remaining (n - r) lifetimes are censored at t_{(r)}. The log likelihood of the exponential distribution with parameter λ is then

\ln L = r \ln \lambda - \lambda \sum_{i=1}^{r} t_{(i)} - \lambda (n - r) t_{(r)}.   (6.3)

Differentiating with respect to λ and equating to zero,

\frac{\partial \ln L}{\partial \lambda} = \frac{r}{\lambda} - \sum_{i=1}^{r} t_{(i)} - (n - r) t_{(r)} = 0.

Solving for λ, we obtain the maximum likelihood estimator under the Type II censoring scheme as

\hat{\lambda} = \frac{r}{\sum_{i=1}^{r} t_{(i)} + (n - r) t_{(r)}}.   (6.4)
The estimate of the mean time to failure is

\widehat{MTTF} = \frac{1}{\hat{\lambda}} = \frac{t}{r},

where t = \sum_{i=1}^{r} t_{(i)} + (n - r) t_{(r)} is the total time on test. Since 2\lambda t follows a chi-square distribution with 2r degrees of freedom under Type II censoring, a (1 - α)100% confidence interval for λ is obtained from

P\left( \frac{\chi^2_{2r,\, \alpha/2}}{2t} \leq \lambda \leq \frac{\chi^2_{2r,\, 1-\alpha/2}}{2t} \right) = 1 - \alpha.   (6.5)
For Type I censoring, let T_i denote the lifetime and c_i the censoring time of the ith item, i = 1, …, n. The observed time and censoring indicator are

t_i = \min(T_i, c_i)

and

\delta_i = \begin{cases} 1, & \text{if } t_i = T_i \\ 0, & \text{if } t_i = c_i. \end{cases}
Then, the likelihood function can be obtained as follows for Type I censoring:

L = \prod_{i=1}^{n} L_i = \prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i)^{1-\delta_i},

where

L_i = \begin{cases} f(t_i), & \text{if } \delta_i = 1 \\ S(t_i), & \text{if } \delta_i = 0. \end{cases}
Example 6.1 The likelihood function for the exponential distribution is

L = \prod_{i=1}^{n} \left( \lambda e^{-\lambda t_i} \right)^{\delta_i} \left( e^{-\lambda t_i} \right)^{1-\delta_i} = \lambda^{r} e^{-\lambda \sum_{i=1}^{n} t_i} = \lambda^{r} e^{-\lambda t},   (6.6)

where r = \sum_{i=1}^{n} \delta_i and t = \sum_{i=1}^{n} t_i.
As shown in Chap. 4, the maximum likelihood estimator of λ based on the above likelihood function is

\hat{\lambda} = \frac{r}{t},   (6.7)

and the Fisher information can be obtained as

I(\lambda) = E\left( -\frac{\partial^2 \ln L}{\partial \lambda^2} \right) = \frac{E(r)}{\lambda^2}.   (6.8)

Here E(r) = np, where

p = P(\delta_i = 1) = \int_{0}^{\infty} f(u_i) [1 - F(u_i)] \, du_i.
It may be noted here that we can define the observed information as -\frac{\partial^2 \ln L}{\partial \lambda^2} = \frac{r}{\lambda^2}. Sometimes, we use the observed information instead of the Fisher information for computational convenience.
6.2.2.1 Tests

Some tests for the parameter of an exponential distribution in the presence of censoring are shown below. Both large sample and small sample tests are introduced.
(i) In the previous section, we have shown that the Fisher information is I(\lambda) = E(r)/\lambda^2, and for a large sample size n the standardized form

W = \frac{\hat{\lambda} - \lambda}{\sqrt{I^{-1}(\lambda)}}

is approximately distributed as N(0, 1) and can be used for testing the null hypothesis H_0: \lambda = \lambda_0.
(ii) As an alternative to the large sample test in (i), we can assume that \ln \hat{\lambda} \sim N(\ln \lambda, 1/r), where

Var(\ln \hat{\lambda}) \approx \left( \frac{\partial \ln \lambda}{\partial \lambda} \right)^2 Var(\hat{\lambda}) = \frac{1}{\lambda^2} I^{-1}(\lambda) = \frac{1}{\lambda^2} \cdot \frac{\lambda^2}{np} = \frac{1}{np}.

Here \hat{p} = r/n and the estimated variance is \widehat{Var}(\ln \hat{\lambda}) \approx 1/r. Hence, for testing the null hypothesis H_0: \lambda = \lambda_0, we can use the following statistic under H_0:

W = \frac{\ln \hat{\lambda} - \ln \lambda_0}{\sqrt{1/r}}.
From the confidence limits for \ln \lambda based on W, we can obtain the (1 - α)100% confidence limits for λ as follows:

\left( \hat{\lambda}\, e^{-z_{1-\alpha/2}/\sqrt{r}}, \; \hat{\lambda}\, e^{z_{1-\alpha/2}/\sqrt{r}} \right).   (6.11)
(iii) If two samples are drawn from exponential distributions with parameters \lambda_1 and \lambda_2, respectively, then we may be interested in testing the equality of the parameters. Let the sample sizes be n_1 and n_2 and the numbers of failures be r_1 and r_2, respectively. The null hypothesis is H_0: \lambda_1 = \lambda_2.
Let \ln \hat{\lambda}_1 \sim N(\ln \lambda_1, 1/r_1) and \ln \hat{\lambda}_2 \sim N(\ln \lambda_2, 1/r_2); then it can be shown that

Var(\ln \hat{\lambda}_1 - \ln \hat{\lambda}_2) \approx Var(\ln \hat{\lambda}_1) + Var(\ln \hat{\lambda}_2) = \frac{1}{r_1} + \frac{1}{r_2}.

The test statistic is

W = \frac{\ln \hat{\lambda}_1 - \ln \hat{\lambda}_2}{\sqrt{\frac{1}{r_1} + \frac{1}{r_2}}}.

From the corresponding confidence limits for \ln \lambda_1 - \ln \lambda_2 = \ln(\lambda_1/\lambda_2), we can obtain the (1 - α)100% confidence limits for the ratio \lambda_1/\lambda_2 as follows:

\left( \frac{\hat{\lambda}_1}{\hat{\lambda}_2}\, e^{-z_{1-\alpha/2}\sqrt{\frac{1}{r_1} + \frac{1}{r_2}}}, \; \frac{\hat{\lambda}_1}{\hat{\lambda}_2}\, e^{z_{1-\alpha/2}\sqrt{\frac{1}{r_1} + \frac{1}{r_2}}} \right).   (6.12)
(iv) A test for both small and large samples can be developed using the suggestion made by Sprott (1973). Lawless (2003) illustrated the test for the exponential distribution. Sprott (1973) indicated that \hat{\phi} = \hat{\lambda}^{1/3} provides a better approximation to a normal distribution even for small samples. The distribution of \hat{\phi} is approximately normal with mean \phi = \lambda^{1/3} and variance

Var(\hat{\phi}) \approx \left( \frac{\partial \phi}{\partial \lambda} \right)^2 Var(\hat{\lambda}),
and evaluating the derivative gives

Var(\hat{\phi}) \approx \left( \frac{1}{3} \lambda^{-2/3} \right)^2 \frac{\lambda^2}{np} = \frac{\lambda^{2/3}}{9np} = \frac{\phi^2}{9np}.

The test statistic is

Z = \frac{\hat{\phi} - \phi}{\left( \hat{\phi}^2 / 9np \right)^{1/2}} \sim N(0, 1).

If we consider the observed information, then \widehat{Var}(\hat{\phi}) \approx \hat{\phi}^2 / 9r and the test statistic is

Z = \frac{\hat{\phi} - \phi}{\left( \hat{\phi}^2 / 9r \right)^{1/2}} \sim N(0, 1).
Since \phi = \lambda^{1/3}, which implies \lambda = \phi^3 and \hat{\lambda} = \hat{\phi}^3, the (1 - α)100% confidence interval for λ can be shown to be

P\left( \hat{\lambda} \left( 1 - z_{1-\alpha/2}/3\sqrt{r} \right)^3 < \lambda < \hat{\lambda} \left( 1 + z_{1-\alpha/2}/3\sqrt{r} \right)^3 \right) = 1 - \alpha.   (6.13)
(v) The likelihood ratio method can be used for either tests or interval estimation.
Under the hypothesis H0 : λ = λ0 , the likelihood ratio statistic is
Fig. 6.1 Fitted pdf, cdf, reliability function and hazard function of exponential distribution for
battery failure data
\Lambda = -2 \ln \frac{L(\lambda_0)}{L(\hat{\lambda})} \sim \chi^2_1,

where L(\lambda_0) is the likelihood under H_0 and L(\hat{\lambda}) is the likelihood under H_1. The likelihood ratio statistic \Lambda = -2 \ln [L(\lambda_0)/L(\hat{\lambda})] can be expressed as

\Lambda = 2r \left[ \hat{\lambda}/\lambda - 1 - \ln(\hat{\lambda}/\lambda) \right] \sim \chi^2_1,

and an approximate 95% confidence interval for λ can be obtained from the inequality

2r \left[ \hat{\lambda}/\lambda - 1 - \ln(\hat{\lambda}/\lambda) \right] \leq \chi^2_{1,\,0.95} = 3.84.   (6.14)
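A likelihood-ratio-based interval can be obtained numerically from (6.14) by finding the two values of λ at which the statistic reaches the critical value. A minimal R sketch, with illustrative values of r and λ̂, is given below.

```r
# Approximate 95% likelihood ratio confidence interval for lambda, Eq. (6.14)
r          <- 39        # number of failures (illustrative value)
lambda.hat <- 0.00117   # ML estimate of lambda (illustrative value)

Lambda <- function(lambda)    # likelihood ratio statistic as a function of lambda
  2 * r * (lambda.hat / lambda - 1 - log(lambda.hat / lambda))

crit  <- qchisq(0.95, df = 1)                        # 3.84
lower <- uniroot(function(l) Lambda(l) - crit,
                 c(lambda.hat / 10, lambda.hat))$root
upper <- uniroot(function(l) Lambda(l) - crit,
                 c(lambda.hat, lambda.hat * 10))$root
c(lower = lower, upper = upper)
```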
Fig. 6.2 Exponential probability plot for the battery failure data
Example 6.2 In this example, we consider the battery failure data of Table 5.1, for which n = 54, with r = 39 failure times and n − r = 15 censored times. The unit of measurement is the number of days. A nonparametric analysis of this data set is given in Example 5.2. Our objective in this example is to analyze these data using the exponential distribution. The likelihood function (6.6) and hence Eq. (6.7) can be used for finding the MLE of the parameter λ. The MLE of λ is \hat{\lambda} = r / \sum_{i=1}^{n} t_i = 39/33{,}256 = 0.001172403. The estimate of MTTF is 1/\hat{\lambda} = 1/0.001172403 = 852.949 days. The standard error of the MTTF is 136.581, and the 95% normal confidence interval for the MTTF is [623.192, 1167.410].
Figure 6.1 shows the ML estimates of the pdf, cdf, reliability function, and hazard
function of the fitted exponential distribution for the battery failure data.
The exponential probability plot for the battery failure data given in Fig. 6.2
indicates that the exponential distribution is not a suitable distribution for the data.
The search for a better model for this data set requires further investigation.
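The calculations of Example 6.2 can be reproduced with a few lines of R. The sketch below uses the totals quoted in the example (r = 39 failures and a total time on test of 33,256 days) and constructs the confidence interval for the MTTF on the log scale, which reproduces the reported interval approximately.

```r
# Exponential fit to the battery failure data of Example 6.2
r     <- 39       # number of failures
total <- 33256    # total time on test, sum of all observed times (days)

lambda.hat <- r / total              # Eq. (6.7)
MTTF.hat   <- 1 / lambda.hat         # estimated mean time to failure
se.MTTF    <- MTTF.hat / sqrt(r)     # large-sample standard error

# 95% normal confidence interval for the MTTF on the log scale
ci <- MTTF.hat * exp(c(-1, 1) * qnorm(0.975) / sqrt(r))
c(lambda = lambda.hat, MTTF = MTTF.hat, se = se.MTTF,
  lower = ci[1], upper = ci[2])
```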
6.3 Extreme Value and Weibull Distributions

The Weibull distribution (Weibull 1951) is one of the most used lifetime distributions in both reliability and survival analyses. The Weibull distribution has extensive use for the lifetimes and fatigue of electronic devices as well as for longitudinal data on survival times of human beings. The extreme value (or Gumbel) distribution is closely related to the Weibull distribution: under a log transformation the two distributions can be represented equivalently, as shown below.
A two-parameter Weibull probability density function is

f(t \mid \alpha, \lambda) = \alpha \lambda (\lambda t)^{\alpha - 1} e^{-(\lambda t)^{\alpha}}, \quad t \geq 0,   (6.15)

where α is the shape and λ is the scale parameter of the distribution. Using the following transformations, we can show the equivalence between the Weibull and the extreme value distributions: Y = \ln T, u = \ln(1/\lambda) and b = 1/\alpha. We can also show that T = e^{Y}, \lambda = e^{-u} and \alpha = 1/b. Then, the extreme value distribution can be expressed as

f(y \mid u, b) = \frac{1}{b} e^{(y-u)/b} e^{-e^{(y-u)/b}}, \quad -\infty < y < \infty,   (6.16)
where u is the location and b is the scale parameter. The reliability or survival functions of the Weibull and the extreme value distributions are

Weibull: R(t) = S(t) = 1 - F(t) = e^{-(\lambda t)^{\alpha}},
Extreme value: R(y) = S(y) = 1 - F(y) = e^{-e^{(y-u)/b}}.

The hazard functions are

Weibull: h(t) = \alpha \lambda (\lambda t)^{\alpha - 1},
Extreme value: h(y) = \frac{1}{b} e^{(y-u)/b}.
The likelihood function under Type II censoring is discussed here. Let t_{(1)} < \cdots < t_{(r)} be the r smallest ordered lifetimes from a random sample of size n from the Weibull distribution. The r smallest log lifetimes are y_{(1)} = \ln t_{(1)}, \ldots, y_{(r)} = \ln t_{(r)}. It may be noted here that t_{(r+1)} = t_{(r)}, \ldots, t_{(n)} = t_{(r)}, which are censored. Equivalently, y_{(r+1)} = y_{(r)}, \ldots, y_{(n)} = y_{(r)} for the log lifetimes. The r smallest ordered log lifetimes from an extreme value distribution can be written as y_{(1)} < \cdots < y_{(r)}. The log likelihood function for the partially censored extreme value data is

\ln L = -r \ln b + \sum_{i=1}^{r} \left[ \frac{y_{(i)} - u}{b} - e^{(y_{(i)} - u)/b} \right] - (n - r) e^{(y_{(r)} - u)/b}.   (6.17)
Differentiating the log likelihood function with respect to u and equating to 0, the likelihood equation for estimating u is

\frac{\partial \ln L}{\partial u} = -\frac{r}{b} + \frac{1}{b} \sum_{i=1}^{r} e^{(y_{(i)} - u)/b} + \frac{(n - r)}{b} e^{(y_{(r)} - u)/b} = 0.
Similarly, differentiating the log likelihood function with respect to b and equating to 0 gives the likelihood equation for estimating b. In terms of the Weibull parameters α and λ, the corresponding likelihood equations are

\frac{\partial \ln L}{\partial \alpha} = \frac{n}{\alpha} + n \ln \lambda + \sum_{i=1}^{n} \ln t_i - \ln \lambda \sum_{i=1}^{n} (\lambda t_i)^{\alpha} - \sum_{i=1}^{n} (\lambda t_i)^{\alpha} \ln t_i = 0,   (6.20)

\frac{\partial \ln L}{\partial \lambda} = \frac{r\alpha}{\lambda} - \alpha \lambda^{\alpha - 1} \sum_{i=1}^{n} t_{(i)}^{\alpha} = 0.   (6.21)
Lawless (2003) suggested the likelihood ratio test for the null hypothesis H_0: b = b_0, where the likelihood ratio test statistic is

-2 \ln \Lambda = -2 [\ln L_0 - \ln L_1],

which is asymptotically \chi^2_1. Here, L_0 = L(\hat{u}(b_0), b_0) and L_1 = L(\hat{u}, \hat{b}).
The likelihood function for right-censored and left-truncated data is shown in this
example for extreme value distribution (see Balakrishnan and Mitra 2012 for details).
The extreme value distribution is

f(y \mid u, b) = \frac{1}{b} e^{(y-u)/b} e^{-e^{(y-u)/b}}, \quad -\infty < y < \infty, \; -\infty < u < \infty, \; b > 0,   (6.23)

where Y = \ln T, T is the lifetime that follows the Weibull distribution, and u and b are the location and scale parameters, respectively. Now let us define \delta_i = 0 for right-censored and \delta_i = 1 for uncensored observations, let tlt_i be the left-truncation time, and let S_1 be the index set of items that are not left truncated and S_2 the index set of items that are left truncated. Then the likelihood function for left-truncated and right-censored data is

L = \prod_{i \in S_1} \left[ \frac{1}{b} e^{(y_i - u)/b} e^{-e^{(y_i - u)/b}} \right]^{\delta_i} \left[ e^{-e^{(y_i - u)/b}} \right]^{1-\delta_i} \times \prod_{i \in S_2} e^{e^{(tlt_i - u)/b}} \left[ \frac{1}{b} e^{(y_i - u)/b} e^{-e^{(y_i - u)/b}} \right]^{\delta_i} \left[ e^{-e^{(y_i - u)/b}} \right]^{1-\delta_i}.   (6.24)
Let us denote \nu_i = 0 for the ith item truncated and \nu_i = 1 for the ith item not truncated; then the log likelihood function becomes

\ln L = \sum_{i=1}^{n} \left[ -\delta_i \ln b + \delta_i \frac{y_i - u}{b} - e^{(y_i - u)/b} \right] + \sum_{i=1}^{n} (1 - \nu_i) e^{(tlt_i - u)/b}.   (6.25)
Like the likelihood function of the exponential distribution discussed in Sect. 6.2.2, the likelihood function of the Weibull distribution for Type I censoring becomes

L = \prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i)^{1-\delta_i},   (6.26)
and the maximum likelihood estimating equations are shown in Eqs. (6.20) and
(6.21). Solving those equations, we obtain the estimates of the parameters of Weibull
distribution.
Example 6.3 In this example, Weibull distribution is applied to analyze the battery
failure data of Table 5.1 (also discussed in Example 6.2). The likelihood function
(6.26) and hence the Eqs. (6.20) and (6.21) can be used for finding the MLE of the
parameters.
The MLEs of the shape and scale parameters are \hat{\alpha} = 1.9662 and \hat{\lambda} = 836.3442, respectively. The estimate of MTTF is 741.449 days. The MLE of the shape parameter of the Weibull distribution is 1.9662, which is greater than one, indicating an increasing failure rate for the battery with respect to its age. Moreover, the shape parameter is much higher than one, implying that the exponential distribution is not a suitable distribution for the lifetime of the battery (as explained in Example 6.2).
Figure 6.3 shows the ML estimates of the pdf, cdf, reliability function, and hazard
function of the fitted Weibull distribution for the battery failure data.
Fig. 6.3 Fitted pdf, cdf, reliability function and hazard function of Weibull distribution for battery
failure data
The Weibull probability plot for the battery failure data is given in Fig. 6.4. The
roughly linear pattern of the data on the Weibull probability paper suggests that
the Weibull model may be a reasonable choice for modeling the lifetime of the
battery data. Comparing Fig. 6.4 with Fig. 6.2, it can be concluded that the Weibull
distribution appears to be better than the exponential distribution for this data set.
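A parametric Weibull fit of this kind can be obtained with survreg in R. The sketch below assumes a data frame battery with columns time (days) and status (1 = failure, 0 = censored); since survreg works on the log-lifetime (extreme value) scale, the Weibull shape and characteristic life are recovered by the transformations shown, and the quantity reported as the scale parameter in Example 6.3 corresponds to exp(intercept) on this scale.

```r
library(survival)

# battery: assumed data frame with columns time (days) and status (1 = failure, 0 = censored)
fit <- survreg(Surv(time, status) ~ 1, data = battery, dist = "weibull")

alpha.hat <- 1 / fit$scale                        # Weibull shape parameter
eta.hat   <- exp(coef(fit))                       # Weibull characteristic life (days)
MTTF.hat  <- eta.hat * gamma(1 + 1 / alpha.hat)   # mean time to failure
c(shape = alpha.hat, scale = eta.hat, MTTF = MTTF.hat)
```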
6.4 Normal and Lognormal Distributions

The lognormal distribution has become one of the most popular life distribution models in reliability for many high technology applications such as semiconductor degradation failure mechanisms. The use of the lognormal distribution is quite satisfactory for modeling fatigue failures and crack propagation. We also observe the use of the lognormal distribution in modeling failure times of electrical insulation. Although we cannot use the normal distribution directly in reliability and survival analyses, because lifetime data are always nonnegative, it is still important because of its direct relationship with the lognormal distribution: if T is lognormal, then Y = \ln T is normally distributed. The probability density functions of Y and of T are given below.
Fig. 6.4 Weibull probability plot for the battery failure data
f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(y - \mu)^2}, \quad -\infty < y < \infty,   (6.27)

and

f(t) = \frac{1}{t\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(\ln t - \mu)^2}, \quad t > 0.   (6.28)
Let us now define the probability density function f(z) and the reliability function R(z) or S(z), where Z = (Y - \mu)/\sigma; then

f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty,
and

R(z) = S(z) = \int_{z}^{\infty} f(z') \, dz'.
For Type II censored data, with the r smallest ordered lifetimes t_{(1)} < \cdots < t_{(r)} observed and the remaining (n - r) lifetimes censored, the likelihood function under the lognormal distribution is

L = \prod_{i=1}^{r} \frac{1}{t_{(i)}\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(\ln t_{(i)} - \mu)^2} \prod_{i=r+1}^{n} \int_{t_{(i)}}^{\infty} \frac{1}{u_{(i)}\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(\ln u_{(i)} - \mu)^2} \, du_{(i)}
  = \prod_{i=1}^{r} \frac{1}{\sigma} f\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) \prod_{i=r+1}^{n} R\!\left( \frac{y_{(i)} - \mu}{\sigma} \right),   (6.29)

where y_{(i)} = \ln t_{(i)}.
It may be noted here that R(t) = P(T \geq t) = P(e^{Y} \geq t) = P(Y \geq \ln t). The log likelihood function is

\ln L = -r \ln \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{r} (y_{(i)} - \mu)^2 + \sum_{i=r+1}^{n} \ln R\!\left( \frac{y_{(i)} - \mu}{\sigma} \right).   (6.30)
Differentiating the log likelihood with respect to μ and σ, we obtain the likelihood equations shown below:

\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{r} (y_{(i)} - \mu) + \frac{1}{\sigma} \sum_{i=r+1}^{n} f\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) \Big/ R\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) = 0

and

\frac{\partial \ln L}{\partial \sigma} = -\frac{r}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{r} (y_{(i)} - \mu)^2 + \frac{1}{\sigma} \sum_{i=r+1}^{n} \left( \frac{y_{(i)} - \mu}{\sigma} \right) f\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) \Big/ R\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) = 0.
The construction of the likelihood function for left-truncated and right-censored data in the case of the lognormal distribution is shown in Chap. 4 (see also Balakrishnan and Mitra 2011, 2014), where the log likelihood function is

\ln L(\mu, \sigma) = \sum_{i=1}^{n} \left[ -\delta_i \ln \sigma - \delta_i \frac{1}{2\sigma^2} (y_i - \mu)^2 + (1 - \delta_i) \ln \left\{ 1 - F\!\left( \frac{y_i - \mu}{\sigma} \right) \right\} \right] - \sum_{i \in S_2} \ln \left\{ 1 - F\!\left( \frac{tlt_i - \mu}{\sigma} \right) \right\},   (6.31)
where μ and σ are location and scale parameters, respectively, f (·) and F(·) are
probability density and cumulative distributions of the standard normal distribution,
respectively, δi = 0 for right censored and δi = 1 for uncensored, tlti is the left-
truncation time, S1 is the index set for not left truncated and S2 is the index set for
left truncated. Let us denote νi = 0 for the ith item truncated and νi = 1 for the ith
item not truncated, then the score equations are
\frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{n} \left[ \frac{\delta_i}{\sigma^2} (y_i - \mu) - \frac{(1 - \nu_i)}{\sigma} \frac{f\!\left( \frac{tlt_i - \mu}{\sigma} \right)}{1 - F\!\left( \frac{tlt_i - \mu}{\sigma} \right)} + \frac{(1 - \delta_i)}{\sigma} \frac{f\!\left( \frac{y_i - \mu}{\sigma} \right)}{1 - F\!\left( \frac{y_i - \mu}{\sigma} \right)} \right] = 0,   (6.32)

\frac{\partial \ln L}{\partial \sigma} = \sum_{i=1}^{n} \left[ -\frac{\delta_i}{\sigma} + \frac{\delta_i}{\sigma^3} (y_i - \mu)^2 - (1 - \nu_i) \frac{f\!\left( \frac{tlt_i - \mu}{\sigma} \right)}{1 - F\!\left( \frac{tlt_i - \mu}{\sigma} \right)} \frac{(tlt_i - \mu)}{\sigma^2} \right] + \sum_{i=1}^{n} (1 - \delta_i) \frac{f\!\left( \frac{y_i - \mu}{\sigma} \right)}{1 - F\!\left( \frac{y_i - \mu}{\sigma} \right)} \frac{(y_i - \mu)}{\sigma^2} = 0.   (6.33)
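Rather than solving (6.32)–(6.33) directly, the log likelihood (6.31) can be maximized numerically. The following R sketch assumes vectors y (log lifetimes), delta (censoring indicators), nu (truncation indicators, 1 = not truncated), and tlt (left-truncation times on the log scale, set to -Inf where nu = 1); these names are illustrative only.

```r
# Negative log likelihood of Eq. (6.31) for left-truncated, right-censored lognormal data
negloglik <- function(par, y, delta, nu, tlt) {
  mu <- par[1]; sigma <- exp(par[2])   # sigma parameterized on the log scale
  z  <- (y - mu) / sigma
  ll <- sum(delta * (dnorm(z, log = TRUE) - log(sigma)) +            # uncensored terms
            (1 - delta) * pnorm(z, lower.tail = FALSE, log.p = TRUE)) # censored terms
  zt <- (tlt - mu) / sigma
  ll <- ll - sum((1 - nu) * pnorm(zt, lower.tail = FALSE, log.p = TRUE)) # truncation
  -ll
}

# Illustrative call (y, delta, nu, tlt assumed to exist in the workspace):
# fit <- optim(c(mean(y), log(sd(y))), negloglik,
#              y = y, delta = delta, nu = nu, tlt = tlt, hessian = TRUE)
# c(mu = fit$par[1], sigma = exp(fit$par[2]))
```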
6.5 Gamma Distribution

The two-parameter gamma probability density function for failure time data is

f(t \mid \lambda, \gamma) = \frac{1}{\Gamma(\gamma)} \lambda^{\gamma} t^{\gamma - 1} e^{-\lambda t}, \quad t \geq 0,   (6.33)

where \lambda > 0, \gamma > 0 and \Gamma(\gamma) = \int_{0}^{\infty} t^{\gamma - 1} e^{-t} \, dt.
Let us consider a Type II censored sample of size n drawn from a gamma distribution. The ordered failure times are t_{(1)} < t_{(2)} < \cdots < t_{(r)}, and the remaining (n − r) observations are censored at t_{(r)}, that is, t_{(r+1)} = t_{(r)}, \ldots, t_{(n)} = t_{(r)}. Following the approach suggested by Wilk et al. (1962), Gross and Clark (1975) showed the likelihood function to be

L(\tau, \gamma) = \frac{n!}{(n - r)!} \, \frac{\tau^{n\gamma} G^{r(\gamma - 1)} e^{-r\tau A}}{\Gamma(\gamma)^{n} \, t_{(r)}^{r}} \left[ \int_{1}^{\infty} t^{\gamma - 1} e^{-\tau t} \, dt \right]^{n - r},   (6.34)

where

\tau = \lambda t_{(r)}, \qquad A = \frac{\sum_{i=1}^{r} t_{(i)}}{r\, t_{(r)}}, \qquad G = \frac{\left( \prod_{i=1}^{r} t_{(i)} \right)^{1/r}}{t_{(r)}}.
Based on Wilk et al. (1962), and Gross and Clark (1975), Lawless (1982) suggested estimation procedures for τ and γ or, more specifically, for λ and γ. The procedure is tedious, and hence an alternative procedure can be used, as proposed by Mitra (2012) and Balakrishnan and Mitra (2012, 2013). Mitra (2012) proposed a likelihood function for left-truncated and right-censored data. To keep relevance with the pdf mentioned above, let us introduce \lambda = 1/\theta and \gamma = \kappa. Then, the form of the likelihood function for left-truncated and right-censored data is

L(\kappa, \theta) = \prod_{i \in S_1} \{ f(t_i) \}^{\delta_i} \{ 1 - F(t_i) \}^{1 - \delta_i} \times \prod_{i \in S_2} \left[ \frac{f(t_i)}{1 - F(t_i^L)} \right]^{\delta_i} \left[ \frac{1 - F(t_i)}{1 - F(t_i^L)} \right]^{1 - \delta_i},   (6.35)

where t_i^L denotes the left-truncation time of the ith item.
The corresponding log likelihood function can be written as

\ln L(\kappa, \theta) = \sum_{i=1}^{n} \left[ \delta_i \left\{ (\kappa - 1) \ln t_i - \frac{t_i}{\theta} - \kappa \ln \theta \right\} + (1 - \delta_i) \ln \Gamma(\kappa, t_i/\theta) \right] - \sum_{i=1}^{n} \left[ \nu_i \ln \Gamma(\kappa) + (1 - \nu_i) \ln \Gamma(\kappa, t_i^L/\theta) \right],   (6.36)

where \Gamma(\kappa, x) = \int_{x}^{\infty} u^{\kappa - 1} e^{-u} \, du is the upper incomplete gamma function and, as before, \nu_i = 1 if the ith item is not left truncated and \nu_i = 0 otherwise.
Because the censored lifetimes enter this expression only through the incomplete gamma function, Balakrishnan and Mitra (2013) used an EM algorithm in which the censored lifetimes are treated as missing data. The complete-data log likelihood is

\ln L_c(\kappa, \theta) = \sum_{i=1}^{n} \left[ (\kappa - 1) \ln t_i - \frac{t_i}{\theta} - \kappa \ln \theta \right] - \sum_{i=1}^{n} \left[ \nu_i \ln \Gamma(\kappa) + (1 - \nu_i) \ln \Gamma(\kappa, t_i^L/\theta) \right].   (6.38)
The E-step At the E-step, the conditional expectation of the complete-data log likelihood, given the observed data and the current estimate \lambda^{(r)} = (\kappa^{(r)}, \theta^{(r)}), is computed as

Q(\lambda, \lambda^{(r)}) = \left[ \sum_{i:\delta_i=1} (\kappa - 1) \log t_i + \sum_{i:\delta_i=0} (\kappa - 1) E_{1i}^{(r)} \right] - \left[ \sum_{i:\delta_i=1} \frac{t_i}{\theta} + \sum_{i:\delta_i=0} \frac{E_{2i}^{(r)}}{\theta} \right] - n\kappa \log \theta - \sum_{i=1}^{n} \left[ \nu_i \log \Gamma(\kappa) + (1 - \nu_i) \log \Gamma(\kappa, t_i^L/\theta) \right],   (6.39)

where E_{1i}^{(r)} = E_{\lambda^{(r)}}[\log T_i \mid T_i > y_i] and E_{2i}^{(r)} = E_{\lambda^{(r)}}[T_i \mid T_i > y_i]. See Balakrishnan and Mitra (2013) for further details.
The M-step At the M-step, maximizing Q(\lambda, \lambda^{(r)}) with respect to θ, we obtain

\theta = \frac{1}{n\kappa} \left[ \sum_{i:\delta_i=1} t_i + \sum_{i:\delta_i=0} E_{2i}^{(r)} - \sum_{i=1}^{n} (1 - \nu_i) \frac{(t_i^L)^{\kappa} e^{-t_i^L/\theta}}{\theta^{\kappa - 1} \Gamma(\kappa, t_i^L/\theta)} \right],   (6.40)
and differentiating Q with respect to κ gives

\frac{\partial Q}{\partial \kappa} = \sum_{i:\delta_i=1} \log t_i + \sum_{i:\delta_i=0} E_{1i}^{(r)} - n \log \theta - \sum_{i=1}^{n} \left[ \nu_i \psi(\kappa) + (1 - \nu_i) \frac{\frac{\partial}{\partial \kappa} \Gamma(\kappa, t_i^L/\theta)}{\Gamma(\kappa, t_i^L/\theta)} \right],   (6.41)

where \psi(\kappa) is the digamma function. Setting (6.41) equal to zero gives the updating equation for κ, and the E- and M-steps are iterated until convergence.
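As a computational note, the upper incomplete gamma function Γ(κ, x) appearing in (6.36)–(6.41), and the conditional expectations E_1i^(r) and E_2i^(r) required in the E-step, can be evaluated in R as sketched below (using the parameterization λ = 1/θ, γ = κ introduced above).

```r
# Upper incomplete gamma function Gamma(kappa, x) via the regularized survival function
inc.gamma <- function(kappa, x)
  gamma(kappa) * pgamma(x, shape = kappa, lower.tail = FALSE)

# Conditional expectations used in the E-step, Eq. (6.39), for a gamma(kappa, theta)
# lifetime truncated below at y: E[T | T > y] and E[log T | T > y]
E2 <- function(y, kappa, theta)   # E[T | T > y], closed form
  kappa * theta * pgamma(y, shape = kappa + 1, scale = theta, lower.tail = FALSE) /
  pgamma(y, shape = kappa, scale = theta, lower.tail = FALSE)

E1 <- function(y, kappa, theta)   # E[log T | T > y], by numerical integration
  integrate(function(t) log(t) * dgamma(t, shape = kappa, scale = theta), y, Inf)$value /
  pgamma(y, shape = kappa, scale = theta, lower.tail = FALSE)
```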
References
Balakrishnan N, Mitra D (2011) Likelihood inference for lognormal data with left truncation and
right censoring with an illustration. J Stat Plan Infer 141:3536–3553
Balakrishnan N, Mitra D (2012) Left truncated and right censored Weibull data and likelihood
inference with an illustration. Comput Stat Data Anal 56:4011–4025
Balakrishnan N, Mitra D (2013) Likelihood inference based on left truncated and right censored
data from a gamma distribution. IEEE Trans Reliab 62:679–688
Balakrishnan N, Mitra D (2014) Some further issues concerning likelihood inference for left trun-
cated and right censored lognormal data. Commun Stat Simul Comput 43:400–416
Gross AJ, Clark VA (1975) Survival analysis: reliability applications in the biomedical sciences.
Wiley, New York
Lawless JF (1982) Statistical models and methods for lifetime data. Wiley, New York
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, New Jersey
Mitra D (2012) Likelihood inference for left truncated and right censored lifetime data. Ph.D.
dissertation, McMaster University, Ontario
Sprott DA (1973) Normal likelihoods and their relation to large sample theory of estimation.
Biometrika 60:457–465
Weibull W (1951) A statistical distribution function of wide applicability. J Appl Mech 18:293–296
Wilk MB, Gnanadesikan R, Huyett MJ (1962) Separate maximum likelihood estimation of scale
or shape parameters of the gamma distribution using order statistics. Biometrika 50:217–221
Chapter 7
Regression Models
Abstract In both reliability and survival analyses, regression models are employed
extensively for identifying factors associated with probability, hazard, risk, or sur-
vival of units being studied. This chapter introduces some of the regression models
used in both reliability and survival analyses. The regression models include logistic
regression, proportional hazards, accelerated failure time, and parametric regression
models based on specific probability distributions.
7.1 Introduction
In both reliability and survival analyses, regression models are employed extensively
for identifying factors associated with probability, hazard, risk, or survival of units
being studied. The use of linear regression models assuming normality assumption
is very limited in reliability and survival analyses due to the fact that: (i) the lifetime
variables are non-negative and skewed and (ii) the relationship between lifetimes and
explanatory variables are not directly linear. However, if we consider the relationships
between probability, hazard, risk, or survival/failure of units, then regression models
can be used. In this chapter, some of the regression models used very extensively in
both reliability and survival analyses are introduced. The regression models include
logistic regression, proportional hazards, accelerated failure time, and parametric
regression models based on specific probability distributions.
The outline of the chapter is as follows. Section 7.2 presents the logistic regression
model. Section 7.3 explains the proportional hazards model. Section 7.4 discusses the
accelerated failure time model. Section 7.5 deals with parametric regression models,
including the exponential, Weibull, and lognormal regression models.
7.2 Logistic Regression Model

The logistic regression model is one of the most widely used models mainly due to
its simplicity and useful and natural interpretation of the estimates of the parameters.
This model is based on the survival status of a unit over time. Let us consider time
points; t0 denotes the starting time, and te denotes the endpoint of the study. In other
words, during the process of the study, we observe the survival or failure status at the
beginning and at the end of the study. Let us denote the outcome variable as follows:
Y = \begin{cases} 1, & \text{if the unit fails during the period } (t_0, t_e), \\ 0, & \text{if the unit survives during the period } (t_0, t_e). \end{cases}   (7.1)
Here Y is a binary variable. Let us consider a sample of size n. The outcomes can
be shown as a vector for n units as Y = (Y1 , . . . , Yn ). Now we may consider a
vector of p explanatory variables or risk factors for Yi , X i = X i1 , . . . , X i p , i =
1, 2, . . . , n. We want to know whether these explanatory variables are associated
with the outcome variable, Y. In other words, the outcome variable, survival status,
observed for each unit is associated with any or all of the corresponding covariates
included in the vector, X. The covariate vector values can be represented by the
following matrix for a sample of size n:
X = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix}.
To find the relationship between the outcome variable Y and the covariates X, let us define the following probability function:

P(Y_i = 1 \mid X_i = x_i) = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}},   (7.2)

where x_i = (1, x_{i1}, \ldots, x_{ip}) and \beta = (\beta_0, \beta_1, \ldots, \beta_p)'. We can also define

P(Y_i = 0 \mid X_i = x_i) = \frac{1}{1 + e^{x_i \beta}}.   (7.3)
It follows that the log odds is linear in the covariates, \ln \left[ P(Y_i = 1 \mid x_i) / P(Y_i = 0 \mid x_i) \right] = x_i \beta. This is known popularly as the logit function, and this regression model is known as the logistic regression model. In Chap. 8, this will be discussed again as a link function of the generalized linear models.
The interpretation of the parameters of a logistic regression model is very meaningful, and it can be linked with a well-known measure called the odds ratio. Let us consider the covariate value of an explanatory variable x_{ij} = 0 or 1. Keeping the values of all other explanatory variables constant, the odds for x_{ij} = 1 can be obtained as follows:

\frac{P(Y_i = 1 \mid x_i, X_{ij} = 1)}{P(Y_i = 0 \mid x_i, X_{ij} = 1)} = e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{j-1} x_{i,j-1} + \beta_j \times 1 + \cdots + \beta_p x_{ip}}.
Then the odds ratio is obtained by taking the ratio of the two odds, for x_{ij} = 1 and x_{ij} = 0, as shown below:

\frac{P(Y_i = 1 \mid x_i, X_{ij} = 1) / P(Y_i = 0 \mid x_i, X_{ij} = 1)}{P(Y_i = 1 \mid x_i, X_{ij} = 0) / P(Y_i = 0 \mid x_i, X_{ij} = 0)} = e^{\beta_j}.   (7.6)
For the likelihood, note that the probability of an event is P(Y_i = 1 \mid X_i = x_i) = e^{x_i \beta}/(1 + e^{x_i \beta}) and the probability of no event is P(Y_i = 0 \mid X_i = x_i) = 1/(1 + e^{x_i \beta}). The likelihood function can be shown as

L = \prod_{i=1}^{n} \{ P(Y_i = 1 \mid X_i = x_i) \}^{Y_i} \{ P(Y_i = 0 \mid X_i = x_i) \}^{1-Y_i} = \prod_{i=1}^{n} \left( \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} \right)^{Y_i} \left( \frac{1}{1 + e^{x_i \beta}} \right)^{1-Y_i}.   (7.7)
The log likelihood is

\ln L = \sum_{i=1}^{n} \left\{ y_i \left[ x_i \beta - \ln(1 + e^{x_i \beta}) \right] - (1 - y_i) \ln(1 + e^{x_i \beta}) \right\} = \sum_{i=1}^{n} \left[ y_i x_i \beta - \ln(1 + e^{x_i \beta}) \right].   (7.8)
Differentiating the log likelihood with respect to the parameters \beta_0 and \beta_j and setting the derivatives equal to zero, we obtain the estimating equations

\frac{\partial \ln L}{\partial \beta_0} = \sum_{i=1}^{n} \left[ y_i - \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} \right] = 0,

\frac{\partial \ln L}{\partial \beta_j} = \sum_{i=1}^{n} x_{ij} \left[ y_i - \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} \right] = 0, \quad j = 1, \ldots, p.   (7.9)
The second derivatives are

\frac{\partial^2 \ln L}{\partial \beta_j^2} = -\sum_{i=1}^{n} x_{ij}^2 \frac{e^{x_i \beta}}{\left(1 + e^{x_i \beta}\right)^2}, \quad j = 0, 1, \ldots, p;

\frac{\partial^2 \ln L}{\partial \beta_j \partial \beta_k} = -\sum_{i=1}^{n} x_{ij} x_{ik} \frac{e^{x_i \beta}}{\left(1 + e^{x_i \beta}\right)^2}, \quad j, k = 0, 1, \ldots, p, \; j \neq k.   (7.10)
(i) The likelihood ratio test of the null hypothesis H_0: \beta_1 = \cdots = \beta_p = 0 is based on

\Lambda = \frac{L(H_0)}{L(H_1)} = \frac{L_0}{L_1},

where L(H_0) = L_0 is the likelihood under the null hypothesis and L(H_1) = L_1 is the likelihood under the alternative hypothesis (the extended logistic regression model including the parameters \beta_1, \ldots, \beta_p). The log likelihood ratio test statistic is

-2 \ln \Lambda = -2 (\ln L_0 - \ln L_1),   (7.11)

which is asymptotically \chi^2_p.
(ii) The test for H_0: \beta_j = 0 can be conducted by using the following test statistic:

t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)},   (7.12)

which has a t distribution with (n - p - 1) degrees of freedom. For large n, this test statistic is asymptotically standard normal. For more on the logistic regression model, see, for example, Agresti (2002, 2015), Hosmer and Lemeshow (2000), and Kleinbaum and Klein (2010).
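In practice the estimating equations (7.9) are solved by iteratively reweighted least squares, as implemented in glm in R. A minimal sketch is given below; the data frame dat and the variable names y, x1, and x2 are hypothetical.

```r
# Logistic regression of failure status y on covariates x1 and x2 (hypothetical data frame)
fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = dat)

summary(fit)                 # Wald z tests for H0: beta_j = 0, cf. Eq. (7.12)
exp(coef(fit))               # odds ratios, cf. Eq. (7.6)
anova(fit, test = "Chisq")   # sequential likelihood ratio (chi-square) tests, cf. Eq. (7.11)
```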
7.3 Proportional Hazards Model

In a longitudinal study, we can observe or record the failure times at the time of
occurrence. It is frequently the case that time to failure is dependent on other random
variables—characteristics which are perhaps subject to natural variation, but may
also be under a certain amount of control. These explanatory variables or covariates
influence the lifetime model through the reliability function, and thus by implication
through the hazard function. Common to such models is the notion of a baseline
reliability function which corresponds to the lifetime behavior for some standard or
initializing condition.
Let T denote the failure time variable. The vector of failure times for a random sample of size n is represented by T = (T_1, \ldots, T_n). Let us represent the set of covariates by a vector X = (X_1, X_2, \ldots, X_p) and the corresponding parameter vector by \beta = (\beta_1, \beta_2, \ldots, \beta_p). The observed values of T and X are denoted by t = (t_1, \ldots, t_n) and x = (x_1, x_2, \ldots, x_p), respectively. The reliability (survivor) function conditional on the covariate vector X is defined as S(t; x) = P(T > t \mid X = x).
In the proportional hazards model, proposed by Cox (1972), the combined effect of the X variables is to scale the hazard function up or down (Islam and Shiha 2018). The hazard function satisfies

h(t; x) = h_0(t)\, g(x),   (7.14)

where h_0(t) is the baseline hazard function. For covariate values x_1 and x_2, the hazard ratio at failure time t is

\frac{h(t; x_1)}{h(t; x_2)} = \frac{h_0(t) g(x_1)}{h_0(t) g(x_2)} = \frac{g(x_1)}{g(x_2)},   (7.15)

which is independent of time t and depends only on the values of the covariates. This is the well-known proportionality assumption of a proportional hazards model.
Cox (1972) proposed the proportional hazards model of the following form:

h(t; x) = h_0(t)\, e^{x\beta}.   (7.16)

In other words, the hazard function h(t; x) depends on two components: (i) a baseline hazard function of time, h_0(t), independent of covariates, and (ii) a function of the covariates, e^{x\beta}. The hazard function is more sensitive to any change during a small period of time, which makes it suitable for defining the underlying regression model instead of the survivor or reliability function and the probability density function. However, the hazard function can be shown to have a relationship with both the probability density and survivor functions (see Chap. 2). We know that

h(t; x) = \frac{f(t; x)}{S(t; x)}

and

H(t; x) = \int_{0}^{t} h(\tau; x) \, d\tau.

Let S_0(t) = e^{-\int_0^t h_0(\tau)\, d\tau}; then the survivor function is

S(t; x) = [S_0(t)]^{e^{x\beta}}.
The partial likelihood, based on the r distinct ordered failure times t_{(1)} < \cdots < t_{(r)} and the corresponding risk sets R(t_{(i)}),1 is

L(\beta) = \prod_{i=1}^{r} \frac{e^{x_{(i)} \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}}.   (7.20)
1 Here R(t (i) ) denotes the risk set, not the reliability function. The reliability function in this section
is denoted by S(t (i) ).
The unspecified baseline hazard, h_0(t), cancels out of both the numerator and the denominator.
In the presence of ties, the partial likelihood is

L(\beta) = \prod_{i=1}^{r} \frac{e^{s_i \beta}}{\left[ \sum_{l \in R(t_{(i)})} e^{x_l \beta} \right]^{d_i}},   (7.21)

where d_i is the number of ties at time t_{(i)} and s_i = \sum_{l \in D_i} x_l is the sum of the covariate vectors over all the failures at time t_{(i)}. This partial likelihood represents the contribution to the likelihood well in the case of a small number of ties. The log partial likelihood function is

\ln L(\beta) = \sum_{i=1}^{r} \sum_{j=1}^{p} x_{(i)j} \beta_j - \sum_{i=1}^{r} \ln \left[ \sum_{l \in R(t_{(i)})} e^{x_l \beta} \right].   (7.22)
The estimating equations are obtained by taking the first derivative of the log partial likelihood function and equating it to 0, similar to the log likelihood for a parametric form:

U_j(\beta) = \frac{\partial \ln L(\beta)}{\partial \beta_j} = \sum_{i=1}^{r} x_{(i)j} - \sum_{i=1}^{r} \frac{\sum_{l \in R(t_{(i)})} x_{lj} e^{x_l \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}} = 0.   (7.23)

The elements of the observed information matrix are

I_{jk}(\beta) = -\frac{\partial^2 \ln L(\beta)}{\partial \beta_j \partial \beta_k} = \sum_{i=1}^{r} \frac{\sum_{l \in R(t_{(i)})} x_{lj} x_{lk} e^{x_l \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}} - \sum_{i=1}^{r} \left[ \frac{\sum_{l \in R(t_{(i)})} x_{lj} e^{x_l \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}} \right] \left[ \frac{\sum_{l \in R(t_{(i)})} x_{lk} e^{x_l \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}} \right].   (7.24)
The variances and covariances of the estimators of β can be found approximately from the inverse of the information matrix:

Var(\hat{\beta}) \approx I^{-1}(\hat{\beta}).   (7.25)
The estimates are obtained by solving the estimating equations for β1 , . . . , β p using
an iterative method such as Newton–Raphson method of iteration.
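In R, the partial likelihood (7.20) is maximized by coxph in the survival package. The sketch below is illustrative only; the data frame dat and the variables time, status, x1, and x2 are assumed names.

```r
library(survival)

# Cox proportional hazards fit (dat, time, status, x1, x2 are hypothetical names)
fit <- coxph(Surv(time, status) ~ x1 + x2, data = dat)
summary(fit)    # Wald and likelihood ratio tests, hazard ratios exp(beta)

# Breslow-type estimate of the baseline cumulative hazard, cf. Eq. (7.26)
H0 <- basehaz(fit, centered = FALSE)

# Checking the proportionality assumption via scaled Schoenfeld residuals
cox.zph(fit)
```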
Sometimes we need an estimate of the survivor function, which requires estimation of the baseline hazard function. Breslow (1972a, b) proposed a method of estimating the survivor function using the estimates of the parameters of a proportional hazards model, as shown below:

\hat{H}_0(t) = \sum_{t_j \leq t} \frac{d_j}{\sum_{l \in R(t_j)} e^{x_l \hat{\beta}}},   (7.26)

where d_j is the number of events that occurred at time t_j, and the baseline survivor function is estimated by

\hat{S}_0(t) = e^{-\hat{H}_0(t)}.   (7.27)

Tests

Two tests are generally performed to test the null hypothesis H_0: \beta = \beta_0.
(i) Wald test
If we assume asymptotically normal estimators, then

\chi^2_W = (\hat{\beta} - \beta_0)' I(\hat{\beta}) (\hat{\beta} - \beta_0)   (7.28)

is asymptotically chi-square with p degrees of freedom. For a single coefficient, the hypothesis H_0: \beta_j = 0 can be tested using

W = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)},   (7.30)

which is asymptotically standard normal; similarly, for a parameter \gamma_j of an extended model, W = \hat{\gamma}_j / se(\hat{\gamma}_j) can be used.

(ii) Likelihood ratio test
The likelihood ratio statistic is

-2 \ln \Lambda = -2 (\ln L_0 - \ln L_1),

where L_0 is the likelihood under the null hypothesis and L_1 is the likelihood under the extended proportional hazards model including the parameters \gamma_1, \ldots, \gamma_p. The likelihood ratio test statistic is asymptotically \chi^2_p.
If there is a violation of the proportionality assumption due to a variable, say X_p, then a stratified proportional hazards model can be used, stratifying on the predictor X_p and keeping the (p − 1) remaining variables, which do not violate the assumption, in the model. The stratified partial likelihood for stratum s is

L_s(\beta) = \prod_{i=1}^{k_s} \frac{e^{x_{(i)} \beta}}{\sum_{l \in R_s(t_{(i)})} e^{x_l \beta}},   (7.31)

where k_s is the number of failures in stratum s, and the overall partial likelihood is

L(\beta) = \prod_{s=1}^{S} L_s(\beta).
The estimates are obtained by taking first derivatives with respect to parameters
of the model and equating to zero.
Prentice et al. (1978) and Farewell (1979) extended the proportional hazards model to competing causes. The cause-specific hazard function is defined by

h_s(t; x) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t, \text{cause } s \mid T \geq t; x)}{\Delta t}, \quad s = 1, 2, \ldots, S.

The function h_s(t; x) gives the instantaneous failure rate from cause s at time t, given the vector of explanatory variables X, in the presence of the other failure types. Assuming distinct failure types, the overall hazard function can be expressed in terms of the cause-specific hazard functions (Prentice et al. 1978) as

h(t; x) = \sum_{s=1}^{S} h_s(t; x),

and the probability density function for time to failure and cause of failure s is f_s(t; x) = h_s(t; x) \exp\left[ -\int_0^t h(u; x)\, du \right]. The proportional hazards model for cause s is

h_s(t; x) = h_{0s}(t)\, e^{x \beta_s},

or, with time-dependent covariates,

h_s(t; x) = h_{0s}(t)\, e^{x(t) \beta_s},
where \beta_s = (\beta_{s1}, \ldots, \beta_{sp}) is the vector of regression coefficients corresponding to the observed values of the covariate vector x_s for failures of type s (s = 1, 2, …, S). Let the ordered failure times for failures of type s (s = 1, …, S) be t_{(s1)} < \cdots < t_{(s k_s)}; then the partial likelihood is

L = \prod_{s=1}^{S} \prod_{i=1}^{k_s} \frac{e^{x_{si} \beta_s}}{\sum_{l \in R(t_{(si)})} e^{x_l \beta_s}},   (7.32)

where k_s is the total number of failures due to cause s and R(t_{(si)}) is the risk set for a failure due to cause s at time t_{(si)}.
Example 7.1 This example illustrates the ideas of construction of partial likelihood
and estimation of the parameter with hypothetical data. Let a group of five patients
with chronic kidney disease was observed for 5 years from their ages 60 years. The
hypothetical data given in Table 7.1 show the age, gender, and status of the patients.
Assume the hazard function, h(t; x) = h 0 (t)exβ , given in (7.16), for the data
where t denotes age, and x = {0 for males and 1 for females}. First, we derive the
partial likelihood for these observations and then find the MLE of the parameter β
based on the above model.
Since there are three deaths, the partial likelihood will be the product of three terms, one in respect of each age at which a death occurs. For the first death, the contribution to the partial likelihood is

L_1 = \frac{h_1(61 \mid z = 0)}{h_1(61 \mid z = 0) + h_2(61 \mid z = 1) + \cdots + h_5(61 \mid z = 0)}.
This gives the ratio of the hazard for the patient who dies at the youngest age to the total hazard for the patients still alive at that age. L_1 is equivalent to

L_1 = \frac{h_0(61)}{h_0(61) + h_0(61)e^{\beta} + h_0(61)e^{\beta} + h_0(61) + h_0(61)} = \frac{1}{3 + 2e^{\beta}}.
Similarly, for the second death, the contribution to the partial likelihood is

L_2 = \frac{h_3(63 \mid z = 1)}{h_3(63 \mid z = 1) + h_4(63 \mid z = 0) + h_5(63 \mid z = 0)} = \frac{e^{\beta}}{2 + e^{\beta}}.

Finally, for the third death, the contribution to the partial likelihood is

L_3 = \frac{h_4(64 \mid z = 0)}{h_4(64 \mid z = 0) + h_5(64 \mid z = 0)} = \frac{1}{2}.
The partial likelihood is therefore

L = L_1 \times L_2 \times L_3 = \frac{1}{3 + 2e^{\beta}} \times \frac{e^{\beta}}{2 + e^{\beta}} \times \frac{1}{2} = C \frac{e^{\beta}}{(2e^{\beta} + 3)(e^{\beta} + 2)},

where C is a constant. The log likelihood is

\ln L \propto \beta - \ln(2e^{\beta} + 3) - \ln(e^{\beta} + 2).

Differentiating with respect to β and equating to zero,

1 - \frac{2e^{\beta}}{2e^{\beta} + 3} - \frac{e^{\beta}}{e^{\beta} + 2} = 0

or, \frac{(2e^{\beta} + 3)(e^{\beta} + 2) - 2e^{\beta}(e^{\beta} + 2) - e^{\beta}(2e^{\beta} + 3)}{(2e^{\beta} + 3)(e^{\beta} + 2)} = 0

or, 2e^{2\beta} + 4e^{\beta} + 3e^{\beta} + 6 - 2e^{2\beta} - 4e^{\beta} - 2e^{2\beta} - 3e^{\beta} = 0

or, 6 - 2e^{2\beta} = 0,

which gives e^{2\beta} = 3, so that \hat{\beta} = \frac{1}{2}\ln 3 = 0.5493, e^{\hat{\beta}} = 1.7321, and e^{-\hat{\beta}} = 0.5774.
That is, these hypothetical data indicate that the hazard for a male patient is 42.26% lower than that of a female patient.
In the proportional hazards model, the baseline hazard function h_0(t) is left unspecified and is treated as a nuisance parameter. This restricts the use of the proportional hazards model for prediction purposes. Another limitation of the proportional hazards model is the need to satisfy the proportionality assumption, which is violated very often in reality.
An alternative to the proportional hazards model is the accelerated failure time (AFT)
model. In an accelerated failure time model, we consider the role of a covariate is
to accelerate or decelerate the lifetime by a constant in terms of hazard, probability
density, or survivor functions. This makes an accelerated failure time model more
attractive for direct interpretation of results meaningfully.
We know that lifetime is non-negative; hence, the linear relationship between log
lifetime and covariates can be written as
ln T = xβ + ε (7.33)
so that

T = e^{x\beta + \varepsilon} = T_0\, e^{x\beta},   (7.34)

where T_0 = e^{\varepsilon}. For a single binary covariate x (groups 0 and 1) with \gamma = e^{\beta}, the lifetime in group 1 can be written as

T_1 = T_0\, e^{\beta} = T_0 \gamma,

or, alternatively,

T_0 = T_1\, e^{-\beta} = T_1/\gamma.
A comparison can be made between the survivor functions for groups 0 and 1. For group 1, we can show that

S_1(t) = P(T_1 > t) = P(T_0 > t/\gamma) = S_0(t/\gamma).

Similarly, the density and hazard functions satisfy

f_1(t) = \frac{1}{\gamma} f_0(t/\gamma) \quad \text{and} \quad h_1(t) = \frac{f_1(t)}{S_1(t)} = \frac{\frac{1}{\gamma} f_0(t/\gamma)}{S_0(t/\gamma)} = \frac{1}{\gamma} h_0(t/\gamma).   (7.37)
Equivalently,

\gamma h_1(t) = h_0(t/\gamma),

implying that the hazard at time t for group 1 is 1/γ times the hazard at time t/γ for group 0. In the case of γ = 1, both remain the same. If γ = 3, then the risk at time t for group 1 will be one-third of the risk of group 0 items/subjects at one-third of the time. Another way to interpret this is that the risk of group 0 at one-third of the time is equivalent to three times the risk of the group 1 items/individuals at time t. This is clear from the relationship between group 1 and group 0, T_1 = T_0 e^{\beta}, which implies that the survival time increases in group 1 compared to group 0 if β > 0, so that a unit increase in the covariate results in an acceleration (stretching) of the survival or failure time.
To generalize the accelerated failure time model based on the background provided above, let us write

S(t; x) = S_0(t\, g(x)),

where 1/\gamma = g(x) = e^{x\beta}. Similarly, the probability density and hazard functions are f(t; x) = g(x) f_0(t\, g(x)) and h(t; x) = g(x) h_0(t\, g(x)), respectively. Using g(x) = e^{x\beta}, the survivor, probability density, and hazard functions for the accelerated failure time model are shown below:

S(t; x) = S_0(t\, g(x)) = S_0(t\, e^{x\beta}),   (7.38)

f(t; x) = e^{x\beta} f_0(t\, e^{x\beta}),   (7.39)

h(t; x) = e^{x\beta} h_0(t\, e^{x\beta}).   (7.40)
For the estimation and test for the accelerated failure time models, several infer-
ence procedures have been proposed. Both rank-based and least-squares-method-
based techniques have been suggested, but still the estimation procedure remains
difficult. In this section, the method proposed by Buckley and James (1979) is intro-
duced. The Buckley–James method is a usual least squares approach for censored
data. Stare et al. (2000) observed that the Buckley–James method provides consistent
130 7 Regression Models
estimators under usual regularity conditions and appears to be superior to other least
squares approaches for censored data.
Let us consider a model for the lifetime T (or some monotonic transformation of it) as follows:

T_i^{*} = \ln T_i = x_i \beta + \varepsilon_i, \quad i = 1, \ldots, n,   (7.41)

where the \varepsilon_i are iid with E(\varepsilon_i) = 0 and Var(\varepsilon_i) = \sigma^2. Here the distribution of \varepsilon_i is unspecified, with distribution function F. It is also assumed that ε and x are independent. We observe Y_i = \min(T_i^{*}, C_i^{*}), where C_i^{*} is the log censoring time, and \delta_i = I(T_i \leq C_i) is the censoring indicator.
According to Buckley and James,

Y_i^{*} = Y_i \delta_i + E(T_i^{*} \mid T_i^{*} > Y_i)(1 - \delta_i).

It can be shown that E(Y_i^{*}) = E(T_i^{*}). Then

E(T_i^{*} \mid T_i^{*} > Y_i) = x_i \beta + E(\varepsilon_i \mid \varepsilon_i > Y_i - x_i \beta),

where

E(\varepsilon_i \mid \varepsilon_i > Y_i - x_i \beta) = \int_{Y_i - x_i \beta}^{\infty} \varepsilon \, \frac{dF}{1 - F(Y_i - x_i \beta)}.

We can obtain the distribution function approximately from the survivor function using the product-limit method. Hence

Y_i^{*} = Y_i \delta_i + \left[ x_i \beta + \frac{\sum_{\varepsilon_j > \varepsilon_i} w_j \varepsilon_j}{1 - F(\varepsilon_i)} \right] (1 - \delta_i),   (7.42)

where the w_j denote the probability masses (jumps) of the product-limit estimate of F based on the residuals.
Let us denote by b the vector of initial values for the vector of parameters β; then the estimating equations are

U(\beta, b) = \sum_{i=1}^{n} (x_i - \bar{x}) \left[ \hat{Y}_i^{*}(b) - \bar{Y}^{*}(b) - (x_i - \bar{x}) \beta \right] = 0,   (7.44)

where \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i and \bar{Y}^{*}(b) = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i^{*}(b).
The iterative procedure provides a consistent and asymptotically normal estimator of β, and confidence intervals can be obtained by using the Wald method.
7.5 Parametric Regression Models

In both survival and reliability analyses, we need to fit regression models to identify factors associated with lifetimes. Due to the nature of the data, it is not a practical option to use a linear regression model, and we need alternative regression models. The logistic regression models are used for nominal or ordinal binary/polytomous outcomes. The proportional hazards and accelerated failure time models are semiparametric models, because the baseline hazard or survivor functions are not specified. However, the accelerated failure time models can be parametric as well if the underlying probability distributions are specified. The accelerated failure time model is \ln T = x\beta + \varepsilon, where ε is not specified in Sect. 7.4. A fully specified accelerated failure time model requires specification of ε, and this becomes a parametric regression model. In engineering, the analysis of the reliability of components may require parametric regression models.
The exponential distribution has probability density function f(t) = \lambda e^{-\lambda t}, t \geq 0, with the survivor, hazard, and cumulative hazard functions S(t) = e^{-\lambda t}, h(t) = \lambda, and H(t) = \lambda t. The expected value of the failure time under the exponential distribution is E(T) = \mu = 1/\lambda.
For the exponential regression model, let

\ln T = x\beta + \varepsilon,   (7.46)

so that T = e^{x\beta + \varepsilon} and \varepsilon = \ln T - x\beta. The link between the mean and the covariates is

g(\mu) = \ln \mu = x\beta,

or, equivalently, \lambda = e^{-x\beta}. The survivor, hazard, and cumulative hazard functions then become

S(t) = e^{-e^{-x\beta} t}, \qquad h(t) = \lambda = e^{-x\beta}, \qquad H(t) = \lambda t = e^{-x\beta} t.
Based on the models proposed by Glasser (1967), Cox (1972), Prentice (1973), and Breslow (1974), we can consider a special case in which the hazard function is expressed as h(t) = \lambda = e^{x\beta^{*}}, where \beta^{*} = -\beta. For the likelihood function for partially censored data, we define two variables (T_i, \delta_i), i = 1, \ldots, n, where T_i is the lifetime for the ith item/subject, \delta_i = 1 indicates that the lifetime is uncensored, and \delta_i = 0 indicates that the observed lifetime is censored, that is, observed only partially up to the time of censoring. The likelihood function is

L = \prod_{i=1}^{n} \left[ e^{x_i \beta^{*}} e^{-e^{x_i \beta^{*}} t_i} \right]^{\delta_i} \left[ e^{-e^{x_i \beta^{*}} t_i} \right]^{1-\delta_i},   (7.48)

and the log likelihood is

\ln L = \sum_{i=1}^{n} \left\{ \delta_i \left[ x_i \beta^{*} - e^{x_i \beta^{*}} t_i \right] + (1 - \delta_i) \left[ -e^{x_i \beta^{*}} t_i \right] \right\} = \sum_{i=1}^{n} \left[ \delta_i x_i \beta^{*} - e^{x_i \beta^{*}} t_i \right].   (7.49)
The estimating equations are obtained by differentiating the log likelihood function with respect to the regression parameters:

\frac{\partial \ln L}{\partial \beta_j} = \sum_{i=1}^{n} \delta_i x_{ij} - \sum_{i=1}^{n} x_{ij} e^{x_i \beta^{*}} t_i = 0, \quad j = 0, 1, \ldots, p.   (7.50)

The second derivatives are

\frac{\partial^2 \ln L}{\partial \beta_j \partial \beta_{j'}} = -\sum_{i=1}^{n} x_{ij} x_{ij'} e^{x_i \beta^{*}} t_i, \quad j, j' = 0, 1, \ldots, p.   (7.51)
The likelihood ratio test of the null hypothesis H_0: \beta_1 = \cdots = \beta_p = 0 uses

\Lambda = -2 [\ln L_0 - \ln L_1] \sim \chi^2_p,   (7.52)

where L_0 and L_1 are the likelihoods under the null and the full model, respectively.
For the Weibull regression model, the two-parameter Weibull probability density function is f(t \mid \alpha, \lambda) = \alpha \lambda (\lambda t)^{\alpha - 1} e^{-(\lambda t)^{\alpha}}, t \geq 0, where α is the shape parameter and λ is the scale parameter of the distribution. The hazard and survivor functions are h(t) = \alpha \lambda (\lambda t)^{\alpha - 1} and S(t) = e^{-(\lambda t)^{\alpha}}. Let \ln T = x\beta + \sigma\varepsilon, where σε is distributed as extreme value with scale parameter σ. Now let \lambda = e^{-x\beta}; replacing λ with e^{-x\beta}, we obtain

f(t \mid \alpha, \lambda, x) = \alpha e^{-x\beta} \left( e^{-x\beta} t \right)^{\alpha - 1} e^{-\left( e^{-x\beta} t \right)^{\alpha}}, \quad t \geq 0,

h(t; x) = \alpha e^{-x\beta} \left( e^{-x\beta} t \right)^{\alpha - 1},

S(t; x) = e^{-\left( e^{-x\beta} t \right)^{\alpha}}.   (7.54)
The estimating equation for \beta_j, obtained by differentiating the log likelihood for censored data, is

\frac{\partial \ln L}{\partial \beta_j} = -\alpha \sum_{i=1}^{n} x_{ij} \left[ \delta_i - \left( e^{-x_i \beta} t_i \right)^{\alpha} \right] = 0.   (7.58)
Working with the log lifetimes y_i = \ln t_i, the log likelihood can be written as

\ln L = -r \ln \sigma + \sum_{i=1}^{n} \left\{ \delta_i \left[ \frac{y_i - x_i \beta}{\sigma} - e^{(y_i - x_i \beta)/\sigma} \right] - (1 - \delta_i) e^{(y_i - x_i \beta)/\sigma} \right\} = -r \ln \sigma + \sum_{i=1}^{n} \left[ \delta_i \frac{y_i - x_i \beta}{\sigma} - e^{(y_i - x_i \beta)/\sigma} \right],   (7.60)

where r = \sum_{i=1}^{n} \delta_i.
The estimating equations for β and σ are

\frac{\partial \ln L}{\partial \beta_j} = \sum_{i=1}^{n} \left( \frac{-x_{ij}}{\sigma} \right) \left[ \delta_i - e^{(y_i - x_i \beta)/\sigma} \right] = 0, \quad j = 1, 2, \ldots, p,

\frac{\partial \ln L}{\partial \sigma} = -\frac{r}{\sigma} - \sum_{i=1}^{n} \frac{y_i - x_i \beta}{\sigma^2} \left[ \delta_i - e^{(y_i - x_i \beta)/\sigma} \right] = 0.
The elements of the observed information matrix are

\frac{\partial^2 \ln L}{\partial \beta_j \partial \beta_k} = -\frac{1}{\sigma^2} \sum_{i=1}^{n} x_{ij} x_{ik} e^{(y_i - x_i \beta)/\sigma} = -I(\hat{\beta}_j, \hat{\beta}_k), \quad j, k = 0, 1, \ldots, p,

\frac{\partial^2 \ln L}{\partial \beta_j \partial \sigma} = -\frac{1}{\sigma^2} \sum_{i=1}^{n} x_{ij} \frac{y_i - x_i \beta}{\sigma} e^{(y_i - x_i \beta)/\sigma} = -I(\hat{\beta}_j, \hat{\sigma}), \quad j = 0, 1, \ldots, p,

\frac{\partial^2 \ln L}{\partial \sigma \partial \sigma} = -\frac{r}{\sigma^2} - \sum_{i=1}^{n} \frac{1}{\sigma^2} \left( \frac{y_i - x_i \beta}{\sigma} \right)^2 e^{(y_i - x_i \beta)/\sigma} = -I(\hat{\sigma}).
For testing the null hypothesis H_0: \beta_j = 0, we can use the Wald test

W = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)},

which is asymptotically standard normal.
For the lognormal regression model, the probability density function of the lifetime T is

f(t) = \frac{1}{t\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(\ln t - \mu)^2}, \quad t > 0,   (7.62)

where the location parameter depends on the covariates through \mu_i = x_i \beta.
The likelihood equation for σ, using the same approach as in Sect. 6.4, is

\frac{\partial \ln L}{\partial \sigma} = -\frac{r}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{r} (\ln t_i - \mu_i)^2 + \frac{1}{\sigma} \sum_{i=r+1}^{n} \left( \frac{\ln t_i - \mu_i}{\sigma} \right) f\!\left( \frac{\ln t_i - \mu_i}{\sigma} \right) \Big/ S\!\left( \frac{\ln t_i - \mu_i}{\sigma} \right) = 0.   (7.66)
We can use the likelihood ratio test for testing the null hypothesis H_0: \beta^{**} = 0, where \beta^{**} = (\beta_1, \ldots, \beta_p), as follows:

\Lambda = -2 (\ln L_0 - \ln L_1),

which is asymptotically \chi^2_p.
7.5.4 Example
Example 7.2 This example is reproduced, with permission, from Blischke et al. (2011). Table 7.2 shows a part of the warranty claims data for an automobile component (20 observations out of 498).2 The data set includes age (in days), mileage (in kilometers), failure mode, region, type of automobile that used the unit, and other factors. Failure modes, the type of automobile that used the component, and the zone/region of use are shown in codes. We analyze the failure data (498 claims) using the parametric regression model.
parametric regression model.
The aim of the analysis is to investigate how the usage-based lifetime (used km) of
the component differs with respect to age (x 1 ) and other three categorical covariates:
region [(x 2 : Region1 (R1), Region2 (R2), Region3 (R3), Region4 (R4)], type of
automobiles that use the component [x 3 : Auto1 (A1), Auto2 (A2)], and failure modes
[x 4 : Mode1 (M1), Mode2 (M2), Mode3 (M3)]. The number of observed claims or
failures in R1, R2, R3, R4, A1, A2, M1, M2, and M3 are, respectively, 29, 105, 172,
192; 143, 355; 364, 106, and 28.
Without loss of generality, {R1, A1, M1} is taken as the reference or baseline
level, the level against which other levels (all possible combinations of the values
of three covariates) are compared. The covariate vector x = (1, x1 , x2 , x3 , x4 )
can then be rewritten as x = (1, x D , x R2 , x R3 , x R4 , x A2 , x M2 , x M3 ) under
the assumption that, except for x D , the other six dichotomous covariates take on
the values 1 or 0 to indicate the presence or absence of a characteristic. A Weibull
regression model f (y|x, β, σ ) is assumed for mileage Y, with scale parameter σ
and location parameter dependent on covariates x, namely
μ(x) = xβ = (β0 + x D β D + x R2 β R2 + x R3 β R3 + x R4 β R4
+x A2 β A2 + x M2 β M2 + x M3 β M3 ).
Table 7.3 summarizes the numerical results for the Weibull regression model obtained by using Minitab.3 In this table, the very small p-values for all of the regression coefficients except βM2 (p = 0.867) provide strong evidence of the dependence of average lifetime on those covariates.
The log likelihood of the final model is −5479.3, while the log likelihood of the
null model (with intercept only) is −5628.4. The likelihood ratio chi-square statistic
is −2[−5628.4 − (−5479.3)] = 298.2 with 7 degrees of freedom, and the associated
p-value is 0. Thus, we reject the null hypothesis that all regression parameters are
zero.
Comment: A set of models (smallest extreme value, exponential, Weibull, normal,
lognormal, logistic, and log logistic) were fitted to the data. It was found, based on
the Akaike Information Criterion (AIC) values and the plots of residuals, that the
Weibull is the best model for the data among these alternatives (Blischke et al. 2011).
The estimates of Table 7.3 can be used to estimate and compare other reliability-related quantities (e.g., B10 life, MTTF) at specified levels of the covariates.
2 The information regarding the names of the component and manufacturing company are not dis-
closed to protect the proprietary nature of the information.
3 This may also be done with S-plus and R-language.
Table 7.3 Estimates of parameters β and γ for the Weibull regression model for usage
Parameters ML estimates Standard error Z p 95% Normal CI
Lower Upper
β0 8.9713 0.1516 59.18 0.000 8.6741 9.2684
βD 0.0052 0.0003 16.03 0.000 0.0045 0.0058
β R2 0.3860 0.1432 2.70 0.007 0.1055 0.6666
β R3 0.5678 0.1377 4.12 0.000 0.2980 0.8376
β R4 0.5027 0.1319 3.81 0.000 0.2442 0.7613
β A2 −0.1638 0.0690 −2.37 0.018 −0.2991 −0.0286
β M2 0.0127 0.0758 0.17 0.867 −0.1359 0.1614
β M3 0.2593 0.1304 1.99 0.047 0.0037 0.5149
Shape γ = 1/σ 1.5376 0.0495 1.4435 1.6377
For example, when the covariates of age, region, auto type, and failure mode are fixed, respectively, at 365 days, Region1, Auto1, and Mode1, the ML estimate and 95% confidence interval of B10 life are 12,072.7 km and [9104.48, 16,008.7]. These estimates become 23,433.6 km and [16,890.3, 32,511.8] for covariate values of age 365 days, Region3, Auto2, and Mode3. Under the first combination of levels of covariates, the estimates imply that there is 95% confidence that 10% of the units are expected to fail by a usage between 9104 and 16,009 km. Estimates of Bp life for other values of the covariates may be estimated and interpreted similarly.
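As an illustrative sketch (not part of the original analysis), the Bp quantiles quoted above can be reproduced approximately from the reported estimates using the log-location-scale form of the Weibull model, t_p = exp{μ(x) + σ ln[−ln(1 − p)]} with σ = 1/γ. The coefficient values below are taken from Table 7.3; the function names and covariate coding are our own assumptions.

```python
import math

# ML estimates from Table 7.3 (Weibull regression in log-location-scale form)
beta = {"b0": 8.9713, "bD": 0.0052, "bR2": 0.3860, "bR3": 0.5678,
        "bR4": 0.5027, "bA2": -0.1638, "bM2": 0.0127, "bM3": 0.2593}
sigma = 1.0 / 1.5376  # scale sigma = 1 / shape

def location(age_days, region, auto, mode):
    """Location mu(x) = x*beta for given covariate levels (R1, A1, M1 are baseline)."""
    mu = beta["b0"] + beta["bD"] * age_days
    mu += {"R1": 0.0, "R2": beta["bR2"], "R3": beta["bR3"], "R4": beta["bR4"]}[region]
    mu += {"A1": 0.0, "A2": beta["bA2"]}[auto]
    mu += {"M1": 0.0, "M2": beta["bM2"], "M3": beta["bM3"]}[mode]
    return mu

def bp_life(p, age_days, region, auto, mode):
    """p-quantile of usage (km): t_p = exp(mu + sigma * ln(-ln(1 - p)))."""
    mu = location(age_days, region, auto, mode)
    return math.exp(mu + sigma * math.log(-math.log(1.0 - p)))

# B10 at age 365 days, Region1, Auto1, Mode1 (close to the reported 12,072.7 km;
# small differences arise from rounding of the printed coefficients)
print(round(bp_life(0.10, 365, "R1", "A1", "M1")))
print(round(bp_life(0.10, 365, "R3", "A2", "M3")))  # compare with the reported 23,433.6 km
```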
Fig. 7.1 Smallest extreme value probability plots for standardized residuals

The probability plot for standardized residuals (Fig. 7.1) is used to check the assumptions of a Weibull model with assumed parameters for the data. The plotted
points do not fall on the fitted line perfectly, but the fit appears to be adequate, with
the possibility of one or a few outliers. This suggests that the residual plot does not
represent any serious departure from the Weibull distributional assumption in the
model for the observed data (Blischke et al. 2011).
Abstract The concept of generalized linear models has become increasingly useful
in various fields including survival and reliability analyses. This chapter includes the
generalized linear models for various types of outcome data based on the underlying
link functions. The estimation and test procedures for different link functions are
also highlighted.
8.1 Introduction
1 Sectionsof this chapter draw from the co-author’s (M. Ataharul Islam) previous published work,
reused here with permissions (Islam and Chowdhury 2017).
The outline of the chapter is as follows. Section 8.2 presents the exponential fam-
ily and GLM. Section 8.3 explains the expected value and variance for exponential
family of distributions. Section 8.4 discusses the components of a GLM. Section 8.5
deals with estimating equations. Sections 8.6–8.11 discuss the deviances of models, including the exponential, gamma, Bernoulli, Poisson, and Weibull regression models.
f(y; p) = p^{y}(1 - p)^{1-y}, \quad y = 0, 1
= \exp[y \ln p + (1 - y)\ln(1 - p)]
= \exp\left[\frac{y \ln\frac{p}{1-p} - \{-\ln(1 - p)\}}{1}\right]  (8.2)

Here θ = \ln\frac{p}{1-p}, b(θ) = -\ln(1 - p), c(y, φ) = 0, a(φ) = 1.
Example 8.2 Exponential distribution. We can express the exponential form of the
exponential distribution as follows
f(t; λ) = λe^{-λt}, \quad t \ge 0
= \exp[-λt + \ln λ]
= \exp\left[\frac{-λt - (-\ln λ)}{1}\right]  (8.3)
f(y; μ, σ^2) = \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\frac{1}{2\sigma^{2}}(y - \mu)^{2}}, \quad -\infty < y < \infty
= \exp\left[-\frac{1}{2\sigma^{2}}\left(y^{2} - 2\mu y + \mu^{2}\right) - \frac{1}{2}\ln\left(2\pi\sigma^{2}\right)\right]
= \exp\left[\frac{y\mu - \frac{\mu^{2}}{2}}{\sigma^{2}} - \frac{y^{2}}{2\sigma^{2}} - \frac{1}{2}\ln\left(2\pi\sigma^{2}\right)\right]  (8.4)

where θ = μ, b(θ) = μ^2/2, φ = σ^2, c(y, φ) = -\left[\frac{y^{2}}{2\sigma^{2}} + \frac{1}{2}\ln\left(2\pi\sigma^{2}\right)\right].
The lognormal distribution can be shown as
f(t; μ, σ^2) = \frac{1}{t\sqrt{2\pi\sigma^{2}}} e^{-\frac{1}{2\sigma^{2}}(\ln t - \mu)^{2}}, \quad t \ge 0
= \exp\left[-\frac{1}{2\sigma^{2}}\left\{(\ln t)^{2} - 2\mu(\ln t) + \mu^{2}\right\} - \frac{1}{2}\ln\left(2\pi\sigma^{2} t^{2}\right)\right]
= \exp\left[\frac{(\ln t)\mu - \frac{\mu^{2}}{2}}{\sigma^{2}} - \frac{1}{2\sigma^{2}}(\ln t)^{2} - \frac{1}{2}\ln\left(2\pi\sigma^{2} t^{2}\right)\right].  (8.5)
The above formulation shows that a canonical link function does not exist for a
lognormal distribution because the distribution does not belong to the exponential
family. However, as the transformation Y = ln T belongs to the exponential family,
we can use the relationship to obtain estimates for the lognormal distribution based
on the GLM estimates for normal distribution.
The expected value and variance can be shown easily under regularity conditions for the exponential family of distributions. Differentiating f(t, θ) with respect to θ, we obtain

\frac{d f(t, θ)}{dθ} = \frac{1}{a(φ)}\left[t - b'(θ)\right] f(t, θ).  (8.6)

Integrating both sides over t and noting that the density integrates to one gives

E(T) = b'(θ).

Taking the second derivative with respect to the parameter and then integrating over the outcome variable, we obtain

\int \frac{d^{2} f(t, θ)}{dθ^{2}}\, dt = \frac{1}{a(φ)}\left[-b''(θ) + \frac{1}{a(φ)} E\left\{t - b'(θ)\right\}^{2}\right] = 0,

which gives Var(T) = a(φ)b''(θ), where a(φ) is known as the dispersion parameter and b''(θ) is the variance function or the function of the mean.
In a generalized linear model, there are three components: (i) random component, (ii)
systematic component, and (iii) link function. The random component specifies the
underlying distribution of the outcome variable, T ∼ f (t, θ, φ), and the systematic
component describes the linear function of selected covariates in the model, η = xβ.
The link function is the link between the random and the systematic component,
θ = g(μ) = η = xβ, where μ = E(T|x). Here the link function g(μ) provides the link between the random variable T and the systematic component η = xβ, such that E(T) = μ = g^{-1}(xβ). This implies that the expected value can also be expressed as a function of the regression parameters, μ(β). The link function is θ = g(μ(β)) = β_0 + β_1 x_1 + ··· + β_p x_p.
For binary outcome data, let the random variable, Y ∼ Bernoulli( p), which can
be shown as
f(y, θ) = p^{y}(1 - p)^{1-y}, \quad y = 0, 1,
θ = g[μ(β)] = η = Xβ.

Let us denote μ(β) = μ for brevity; then the logit link function is

g(μ) = \ln\frac{μ}{1 - μ} = Xβ,

which gives

μ = \frac{e^{Xβ}}{1 + e^{Xβ}}.
For an exponential outcome with the canonical link, θ = Xβ and

g(μ) = -\frac{1}{μ} = Xβ,

so that

μ = -\frac{1}{Xβ}.
In many instances, the negative reciprocal link function may fail to provide results
for ensuring non-negative values of the mean, and there may also be a problem with
convergence in the process of estimating parameters; alternatively, we can use the
following link function
θ = \ln λ = g(μ) = \ln\frac{1}{μ} = Xβ  (8.11)

and the model becomes μ = e^{-Xβ}. This relationship implies λ = e^{Xβ}, which is commonly used in many parametric and semiparametric regression models in reliability and survival analyses.
l(θ, φ, t) = \sum_{i=1}^{n} l(θ_i, φ, t_i) = \sum_{i=1}^{n}\left[\{t_iθ_i - b(θ_i)\}/a(φ) + c(t_i, φ)\right].  (8.13)
As it is shown that θ_i = η_i in the case of the canonical link, the chain rule reduces to

\frac{\partial l}{\partial β_j} = \sum_{i=1}^{n}\frac{\partial l_i}{\partial θ_i}\cdot\frac{\partial θ_i}{\partial β_j}, \quad j = 1, \ldots, p,

where

\frac{\partial l_i}{\partial θ_i} = \frac{t_i - b'(θ_i)}{a(φ)} = \frac{t_i - μ_i}{a(φ)},

and

θ_i = \sum_{j=1}^{p} X_{ij}β_j, \qquad \frac{\partial η_i}{\partial β_j} = X_{ij}.

Hence

\frac{\partial l}{\partial β_j} = \frac{1}{a(φ)}\sum_{i=1}^{n}(t_i - μ_i)X_{ij}, \quad j = 1, \ldots, p.  (8.15)

Setting these derivatives equal to zero gives

\frac{1}{a(φ)}\sum_{i=1}^{n}(t_i - μ_i)X_{ij} = 0,

and, since a(φ) does not depend on β, the estimating equations reduce to

\sum_{i=1}^{n}(t_i - μ_i)X_{ij} = 0.  (8.16)
It may be noted here that μ_i = μ_i(β) and, in the case of the canonical link, the relationship between the linear function and the canonical link function is θ_i = g[μ_i(β)]. Some examples are shown below:

(i) Identity link: θ_i = μ_i(β); hence, μ_i(β) = X_iβ. The estimating equations are:

\sum_{i=1}^{n}\left(t_i - X_iβ\right)X_{ij} = 0, \quad j = 1, \ldots, p.

(ii) Log link: θ_i = \ln μ_i(β); hence, μ_i(β) = e^{X_iβ}. The estimating equations are:

\sum_{i=1}^{n}\left(t_i - e^{X_iβ}\right)X_{ij} = 0, \quad j = 1, \ldots, p.

(iii) Logit link: θ_i = \ln\frac{μ_i(β)}{1 - μ_i(β)}; hence, μ_i(β) = \frac{e^{X_iβ}}{1 + e^{X_iβ}}. The estimating equations are:

\sum_{i=1}^{n}\left(t_i - \frac{e^{X_iβ}}{1 + e^{X_iβ}}\right)X_{ij} = 0, \quad j = 1, \ldots, p.
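As a hedged illustration of how these estimating equations can be solved in practice, the sketch below applies Newton–Raphson (equivalently, Fisher scoring) to the logit-link case (iii); the data, seed, and function names are hypothetical and not taken from the text.

```python
import numpy as np

def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Solve the logit-link estimating equations sum_i (y_i - mu_i) x_ij = 0
    by Newton-Raphson, where mu_i = exp(x_i b) / (1 + exp(x_i b))."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))        # mean function for the logit link
        score = X.T @ (y - mu)                  # estimating equations (8.16)
        W = mu * (1.0 - mu)                     # variance function b''(theta)
        info = X.T @ (X * W[:, None])           # observed information matrix
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# small artificial example: intercept plus one covariate
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))
y = rng.binomial(1, p)
print(fit_logit(X, y))   # estimates should be near (-0.5, 1.2)
```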
8.6 Deviance
Deviance is introduced with GLM to measure the goodness of fit for a model that
links the random component and systematic component through a link function. The
random component provides the probability distribution of the outcome variable,
and from its exponential form, we obtain the natural parameter that is used as a link
function as shown below:
For the Bernoulli distribution, for example, θ = \ln\frac{μ}{1-μ} and b(θ) = \ln(1 + e^{θ}), so that

E(Y) = b'(θ) = \frac{e^{θ}}{1 + e^{θ}} = μ

and

Var(Y) = a(φ)b''(θ) = \frac{e^{θ}}{1 + e^{θ}}\cdot\frac{1}{1 + e^{θ}} = μ(1 - μ).
The systematic component is η = Xβ, and the canonical link function can be rewritten as

θ = g(μ) = \ln\frac{μ}{1 - μ} = Xβ, \qquad μ = \frac{e^{Xβ}}{1 + e^{Xβ}}, \qquad b(θ) = \ln\left(1 + e^{Xβ}\right).
The likelihood and log likelihood functions are

L(θ, φ, t) = \prod_{i=1}^{n} e^{\{t_iθ_i - b(θ_i)\}/a(φ) + c(t_i, φ)},

l(θ, φ, t) = \ln L(θ, φ, t) = \sum_{i=1}^{n}\left[\{t_iθ_i - b(θ_i)\}/a(φ) + c(t_i, φ)\right],

and, in terms of the means,

l(μ, φ, t) = \ln L(μ, φ, t) = \sum_{i=1}^{n}\left[\{t_iθ_i(μ) - b(θ_i(μ))\}/a(φ) + c(t_i, φ)\right]  (8.19)
where θ = g(μ) = Xβ and hence b(θ ) is a function of Xβ. In this likelihood function,
we consider a model with (p + 1) parameters. Hence, the likelihood estimation
procedure involves (p + 1) parameters for estimating the expected value E(Ti ) =
μi . As n expected values are estimated using only a small number of parameters
compared to the sample size, the estimated means may deviate from the true values
and one of the ways to have an idea about such deviation is to compare with the
likelihood based on a saturated model. The saturated model for the observed sample
data is to replace the mean by its observed value; in other words, E(Ti ) is replaced
by Ti. This saturated model can be referred to as the full model. For the full model, the
canonical parameter can be defined as θ = g(t). The log likelihood function for the
saturated model is
l(t, φ, t) = \ln L(t, φ, t) = \sum_{i=1}^{n}\left[\{t_iθ_i(t) - b(θ_i(t))\}/a(φ) + c(t_i, φ)\right].  (8.20)

The deviance is then defined as D = 2[l(t, φ, t) − l(\hat{μ}, φ, t)], twice the difference between the saturated and fitted log likelihoods.
A small value of deviance may indicate good fit, but a large value may reflect
poor fit of the model to the data.
8.7 Exponential Regression Model

The exponential density can be written in exponential family form as

f(t; λ) = λe^{-λt} = e^{-λt + \ln λ}.  (8.23)

The log likelihood function is

\sum_{i=1}^{n} l(t_i; λ) = -\sum_{i=1}^{n} λ_i t_i + \sum_{i=1}^{n}\ln λ_i.
The deviance is

D = 2\sum_{i=1}^{n}\left[\ln\frac{\hat{μ}_i}{t_i} + \frac{t_i - \hat{μ}_i}{\hat{μ}_i}\right],

where \hat{μ}_i = 1/\hat{λ}_i denotes the estimated mean.

8.8 Gamma Regression Model
The two-parameter gamma probability density function for failure time data is

f(t|λ, γ) = \frac{1}{Γ(γ)} λ^{γ} t^{γ-1} e^{-λt}, \quad t \ge 0,

where λ > 0, γ > 0, and Γ(γ) = \int_0^{\infty} t^{γ-1} e^{-t}\, dt.
The exponential form is

f(t|λ, γ) = \exp\left[\frac{t(-λ/γ) - (-\ln λ)}{1/γ} + (γ - 1)\ln t - \ln Γ(γ)\right],

where

θ = -\frac{λ}{γ}, \quad b(θ) = -\ln λ = -\ln(-γθ), \quad a(φ) = 1/γ, \quad c(t, φ) = (γ - 1)\ln t - \ln Γ(γ).

The expected value of T is E(T) = μ = b'(θ) = -1/θ = γ/λ, the variance function is V(μ) = b''(θ) = 1/θ^{2} = γ^{2}/λ^{2} = μ^{2}, and the variance is V(T) = a(φ)b''(θ) = \frac{1}{γ}μ^{2} = \frac{γ}{λ^{2}}.
The log likelihood function is

l(λ, γ, t) = \ln L(λ, γ, t) = \sum_{i=1}^{n}\left[\frac{(-λ_i/γ)t_i + \ln λ_i}{1/γ} + (γ - 1)\ln t_i - \ln Γ(γ)\right]
= \sum_{i=1}^{n}\left[-λ_i t_i + γ\ln λ_i + (γ - 1)\ln t_i - \ln Γ(γ)\right].

In terms of the mean μ_i = γ/λ_i, this becomes

l(μ, γ, t) = \ln L(μ, γ, t) = \sum_{i=1}^{n}\left[-\frac{γ}{μ_i} t_i - γ\ln μ_i + γ\ln γ + (γ - 1)\ln t_i - \ln Γ(γ)\right].

For the saturated model, replacing μ_i by t_i,

l(t, γ, t) = \ln L(t, γ, t) = \sum_{i=1}^{n}\left[-\frac{γ}{t_i} t_i - γ\ln t_i + γ\ln γ + (γ - 1)\ln t_i - \ln Γ(γ)\right].
The deviance is

D = 2\left[l(t, γ, t) - l(μ, γ, t)\right] = 2\sum_{i=1}^{n}\left\{-γ\left[\ln\frac{t_i}{μ_i} - \frac{t_i - μ_i}{μ_i}\right]\right\}.  (8.26)
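As a small numerical sketch (ours, with made-up values), the deviance in (8.26) can be evaluated directly once fitted means are available; the function name and data are hypothetical, and γ is treated as known.

```python
import numpy as np

def gamma_deviance(t, mu_hat, gamma_shape):
    """Deviance of a gamma regression model, Eq. (8.26):
    D = 2 * sum_i { -gamma * [ ln(t_i / mu_i) - (t_i - mu_i) / mu_i ] }."""
    t, mu_hat = np.asarray(t, float), np.asarray(mu_hat, float)
    return 2.0 * np.sum(-gamma_shape * (np.log(t / mu_hat) - (t - mu_hat) / mu_hat))

# made-up failure times and fitted means
print(gamma_deviance([1.2, 0.8, 2.5, 3.1], [1.0, 1.0, 2.8, 2.8], gamma_shape=2.0))
```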
With the reciprocal link 1/μ_i = X_iβ*, the log likelihood can be written as

l(μ, γ, t) = \sum_{i=1}^{n}\left[-γ\left\{(X_iβ^{*})t_i - \ln(X_iβ^{*})\right\} + γ\ln γ + (γ - 1)\ln t_i - \ln Γ(γ)\right].
\frac{\partial l}{\partial β_j^{*}} = \frac{1}{a(φ)}\sum_{i=1}^{n}(t_i - μ_i)X_{ij} = 0, \quad j = 0, 1, \ldots, p,

which are

\frac{\partial l}{\partial β_j^{*}} = \frac{1}{a(φ)}\sum_{i=1}^{n}\left(t_i - \frac{1}{X_iβ^{*}}\right)X_{ij} = 0, \quad j = 0, 1, \ldots, p.
For the Bernoulli regression model with logit link, the estimating equations are

\sum_{i=1}^{n}\left(y_i - \frac{e^{X_iβ}}{1 + e^{X_iβ}}\right)X_{ij} = 0, \quad j = 1, \ldots, p,

and the deviance is

D = 2\sum_{i=1}^{n}\left[y_i\ln\frac{y_i}{\hat{μ}_i} + (1 - y_i)\ln\frac{1 - y_i}{1 - \hat{μ}_i}\right].  (8.28)
Here \hat{μ}_i = \frac{e^{X_i\hat{β}}}{1 + e^{X_i\hat{β}}}.
For the Poisson regression model with log link, the estimating equations are

\sum_{i=1}^{n}\left(y_i - e^{X_iβ}\right)X_{ij} = 0, \quad j = 1, \ldots, p.

Here \hat{μ}_i = e^{X_i\hat{β}}. If \sum_{i=1}^{n} y_i = \sum_{i=1}^{n}\hat{μ}_i, then the deviance for the log link is

D = 2\sum_{i=1}^{n} y_i\ln\frac{y_i}{\hat{μ}_i}.  (8.31)
8.11 Weibull Regression Model

The Weibull distribution is one of the most widely used distributions in reliability and survival analyses. The Weibull distribution belongs to the exponential family but does not have a canonical parameter; hence, a direct application of the generalized linear model is not straightforward. With θ_i = X_iβ, the log likelihood function can be written as

\ln L = \sum_{i=1}^{n}\left[t_i^{λ} X_iβ + \ln λ + \ln(-X_iβ) + (λ - 1)\ln t_i\right].  (8.34)
\frac{\partial \ln L}{\partial β_j} = \sum_{i=1}^{n}\left(t_i^{λ} - μ_i\right)x_{ij} = 0, \quad j = 0, 1, \ldots, p,

and

\frac{\partial^{2}\ln L}{\partial β_j\partial β_k} = -\sum_{i=1}^{n} x_{ij}x_{ik}\,μ_i^{2}, \quad j, k = 0, 1, \ldots, p,

where μ_i = -1/(X_iβ).
The first set of equations shown above are the scores used as estimating equations, and the second set of equations provides the elements of the information matrix when the negative of the second derivative is taken:

I_{jk} = -\frac{\partial^{2}\ln L}{\partial β_j\partial β_k}.
The iterative weighted least squares (IWLS) method can be easily applied to obtain the MLEs, b, iteratively using the following equation:

b^{(m)} = \left(X'W^{(m-1)}X\right)^{-1} X'W^{(m-1)}Z,

where W is an n × n diagonal matrix with elements w_{ii} = μ_i^{2}, and the modified dependent variable, Z, has elements z_i = X_i b^{(m-1)} + (t_i^{λ} - μ_i)/μ_i^{2}. An initial approximation, b^{(0)}, is used in an iterative algorithm to determine the subsequent estimates b^{(1)}, ..., and the iterations continue until convergence is obtained. The iterative procedure begins by setting λ = 1.
As discussed earlier, there may be problems in obtaining convergence if the negative reciprocal link is used; alternatively, the log link function is a good choice. In that case, θ = ln μ = Xβ.
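The sketch below (ours) illustrates the IWLS idea for the log-link alternative just mentioned, treating u_i = t_i^λ as exponential with mean μ_i = e^{X_iβ}; with this link the working weight is 1 and the working response is z_i = X_i b + (u_i − μ_i)/μ_i. All names and the simulated data are assumptions for illustration, not the text's own implementation.

```python
import numpy as np

def iwls_weibull_loglink(X, t, lam=1.0, n_iter=50, tol=1e-8):
    """IWLS for the Weibull GLM with log link: u_i = t_i**lam is treated as
    exponential with mean mu_i = exp(X_i b).  Working response:
    z_i = X_i b + (u_i - mu_i)/mu_i; working weight is 1 since V(mu) = mu^2."""
    u = np.asarray(t, float) ** lam
    b = np.zeros(X.shape[1])
    b[0] = np.log(u.mean())            # crude starting value for the intercept
    for _ in range(n_iter):
        eta = X @ b
        mu = np.exp(eta)
        z = eta + (u - mu) / mu         # modified dependent variable
        b_new, *_ = np.linalg.lstsq(X, z, rcond=None)   # weights all equal to 1 here
        if np.max(np.abs(b_new - b)) < tol:
            b = b_new
            break
        b = b_new
    return b

# artificial data: exponential times with log-linear mean
rng = np.random.default_rng(0)
x = rng.uniform(size=300)
X = np.column_stack([np.ones(300), x])
t = rng.exponential(np.exp(0.5 + 1.0 * x))
print(iwls_weibull_loglink(X, t, lam=1.0))   # roughly (0.5, 1.0)
```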
Sharmin and Islam (2017) showed a generalized Weibull distribution in the sense of Cox and Reid (1987) that transforms the parameters from (η, λ) to (α, λ). The parameters α and λ are globally orthogonal. This procedure is discussed and applied by Prudente and Cordeiro (2010). Let α = η e^{λ^{-1}Γ'(2)}; then, the link function is defined by

g(μ) = g(α) = \ln\left[\frac{μ\, e^{Γ'(2)/λ}}{Γ(1 + 1/λ)}\right],

and

λ^{(m)} = λ^{(m-1)}\left[1 + \frac{1}{ψ'(1)}\left(1 + ζ^{(m)}\right)\right].  (8.36)
References
Abramowitz M, Stegun IA (1970) Handbook of mathematical functions with formulas, graphs and
mathematical tables. National Bureau of Standards, Washington, DC
Agresti A (2015) Foundations of linear and generalized linear models. Wiley, New Jersey
Cox DR, Reid N (1987) Parameter orthogonality and approximate conditional inference. J Royal
Stat Soc B 49(1):1–39
Davis CS (2002) Statistical methods for the analysis of repeated measurements. Springer, New York
Dobson AJ, Barnett AG (2008) An introduction to generalized linear models, 3rd edn. Chapman
and Hall, London
Fahrmeir L, Tutz G (2001) Multivariate statistical modelling based on generalized linear models.
Springer, New York
Islam MA, Chowdhury RI (2017) Analysis of repeated measures data. Springer, Singapore
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, London
McCulloch CE, Searle SR, Neuhaus JM (2008) Generalized, linear, and mixed models, 2nd edn.
Wiley, New Jersey
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J Royal Stat Soc, Ser A (General)
135(3):370–384
Prudente AA, Cordeiro GM (2010) Generalized Weibull linear models. Commun Stat Theory Meth-
ods 39:3739–3755
Sharmin AA, Islam MA (2017) Generalized Weibull linear models with different link functions.
Adv Appl Stat 50:367–384
Stroup WW (2012) Generalized linear mixed models. CRC Press, Boca Raton
Wedderburn RWM (1974) Quasi-likelihood functions, generalized linear models and the Gaussian
Newton method. Biometrika 61:439–447
Chapter 9
Basic Concepts of System Reliability
9.1 Introduction
This chapter describes and illustrates the basic concepts of system reliability analysis
by focusing on the components within the system. A component is a part or element
of a larger whole, especially a part of a machine or vehicle. A system is a collection of
components, modules, assemblies, or subsystems that are interconnected to a specific
design in order to perform a given task. The types of components, their quantities
and qualities, and the way they are assembled within the system have a direct impact
on the reliability of the system. That is, the reliability of a system is related to the
types, quantities, and reliabilities of its components. One of the main objectives of
system reliability analysis is to derive or select a suitable probability distribution that
represents the lifetime of the entire system based on the probability distributions of
the lifetimes of its components.
Generally, a component is characterized in terms of two states—working or
failed—first it starts in its working state and changes to a failed state after a cer-
tain time and/or usage. The failure of a component occurs due to a complex set
of interactions between the material properties and other physical properties of the
component and the stresses that act on the component (Blischke and Murthy 2000).
The time to failure (with respect to age or usage) is a random variable, and it can be
modeled by a probability distribution. Subsequent failures of a component depend on
the type of implemented rectification actions. These, in turn, depend on whether the
component is repairable or nonrepairable. In the case of a nonrepairable component,
a failed component needs to be replaced by a new one. On the other hand, in the case
As stated in Blischke et al. (2011), almost all products are built using many com-
ponents, and the number of components used increases with the complexity of the
product. As such, a product can be viewed as a system of interconnected compo-
nents. The decomposition of a product or system involves several levels. The number
of levels that is appropriate depends on the system. The performance of the system
depends on the state of the system (working, failed, or in one of several partially
failed states), and this, in turn, depends on the state (working/failed) of the various
components (Blischke and Murthy 2000; Blischke et al. 2011).
A diagram that displays the relationships of components of a system showing all
logical connections of (functioning) components required for system operation is
called reliability block diagram (RBD). In a RBD, each component is represented
by a block with two endpoints. When the component is in its working state, there is
a connection between the two endpoints. This connection is broken when the com-
ponent is in a failed state (Blischke and Murthy 2000; Blischke et al. 2011; Murthy
and Jack 2014; Ben-Daya et al. 2016). The component block might have a single
input and a single output or multiple inputs and multiple outputs. By convention,
inputs are generally assumed to enter either from the left side or from the top side
of the box, and outputs exit either from the right side or from the bottom side of the
box (Myers 2010). Systems may be of different types, e.g., series structure, parallel
structure, and general structure (combination of series and parallel substructures).
RBD represents a system that uses interconnected blocks arranged in combinations
of series and/or parallel configurations. It can be used to estimate quantitatively the
reliability and availability (or unavailability) of a system by considering active and
standby states.
A system can be analyzed by decomposing it into smaller subsystems or compo-
nents and estimating reliability of each subsystem/component to assess the overall
reliability of the system by applying the rules of probability theory1 according to
the RBD. If a system is constructed for performing more than one function, each
function must be considered individually, and a separate RBD has to be established
for each function of the system. The construction of RBD for a very complex system
may be complicated.
1 Here the rules of probability theory mean, for example, the additive, multiplicative, and conditional probability rules.
In general, if a series system consists of k components and F_i(t) denotes the cdf of the ith component, i = 1, 2, …, k, then the cdf of the system can be expressed as

F_{sys}(t) = 1 - \prod_{i=1}^{k}\left[1 - F_i(t)\right], \quad t \ge 0.  (9.3)

If F_i(t) and F_{sys}(t) are replaced in terms of their reliability functions, we get

1 - R_{sys}(t) = 1 - \prod_{i=1}^{k} R_i(t), \quad t \ge 0,

or,

R_{sys}(t) = \prod_{i=1}^{k} R_i(t), \quad t \ge 0.  (9.4)
Thus, the reliability of a series system can be expressed as the product of the
reliabilities of individual components of the system. The reliability of a series system
is always lower than the reliability of any of its components. If the Ri (t)’s are estimated
from independent sets of data, with estimates denoted R̂i (t), the estimate of Rsys (t)
is calculated as the product of the R̂_i(t), i = 1, 2, …, k. Let h_i(t) denote the hazard function of the ith component, i = 1, 2, …, k; then, using (2.27) and (9.4), the hazard function of the system can be expressed as

h_{sys}(t) = -\frac{d}{dt}\ln R_{sys}(t) = -\frac{d}{dt}\ln\prod_{i=1}^{k} R_i(t) = \sum_{i=1}^{k}\left[-\frac{d}{dt}\ln R_i(t)\right] = \sum_{i=1}^{k} h_i(t), \quad t \ge 0.  (9.5)
This implies that the hazard function of a series system is equal to the sum of the
hazard functions of the individual components of the system.
If the estimators of the component reliabilities are unbiased, so is the estimator of
Rsys (t), since under independence the expectation of the product is the product of the
expectations (See Blischke et al. 2011). The variance of the estimator is somewhat
more complex. For k = 2, the result is
V\{R̂(t)\} = [E\{R̂_1(t)\}]^{2} V\{R̂_2(t)\} + [E\{R̂_2(t)\}]^{2} V\{R̂_1(t)\} + V\{R̂_1(t)\}V\{R̂_2(t)\}.  (9.6)
For larger values of k, the result becomes increasingly complex. The estimated
variance may be used to obtain asymptotic confidence intervals in the usual way.2
The reliability of components (especially in systems containing many compo-
nents) is often characterized by constant failure rate. In this case, the exponential
distribution (discussed in Chap. 4) can be applied, which assumes that the hazard
rate is constant with respect to component age. When a component is functioning
during its useful lifetime, a constant-hazard model is particularly suitable. The reliability function of a constant-hazard model for component i at age t is R_i(t) = exp(−λ_i t), and the reliability function of the series system becomes

R_{sys}(t) = \prod_{i=1}^{k}\exp(-λ_i t) = \exp(-λ t), \quad t \ge 0,  (9.8)

where λ = \sum_{i=1}^{k} λ_i. The MTTF of the system (assuming that each failed component is immediately replaced by an identical component) becomes

MTTF = \int_{0}^{\infty} R_{sys}(t)\,dt = \int_{0}^{\infty}\exp(-λ t)\,dt = \frac{1}{λ}.  (9.9)
Equation (9.8) indicates that if the distributions of times to failure of each com-
ponent of a series system follow an exponential distribution, then the distribution of
time to failure of the system is again exponential with the failure rate as the sum of
failure rates of individual components.
2 The variances of
F̂(t) and R̂(t) can also be computed by using the delta method (e.g., see Meeker
and Escobar 1998).
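As a short sketch (ours, with hypothetical component reliabilities and failure rates), Eqs. (9.4), (9.8), and (9.9) can be evaluated directly for a series system; the function names are ours.

```python
import math

def series_reliability(rel_values):
    """Reliability of a series system, Eq. (9.4): product of component reliabilities."""
    r = 1.0
    for ri in rel_values:
        r *= ri
    return r

def exp_series(lambdas, t):
    """Series system of exponential components, Eq. (9.8):
    R_sys(t) = exp(-(sum of lambdas) * t); MTTF = 1 / sum of lambdas, Eq. (9.9)."""
    lam = sum(lambdas)
    return math.exp(-lam * t), 1.0 / lam

# hypothetical inputs
print(series_reliability([0.90, 0.95, 0.99]))        # product of component reliabilities
print(exp_series([0.0005, 0.001, 0.002], t=100.0))    # (R_sys(100), MTTF) for exponential components
```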
R_{sys}(1) = \prod_{i=1}^{4} R_i(1) = 0.6 \times 0.7 \times 0.85 \times 0.8 = 0.2856.  (9.10)
Note that the reliability of the system, 0.2856, is less than that of the reliability of
the worst component, R1 (1) = 0.6.
In the reliability block diagram (given in Fig. 9.3), if we assume four different
systems such that the first system consists of only component 1, the second system
consists of components 1 and 2, the third system consists of components 1, 2, and 3,
and fourth system consists of all four components; then, the changes of the reliabilities
of assumed four systems would be as shown in Table 9.1.
Table 9.1 Changes of reliabilities for changing the number of components in the series system
System   No. of components in the system   Reliability of system              % change in the reliability
First    1                                 0.6                                –
Second   2                                 0.6 × 0.7 = 0.42                   (0.6 − 0.42)/0.6 × 100 = 30
Third    3                                 0.6 × 0.7 × 0.85 = 0.357           (0.6 − 0.357)/0.6 × 100 = 40.5
Fourth   4                                 0.6 × 0.7 × 0.85 × 0.8 = 0.2856    (0.6 − 0.2856)/0.6 × 100 = 52.4
Table 9.2 Changes of reliabilities for improving the reliabilities of each component one by one in the series system
R1(1)   R2(1)   R3(1)   R4(1)   Rsys(1)   % change in the reliability
0.6     0.7     0.85    0.8     0.2856    –
0.7     0.7     0.85    0.8     0.3332    (0.3332 − 0.2856)/0.2856 × 100 = 16.67
0.6     0.8     0.85    0.8     0.3264    (0.3264 − 0.2856)/0.2856 × 100 = 14.29
0.6     0.7     0.95    0.8     0.3192    (0.3192 − 0.2856)/0.2856 × 100 = 11.76
0.6     0.7     0.85    0.9     0.3213    (0.3213 − 0.2856)/0.2856 × 100 = 12.50
The fourth column of Table 9.1 shows the percentage decrease in system reliability compared with the system having a single component (Component 1). This column indicates that the reliability of a series system decreases as the number of components in it increases.
In the reliability block diagram (Fig. 9.3), if we increase the reliability of each component one at a time by 0.1, keeping the reliabilities of the remaining components unchanged, then the changes in the reliabilities of the systems are as shown in Table 9.2.
The sixth column of Table 9.2 shows the percentage increase in system reliability compared with the system in which the component reliabilities are left unchanged. It indicates that the improvement in system reliability (in percentage) is highest when the reliability of the weakest component (Component 1) is increased by 0.1, in comparison with the cases when the reliabilities of the other three components are increased one by one by the same amount (0.1) (see also eGyanKosh 2019). This suggests that, to improve the reliability of a series system, effort should first be directed at improving the reliability of the weakest component of the system.
where f_i(t) denotes the pdf of the ith failure mode, i = 1, 2, …, k. Equation (9.11) may be rewritten as

f_{sys}(t) = R_{sys}(t)\sum_{i=1}^{k}\frac{f_i(t)}{R_i(t)}, \quad t \ge 0.  (9.12)
Figure 9.4 displays a comparison of the reliability functions for failure mode 1, failure mode 2, and the product (combined failure modes 1 and 2) for 0 ≤ t ≤ 10,000 days. This figure can be used to assess the reliability of the component for given days. For example, the figure indicates that the reliabilities of the component at age 2000 days are 0.30 for failure mode 1, (R1(t)), 0.45 for failure mode 2, (R2(t)), and 0.14 for the product (Rsys(t)). The estimated MTTF of the product is found to be μ = \int_0^{\infty} R_{sys}(t)\,dt = 1/(λ_1 + λ_2) = 1000 days.
3 The competing risk model has also been called the compound model, series system model, and
of two mutually independent events equals the product of the probabilities of the occurrences of the two individual events. Therefore, the resultant cumulative distribution function of a parallel system through age t can be calculated as

F_{sys}(t) = \prod_{i=1}^{k} F_i(t), \quad t \ge 0.  (9.14)

By interchanging F_i(t) and F_{sys}(t) with their corresponding reliability functions, we get

R_{sys}(t) = 1 - \prod_{i=1}^{k}\left[1 - R_i(t)\right], \quad t \ge 0.  (9.15)
Thus, the reliability of a parallel system is equal to one minus the product of one minus the reliabilities of the individual components of the system. The reliability of a parallel system is greater than the reliability of any of its components. The variance of the estimator (9.15) may be obtained by applying the delta method.
If the reliabilities of individual components are characterized by constant failure rates (if λ_i denotes the hazard rate for component i, i = 1, 2, …, k), the reliability function of a parallel system changes to

R_{sys}(t) = 1 - \prod_{i=1}^{k}\left[1 - \exp(-λ_i t)\right], \quad t \ge 0.  (9.16)
This implies that, for a parallel system, the expression for the reliability function becomes more complex than in the case of a series system, even if the failure rates of the individual elements are constant (see also Menčík 2016). In a special case, where all k identical components have the same failure rate (say λ), the reliability function of the parallel system changes to

R_{sys}(t) = 1 - \left[1 - \exp(-λ t)\right]^{k}, \quad t \ge 0.  (9.17)
Under this specific situation, and also assuming that each failed component is immediately replaced by an identical component, the MTTF for the system can be expressed as

MTTF = \frac{1}{λ}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}\right).  (9.18)
R_{sys}(1) = 1 - \prod_{i=1}^{4}\left[1 - R_i(1)\right] = 1 - (1 - 0.6)(1 - 0.7)(1 - 0.85)(1 - 0.8) = 0.9964.  (9.19)
Note that the reliability of the system, 0.9964, is greater than the reliability of the best component, R3(1) = 0.85. This example also indicates that the reliability of this system is higher than the reliability of the series system of Example 9.1 having the same number of components with the same reliabilities.
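A brief sketch (ours) evaluating Eqs. (9.15) and (9.18); the first call reproduces the value 0.9964 obtained above, while the second uses a hypothetical failure rate for identical exponential components. Function names are ours.

```python
def parallel_reliability(rel_values):
    """Parallel system reliability, Eq. (9.15): 1 - prod(1 - R_i)."""
    q = 1.0
    for ri in rel_values:
        q *= (1.0 - ri)
    return 1.0 - q

def parallel_mttf_identical(lam, k):
    """MTTF of a parallel system of k identical exponential components, Eq. (9.18):
    (1/lam) * (1 + 1/2 + ... + 1/k)."""
    return sum(1.0 / j for j in range(1, k + 1)) / lam

print(parallel_reliability([0.6, 0.7, 0.85, 0.8]))   # 0.9964, as in the example above
print(parallel_mttf_identical(lam=0.001, k=3))        # hypothetical rate, 3 components
```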
For selecting a series or parallel system, we have to balance the costs of the components against the desired reliability of the system.
In a combined series and parallel system (also known as a mixed system), the components are connected in series and parallel arrangements to perform a required system
operation. Figure 9.7 displays two types of reliability block diagrams for combined
systems, top system with five components and bottom system with seven compo-
nents.
To assess the reliability of this system, the RBD is broken into series or parallel
subsystems. The formulas for estimating reliabilities for series and parallel systems
are used to obtain the reliability of each subsystem first, and then, the reliability of
the system can be obtained on the basis of the relationship among the subsystems.
Example 9.5 Suppose a system is constructed based on three components as shown
by the following diagram (Blischke et al. 2011) (Fig. 9.8).
If the lifetimes of components 1, 2, and 3 all follow exponential distributions with
λ = 0.001, 0.002, and 0.003 failures per hour, respectively, then the reliability of the
system for ten hours (t = 10) can be computed as follows:
Fig. 9.7 Diagrams of two combined (or mixed) systems, top with five and bottom with seven components
Component 1 and the subsystem with components 2 and 3 are in a series structure. From (9.4), the reliability of the system at t = 10 can be computed as illustrated in the sketch below.
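A minimal sketch (ours) of this computation, assuming from Fig. 9.8 that components 2 and 3 form a parallel subsystem connected in series with component 1; variable names are ours.

```python
import math

# Assumed structure from Fig. 9.8: components 2 and 3 in parallel, in series with component 1.
lam = {1: 0.001, 2: 0.002, 3: 0.003}   # failures per hour
t = 10.0

R = {i: math.exp(-lam[i] * t) for i in lam}        # exponential component reliabilities
R_23 = 1.0 - (1.0 - R[2]) * (1.0 - R[3])            # parallel subsystem, Eq. (9.15)
R_sys = R[1] * R_23                                 # series with component 1, Eq. (9.4)
print(round(R_sys, 4))   # about 0.9895 under this assumed structure
```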
9.6 K-Out-of-N System Reliability

A system that works if and only if at least k of its n components work is called a k-out-of-n structure. If k = 1, the system becomes a parallel structure, and if k = n, it becomes a series structure. The RBD of a 2-out-of-3 system is shown in Fig. 9.9. The 2-out-of-3 system is a system in which any two of the three components are required to work for the system to function. In Fig. 9.9, a 1-out-of-3 (where k = 1) system means a parallel structure and a 3-out-of-3 (where k = n) system gives a series structure.
To find the reliability of a k-out-of-n system, we consider first a 2-out-of-3 system as shown in Fig. 9.9. In general, the cumulative distribution function of a k-out-of-n system of independent components can be expressed as

F_{sys}(t) = \sum_{j=n-k+1}^{n}\;\sum_{\delta\in A_j}\;\prod_{i=1}^{n}\left[F_i(t)\right]^{\delta_i}\left[1 - F_i(t)\right]^{1-\delta_i}, \quad t \ge 0,  (9.20)

where A_j denotes the set of all vectors δ = (δ_1, …, δ_n), with δ_i = 1 if component i has failed and δ_i = 0 otherwise, for which exactly j components have failed.
In a special case, where all n identical components have the same and constant failure rate (say λ), the cumulative distribution function for a k-out-of-n configuration (9.21) changes to

F_{sys}(t) = \sum_{j=n-k+1}^{n}\binom{n}{j}\left(1 - e^{-λ t}\right)^{j}\left(e^{-λ t}\right)^{n-j}, \quad t \ge 0.  (9.22)
Example 9.6 For an illustration, consider a 4-out-of-6 system that works if at least four out of six components work. Let the lifetimes of the six components be independent and identically distributed Weibull random variables having reliability function

R_i(t) = \exp\left[-\left(\frac{t}{200}\right)^{1.5}\right], \quad t \ge 0, \quad i = 1, 2, \ldots, 6,

where the shape and scale parameters are 1.5 and 200 days, respectively. The reliability function of the system, using (9.21), becomes

R_{sys}(t) = 1 - \sum_{j=3}^{6}\binom{6}{j}\left[1 - \exp\left\{-(t/200)^{1.5}\right\}\right]^{j}\left[\exp\left\{-(t/200)^{1.5}\right\}\right]^{6-j}, \quad t \ge 0.  (9.23)
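The following sketch (ours) evaluates the system reliability at a few ages using a form equivalent to (9.23), namely the probability that at least four of the six components survive; function names are ours.

```python
import math
from math import comb

def weibull_rel(t, shape=1.5, scale=200.0):
    """Component reliability R_i(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))

def k_out_of_n_rel(t, k=4, n=6):
    """Reliability of a k-out-of-n system of identical components (equivalent to Eq. (9.23)):
    R_sys(t) = sum_{j=k}^{n} C(n, j) R(t)^j (1 - R(t))^(n - j)."""
    r = weibull_rel(t)
    return sum(comb(n, j) * r**j * (1.0 - r) ** (n - j) for j in range(k, n + 1))

for t in (50, 100, 200):
    print(t, round(k_out_of_n_rel(t), 4))
```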
References
Beichelt F, Tittmann P (2012) Reliability and maintenance: networks and systems. Taylor & Francis
Group, CRC Press
Ben-Daya M, Kumar U, Murthy DNP (2016) Introduction to maintenance engineering: modelling,
optimization and management. Wiley, New York
Beyersmann J, Allignol A, Schumacher M (2012) Competing risks and multi-state models with R.
Springer, New York
Blischke WR, Murthy DNP (2000) Reliability—modeling, prediction, and optimization. Wiley,
New York
Blischke WR, Karim MR, Murthy DNP (2011) Warranty data collection and analysis. Springer,
London Limited
Dhillon BS (2002) Engineering maintenance: a modern approach. CRC Press, USA
eGyanKosh (2019) A national digital repository. http://www.egyankosh.ac.in/bitstream/123456789/
35168/1/Unit-14.pdf. Accessed on 26 May 2019
Islam MA, Chowdhury RI (2012) Elimination of causes in competing risks: a hazards model
approach. World Appl Sci J 19(5):608–614
Karim MR (2012) Competing risk model for reliability data analysis. In: Proceedings of international
conference on data mining for bioinformatics, health, agriculture and environment, University of
Rajshahi, Bangladesh, pp. 555–562
Kececioglu D (1994) Reliability engineering handbook—V 2. Prentice-Hall, Englewood Cliffs, NJ
Klein JP, Houwelingen HCV, Ibrahim JG, Scheike TH (2014) Handbook of survival analysis. Taylor & Francis Group, CRC Press
Kuo W, Zhu X (2012) Importance measures in reliability, risk, and optimization: principles and
applications. Wiley, New York
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley, New York
Menčík J (2016) Concise reliability for engineers. InTech
Murthy DNP, Jack N (2014) Extended warranties, maintenance service and lease contracts: modeling
and analysis for decision-making. Springer, London
Myers A (2010) Basic elements of system reliability. In: Complex system reliability. Springer Series
in Reliability Engineering, Springer, London
Pham H (2006) System software reliability. Springer, London
Rausand M, Høyland A (2004) System reliability theory: models, statistical methods, and applica-
tions, 2nd edn. Wiley, New York
Tobias PA, Trindade DC (2012) Applied reliability, 3rd edn. Taylor & Francis Group, CRC Press
Chapter 10
Quality Variation in Manufacturing
and Maintenance Decision
Abstract This chapter looks at the issues in modeling the effect of quality vari-
ations in manufacturing. It models the effects of assembly errors and component
nonconformance. This chapter constructs the month of production—month in ser-
vice (MOP-MIS) diagram to characterize the claims rate as a function of MOP and
MIS. It also discusses the determination of the optimum maintenance interval of an object.
10.1 Introduction
Quality variation in manufacturing is one of the main causes for the high infant (early)
failure rate of the product.1 For example, if some of the bearings are defective,
they can wear faster, causing washing machines with these defective bearings to
fail earlier. Failures during the infant failure period are highly undesirable because
they provide a negative impact on customers’ satisfaction, especially on customers’
first impression on the product. As such, proper monitoring and diagnosis of the
manufacturing process play an important role toward continuous improvement of
product quality.
Most consumer durables, industrial products, and commercial products are pro-
duced using either continuous or batch production.2 One may view the continuous
production as a batch production by dividing the time into suitable intervals (e.g.,
8-hour shifts, days, months, etc.). Similarly, components, whether produced inter-
nally or bought from external vendors, are produced in batches. Due to variations in
materials and/or production, the quality of components can vary from batch to batch.
This, combined with variations in assembly of the components, can result in quality
variations, at the product level, from batch to batch. For proper root cause analysis,
it is necessary to identify the batch number for each failed item at both the product
1 Sections of the chapter draw from the co-author’s (Md. Rezaul Karim) previous published work,
reused here with permissions (Blischke et al. 2011).
2 For certain products (e.g., automobiles, personal computers), each unit at the product level has a
unique identification number. For example, in the case of automobiles, each vehicle has a unique
identification number, referred to as Vehicle Identification Number (VIN).
and component levels. The ability to perform this identification is called traceability
(Blischke et al. 2011). The date of production is critical for traceability of the batch
in production.
Once the batch numbers of failed products or components are identified, the
analysis for assessing quality variation from batch to batch can be performed and the
results related to other variables of the production process can be utilized in order
to control or reduce the quality variations. The analysis results can be used to find
answers to the questions (Jiang and Murthy 2009), such as, (i) Is there a problem
with product reliability? (ii) If so, is it a design or a manufacturing problem? (iii)
If it is a manufacturing problem, is it an assembly or component nonconformance
problem?
Every object (product, plant, or infrastructure) degrades with its age and/or usage
and finally fails (when it is no longer capable of fulfilling the function for which it
was designed). Maintenance actions are used to control the degradation processes
and to reduce the likelihood of failure of an object. In order to preserve the function of
the system, it is vital to identify the maintenance strategies that are needed to manage
the associated failure modes that can cause functional failure (Ahmadi 2010). An
effective maintenance requires proper data management—collecting, analyzing, and
using models for decision making (Murthy et al. 2015). Finally, it determines an
appropriate lifetime (age, usage, etc.) interval for the preventive maintenance of an
object.
This chapter looks at the issues in modeling the effect of quality variations in
manufacturing and in determining the optimum maintenance interval of an object.
The outline of the chapter is as follows. Section 10.2 discusses reliability from a
product life cycle perspective. Section 10.3 explains the effect of quality variations
in manufacturing. This section models the effects of assembly errors and component
nonconformance. Section 10.4 deals with the construction of month of production—
month in service (MOP-MIS) diagram. Section 10.5 discusses maintenance and the optimum maintenance interval of an object.
Design Reliability At the design stage, the desired product reliability is determined
through a tradeoff between the cost of building in reliability and the corresponding
consequence of failures. Design reliability is assessed during the design stage linking
the reliability of the components.3 Manufacturers assess this reliability prior to the
launching of the product based on limited test data.
Inherent Reliability For standard products produced in volume, the inherent relia-
bility is the reliability of the produced item that can differ from the design reliability
due to quality variations in manufacturing (such as assembly errors and/or noncon-
forming components) (Jiang and Murthy 2009).
Reliability at Sale Reliability at sale is the reliability of the product that a customer
gets at the time of purchase. After production, the product must be transported to the
market and is often stored for some time before it is sold. The reliability of a unit at sale
depends on the mechanical load (resulting from vibrations during transport), impact
load (resulting from mishandling), duration of storage, and the storage environment
(temperature, humidity, etc.) (Blischke et al. 2011). As a result, the reliability at sale
can differ from the inherent reliability (Murthy et al. 2008).
Field Reliability The field reliability is the reliability of the product in operation.
The field reliability is calculated based on the recorded performance to the customers
during the use of the product. The reliability performance of a unit in operation
depends on the length and environment of prior storage and on operational factors
such as the usage intensity (which determines the load—electrical, mechanical, ther-
mal, and chemical on the unit), usage mode (whether used continuously or intermit-
tently), and operating environment (temperature, humidity, vibration, pollution, etc.)
and, in some instances, on the human operator (Murthy et al. 2008; Blischke et al.
2011). The field reliability is also called the actual reliability or actual performance
of the product.
3 The linking of component reliabilities to product reliability is discussed in Chap. 9 and in Blischke
et al. (2011).
10.3 Effect of Quality Variations in Manufacturing

Quality variation in manufacturing is one of the main causes for the high infant
(early) failure rate of the product. As such, proper monitoring and diagnosis of the
manufacturing process play an important role toward continuous improvement of
product quality. The reliability of manufactured products can differ from the desired
design reliability due to variations in manufacturing quality (Jiang and Murthy 2009).
Failure data collected from the field provide useful information to assess whether the
change in product reliability variation may be considered to have a reasonable impact
or not. This section looks at the issues in modeling the effect of quality variations in
manufacturing on product reliability.
Let F0 (t) denote the distribution function of the lifetime variable T for the products
with design reliability. Let R0 (t), f 0 (t), and h 0 (t) denote, respectively, the reliability
function, the density function, and the hazard function associated with F0 (t). Quality
variations in manufacturing can occur for a variety of causes. Two of the main causes
of variations are (i) assembly error and (ii) component nonconformance.
Generally, a simple product consists of several components that are made separately
and then assembled together in production. There may occur errors in the individual
components or in the assembly of the components. If the errors occur during the
assembly operation of the components, the errors are known as assembly errors.
Assembly errors occur in products due to improper assembly during manufacturing. The type of assembly operation depends on the product. For an electronic
product, one of the assembly operations is soldering. If the soldering is not done
properly (called dry solder), then the connection between the components can break
within a short period, leading to a premature failure. For a mechanical component, a
premature failure can occur if the alignment is not correct or the tolerance limits are
violated (Blischke et al. 2011).
Failures resulting from assembly errors can be viewed as a new mode of failure
that is different from other failure modes that one examines during the design process
(Jiang and Murthy 2009). Let F1 (t) denote the distribution function associated with
this new failure mode, and R1 (t), f 1 (t), and h 1 (t) denote, respectively, the reliability
function, density function, and hazard function associated with F1 (t). The hazard
function h 1 (t) is a decreasing function of t, implying that failure will occur sooner
rather than later, and that the MTTF under this new failure mode is much smaller
than the design MTTF. Let q, 0 ≤ q ≤ 1, denote the probability that an item has an assembly error. If R_a(t) denotes the reliability function of the produced items, then according to Jiang and Murthy (2009) and Blischke et al. (2011), R_a(t) can be expressed as

R_a(t) = (1 - q)R_0(t) + q\,R_0(t)R_1(t), \quad t \ge 0.  (10.1)
Items that are produced with nonconforming components will not meet design speci-
fications and will tend to have an MTTF that is much smaller than the intended design
MTTF. Let F2 (t) denote the distribution function of items that have nonconforming
components, and R2 (t), f 2 (t), and h 2 (t) denote, respectively, the reliability, density,
and hazard functions associated with F2 (t). Here, h 2 (t) is an increasing function of
t, with h 2 (t) > h 0 (t) for all t. Let p, 0 ≤ p ≤ 1, denote the probability that an
item produced has nonconforming components, so that its distribution function is
given by F2 (t). Then (1 − p) is the probability that the item is conforming and has
distribution function F0 (t). As a result, the reliability of the items produced is given
by

R(t) = (1 - p)R_0(t) + p\,R_2(t), \quad t \ge 0.  (10.2)
If items are produced with both assembly errors and component nonconformance,
the reliability function of the items is given by6
4 A general k-fold competing risk model means the competing risk model derived based on k failure modes.
7 In some cases, it can be a shift, if a company operates more than one shift per day.

Fig. 10.2 Reliability, distribution, density, and hazard functions for the models of assembly errors, component nonconformance, and combined effects

10.4 Month of Production—Month in Service (MOP-MIS) Diagram
Note that I represents the total number of production periods, J is the total number
of sale periods and K is the total number of claim periods with I ≤ J ≤ K .
MOP Month of production (indexed by subscript i),
MOS Month of sale (indexed by subscript j),
MIS Month in service (duration for which the item is in use—indexed by t = k− j),
and
n it Number of items from MOP i that fail at age t.
Let the number of items under warranty (or at risk of a claim) at the beginning of age group t (MIS t) for MOP i be denoted by RC_3(i, t); it is given by
RC_3(i, t) =
\begin{cases}
\displaystyle\sum_{j=i}^{\min(J, K-t+1)} S_{ij}, & \text{if } t = 1,\\[2mm]
\displaystyle\sum_{j=i}^{\min(J, K-t+1)}\left[S_{ij} - \sum_{k=j}^{j+t-2} n_{ijk}\right], & \text{if } t > 1.
\end{cases}  (10.4)
After we calculate the number of warranty claims (WC) and the warranty claims rate (WCR), we can construct MOP-MIS plots or tables. The age-based number of warranty claims at age t, WC(t) or n_t, defined in (5.11), does not distinguish between months of production. Another way of defining this is

WC_2(i, t) = n_{it} = \sum_{j=i}^{\min(J, K-t+1)} n_{i, j, j+t-1}, \quad i = 1, 2, \ldots, I;\; t = 1, 2, \ldots, \min(K, W),  (10.5)

WCR_3(i, t) = \frac{WC_2(i, t)}{RC_3(i, t)} = \frac{n_{it}}{RC_3(i, t)}, \quad i = 1, 2, \ldots, I;\; t = 1, 2, \ldots, \min(W, K),  (10.6)
which indicates the age-based or MIS-based claim rates for each production month.
For a particular month of production, if the warranty claims rate shows a sudden
change such as a noticeable increase compared to other months of production, then
it may indicate a quality-related problem in that month of production. Definition and
the procedure of estimation of age-based WCR are discussed in detail in Sect. 5.4.
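As an illustrative sketch (ours), Eqs. (10.4)–(10.6) can be computed from two hypothetical data structures: S[(i, j)], the number of units from MOP i sold in MOS j, and n[(i, j, k)], the number of claims in month k for units produced in month i and sold in month j. All function names and the toy data are assumptions.

```python
def rc3(S, n, i, t, J, K):
    """Number of units at risk of a claim at the start of MIS t for MOP i, Eq. (10.4)."""
    total = 0
    for j in range(i, min(J, K - t + 1) + 1):
        at_risk = S.get((i, j), 0)
        if t > 1:
            at_risk -= sum(n.get((i, j, k), 0) for k in range(j, j + t - 1))
        total += at_risk
    return total

def wc2(S, n, i, t, J, K):
    """Number of claims at MIS t for MOP i, Eq. (10.5)."""
    return sum(n.get((i, j, j + t - 1), 0) for j in range(i, min(J, K - t + 1) + 1))

def wcr3(S, n, i, t, J, K):
    """MIS-based warranty claims rate for MOP i, Eq. (10.6)."""
    risk = rc3(S, n, i, t, J, K)
    return wc2(S, n, i, t, J, K) / risk if risk > 0 else float("nan")

# tiny made-up example: one production month, two sale months, claims observed over four months
S = {(1, 1): 100, (1, 2): 50}
n = {(1, 1, 1): 2, (1, 1, 2): 1, (1, 2, 2): 1, (1, 2, 3): 2}
for t in range(1, 4):
    print(t, round(wcr3(S, n, i=1, t=t, J=2, K=4), 4))
```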
Example 10.1 This example (reproduced with permission from Blischke et al. 2011)
considers warranty claims data for an automobile component. The data relate to components manufactured over a 12-month period (I = 12). The component is non-
repairable and the automobiles on which it is used are sold with a nonrenewing
two-dimensional free replacement warranty. The warranty region is a rectangle with
W = 18 months (age limit) and U = 100,000 km (usage limit). The components were
sold over 28 months (J = 28) and the data of the study were on claims that were
filed over an observation interval of 29 months (K = 29). The number of warranty
claims over the 29-month period from the start of production was 2230.
The supplementary data required for performing the analysis are the monthly
production amounts (M i ) over 12 months and monthly sales amounts (S ij ) over
28 months, i = 1, 2,…, 12, j = 1, 2,…, 28. For reasons of confidentiality, details of
monthly production and sales are not disclosed. The total number of units produced
over the 12-month period was 75,700. The total number sold was 75,666, implying
that nearly all the items produced were sold, but with a lag8 between production and
sale dates. The objective of the analysis is to investigate production-related problems
by looking at the reliability of components produced in different batches (monthly).
The monthly sales data (S ij ) and failures as a function of MIS(t) and MOS(j) for
a particular MOP (September, i = 9) are given in Table 10.1.
The number of items under warranty at the beginning of time period t, RC 3 (i, t),
the number of warranty claims, WC 2 (i, t), and the warranty claims rates WCR3 (i, t)
can be calculated using Eqs. (10.4), (10.5), and (10.6), respectively. These values are
given in Table 10.2.
The estimates of WCR3 (i, t) for the other eleven MOP can be calculated similarly.
All these results are displayed in Table 10.3.
Based on Table 10.3, the MOP-MIS plot of WCR3 (i, t) for all MOP (i = 1, 2,
…, 12) and MIS (t = 1, 2, …, 18) is shown in Fig. 10.3. This figure is useful in
determining if the failure rates are related to the month in service and/or month of
production. The figure indicates that the warranty claims rates are initially decreasing
with respect to the month of production and that there is a significant increase for
the sixth month (June). In MOP June, the high claims rates are for 10, 12, 13, 15, 16,
17, and 18 MIS.
Figure 10.4 shows the estimates of WCR for each MIS separately.9 This figure
indicates that the production period July–November, MOP(7)–MOP(11), is the best
in the sense that the claim rates are low and stable for all MIS in this period. For MIS
from 1 to 10, the claim rates are low and approximately constant in all MOPs. The
variation in claim rates in different MOP increases as MIS increases.
The MOP-MIS charts given above are useful in determining whether or not there
are problems in production. If the WCR for a MOP is unexpectedly high, this indicates
that there may be production-related problems in that MOP. Figures 10.3 and 10.4
indicate that there are some problems with the January, February, June, and possibly
December MOP and these MOPs need further investigation.
8 More on sales lag can be found in Karim and Suzuki (2004) and Karim (2008).
9 The library ggplot2 of R-language is used to create Figs. 10.1 and 10.3. These figures can also be
generated after importing the estimated WCR3 (i, t) in a Minitab worksheet and choosing graph →
scatterplot → with connect and group.
Table 10.1 Monthly sales (Si j ) and failures (n it ) indexed by MOS(j) and MIS(t) for a particular MOP(i = 9)
j Si j Failures {nit } in MIS (t) under warranty Tot
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1064 1 1 2 1 1 2 1 1 1 1 12
2 3600 4 5 1 1 3 5 6 4 5 4 3 4 3 3 5 7 5 68
3 2113 1 1 2 3 2 1 1 2 5 2 3 2 1 3 29
4 720 1 1 1 1 1 1 1 1 1 1 1 11
5 442 1 3 2 1 1 8
6 235 1 2 1 1 1 1 7
7 168 0
8 94 0
9 74 1 2 3
10 90 1 1 2
11 51 1 1 2
12 16 0
13 63 0
14 82 1 1 2 4
15 51 0
16 27 0
17 8 0
18 20 0
19 12 0
20 8 1 1
Tot 8938 6 8 3 5 6 7 9 12 8 10 10 13 8 8 9 6 9 10 147
but also on its design and operation and on the servicing and maintenance during
its operational lifetime. Thus, proper functioning over an extended lifetime requires
proper actions (e.g., changing the oil in an engine) on a regular basis, adequate repair
or replacement of failed components, proper storage when not in service, and so
forth (Blischke and Murthy 2003). These actions are parts of maintenance that are
applied to control the degradation processes, to reduce the likelihood of failure, to
restore a failed object to an operational state, etc. Maintenance is the combination
of technical, administrative, and managerial actions carried out during the life cycle
of an item and intended to retain it in, or restore it to, a state in which it can perform
the required function (SS-EN 13306:2010).
There are two primary types of maintenance actions: (i) Preventive maintenance
(PM) is comprised of actions to control the degradation processes and to reduce
the likelihood of failure of an item (component or object). This provides system-
atic inspection, detection, and correction of incipient failures before they occur. (ii)
Corrective maintenance (CM) is comprised of actions to restore a failed product or
system to a specified operational state. It is carried out to restore a failed object to
the operational state by rectification actions (repair or replace) on the failed compo-
nents necessary for the successful operation of the object. Corrective maintenance
actions are unscheduled actions, whereas preventive maintenance actions are sched-
uled actions.
Table 10.3 MOP-MIS table of WCR3 (i, t) of automobile component for all MOP
The main objectives of a good preventive maintenance program are to (i) minimize
the overall costs, (ii) minimize the downtime of the object, and (iii) have the reliability
of the object at a specified level. In order to achieve these objectives, it is important to
determine an appropriate lifetime (age, usage, etc.) interval for the preventive mainte-
nance of the object. Here, we consider the optimal age-based preventive replacement
policy. Under this policy, to determine the optimal replacement time, an objective
function is formulated that explains the associated costs and risks.
Fig. 10.4 MOP-MIS chart of WCR3(i, t) for each MIS separately

The objective function is the asymptotic expected cost per unit time,

J(T) = \frac{E[C(T)]}{E[L(T)]},  (10.7)

where C(T) and L(T) denote the cost per cycle and length of a cycle at T, respectively.
Let the time to failure, X, be a random variable with cumulative distribution function F(x).
A PM action results if X ≥ T in which case the cycle length is T with probability
R(T ), the reliability at T. A CM action results when X < T and the cycle length is X
(Sultana and Karim 2015).
Let C f and C p denote the average cost of a CM and PM replacement, respectively.
At time T, the cost per cycle C(T ) is a random variable, which takes values C p with
probability R(T) and C_f with probability F(T). As a result, E[C(T)] is given by

E[C(T)] = C_p R(T) + C_f F(T).  (10.8)
Again, at time T, the length of a cycle L(T ) is another random variable having
both discrete value T with probability R(T ) and continuous value t with probability
f (t)dt, which implies P[L(T ) ∈ (t, t + dt)] = f (t)dt for t ≤ T. As a result, E[L(T )]
is given by
E[L(T)] = T R(T) + \int_{0}^{T} t f(t)\,dt.  (10.9)
Recall that \frac{d}{dt}R(t) = -f(t); applying integration by parts, we get

\int_{0}^{T} t f(t)\,dt = -\int_{0}^{T} t\,\frac{d}{dt}R(t)\,dt = -\left[t R(t)\Big|_{0}^{T} - \int_{0}^{T} R(t)\,dt\right]
= -\left[T R(T) - \int_{0}^{T} R(t)\,dt\right] = \int_{0}^{T} R(t)\,dt - T R(T).

Hence,

E[L(T)] = \int_{0}^{T} R(t)\,dt.  (10.10)
Thus, using (10.8) and (10.10), the objective function can be expressed as

J(T) = \frac{C_p R(T) + C_f F(T)}{\int_{0}^{T} R(t)\,dt}.  (10.11)
The optimum replacement time can be found by minimizing the expected cost
per unit time J(T ) with respect to T.
Example 10.2 Optimum Replacement Time: Suppose that the lifetime random variable X (age in hours) of a component follows the Weibull distribution with shape parameter β = 2.5 and scale parameter η = 1000 h. Therefore, F(x) = 1 − exp[−(x/1000)^{2.5}], x ≥ 0, and R(x) = exp[−(x/1000)^{2.5}], x ≥ 0. Assume that the cost for a corrective maintenance is C_f = $5 and the cost for a preventive maintenance is C_p = $1. We estimate the optimum replacement age in order to minimize the objective function, J(T), which depends on preventive and corrective maintenance costs defined in (10.11).
Note that the component has an increasing failure rate with age (since the value of the shape parameter of the Weibull distribution is greater than 1) and the cost for preventive maintenance is less than the cost of corrective maintenance. This implies that the conditions for the optimum age replacement policy are satisfied for the component. From (10.11), the asymptotic expected maintenance cost per unit time (the objective function) is given by

J(T) = \frac{5\left[1 - \exp\left\{-(T/1000)^{2.5}\right\}\right] + \exp\left\{-(T/1000)^{2.5}\right\}}{\int_{0}^{T}\exp\left\{-(x/1000)^{2.5}\right\}dx}.  (10.12)

Fig. 10.5 Optimum replacement age when lifetime follows the Weibull distribution with parameters (shape = 2.5, scale = 1000) assuming Cp = $1 and Cf = $5
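As a hedged sketch (ours), the optimum replacement age can be located numerically by evaluating J(T) of (10.12) over a grid of candidate ages; the integral in the denominator is approximated here by the trapezoidal rule, and all function names are assumptions.

```python
import math

def weibull_rel(x, shape=2.5, scale=1000.0):
    """Reliability R(x) = exp(-(x/scale)**shape)."""
    return math.exp(-((x / scale) ** shape))

def expected_cost_rate(T, cp=1.0, cf=5.0, dx=0.5):
    """Objective function J(T) of Eq. (10.12): expected maintenance cost per unit time.
    The denominator integral of R(t) over [0, T] uses the trapezoidal rule."""
    n = max(int(T / dx), 1)
    xs = [i * T / n for i in range(n + 1)]
    integral = sum((weibull_rel(a) + weibull_rel(b)) / 2 * (b - a)
                   for a, b in zip(xs[:-1], xs[1:]))
    r = weibull_rel(T)
    return (cp * r + cf * (1.0 - r)) / integral

# grid search for the replacement age minimizing J(T)
candidates = [(expected_cost_rate(T), T) for T in range(50, 2001, 10)]
best_cost, best_T = min(candidates)
print(best_T, round(best_cost, 5))
```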
Abstract In survival and reliability analysis, the role of Markov chain models is
quite useful in solving problems where transitions are observed over time. It is very
common in survival analysis that a subject suffering from a disease at a time point
will recover at a later time. Similarly, in reliability, a machine may change state from
nondefective to defective over time. This chapter discusses the Markov chain model,
Markov chain model with covariate dependence, and Markov model for polytomous
outcome data.
11.1 Introduction
In survival analysis and reliability, Markov chain models are quite useful for solving
problems where transitions are observed over time.¹ Transitions may occur longitudinally,
and we observe changes in the state space. It is very common in survival analysis that a
subject suffering from a disease at one time point will recover at a later time. Similarly,
in reliability, a machine may change state from nondefective to defective over time. There
may also be transitions from the normal state of health to a disease state, or from a
defective state to a nondefective state after appropriate recovery procedures are applied.
In reliability analysis, we often have to construct block diagrams for analyzing
complex systems that become relatively convenient with the application of Markov
models. Similarly, in analyzing the repair process and availability problems, the
Markov models can provide useful insights. The underlying concepts of system state
and transition of states represent both the functioning and failed states over time
without making the representation complex.
The use of stochastic processes, particularly Markov models, has been increasing rapidly
in quality of service management in reliability, in both safety- and business-critical
applications (Rafiq 2015). Discrete time Markov models are used extensively in
reliability-related quality of service applications, while continuous time Markov
processes are generally used in analyzing the performance of quality of services that
depend on time, such as response time and throughput, along with several other issues of
concern in the field of reliability. The use of Markov models in statistical software
testing has been quite old (Whittaker and Thomason 1994).

¹ Sections of this chapter draw heavily from the co-author's (M. Ataharul Islam) previously published work.
Markov models have also been applied to predict the severity of risk for workers in industry (Okechukwu et al. 2016). For reinforced concrete structures, service life has been studied by modeling the degradation of concrete over time using Markov models (Possan and Andrade 2014).
In analyzing the reliability of components of both parallel and series systems, the
applications of Markov models can be very useful. In obtaining system performance
measures of reliability, the long-run steady-state properties of Markov models can
provide helpful insights. Similarly, the importance of Markov models in analyzing
survival data emerging from time series, panel, or longitudinal data is well docu-
mented. In epidemiological and survival studies there are extensive applications as well, including chronic diseases with well-defined phases such as cancer and autoimmune diseases, dementia due to neurodegenerative disease (for possible prediction and treatment strategies), depression status of the elderly, disease status over time, etc.
The outline of the chapter is as follows. Section 11.2 discusses the Markov chain.
Section 11.3 deals with the higher-order Markov chains. The first-order Markov chain
model with covariate dependence and the second-order Markov chain model with
covariate dependence are presented, respectively, in Sects. 11.4 and 11.5. Section 11.6
explains the Markov model for polytomous outcome data. Section 11.7 illustrates an
application of Markov model to analyze the health and retirement study (HRS) data.
11.2 Markov Chain

A stochastic process {Y(t)} is described by the joint distribution of its values at different time points, for example the joint cumulative distribution function of $Y(t_1)$ and $Y(t_2)$. The process is a Markov process if, for all t, the conditional distribution of a future state, given the present and all past states, depends only on the present state. For a discrete-time Markov chain, the one-step transition probability from state i at time k − 1 to state j at time k is

$$P_{ij}^{k-1,k} = P(Y_k = j \mid Y_{k-1} = i). \qquad (11.5)$$
The Markov property states that the future status of a Markov chain or a Markov process depends only on the current status, irrespective of past behavior. This implies that additional information on past states does not change the one-step transition probability of a first-order Markov chain.
The transition probability matrix for two time points t = 0, 1 can be defined as

$$P = \begin{pmatrix} P_{00}^{01} & P_{01}^{01} \\ P_{10}^{01} & P_{11}^{01} \end{pmatrix}. \qquad (11.6)$$

A simpler way of displaying the above one-step transition probability matrix is

$$P = \begin{pmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{pmatrix}. \qquad (11.7)$$
In this transition probability matrix, the probabilities are independent of time; such probabilities are called stationary transition probabilities. A stochastic process has stationary transition probabilities if

$$P(Y_{t+1} = j \mid Y_t = i) = P(Y_1 = j \mid Y_0 = i)$$

for all t and all states i, j. Hence, as the transition probabilities do not depend on time, we can simply write $P_{ij} = P(Y_{t+1} = j \mid Y_t = i)$. This can be generalized for k + 1 states as shown below for two consecutive time points:
$$P = \begin{pmatrix} P_{00} & \cdots & P_{0k} \\ P_{10} & \cdots & P_{1k} \\ \vdots & \ddots & \vdots \\ P_{k0} & \cdots & P_{kk} \end{pmatrix}, \qquad (11.9)$$

where $P_{ij} \ge 0$, $i, j = 0, 1, 2, \ldots$, and $\sum_{j=0}^{\infty} P_{ij} = 1$, $i = 0, 1, 2, \ldots$
Example 11.1 Suthaharan (2004) used the Markov model for solving the congestion control problem for the Transmission Control Protocol (TCP). For categorizing queue size, let th_min denote the minimum threshold used for queue management, th_max the maximum threshold used for queue management, and buffer size the total number of packets in the queue. Let us consider three states at time t, based on the average queue size:

State 0: average queue size in the interval [0, th_min],
State 1: average queue size in the interval (th_min, th_max),
State 2: average queue size in the interval [th_max, buffer size].

Let the successive observations of average queue size be denoted by $X_0, X_1, \ldots, X_i, \ldots$, where $X_i$ is a random variable. Then, we can define

$$p_j(t) = P(X_t = j), \qquad p_{jk}(1) = P(X_{t+1} = k \mid X_t = j)$$

for any t ≥ 0, where j, k = 0, 1, 2.
Then, the transition probability matrix of the Markov chain with state space {0, 1, 2} is
$$P(1) = \begin{pmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{pmatrix}.$$
For example,

$$P(1) = \begin{pmatrix} 0.90 & 0.05 & 0.05 \\ 0.05 & 0.90 & 0.05 \\ 0.05 & 0.05 & 0.90 \end{pmatrix}.$$
The goal is to keep the average queue size in the middle range between the
thresholds th_min and th_max. The transition probability matrix shows the one-step
transition probabilities during the interval [ti−1 , ti ], i = 0, 1, . . .
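As an illustration (not from the text), the following minimal R sketch computes multi-step transition probabilities for this matrix; repeated multiplication shows each row converging to the same limiting probabilities:

P <- matrix(c(0.90, 0.05, 0.05,
              0.05, 0.90, 0.05,
              0.05, 0.05, 0.90), nrow = 3, byrow = TRUE)
P2 <- P %*% P                          # two-step transition matrix P^(2)
Pn <- Reduce(`%*%`, rep(list(P), 50))  # P^n for large n approximates the limiting behavior
round(P2, 4)
round(Pn[1, ], 4)                      # each row converges to the stationary probabilities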
It appears clearly from a transition probability matrix that the transition probabilities are conditional probabilities in which the given values are assumed to be known. The joint probability of $Y_0 = y_0, \ldots, Y_k = y_k$ can be expressed for the first order as

$$P(Y_0 = y_0, Y_1 = y_1, \ldots, Y_k = y_k) = P(Y_0 = y_0) \times P(Y_1 = y_1 \mid Y_0 = y_0) \times \cdots \times P(Y_k = y_k \mid Y_{k-1} = y_{k-1}).$$
The initial probability of P(Y0 = y0 ) is required for the full specification of the
joint probability.
We can write a two-step transition probability of a Markov chain for k + 1 states as

$$P_{ij}^{(2)} = P(Y_{m+2} = j \mid Y_m = i). \qquad (11.10)$$

Then

$$P_{ij}^{(2)} = \sum_{s=0}^{k} P(Y_{m+2} = j, Y_{m+1} = s \mid Y_m = i) = \sum_{s=0}^{k} P(Y_{m+2} = j \mid Y_{m+1} = s, Y_m = i)\, P(Y_{m+1} = s \mid Y_m = i)$$
$$= \sum_{s=0}^{k} P(Y_{m+2} = j \mid Y_{m+1} = s)\, P(Y_{m+1} = s \mid Y_m = i) = \sum_{s=0}^{k} P_{is} P_{sj}.$$
More generally, the n-step transition probability is

$$P_{ij}^{(n)} = P(Y_{n+m} = j \mid Y_m = i), \qquad (11.11)$$

which shows the probability of making a transition from state i to state j in n steps. If we assume that the Markov chain is homogeneous with respect to time, then the above probability is invariant of time. The n-step transition probabilities then satisfy the Chapman–Kolmogorov relationship

$$P_{ij}^{(n)} = \sum_{s=0}^{k} P_{is}^{(l)} P_{sj}^{(n-l)}, \qquad P_{ij}^{(0)} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{otherwise}, \end{cases}$$

and in matrix form $P^{(n)} = P \times P \times \cdots \times P = P^n$. For the limiting behavior of a state i,

$$\lim_{n \to \infty} P_{ii}^{(n)} = \frac{1}{\mu_i},$$

where $\mu_i$ is the mean recurrence time of state i. The unconditional state probabilities are obtained from the initial distribution $p^{(0)}$ as

$$p_i^{(1)} = P(Y_1 = i) = \sum_{s=0}^{k} P(Y_1 = i \mid Y_0 = s)\, P(Y_0 = s) = \sum_{s=0}^{k} P_{si}\, p_s^{(0)},$$

or, in matrix notation, $p^{(1)} = p^{(0)} P$ and $p^{(n)} = p^{(0)} P^n$. As $n \to \infty$,

$$p^{(n)} \to \pi, \qquad (11.15)$$

the stationary (limiting) distribution.
11.3 Higher-Order Markov Chains

The first-order Markov chain depends only on the current state to determine the transition to a future state. We can extend this to second and higher orders. Let us consider three time points T = 0, 1, 2 with corresponding states $Y_0, Y_1, Y_2$, respectively. A second-order Markov chain can then be defined by

$$P(Y_k = y_k \mid Y_0 = y_0, \ldots, Y_{k-1} = y_{k-1}) = P(Y_k = y_k \mid Y_{k-1} = y_{k-1}, Y_{k-2} = y_{k-2}),$$

where the process depends only on the outcomes at times k − 1 and k − 2, and the outcomes at times 0, 1, …, k − 3 are ignored and assumed to have no contribution to the future outcomes. In the above expression, the Markov property is satisfied if we redefine the outcomes by partitioning as follows: $Y_k \mid Y_{k-1}, Y_{k-2}$, which can be viewed as $Z_k \mid Z_{k-1}$. In other words, $Z_{k-1} = (Y_{k-1}, Y_{k-2})$ and $Z_k = (Y_k)$ satisfy the Markov property, by shifting the time to k − 2 instead of defining it for k − 1. Thus, the higher-order dependence can be viewed as a first-order dependence without making the theory complex.
The second-order transition probabilities for time points t = (0, 1, 2) are displayed below:

$$\begin{array}{cc|cc}
Y_0 & Y_1 & Y_2 = 0 & Y_2 = 1 \\ \hline
0 & 0 & P_{000} & P_{001} \\
0 & 1 & P_{010} & P_{011} \\
1 & 0 & P_{100} & P_{101} \\
1 & 1 & P_{110} & P_{111}
\end{array} \qquad (11.17)$$
Similarly, for three previous time points the transition probabilities are:

$$\begin{array}{ccc|cc}
Y_0 & Y_1 & Y_2 & Y_3 = 0 & Y_3 = 1 \\ \hline
0 & 0 & 0 & P_{0000} & P_{0001} \\
0 & 0 & 1 & P_{0010} & P_{0011} \\
0 & 1 & 0 & P_{0100} & P_{0101} \\
0 & 1 & 1 & P_{0110} & P_{0111} \\
1 & 0 & 0 & P_{1000} & P_{1001} \\
1 & 0 & 1 & P_{1010} & P_{1011} \\
1 & 1 & 0 & P_{1100} & P_{1101} \\
1 & 1 & 1 & P_{1110} & P_{1111}
\end{array} \qquad (11.18)$$
The likelihood function for estimating the transition probabilities is

$$L = \prod_{i=0}^{k} \frac{n_i!}{n_{i0}!\, n_{i1}! \cdots n_{ik}!}\, P_{i0}^{n_{i0}} P_{i1}^{n_{i1}} \cdots P_{ik}^{n_{ik}}, \qquad (11.20)$$

where $n_{ij}$ denotes the number of transitions from the ith to the jth state at consecutive time points and $n_i$ denotes the total number in state i, i = 0, 1, …, k. The estimates obtained using the maximum likelihood method are

$$\hat{P}_{ij} = \frac{n_{ij}}{n_i}, \qquad i, j = 0, \ldots, k.$$

To test the null hypothesis that the transition probabilities equal specified values $P_{ij}^0$, we can use the chi-squared statistic

$$\chi^2 = \sum_{i=0}^{k} \sum_{j=0}^{k} \frac{n_i \left(\hat{P}_{ij} - P_{ij}^0\right)^2}{P_{ij}^0}. \qquad (11.21)$$
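As a minimal illustration (not from the text), the maximum likelihood estimates $\hat{P}_{ij} = n_{ij}/n_i$ can be computed in R from an observed state sequence; the data below are hypothetical:

set.seed(1)
y <- sample(0:2, 200, replace = TRUE)        # hypothetical observed chain with states 0, 1, 2
counts <- table(head(y, -1), tail(y, -1))    # n_ij: transition counts from i to j
P.hat <- prop.table(counts, margin = 1)      # divide each row by n_i
round(P.hat, 3)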
11.4 First-Order Markov Model with Covariate Dependence

The transition probabilities discussed in the previous sections can be further generalized by introducing covariate dependence. For covariate dependence, let us consider a two-state Markov model as shown below:

$$\begin{array}{c|cc}
 & Y_2 = 0 & Y_2 = 1 \\ \hline
Y_1 = 0 & \pi_{00}(x) & \pi_{01}(x) \\
Y_1 = 1 & \pi_{10}(x) & \pi_{11}(x)
\end{array} \qquad (11.22)$$
The transition probabilities are modeled as

$$P(Y_2 = 1 \mid Y_1 = 0, x) = \frac{e^{x\beta_{01}}}{1 + e^{x\beta_{01}}}, \qquad (11.23)$$

and

$$P(Y_2 = 1 \mid Y_1 = 1, x) = \frac{e^{x\beta_{11}}}{1 + e^{x\beta_{11}}}, \qquad (11.24)$$

where $\beta_{01} = (\beta_{010}, \ldots, \beta_{01p})$, $\beta_{11} = (\beta_{110}, \ldots, \beta_{11p})$, and $x = (1, x_1, \ldots, x_p)$.
It can be shown that

$$P(Y_2 = 0 \mid Y_1 = 0, x) = \frac{1}{1 + e^{x\beta_{01}}} \quad \text{and} \quad P(Y_2 = 0 \mid Y_1 = 1, x) = \frac{1}{1 + e^{x\beta_{11}}}.$$
The likelihood function for the sample of size n essentially represents the product of two separate likelihoods for the conditional probabilities based on the given values of $Y_1$. Let us denote the likelihood from state $Y_1 = 0$ by $L_0$ and the likelihood from state $Y_1 = 1$ by $L_1$. Then the log likelihood is $\ln L = \ln L_0 + \ln L_1$, where

$$\ln L_0 = \sum_{l=1}^{n_0} \left[ \delta_{01l}\, x_l \beta_{01} - \ln\!\left(1 + e^{x_l \beta_{01}}\right) \right] \qquad (11.26)$$

and

$$\ln L_1 = \sum_{l=1}^{n_1} \left[ \delta_{11l}\, x_l \beta_{11} - \ln\!\left(1 + e^{x_l \beta_{11}}\right) \right]. \qquad (11.27)$$
Differentiating with respect to the parameters and solving the following equations
we obtain the likelihood estimates for 2(p + 1) parameters:
$$\frac{\partial \ln L_0}{\partial \beta_{01q}} = \sum_{l=1}^{n_0} \left[ \delta_{01l}\, x_{lq} - \frac{x_{lq}\, e^{x_l \beta_{01}}}{1 + e^{x_l \beta_{01}}} \right] = 0, \qquad q = 0, 1, \ldots, p, \qquad (11.28)$$

and

$$\frac{\partial \ln L_1}{\partial \beta_{11q}} = \sum_{l=1}^{n_1} \left[ \delta_{11l}\, x_{lq} - \frac{x_{lq}\, e^{x_l \beta_{11}}}{1 + e^{x_l \beta_{11}}} \right] = 0, \qquad q = 0, 1, \ldots, p. \qquad (11.29)$$

The corresponding second derivatives are

$$\frac{\partial^2 \ln L_0}{\partial \beta_{01q}\, \partial \beta_{01q'}} = -\sum_{l=1}^{n_0} x_{lq} x_{lq'}\, \pi_{00}(x_l)\, \pi_{01}(x_l), \qquad (11.30)$$

$$\frac{\partial^2 \ln L_1}{\partial \beta_{11q}\, \partial \beta_{11q'}} = -\sum_{l=1}^{n_1} x_{lq} x_{lq'}\, \pi_{10}(x_l)\, \pi_{11}(x_l). \qquad (11.31)$$
To test the hypotheses H01: β011 = … = β01p = 0 and H02: β111 = … = β11p = 0, we can use likelihood ratio tests. For an individual parameter, the Wald statistic is

$$W = \frac{\hat{\beta}_{i1q}}{\widehat{se}\!\left(\hat{\beta}_{i1q}\right)}, \qquad (11.37)$$

which is asymptotically N(0, 1).
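In practice, the two conditional likelihoods can be maximized with standard logistic regression software by fitting one model for each value of the previous state. The following minimal R sketch is an illustration only; the data frame and variable names (y1, y2, x) are hypothetical:

set.seed(123)
dat <- data.frame(y1 = rbinom(500, 1, 0.4), x = rnorm(500))
dat$y2 <- rbinom(500, 1, plogis(-0.5 + 0.8 * dat$x + 1.2 * dat$y1))
fit0 <- glm(y2 ~ x, family = binomial, data = subset(dat, y1 == 0))  # estimates beta_01
fit1 <- glm(y2 ~ x, family = binomial, data = subset(dat, y1 == 1))  # estimates beta_11
summary(fit0)$coefficients   # the reported z values play the role of the Wald W-values
summary(fit1)$coefficients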
11.5 Second-Order Markov Model with Covariate Dependence

For a second-order Markov model with covariate dependence, the transition probabilities can be expressed as

$$P(Y_3 = 1 \mid Y_1 = 0, Y_2 = 0, x) = \frac{e^{x\beta_{001}}}{1 + e^{x\beta_{001}}}, \qquad
P(Y_3 = 1 \mid Y_1 = 0, Y_2 = 1, x) = \frac{e^{x\beta_{011}}}{1 + e^{x\beta_{011}}},$$

$$P(Y_3 = 1 \mid Y_1 = 1, Y_2 = 0, x) = \frac{e^{x\beta_{101}}}{1 + e^{x\beta_{101}}}, \qquad
P(Y_3 = 1 \mid Y_1 = 1, Y_2 = 1, x) = \frac{e^{x\beta_{111}}}{1 + e^{x\beta_{111}}},$$

where $\beta_{001} = (\beta_{0010}, \ldots, \beta_{001p})$, $\beta_{011} = (\beta_{0110}, \ldots, \beta_{011p})$, $\beta_{101} = (\beta_{1010}, \ldots, \beta_{101p})$, $\beta_{111} = (\beta_{1110}, \ldots, \beta_{111p})$, and $x = (1, x_1, \ldots, x_p)$.
We can now construct four likelihood functions for the given values of $Y_1 = 0, 1$ and $Y_2 = 0, 1$ as shown below:

$$L_{ij} = \prod_{k=0}^{1} \prod_{l=1}^{n_{ij}} \left[ \pi_{ijk}(x_l) \right]^{\delta_{ijkl}}, \qquad (11.39)$$
where $\delta_{ijkl} = 1$ if a transition of type i-j-k occurs for item l, and $\delta_{ijkl} = 0$ otherwise. It can be shown that $\sum_{l} \delta_{ijkl} = n_{ijk}$, $\sum_{k=0}^{1} n_{ijk} = n_{ij}$, and $\sum_{i=0}^{1}\sum_{j=0}^{1} n_{ij} = n$. Let us denote $\pi_{ijk}(x_l) = \pi_{ijkl}$; then

$$\pi_{ij1l} = \frac{e^{x_l \beta_{ij1}}}{1 + e^{x_l \beta_{ij1}}} \quad \text{and} \quad \pi_{ij0l} = \frac{1}{1 + e^{x_l \beta_{ij1}}}, \qquad i, j = 0, 1.$$

The log likelihood functions are

$$\ln L_{ij} = \sum_{l=1}^{n_{ij}} \left[ \delta_{ij1l}\, x_l \beta_{ij1} - \ln\!\left(1 + e^{x_l \beta_{ij1}}\right) \right], \qquad i, j = 0, 1. \qquad (11.40)$$
There are four different likelihoods, $L_{00}$, $L_{01}$, $L_{10}$, and $L_{11}$, for the four models based on the given values of $Y_1$ and $Y_2$. The estimating equations are

$$\frac{\partial \ln L_{ij}}{\partial \beta_{ij1q}} = \sum_{l=1}^{n_{ij}} \left[ \delta_{ij1l}\, x_{lq} - \frac{x_{lq}\, e^{x_l \beta_{ij1}}}{1 + e^{x_l \beta_{ij1}}} \right] = 0, \qquad i, j = 0, 1;\ q = 0, 1, \ldots, p, \qquad (11.41)$$

and the second derivatives are

$$\frac{\partial^2 \ln L_{ij}}{\partial \beta_{ij1q}\, \partial \beta_{ij1q'}} = -\sum_{l=1}^{n_{ij}} x_{lq} x_{lq'}\, \pi_{ij0l}\, \pi_{ij1l}, \qquad i, j = 0, 1;\ q, q' = 0, 1, \ldots, p. \qquad (11.42)$$
The information matrix $I_{ij1}$ for the model of transition type i-j-1 is obtained from the negative of the second derivatives, and the approximate variance–covariance matrix of the estimators is

$$V = \begin{bmatrix} I_{001} & 0 & 0 & 0 \\ 0 & I_{011} & 0 & 0 \\ 0 & 0 & I_{101} & 0 \\ 0 & 0 & 0 & I_{111} \end{bmatrix}^{-1}. \qquad (11.44)$$
The standard errors of the estimators for transition type i-j-1 are obtained from the corresponding diagonal elements of V. The Wald test for the null hypothesis H0: βij1q = 0 is

$$W = \frac{\hat{\beta}_{ij1q}}{\widehat{se}\!\left(\hat{\beta}_{ij1q}\right)}, \qquad (11.46)$$

which is asymptotically N(0, 1).
11.6 Markov Model for Polytomous Outcome Data

In the previous sections, two-state Markov models were shown with covariate dependence of the transition probabilities. In many instances, the number of outcomes is more than two and a further generalization is needed. In this section, a covariate-dependent multistate Markov model is discussed.
Let us consider the m-state transition probability matrix

$$\pi = \begin{bmatrix} \pi_{00} & \cdots & \pi_{0,m-1} \\ \pi_{10} & \cdots & \pi_{1,m-1} \\ \vdots & \ddots & \vdots \\ \pi_{m-1,0} & \cdots & \pi_{m-1,m-1} \end{bmatrix}, \qquad (11.47)$$

where $\pi_{us} = P(Y_j = s \mid Y_{j-1} = u)$ and $\sum_{s=0}^{m-1} \pi_{us} = 1$, $u = 0, 1, \ldots, m-1$.
Let $X_i = (1, X_{i1}, \ldots, X_{ip})$ denote the vector of covariates for the ith item, where $X_{i0} = 1$, and let $\beta_{us} = (\beta_{us0}, \beta_{us1}, \ldots, \beta_{usp})$ be the vector of parameters corresponding to the covariates for the transition from u to s. The transition probabilities are defined as follows:

$$\pi_{us}(Y_j = s \mid Y_{j-1} = u, X) = \frac{e^{g_{us}(X)}}{\sum_{k=0}^{m-1} e^{g_{uk}(X)}}, \qquad u, s = 0, 1, \ldots, m-1, \qquad (11.48)$$

where

$$g_{us}(X) = \begin{cases} 0, & \text{if } s = 0, \\[4pt] \ln \dfrac{\pi_{us}(Y_j = s \mid Y_{j-1} = u, X)}{\pi_{us}(Y_j = 0 \mid Y_{j-1} = u, X)}, & \text{if } s = 1, \ldots, m-1. \end{cases}$$
The likelihood function for transitions out of state u is

$$L_u = \prod_{i=1}^{n_u} \prod_{s=0}^{m-1} \left[ \pi_{us}(X_i) \right]^{\delta_{usi}}, \qquad u = 0, 1, \ldots, m-1, \qquad (11.49)$$

where $\delta_{usi} = 1$ if a transition of type u–s is observed for the ith item and 0 otherwise; $\sum_{i=1}^{n_u} \delta_{usi} = n_{us}$, $\sum_{s=0}^{m-1} n_{us} = n_u$, and $\sum_{u=0}^{m-1} n_u = n$. It may be noted that $\sum_{s=0}^{m-1} \delta_{usi} = 1$. The estimates of the parameters can be obtained using the overall likelihood

$$L = \prod_{u=0}^{m-1} L_u.$$
The estimating equations for transition type u to s are

$$\frac{\partial \ln L}{\partial \beta_{usk}} = \sum_{i=1}^{n} X_{ki} \left\{ \delta_{usi} - \frac{e^{\beta_{us0} + \beta_{us1} X_{1i} + \cdots + \beta_{usp} X_{pi}}}{1 + \sum_{s=1}^{m-1} e^{\beta_{us0} + \beta_{us1} X_{1i} + \cdots + \beta_{usp} X_{pi}}} \right\} = 0, \qquad (11.52)$$

or, equivalently,

$$\frac{\partial \ln L}{\partial \beta_{usk}} = \sum_{i=1}^{n} X_{ki} \left\{ \delta_{usi} - \pi_{us}(X_i) \right\} = 0. \qquad (11.53)$$

The second derivatives are

$$\frac{\partial^2 \ln L}{\partial \beta_{usk}\, \partial \beta_{usk'}} = -\sum_{i=1}^{n} X_{ki} X_{k'i}\, \pi_{us}(X_i)\left[1 - \pi_{us}(X_i)\right] = -I_{us}\!\left(\hat{\beta}_{usk}, \hat{\beta}_{usk'}\right),$$

$$\frac{\partial^2 \ln L}{\partial \beta_{usk}\, \partial \beta_{us'k'}} = \sum_{i=1}^{n} X_{ki} X_{k'i}\, \pi_{us}(X_i)\, \pi_{us'}(X_i) = -I_{us,us'}\!\left(\hat{\beta}_{usk}, \hat{\beta}_{us'k'}\right), \qquad (11.54)$$

where $u = 0, 1, \ldots, m-1$; $s = 1, 2, \ldots, m-1$; $k, k' = 0, 1, \ldots, p$.
The variance–covariance matrix of the estimators is

$$\mathrm{Var}\!\left(\hat{\beta}\right) = \left[ I\!\left(\hat{\beta}\right) \right]^{-1}, \qquad (11.55)$$

where

$$\hat{\beta}_{us} = \begin{pmatrix} \hat{\beta}_{us0} \\ \vdots \\ \hat{\beta}_{usp} \end{pmatrix}, \qquad \hat{\beta}_u = \begin{pmatrix} \hat{\beta}_{u1} \\ \vdots \\ \hat{\beta}_{u,m-1} \end{pmatrix}, \qquad \hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \vdots \\ \hat{\beta}_{m-1} \end{pmatrix},$$

and $I(\hat{\beta})$ is an $m(m-1)(p+1) \times m(m-1)(p+1)$ matrix.
To test the significance of the kth parameter for transition type u to s, the null
hypothesis is H0 : βusk = 0 and the corresponding Wald test is
$$W = \frac{\hat{\beta}_{usk}}{\widehat{se}\!\left(\hat{\beta}_{usk}\right)}, \qquad (11.57)$$
where W is asymptotically N(0, 1). Further details on basic theories, methods, and
applications of stochastic models can be found in Bailey (1964), Cox and Miller
(1965), Bhat (1971), Hoel et al. (1972), Taylor and Karlin (1984), Lawler (2006),
Ross (2007), and Islam et al. (2009, 2012).
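A covariate-dependent multistate Markov model of this form can be fitted by running one multinomial logistic regression for each previous state. The following minimal R sketch is an illustration only (not the authors' code; the data frame and variable names are hypothetical), using the nnet package:

library(nnet)
set.seed(7)
dat <- data.frame(prev = sample(0:2, 1000, replace = TRUE),
                  x1 = rnorm(1000), x2 = rbinom(1000, 1, 0.5))
dat$curr <- sample(0:2, 1000, replace = TRUE)        # hypothetical current state
fits <- lapply(0:2, function(u) {
  multinom(factor(curr) ~ x1 + x2, data = subset(dat, prev == u), trace = FALSE)
})
lapply(fits, coef)   # one (m - 1) x (p + 1) coefficient matrix per previous state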
11.7 Application
The Health and Retirement Study (HRS) data are applied to illustrate a Markov model
with three states (Islam et al. 2012). The Health and Retirement Study is conducted by
the University of Michigan nationwide on individuals over age 50 and their spouses.
The panel data used in this application have been collected every two years since 1992.
We have considered data from the first six rounds in this application. The outcome
variable is self-reported emotional health status (perceived emotional health) among
the elderly people in the USA. The perceived health statuses considered in this
application are: State 1: Poor, State 2: Fair/Good, and State 3: Very Good/Excellent.
The number of respondents in 1992 was 9772 elderly people and respondents in
subsequent follow-ups are 8039 in 1994, 7823 in 1996, 7319 in 1998, 6824 in 2000,
and 6564 in 2002. The selected explanatory variables of the models are: gender (male
= 1, female = 0), marital status (unmarried = 0, married = 1), vigorous physical
activity (3 or more days per week) (yes = 1, no = 0), ever drank any alcohol (yes =
1, no = 0), ever smoked (yes = 1, no = 0), felt depressed during the past week (yes
= 1, no = 0), felt lonely during the past week (yes = 1, no = 0), race (white = 1,
else 0; black = 1, else 0; others = reference category), and age (less than or equal to
60 years = 0 and more than 60 years = 1).
The transition counts and transition probabilities are displayed in Table 11.1. All
the transitions made by the respondents during 1992–2002 are considered and a first-
order model is assumed. It appears from Table 11.1 that about 57% remained in the
poor state of perceived emotional health status starting from poor but a considerable
percentage (40%) made transition from poor to good/fair and a small percentage
(3%) made the transition from poor to very good/excellent status. From the fair/good
perceived emotional health status, 7% reported a move to worse (poor health status),
72% reported the same status, and 21% reported improved status in subsequent
follow-up (very good/excellent). It is also important to note that nearly 1% reported
transition from very good/excellent to poor perceived emotional health status, while
25% reported a transition to good/fair, and the remaining 74% stayed in the same status of perceived emotional health (very good/excellent).
The estimates of parameters, standard errors, W-values, p-values, and confidence
intervals for the models are shown in Table 11.2. The number of models is m(m −
1) = 3 × 2 = 6. Each model contains 11 parameters. The first-order Markov
models are fitted in this application for the following transition types: (i) poor →
fair/good, (ii) poor → very good/excellent, (iii) fair/good → poor, (iv) fair/good →
very good/excellent, (v) very good/excellent → poor, and (vi) very good/excellent
→ fair/good.
In the case of the model for the transition type Poor → Fair/Good, we observe: (i) positive association with physical activity and drinking alcohol and (ii) negative association with feeling depressed. The transition type Poor → Very Good/Excellent appears to have a positive association with physical activity and with elderly blacks. The transition type Fair/Good → Poor shows negative association with marital status, physical activity, drinking alcohol, and whites and blacks as compared to Asians or other races, and positive association with smoking, feeling depressed, and feeling lonely. On the other hand, improvement to very good/excellent is associated with marital status (p < 0.10), physical activity, and drinking alcohol, and negatively associated with age, smoking, feeling depressed, and feeling lonely. The transition from very good/excellent status of perceived emotional health to poor status shows positive association with smoking and feeling depressed and negative association with marital status, physical activity, and drinking alcohol. It is also seen that the transition type Very Good/Excellent to Good/Fair is positively associated with gender, smoking, feeling depressed, feeling lonely, and blacks compared to Asians and other groups, but negatively associated with marital status, physical activity, and drinking alcohol.
Table 11.2 Estimates of three-state Markov model for perceived emotional health
Variables   Coeff.   Std. err.   W-value   p-value   95% C.I. LL   95% C.I. UL
Transition type poor → fair/good
Constant −0.520 0.189 −2.76 0.006 −0.890 −0.151
Gender −0.141 0.092 −1.53 0.127 −0.322 0.040
Marital status 0.157 0.090 1.74 0.082 −0.020 0.333
Physical activity 0.326 0.115 2.84 0.005 0.101 0.551
Drink 0.288 0.095 3.03 0.002 0.102 0.474
Smoke 0.000 0.094 0.000 0.999 −0.184 0.184
Felt depression −0.310 0.098 −3.18 0.001 −0.502 −0.119
Felt lonely −0.080 0.103 −0.77 0.439 −0.282 0.122
White 0.264 0.171 1.54 0.123 −0.071 0.599
Black 0.221 0.182 1.21 0.225 −0.136 0.578
Age −0.031 0.087 −0.35 0.724 −0.202 0.140
Transition type poor → very good/excellent
Constant −4.674 1.054 −4.44 0.000 −6.739 −2.609
Gender 0.212 0.267 0.80 0.426 −0.310 0.735
Marital status 0.264 0.269 0.98 0.327 −0.264 0.793
Physical activity 1.076 0.272 3.96 0.000 0.544 1.609
Drink 0.307 0.274 1.12 0.263 −0.231 0.845
Smoke 0.143 0.288 0.50 0.618 −0.420 0.707
Felt depression −0.380 0.280 −1.36 0.175 −0.928 0.168
Felt lonely −0.145 0.300 −0.48 0.628 −0.733 0.443
White 1.666 1.023 1.63 0.103 −0.338 3.670
Black 1.992 1.031 1.93 0.053 −0.030 4.014
Age 0.266 0.245 1.08 0.278 −0.215 0.747
Transition type fair/good → poor
Constant −1.602 0.147 −10.87 0.000 −1.891 −1.313
Gender 0.006 0.070 0.08 0.933 −0.130 0.142
Marital status −0.155 0.070 −2.22 0.027 −0.292 −0.018
Physical activity −0.293 0.075 −3.92 0.000 −0.440 −0.147
Drink −0.375 0.067 −5.60 0.000 −0.507 −0.244
Smoke 0.308 0.071 4.33 0.000 0.169 0.448
Felt depression 0.586 0.081 7.26 0.000 0.427 0.744
Felt lonely 0.262 0.086 3.02 0.002 0.092 0.431
(continued)
References
Bailey NTJ (1964) The elements of stochastic processes: with applications to the natural sciences.
Wiley, New York
Bhat UN (1971) Elements of applied stochastic processes. Wiley, New York
Cox DR, Miller HD (1965) The theory of stochastic processes. Methuen & Co Ltd, London
Hoel PG, Port SC, Stone CJ (1972) Introduction to stochastic processes. Houghton-Mifflin, Boston
Islam MA, Chowdhury RI, Huda S (2009) Markov models with covariate dependence for repeated
measures. Nova Science, New York
Islam MA, Chowdhury RI, Singh KP (2012) A Markov model for analyzing polytomous outcome
data. Pak J Stat Oper Res 8:593–603
Lawler GF (2006) Introduction to stochastic processes, 2nd edn. Chapman and Hall/CRC, Boca Raton
Okechukwu OM, Nwaoha TC, Garrick O (2016) Application of Markov theoretical model in pre-
dicting risk severity and exposure levels of workers in the oil and gas sector. Int J Mech Eng Appl
4:103–108
Possan E, Andrade JJO (2014) Markov chains and reliability analysis for reinforced concrete structure service life. Mater Res 17:593–602
Rafiq Y (2015) Online Markov chain learning for quality of service engineering in adaptive computer
systems. Ph.D. dissertation, Computer Science, University of York
Ross SM (2007) Introduction to probability models, 9th edn. Academic Press, New York
Suthaharan S (2004) Markov model based congestion control for TCP. In: 37th annual simulation
symposium, Hyatt Regency Crytal City, Arlington, VA, 18–22 Apr 2004
Taylor HM, Karlin S (1984) An introduction to stochastic modeling. Academic Press, Orlando
Whittaker JW, Thomason MG (1994) A Markov chain model for statistical software testing. IEEE
Trans Softw Eng 20(10):812–824
Chapter 12
Analysis of Big Data Using GLM
Abstract The application of the generalized linear models to big data is discussed
in this chapter using the divide and recombine (D&R) framework. In this chapter, the
exponential family of distributions for binary, count, normal, and multinomial out-
come variables and the corresponding sufficient statistics for parameters are shown
to have great potential in analyzing big data where traditional statistical methods
cannot be used for the entire data set.
12.1 Introduction
During the past decade, we have observed a rapidly growing demand for tools for analyzing big
data in every field, including reliability and survival analysis. According to Gartner
IT Glossary, big data is defined as high-volume, high-velocity and high-variety infor-
mation assets that demand cost-effective, innovative forms of information process-
ing for enhanced insight and decision making.1 The definition of big data in terms
of three Vs includes: volume: data arising from transactions, social media, medi-
cal problems, Internet, usage of Facebook, Twitter, Google and YouTube, genome
sequences, CCTV, mobile financing, sales in supermarkets, online business, etc.;
velocity: in recent times, we observed an unprecedented speed in generating data
that need to be dealt with timely, or more specifically, very quickly; variety: data are
generated from various structured or unstructured formats such as text, email, audio,
video, financial transactions, etc. There are two additional dimensions being consid-
ered by many: variability and complexity. It may be noted that big data analysis is useful for cost reduction, time reduction, new product development, the development of new strategies, and optimum decision making.
Although big data is not a new phenomenon, the volume, velocity, and variety
of big data arising from increasingly developed new frontiers of generating data in
various fields including industry, medical, and biological sciences, human behavior,
socio-economic patterns, etc., pose difficult challenges to statisticians and computer
scientists. The size of data is so big, in terms of number of cases or number of variables
or both, that we need to address new challenges by developing new techniques.
1 https://www.gartner.com/it-glossary/big-data.
The development of new techniques can be categorized into two broad classes: (i)
statistical learning and (ii) machine learning. The important techniques include the
following: (i) regression models and subset selection, (ii) classification and tree-
based methods, (iii) resampling and shrinkage methods, (iv) dimension reduction,
and (v) support vector machines and unsupervised learning.
Major steps in big data analysis are: (i) acquisition of data: needs to be filtered
using data reduction techniques taking into account heterogeneous scales of mea-
surement and dependence in measurements, (ii) extraction/cleaning of structured
and unstructured data, (iii) integration/aggregation/representation of data by inte-
grating data from heterogeneous sources using suitable database designs, (iv) anal-
ysis/modeling of data stemming from heterogeneous sources, which requires developing appropriate or suitable statistical techniques for big data, (v) dealing with the fact that big data are noisy, dynamic, heterogeneous, interrelated, and untrustworthy, which makes the work more challenging for prediction purposes, (vi) developing appropriate visualization tech-
niques, and (vii) interpretation of big data for addressing specific targets.
Donoho (2015, 2017) suggested six divisions of data science which are: (i) data
gathering, preparation, and exploration, (ii) data representation and transformation,
(iii) computing with data, (iv) data modeling, (v) data visualization and presentation,
and (vi) science about data science. It is noteworthy that the fast emergence of the
utility of big data from both private and public sectors in various fields of applications
including reliability and survival analysis has created a new research paradigm (Einav
and Levin 2014). This new paradigm involves statistical analysis with big data when
analysis on the basis of a single combined data set is not feasible using traditional
techniques due to storage limitation of data in a single computer. Another limitation
of extensive use of big data arises from the privacy concerns of unit records. Lee
et al. (2017) indicate that the appropriate use of sufficiency and summary statistics
can provide a very strong statistical base to overcome the concerns about the violation
of privacy associated with the use of unit records with identification. In addition, the use of sufficiency provides the basis for dividing the big data into smaller subsets, where the sufficiency principle can be applied to recombine the essential statistics and obtain the estimates needed for the full data set.
In recent years, statisticians have been facing new challenges, with great implications for
methodology, theory, and computation, arising from big data. The sources of these
data are not from traditional sample survey as a single combined data set but from
data generated through overwhelming use of information technology such as Face-
book, Twitter, and various other websites, in addition to data generated in very large
sizes in sectors such as tele-communication, banking, medical statistics, genetics,
environment, business, engineering, prediction, and forecasting based on big data,
etc. These challenges have manifold implications due to: (i) the nature of data emerg-
ing from wide variety of sources mostly not from the domains defined in statistical
terms, (ii) the variety, velocity, volume, and veracity are the characteristics of big
data, hence size and complexity pose difficult challenges to statisticians, and (iii) the
need to process the data and make decisions as fast as possible.
In a special issue of Statistics and Probability Letters on the role of statistics in the era of big data (Cox et al. 2018), the divergence stemming from the major issues of concern and debate is clearly visible. On one hand, there is a strong argument for the centrality of statistical modeling supported by theoretical knowledge of the phenomenon under study; on the other hand, there is a strongly opposed view held by some others (mostly computer scientists) that, with the advent of big data, models and theory are useless (Sangalli 2018). These are extreme views from the viewpoint of users. The central issue of statistics is always data, and this is challenged to a large extent by the size and complexity of data in the new paradigm; at the same time, the dimension and complexity pose formidable difficulties for the use of traditional statistical techniques, so the increasing role of computer scientists and mathematicians presents an emerging challenge. This is among the most difficult challenges statisticians have faced since the inception of the discipline, and it leads to a turning point that will either bring about new frontiers of theoretical and computational development or make statistical applications narrower in scope, limited to small or moderate sample sizes, leaving big data modeling and applications in the hands of computer scientists or, more specifically, machine learning.
big data analytics can be found in Chen and Xie (2014), Buhlmann et al. (2016),
Zomaya and Sakr (2017), Dunson (2018), Härdle et al. (2018), and Reid (2018).
In this chapter, the divide and recombine (D&R) technique proposed in recent
past (Guha et al. 2012) is studied with special focus on statistical issues of concern
such as sufficiency and dimension reduction, modeling, and estimation procedure.
The outline of the chapter is as follows. Section 12.2 presents a short note on
sufficiency and dimensionality. Section 12.3 discusses the generalized linear models.
Section 12.4 explains the divide and recombine technique for different link functions.
Finally, Sect. 12.5 presents some comments on the chapter.
12.2 Sufficiency and Dimensionality

The notion of sufficient statistics was first introduced by Fisher (1920, 1922, 1925)
and studied extensively by many others (Pitman 1936, Koopman 1936, Halmos and
Savage 1949, Lehmann 1959, Bahadur 1954). Fraser (1961, 1963) showed that the
likelihood function can be used to analyze the effect of sampling on the dimension-
ality of the sufficient statistics. It was shown that fixed dimension for the sufficient
statistic is restricted to the exponential family.
Let us consider n observations $(x_1, \ldots, x_n)$; then the reduced statistic can be obtained from the log likelihood, where

$$l(\theta \mid x_1, \ldots, x_n) = \sum_{i=1}^{n} l(\theta \mid x_i). \qquad (12.1)$$

For a sufficient statistic of fixed dimension r, the log likelihood of a single observation can be expressed as

$$l(\theta \mid x) = \phi_0(\theta) + \sum_{j=1}^{r} a_j(x)\, \phi_j(\theta), \qquad (12.2)$$

where $(\phi_1(\theta), \ldots, \phi_r(\theta))$ are assumed to be the basis generating the space of r minimum dimensions. Then the probability density takes the form

$$f(x \mid \theta) = f(x \mid \theta_0)\, e^{\phi_0(\theta) + \sum_{j=1}^{r} a_j(x)\, \phi_j(\theta)}, \qquad (12.3)$$

where the fixed-dimensional sufficient statistics (of dimension r) are indexed by the variables $(u_1, \ldots, u_r)$.
12.3 Generalized Linear Models

In a univariate generalized linear model for the outcome variable Y, the exponential form is

$$f(y; \theta, \phi) = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)}, \qquad (12.5)$$

where θ is the canonical parameter, φ is the dispersion parameter, and the linear predictor is g(μ) = Xβ. Using the Koopman–Darmois–Pitman form, the data reduction can be shown for a sample of size 1, where $r_0(y, \phi) = e^{c(y,\phi)}$ and the sufficient statistic is y. For a sample of size n, the likelihood function of this exponential form is

$$L(\theta, \phi, y) = r_0(y, \phi)\, e^{\sum_{i=1}^{n} a(y_i)\, c(\theta_i)}. \qquad (12.7)$$

The maximum likelihood estimates of the parameters can be found by solving the system of equations

$$\frac{1}{a(\phi)} \sum_{i=1}^{n} \left[ y_i - \mu_i \right] x_{ij} = 0, \qquad j = 0, 1, \ldots, p. \qquad (12.11)$$
More on generalized linear models is given in Chap. 8. Fahrmeir and Tutz (2001) explain multivariate statistical modelling based on generalized linear models.
12.4 Divide and Recombine

The generalized linear models and the estimation procedures for their parameters are discussed in Sects. 12.2 and 12.3. These estimation procedures are quite attractive if the sample size is small, moderate, or large in the usual statistical sense.
However, in the case of big data, it would not be possible to use the same procedure
due to the amount of data being captured and stored. We need to review the theory,
methodology, and computation techniques for big data analysis. Let us consider
a situation where we have to consider 1,000,000 observations with 100 variables
including both outcome and explanatory variables for each item resulting in a total
of 1,000,000 × 100 observations. In reality, the data would be much more complex
and bigger. This is not a typical statistical challenge simply due to the size of data
and hence we need to find a valid way to use all the data without sacrificing statistical
rigor. In this case, one logical solution is to divide and recombine data (see Guha
et al. 2012, Cleveland and Hafen 2014, Hafen 2016, Liu and Li 2018). The idea is
simple: we have to divide the big data into subsets, each analytic method is applied
to subsets and the outputs are recombined in a statistically valid manner. In the
process of dividing and recombining, the big data set is partitioned into manageable
subsets of smaller data and analytic methods such as fitting of models are performed
independently for each subset. One way to recombine is to use the average of the
estimated model coefficients obtained from each subset (Guha et al. 2012). The
resulting estimates may not be exact, due to the choice of the recombining procedure, but they are statistically valid. The advantage is obvious: we can make use of statistical techniques, using R or other available statistical packages, without any constraint arising from the size of the big data.
Lee et al. (2017) summarized D&R steps as follows: (i) the subsets are obtained
by dividing the original big data into manageable smaller groups; (ii) the estimates
or sufficient statistics are obtained for the subsets; and (iii) the results from subsets
are combined by using some kind of averaging to obtain the estimate for the whole
data set. According to Hafen (2016), the division into subsets can be performed by
either replicate division or conditioning variable division. Replicate division takes
into account random sampling without replacement and the conditioning variable
division considers stratification of the data based on one or more variables included
in the data. A feasible measure of a good fit is the least discrepancy with the estimate
obtained from the entire data set. Other than a few exceptions, D&R results are
approximate (Lee et al. 2017).
The partitioning of big data may be performed using either the random sampling
without replacement or applying some conditioning variables. It may be noted that
replicate division divides the data based on random sampling without replacement
while conditioning variable division stratifies the data according to one or more
variables in the data (Lee et al. 2017). Guha et al. (2012) defined the replicate division
for the data with n observations and p variables under the same experiment and
conditions. The random-replicate division uses random sampling of observations
without replacement to create subsets. This is attractive and computationally fast but
it makes no effort to create subsets each of which is representative of the data set.
Lee et al. (2017) referred to two procedures for recombining the results: (i) summary statistics D&R and (ii) horizontal D&R. They illustrated the procedures with the regression model, i.e., the GLM with identity link function,

$$\theta = g(\mu) = \mu = E(Y) = X\beta,$$

or alternatively

$$Y = X\beta + \varepsilon. \qquad (12.12)$$
For big data, let us consider partitioning into S subsets. Then the estimates can be
obtained by (i) summary statistics D&R and (ii) horizontal D&R.
Case 1 (Summary Statistics) The summary statistics D&R, shown in Fig. 12.1, includes the following steps (Lee et al. 2017):

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: Compute $X_s'X_s$ and $X_s'Y_s$, s = 1, 2, …, S.
Step III: Recombine using $\left( \sum_{s=1}^{S} X_s'X_s \right)^{-1} \sum_{s=1}^{S} X_s'Y_s$. Chen et al. (2006) referred to this as the regression cube technique. This result provides the same estimate as the one obtained from the entire data set, since

$$X'X = \sum_{s=1}^{S} X_s'X_s \quad \text{and} \quad X'Y = \sum_{s=1}^{S} X_s'Y_s. \qquad (12.14)$$

Fig. 12.1 Flowchart displaying D&R method for linear regression (summary statistics)

Fig. 12.2 Flowchart displaying D&R method for linear regression (horizontal)
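As a minimal illustration of the summary statistics D&R idea (not code from the text; the simulated data are hypothetical), the subset cross-product matrices can be accumulated and recombined in R as follows:

set.seed(1)
n <- 10000; p <- 3; S <- 10
X <- cbind(1, matrix(rnorm(n * p), n, p))
Y <- X %*% c(2, 1, -1, 0.5) + rnorm(n)
idx <- split(seq_len(n), rep(1:S, length.out = n))         # replicate division into S subsets
XtX <- Reduce(`+`, lapply(idx, function(i) crossprod(X[i, ])))        # sum of X_s' X_s
XtY <- Reduce(`+`, lapply(idx, function(i) crossprod(X[i, ], Y[i])))  # sum of X_s' Y_s
beta.DR <- solve(XtX, XtY)                                 # recombined estimate
cbind(beta.DR, coef(lm(Y ~ X - 1)))                        # identical to the full-data fit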
Case 2 (Horizontal) The horizontal D&R, shown in Fig. 12.2, includes the following steps (Lee et al. 2017):

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: Compute $X_s'X_s$ and $X_s'Y_s$, s = 1, 2, …, S, and compute

$$\hat{\beta}_s = \left( X_s'X_s \right)^{-1} X_s'Y_s. \qquad (12.15)$$
Xi et al. (2008) showed a way to divide and recombine the big data for analyzing and
predicting categorical attributes in a data cube format. For binary outcome variable
Y and explanatory variable vector X, the logit link function is
$$\ln \frac{\mu}{1 - \mu} = x\beta, \qquad (12.17)$$

and the estimating equations are

$$\frac{\partial l}{\partial \beta_j} = \sum_{i=1}^{n} \left[ y_i - \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} \right] x_{ij} = 0, \qquad j = 1, \ldots, p,$$
where l is the log likelihood for the entire data. The estimate of β based on all the data in the big data set is denoted by $\hat{\beta}$. If we divide the data set into S strata, each of size $n_s$, s = 1, 2, …, S, then $n = \sum_{s=1}^{S} n_s$. For the partitioned data sets s = 1, 2, …, S, we can define

$$\ln \frac{\mu_s}{1 - \mu_s} = x_s \beta_s, \qquad s = 1, 2, \ldots, S. \qquad (12.18)$$
The independent estimating equations for each partitioned set of data are

$$\frac{\partial l_s}{\partial \beta_{sj}} = \sum_{i=1}^{n_s} \left[ y_{si} - \frac{e^{x_{si} \beta_s}}{1 + e^{x_{si} \beta_s}} \right] x_{sij} = 0, \qquad j = 1, \ldots, p;\ s = 1, 2, \ldots, S,$$
where $l_s$ denotes the log likelihood for subset s, s = 1, 2, …, S. As the data are drawn independently, it may be shown that

$$l = \sum_{s=1}^{S} l_s,$$

so that

$$\frac{\partial l}{\partial \beta} = \sum_{s=1}^{S} \frac{\partial l_s}{\partial \beta} = 0$$

provides the maximum likelihood estimators for the overall as well as for the partitioned sets of data, as the whole data set is assumed to be drawn from the same population. This implies that the maximum likelihood estimates obtained from

$$\frac{\partial l_s}{\partial \beta} = 0$$

are equivalent to those obtained from

$$\frac{\partial l_s}{\partial \beta_s} = 0$$

under the assumption of independent and identically distributed big data, because $\beta_1 = \beta_2 = \cdots = \beta_S = \beta$. The estimating equations under this assumption are

$$\frac{\partial l_s}{\partial \beta_{sj}} = \sum_{i=1}^{n_s} \left[ y_{si} - \frac{e^{x_{si}\beta_s}}{1 + e^{x_{si}\beta_s}} \right] x_{sij} = \sum_{i=1}^{n_s} \left[ y_{si} - \frac{e^{x_{si}\beta}}{1 + e^{x_{si}\beta}} \right] x_{sij} = 0, \qquad j = 1, \ldots, p;\ s = 1, 2, \ldots, S.$$

Since $l = \sum_{s=1}^{S} l_s$,

$$\sum_{s=1}^{S} \frac{\partial l_s}{\partial \beta_{sj}} = \sum_{s=1}^{S} \sum_{i=1}^{n_s} \left[ y_{si} - \frac{e^{x_{si}\beta}}{1 + e^{x_{si}\beta}} \right] x_{sij} = 0. \qquad (12.19)$$
Approach 12.1
Zuo and Li (2018) used the logistic regression model

$$Y_i = \mu_i + \varepsilon_i, \qquad (12.20)$$

where

$$\mu_i = \pi_i = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}},$$

β is a (p + 1) × 1 vector of coefficients, X is the n × (p + 1) data matrix with $x_i$ the ith row of X, and $\varepsilon_i$ is an error variable with mean zero and variance $\pi_i(1 - \pi_i)$. The maximum likelihood estimator of β is

$$\hat{\beta} = C^{-1} X' \hat{W} Z, \qquad (12.21)$$

where $C = X' \hat{W} X$, $\hat{W} = \mathrm{diag}\{\hat{\pi}_i (1 - \hat{\pi}_i)\}$, and Z is a column vector whose ith element is

$$\ln \frac{\hat{\pi}_i}{1 - \hat{\pi}_i} + \frac{y_i - \hat{\pi}_i}{\hat{\pi}_i (1 - \hat{\pi}_i)}.$$

Here, $\hat{\beta}$ is an asymptotically unbiased estimator of β. For the sth partitioned subset, the estimator of the logistic regression parameters is obtained in the same way from $X_s$, $\hat{W}_s$, and $Z_s$.

Fig. 12.3 Flowchart displaying D&R method for logistic regression (Approach 12.1)
Approach 12.2
Alternatively, it is also possible to divide and recombine using the following steps:

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: For the sth partitioned subset (s = 1, 2, …, S), compute the subset estimate $\hat{\beta}_s$ from the subset estimating equations.
Fig. 12.4 Flowchart displaying D&R method for logistic regression (Approach 12.2)

Approach 12.3
Xi et al. (2008) proposed a method based on a first-order Taylor approximation of the first derivative of the subset log likelihood with respect to β, expanded at $\hat{\beta}_s$, as shown below:

$$l_s^* = -\sum_{i=1}^{n_s} \hat{\mu}_{si}\!\left(\hat{\beta}_s\right) x_{si}' x_{si} \left(\beta - \hat{\beta}_s\right),$$

where $\hat{\mu}_{si}(\hat{\beta}_s) = \dfrac{e^{x_{si}\hat{\beta}_s}}{1 + e^{x_{si}\hat{\beta}_s}}$. The above expression can be rewritten as $l_s^* = -A_s\left(\beta - \hat{\beta}_s\right)$, where

$$A_s = \sum_{i=1}^{n_s} \hat{\mu}_{si}\!\left(\hat{\beta}_s\right) x_{si}' x_{si}.$$
The recombined estimates are obtained from $l^* = \sum_{s=1}^{S} l_s^* = 0$. Let $A = \sum_{s=1}^{S} A_s$. Then the estimators are

$$\hat{\beta} = \left( \sum_{s=1}^{S} A_s \right)^{-1} \sum_{s=1}^{S} A_s \hat{\beta}_s. \qquad (12.26)$$
The steps for obtaining estimates using the divide and recombine technique are described below.

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S), respectively.
Step II: For the sth partitioned subset (s = 1, 2, …, S), compute

$$A_s = \sum_{i=1}^{n_s} \hat{\mu}_{si}\!\left(\hat{\beta}_s\right) x_{si}' x_{si} \quad \text{and} \quad \hat{\beta}_s,$$

where the parameters for subset s, $\hat{\beta}_s$, are obtained from the estimating equations

$$\frac{\partial l_s}{\partial \beta} = \sum_{i=1}^{n_s} \left[ y_{si} - \mu_{si}(\beta) \right] x_{si} = 0.$$

Step III: Recombine using (12.26).

Fig. 12.5 Flowchart displaying D&R method for logistic regression (Approach 12.3)
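A minimal R sketch of this type of recombination is given below (an illustration only, not the authors' implementation; the simulated data are hypothetical). Each subset is fitted with glm(), and the subset estimates are recombined with the weight matrices A_s as in (12.26):

set.seed(2)
n <- 20000; S <- 20
x <- rnorm(n); y <- rbinom(n, 1, plogis(-1 + 0.7 * x))
subsets <- split(data.frame(y, x), rep(1:S, length.out = n))
A.list <- list(); b.list <- list()
for (s in seq_along(subsets)) {
  d <- subsets[[s]]
  fit <- glm(y ~ x, family = binomial, data = d)
  Xs <- model.matrix(fit)
  mu <- fitted(fit)
  A.list[[s]] <- crossprod(Xs * mu, Xs)      # A_s = sum_i mu_hat_si * x_si' x_si
  b.list[[s]] <- A.list[[s]] %*% coef(fit)   # A_s %*% beta_hat_s
}
A <- Reduce(`+`, A.list)
beta.DR <- solve(A, Reduce(`+`, b.list))     # recombined estimate, Eq. (12.26)
cbind(beta.DR, coef(glm(y ~ x, family = binomial)))  # close to the full-data fit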
In previous Sects. (12.4.1 and 12.4.2), we have shown D&R applications for identity
and logit link functions. In this section, the method is illustrated for count data. Let
us consider count variables Y = (Y1 , . . . , Yn ) with observations y = (y1 , . . . , yn )
where n is very large. Let us define the data matrix comprising the observed values of the explanatory variables $X = (x_1, \ldots, x_n)'$, where $x_i = (1, x_{i1}, \ldots, x_{ip})$, and the vector of regression coefficients $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$. The Poisson distribution for count data is

$$f(y, \lambda) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \qquad y = 0, 1, \ldots \qquad (12.27)$$

With the log link $\ln \lambda_i = x_i \beta$, the maximum likelihood estimates of the parameters are obtained by solving

$$\frac{1}{a(\phi)} \sum_{i=1}^{n} \left[ y_i - e^{x_i \beta} \right] x_{ij} = 0, \qquad j = 0, 1, \ldots, p, \qquad (12.29)$$

where $x_{i0} = 1$.
The estimates can be obtained iteratively from

$$\left( X' W X \right) \hat{\beta}^{(m)} = X' W z^{(m-1)},$$

where

$$z_i = \sum_{j=0}^{p} x_{ij}\, \hat{\beta}_j^{(m-1)} + (y_i - \mu_i) \frac{\partial \eta_i}{\partial \mu_i},$$

$\mu_i$ and $\partial \eta_i / \partial \mu_i$ are evaluated at $\hat{\beta}^{(m-1)}$, and W is a diagonal matrix with ith element

$$w_{ii} = \frac{1}{\mathrm{Var}(Y_i)} \left( \frac{\partial \mu_i}{\partial \eta_i} \right)^2.$$
The estimators for the S subsets are obtained by performing the iterative process until convergence is attained, for s = 1, 2, …, S:

$$\left( X_s' W_s X_s \right)^{(m-1)} \hat{\beta}_s^{(m)} = X_s' W_s z_s^{(m-1)}, \qquad s = 1, 2, \ldots, S, \qquad (12.31)$$

evaluated at $\hat{\beta}_s^{(m-1)}$, where

$$z_{si} = \sum_{j=0}^{p} x_{sij}\, \hat{\beta}_{sj}^{(m-1)} + (y_{si} - \mu_{si}) \frac{\partial \eta_{si}}{\partial \mu_{si}},$$

$\mu_{si}$ and $\partial \eta_{si} / \partial \mu_{si}$ are evaluated at $\hat{\beta}_s^{(m-1)}$, and $W_s$ is a diagonal matrix with ith element

$$w_{ii} = \frac{1}{\mathrm{Var}(Y_{si})} \left( \frac{\partial \mu_{si}}{\partial \eta_{si}} \right)^2.$$
The steps for D&R are discussed below, and Fig. 12.6 displays them graphically for the Poisson regression model.

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: For the sth partitioned subset (s = 1, 2, …, S), compute $X_s'W_sX_s$, $X_s'W_sz_s$, and $\hat{\beta}_s$, s = 1, 2, …, S.
Step III: Recombine using the estimates obtained in Step II as follows:

$$\hat{\beta} = \left( \sum_{s=1}^{S} X_s' W_s X_s \right)^{-1} \sum_{s=1}^{S} X_s' W_s z_s. \qquad (12.32)$$

Fig. 12.6 Flowchart displaying D&R method for Poisson regression for count data
For the generalized linear model with log link function, the estimating equations based on the entire data set are

$$\frac{1}{a(\phi)} \sum_{i=1}^{n} \left[ y_i - e^{x_i \beta} \right] x_{ij} = 0, \qquad j = 0, 1, \ldots, p. \qquad (12.33)$$
The steps for D&R are discussed below, and Fig. 12.7 illustrates the computational procedure for the generalized linear model with log link function.

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: For the sth partitioned subset (s = 1, 2, …, S), compute $\hat{\beta}_s$, s = 1, …, S, by solving the estimating equations

$$\frac{1}{a_s(\phi)} \sum_{i=1}^{n_s} \left[ y_{si} - e^{x_{si} \beta_s} \right] x_{sij} = 0, \qquad s = 1, 2, \ldots, S;\ j = 0, 1, \ldots, p.$$
Fig. 12.7 Flowchart displaying D&R method for Poisson regression with log link for count data
Step III: Recombine by averaging the subset estimates:

$$\hat{\beta}_R = \frac{\sum_{s=1}^{S} \hat{\beta}_s}{S}. \qquad (12.34)$$
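A minimal R sketch of this horizontal D&R for the Poisson model is given below (an illustration only; the simulated data are hypothetical):

set.seed(3)
n <- 50000; S <- 25
x <- runif(n); y <- rpois(n, exp(0.5 + 1.2 * x))
subsets <- split(data.frame(y, x), rep(1:S, length.out = n))
betas <- sapply(subsets, function(d) coef(glm(y ~ x, family = poisson, data = d)))
beta.R <- rowMeans(betas)                          # recombined estimate, Eq. (12.34)
cbind(beta.R, coef(glm(y ~ x, family = poisson)))  # compare with the full-data fit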
For multinomial outcome data with J categories, cell counts $Y_1, \ldots, Y_J$, and cell probabilities $\pi_1, \ldots, \pi_J$, the multinomial distribution is

$$P(Y_1 = y_1, \ldots, Y_J = y_J) = \frac{n!}{y_1! \cdots y_J!}\, \pi_1^{y_1} \cdots \pi_J^{y_J}. \qquad (12.35)$$

If $Y_1, \ldots, Y_J$ are independent Poisson variables with means $\mu_1, \ldots, \mu_J$ and $\mu = \sum_{j=1}^{J} \mu_j$, then, conditionally on the total $\sum_{j=1}^{J} Y_j = n$,

$$P\!\left(Y_1 = y_1, \ldots, Y_J = y_J \,\middle|\, \sum_{j=1}^{J} Y_j = n\right) = \frac{\prod_{j=1}^{J} e^{-\mu_j} \mu_j^{y_j} / y_j!}{e^{-\mu} \mu^{n} / n!} = n! \prod_{j=1}^{J} \frac{(\mu_j/\mu)^{y_j}}{y_j!}, \qquad (12.36)$$

which is equivalent to the multinomial form with $\pi_j = \mu_j/\mu$. We can express the above distribution in exponential form:

$$P\!\left(Y_1 = y_1, \ldots, Y_J = y_J \,\middle|\, \sum_{j=1}^{J} Y_j = n\right) = e^{\sum_{j=1}^{J} y_j \ln(\mu_j/\mu) + \ln(n!) - \sum_{j=1}^{J} \ln(y_j!)}.$$

The link functions for $Y_1, \ldots, Y_J$ are $\ln(\mu_{ij}/\mu_{i1}) = x_{ij}\beta_j$, i = 1, 2, …, n, where $x_{ij} = (1, x_{ij1}, \ldots, x_{ijp})$ and $\beta_j = (\beta_{j0}, \beta_{j1}, \ldots, \beta_{jp})'$.
The log likelihood function is

$$l = \sum_{i=1}^{n} \left[ \sum_{j=1}^{J} y_{ij} \ln \frac{\mu_{ij}}{\mu_i} + \ln(n!) - \ln\!\left( \prod_{j=1}^{J} y_{ij}! \right) \right]
= \sum_{i=1}^{n} \left[ \sum_{j=2}^{J} y_{ij} (x_{ij}\beta_j) - \ln\!\left( 1 + \sum_{j=2}^{J} e^{x_{ij}\beta_j} \right) + \ln(n!) - \ln\!\left( \prod_{j=1}^{J} y_{ij}! \right) \right]. \qquad (12.37)$$
The estimating equations are

$$\frac{\partial l}{\partial \beta_{jk}} = \sum_{i=1}^{n} \left[ y_{ij} - \pi_{ij}(\beta) \right] x_{ijk} = 0, \qquad j = 2, \ldots, J;\ k = 0, 1, 2, \ldots, p,$$

and the information matrix is

$$I(\beta) = \sum_{i=1}^{n} \pi_i(\beta)\left[1 - \pi_i(\beta)\right] x_i x_i'.$$

For the sth partitioned subset, the estimating equations are

$$\frac{\partial l_s}{\partial \beta_{sjk}} = \sum_{i=1}^{n_s} \left[ y_{sij} - \pi_{sij}(\beta_s) \right] x_{sijk} = 0, \qquad s = 1, \ldots, S;\ j = 2, \ldots, J;\ k = 0, 1, 2, \ldots, p.$$
The role of statistics in big data analysis has become a focal issue in the recent debate
on data science. For big data, the statisticians need to address some formidable chal-
lenges that require developing new theories, methods, and tools for data integration
and visualization in dealing with volume, velocity, and variability of big structured
or unstructured data. The role of sufficiency in reducing the dimension of data is well
known in statistics. The exponential family provides sufficient statistics and GLM
can be applied as a possible modeling approach for analyzing big data. It is shown in
this chapter that the D&R technique can be used for analyzing big data by employ-
ing a data reduction strategy. Using the D&R approach, the big data set is partitioned into
smaller subsets and sufficient statistics from each subset are used to obtain recom-
bined estimates. It can be shown that the recombined estimates are very close to the
aggregate values if appropriate GLM technique is used. The D&R method through
the use of GLM can be a very useful modeling approach in big data.
References
Bahadur RR (1954) Sufficiency and statistical decision functions. Ann Math Stat 25:423–462
Buhlmann P, Petros D, Michael K, van der Mark L (2016) Handbook of big data. Routledge, London
Chen Y, Dong G, Han J, Pei J, Wah BW, Wang J (2006) Regression cubes with lossless compression
and aggregation. IEEE Trans Knowl Data Eng 18:1–15
Chen X, Xie M (2014) A split-and-conquer approach for analysis of extraordinarily large data. Stat
Sinica 24:1655–1684
Cleveland S, Hafen R (2014) Divide and recombine (D&R): data science for large complex data.
Stat Anal Data Min 7:425–433
Cox DR, Kartsonaki C, Keogh RH (2018) Big data: some statistical issues. Stat Probab Lett 136:111–115
Dobson AJ, Barnett AG (2018) An introduction to generalized linear models, 4th edn. CRC Press,
Boca Raton
Donoho D (2015) 50 Years of data science. Presentation at the Tukey Centennial Workshop, Prince-
ton, New Jersey, Sep 2015
Donoho D (2017) 50 Years of data science. J Comput Graph Stat 26(4):745–766
Dunson DB (2018) Statistics in the big data era: failures of the machine. Stat Probab Lett 136:4–9
Einav L, Levin J (2014) Economics in the age of big data. Science 346:1243089-1, -5
Fahrmeir L, Tutz G (2001) Multivariate statistical modelling based on generalized linear models,
2nd edn. Springer, New York
Fisher RA (1920) A mathematical examination of the method of determining the accuracy of an
observation by the mean error and by the mean square error. Mon Not R Astron Soc 80(8):758–770
Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philos Trans R Soc
Lond A 222:309–368
Fisher RA (1925) Theory of statistical estimation. Proc Camb Philos Soc 22:700–725
Fraser DAS (1961) Invariance and the fiducial method. Biometrika 48:261–280
Fraser DAS (1963) On sufficiency and the exponential family. J R Stat Soc Ser B 25:115–123
Guha S, Hafen R, Rounds J, Xia J, Li J, Xi B, Cleveland WS (2012) Large complex data: divide
and recombine (D&R) with RHIPE. Stat 1(1):53–67
Hafen R (2016) Divide and recombine: approach for detailed analysis and visualization of large
complex data. Handbook of big data. Chapman and Hall, Boca Raton
Halmos PR, Savage LJ (1949) Application of the radon-nikodym theorem to the theory of sufficient
statistics. Ann Math Stat 20:225–241
Härdle WK, Lu HHS, Shen X (eds) (2018) Handbook of big data analytics. Springer
Koopman BO (1936) On distribution admitting a sufficient statistic. Trans Am Math Soc 39:399–409
Lee JYL, Brown JJ, Ryan MM (2017) Sufficiency revisited: rethinking statistical algorithms in the
big data era. Am Stat 71(3):202–208
Lehmann EL (1959) Theory of hypothesis testing. Wiley, New York
Liu W, Li Y (2018) A new stochastic restricted Liu estimator for the logistic regression model.
Open J Stat 8:25–37
Pitman EJG (1936) Sufficient statistics and intrinsic accuracy. Proc Camb Philos Soc 32:567–579
Reid N (2018) Statistical science in the world of big data. Stat Probab Lett 136:42–45
Sangalli LM (2018) The role of statistics in the era of big data. Stat Probab Lett 136:1–3
Xi R, Lin N, Chen Y (2008) Compression and aggregation for logistic regression analysis in data
cubes. IEEE Trans Knowl Data Eng 1(1):1–14
Zomaya AY, Sakr S (eds) (2017) Handbook of big data technologies. Springer
Zuo W, Li Y (2018) A new stochastic restricted Liu estimator for the logistic regression model. Open J Stat 8:25–37
Appendix A
Programming Codes in R
summary(Age.days)
mean(Age.days, trim = 0.05)
sd(Age.days)
sd(Age.days)/mean(Age.days)*100
IQR(Age.days, type=6)
summary(Usage.km)
mean(Usage.km, trim = 0.05)
sd(Usage.km)
sd(Usage.km)/mean(Usage.km)*100
IQR(Usage.km, type=6)
cor(Age.days,Usage.km)
Example 3.3 Weibull distribution fitting for uncensored data. The variable usage
(in km at failure) is denoted by Usage.km.
library(survival)
delta <- rep(1, length(Usage))
Weib.fit <- survreg(Surv(Usage.km, delta) ~ 1, dist='weibull')
eta.hat <- exp(Weib.fit$coefficient)
Example 3.4 Lognormal distribution fitting for uncensored data. The variable usage
(in km at failure) is denoted by Usage.km.
library(survival)
delta <- rep(1, length(Usage))
lognorm.fit <- survreg(Surv(Usage.km, delta) ~ 1, dist='lognormal')
mu.hat <- lognorm.fit$coefficients
sigma.hat <- lognorm.fit$scale
MTTF <- exp(mu.hat + sigma.hat^2/2)
Example 5.3 Comparison of two survival functions. In this example, the data frame
named bat consists of three variables time, status, and x, where time denotes the time in
months, status denotes the censoring indicator for time (1 means failure and 0 means
censored), and x represents a factor with two levels Maintained and Nonmaintained.
library(survival)
test <- survdiff(Surv(time, status) ~ x, data = bat)
test$chisq
survdiff(Surv(time, status) ~ x, data = bat, rho=0)
survdiff(Surv(time, status) ~ x, data = bat.IPS, rho=1)
S.group <- survfit(Surv(time, status) ~ x, data = bat)
summary(S.group)
plot(S.group, main="S(t) for two groups for IPS Battery data", xlab="Time in months",
  ylab="Proportion surviving", lty = 2:3, col=1:2, lwd=2)
legend("topright", c("Maintained", "Nonmaintained"), lty = 2:3,
  col=1:2, text.col=c(1,2), lwd=2)
Example 6.2 Exponential distribution fitting for failure and censored data. The data and variables are the same as in Example 5.2.

library(survival)
battery.exp <- survreg(Surv(time, status) ~ 1, data = batterydata, dist='exponential')
battery.exp$coefficients
lambda.hat <- 1/exp(battery.exp$coefficients)
exp.mean <- 1/lambda.hat
par(mfrow=c(2,2))
t <- seq(0, 4500, 0.5)
f <- dexp(t, rate = lambda.hat, log = FALSE)
plot(t, f, main="Probability density function", xlab="Days, t",
  ylab="f(t)", col=1, lty=1, type="l", lwd=2)
F <- pexp(t, rate = lambda.hat, lower.tail = TRUE, log.p = FALSE)
plot(t, F, main="Cumulative distribution function", xlab="Days, t",
  ylab="F(t)", col=1, lty=1, type="l", lwd=2)
R <- 1 - pexp(t, rate = lambda.hat, lower.tail = TRUE, log.p = FALSE)
plot(t, R, main="Reliability function", xlab="Days, t", ylab="R(t)",
  col=1, lty=1, type="l", lwd=2)
h <- rep(lambda.hat, length(t))
plot(t, h, main="Hazard function", ylim=c(0,0.0020), xlab="Days, t",
  ylab="h(t)", col=1, lty=1, type="l", lwd=2)
Example 6.3 Weibull distribution fitting for failure and censored data. The data and variables are the same as in Example 5.2.

battery.Weib <- survreg(Surv(time, status) ~ 1, data=batterydata, dist='weibull')
# scale parameter
lambda.hat <- exp(battery.Weib$coefficient)
# shape parameter
alpha.hat <- 1/battery.Weib$scale
MTTF <- lambda.hat*gamma(1 + 1/alpha.hat)
par(mfrow=c(2,2))
t <- seq(0, 2500, 0.5)
f <- dweibull(t, shape=alpha.hat, scale = lambda.hat, log = FALSE)
plot(t, f, main="Probability density function", xlab="Days, t",
  ylab="f(t)", col=1, lty=1, type="l", lwd=2)
F <- pweibull(t, shape=alpha.hat, scale = lambda.hat, lower.tail = TRUE, log.p = FALSE)
plot(t, F, main="Cumulative distribution function", xlab="Days, t",
  ylab="F(t)", col=1, lty=1, type="l", lwd=2)
R <- 1 - pweibull(t, shape=alpha.hat, scale = lambda.hat, lower.tail = TRUE, log.p = FALSE)
plot(t, R, main="Reliability function", xlab="Days, t", ylab="R(t)",
  col=1, lty=1, type="l", lwd=2)
h <- f/R
plot(t, h, main="Hazard function", ylim=c(0,0.007), xlab="Days, t",
  ylab="h(t)", col=1, lty=1, type="l", lwd=2)
Example 7.1 Fitting of proportional hazard (PH) model based on hypothetical data.
library(survival)
data7.1 <- list(age=c(61, 62, 63, 64, 65),
status=c(1,0,1,1,0),
gender=c(0,1,1,0,0))
ph.fit <- coxph(Surv(age, status) ~ gender, data7.1)
ph.fit
summary(ph.fit)
# or
beta.hat <- log(3)/2
I <- 6*exp(beta.hat)/(2*exp(beta.hat)+3)^2 +2*exp(beta.hat)/(exp(beta.
hat)+2)^2
1/I
exp(-beta.hat)
Example 7.2 Weibull regression model. In this example, the data frame named
auto.data consists of the following variables:
Example 9.2 Reliability function of a competing risk model (or series system).
t <- seq(1,10000)
lambda1 <- 0.0006; lambda2 <- 0.0004;
lambda <- lambda1+lambda2
R.FM1 <- 1 - pexp(t, rate=lambda1, lower.tail=T, log.p=F)
R.FM2 <- 1 - pexp(t, rate=lambda2, lower.tail=T, log.p=F)
TT = seq(100, 1500, 1)
TT.no <- length(TT); JT.out <- array()
for(j in 1 : TT.no){
JT.out[j] <- JT(TT[j])
}
JT.star.opt = min(JT.out)
for(i in 1:TT.no){
if(JT.out[i] == JT.star.opt) {T.est <- TT[i]}
}
Index
A interval, 61
Accelerated failure time model, 128 left, 61
Age-based analysis, 79 progressive Type II, 60
Age-based claim rate, 79 right, 58
Age-specific death rate, 24 Type I, 58
Assembly error, 182, 183 Type II, 59
Association Censoring time, 4
negative, 117 Chain rule, 149
no, 117 Chapman-Kolmogorov equation, 202
positive, 117 Chi-squared distribution, 85
Assumption of proportionality Claim rate, 79
assessing, 124 Coefficient
At sale reliability, 181 correlation, 16
Average, 15 of variation, 16
Average failure rate, 25 rank correlation, 16
Combined system, 173
B Comparison
Baseline hazard function, 120 reliability function, 83
Baseline level, 139 survival function, 83
Baseline reliability function, 119 Competing risk model, 167, 168, 183
Bernoulli distribution, 117 Conditional cdf, 23
Bernoulli regression model, 155 Conditional reliability function, 23
Big data, 219, 224 Conditioning variable division, 224, 225
Big data analysis Confidence interval
steps, 220 normal approximation, 76
Binary outcome, 205 Constant failure rate, 24
Binary variable, 116 Constant hazard function, 35
B ten life, 29 Corrective maintenance, 189
Correlation
C rank, 17
Canonical link function, 149, 151 Cost per unit time, 193
Canonical parameter, 222 Count data, 231
Censored observation, 57 Covariance, 17
Censoring, 57 Covariate dependence, 205