Reliability and Survival Analysis
Md. Rezaul Karim · M. Ataharul Islam
Md. Rezaul Karim
Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh

M. Ataharul Islam
Institute of Statistical Research and Training, University of Dhaka, Dhaka, Bangladesh
Dedicated to
My elder sister Sayeda Begom, wife Tahmina Karim Bhuiyan, and daughters Nafisa Tarannum and Raisa Tabassum
Md. Rezaul Karim
Preface

Reliability and survival analysis are important techniques for analyzing lifetime and other time-to-event data and have been used in various disciplines for a long time. Survival analysis constitutes the core methodology of biostatistical science, dealing with lifetimes of living organisms such as humans, animals, patients, and plants. A parallel development has taken place in engineering for the survival of products or machines, in general nonliving things. The history of survival analysis is quite old; it initially dealt with biometrical problems but later converged to more general developments under biostatistical science. The parallel development in engineering, known as reliability, can be traced back, in a more formal sense, to World War II. Although the developments in reliability initially appeared very different from those of survival analysis, over time there has been a growing recognition that the two fields share a large area of overlapping interests, in terms of techniques, that can be studied by users and researchers of both reliability and survival analysis without difficulty. This will benefit large groups of researchers and users of reliability and survival analysis techniques. This book aims to address these areas of common interest with examples.

Because the statistical modeling of lifetimes and various other times to events is used extensively in many fields, such as medical statistics, epidemiology, community health, environmental studies, engineering, social sciences, actuarial science, and economics, this book provides a general background applicable to these various fields.
This book includes 12 chapters covering a wide range of topics. Chapter 1 introduces the concepts and definitions used in both reliability and survival analyses. Chapter 2 discusses the important functions, along with their relationships, keeping in mind the needs of users of both reliability and survival analyses. Emphasis is given to terms used in both fields as well as to terms used under different names, such as reliability function or survival function, and hazard function or failure rate function. Chapter 3 includes some probability distributions, such as the exponential, Weibull, extreme value, normal, and lognormal distributions. The estimation of parameters and some important properties for uncensored data are discussed.
The objective of the book is to present and unify fundamental and basic statistical models and methods applied to both reliability and survival data analyses in one place, from both applied and theoretical points of view.
We have attempted to keep the book simple for undergraduate and graduate students in courses on applied statistics, reliability engineering, survival analysis, biostatistics, and biomedical sciences. The book will also be of interest to researchers (engineers, doctors, and statisticians) and practitioners (engineers, applied statisticians, and managers) involved with reliability and survival analyses.
We are grateful to our colleagues and students in the Department of Statistics
of the University of Rajshahi, ISRT of the University of Dhaka, Universiti Sains
Malaysia, The University of Electro-Communications, Luleå University of
Technology, King Saud University, and East West University. The idea of writing a
book on reliability and survival analyses has stemmed from teaching and supervising
research students on reliability and survival analyses in different universities
for many years.
We want to thank D. N. Prabhakar Murthy, Kazuyuki Suzuki, Alireza Ahmadi,
N. Balakrishnan, D. Mitra, Shahariar Huda, and Rafiqul Islam Chowdhury for their
continued support to our work. We extend our deepest gratitude to Tahmina Sultana
Bhuiyan, Nafisa Tarannum, Raisa Tabassum, Tahmina Khatun, Jayati Atahar,
Amiya Atahar, Shainur Ahsan, and Adhip Rahman for their unconditional support
during the preparation of this manuscript. Further, we acknowledge gratefully
M. A. Basher Mian, M. Asaduzzaman Shah, M. Ayub Ali, M. Monsur Rahman,
M. Mesbahul Alam, Sabba Ruhi, Syed Shahadat Hossain, Azmeri Khan, Jahida
Gulshan, Israt Rayhan, Shafiqur Rahman, Mahfuza Begum, and Rosihan M. Ali for
their continued support.
We are grateful to the staff at Springer for their support. We would like to thank our Book Series Executive Editor William Achauer, Business & Economics, Springer Singapore. We especially want to thank Sagarika Ghosh for her early interest and encouragement, and Nupoor Singh, Jennifer Sweety Johnson, and Jayanthi Narayanaswamy, who provided helpful guidance in the preparation of the book and much patience and understanding during several unavoidable delays in its completion.
Md. Rezaul Karim obtained his Bachelor of Science and Master of Science
degrees in Statistics from the University of Rajshahi, Bangladesh, and his Doctor of
Engineering degree from the University of Electro-Communications, Tokyo, Japan.
For the last 24 years, he has been working at the Department of Statistics at the
University of Rajshahi, Bangladesh, where he is currently a Professor. He has also
served as visiting faculty at the Luleå University of Technology, Sweden. His
research interests include reliability analysis, warranty claim analysis, lifetime data
analysis, industrial statistics, biostatistics, and statistical computing. He has over 30
publications in statistics, reliability, warranty analysis, and related areas, and has
presented about 40 papers at numerous conferences and workshops in eight
countries. He is a coauthor of the book Warranty Data Collection and Analysis
(published by Springer in 2011) and has contributed chapters to several books. He
serves on the editorial boards of several journals including Communications in
Statistics, Journal of Statistical Research, International Journal of Statistical
Sciences, Journal of Scientific Research, and Rajshahi University Journal of
Science and Engineering. Further, he is a member of five professional associations.
Chapter 1
Reliability and Survival Analyses:
Concepts and Definitions
Abstract Both reliability and survival analyses are specialized fields of mathematical statistics developed to deal with special types of time-to-event random variables. Reliability analysis includes methods related to the assessment and prediction of successful operation or performance of products. Nowadays, products appear on the market with the assurance that they will perform satisfactorily over their designed useful life. This assurance depends on the reliability of the product. On the other hand, survival analysis includes statistical methods for analyzing the time until the occurrence of an event of interest, where the event can be death, disease occurrence, disease recurrence, recovery, or another experience of interest. This chapter introduces the basic concepts and definitions of some terms used extensively in reliability and survival analyses. It also discusses the importance of reliability and survival analyses and presents the outline of the book.
1.1 Introduction to Reliability and Survival Analyses

Both reliability and survival analyses are specialized fields of mathematical statistics developed to deal with special types of time-to-event random variables (lifetime, failure time, survival time, etc.).1 In the case of reliability, our concern is to address the characteristics of survival times of products (item, equipment, component, subsystem, system, etc.), whereas in the case of survival analysis, we address the characteristics of lifetimes arising from problems associated with living organisms (plant, animal, individual, person, patient, etc.). Hence, similar statistical techniques can be used in these two fields, because the random variables of interest in both have reasonable similarities in many respects. Nevertheless, the theoretical developments and applications in the two fields are based on quite different foundations that make little use of these parallel but overlapping areas of similarity. Researchers and practitioners of both areas have felt that they would benefit immensely if the statistical techniques of common interest could be shared conveniently.
1 Sections of this chapter draw from the co-author's (Md. Rezaul Karim) previously published work, reused here with permission (Blischke et al. 2011).
This is one of the compelling reasons to introduce reliability and survival analyses in a single book.
A salient feature of modern industrial societies is that new products are appearing
on the market at an ever-increasing pace. This is due to (i) rapid advances in technology and (ii) increasing demands of customers, with each a driver of the other (Blischke
et al. 2011). Customers need assurance that a product will perform satisfactorily over
its designed useful life. This depends on the reliability of the product, which, in turn,
depends on decisions made during the design, development, and production of the
product. One way that manufacturers can assure customers of satisfactory product
performance is through reliability.
Reliability of a product conveys the concept of dependability and successful operation or performance. It is a desirable property of great interest to both manufacturers
and consumers. Unreliability (or lack of reliability) conveys the opposite (Blischke
et al. 2011). According to ISO 8402 (1986), reliability is the ability of an item to
perform a required function, under given environmental and operational conditions
and for a stated period of time. More technical definitions of reliability are given in
the next chapter.
The time to failure or lifetime of an item is intimately linked to its reliability, and
this is a characteristic that will vary from system to system even if they are identical
in design and structure (Kenett and Baker 2010). For example, if we use the same
automobile component in different automobiles and observe their individual failure
times, we would not expect them all to have the same failure times. The times to failure
for the components used in different automobiles would differ and can be described
by a random variable. The behavior of the random variable can be modeled by a
probability distribution which is a mathematical description of a random phenomenon
consisting of a sample space and a way of assigning probabilities to events. The basis
of reliability analysis is to model the lifetime by a suitable probability distribution
and to characterize the life behavior through the selected distribution. As mentioned
in Kenett and Baker (2010), reliability analysis enables us to answer questions, such
as:
(i) What is the probability that a unit will fail before a given time?
(ii) What percentage of items will last longer than a certain time?
(iii) What is the expected lifetime of a component?
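As a rough illustration of how such questions are answered in practice, the short R sketch below (R being the language used for the programming examples in this book) evaluates them for a hypothetical component whose lifetime is assumed to follow an exponential distribution; the distribution choice and the failure rate value are illustrative assumptions only.

# Hypothetical example: lifetime T assumed Exponential with rate lambda
lambda <- 0.002                             # assumed failure rate (failures per hour)

pexp(500, rate = lambda)                    # (i) probability a unit fails before 500 hours
100 * (1 - pexp(1000, rate = lambda))       # (ii) percentage of items lasting longer than 1000 hours
1 / lambda                                  # (iii) expected lifetime (MTTF) of a component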
Survival analysis is a branch of statistics that includes a set of statistical methods
for analyzing survival data where the outcome variable is the time until the occurrence
of an event of interest among living organisms. The event can be death, the occurrence
of disease, recurrence of disease, recovery from disease, etc. The time to event
popularly denoted as failure time or survival time can be measured in hours, days,
weeks, months, years, etc. For example, if the event of interest is a heart attack, then
the time to the event can be the time (in years/months/days) until a person experiences
a heart attack. Survival analysis enables us to answer questions, such as:
(i) What is the proportion of a population that will survive beyond a given time?
(ii) Among those who survive, at what rate will they die or fail?
1.2 Definitions of Some Important Terms

This section defines some important terms used in reliability and survival analyses, which are referred to throughout the book.
Object In this book, by an object we mean an item, equipment, component, subsystem, system, etc., among products, and a plant, animal, individual, person, patient, etc., among living organisms in an experiment or study. Sometimes, the term object is also referred to as the unit of the experiment or study.
Event In statistics, an event means an outcome of an experiment or a subset of
the sample space. In reliability, by event, we mean failure, warranty claims, recovery
(e.g., repair, replace, return to work/service), etc., or any designated experience of
interest that may happen to the unit being considered in the experiment. In the case
of survival analysis, by event, we mean death, occurrence or recurrence of disease,
recovery from disease, etc.
Time In both reliability and survival analyses, we can define time by the following
categories:
(i) Study period—the whole period of experiment or investigation, or more specifically from the beginning to the end of an experiment or investigation,
3 Sometimes, the survival time refers to how long a specific object survived or will survive.
4 The detail on censored data is given in Chap. 4.
5 This excludes situations arising from preventive maintenance or any other intentional shutdown
period during which the system is unable to perform its required function.
Random variable A discrete random variable can take on values from a countable set of possible values (e.g., the nonnegative integers), and a continuous random variable can take on values from a set of possible values which is uncountable (e.g., values in the interval (−∞, ∞)). Because the
outcomes are uncertain, the value assumed by a random variable is uncertain before
the event occurs. Once the event occurs, it assumes a certain value. The standard
convention used is as follows: X or Y or Z or T (upper case) represents the random
variable before the event, and the value it assumes after the event is represented by
x or y or z or t (lower case). For example, if we are interested in evaluating whether
an object survives for more than 5 years after undergoing cancer therapy, then the
survival time in years can be represented by the random variable T and small t equals
5 years. In this case, we then ask whether capital T exceeds 5 or T > t (Kleinbaum
and Klein 2012).
Details on some of these terms can be found in books on survival and/or reliability
analyses; see, for example, Jewell et al. (1996), Klein and Moeschberger (2003),
Kleinbaum and Klein (2012), and Moore (2016).
According to ISO 8402 (1994), a product can be tangible (e.g., assemblies or processed materials) or intangible (e.g., knowledge or concepts), or a combination
thereof. A product can be either intended (e.g., offering to customers) or unintended
(e.g., pollutant or unwanted effects). A product can be classified in many different
ways. According to Blischke et al. (2011), common ways of classification can be as
follows:
• Consumer nondurables and durables products: These are products that are used
in households. Nondurables differ from durables in the sense that the life of a
nondurable item (e.g., food) is relatively short, and the item is less complex than
a durable item (e.g., television and automobile).
• Industrial and commercial products: These are products used in businesses for their
operations. The technical complexity of such products can vary considerably. The
products may be either complete units (e.g., trucks and pumps) or components
(e.g., batteries, bearings, and disk drives).
• Specialized products: Specialized products (e.g., military and commercial aircraft, ships, rockets) are usually complex and expensive, often involve state-of-the-art technology, and are usually designed and built to the specific needs of the
customer. An example of a more complex product is a large system that involves
several interlinked products, such as power stations, communication networks,
and chemical plants.
The complexity of products has been increasing with technological advances. As
a result, a product must be viewed as a system consisting of many elements and
capable of decomposition into a hierarchy of levels, with the system at the top level
and parts at the lowest level (Blischke et al. 2011). There are many ways of describing
this hierarchy.
In general, product performance is a measure of the functional aspects of a product.
It is a vector of variables, where each variable is a measurable property of the product
or its elements. The performance variables can be:
• Functional properties (e.g., power, throughput, and fuel consumption),
• Reliability-related properties (defined in terms of failure frequency, mean time to failure (MTTF), etc.).
Products are designed for a specified set of conditions such as the usage mode,
usage intensity, and operating environment. When the conditions differ significantly
from those specified, the performance of the product is affected. Product performance
is also influenced by the skills of the operator and other factors (see Blischke et al.
2011).
Product reliability is determined primarily by decisions made during the early
stages (design and development) of the product life cycle, and it has implications
for later stages (marketing and post-sale support) because of the impact of unreliability on sales and warranty costs. It is important for the manufacturers to assess the
product reliability prior to launch of the product on the market. This generally can
be done based on limited information, such as data supplied by vendors, subjective
judgment of design engineers during the design stage, and data collected during the
development stage. However, the data from the field failures are needed to assess the
actual reliability and compare it with the design reliability or predicted reliability. If
the actual reliability is significantly lower than the predicted value, it is essential that
the manufacturer identifies the cause or causes emerging from design, production,
materials, storage, or other factors. Once this is done, actions can be initiated to
improve reliability. On the other hand, if the actual reliability is significantly above
the predicted value, then this information can be used to make changes to the marketing strategy, such as increasing the warranty period and/or lowering the price, that
will likely result in an increase in total sales (Blischke et al. 2011).
In today's technological world, nearly everyone depends upon the continued functioning of a wide array of complex machinery and equipment for their everyday health, safety, mobility, and economic welfare (Dhillon 2007). Everyone expects products (cars, computers, electrical appliances, lights, televisions, etc.) to function properly for a specified period of time. The unexpected failure of a product can lead to unfavorable outcomes, such as financial loss, injury, loss of life, and/or costly lawsuits. More often, repeated failure leads to loss of customer satisfaction and of the company's goodwill. It takes a long time for a company to build up a reputation for reliability and only a short time to be branded as "unreliable" after shipping a flawed product (NIST 2019). Therefore, continual assessment of new product reliability and ongoing control of the reliability of a product are a prime necessity for engineers and managers in today's competitive business environment.
There are many possible reasons for collecting and analyzing reliability data
from both customer’s and manufacturer’s perspectives. Some of them as mentioned
in Meeker and Escobar (1998) are:
• Assessing characteristics of materials,
• Predicting product reliability in the design stage,
• Assessing the effect of a proposed design change,
• Comparing components from two or more different manufacturers, materials, production periods, operating environments, and so on,
• Assessing product reliability in the field,
• Checking the veracity of an advertising claim,
• Predicting product warranty claims and costs.
On the other hand, over the past few decades, the statistical analysis of survival data
has become a topic of considerable interest to statisticians and workers in medicine
and biological sciences. Some possible reasons for survival analysis are:
• Estimating the time to event for a group of individuals, such as time until second
heart attack for a group of myocardial infarction (MI) patients,
• Comparing time to event between two or more groups, such as treatment group
versus placebo group of patients,
• Assessing the relationship between the lifetime and covariates, for example, do treatment group and Eastern Cooperative Oncology Group (ECOG) performance status influence the lifetime of patients?
Therefore, data collection, data analysis, and data interpretation methods for reliability and survival data are important tools for those who are responsible for evaluating and improving the reliability of a product or system and analyzing survival data for living organisms.
There are many sources of reliability data, and some of them are:
• Historical data,
• Vendor data,
• Research/laboratory test data,
• Handbook data,
• Field failure data/field service data,
• Warranty data,
• Customer support data.
For further discussion on these and other related issues, see MIL-HDBK 217E
(1986), Klinger et al. (1990), Ireson (1996), Meeker and Escobar (1998), and Pisani
et al. (2002).
There are some special features of survival and reliability data that distinguish them
from other types of data. These features include:
• Data are rarely complete, accurate, or without errors.
• Data are typically censored (exact failure times are not known).
• Usually, data are nonnegative values representing time.
• Generally, data are modeled using distributions for nonnegative random variables.
• Distributions and analysis techniques that are commonly used are fairly specific.
• In many instances, there may be corrupt and/or noisy data.
• Sometimes, data are affected by missing entries, missing variables, too few observations, etc.
• If there are multiple sources of data, incompatible data, or data obtained at different levels, then the reliability or survival analysis is affected greatly.
• There are situations when all individuals do not enter the study or are not put on test at the same time. This feature is referred to as "staggered entry."
1.7 Objectives of the Book

As indicated in the previous section, reliability and survival data have a number of typical features. Therefore, extracting the maximum amount of information requires special statistical analysis techniques, and the use of this information to make proper and effective decisions requires building suitable models. The objective of this book is to present and unify fundamental and basic statistical models and methods applied to both reliability and survival data analyses in one place, from both applied and theoretical points of view. Almost all of the topics are covered by thoroughly
prepared examples using real data, with graphical illustrations and programming
codes. These examples deal with results of the analyses, interpretation of the results,
and illustrations of their usefulness.
likelihood functions under the schemes of different types of censoring and truncation
constructed in Chap. 4 will be applied in this chapter.
Chapter 7: Regression Models. In both reliability and survival analyses, regression
models are employed extensively for identifying factors associated with probability,
hazard, risk, or survival of units being studied. This chapter introduces some of
the regression models used in both reliability and survival analyses. The regression
models include logistic regression, proportional hazards, accelerated failure time,
and parametric regression models based on specific probability distributions.
Chapter 8: Generalized Linear Models. The concept of generalized linear
models has become increasingly useful in various fields including survival and reliability analyses. This chapter includes the generalized linear models for various types
of outcome data based on the underlying link functions. The estimation and test
procedures for different link functions are also highlighted.
Chapter 9: Basic Concepts of System Reliability. A system is a collection of components interconnected according to a specific design in order to perform a given task.
The reliability of a system depends on the types, quantities, and reliabilities of its
components. This chapter discusses some basic ideas behind the analysis of the
reliability of a system. It derives the distribution and reliability functions of the
lifetime of the system as a function of the distribution or reliability functions of the
individual component lifetimes.
Chapter 10: Quality Variation in Manufacturing and Maintenance Decision.
Quality variation in manufacturing is one of the main causes of the high infant (early) failure rate of a product. This chapter looks at the issues in modeling the
effect of quality variations in manufacturing. It models the effects of assembly errors
and component nonconformance. This chapter constructs the month of production—
month in service (MOP-MIS) diagram to characterize the claim rate as a function of
MOP and MIS. It also discusses the determination of optimum maintenance interval
of an object.
Chapter 11: Stochastic Models. In survival and reliability analyses, the role of
Markov chain models is quite useful in solving problems where transitions are
observed over time. It is very common in survival analysis that a subject suffering from a disease at one time point will recover at a later time. Similarly, in reliability, a machine may change state from nondefective to defective over time. This chapter
discusses the Markov chain model, Markov chain model with covariate dependence,
and Markov model for polytomous outcome data.
Chapter 12: Analysis of Big Data Using GLM. The application of the generalized
linear models (GLMs) to big data is discussed in this chapter using the divide and
recombine (D&R) framework. In this chapter, the exponential family of distributions
for binary, count, normal, and multinomial outcome variables and the corresponding
sufficient statistics for parameters are shown to have great potential in analyzing big
data where traditional statistical methods cannot be used for the entire data set.
In addition, an appendix provides the programming codes in R that are applied to
analyze data in different examples of the book.
Chapter 2
Some Important Functions and Their Relationships
Abstract There are a number of important basic functions extensively used in reliability and survival data analyses. This chapter defines some of these functions that will be applied in the later chapters. These include the probability density function, cumulative distribution function, reliability or survival function, hazard function, and mean life function. This chapter also derives the interrelationships among these functions.
2.1 Introduction
This chapter discusses some of the most important functions used in reliability and
survival data analyses.1 These functions can be used to draw inferences regarding
various probabilistic characteristics of lifetime variable, such as
• Estimation of the number of failures that occur in a given period of time,
• Estimation of the probability of success of an object in performing the required
function under certain conditions for a specified time period,
• Estimation of the probability that an object will survive or operate for a certain
period of time after survival for a given period of time,
• Determination of the number of failures occurring per unit time, and
• Determination of the average time of operation to a failure of an object.
Under the parametric setup, some of these functions can be applied to extrapolate
to the lower or upper tail of the distribution of a lifetime variable. Their properties
are investigated either exactly or by means of asymptotic results. These functions
are interrelated, and if any of them are known, the others can be derived easily from
their interrelationship.
The outline of this chapter is as follows. Section 2.2 discusses the summary statistics, including the measures of center, dispersion, and relationship. Section 2.3 defines
the density function and distribution function of a random variable. Section 2.4
defines reliability or survival function. Sections 2.5 and 2.6 discuss the conditional
reliability function and failure rate function, respectively. The mean life function
1 Sections of this chapter draw from the co-author's (Md. Rezaul Karim) previously published work, reused here with permission (Blischke et al. 2011).
and residual lifetime are presented, respectively, in Sects. 2.7 and 2.8. The fractiles
of a distribution are presented in Sect. 2.9. Section 2.10 deals with the relationship
among various functions.
2.2 Summary Statistics

The most common measures of the center of a sample (also called measures of location) are the sample mean (or average) and median. The sample mean of T, denoted as \bar{t}, is the simple arithmetic average given by
\bar{t} = \frac{1}{n} \sum_{i=1}^{n} t_i. \quad (2.1)
The sample mean is the preferred measure for many statistical purposes. It is the
basis for numerous statistical inference procedures and is a “best” measure for the
purpose of measuring the center value of a data set. However, this measure may be
affected by extreme values. In that case, we need to consider an alternative measure
of location.
For a finite set of observations, the sample median is the value that divides the
ordered observations into two equal parts. The observations belonging in the first part
are less than or equal to the median, and the observations belonging in the second
part are greater than or equal to the median (Islam and Al-Shiha 2018). The sample
median is the 0.50-fractile (t_{0.50}) or the second quartile (Q_2). Q_i denotes the ith (i = 1, 2, 3) quartile and is the value of the random variable such that 25 × i percent or fewer observations are less than Q_i and (100 − 25 × i) percent or fewer observations are greater than Q_i.2 Median is a natural measure of location since at least 50% of the
observations lie at or above the median and at least 50% lie at or below the median.
As mentioned, the mean is sensitive to extreme values (outliers) in the data and due
to the presence of outliers it can provide a somewhat distorted measure of location.
In such cases, the median provides a more meaningful measure of the location as it
is not affected by the extreme values. If the sample is perfectly symmetrical about its
center, the mean and median become the same. If the mean and median are different, this is evidence of skewness in the data. If the median is less than the mean, the
data are skewed to the right, and if the median is greater than the mean, the data are
skewed to the left.
An approach to deal with the data having outliers is to compute a trimmed mean,
which is obtained by removing a fixed proportion of both the smallest and the largest
observations from the data and calculating the average of the remaining observations.
A few other measures are sometimes used. These include the mode and various other
2 Q_i means the ith quartile, and t_p means the p-fractile of a sample. More on quartiles and fractiles can be found in Sect. 2.9.
measures that can be defined as functions of fractiles, e.g., (Q_3 − Q_1)/2, (t_{0.90} − t_{0.10})/2, and so forth.
The sample standard deviation is s = \sqrt{s^2} and is the preferred measure for most purposes since it is in units of the original data.
A measure of variability sometimes used for describing data is the interquartile
range, denoted by IQR and defined by IQR = Q3 − Q1 , where Q1 and Q3 are the first
and third quartiles of the data, respectively. An advantage of the interquartile range
is that it is not affected by extreme values. A disadvantage is that it is not readily
interpretable as is the standard deviation. If the sample data are free from outliers
or extreme values, then a preferred and simple measure of dispersion is the range, which is defined as R = t_{(n)} − t_{(1)}.
Another useful measure of dispersion in some applications is the coefficient of variation (CV), defined by CV = s/\bar{t}. This measure is unit free and tends to remain
relatively constant over measurements of different types, for example, weights of
individuals over different biological species and fuel consumption of engines of very
different sizes.
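The following R sketch shows how these measures of center and dispersion might be computed for a small hypothetical sample of failure times; the data values and the trimming fraction are assumptions made only for illustration.

failure_times <- c(16, 44, 78, 95, 101, 108, 150, 169, 220, 364)  # hypothetical times (days)

mean(failure_times)                      # sample mean
median(failure_times)                    # sample median
mean(failure_times, trim = 0.10)         # trimmed mean (10% removed from each end)
sd(failure_times)                        # sample standard deviation
IQR(failure_times)                       # interquartile range Q3 - Q1
diff(range(failure_times))               # range: largest minus smallest observation
sd(failure_times) / mean(failure_times)  # coefficient of variation (CV)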
When the data include two or more variables, measures of the relationship between
the variables are of interest. Here, we introduce two measures of strength of relation-
ship for two variables, the Pearson correlation coefficient r and a rank correlation coefficient, r_s.3
We assume a sample of bivariate data (x_i, y_i), i = 1, …, n. The sample correlation
coefficient is given by
3 The subscript s is for Charles Spearman, who devised the measure in 1904.
r = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y} = \frac{1}{(n-1)\, s_x s_y}\left[\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i\right], \quad (2.3)
where s_x and s_y denote, respectively, the standard deviations of the variables X and
Y. The numerator of (2.3), known as the sample covariance, can be used as a measure
of the relationship between two variables X and Y in certain applications.
The sample correlation coefficient, r, is the sample equivalent of the population
correlation coefficient, ρ, a parameter of the bivariate normal distribution, and as
such is a measure of the strength of linear relationship between the variables, with
ρ = 0 indicating no linear relationship. In the case of the bivariate normal distribution,
this is equivalent to the independence of the variables. Note that the correlation
coefficient is unit free. In fact, the ranges of ρ and r lie in the interval [−1, 1],
with the values −1 and +1 indicating that the variables are perfectly linearly related, with lines sloping downward and upward, respectively. The general interpretation is that values close to either extreme indicate a strong relationship and values close to zero indicate very little relationship between the variables.
An alternative measure of the strength of relationship is rank correlation. Rank
correlation coefficients are calculated by first separately ranking the two variables (giving tied observations the average rank) and then calculating a measure based
on the ranks. The advantage of this is that a rank correlation is applicable to data
down to the ordinal level and is not dependent on linearity. There are several such
coefficients. The most straightforward of these is the Spearman rank correlation r_s, which is simply the application of (2.3) to the ranks. Note that rank correlation can also be used to study trend in measurements taken sequentially through time. In this case, the measurements are ranked, and these ranks and the order in which observations are taken can be used in the calculation of r_s.
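As a brief illustration with made-up bivariate data (the values below are not from the book), both correlation coefficients are available directly in R.

x <- c(16, 44, 78, 95, 101, 108, 150, 169, 220, 364)                    # hypothetical ages (days)
y <- c(1.2, 3.5, 7.4, 9.0, 15.1, 16.0, 21.3, 39.0, 40.8, 90.2) * 1000   # hypothetical usage (km)

cov(x, y)                          # sample covariance, the numerator of (2.3)
cor(x, y)                          # Pearson correlation coefficient r
cor(x, y, method = "spearman")     # Spearman rank correlation coefficient r_s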
Another approach to the study of data relationships is linear regression analysis, in which the linear relationship between the variables is explicitly modeled and the data are used to estimate the parameters of the model. The approach is applicable to nonlinear models as well. Different regression models are discussed further in Chaps. 7 and 8.
Graphical representation of data is also an important part of the preliminary analysis of data. The graphical representation of reliability and survival data includes
histogram, Pareto chart, pie chart, stem-and-leaf plot, box plot, and probability plot.
Detailed descriptions on these graphs are not given here. Additional details on the
above topics can be found in introductory statistics texts such as Ryan (2007) and
Moore et al. (2007), and reliability and biostatistics books such as Blischke and
Murthy (2000), Meeker and Escobar (1998), and Islam and Al-Shiha (2018). There
are many other graphical methods of representing both qualitative and quantitative
data. These are discussed in detail in Schmid (1983) and Tufte (1983, 1989, 1997).
Example 2.1 Table 2.1 shows a part of the warranty claims data for an automobile
component (20 observations out of 498).4 The data are taken from Blischke et al.
(2011). For the purpose of illustration, the variables age (in days) and usage (in
km at failure) are considered here; however, the original data have more variables,
such as failure modes, type of automobile that used the component, and zone/region,
discussed in Chap. 7.
Let X and Y denote the variables age (in days) and usage (in km at failure), respectively. For the above data, we have n = 20, \sum_{i=1}^{n} x_i = 2759, \sum_{i=1}^{n} y_i = 429,987, \sum_{i=1}^{n} x_i^2 = 539,143, \sum_{i=1}^{n} y_i^2 = 14,889,443,757, and \sum_{i=1}^{n} x_i y_i = 80,879,839. The calculated descriptive (or summary) statistics for the variables age (X) and usage (Y) are shown in Table 2.2.
4 The information regarding the names of the component and manufacturing company is not disclosed to protect the proprietary nature of the information.
For both the variables, age and usage, the sample means (137.9 days and
21499 km) are greater than the respective medians (101.5 days and 16064 km), indi-
cating skewness to the right. The trimmed means for the variables, age and usage,
are obtained by removing the smallest 5% and the largest 5% of the observations
(rounded to the nearest integer) and then calculating the means of the remaining
observations for both variables. These trimmed means (132.2 days and 20592 km)
are still considerably larger than the medians, indicating real skewness, beyond the
influence of a few unusually large observations. Since the CV of usage (80.17%) is
greater than the CV of age (66.22%), the relative variability of the variable usage is
larger than the relative variability of the variable age. The correlation coefficient
between age and usage is 0.721, indicating a positive correlation between the two
variables. Note that these descriptive statistics are based on a small subsample of the
original data and hence need to be interpreted cautiously.
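As a check on the arithmetic, the correlation reported above can be reproduced in R directly from the sums listed in Example 2.1, using the computational form of (2.3); this is only a verification sketch.

# Sums reported in Example 2.1 (n = 20)
n   <- 20
sx  <- 2759;     sy  <- 429987           # sums of age and usage
sxx <- 539143;   syy <- 14889443757      # sums of squares
sxy <- 80879839                          # sum of cross-products

sx / n;  sy / n                              # sample means: 137.95 days and 21499.35 km
s_x <- sqrt((sxx - sx^2 / n) / (n - 1))      # standard deviation of age
s_y <- sqrt((syy - sy^2 / n) / (n - 1))      # standard deviation of usage
(sxy - sx * sy / n) / ((n - 1) * s_x * s_y)  # Pearson correlation, approximately 0.721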
2.3 Cumulative Distribution and Probability Density Functions

The cumulative distribution function (cdf) of the lifetime random variable T is defined as

F(t; \theta) = P\{T \le t\}, \quad t \ge 0, \quad (2.4)

where θ denotes the set of parameters of the distribution function. It may be noted here that, in the case of other variables, say X, if X lies in (−∞, ∞), then F(x; θ) = P{X ≤ x}, −∞ < x < ∞. However, since reliability and survival analyses use time, which is a nonnegative-valued variable, (2.4) will be used frequently in the subsequent chapters of the book. Often the parameters are omitted for notational ease, so that one uses F(t) instead of F(t; θ). F(t) has the following properties with respect to (2.4):
• 0 ≤ F(t) ≤ 1 for all t.
• F(t) is a nondecreasing function in t.
• \lim_{t \to 0} F(t) = 0 and \lim_{t \to \infty} F(t) = 1.
• For t_1 < t_2, P\{t_1 < T \le t_2\} = F(t_2) - F(t_1).
When T is a discrete random variable, it takes on at most a countable number of values in a set (t_1, t_2, …, t_n), with n finite or infinite, and the distribution function of T, F(t_i) = P\{T \le t_i\}, is a step function with steps of height p_i = P\{T = t_i\}, i = 1, 2, …, n, at each of the possible values t_i, i = 1, 2, …, n.5
5 As before, the parameters may be omitted for notational ease, so that p_i is often used instead of p_i(θ).
When T is a continuous random variable, the probability density function (pdf) of T is defined as

f(t) = \frac{dF(t)}{dt}, \quad (2.5)
and the probability in the interval (t, t + δt] can, for small δt, be shown as

P\{t < T \le t + \delta t\} \approx f(t)\,\delta t.
Example 2.2 Let X denote the number of customer complaints within a day for a
product, then X is a discrete random variable. Suppose that for a product, X takes
on the values 0, 1, 2, 3, 4, and 5 with respective probabilities 0.05, 0.15, 0.25, 0.30,
0.20, and 0.05. The probability mass function and the distribution function of X are
shown in Fig. 2.1.
In this example, the probability that the number of daily customer complaints is 3 or more equals P(X ≥ 3) = 0.30 + 0.20 + 0.05 = 0.55. Therefore, the probability that the number of complaints per day is 3 or more is 55%.
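The calculation in Example 2.2 can be reproduced with a few lines of R.

x    <- 0:5                                      # possible numbers of complaints per day
prob <- c(0.05, 0.15, 0.25, 0.30, 0.20, 0.05)    # probability mass function P(X = x)

sum(prob[x >= 3])      # P(X >= 3) = 0.55
cumsum(prob)           # distribution function F(x) at x = 0, 1, ..., 5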
Example 2.3 If T denotes the failure times (measured in 100 days) of an electronic
device, then T is a continuous random variable, where the original time variable is
divided by 100. Figure 2.2 shows the hypothetical probability density functions and cumulative distribution functions of T for three different types of devices, denoted by A, B, and C.
Fig. 2.1 Probability mass function (left side) and distribution function (right side) for the number
of customer complaints (X)
Fig. 2.2 Cumulative distribution functions (left side) and probability density functions (right side) for the failure times of devices A, B, and C
The graph of the cdf for device B shows that F(2) = 0.87. This indicates that 87% of device B units will fail within 200 days, whereas by the same age (200 days), approximately 99% and 76% of the units of devices A and C, respectively, will have failed. The graphs of the cdfs indicate that approximately the same percentage of cumulative failures will occur for the three devices by the age of about 113 days.
2.4 Reliability or Survival Function

The reliability of an object is the probability that the object will perform its intended function for a specified time period when operating under normal (or stated) environmental conditions (Blischke and Murthy 2000). In survival analysis, this probability is known as the survival probability.6
This definition contains four key components:
(i) Probability—The probability of the occurrence of an event. For example, a
timing chain might have a reliability goal of 0.9995 (Benbow and Broome
2008). This would mean that at least 99.95% are functioning at the end of the
stated time.
(ii) Intended function—This is stated or implied for defining the failure of an object.
For example, the intended function of the battery is to provide the required
current to the starter motor and the ignition system when cranking to start the
engine. The implied failure definition for the battery would be the failure to
supply the necessary current which prevents the car from starting.
(iii) Specified time period—This means the specified value of lifetime over the useful life of the object measured in minutes, days, months, kilometers, number of
cycles, etc. For example, a battery might be designed to function for 24 months.
Sometimes, it is more appropriate to use two-dimensional time period; e.g., the
warranty period for a tire of a light truck might be stated in terms of first 2/32
in. of usable tread wear or 12 months from the date of purchase, whichever
comes first.
(iv) Stated environmental conditions—These include environmental conditions, maintenance conditions, usage conditions, storage and moving conditions, and possibly other conditions. For example, a five-ton truck is designed to safely carry a maximum of five tons. This implies that a maximum load of five tons is a condition of the usage environment for that truck.
The reliability function (or survival function) of the lifetime variable T, denoted by R(t) (or S(t)), where

R(t) = S(t) = P\{T > t\} = 1 - F(t), \quad t \ge 0,

is the probability that an object survives to time t. It is the complement of the cumulative distribution function. It has the following basic properties:
• R(t) is a nonincreasing function in t, 0 ≤ t < ∞.
• R(0) = 1 and \lim_{t \to \infty} R(t) = 0, i.e., R(\infty) = 0.
• For t_1 < t_2, P\{t_1 < T \le t_2\} = F(t_2) - F(t_1) = R(t_1) - R(t_2).
The hypothetical reliability functions corresponding to the cumulative distribution functions for devices A, B, and C discussed in Example 2.3 are shown in Fig. 2.3. Figure 2.3 shows that the probability that device A survives more than 100 days is R(t = 1) = P\{T > 1\} = 0.5. That is, 50% of the device A units survive past 100 days.
6 This means the probability that an object (individual, person, patient, etc.) survives for a specified period of time.
This figure suggests that before the age of about 100 days, the reliability of device C is less than that of device B, and the reliability of device B is less than that of device A, but the order is reversed after the age of about 120 days.
The conditional probability that the item will fail in the interval (a, a + t], given that it has not failed prior to a, is given by

P\{a < T \le a + t \mid T > a\} = \frac{F(a + t) - F(a)}{S(a)} = \frac{S(a) - S(a + t)}{S(a)},

and the corresponding conditional reliability (survival) function is R(t \mid a) = S(a + t)/S(a).
2.6 Failure Rate Function

The failure rate function, which is popularly known as the hazard function, h(t), can be interpreted as the probability that the object will fail in (t, t + δt] for small δt, given that it has not failed prior to t. It is defined as

h(t) = \lim_{\delta t \to 0} \frac{P\{t < T \le t + \delta t \mid T > t\}}{\delta t} = \frac{f(t)}{S(t)},
which is the ratio of the probability density function to the survivor function. The
hazard function is also known as the instantaneous failure rate, failure rate function,
force of mortality, force of decrement, intensity function, age-specific death rate, and
its reciprocal is known as Mill’s ratio in economics (Islam and Al-Shiha 2018). It
indicates the “proneness to failure” or “risk” of an object after time t has elapsed. In
other words, it characterizes the effect of age on object failure more explicitly than
cdf or pdf. h(t) is the amount of risk of an object at time t. It is a special case of the
intensity function for a nonhomogeneous Poisson process (Blischke et al. 2011).
The hazard function satisfies
• h(t) ≥ 0 for all t,
• \int_{-\infty}^{\infty} h(t)\,dt = \infty.
Based on the hazard function, the lifetime distribution can be characterized as one of the following three types:
• Constant failure rate (CFR): Probability of breakdown is independent of the age
or usage of the unit. That is, the unit is equally likely to fail at any moment during
its lifetime, regardless of how old it is.
• Increasing failure rate (IFR): Unit becomes more likely to fail as it gets older.
• Decreasing failure rate (DFR): Unit gets less likely to fail as it gets older.
The cumulative hazard function of the random variable T, denoted by H(t), is
defined as
H(t) = \int_0^t h(x)\,dx. \quad (2.12)
H(t) is also called the cumulative failure rate function. Cumulative hazard function
must satisfy the following conditions:
• H(0) = 0.
• \lim_{t \to \infty} H(t) = \infty.
The average failure rate over an interval [t_1, t_2] is defined as

\bar{h}(t_1, t_2) = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} h(x)\,dx = \frac{H(t_2) - H(t_1)}{t_2 - t_1}, \quad t_2 \ge t_1. \quad (2.13)
It is a single number that can be used as a specification or target for the population failure rate over the interval [t_1, t_2] (NIST 2019).
The hazard functions and cumulative hazard functions corresponding to the cumulative distribution functions for devices A, B, and C discussed in Example 2.3 are shown
in Fig. 2.4. A hypothetical bathtub curve of hazard function is also inserted in the
plot of hazard functions (left side). The bathtub curve of hazard function comprises
three failure rate patterns, initially a DFR (known as infant mortality), followed by
a CFR (called the useful life or random failures), and a final pattern of IFR (known
as wear-out failures).
As illustrated in Fig. 2.4, the hazard functions for devices A, B, and C are, respec-
tively, initially increasing and then decreasing, constant, and decreasing. The figure
shows that for device A, the values of the cumulative hazard function at t = 2 and 3
are H(2) = 4.56 and H(3) = 8.99, respectively. Therefore, for the device A, h̄(2, 3)
= (8.99 − 4.56)/(3 − 2) = 4.43. This indicates that the average failure rate for the
device A over the interval [200, 300] days is 4.43.
Fig. 2.4 Hazard functions (left side) and cumulative hazard functions (right side) for the failure
times of devices A, B, and C
2.7 Mean Life Function

The mean life function, which is also often called the expected or average lifetime
or the mean time to failure (MTTF), is another widely used function that can be
derived directly from the pdf. Mean time to failure describes the expected time to
failure of nonrepairable identical products operating under identical conditions. That
is, MTTF is the average time that an object will perform its intended function before
it fails. The mean life is also denoted by the mean time between failures (MTBF) for
repairable products.
With censored data, the arithmetic average of the data does not provide a good
measure of the center because at least some of the failure times are unknown. The
MTTF is an estimate of the theoretical center of the distribution that considers censored observations (Minitab 2019). If f(t) is the pdf of the random variable T, then
the MTTF (denoted by μ or E(T )) can be mathematically calculated by
MTTF = E(T) = \mu = \int_0^{\infty} t f(t)\,dt. \quad (2.14)
Writing f(t) = -\frac{d}{dt}S(t) and integrating (2.14) by parts7 gives

MTTF = \left[-t\,S(t)\right]_0^{\infty} + \int_0^{\infty} S(t)\,dt.

In the above expression, the term t S(t) tends to zero as t tends to infinity (because S(t) tends to zero and the mean is assumed finite). Therefore, the first term on the right-hand side vanishes, yielding

MTTF = \int_0^{\infty} S(t)\,dt. \quad (2.15)
Equation (2.15) indicates that when the failure time random variable, T, is defined
on [0, ∞], the MTTF is the area between S(t) and the t-axis. This can be applied to
compare different survival functions.
If a distribution fits the data adequately, the MTTF can be used as a measure of
the center of the distribution. The MTTF can also be used to determine whether a
7 Integration by parts means, e.g., \int_a^b u\,dv = [u\,v]_a^b - \int_a^b v\,du.
redesigned system is better than the previous system in the demonstration test plans
(Minitab 2019).
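As an illustrative sketch of (2.15), the MTTF can be obtained in R by numerically integrating an assumed survival function and compared with the closed-form mean; the Weibull parameter values below are arbitrary choices, not values from the text.

# MTTF as the area under the survival function, Eq. (2.15)
shape <- 1.5; scale <- 1000                                       # assumed Weibull parameters
S <- function(t) 1 - pweibull(t, shape = shape, scale = scale)    # survival function S(t)

integrate(S, lower = 0, upper = Inf)$value    # numerical MTTF, about 902.7
scale * gamma(1 + 1 / shape)                  # closed-form Weibull mean, for comparison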
Given that a unit is of age t, the remaining life after time t is random. The expected
value of this random residual life is called the mean residual life (MRL) at time t
(Guess and Proschan 1988). MRL can be used in many fields, such as studying burn-
in, setting rates and benefits for life insurance, and analyzing survivorship studies in
biomedical research.
If X is a continuous random variable representing the lifetime of an object with survival function S(x) and finite mean μ, the MRL is defined as

m(t) = E(X - t \mid X \ge t) = \frac{E\left[(X - t)\, I(X \ge t)\right]}{P(X \ge t)} = \frac{1}{S(t)} \int_t^{\infty} (x - t) f(x)\,dx, \quad t > 0. \quad (2.16)
But

\int_t^{\infty} (x - t) f(x)\,dx = \int_t^{\infty} \left[\int_u^{\infty} f(x)\,dx\right] du = \int_t^{\infty} [1 - F(u)]\,du = \int_t^{\infty} S(u)\,du. \quad (2.17)
Therefore,

m(t) = \frac{1}{S(t)} \int_t^{\infty} S(u)\,du, \quad t \ge 0. \quad (2.18)
It implies that the MTTF (2.15) is a special case of (2.18) where t = 0. Note that
the MTTF is a constant value, but the MRL is a function of the lifetime t of the
object. See Guess and Proschan (1988) for more information about the MRL.
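A short R sketch of (2.18) for an assumed Weibull lifetime (parameter values chosen only for illustration) is given below; evaluating it at t = 0 recovers the MTTF.

# Mean residual life m(t) = (1/S(t)) * integral of S(u) from t to infinity
shape <- 1.5; scale <- 1000                                       # assumed Weibull parameters
S   <- function(t) 1 - pweibull(t, shape = shape, scale = scale)
mrl <- function(t) integrate(S, lower = t, upper = Inf)$value / S(t)

mrl(0)      # equals the MTTF
mrl(500)    # expected remaining life of a unit that has already survived 500 time units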
If the cdf F(y) is strictly increasing, then there is a unique value t_p that satisfies F(t_p) = p, and the estimating equation for t_p can be expressed as t_p = F^{-1}(p), where F^{-1}(\cdot) denotes the inverse function of the cumulative distribution function F(\cdot). This is illustrated in Fig. 2.5 as an example based on the cdf of device A. For p = 0.2, the figure shows the value t_p = t_{0.20} = 0.87, which means that 20% of the population of device A will fail by 87 days.
In descriptive statistics, specific interest is in the 0.25-, 0.50-, and 0.75-fractiles, called the quartiles and denoted Q_1, Q_2, and Q_3, respectively.
If the failure probability p (or reliability) is 0.5 (50%), the respective fractile (or
percentile) is called the median lifetime. Median is one of the popular measures of
reliability. If one has to choose between the mean time to failure and the median time
to failure (as the competing reliability measures), the latter might be a better choice,
because the median is easier to perceive using one’s common sense and the statistical
estimation of median is, to some extent, more robust (Kaminskiy 2013). Fractiles
also have important applications in reliability, where the interest is in fractiles for
small values of p. For example, if t denotes the lifetime of an item, t_{0.01} is the time beyond which 99% of the lifetimes will lie. In accordance with the American Bearing Manufacturers Association Std-9-1990, the tenth percentile is called the L_{10} life. Sometimes, it is called B_{10}, "B ten" life (Nelson 1990).
Example 2.4 For the variables age (in days) and usage (in km at failure) given in Table 2.1, we calculate the 0.25-, 0.50-, and 0.75-fractiles, denoted by Q_1, Q_2, and Q_3, respectively. Let us consider the variable age first and assume that the ordered values of this variable are denoted by t_{(1)}, t_{(2)}, …, t_{(20)}. Thus, t_{(1)} = 16, t_{(2)} = 44, and so forth, up to t_{(20)} = 364. For the 0.25-fractile, we have k = [0.25(20 + 1)] = 5 and d = 0.25, so t_{0.25} or Q_1 = 78 + 0.25(78 − 78) = 78 days. Similarly, Q_2 = 101.5 days and Q_3 = 169 days.
From the usage data of Table 2.1, we find Q_1 = 7473 km, Q_2 = 16,064 km, and Q_3 = 38,737 km. These calculated quartiles for both variables are also given in Table 2.2.10
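For data held in R, sample fractiles can be obtained with the built-in quantile function; the age values below are hypothetical, since the full Table 2.1 is not reproduced here.

age <- c(16, 44, 78, 78, 95, 101, 102, 120, 150, 169, 220, 364)   # hypothetical ages (days)

quantile(age, probs = c(0.25, 0.50, 0.75), type = 6)   # Q1, Q2, Q3; type = 6 uses the
                                                       # (n + 1)p interpolation rule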
This section derives relationships among the functions f (t), F(t), R(t), h(t), and H(t).
These relationships are very useful in survival and reliability analyses in the sense
that if any of these functions is known, the other functions can be found easily.
From the previous sections, the following basic relationships are already known,
where we assumed that the lifetime random variable is defined on [0, ∞]:
f(t) = \frac{d}{dt} F(t). \quad (2.20)
10 Note that these statistics are based on a small subsample where the censored observations are not
considered.
F(t) = 1 - S(t). \quad (2.21)

F(t) = \int_0^t f(x)\,dx. \quad (2.22)
h(t) = \frac{f(t)}{S(t)}. \quad (2.23)

H(t) = \int_0^t h(x)\,dx. \quad (2.24)
h(t) = \frac{d}{dt} H(t). \quad (2.25)
These relationships will be applied to derive other relationships as follows. Using
(2.20) and (2.21), we get
f(t) = \frac{d}{dt} F(t) = \frac{d}{dt}[1 - S(t)] = -\frac{d}{dt} S(t), \quad t \ge 0. \quad (2.26)
Equations (2.23) and (2.20) give
h(t) = \frac{f(t)}{S(t)} = \frac{1}{S(t)} \frac{d}{dt} F(t) = -\frac{1}{S(t)} \frac{d}{dt} S(t) = -\frac{d}{dt} \ln S(t). \quad (2.27)
This implies
\ln S(t) = -\int_0^t h(x)\,dx,

or

S(t) = \exp\left[-\int_0^t h(x)\,dx\right] = \exp[-H(t)]. \quad (2.28)
S(t) = 1 - F(t) = 1 - \int_0^t f(x)\,dx = \int_t^{\infty} f(x)\,dx, \quad t \ge 0. \quad (2.29)
Table 2.3 Relationships among f(t), F(t), S(t), h(t), and H(t), assuming that the random variable T is defined on [0, ∞]. For example, the probability density function can be expressed in terms of each of the other functions as

f(t) = \frac{d}{dt}F(t) = -\frac{d}{dt}S(t) = h(t)\exp\left[-\int_0^t h(x)\,dx\right] = \frac{d}{dt}H(t)\,\exp[-H(t)].
Similarly,

H(t) = -\ln S(t) = -\ln[1 - F(t)] = -\ln\left[1 - \int_0^t f(x)\,dx\right], \quad t \ge 0. \quad (2.34)
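These identities can also be verified numerically; the R sketch below does so for an assumed Weibull distribution evaluated at a single time point (the parameter values and the time point are arbitrary illustrative choices).

shape <- 2; scale <- 500; t0 <- 300          # assumed Weibull parameters and time point

f <- dweibull(t0, shape, scale)              # density f(t)
S <- 1 - pweibull(t0, shape, scale)          # survival function S(t)
h <- f / S                                   # hazard function, h(t) = f(t)/S(t)
H <- -log(S)                                 # cumulative hazard, H(t) = -ln S(t), Eq. (2.34)

c(S, exp(-H))                                # S(t) = exp[-H(t)], Eq. (2.28)
c(f, h * exp(-H))                            # f(t) = h(t) exp[-H(t)], from Table 2.3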
References
Benbow DW, Broome HW (2008) The certified reliability engineer handbook. American Society
for Quality, Quality Press
Blischke WR, Murthy DNP (2000) Reliability. Wiley, New York
Blischke WR, Karim MR, Murthy DNP (2011) Warranty data collection and analysis. Springer,
London
Guess F, Proschan F (1988) Mean residual life: theory and applications. In: Krishnaiah PR, Rao
CR (eds) Handbook of statistics 7: quality control and reliability. Elsevier Science Publishers,
Amsterdam
Islam MA, Al-Shiha A (2018) Foundations of biostatistics. Springer Nature Singapore Pte Ltd.
Kaminskiy MP (2013) Reliability models for engineers and scientists. CRC Press, Taylor & Francis
Group
Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data,
2nd edn. Springer, New York
Kleinbaum DG, Klein M (2012) Survival analysis: a self-learning text, 3rd edn. Springer, New York
Lawless JF (1982) Statistical models and methods for lifetime data. Wiley, New York
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, New York
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley Interscience, New
York
Minitab (2019) Minitab® support. https://support.minitab.com/en-us/minitab/18/. Accessed on 23
May 2019
Moore DF (2016) Applied survival analysis using R. Springer International Publishing
Moore DS, McCabe GP, Craig B (2007) Introduction to the practice of statistics. W H Freeman,
New York
Nelson W (1990) Accelerated testing: statistical models, test plans, and data analysis. Wiley, New
York
NIST (2019) Engineering statistics handbook, NIST/SEMATECH e-handbook of statistical meth-
ods. http://www.itl.nist.gov/div898/handbook/index.htm. Accessed on 23 May 2019
Ryan TP (2007) Modern engineering statistics. Wiley, New York
Schmid CF (1983) Statistical graphics. Wiley Interscience, New York
Tufte ER (1983) The visual display of quantitative information. Graphics Press, Cheshire, CT
Tufte ER (1989) Envisioning information. Graphics Press, Cheshire, CT
Tufte ER (1997) Visual explanations. Graphics Press, Cheshire, CT
Chapter 3
Probability Distribution of Lifetimes:
Uncensored
3.1 Introduction
the relationship F(αt; μ, σ1 ) = F(t; μ, σ2 ) for two values of the scale parameter σ 1
and σ 2 . A familiar example of a scale parameter is the failure rate (the reciprocal of the
mean) of the exponential distribution. The shape of a probability density function is
determined by the shape parameter and can be used to classify the probability density
function under a special type. A familiar example of a shape parameter is α (or β) of
the Weibull distribution, which determines whether the distribution follows the IFR,
DFR, or CFR property.
The outline of the chapter is as follows: Sect. 3.2 presents the exponential distri-
bution. Section 3.3 discusses the Weibull distribution, which can be applied to a wide
range of situations having monotonic failure rates commonly observed in survival
and reliability data analyses. Section 3.4 describes the extreme value distributions.
The normal and lognormal distributions are presented in Sect. 3.5.
The exponential distribution has been extensively used to model a wide range of
random variables including lifetimes of manufactured items, times between system
failures, arrivals in queue, interarrival times, and remission times. Just as the normal
distribution plays an important role in classical statistics, the exponential distribution
plays an important role in reliability and lifetime modeling since it is the only contin-
uous distribution with a constant hazard function. The exponential distribution has
often been used to model the lifetime of electronic components and is appropriate
when a used component that has not failed is statistically as good as a new component
(Ravindran 2009).
The probability density function of the exponential distribution is

f(t) = λe^{−λt}, t ≥ 0, (3.1)

where λ > 0 is a scale parameter (often called the failure rate). It is also known as the one-parameter exponential distribution. We can obtain the cumulative distribution function as
F(t) = ∫_0^t λe^{−λτ} dτ = λ[−e^{−λτ}/λ]_0^t = 1 − e^{−λt}, t ≥ 0. (3.2)

The reliability function is S(t) = 1 − F(t) = e^{−λt}, t ≥ 0, (3.3) and the hazard function is

h(t) = f(t)/S(t) = λe^{−λt}/e^{−λt} = λ. (3.4)
Fig. 3.1 pdf, cdf, reliability function, and hazard function of exponential distribution
The MTTF is the population average or mean time to failure. In other words, a brand
new unit has this expected lifetime until it fails (Tobias and Trindade 2012). Hence
by definition,
MTTF = ∫_0^∞ tλe^{−λt} dt = [−te^{−λt}]_0^∞ − ∫_0^∞ (−e^{−λt}) dt = [−e^{−λt}/λ]_0^∞ = 1/λ. (3.5)
For a population with a constant failure rate λ, the MTTF is the reciprocal of that
failure rate or 1/λ. For this distribution, it can be shown that
Var(T) = 1/λ². (3.6)
Even though 1/λ is the average time to failure, it is not equal to the time when half
of the population will fail. For the entire population, the median is defined to be the
point where the cumulative distribution function first reaches the value 0.5 (Tobias
and Trindade 2012). The pth quantile, t_p (discussed in Chap. 2), is the solution for t_p of the equation F(t_p) = p, which implies

1 − e^{−λt_p} = p or t_p = −(1/λ) ln(1 − p). (3.7)
The median time to failure, t 0.5 , is obtained by putting p = 0.5 in Eq. (3.7). That
is,
Median = t_{0.5} = −(1/λ) ln(1 − 0.5) = −(1/λ) ln(1/2) = ln(2)/λ = 0.693/λ. (3.8)
The median here is less than the MTTF, since the numerator is only 0.693 instead of 1. In fact, when the time has reached the MTTF, we have

F(MTTF) = F(1/λ) = 1 − e^{−λ(1/λ)} = 1 − e^{−1} ≈ 0.632,

so that about 63.2% of the population has already failed by the MTTF.
The constant failure rate is one of the characteristic properties of the exponential dis-
tribution, and closely related is another key property, the exponential lack of memory.
A component following an exponential life distribution does not “remember” how
long it has been operating. The probability that it will fail in the next hour of operation is the same as if it were new, one month old, or several years old. It does not age
or wear out or degrade with time or use. Failure is a chance happening, always at
the same constant rate and unrelated to accumulated power-on hours (Tobias and
Trindade 2012).
The equation that describes this property states that the conditional probability of failure in some interval of time of length h, given survival up to the start of that interval, is the same as the probability of a new unit failing in its first h units of time, that is,

P(T ≤ t + h | T > t) = P(T ≤ h) = F(h).

Proof We know that the cumulative distribution function is F(t) = 1 − e^{−λt}, and hence F(t + h) = 1 − e^{−λ(t+h)} and F(h) = 1 − e^{−λh}. Therefore,

P(T ≤ t + h | T > t) = [F(t + h) − F(t)] / [1 − F(t)] = [e^{−λt} − e^{−λ(t+h)}] / e^{−λt} = 1 − e^{−λh} = F(h).
This proves the memoryless property of the exponential distribution.
to follow exponential distribution, and the application becomes very simple because
the mean time of failure and mean time between failures both can be represented by
1/λ.
Although the constant hazard function may not be ideal in many applications, the exponential failure time distribution is still applied in many ways. For studying wear-out mechanisms, if the number of early failures is minimal, or if the early failures are treated separately, the exponential distribution can be a good initial choice because of its simplicity and the convenience of interpreting the results. In many instances, individual components of a system, or the product life itself, may follow a constant failure rate, in which case the exponential distribution provides very good insight into the possible choice of further strategies. Under the assumption of exponential failure times, decisions concerning sample size, confidence level, precision, etc., can be made in ways that may become intractable or very complex with other distributions; in that case, the exponential distribution may provide an ideal initial input for planning experiments. However, where the experiments are based on failure times with an increasing or decreasing hazard or failure rate, the limitation of the exponential distribution is obvious and an alternative lifetime distribution needs to be considered.
Let T be a random variable that follows exponential distribution with pdf f (t) =
λe−λt , t ≥ 0. Then, the likelihood function, which is the joint probability distribution
of the data, expressed as a function of the parameter (λ) of the distribution and the
sample observations of size n, t 1 , t 2 , …, t n , is
L = ∏_{i=1}^n λe^{−λt_i} = λ^n e^{−λ Σ_{i=1}^n t_i}. (3.9)
The log likelihood function is

ln L = n ln λ − λ Σ_{i=1}^n t_i, (3.10)
and differentiating log likelihood with respect to λ, we can show the likelihood
equation
∂ln L/∂λ = n/λ − Σ_{i=1}^n t_i = 0. (3.11)
Solving the above equation, the maximum likelihood estimate of the parameter, λ,
is
λ̂ = n / Σ_{i=1}^n t_i. (3.12)
If we denote T = Σ_{i=1}^n t_i, then T is a sufficient statistic for λ, and since the λt_i's are independent exponential variates, λT has a one-parameter gamma distribution with index parameter n. Equivalently, 2λT ∼ χ²_{(2n)}.
where χ²_{(2n),p} is the pth quantile of the χ²_{(2n)} distribution. Then,

P{χ²_{(2n),α/2}/(2T) ≤ λ ≤ χ²_{(2n),1−α/2}/(2T)} = 1 − α, (3.13)

which provides a two-sided 100(1 − α)% confidence interval for λ.
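A minimal R sketch of this interval, using a hypothetical complete sample of failure times, is shown below:

# 100(1 - alpha)% confidence interval for lambda based on Eq. (3.13)
t <- c(12, 25, 31, 47, 62, 80, 98, 120)    # hypothetical failure times
n <- length(t)
T_total <- sum(t)
alpha <- 0.05
c(lower = qchisq(alpha / 2, df = 2 * n) / (2 * T_total),
  upper = qchisq(1 - alpha / 2, df = 2 * n) / (2 * T_total))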
(i) What is the probability that the product will survive 10,000 h?
(ii) What is the probability that the product will survive the next 20,000 h?
(iii) What is the mean time to failure (MTTF)?
(iv) What is the median time to failure?
(v) At what point in time is it expected that 30% of the products will fail?
(vi) When will 63.2% fail?
(ii) The conditional survivor function for surviving another 20,000 h for a product
that has already survived 10,000 h can be obtained by using the conditional
survivor function (2.10):
S_{T|T≥a}(t) = S(a + t)/S(a).
(v) t_{0.3} = −ln(1 − 0.3)/0.00025 = 1427 h.
(vi) For this product, the mean time to failure is 4000 h, and we know that the probability of failure by the mean time to failure is

F(MTTF) = 1 − e^{−λ(1/λ)} = 1 − e^{−1} = 0.632.

This indicates that 63.2% of the products are expected to fail by the mean time to failure, 4000 h.
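The quantities asked for in this example can also be obtained with a few lines of R; the sketch below uses λ = 0.00025 per hour, as in the example:

lambda <- 0.00025
S <- function(t) exp(-lambda * t)     # reliability function of Eq. (3.3)
S(10000)                              # (i)   probability of surviving 10,000 h
S(10000 + 20000) / S(10000)           # (ii)  surviving a further 20,000 h
1 / lambda                            # (iii) MTTF = 4000 h
log(2) / lambda                       # (iv)  median time to failure
-log(1 - 0.3) / lambda                # (v)   t_0.3, about 1427 h
-log(1 - 0.632) / lambda              # (vi)  approximately the MTTF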
As mentioned in Murthy et al. (2004), the Weibull distribution is named after Waloddi Weibull (1887–1979), a Swedish engineer, scientist, and mathematician, who was the first to promote the usefulness of this distribution for modeling data sets of widely differing character. The initial study by Weibull (1939) appeared in a Scandinavian journal and dealt with the strength of materials. A subsequent study in English
(Weibull 1951) was a landmark work in which he modeled data sets from many differ-
ent disciplines and promoted the versatility of the model in terms of its applications
in different disciplines (Murthy et al. 2004).
The failure rate h(t) remains constant in an exponential model; however, in reality,
it may increase or decrease over time. In such situations, we need a model that may
take into account the failure rate as a function of time representing a change in failure
rate with respect to time. The exponential distribution fails to address this situation.
We can define a distribution where h(t) is monotonic and this type of distribution is
known as the Weibull distribution. The Weibull distribution can be applied to a wide
range of situations having monotonic failure rates commonly observed in survival
and reliability data analyses.
The failure time T is said to be Weibull distributed with parameters β (>0) and η (>0) if the probability density function is given by

f(t) = (β/η)(t/η)^{β−1} exp[−(t/η)^β], t ≥ 0. (3.14)

An alternative form of the Weibull pdf is

f(t) = αλ(λt)^{α−1} exp[−(λt)^α], t ≥ 0, (3.15)

with shape parameter α and scale parameter λ. The above two forms of the probability density function are related through the relationship between the parameters, α = β and λ = 1/η.
The cumulative distribution function can be obtained as follows:
F(t) = ∫_0^t f(τ) dτ = ∫_0^t (β/η)(τ/η)^{β−1} exp[−(τ/η)^β] dτ = 1 − exp[−(t/η)^β], t ≥ 0. (3.16)
Fig. 3.2 pdf, cdf, reliability function, and hazard function of Weibull distribution
The reliability function is

S(t) = 1 − F(t) = exp[−(t/η)^β], t ≥ 0, (3.17)

and the hazard function is

h(t) = f(t)/S(t) = (β/η)(t/η)^{β−1}, t ≥ 0. (3.18)

The cumulative hazard function is

H(t) = ∫_0^t h(τ) dτ = ∫_0^t (β/η)(τ/η)^{β−1} dτ = (t/η)^β, t ≥ 0. (3.19)
The pdf, cdf, reliability function, and hazard function of the Weibull distribution
are displayed graphically in Fig. 3.2 for the values of shape parameter β = 0.8, 1.0,
1.5 and scale parameter η = 1. The plot of hazard functions includes a DFR, CFR,
and IFR for the values of shape parameter, respectively, 0.8 (<1), 1.0 (=1.0), and 1.5
(>1.0).
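The functions plotted in Fig. 3.2 can be reproduced with the built-in Weibull routines in R, as in the following minimal sketch (the grid of t values is arbitrary):

beta <- 1.5; eta <- 1                           # shape and scale parameters
t <- seq(0.01, 3, by = 0.01)
f_t <- dweibull(t, shape = beta, scale = eta)   # pdf, Eq. (3.14)
F_t <- pweibull(t, shape = beta, scale = eta)   # cdf, Eq. (3.16)
S_t <- 1 - F_t                                  # reliability function
h_t <- f_t / S_t                                # hazard = (beta/eta)*(t/eta)^(beta-1)
plot(t, h_t, type = "l", ylab = "h(t)")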
Since the introduction of a statistical theory of the strength of materials in 1939 (Weibull 1939), and the more comprehensive summary provided in a landmark paper in 1951 (Weibull 1951), the Weibull distribution has been applied to data from many different disciplines.
For estimating the parameters of a Weibull distribution, we can use the likelihood
method. Let us consider a random sample of n failure times (T1 , . . . , Tn ) with
observed values (T1 = t1 , . . . , Tn = tn ). The likelihood function is
L = ∏_{i=1}^n (β/η)(t_i/η)^{β−1} exp[−(t_i/η)^β] = (β/η)^n ∏_{i=1}^n (t_i/η)^{β−1} exp[−(t_i/η)^β], (3.20)

and the log likelihood function is

ln L = n ln β − nβ ln η + (β − 1) Σ_{i=1}^n ln t_i − Σ_{i=1}^n (t_i/η)^β. (3.21)
The likelihood equations are obtained by differentiating the log likelihood function
with respect to the parameters, η and β, as shown below
∂ln L/∂η = −nβ/η + β Σ_{i=1}^n t_i^β/η^{β+1} = 0 (3.22)

and

∂ln L/∂β = n/β − n ln η + Σ_{i=1}^n ln t_i − Σ_{i=1}^n (t_i/η)^β ln(t_i/η) = 0. (3.23)
From Eq. (3.22), the MLE of the scale parameter η for a given β is

η̂ = (Σ_{i=1}^n t_i^β / n)^{1/β}. (3.24)

Substituting this into Eq. (3.23), the MLE of the shape parameter β can be obtained by solving the following equation:

Σ_{i=1}^n t_i^β̂ ln(t_i) / Σ_{i=1}^n t_i^β̂ − 1/β̂ − (1/n) Σ_{i=1}^n ln(t_i) = 0. (3.25)
The mean time to failure is

MTTF = E(T) = η Γ(1 + 1/β), (3.26)

and the variance is

Var(T) = η²[Γ(1 + 2/β) − {Γ(1 + 1/β)}²], (3.27)

where Γ(·) denotes the gamma function.
The variance is

Var(T) = η²[Γ(1 + 2/β) − {Γ(1 + 1/β)}²] = 4000²[Γ(1 + 2/1.5) − {Γ(1 + 1/1.5)}²] = 6,011,045. (3.29)
S(5000) = exp[−(5000/4000)^{1.5}] = exp(−1.3975) = 0.2472.
That is, the probability that the product will survive 5000 h is 0.2472.
(iii) The conditional survivor function for surviving another 2000 h for a product
that has already survived 5000 h can be obtained by using the conditional
survivor function (2.10):
S_{T|T≥a}(t) = S(a + t)/S(a).
Fig. 3.3 Weibull probability plot with MLEs of Weibull parameters for the variable usage of an
automobile component
The MTTF for the Weibull distribution can be estimated by substituting the MLEs of the parameters into the formula expressed in terms of the parameters of the distribution as given in Eq. (3.26). The estimated MTTF is 23,324.2 × Γ(1 + 1/1.291) = 21,572.2.
Note that the data in the Weibull probability paper (WPP) plot1 fall roughly along
a straight line. The roughly linear pattern of the data on Weibull probability paper
suggests that the Weibull distribution can be a reasonable choice (Blischke et al. 2011)
for modeling the usage variable in this application. As an alternative, the lognormal
distribution will be considered in analyzing this data set later.
The extreme value distribution is widely used in modeling lifetime data and is closely
related to the Weibull distribution. This distribution is extensively used for different
applications and referred to as the extreme value Type I or the Gumbel distribution.
There are two different forms of the extreme value Type I distribution based on: (i) the
smallest extreme value (minimum) and (ii) the largest extreme value (maximum). We
can show the extreme value distribution as a special case of the Weibull distribution.
1 The detail on probability plots can be found in Blischke et al. (2011) and Murthy et al. (2004).
In the Weibull pdf (3.14), if we let X = ln T with μ = ln(η) and σ = 1/β, then the
pdf for the general form of the extreme value Type I or the Gumbel distribution for
minimum (also known as smallest extreme value distribution) becomes
f(x; μ, σ) = (1/σ) exp[(x − μ)/σ − exp{(x − μ)/σ}], −∞ < x < ∞, (3.30)
where μ (−∞ < μ < ∞) is the location parameter and σ > 0 is the scale parameter.
It may be noted here that although the range includes negative lifetimes, if the choice
of location parameter is made such that μ is sufficiently large then the probability
of negative lifetimes becomes negligible. The standard Gumbel distribution for the
minimum is a special case where μ = 0 and σ = 1. The pdf of the standardized
Gumbel distribution for the minimum is
f(x) = exp[x − exp(x)], −∞ < x < ∞. (3.31)
Similarly, the general form of the Gumbel distribution for the maximum value
(also known as largest extreme value distribution) is
f(x; μ, σ) = (1/σ) exp[−(x − μ)/σ − exp{−(x − μ)/σ}], −∞ < x < ∞. (3.32)
In this case also, μ (−∞ < μ < ∞) and σ > 0 are location and scale parameters,
respectively. Then, we obtain the standard Gumbel distribution for maximum (μ = 0
and σ = 1) as follows
f(x) = exp[−x − exp(−x)], −∞ < x < ∞. (3.33)
The cumulative distribution functions for the general forms for minimum and maximum are shown below:

Minimum extreme value Type I: F(x) = 1 − exp[−exp{(x − μ)/σ}], −∞ < x < ∞.
Maximum extreme value Type I: F(x) = exp[−exp{−(x − μ)/σ}], −∞ < x < ∞.

The survival/reliability functions for minimum and maximum extreme values are:

Minimum extreme value Type I: R(x) = S(x) = exp[−exp{(x − μ)/σ}].
Maximum extreme value Type I: R(x) = S(x) = 1 − exp[−exp{−(x − μ)/σ}].

The cumulative distribution functions of the standard Gumbel distributions for minimum and maximum are:

Minimum extreme value Type I: F(x) = 1 − exp[−exp(x)], −∞ < x < ∞.
Maximum extreme value Type I: F(x) = exp[−exp(−x)], −∞ < x < ∞.

The survival/reliability functions of the standard Gumbel distributions for minimum and maximum extreme values are:

Minimum extreme value Type I: R(x) = S(x) = exp[−exp(x)], −∞ < x < ∞.
Maximum extreme value Type I: R(x) = S(x) = 1 − exp[−exp(−x)], −∞ < x < ∞.
The hazard functions for the general form of the Gumbel and standard Gumbel
distributions are shown below.
Gumbel (minimum): h(x) = (1/σ) exp[(x − μ)/σ], −∞ < x < ∞.
Gumbel (maximum): h(x) = exp[−(x − μ)/σ] / (σ{exp[exp(−(x − μ)/σ)] − 1}), −∞ < x < ∞.
Standard Gumbel (minimum): h(x) = exp(x), −∞ < x < ∞.
Standard Gumbel (maximum): h(x) = exp(−x) / {exp[exp(−x)] − 1}, −∞ < x < ∞.
The pdf, cdf, reliability function, and hazard function of the smallest extreme value
distribution are displayed in Fig. 3.4 for the values of scale parameter (σ = 5, 6, 7)
and location parameter (μ = 50). This figure shows that the pdf is skewed to the
left (although most failure time distributions are skewed to the right). The exponen-
tially increasing hazard function suggests that this distribution would be suitable for
modeling the life of a product that experiences very rapid wear-out after a certain
age/usage. The distributions of logarithms of failure times can often be modeled with
the smallest extreme value distribution (Meeker and Escobar 1998).
Figure 3.5 shows the pdf, cdf, reliability function, and hazard function of the
largest extreme value distribution for the values of scale parameter (σ = 5, 6, 7) and
location parameter (μ = 10). This figure shows that the pdf is skewed to the right and
the hazard function is increasing but it is bounded in the sense that lim x→∞ h(x) =
1/σ. The largest extreme value distribution could be used as a model for the lifetime
if σ is small relative to μ > 0 (Meeker and Escobar 1998).
Fig. 3.4 pdf, cdf, reliability function, and hazard function of smallest extreme value distribution
The likelihood function of the random variable with a Gumbel (minimum) probability
distribution is
L = ∏_{i=1}^n (1/σ) exp[(x_i − μ)/σ − exp{(x_i − μ)/σ}],

and the log likelihood function is

ln L = −n ln σ + Σ_{i=1}^n (x_i − μ)/σ − Σ_{i=1}^n exp[(x_i − μ)/σ]. (3.34)
Differentiating with respect to μ and solving ∂ln L/∂μ = 0 for μ, we obtain the maximum likelihood estimator of μ as

μ̂ = σ̂ ln[(1/n) Σ_{i=1}^n e^{x_i/σ̂}]. (3.35)

Fig. 3.5 pdf, cdf, reliability function, and hazard function of largest extreme value distribution
There is no closed-form solution for σ. The estimating equation for σ is ∂ln L/∂σ = 0, which can be simplified as shown below:

−σ − (1/n) Σ_{i=1}^n x_i + [Σ_{i=1}^n x_i exp(x_i/σ)] / [n exp(μ/σ)] = 0. (3.36)
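A minimal R sketch for computing μ̂ and σ̂ from Eqs. (3.35) and (3.36), with σ̂ obtained numerically by uniroot() and a small hypothetical data vector x, is shown below:

x <- c(4.8, 5.3, 5.9, 6.1, 6.4, 6.8, 7.0, 7.3)        # hypothetical data
g <- function(sigma) {                                 # Eq. (3.36), with mu from Eq. (3.35)
  mu <- sigma * log(mean(exp(x / sigma)))
  -sigma - mean(x) + sum(x * exp(x / sigma)) / (length(x) * exp(mu / sigma))
}
sigma_hat <- uniroot(g, interval = c(0.1, 10))$root
mu_hat    <- sigma_hat * log(mean(exp(x / sigma_hat)))
c(mu = mu_hat, sigma = sigma_hat)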
The mean and variance of the minimum extreme value Type I distribution are

E(X) = μ − σγ and V(X) = σ²π²/6, (3.37)

where γ = 0.5772 is Euler's constant. Similarly, the mean and variance of the maximum extreme value Type I distribution are

E(X) = μ + σγ and V(X) = σ²π²/6. (3.38)
The lognormal distribution has become one of the most popular lifetime models for
many high technology applications. In particular, it is very suitable for semiconductor
degradation failure mechanisms. It has also been used successfully for modeling
material fatigue failures and failures due to crack propagation (Tobias and Trindade
2012). It has been used in diverse situations, such as the analysis of failure times of
electrical insulation.
Many of the properties of the lognormal distribution can be investigated directly
from the properties of the normal distribution, since a simple logarithmic transfor-
mation transforms the lognormal data into normal data. So, we can directly use our
knowledge about the normal distribution and normal data to the study of lognormal
distribution and lognormal data as well.
The distribution is most easily specified by saying that the lifetime T is lognor-
mally distributed if the logarithm Y = ln T of the lifetime is normally distributed,
say with mean μ (−∞ < μ < ∞) and variance σ 2 > 0. The probability density
function of Y is therefore normal as shown below
f(y) = (1/√(2πσ²)) exp[−(y − μ)²/(2σ²)], −∞ < y < ∞, (3.39)

and from this, the probability density function of T = e^Y is lognormal and found to be

f(t) = (1/(t√(2πσ²))) exp[−(ln t − μ)²/(2σ²)], t > 0. (3.40)
The survivor and hazard functions for the lognormal distribution involve the stan-
dard normal distribution function (Lawless 2003), where the cumulative distribution
function is
F(t) = ∫_{−∞}^{ln t} (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)] dx = ∫_0^t (1/(u√(2πσ²))) exp[−(ln u − μ)²/(2σ²)] du, (3.41)
Fig. 3.6 pdf, cdf, reliability function, and hazard function of lognormal distribution
since T = e^Y, F(t) = P(T ≤ t) = P(e^Y ≤ t) = P(Y ≤ ln t). The lognormal survival function/reliability function is S(t) = 1 − F(t), and the hazard function is h(t) = f(t)/S(t), t > 0.
The pdf, cdf, reliability function, and hazard function of the lognormal distribution
are displayed in Fig. 3.6 for the values of scale parameter (σ = 0.3, 0.5, 0.8) and
location parameter (μ = 0). This figure shows that the pdf is skewed to the right.
The hazard function of the lognormal distribution starts at 0, increases to a point in
time, and then decreases eventually to zero.
For a complete sample t_1, t_2, …, t_n, the likelihood function is

L = ∏_{i=1}^n (1/(t_i√(2πσ²))) exp[−(ln t_i − μ)²/(2σ²)], (3.42)

and the log likelihood function is

ln L = −(n/2) ln(2π) − (n/2) ln σ² − Σ_{i=1}^n ln t_i − (1/(2σ²)) Σ_{i=1}^n (ln t_i − μ)². (3.43)
Maximizing the log likelihood gives the maximum likelihood estimators

μ̂ = (1/n) Σ_{i=1}^n ln t_i, (3.44)

and

σ̂² = (1/n) Σ_{i=1}^n (ln t_i − μ̂)². (3.45)

The sample variance of the log data, with divisor n − 1 rather than n, is

s² = (1/(n − 1)) Σ_{i=1}^n (ln t_i − μ̂)². (3.46)
In terms of the median t_M = e^μ, the lognormal pdf, cdf, survival function, and hazard function can be written as

f(t) = (1/(tσ√(2π))) exp[−{ln(t/t_M)}²/(2σ²)], t > 0,

F(t) = Φ[ln(t/t_M)/σ], t > 0,

S(t) = 1 − F(t) = 1 − Φ[ln(t/t_M)/σ] = Φ[ln(t_M/t)/σ], t > 0,

and

h(t) = f(t)/S(t) = (1/(tσ√(2π))) exp[−{ln(t/t_M)}²/(2σ²)] / Φ[ln(t_M/t)/σ], t > 0,

where Φ(·) denotes the standard normal cumulative distribution function. It is seen from the above expression that the hazard function tends to 0 as t → ∞. This restricts the use of the lognormal distribution for extremely large values of failure times. The mean time to failure is

MTTF = E(T) = exp(μ + σ²/2). (3.47)
Fig. 3.7 Lognormal probability plot with MLEs of lognormal parameters for the variable usage of
an automobile component
Example 3.4 For purposes of illustration, the lognormal distribution will be con-
sidered here for analyzing the variable usage of an automobile component failure
data of Table 2.1. Estimates of the parameters of the lognormal distribution can be
obtained by solving Eq. (3.44) for μ and Eq. (3.45) for σ 2 . Instead, we may use the
Minitab software, which provides the output given in Fig. 3.7. From this, we obtain the parameter estimates μ̂ = 9.62047 and σ̂ = 0.891818, which are, respectively, the sample mean and sample standard deviation (with divisor n rather than n − 1) of the data transformed to the log scale. The functions “mle()” and “survreg(Surv())” given in the R libraries stats4 and survival, respectively, can also be used to find the MLEs of the parameters. The relationship between the parameters and the MTTF for this distribution, given in Eq. (3.47), is used to estimate this quantity. The result is exp(μ̂ + σ̂²/2) = exp(9.62047 + 0.891818²/2) = 22,429.6, as shown in Fig. 3.7.
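For readers using R rather than Minitab, a minimal sketch along the lines indicated above might look as follows; the usage vector is hypothetical and stands in for the data of Table 2.1:

library(survival)
usage <- c(5200, 8100, 11000, 16000, 21000, 30000, 45000, 60000)   # hypothetical
fit <- survreg(Surv(usage, rep(1, length(usage))) ~ 1, dist = "lognormal")
mu_hat    <- coef(fit)          # estimate of mu (mean of log usage)
sigma_hat <- fit$scale          # ML estimate of sigma
exp(mu_hat + sigma_hat^2 / 2)   # estimated MTTF, as in Eq. (3.47)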
Note that the data appear to follow roughly a linear pattern in the lognormal plot.
It is noteworthy that the adjusted Anderson–Darling (AD*) value2 for the lognormal
distribution (0.998, in Fig. 3.7) is smaller than that of AD* value for the Weibull
distribution (1.115, in Fig. 3.3). Therefore, the AD* values indicate that the lognormal
distribution provides a better fit for the usage data of this example than the Weibull
distribution.
2 The detail on adjusted Anderson–Darling (AD*) can be found in Blischke et al. (2011) and Murthy
et al. (2004).
References
Blischke WR, Karim MR, Murthy DNP (2011) Warranty data collection and analysis. Springer,
London
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, New Jersey
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley, New York
Murthy DNP, Xie M, Jiang R (2004) Weibull models. Wiley, New York
Ravindran AR (ed) (2009) Operations research applications. CRC Press, Taylor & Francis Group,
LLC
Tobias PA, Trindade DC (2012) Applied reliability, 3rd edn. CRC Press, Taylor & Francis Group
Weibull W (1939) A statistical theory of the strength of material. Ingeniors Vetenskapa Acadamiens
Handligar 151:1–45
Weibull W (1951) A statistical distribution function of wide applicability. J Appl Mech 18:293–296
Wolstenholme LC (1999) Reliability modelling: a statistical approach. Chapman and Hall/CRC
Chapter 4
Censoring and Truncation Mechanisms
Abstract Censoring and truncation are the special types of characteristics of time
to event data. A censored observation arises when the value of the random variable
of interest is not known exactly, that is, only partial information about the value is
known. In the case of truncation, some of the subjects may be dropped from the study
due to the implementation of some conditions such that their presence or existence
cannot be known. In other words, the truncated subjects are subject to screening by
some conditions as an integral part of the study. This chapter presents the maximum
likelihood estimation method for analyzing the censored and truncated data.
4.1 Introduction
Time to event data present themselves in different ways which create special prob-
lems in analyzing such data (Klein and Moeschberger 2003). One peculiar feature,
generally present in time-to-event data, is known as censoring, which, broadly speak-
ing, occurs when in some cases, the exact time of occurrence of the desired event
is not known. In other words, the lifetime is known partially until the censoring
occurs in these cases. A censored observation arises when the value of the random
variable of interest is not known exactly, that is, only partial information about the
value is known. In addition to censoring, another source of incomplete lifetime data
is known as truncation. In the case of truncation, the observation is not considered
due to conditions implied in a study or an experiment.
The outline of the chapter is as follows: Sect. 4.2 defines various types of censor-
ing. Section 4.3 discusses the truncation of lifetime data. Construction of likelihood
functions for different types of censored data is explained in Sect. 4.4.
In order to handle censoring in the analysis, we need to consider the design which
was employed to obtain the reliability/survival data. Right censoring is very common
in lifetime data and left censoring is fairly rare.
If the exact value of an observation is not known but only known that it is greater
than or equal to time t c , then the observation is said to be right censored at t c . Right
censoring is more common in real-life situations. Generally, we observe the following
types of right-censored data:
(i) Type I censoring,
(ii) Type II censoring,
(iii) Progressive Type II censoring, and
(iv) Random censoring.
If we fix a predetermined time to end the study, then an individual’s lifetime will
be known exactly only if it is less than that predetermined value. In such situations,
the data are said to be Type I (or time) censored (Islam and Al-Shiha 2018). Type
I censoring arises in both survival and reliability analyses. Let T1 , . . . , Tn be inde-
pendently, identically distributed random variables each with distribution function
F. Let tc be some (preassigned) fixed number which we call the fixed censoring
time. Instead of observing T1 , . . . , Tn (the random variables of interest), we can only
observe t1 , . . . , tn where
t_i = T_i if T_i ≤ t_c, and t_i = t_c if T_i > t_c, i = 1, 2, …, n. (4.1)
Consider pairs (T_1, C_1), …, (T_n, C_n), where T is the failure time and C is the censoring time. We may now define a new pair of variables (t, δ) for each item with
t_i = T_i if T_i ≤ C_i, and t_i = C_i if T_i > C_i, (4.3)

and

δ_i = 1 if T_i ≤ C_i, and δ_i = 0 if T_i > C_i. (4.4)
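A minimal R sketch of the construction in Eqs. (4.3) and (4.4), with hypothetical failure and censoring times, is given below:

T_fail <- c(120, 340, 510, 700, 980)        # hypothetical failure times
C_cens <- c(400, 400, 400, 650, 1000)       # hypothetical censoring times
t_obs <- pmin(T_fail, C_cens)               # Eq. (4.3)
delta <- as.numeric(T_fail <= C_cens)       # Eq. (4.4): 1 = failure, 0 = censored
cbind(t_obs, delta)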
If all the items in the experiment are considered to start at the beginning, and the
endpoint of the study is a prefixed time, tc , then all the failures that occur before
tc provide complete observations (uncensored) and the items not failing before tc
provide incomplete observations (censored). In case of the incomplete observations,
failure time might be greater than the prefixed time tc . In some experiments, the
items may not start at the same time or at the time of the beginning of the study. In
that case, there is no single prefixed time that can be applied to each item; instead, each item may have its own endpoint, defined by its time of entry and time of exit. As the time of entry varies from item to item, the duration of stay in the experiment varies as well. Hence, we need to observe whether T_i ≤ C_i, indicating that the failure time is completely known, or T_i > C_i, i = 1, 2, …, n, indicating that only partial information about the failure time is known (it exceeds the censoring time).
It is observed that the lifetimes are complete for r items which are denoted by
t(1) , . . . , t(r ) but after the r-th failure, the lifetimes of (n − r) items are not known
except their time of censoring, T(r ) , as the experiment is terminated at that time.
Hence, the time obtained after the r-th failure can be shown as t(r +1) = t(r +2) =
· · · = t(n) = t(r ) . This experiment results in the smallest r complete and remaining
(n − r) incomplete observations. The complete observations are uncensored, and the
incomplete observations are termed as censored observations.
Example 4.1 This example is taken from Miller (1981). Both Type I and Type II
censoring arise in engineering applications. In such situations, there is a batch of transistors or tubes; we put them all on test at t = 0 and record their times to failure. Some transistors may take a long time to burn out, and we will not want to wait that long to end the experiment. Therefore, we might stop the experiment at a prespecified time t_c, in which case we have Type I censoring. If we do not know beforehand what value of the fixed censoring time is good, we may decide to wait until a prespecified fraction r/n of the transistors has burned out, in which case we have Type II censoring (Miller 1981).
and

δ_i = 1 if T_i ≤ C_i, and δ_i = 0 if T_i > C_i. (4.5)
Example 4.2 Random censoring arises in medical applications where the censoring
times are often random. In a medical trial, patients may enter the study in a more or
less random fashion, according to their time of diagnosis. We want to observe their
lifetimes. If the study is terminated at a prearranged date, then censoring times, that
is the lengths of time from an individual’s entry into the study until the termination
of the study, are random (Lawless 1982).
and

δ_i = 1 if T_i ≥ C_{li}, and δ_i = 0 if T_i < C_{li}. (4.7)
Example 4.3 In early childhood learning centers, interest often focuses upon testing
children to determine when a child learns to accomplish certain specified tasks (Klein
and Moeschberger 2003). The time to event would be considered as the age at which
a child learns the task. Assume that some children are already performing the task at
the beginning of their study. Such event times are considered as left-censored data.
Interval censoring occurs if the exact time of failure or event cannot be observed because observations are taken only in intervals (L_i, R_i), where L_i is the starting time point and R_i is the end time point of interval i. For example, suppose an item is observed to be functioning at the starting time point of interval i, which gives the value of L_i, and the status of the item (functioning/not functioning) is observed again at the end time point of the interval, denoted by R_i. In other words, failure occurs within an interval because observations are taken only at specified times, such as follow-ups at one-year intervals. In that case, at the last follow-up the item could still be functioning, but at the subsequent follow-up it could be found not functioning. The failure occurred within the interval (L_i, R_i), where the only information known is that the failure time lies between L_i and R_i, that is, L_i < T_i < R_i. Such interval censoring occurs when patients in a clinical trial or longitudinal study visit only at specified intervals, so that a patient's event time is only known to fall in the specified interval. Reliability studies in which observations are taken only at inspections carried out at specified intervals, as is common in industry, may also provide interval-censored data.
Example 4.4 In the Health and Retirement Study, the age at which a subject first
developed diabetes mellitus may not be known exactly due to the collection of data
after every two years. The incidence of the disease may occur any time between
the last follow-up when the subject was observed to be free from diabetes mellitus
and observed to be suffering from the disease for the first time at the subsequent
follow-up. The disease occurred during the interval of two years between the two
follow-ups. This observation is an example of interval censoring.
4.3 Truncation
Subjects with delayed entry might be exposed to the event of interest during the study but, due to their exclusion from the study, they are not considered; this causes left truncation.
Example 4.5 An example of left truncation is given in Balakrishnan and Mitra (2011), referring to the data collected by Hong et al. (2009). In that study, Hong et al. (2009) considered 1980 as the cutoff time for inclusion in the study on the lifetime of machines, because detailed record keeping on the lifetime of machines started in 1980 and detailed information on machine failures could be observed only after 1980, causing left truncation. The left-truncated machines had information on the date of installation, but no information was available on failures prior to 1980. Hence, if a machine was installed and failed prior to 1980, left truncation occurred because the failure time prior to 1980 cannot be known by the experimenter.
Example 4.6 Right truncation is particularly related to studies of acquired immune deficiency syndrome (AIDS). In a study on AIDS, if a subject is included in the
sample only after the diagnosis of AIDS, then the potential patient of AIDS who
was infected but had not developed or diagnosed with AIDS during the study period
results in right truncation. In this case, the subjects are included in the study only if
the subjects are diagnosed with AIDS before the end of the study period. Those who
were suffering from infection during the study period but would develop the disease
after the end of the study are right truncated. This may happen for diseases with long
duration of the incubation period.
A Type II censored sample is one for which only the r smallest observations in a
random sample of n (1 ≤ r ≤ n) items are observed (Lawless 1982). It should be
stressed here that with Type II censoring, the number of observations, r, is decided
before the data are collected. Let us consider a random sample of n observations,
(T1 , . . . , Tn ). The r smallest lifetimes are T(1) , . . . , T(r ) out of the random sample
of n lifetimes (T1 , . . . , Tn ). Let us consider that the failure times (T1 , . . . , Tn ) are
independently and identically distributed with probability density function f (t) and
survivor function S(t).
For the exponential distribution, the log likelihood function under Type II censoring is

ln L = r ln λ − λ Σ_{i=1}^r t_(i) − λ(n − r)t_(r).

Differentiating with respect to λ gives

∂ln L/∂λ = r/λ − Σ_{i=1}^r t_(i) − (n − r)t_(r) = 0.

Solving for λ, we obtain the maximum likelihood estimator under the Type II censoring scheme as

λ̂ = r / [Σ_{i=1}^r t_(i) + (n − r)t_(r)]. (4.10)
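A minimal R sketch of Eq. (4.10), using hypothetical values of n, r, and the ordered failure times, is shown below:

n <- 10; r <- 6                              # hypothetical experiment
t_ord <- c(35, 58, 72, 90, 111, 130)         # r smallest (ordered) failure times
lambda_hat <- r / (sum(t_ord) + (n - r) * t_ord[r])   # Eq. (4.10)
lambda_hat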
L = ∏_{i=1}^n L_i = ∏_{i=1}^n f(t_i)^{δ_i} S(t_i)^{1−δ_i}. (4.11)

For the exponential distribution, this becomes

L = ∏_{i=1}^n (λe^{−λt_i})^{δ_i} (e^{−λt_i})^{1−δ_i} = λ^r e^{−λ Σ_{i=1}^n t_i} = λ^r e^{−λt},

where r = Σ_{i=1}^n δ_i and t = Σ_{i=1}^n t_i. The log likelihood function is

ln L = r ln λ − λt,

and differentiating with respect to λ,

∂ln L/∂λ = r/λ − t,

and equating to 0, we find the maximum likelihood estimator of λ

λ̂ = r/t. (4.12)

The second derivative of the log likelihood function is

∂²ln L/∂λ² = −r/λ².

The observed information is

−∂²ln L/∂λ² = r/λ²,

and the Fisher information is defined as

E[−∂²ln L/∂λ²] = E(r)/λ². (4.13)
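These results are easily applied in R; the following minimal sketch, with hypothetical censored data, computes λ̂ of Eq. (4.12) and an approximate standard error based on the observed information:

t_obs <- c(120, 340, 400, 400, 650, 980)    # hypothetical observed times
delta <- c(1, 1, 0, 0, 1, 1)                # 1 = failure, 0 = censored
r <- sum(delta)
lambda_hat <- r / sum(t_obs)                # Eq. (4.12)
se_lambda  <- lambda_hat / sqrt(r)          # from the observed information r / lambda^2
c(lambda_hat = lambda_hat, se = se_lambda)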
In random censoring, we consider that each item may have both failure times
T1 , . . . , Tn with density function f (t) and survivor function S(t) and censoring times
C1 , . . . , Cn with probability density function f C (c) and survivor function SC (c). Let
us assume independence of failure time T and censoring time C and define the
following variables
t_i = T_i if T_i ≤ C_i, and t_i = C_i if T_i > C_i, (4.14)

and

δ_i = 1 if T_i ≤ C_i, and δ_i = 0 if T_i > C_i. (4.15)
The likelihood contribution of the ith item for a pair of observations (t_i, c_i) is

L_i = f(t_i) S_C(t_i) if T_i ≤ C_i, and L_i = f_C(c_i) S(c_i) if T_i > C_i,

where f(t) = h(t)S(t) when δ = 1 and f_C(t) = h_C(t)S_C(t) when δ = 0, respectively.
Hence, equivalently, the likelihood function can be expressed as shown below
L = ∏_{i=1}^n h(t_i)^{δ_i} S(t_i) ∏_{i=1}^n h_C(t_i)^{1−δ_i} S_C(t_i). (4.17)

The second product term of Eq. (4.17) does not involve any information about the event time or the corresponding parameters of the underlying distribution; hence, it can be ignored under the assumption of independence of event time and censoring time. If the second product term is ignored, then it reduces to the likelihood function of Type I censoring discussed earlier,

L = ∏_{i=1}^n f(t_i)^{δ_i} S(t_i)^{1−δ_i}, (4.18)
because f(t_i)^{δ_i} S(t_i)^{1−δ_i} = [h(t_i)S(t_i)]^{δ_i} S(t_i)^{1−δ_i} = h(t_i)^{δ_i} S(t_i). (4.19)
We have shown that, if we are interested in the parameters of the failure time only (not in the parameters of the censoring time) and the failure time and censoring time are independent, then the likelihood function from random censoring can be expressed as

L = ∏_{i=1}^n f(t_i)^{δ_i} S(t_i)^{1−δ_i} = ∏_{d∈D} f(t_d) ∏_{r∈R} S(t_r). (4.20)

In the above formulation, D is the set of failure times and R is the set of right-censored times. The contribution of a failure at time t_d is proportional to the probability of observing a failure at time t_d, while the only information about a right-censored time t_r is that the true survival time T_r exceeds t_r. If we include the other two sources of censoring, left censoring and interval censoring, then the above likelihood can be generalized in the following form:

L = ∏_{d∈D} f(t_d) ∏_{r∈R} S(t_r) ∏_{l∈L} [1 − S(t_l)] ∏_{i∈I} [S(L_i) − S(R_i)], (4.21)

where L on the right side of Eq. (4.21) denotes the set of left-censored times, for which we know only that the true survival time T_l is less than the left-censoring time t_l, and I is the set of interval-censored times, for which we know that L_i < T_i < R_i, that is, the event occurred between L_i and R_i.
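To illustrate how Eq. (4.21) is used in practice, the following minimal R sketch evaluates and maximizes the likelihood for an exponential model with exactly observed, right-, left-, and interval-censored observations; all data values are hypothetical:

d <- c(5, 9, 14)               # exact failure times
r <- c(12, 20)                 # right-censored times
l <- c(3)                      # left-censored times
L <- c(6, 10); R <- c(8, 15)   # interval-censored times (L_i, R_i)
negloglik <- function(lambda) {
  S <- function(t) exp(-lambda * t)
  -(sum(dexp(d, rate = lambda, log = TRUE)) + sum(log(S(r))) +
      sum(log(1 - S(l))) + sum(log(S(L) - S(R))))
}
optimize(negloglik, interval = c(1e-4, 1))$minimum   # MLE of lambda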
Odell et al. (1992) provided the construction of the likelihood for left-, right-, or interval-censored data using indicators δ_Ei, δ_li, δ_ri, and δ_Ii, which equal 1 when observation i is, respectively, exactly observed, left censored at t_li, right censored at t_ri, or interval censored in (t_li, t_ri), and 0 otherwise:

L = ∏_{i=1}^n f(t_i)^{δ_Ei} F(t_li)^{δ_li} [1 − F(t_ri)]^{δ_ri} [F(t_ri) − F(t_li)]^{δ_Ii},

or, equivalently,

L = ∏_{i=1}^n f(t_i)^{δ_Ei} [1 − S(t_li)]^{δ_li} [S(t_ri)]^{δ_ri} [S(t_li) − S(t_ri)]^{δ_Ii}.
For right-truncated data, the likelihood function is

L = ∏_{i=1}^n f(t_i) / [1 − S(T_{rti})],

where f(t_i)/[1 − S(T_{rti})] is the conditional density of the failure time T_i given that T_i is less than the right-truncation time T_{rti}; in this case, T_i is observable only if T_i < T_{rti}. We cannot observe the failure times that occur after the truncation time T_{rti}. In the case of interval truncation, T_i is observed only if T_{lti} < T_i < T_{rti}.
There are several studies on the use of left truncation and right censoring. Using
the general likelihood, we obtain (Balakrishnan and Mitra 2013)
L = ∏_{i∈S1} f(t_i)^{δ_i} [1 − F(t_i)]^{1−δ_i} × ∏_{i∈S2} [f(t_i)/{1 − F(t_{lti})}]^{δ_i} [{1 − F(t_i)}/{1 − F(t_{lti})}]^{1−δ_i}
  = ∏_{i∈S1} f(t_i)^{δ_i} [S(t_i)]^{1−δ_i} × ∏_{i∈S2} [f(t_i)/S(t_{lti})]^{δ_i} [S(t_i)/S(t_{lti})]^{1−δ_i}, (4.23)
where S 1 and S 2 denote the index sets corresponding to the units which are not
left truncated and left truncated, respectively. Balakrishnan and Mitra (2011, 2012,
2014) have discussed in detail the fitting of lognormal and Weibull distributions to
left truncated and right censored data through the Expectation–Maximization (EM)
algorithm.
Example 4.9 This example is taken from Balakrishnan and Mitra (2011). Let us consider a lifetime variable T that follows the lognormal distribution with parameters μ and σ. The probability density function is

f(t) = (1/(σt√(2π))) exp[−(ln t − μ)²/(2σ²)], t > 0.
With y_i = ln t_i, the likelihood function for the left-truncated and right-censored data is

L = ∏_{i∈S1} [(1/σ) f((y_i − μ)/σ)]^{δ_i} [1 − F((y_i − μ)/σ)]^{1−δ_i}
  × ∏_{i∈S2} [{(1/σ) f((y_i − μ)/σ)} / {1 − F((t_{lti} − μ)/σ)}]^{δ_i} [{1 − F((y_i − μ)/σ)} / {1 − F((t_{lti} − μ)/σ)}]^{1−δ_i}, (4.24)
and the log likelihood function is

ln L(μ, σ) = Σ_{i=1}^n [−δ_i ln σ − δ_i (y_i − μ)²/(2σ²) + (1 − δ_i) ln{1 − F((y_i − μ)/σ)}] − Σ_{i∈S2} ln{1 − F((t_{lti} − μ)/σ)}, (4.25)
where μ and σ are location and scale parameters, respectively, f (·) and F(·) are
probability density and cumulative distributions of the standard normal distribution,
respectively, δi = 0 for right censored and δi = 1 for uncensored, tlti is the left-
truncation time, S1 is the index set for not left truncated and S2 is the index set for
left truncated.
References
Balakrishnan N, Mitra D (2011) Likelihood inference for lognormal data with left truncation and
right censoring with an illustration. J Stat Plan Infer 141:3536–3553
Balakrishnan N, Mitra D (2012) Left truncated and right censored Weibull data and likelihood
inference with an illustration. Comput Stat Data Anal 56:4011–4025
Balakrishnan N, Mitra D (2013) Likelihood inference based on left truncated and right censored
data from a gamma distribution. IEEE Trans Reliab 62:679–688
Balakrishnan N, Mitra D (2014) Some further issues concerning likelihood inference for left trun-
cated and right censored lognormal data. Commun Stat Simul Comput 43:400–416
Hong Y, Meeker WQ, McCalley JD (2009) Prediction of remaining life of power transformers based
on left truncated and right censored lifetime data. Ann Appl Stat 3:857–879
Islam MA, Al-Shiha A (2018) Foundations of biostatistics. Springer Nature Singapore Pte Ltd
Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data,
2nd edn. Springer, New York
Lawless JF (1982) Statistical models and methods for lifetime data. Wiley, New Jersey
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, New Jersey
Miller RG Jr (1981) Survival analysis. Wiley, New York
Odell PM, Anderson KM, D’Agostino RB (1992) Maximum likelihood estimation for interval-
censored data using a Weibull-based accelerated failure time model. Biometrics 48(3):951–959
Chapter 5
Nonparametric Methods
Abstract This chapter discusses the nonparametric approach for analyzing relia-
bility and survival data. It explains the nonparametric approach to inference based
on the empirical distribution function, product-limit estimator of survival function,
warranty claims rate, etc. This chapter also deals with the hypothesis tests for com-
parison of two or more survival/reliability functions. Examples are given to illustrate
the methodology.
5.1 Introduction
Data analysis begins with the use of graphical and analytical approaches in order
to gain insights and draw inferences without making any assumption regarding the
underline probability distribution that is appropriate for modeling the data (Blis-
chke et al. 2011). Nonparametric methods play an important role for analyzing the
data. These methods provide an intermediate step toward building more structured
models that allow for more precise inferences with a degree of assurance about the
validity of model assumptions. As such, nonparametric methods are also referred to
as distribution-free methods. This is in contrast to parametric and semiparametric
methods (given in the next chapters), which begin with a probabilistic model and
then carry out the analyses as appropriate for that model.
The ability to analyze data without assuming an underlying life distribution avoids
some potential errors that may occur because of incorrect assumptions regarding the
distribution (Blischke et al. 2011). It is recommended that any set of reliability and
survival data first be subjected to a nonparametric analysis before moving on to
parametric analyses based on the assumption of a specific underlying distribution.
This chapter deals with some of the common methods used for the nonparametric
analysis of data. It includes a number of examples to illustrate the methods.1
The outline of this chapter is as follows: Sect. 5.2 discusses the empirical distribution function. Section 5.3 explains the product-limit estimator of the survival function. Section 5.4 deals with the nonparametric estimation of the age-based failure rate (warranty claims rate), and the chapter also describes hypothesis tests for comparing two or more survival/reliability functions.
1 The R language (http://cran.r-project.org/) will be used mainly in performing the analyses in this book.
One of the key tools for investigating the distribution underlying the data is the
sample equivalent of F(t), denoted by F̂(t), and called the empirical cumulative
distribution function (ecdf) or empirical distribution function (edf) (Blischke et al.
2011). Its value at a specified value of the measured variable is equal to the proportion
of sample observations that are less than or equal to that specified value. The ecdf
plots as a “step function,” with steps at observed values of the variable. The form of
the function depends on the type of population from which the sample is drawn. On
the other hand, the procedure is nonparametric in the sense that no specific form is
assumed in calculating the ecdf (Ben-Daya et al. 2016).
The calculation of ecdf depends on the type of available data as discussed below
(Blischke et al. 2011).
In this case, the data are given by t1 , t2 , . . . , tn which are observed values of inde-
pendent and identically distributed (iid) real-valued random variable. The ecdf is
obtained as follows:
1. Order the data from the smallest to the largest observations. Let the ordered observations be t_(1) ≤ t_(2) ≤ ⋯ ≤ t_(n).
2. Compute

F̂(t_(i)) = (number of observations ≤ t_(i)) / n = (1/n) Σ_{j=1}^n I(t_(j) ≤ t_(i)), i = 1, 2, …, n, (5.1)

where I is the indicator function, namely I(t_(j) ≤ t_(i)) is one if t_(j) ≤ t_(i) and zero otherwise.2
In other words, the value of the ecdf at a given point t (i) is obtained by dividing
the number of observations that are less than or equal to t (i) by the total number of
observations in the sample.
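In R, the ecdf of Eq. (5.1) is available directly through the ecdf() function, as in the following minimal sketch with a hypothetical sample:

x <- c(16, 44, 78, 78, 101, 102, 169, 364)   # hypothetical complete data
Fhat <- ecdf(x)                              # step function F-hat(t)
Fhat(101)                                    # proportion of observations <= 101
plot(Fhat, xlab = "t", ylab = "Empirical cdf")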
For any fixed real value t, it can be shown that the random variable nF̂(t) has a binomial distribution with parameters n and success probability F(t), where F(t) is the true cdf at t.
2 Sometimes (n + 1) is used as the divisor rather than n in Step 2 (Makkonen 2008).
Fig. 5.1 Empirical cdfs for age (left side) and usage (right side)
For censored data, we look only at right-censored case since that is the most common
type of censoring found in many reliability and survival applications. Detail discus-
sion about censored data is presented in Chap. 4. To calculate ecdf, the observations
are ordered, including both censored and uncensored values in the ordered array.
Suppose that m observations in the ordered array are uncensored. Denote these as
t1 , t2 , . . . , tm . These are the locations of the steps in the plot of the ecdf. To deter-
mine the heights of the steps, for i = 1, …, m, form the counts ni = number at risk
3 As before, the results for age and usage are based on a small subsample of the original data.
(the number of observations greater than or equal to ti in the original ordered array),
and d i = number of values tied at ti (=1 if the value is unique), then calculate the
“survival probabilities” (Blischke et al. 2011) as
Ŝ(t_1) = 1 − d_1/n_1 and Ŝ(t_i) = (1 − d_i/n_i) Ŝ(t_{i−1}), i = 2, …, m. (5.2)

Then, the corresponding ecdf becomes F̂(t_i) = 1 − Ŝ(t_i), i = 1, 2, …, m.
Note This procedure for censored data may also be applied to grouped data. Since
this is the sample version of F(t), it may be used to estimate the true cdf. In this
context, the ecdf is generally known as the Kaplan–Meier estimator (Meeker and
Escobar 1998, Sect. 3.7).
Kaplan and Meier (1958) derived the nonparametric estimator of the survival function
for censored data which is known as the product-limit (PL) estimator. This estimator
is also widely known as the Kaplan–Meier (KM) estimator of the survival function.
Nonparametric estimation of the survival function for both complete and censored
data is discussed in Lawless (1982).
Suppose that there are observations on n individuals and that there are k (k ≤ n) distinct times t_1 < t_2 < ⋯ < t_k at which deaths/failures occur. Or, suppose the time line (0, ∞) is partitioned into k + 1 intervals (t_0, t_1], (t_1, t_2], …, (t_{i−1}, t_i], …, (t_k, t_{k+1}], where t_0 = 0 and t_{k+1} = ∞. Let d_i denote the number of units that died/failed in the ith interval (t_{i−1}, t_i] and r_i represent the number of units that survive interval i and are right-censored at t_i, i = 1, 2, …, k. Then, the size of the risk set (number of units that are alive) at the beginning of interval i is

n_i = n − Σ_{j=0}^{i−1} d_j − Σ_{j=0}^{i−1} r_j, i = 1, 2, …, k, (5.3)
As for any two events A and B, P{A and B} = P{A|B} × P{B}; hence

Ŝ(t) = P{T > t_j | T > t_{j−1}} × P{T > t_{j−1}}
     = P{T > t_j | T > t_{j−1}} × P{T > t_{j−1} | T > t_{j−2}} × P{T > t_{j−2}}
     = P{T > t_j | T > t_{j−1}} × P{T > t_{j−1} | T > t_{j−2}} × ⋯ × P{T > t_2 | T > t_1} × P{T > t_1 | T > t_0} × P{T > t_0}
     = (1 − p̂_j)(1 − p̂_{j−1})(1 − p̂_{j−2}) ⋯ (1 − p̂_1)(1 − p̂_0),

where t_0 = 0, t_{k+1} = ∞, and p̂_0 = 0, so that

Ŝ(t) = ∏_{j: t_j < t} (1 − p̂_j) = ∏_{j: t_j < t} (1 − d_j/n_j). (5.4)

This is known as the Kaplan–Meier estimator of the survival function S(t).4 The nonparametric estimator of F(t) is obtained by using the Kaplan–Meier estimator of the survival function. The result is F̂(t) = 1 − Ŝ(t).
4 Kaplan and Meier (1958) allowed the width of the interval (t i−1 , t i ], i = 1, 2, …, k, to approach 0
and the number of intervals to approach ∞ (Meeker and Escobar 1998).
This is called Greenwood’s formula. This formula can also be obtained by using the
popular delta method.
Meeker and Escobar (1998, p. 54) discussed the estimation method for confidence intervals for F(t) using the normal approximation of the point estimator of F(t). By using the logit transformation, they showed that two-sided 100(1 − α)% confidence intervals for F(t) can be calculated as

[ F̂(t) / {F̂(t) + (1 − F̂(t)) w},  F̂(t) / {F̂(t) + (1 − F̂(t))/w} ], (5.9)

where w = exp{z_{(1−α/2)} ŝe[F̂(t)] / (F̂(t)[1 − F̂(t)])} and ŝe[F̂(t)] = √(V̂[F̂(t)]).
The PL estimate Eq. (5.4) possesses a number of important properties. It is a con-
sistent estimator of S(t) under quite general conditions and a nonparametric maxi-
mum likelihood estimator of S(t) (Lawless 1982; Meeker and Escobar 1998). The PL
estimate of survival function can be used to estimate the cumulative hazard function
for censored data. A nonparametric estimator of the cumulative hazard function was
derived by Nelson (1969, 1972, 1982) and Aalen (1976).
Example 5.2 The lifetime data comprising of both censored and uncensored obser-
vations on a sample of 54 batteries are given in Table 5.1. The data include failure
times (in days) for 39 items that failed under warranty and service times for 15 items
that had not failed at the time of observation. That is, the data are right-censored with
39 failures and 15 censored units.5
Table 5.2 illustrates numerical computations for obtaining nonparametric esti-
mates of S(t) and F(t). In this table, the column with heading “status” indicates
whether the corresponding t i s are failure or censored, where 1 indicates failure and
0 indicates censored. To make the table, short the calculations for 41 rows (from row
No. 8 to No. 48) are not shown in the table.
The function survfit,6 given in R- and S-plus software provides Kaplan–Meier
estimate of the survival function. Figure 5.2 shows the Kaplan–Meier estimate of
the survival function (solid line) for the battery failure data. The dashed lines are the
95% confidence intervals for the survival function. The Ŝ(t) values given in Table 5.2
coincide with the estimate shown in Fig. 5.2.
Table 5.2 and Fig. 5.2 show the decreasing step function of the estimated survival
function. The survival function drops at the values of the observed failure times and is
constant between observed failure times. We can see from Table 5.2 and Fig. 5.2 that
about 11.38% of the batteries are estimated to survive until 1100 days. The estimated
95% confidence interval for Ŝ(t) at 1100 days is (0.047, 0.276).
Table 5.2 Calculations for the nonparametric estimates of S(t i ) and F(t i ) for battery failure data
i Time (t i ) Status di ri ni p̂i 1 − p̂i Ŝ(ti ) F̂(ti ) V̂ Ŝ(ti )
Fig. 5.2 Plot of the nonparametric estimate of S(t) for battery failure data with 95% confidence
interval
Age-based (or age-specific) failure rate (or claim rate) estimation is used for assessing
the reliability of a product as a function of the age of the product. Many factors
contribute to product failures that result in warranty claims. One of the most important
factors is the age of the product (Karim and Suzuki 2005). Age is calculated by the
service time measured in terms of calendar time since the product is sold or entered
in service. The age-based analysis of product failure data has generated considerable
interest in the literature (Kalbfleisch et al. 1991; Kalbfleisch and Lawless 1996;
Lawless 1998; Karim et al. 2001; Suzuki et al. 2001), and a number of approaches
have been developed with regard to addressing the age-based analysis of warranty
claims data. Age-based analysis of claim data forms the basis for estimating and
predicting warranty claims rates and comparing claim rates among different product
groups and/or different production periods.
To estimate the age-based claim rate, we use the following notations. Let N i be
the number of products sold in the ith month for i = 1, 2, …, I, where i is the number
of months of sale (MOS). Let r ij be the number of products sold in the ith month
which failed in jth months, j = 1, 2, …, J, where j is the number of observed months
and J ≥ I. Also, let W be the warranty period and r j be the counts of claims occurring
in the jth month where
r_j = Σ_{i=max(1, j−W+1)}^{min(I, J)} r_{ij}, j = 1, 2, …, J. (5.10)
The structure of the monthly counted warranty claims for different MOS is shown
in Table 5.3.
Let n_it be the number of items from MOS i that fail at age t or month in service (MIS) t, t = 1, 2, …, min(W, J). The r_ij can be expressed in terms of n_it as n_{i, j−i+1} = r_{ij}, i = 1, 2, …, I, j = i, …, min(i + W − 1, J), and Table 5.3 can be rearranged as Table 5.4.
The age-based number of warranty claims, WC(t) or n_t, can be calculated as

WC(t) = n_t = Σ_{i=1}^{min(I, J−t+1)} n_{it}, t = 1, 2, …, min(W, J). (5.11)

The corresponding number of products at risk of producing a claim at age t is, for the one-dimensional free-replacement warranty (FRW) policy,

RC1(t) = Σ_{i=1}^{min(I, J−t+1)} N_i, t = 1, 2, …, min(W, J), (5.12)
and for the one-dimensional pro-rata warranty (PRW) policy (with refund)7 it is given
by
RC2(t) = Σ_{i=1}^{min(I, J−t+1)} N_i if t = 1, and
RC2(t) = Σ_{i=1}^{min(I, J−t+1)} [N_i − Σ_{u=1}^{min(t−1, J−i)} n_{iu}] if t > 1. (5.13)
7 In case of a free-replacement warranty (FRW) policy, the seller agrees to repair or provide replace-
ments for failed items free of charge up to a time W from the time of the initial purchase. In the
case of a pro-rata warranty (PRW) policy, the seller agrees to refund an amount α(T )C s if the item
fails at age T prior to time W from the time of purchase, where C s is the sale price and α(T ) is a
non-increasing function of T, with 0 < α(T ) < 1 (Murthy and Jack 2014).
Table 5.3 Monthly counted warranty claims {r_ij} for different MOS

MOS (i)   N_i    Warranty claims in calendar time (month, j)
                 1      2      …   W      W+1       …   I      I+1       …   J
1         N_1    r_11   r_12   …   r_1W
2         N_2           r_22   …   r_2W   r_2,W+1
…         …                    …
I         N_I                                       …   r_II   r_I,I+1   …   r_IJ
{r_j}            r_1    r_2    …   r_W    r_W+1     …   r_I    r_I+1     …   r_J

Note: I is the total number of months of sale; J is the number of observed months (I ≤ J); W is the length of the warranty period; in this table W < J; however, W ≥ J is also possible.
Table 5.4 Age-based count of warranty claims {n_it} for different MOS

MOS (i)   N_i    Warranty claims at age t (in month)
                 1      2      …   W
1         N_1    n_11   n_12   …   n_1W
2         N_2    n_21   n_22   …   n_2W
…         …      …      …      …   …
I         N_I    n_I1   n_I2   …   n_IW
{n_t}            n_1    n_2    …   n_W
Table 5.6 Age-based count of warranty claims {nit } for different MOS
MOS (i) Ni Warranty claims at age t (in month)
1 2 3 W =4
1 N 1 = 250 n11 = 2 n12 = 4 n13 = 4 n14 = 6
2 N 2 = 200 n21 = 1 n22 = 2 n23 = 3 n24 = 4
I =3 N 3 = 150 n31 = 0 n32 = 1 n33 = 2
{nt } n1 = 3 n2 = 7 n3 = 9 n4 = 10
For example, using Eq. (5.12),

RC1(1) = Σ_{i=1}^{min(3,5)} N_i = N_1 + N_2 + N_3 = 600,
RC1(4) = Σ_{i=1}^{min(3,2)} N_i = N_1 + N_2 = 450,

and, using Eq. (5.13),

RC2(2) = Σ_{i=1}^{min(3,4)} [N_i − Σ_{u=1}^{min(1,5−i)} n_{iu}] = (N_1 − n_11) + (N_2 − n_21) + (N_3 − n_31) = 597.
Using Eqs. (5.12) and (5.13), and Table 5.6, we obtain the first four columns of
Table 5.7.
Table 5.7 Estimation of WCR1 (t) and WCR2 (t) for each t
Age (t) RC1 (t) RC2 (t) nt WCR1 (t) WCR2 (t)
1 600 600 3 0.00500 0.00500
2 600 597 7 0.01167 0.01173
3 600 590 9 0.01500 0.01525
4 450 434 10 0.02222 0.02304
The age-based warranty claims rate (WCR) analysis examines claims as a fraction
of the units still under warranty. Here, we define WCR in two ways. The first is
WCR_1(t) = \frac{WC(t)}{RC_1(t)} = \frac{n_t}{RC_1(t)}, \quad t = 1, 2, \ldots, \min(W, J).   (5.14)
Note that WCR1 is the ratio of the total number of claims for period t and the
number of items under warranty prior to that period for FRW policy. The second
definition is
WCR_2(t) = \frac{WC(t)}{RC_2(t)} = \frac{n_t}{RC_2(t)}, \quad t = 1, 2, \ldots, \min(W, J),   (5.15)
which is the ratio of the total number of claims for period t and the number of items
under warranty prior to that period for PRW policy. The estimates of WCR1 (t) and
WCR2 (t) for the data shown in Table 5.5 are obtained by using Eqs. (5.14) and (5.15),
respectively, and are given in Table 5.7.
Table 5.7 indicates that WCR for FRW policy, WCR1 (t), is smaller than that of
PRW policy, WCR2 (t).
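These age-based calculations can be reproduced with a short script. The following R sketch uses the illustrative counts of Table 5.6 (I = 3, J = 5, W = 4); the object names (N, n.it, and so on) are chosen here only for illustration.

```r
# Sales and age-based claim counts from Table 5.6 (I = 3 MOS, J = 5 observed months, W = 4)
N    <- c(250, 200, 150)              # N_i, units sold in MOS i
n.it <- rbind(c(2, 4, 4, 6),          # n_it, claims at age t for MOS i
              c(1, 2, 3, 4),
              c(0, 1, 2, NA))         # MOS 3 observed for only 3 months
I <- 3; J <- 5; W <- 4

WC <- RC1 <- RC2 <- numeric(W)
for (t in 1:min(W, J)) {
  idx <- 1:min(I, J - t + 1)          # MOS that can contribute a claim of age t
  WC[t]  <- sum(n.it[idx, t], na.rm = TRUE)   # Eq. (5.11)
  RC1[t] <- sum(N[idx])                       # Eq. (5.12), FRW risk count
  RC2[t] <- if (t == 1) sum(N[idx]) else      # Eq. (5.13), PRW risk count
    sum(sapply(idx, function(i)
      N[i] - sum(n.it[i, 1:min(t - 1, J - i)], na.rm = TRUE)))
}
data.frame(t = 1:W, RC1, RC2, n.t = WC,
           WCR1 = WC / RC1, WCR2 = WC / RC2)  # Eqs. (5.14)-(5.15), cf. Table 5.7
```

Running this sketch reproduces the values shown in Table 5.7.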
5.5 Hypothesis Tests for Comparison of Survival/Reliability Functions

We are often interested in assessing whether there are differences in survival curves among different groups of individuals. For example, in a clinical trial with a survival outcome, we might be interested in comparing survival between participants receiving a new drug and participants receiving a placebo, a preparation made to resemble the drug but containing no active ingredient. In an observational study, one might be interested in comparing survival between men and women or between participants with and without a particular complication (e.g., hypertension or diabetes) (Sullivan 2019).
There are several tests available to compare survival among different independent groups of participants. We describe how to compare two or more groups of survival data based on their survival curves using the log-rank test of the null hypothesis of statistically equivalent KM survival curves. First, we apply the test for two groups only, and then we extend the procedure to several groups.
Let t_1 < t_2 < \cdots < t_m denote the distinct observed failure times in the combined sample. At time t_i, let n_{1i} and n_{2i} be the numbers at risk in groups G1 and G2, d_{1i} and d_{2i} the corresponding numbers of deaths, n_i = n_{1i} + n_{2i}, and d_i = d_{1i} + d_{2i}. Under the null hypothesis, given the margins n_{1i}, n_{2i}, and d_i, the number of deaths in G1 follows the hypergeometric distribution

P(D_{1i} = d_{1i}) = \frac{\binom{d_i}{d_{1i}} \binom{n_i - d_i}{n_{1i} - d_{1i}}}{\binom{n_i}{n_{1i}}}, \quad d_{1i} = 0, 1, \ldots, n_{1i},   (5.16)

where D_{1i} is the random variable representing the number of deaths in G1 at time t_i.
The mean and variance of D_{1i} are

E(D_{1i}) = \frac{d_i n_{1i}}{n_i}   (5.17)

and

V(D_{1i}) = \frac{n_{1i} d_i (n_i - d_i)(n_i - n_{1i})}{n_i^2 (n_i - 1)}.   (5.18)
Equation (5.16) implies that if the null hypothesis is true, the d_i deaths should be allocated proportionally between G1 and G2; hence the expected value of D_{1i} is simply the proportion n_{1i}/n_i multiplied by d_i.
Mantel and Haenszel (1959) proposed to compute the sum of the differences between the observed and the expected values of D_{1i} over all the observed survival times; let us denote this by \tilde{D}_1 = \sum_{i=1}^{m} [d_{1i} - E(D_{1i})], where m is the total number of distinct observed lifetimes. Similarly, the variance of \tilde{D}_1 is the sum of the variances of D_{1i} over the m survival times, V(\tilde{D}_1) = \sum_{i=1}^{m} V(D_{1i}). As the sample size increases, \tilde{D}_1 tends to be normally distributed with mean 0 and variance V(\tilde{D}_1). Therefore, \tilde{D}_1 / \sqrt{V(\tilde{D}_1)} \sim N(0, 1), and a z-score can be derived for testing the independence of survival and group (Liu 2012), with the test statistic defined by

z = \frac{\sum_{i=1}^{m} [d_{1i} - E(D_{1i})]}{\sqrt{\sum_{i=1}^{m} V(D_{1i})}} \sim N(0, 1).   (5.19)
Squaring this statistic gives the equivalent chi-square form

\chi^2 = \frac{\left\{ \sum_{i=1}^{m} [d_{1i} - E(D_{1i})] \right\}^2}{\sum_{i=1}^{m} V(D_{1i})} \sim \chi^2_1,   (5.20)

where \chi^2_1 indicates the chi-square distribution with one degree of freedom for two groups. This test is known as the log-rank test. The term "log-rank test" actually comes from Peto and Peto's formulation, in which the method uses the log transformation of the survival function to test a series of ranked survival times (Liu 2012).
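As a computational sketch of Eqs. (5.16)–(5.20), the following R fragment evaluates E(D_1i), V(D_1i), and the log-rank chi-square statistic from vectors of deaths and numbers at risk at the distinct failure times; the input vectors shown are hypothetical and serve only to illustrate the mechanics.

```r
# Hypothetical summary data at the distinct death times t_1 < ... < t_m
d1 <- c(1, 0, 2, 1)    # deaths in group G1 at each t_i
d2 <- c(0, 2, 1, 1)    # deaths in group G2 at each t_i
n1 <- c(10, 9, 8, 5)   # at risk in G1 just before t_i
n2 <- c(12, 11, 9, 7)  # at risk in G2 just before t_i

d <- d1 + d2; n <- n1 + n2
E1 <- d * n1 / n                                      # Eq. (5.17)
V1 <- n1 * d * (n - d) * (n - n1) / (n^2 * (n - 1))   # Eq. (5.18)

z    <- sum(d1 - E1) / sqrt(sum(V1))                  # Eq. (5.19)
chi2 <- z^2                                           # Eq. (5.20), ~ chi-square(1)
p    <- pchisq(chi2, df = 1, lower.tail = FALSE)
c(chisq = chi2, p.value = p)
```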
Example 5.3 The summarized data presented in Table 5.9 are the Instant Power
Supply (IPS) battery performance data consisting of both failure and censored life-
times of the batteries. The data are taken from Ruhi (2016). The IPS were used in
Table 5.9 IPS battery failure and censored data for maintained and nonmaintained groups
Time (in months) (t_i)   Maintained group: Failures (d_1i), Censored (c_1i)   Nonmaintained group: Failures (d_2i), Censored (c_2i)
2 0 0 0 1
3 0 2 0 0
6 1 1 1 2
7 0 2 1 2
8 1 2 1 1
9 0 1 0 1
10 0 3 2 1
11 0 0 3 0
12 2 9 3 2
13 0 0 3 1
14 0 2 0 0
15 0 1 3 0
18 5 2 1 0
20 2 0 0 1
21 0 1 0 0
22 0 1 0 0
23 1 1 0 0
24 8 14 0 0
25 0 1 0 0
29 1 0 0 0
30 13 6 0 0
31 2 0 0 0
33 0 1 0 0
34 1 0 0 0
36 15 4 0 0
42 12 2 0 0
44 1 0 0 0
45 1 0 0 0
52 1 0 0 0
53 2 0 0 0
Total 69 56 18 12
some offices and residences in the Rajshahi region of Bangladesh. Some batteries were maintained regularly with proper care and some were not maintained regularly. The names of the manufacturing companies are not disclosed here to protect the proprietary nature of the information.
The data set consists of 87 failure times and 68 censored times for the 155 observed batteries. The column "Time" indicates the age (in months) of the item at the data collection period. Among the 155 batteries, 125 were maintained regularly by the user and 30 were not maintained. The numbers of failures and censored observations corresponding to each lifetime under both the maintained and nonmaintained groups are given in the table.8
As an example of the application of the log-rank test, we consider the comparison
of maintained (Group 1) and nonmaintained (Group 2) batteries used in IPS.
The quantities needed to calculate the test statistic for equality of two survival
functions for the IPS battery performance data of Table 5.9 are given in Table 5.10.
According to Eq. (5.20), the test statistic for the log-rank test to compare the two groups is

\chi^2 = \frac{\left\{ \sum_{i=1}^{m} [d_{1i} - E(D_{1i})] \right\}^2}{\sum_{i=1}^{m} V(D_{1i})} = \frac{\left\{ \sum_{i=1}^{m} d_{1i} - \sum_{i=1}^{m} E(D_{1i}) \right\}^2}{\sum_{i=1}^{m} V(D_{1i})} = \frac{(69 - 84.3390)^2}{2.2771} = 103.327.   (5.21)
This test statistic is approximately distributed as chi-square with one degree of freedom. For this test, the decision rule at the 5% level of significance is to reject H_0 because \chi^2 > 3.84. Therefore, we have statistically significant evidence at α = 0.05 to conclude that the survival functions for the maintained and nonmaintained groups are different. There is evidence to suggest that the maintained group has much better survival than the nonmaintained group (see Fig. 5.3).
The function survdiff in R, which implements a family of tests indexed by the parameter rho, can be used for this test. The following description is from the R documentation on survdiff: "This function implements the G-rho family of Harrington and Fleming (1982), with weights on each death of S(t)^rho, where S is the Kaplan–Meier estimate of survival. With rho = 0, this is the log-rank or Mantel–Haenszel test, and with rho = 1 it is equivalent to the Peto and Peto modification of the Gehan–Wilcoxon test." For the IPS battery data, both tests (with rho = 0 and rho = 1) give very small p-values (less than 2 × 10^{-16}), indicating that there is a significant difference between the survival curves of the maintained and nonmaintained groups.
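A minimal sketch of such a comparison in R is given below. It assumes that the grouped counts of Table 5.9 have been expanded into a data frame ips with one row per battery and columns time, status (1 = failure, 0 = censored), and group; the data frame and variable names are assumptions made only for illustration.

```r
library(survival)

# ips: assumed data frame with one row per battery:
#   time   - age in months at failure or censoring
#   status - 1 = failure, 0 = censored
#   group  - "maintained" or "nonmaintained"
fit <- survfit(Surv(time, status) ~ group, data = ips)
plot(fit, lty = 1:2, xlab = "Time (months)", ylab = "S(t)")   # cf. Fig. 5.3

survdiff(Surv(time, status) ~ group, data = ips, rho = 0)  # log-rank (Mantel-Haenszel)
survdiff(Surv(time, status) ~ group, data = ips, rho = 1)  # Peto-Peto modification
```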
8 Note: Sometimes offices replace batteries batch-wise, assuming failure because of low performance, and so the number of failures becomes high; see, for example, the lifetimes 30, 36 and 42 months. Also, one office decided not to use IPS with a batch of good batteries at age 24 months, and these were treated as censored observations.
Table 5.10 Quantities needed to calculate the test statistic for equality of two survival functions
Fig. 5.3 Plots of the nonparametric estimates of S(t) for two groups for IPS battery data
The log-rank test can be extended to compare survival among K (> 2) groups by comparing the observed and the expected numbers of events (Liu 2012), extending the method described in the previous Sect. 5.5.1.
Here, we use matrix expressions of mathematical equations for comparison of K
groups. Let Oi = [d 1i , d 2i , … d (K −1)i ] be a vector for the observed number of events
in group 1 to group (K − 1) at lifetime t i , i = 1, 2, …, m. The distribution of counts in
Oi is assumed to follow a multivariate hypergeometric distribution for given [n1i , n2i ,
…, nKi ], conditional on both the row and the column totals (a detailed description can
be found in Liu 2012). This multivariate hypergeometric distribution is associated
with a mean vector

E_i = \left[ \frac{d_i n_{1i}}{n_i}, \frac{d_i n_{2i}}{n_i}, \ldots, \frac{d_i n_{(K-1)i}}{n_i} \right]   (5.22)
and a covariance matrix V_i whose diagonal and off-diagonal elements are

v_{kki} = \frac{n_{ki} d_i (n_i - d_i)(n_i - n_{ki})}{n_i^2 (n_i - 1)}   (5.24)

and

v_{kli} = -\frac{n_{ki} n_{li} d_i (n_i - d_i)}{n_i^2 (n_i - 1)}, \quad k \neq l; \; i = 1, 2, \ldots, m.   (5.25)
Then, letting O = \sum_{i=1}^{m} O_i, E = \sum_{i=1}^{m} E_i, and V = \sum_{i=1}^{m} V_i, we have the approximate chi-square test statistic for the log-rank test as

\chi^2 = (O - E)' V^{-1} (O - E) \sim \chi^2_{(K-1)}.   (5.26)
This test statistic does not assign any weights to the different failure times. At lifetime t_i, let the positive weight function for group k be denoted by w_k(t_i), with the property that w_k(t_i) is zero whenever n_i is zero (Klein and Moeschberger 2003). With this weight function, the test statistic in Eq. (5.26) can be expressed (Kalbfleisch and Prentice 1980; Liu 2012) as

\chi^2 = \left[ \sum_{i=1}^{m} w_i (O_i - E_i) \right]' \left[ \sum_{i=1}^{m} w_i V_i w_i \right]^{-1} \left[ \sum_{i=1}^{m} w_i (O_i - E_i) \right] \sim \chi^2_{(K-1)}.   (5.27)
References
Aalen O (1976) Nonparametric inference in connection with multiple decrement models. Scand J
Stat 3:15–27
Ben-Daya M, Kumar U, Murthy DNP (2016) Introduction to maintenance engineering: modelling,
optimization and management. Wiley
Blischke WR, Karim MR, Murthy DNP (2011) Warranty data collection and analysis. Springer,
London Limited
Fleming T, Harrington DP (1984) Nonparametric estimation of the survival distribution in censored
data. Commun Stat 13(20):2469–2486
Gehan EA (1965) A generalized Wilcoxon test for comparing arbitrarily singly censored samples. Biometrika 52:203–223
Gibbons JD, Chakraborti S (2003) Nonparametric statistical inference. Chapman and Hall/CRC
Harrington DP, Fleming TR (1982) A class of rank test procedures for censored survival data.
Biometrika 69:553–566
Hosmer DW, Lemeshow S (1999) Applied survival analysis: regression modeling of time-to-event
data. Wiley
Kalbfleisch JD, Lawless JF (1996) Statistical analysis of warranty claims data. In: Blischke WR,
Murthy DNP (eds) Product warranty handbook. M. Dekker, New York, pp 231–259
Kalbfleisch JD, Prentice RL (1980) The statistical analysis of failure time data. Wiley, New York
Kalbfleisch JD, Lawless JF, Robinson JA (1991) Methods for the analysis and prediction of warranty
claims. Technometrics 33:273–285
Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat
Assoc 53:457–481
Karim MR, Suzuki K (2005) Analysis of warranty claim data: a literature review. Int J Qual Reliab
Manag 22(7):667–686
Karim MR, Yamamoto W, Suzuki K (2001) Statistical analysis of marginal count failure data.
Lifetime Data Anal 7:173–186
Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data,
2nd edn. Springer
Lawless JF (1982) Statistical models and methods for lifetime data. Wiley, New York
Lawless JF (1998) Statistical analysis of product warranty data. Int Stat Rev 66:41–60
Liu X (2012) Survival analysis: models and applications. Wiley, UK
Makkonen L (2008) Bringing closure to the plotting position controversy. Commun Stat Theory
Methods 37:460–467
Mantel N, Haenszel W (1959) Statistical aspects of the analysis of data from retrospective studies
of disease. J Natl Cancer Inst 22(4):719–748
MathSoft, Inc (1998) S-PLUS 5 for UNIX guide to statistics, vol 2. Data Analysis Products Division
MathSoft, Inc, Seattle, Washington
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley Interscience, New
York
Murthy DNP, Jack N (2014) Extended warranties, maintenance service and lease contracts: modeling
and analysis for decision-making. Springer, London
Nelson W (1969) Hazard plotting for incomplete failure data. J Qual Technol 1:27–52
Nelson W (1972) Theory and application of hazard plotting for censored survival data. Technomet-
rics 14:945–966
Nelson W (1982) Applied life data analysis. Wiley, New York
Peto R, Peto J (1972) Asymptotically efficient rank invariant test procedures. J R Stat Soc A
135:185–207
Prentice RL (1978) Linear rank tests with right censored data. Biometrika 65:167–179
Ruhi S (2016) Application of complex lifetime models for analysis of product reliability data.
Unpublished doctoral dissertation, University of Rajshahi, Bangladesh
Sullivan L (2019) Survival analysis. Available at Boston University School of Public Health website
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Survival/. Accessed 28 May 2019
Suzuki K, Karim MR, Wang L (2001) Statistical analysis of reliability warranty data. In: Rao CR, Balakrishnan N (eds) Handbook of statistics: advances in reliability. Elsevier, Amsterdam
Tarone RE, Ware JH (1977) On distribution-free tests for equality of survival distributions. Biometrika 64:156–160
Chapter 6
Probability Distribution of Lifetimes:
Censored and Left Truncated
Abstract This chapter discusses the maximum likelihood estimation method for
analyzing the censored and truncated data using some common lifetime distribu-
tions. The likelihood functions under the schemes of different types of censoring and
truncation constructed in Chap. 4 will be applied in this chapter.
6.1 Introduction
In Chap. 4, we have discussed the failure time distributions for uncensored data.
Now, we discuss the failure time distributions for right-censored and left-truncated
data. The construction of likelihood functions is shown in the previous chapter under
the schemes of different types of censoring and truncation. In this chapter, two most
extensively used types of censoring, Types I and II, will be considered along with
left truncation. The left-truncated and right-censored observations are sometimes
found in the same data set and need to be taken into account jointly for estimating
parameters of a lifetime distribution. In this chapter, some examples are provided.
The outline of the chapter is as follows: Sect. 6.2 presents the exponential dis-
tribution with a discussion on the statistical inference on its parameter for censored
data. Section 6.3 explains the extreme value and Weibull distributions. The normal
and lognormal distributions are presented in Sect. 6.4. Section 6.5 deals with the
gamma distribution for censored and left-truncated data.
6.2 Exponential Distribution

Consider Type II censoring, in which a sample of n items is observed until the rth failure, so that the ordered failure times t_{(1)} < \cdots < t_{(r)} are observed and the remaining (n - r) lifetimes are censored at t_{(r)}. The log likelihood of the exponential distribution with parameter λ is then

\ln L = r \ln \lambda - \lambda \sum_{i=1}^{r} t_{(i)} - \lambda (n - r) t_{(r)}.   (6.3)

Differentiating with respect to λ and equating to zero,

\frac{\partial \ln L}{\partial \lambda} = \frac{r}{\lambda} - \sum_{i=1}^{r} t_{(i)} - (n - r) t_{(r)} = 0.

Solving for λ, we obtain the maximum likelihood estimator under the Type II censoring scheme as

\hat{\lambda} = \frac{r}{\sum_{i=1}^{r} t_{(i)} + (n - r) t_{(r)}}.   (6.4)
The estimate of the mean time to failure is

\widehat{MTTF} = \frac{1}{\hat{\lambda}} = \frac{t}{r},

where t = \sum_{i=1}^{r} t_{(i)} + (n - r) t_{(r)} is the total time on test. Since 2\lambda t follows a chi-square distribution with 2r degrees of freedom under Type II censoring, a (1 - α)100% confidence interval for λ is obtained from

P\left( \frac{\chi^2_{2r,\, \alpha/2}}{2t} \leq \lambda \leq \frac{\chi^2_{2r,\, 1-\alpha/2}}{2t} \right) = 1 - \alpha.   (6.5)
For Type I censoring, let T_i denote the lifetime and c_i the censoring time of the ith item, i = 1, …, n. The observed time and censoring indicator are

t_i = \min(T_i, c_i)

and

\delta_i = \begin{cases} 1, & \text{if } t_i = T_i \\ 0, & \text{if } t_i = c_i. \end{cases}
Then, the likelihood function can be obtained as follows for Type I censoring:

L = \prod_{i=1}^{n} L_i = \prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i)^{1-\delta_i},

where

L_i = \begin{cases} f(t_i), & \text{if } \delta_i = 1 \\ S(t_i), & \text{if } \delta_i = 0. \end{cases}
Example 6.1 The likelihood function for the exponential distribution is

L = \prod_{i=1}^{n} \left( \lambda e^{-\lambda t_i} \right)^{\delta_i} \left( e^{-\lambda t_i} \right)^{1-\delta_i} = \lambda^{r} e^{-\lambda \sum_{i=1}^{n} t_i} = \lambda^{r} e^{-\lambda t},   (6.6)

where r = \sum_{i=1}^{n} \delta_i and t = \sum_{i=1}^{n} t_i.
As shown in Chap. 4, the maximum likelihood estimator of λ based on the above likelihood function is

\hat{\lambda} = \frac{r}{t},   (6.7)

and the Fisher information can be obtained as

I(\lambda) = E\left( -\frac{\partial^2 \ln L}{\partial \lambda^2} \right) = \frac{E(r)}{\lambda^2}.   (6.8)

Here E(r) = np, where

p = P(\delta_i = 1) = \int_{0}^{\infty} f(u_i) [1 - F(u_i)] \, du_i.
It may be noted here that we can define the observed information as -\frac{\partial^2 \ln L}{\partial \lambda^2} = \frac{r}{\lambda^2}. Sometimes, we use the observed information instead of the Fisher information for computational convenience.
6.2.2.1 Tests

Some tests for the parameter of an exponential distribution in the presence of censoring are shown below. Both large sample and small sample tests are introduced.
(i) In the previous section, we have shown that the Fisher information is I(\lambda) = E(r)/\lambda^2, and for a large sample size n the standardized form

W = \frac{\hat{\lambda} - \lambda}{\sqrt{I^{-1}(\lambda)}}

is approximately distributed as N(0, 1) and can be used for testing the null hypothesis H_0: \lambda = \lambda_0.
(ii) As an alternative to the large sample test in (i), we can assume that \ln \hat{\lambda} \sim N(\ln \lambda, 1/r), where

Var(\ln \hat{\lambda}) \approx \left( \frac{\partial \ln \lambda}{\partial \lambda} \right)^2 Var(\hat{\lambda}) = \frac{1}{\lambda^2} I^{-1}(\lambda) = \frac{1}{\lambda^2} \cdot \frac{\lambda^2}{np} = \frac{1}{np}.

Here \hat{p} = r/n and the estimated variance is \widehat{Var}(\ln \hat{\lambda}) \approx 1/r. Hence, for testing the null hypothesis H_0: \lambda = \lambda_0, we can use the following statistic under H_0:

W = \frac{\ln \hat{\lambda} - \ln \lambda_0}{\sqrt{1/r}}.
From the confidence limits for \ln \lambda based on W, we can obtain the (1 - α)100% confidence limits for λ as follows:

\left( \hat{\lambda}\, e^{-z_{1-\alpha/2}/\sqrt{r}}, \; \hat{\lambda}\, e^{z_{1-\alpha/2}/\sqrt{r}} \right).   (6.11)
(iii) If two samples are drawn from exponential distributions with parameters \lambda_1 and \lambda_2, respectively, then we may be interested in testing the equality of the parameters. Let the sample sizes be n_1 and n_2 and the numbers of failures be r_1 and r_2, respectively. The null hypothesis is H_0: \lambda_1 = \lambda_2.
Let \ln \hat{\lambda}_1 \sim N(\ln \lambda_1, 1/r_1) and \ln \hat{\lambda}_2 \sim N(\ln \lambda_2, 1/r_2); then it can be shown that

Var(\ln \hat{\lambda}_1 - \ln \hat{\lambda}_2) \approx Var(\ln \hat{\lambda}_1) + Var(\ln \hat{\lambda}_2) = \frac{1}{r_1} + \frac{1}{r_2}.

The test statistic is

W = \frac{\ln \hat{\lambda}_1 - \ln \hat{\lambda}_2}{\sqrt{\frac{1}{r_1} + \frac{1}{r_2}}}.

From the corresponding confidence limits for \ln \lambda_1 - \ln \lambda_2 = \ln(\lambda_1/\lambda_2), we can obtain the (1 - α)100% confidence limits for the ratio \lambda_1/\lambda_2 as follows:

\left( \frac{\hat{\lambda}_1}{\hat{\lambda}_2}\, e^{-z_{1-\alpha/2}\sqrt{\frac{1}{r_1} + \frac{1}{r_2}}}, \; \frac{\hat{\lambda}_1}{\hat{\lambda}_2}\, e^{z_{1-\alpha/2}\sqrt{\frac{1}{r_1} + \frac{1}{r_2}}} \right).   (6.12)
(iv) A test for both small and large samples can be developed using the suggestion made by Sprott (1973). Lawless (2003) illustrated the test for the exponential distribution. Sprott (1973) indicated that \hat{\phi} = \hat{\lambda}^{1/3} provides a better approximation to a normal distribution even for small samples. The distribution of \hat{\phi} is approximately normal with mean \phi = \lambda^{1/3} and variance

Var(\hat{\phi}) \approx \left( \frac{\partial \phi}{\partial \lambda} \right)^2 Var(\hat{\lambda}),
and evaluating the derivative gives

Var(\hat{\phi}) \approx \left( \frac{1}{3} \lambda^{-2/3} \right)^2 \frac{\lambda^2}{np} = \frac{\lambda^{2/3}}{9np} = \frac{\phi^2}{9np}.

The test statistic is

Z = \frac{\hat{\phi} - \phi}{\left( \hat{\phi}^2 / 9np \right)^{1/2}} \sim N(0, 1).

If we consider the observed information, then \widehat{Var}(\hat{\phi}) \approx \hat{\phi}^2 / 9r and the test statistic is

Z = \frac{\hat{\phi} - \phi}{\left( \hat{\phi}^2 / 9r \right)^{1/2}} \sim N(0, 1).
Since \phi = \lambda^{1/3}, which implies \lambda = \phi^3 and \hat{\lambda} = \hat{\phi}^3, the (1 - α)100% confidence interval for λ can be shown to be

P\left( \hat{\lambda} \left( 1 - z_{1-\alpha/2}/3\sqrt{r} \right)^3 < \lambda < \hat{\lambda} \left( 1 + z_{1-\alpha/2}/3\sqrt{r} \right)^3 \right) = 1 - \alpha.   (6.13)
(v) The likelihood ratio method can be used for either tests or interval estimation.
Under the hypothesis H0 : λ = λ0 , the likelihood ratio statistic is
Fig. 6.1 Fitted pdf, cdf, reliability function and hazard function of exponential distribution for
battery failure data
\Lambda = -2 \ln \frac{L(\lambda_0)}{L(\hat{\lambda})} \sim \chi^2_1,

where L(\lambda_0) is the likelihood under H_0 and L(\hat{\lambda}) is the likelihood under H_1. The likelihood ratio statistic \Lambda = -2 \ln [L(\lambda_0)/L(\hat{\lambda})] can be expressed as

\Lambda = 2r \left[ \hat{\lambda}/\lambda - 1 - \ln(\hat{\lambda}/\lambda) \right] \sim \chi^2_1,

and an approximate 95% confidence interval for λ can be obtained from the inequality

2r \left[ \hat{\lambda}/\lambda - 1 - \ln(\hat{\lambda}/\lambda) \right] \leq \chi^2_{1,\,0.95} = 3.84.   (6.14)
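A likelihood-ratio-based interval can be obtained numerically from (6.14) by finding the two values of λ at which the statistic reaches the critical value. A minimal R sketch, with illustrative values of r and λ̂, is given below.

```r
# Approximate 95% likelihood ratio confidence interval for lambda, Eq. (6.14)
r          <- 39        # number of failures (illustrative value)
lambda.hat <- 0.00117   # ML estimate of lambda (illustrative value)

Lambda <- function(lambda)    # likelihood ratio statistic as a function of lambda
  2 * r * (lambda.hat / lambda - 1 - log(lambda.hat / lambda))

crit  <- qchisq(0.95, df = 1)                        # 3.84
lower <- uniroot(function(l) Lambda(l) - crit,
                 c(lambda.hat / 10, lambda.hat))$root
upper <- uniroot(function(l) Lambda(l) - crit,
                 c(lambda.hat, lambda.hat * 10))$root
c(lower = lower, upper = upper)
```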
Fig. 6.2 Exponential probability plot for the battery failure data
Example 6.2 In this example, we consider the battery failure data of Table 5.1, for which n = 54, with r = 39 failure times and n − r = 15 censored times. The unit of measurement is the number of days. A nonparametric analysis of this data set is given in Example 5.2. Our objective in this example is to analyze these data using the exponential distribution. The likelihood function (6.6) and hence Eq. (6.7) can be used for finding the MLE of the parameter λ. The MLE of λ is \hat{\lambda} = r / \sum_{i=1}^{n} t_i = 39/33{,}256 = 0.001172403. The estimate of MTTF is 1/\hat{\lambda} = 1/0.001172403 = 852.949 days. The standard error of the MTTF is 136.581, and the 95% normal confidence interval for the MTTF is [623.192, 1167.410].
Figure 6.1 shows the ML estimates of the pdf, cdf, reliability function, and hazard
function of the fitted exponential distribution for the battery failure data.
The exponential probability plot for the battery failure data given in Fig. 6.2
indicates that the exponential distribution is not a suitable distribution for the data.
The search for a better model for this data set requires further investigation.
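The calculations of Example 6.2 can be reproduced with a few lines of R. The sketch below uses the totals quoted in the example (r = 39 failures and a total time on test of 33,256 days) and constructs the confidence interval for the MTTF on the log scale, which reproduces the reported interval approximately.

```r
# Exponential fit to the battery failure data of Example 6.2
r     <- 39       # number of failures
total <- 33256    # total time on test, sum of all observed times (days)

lambda.hat <- r / total              # Eq. (6.7)
MTTF.hat   <- 1 / lambda.hat         # estimated mean time to failure
se.MTTF    <- MTTF.hat / sqrt(r)     # large-sample standard error

# 95% normal confidence interval for the MTTF on the log scale
ci <- MTTF.hat * exp(c(-1, 1) * qnorm(0.975) / sqrt(r))
c(lambda = lambda.hat, MTTF = MTTF.hat, se = se.MTTF,
  lower = ci[1], upper = ci[2])
```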
6.3 Extreme Value and Weibull Distributions

The Weibull distribution (Weibull 1951) is one of the most used lifetime distributions in both reliability and survival analyses. The Weibull distribution has extensive use for the lifetimes and fatigue of electronic devices as well as for longitudinal data on survival times of human beings. The extreme value (or Gumbel) distribution is closely related to the Weibull distribution: under a log transformation the two distributions can be represented equivalently, as shown below.
A two-parameter Weibull probability density function is

f(t \mid \alpha, \lambda) = \alpha \lambda (\lambda t)^{\alpha - 1} e^{-(\lambda t)^{\alpha}}, \quad t \geq 0,   (6.15)

where α is the shape and λ is the scale parameter of the distribution. Using the following transformations, we can show the equivalence between the Weibull and the extreme value distributions: Y = \ln T, u = \ln(1/\lambda) and b = 1/\alpha. We can also show that T = e^{Y}, \lambda = e^{-u} and \alpha = 1/b. Then, the extreme value distribution can be expressed as

f(y \mid u, b) = \frac{1}{b} e^{(y-u)/b} e^{-e^{(y-u)/b}}, \quad -\infty < y < \infty,   (6.16)
where u is the location and b is the scale parameter. The reliability or survival functions of the Weibull and the extreme value distributions are

Weibull: R(t) = S(t) = 1 - F(t) = e^{-(\lambda t)^{\alpha}},
Extreme value: R(y) = S(y) = 1 - F(y) = e^{-e^{(y-u)/b}}.

The hazard functions are

Weibull: h(t) = \alpha \lambda (\lambda t)^{\alpha - 1},
Extreme value: h(y) = \frac{1}{b} e^{(y-u)/b}.
The likelihood function under Type II censoring is discussed here. Let t_{(1)} < \cdots < t_{(r)} be the r smallest ordered lifetimes from a random sample of size n from the Weibull distribution. The r smallest log lifetimes are y_{(1)} = \ln t_{(1)}, \ldots, y_{(r)} = \ln t_{(r)}. It may be noted here that t_{(r+1)} = t_{(r)}, \ldots, t_{(n)} = t_{(r)}, which are censored. Equivalently, y_{(r+1)} = y_{(r)}, \ldots, y_{(n)} = y_{(r)} for the log lifetimes. The r smallest ordered log lifetimes from an extreme value distribution can be written as y_{(1)} < \cdots < y_{(r)}. The log likelihood function for the partially censored extreme value data is

\ln L = -r \ln b + \sum_{i=1}^{r} \left[ \frac{y_{(i)} - u}{b} - e^{(y_{(i)} - u)/b} \right] - (n - r) e^{(y_{(r)} - u)/b}.   (6.17)
Differentiating the log likelihood function with respect to u and equating to 0, the likelihood equation for estimating u is

\frac{\partial \ln L}{\partial u} = -\frac{r}{b} + \frac{1}{b} \sum_{i=1}^{r} e^{(y_{(i)} - u)/b} + \frac{(n - r)}{b} e^{(y_{(r)} - u)/b} = 0.
Similarly, differentiating the log likelihood function with respect to b and equating to 0 gives the likelihood equation for estimating b. In terms of the Weibull parameters α and λ, the corresponding likelihood equations are

\frac{\partial \ln L}{\partial \alpha} = \frac{n}{\alpha} + n \ln \lambda + \sum_{i=1}^{n} \ln t_i - \ln \lambda \sum_{i=1}^{n} (\lambda t_i)^{\alpha} - \sum_{i=1}^{n} (\lambda t_i)^{\alpha} \ln t_i = 0,   (6.20)

\frac{\partial \ln L}{\partial \lambda} = \frac{r\alpha}{\lambda} - \alpha \lambda^{\alpha - 1} \sum_{i=1}^{n} t_{(i)}^{\alpha} = 0.   (6.21)
Lawless (2003) suggested the likelihood ratio test for the null hypothesis H_0: b = b_0, where the likelihood ratio test statistic is

-2 \ln \Lambda = -2 [\ln L_0 - \ln L_1],

which is asymptotically \chi^2_1. Here, L_0 = L(\hat{u}(b_0), b_0) and L_1 = L(\hat{u}, \hat{b}).
The likelihood function for right-censored and left-truncated data is shown in this
example for extreme value distribution (see Balakrishnan and Mitra 2012 for details).
The extreme value distribution is

f(y \mid u, b) = \frac{1}{b} e^{(y-u)/b} e^{-e^{(y-u)/b}}, \quad -\infty < y < \infty, \; -\infty < u < \infty, \; b > 0,   (6.23)

where Y = \ln T, T is the lifetime that follows the Weibull distribution, and u and b are the location and scale parameters, respectively. Now let us define \delta_i = 0 for right-censored and \delta_i = 1 for uncensored observations, let tlt_i be the left-truncation time, and let S_1 be the index set of items that are not left truncated and S_2 the index set of items that are left truncated. Then the likelihood function for left-truncated and right-censored data is

L = \prod_{i \in S_1} \left[ \frac{1}{b} e^{(y_i - u)/b} e^{-e^{(y_i - u)/b}} \right]^{\delta_i} \left[ e^{-e^{(y_i - u)/b}} \right]^{1-\delta_i} \times \prod_{i \in S_2} e^{e^{(tlt_i - u)/b}} \left[ \frac{1}{b} e^{(y_i - u)/b} e^{-e^{(y_i - u)/b}} \right]^{\delta_i} \left[ e^{-e^{(y_i - u)/b}} \right]^{1-\delta_i}.   (6.24)
Let us denote \nu_i = 0 for the ith item truncated and \nu_i = 1 for the ith item not truncated; then the log likelihood function becomes

\ln L = \sum_{i=1}^{n} \left[ -\delta_i \ln b + \delta_i \frac{y_i - u}{b} - e^{(y_i - u)/b} \right] + \sum_{i=1}^{n} (1 - \nu_i) e^{(tlt_i - u)/b}.   (6.25)
Like the likelihood function of the exponential distribution discussed in Sect. 6.2.2, the likelihood function of the Weibull distribution for Type I censoring becomes

L = \prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i)^{1-\delta_i},   (6.26)
and the maximum likelihood estimating equations are shown in Eqs. (6.20) and
(6.21). Solving those equations, we obtain the estimates of the parameters of Weibull
distribution.
Example 6.3 In this example, Weibull distribution is applied to analyze the battery
failure data of Table 5.1 (also discussed in Example 6.2). The likelihood function
(6.26) and hence the Eqs. (6.20) and (6.21) can be used for finding the MLE of the
parameters.
The MLEs of the shape and scale parameters are \hat{\alpha} = 1.9662 and \hat{\lambda} = 836.3442, respectively. The estimate of MTTF is 741.449 days. The MLE of the shape parameter of the Weibull distribution is 1.9662, which is greater than one, indicating an increasing failure rate for the battery with respect to its age. Moreover, the shape parameter is much higher than one, implying that the exponential distribution is not a suitable distribution for the lifetime of the battery (as explained in Example 6.2).
Figure 6.3 shows the ML estimates of the pdf, cdf, reliability function, and hazard
function of the fitted Weibull distribution for the battery failure data.
Fig. 6.3 Fitted pdf, cdf, reliability function and hazard function of Weibull distribution for battery
failure data
The Weibull probability plot for the battery failure data is given in Fig. 6.4. The
roughly linear pattern of the data on the Weibull probability paper suggests that
the Weibull model may be a reasonable choice for modeling the lifetime of the
battery data. Comparing Fig. 6.4 with Fig. 6.2, it can be concluded that the Weibull
distribution appears to be better than the exponential distribution for this data set.
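A parametric Weibull fit of this kind can be obtained with survreg in R. The sketch below assumes a data frame battery with columns time (days) and status (1 = failure, 0 = censored); since survreg works on the log-lifetime (extreme value) scale, the Weibull shape and characteristic life are recovered by the transformations shown, and the quantity reported as the scale parameter in Example 6.3 corresponds to exp(intercept) on this scale.

```r
library(survival)

# battery: assumed data frame with columns time (days) and status (1 = failure, 0 = censored)
fit <- survreg(Surv(time, status) ~ 1, data = battery, dist = "weibull")

alpha.hat <- 1 / fit$scale                        # Weibull shape parameter
eta.hat   <- exp(coef(fit))                       # Weibull characteristic life (days)
MTTF.hat  <- eta.hat * gamma(1 + 1 / alpha.hat)   # mean time to failure
c(shape = alpha.hat, scale = eta.hat, MTTF = MTTF.hat)
```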
6.4 Normal and Lognormal Distributions

The lognormal distribution has become one of the most popular life distribution models in reliability for many high technology applications such as semiconductor degradation failure mechanisms. The use of the lognormal distribution is quite satisfactory for modeling fatigue failures and crack propagation. We also observe the use of the lognormal distribution in modeling failure times of electrical insulation. Although we cannot use the normal distribution directly in reliability and survival analyses, because lifetime data are always nonnegative, it is still important because of its direct relationship with the lognormal distribution: if T is lognormal, then Y = \ln T is normally distributed. The probability density functions of Y and of T are given below.
Fig. 6.4 Weibull probability plot for the battery failure data
f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(y - \mu)^2}, \quad -\infty < y < \infty,   (6.27)

and

f(t) = \frac{1}{t\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(\ln t - \mu)^2}, \quad t > 0.   (6.28)
Let us now define the probability density function f(z) and the reliability function R(z) or S(z), where Z = (Y - \mu)/\sigma; then

f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty,
and

R(z) = S(z) = \int_{z}^{\infty} f(z') \, dz'.
For Type II censored data, with the r smallest ordered lifetimes t_{(1)} < \cdots < t_{(r)} observed and the remaining (n - r) lifetimes censored, the likelihood function under the lognormal distribution is

L = \prod_{i=1}^{r} \frac{1}{t_{(i)}\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(\ln t_{(i)} - \mu)^2} \prod_{i=r+1}^{n} \int_{t_{(i)}}^{\infty} \frac{1}{u_{(i)}\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(\ln u_{(i)} - \mu)^2} \, du_{(i)}
  = \prod_{i=1}^{r} \frac{1}{\sigma} f\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) \prod_{i=r+1}^{n} R\!\left( \frac{y_{(i)} - \mu}{\sigma} \right),   (6.29)

where y_{(i)} = \ln t_{(i)}.
It may be noted here that R(t) = P(T \geq t) = P(e^{Y} \geq t) = P(Y \geq \ln t). The log likelihood function is

\ln L = -r \ln \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{r} (y_{(i)} - \mu)^2 + \sum_{i=r+1}^{n} \ln R\!\left( \frac{y_{(i)} - \mu}{\sigma} \right).   (6.30)
Differentiating the log likelihood with respect to μ and σ, we obtain the likelihood equations shown below:

\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{r} (y_{(i)} - \mu) + \frac{1}{\sigma} \sum_{i=r+1}^{n} f\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) \Big/ R\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) = 0

and

\frac{\partial \ln L}{\partial \sigma} = -\frac{r}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{r} (y_{(i)} - \mu)^2 + \frac{1}{\sigma} \sum_{i=r+1}^{n} \left( \frac{y_{(i)} - \mu}{\sigma} \right) f\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) \Big/ R\!\left( \frac{y_{(i)} - \mu}{\sigma} \right) = 0.
The construction of the likelihood function for left-truncated and right-censored data in the case of the lognormal distribution is shown in Chap. 4 (see also Balakrishnan and Mitra 2011, 2014), where the log likelihood function is

\ln L(\mu, \sigma) = \sum_{i=1}^{n} \left[ -\delta_i \ln \sigma - \delta_i \frac{1}{2\sigma^2} (y_i - \mu)^2 + (1 - \delta_i) \ln \left\{ 1 - F\!\left( \frac{y_i - \mu}{\sigma} \right) \right\} \right] - \sum_{i \in S_2} \ln \left\{ 1 - F\!\left( \frac{tlt_i - \mu}{\sigma} \right) \right\},   (6.31)
where μ and σ are location and scale parameters, respectively, f (·) and F(·) are
probability density and cumulative distributions of the standard normal distribution,
respectively, δi = 0 for right censored and δi = 1 for uncensored, tlti is the left-
truncation time, S1 is the index set for not left truncated and S2 is the index set for
left truncated. Let us denote νi = 0 for the ith item truncated and νi = 1 for the ith
item not truncated, then the score equations are
\frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{n} \left[ \frac{\delta_i}{\sigma^2} (y_i - \mu) - \frac{(1 - \nu_i)}{\sigma} \frac{f\!\left( \frac{tlt_i - \mu}{\sigma} \right)}{1 - F\!\left( \frac{tlt_i - \mu}{\sigma} \right)} + \frac{(1 - \delta_i)}{\sigma} \frac{f\!\left( \frac{y_i - \mu}{\sigma} \right)}{1 - F\!\left( \frac{y_i - \mu}{\sigma} \right)} \right] = 0,   (6.32)

\frac{\partial \ln L}{\partial \sigma} = \sum_{i=1}^{n} \left[ -\frac{\delta_i}{\sigma} + \frac{\delta_i}{\sigma^3} (y_i - \mu)^2 - (1 - \nu_i) \frac{f\!\left( \frac{tlt_i - \mu}{\sigma} \right)}{1 - F\!\left( \frac{tlt_i - \mu}{\sigma} \right)} \frac{(tlt_i - \mu)}{\sigma^2} \right] + \sum_{i=1}^{n} (1 - \delta_i) \frac{f\!\left( \frac{y_i - \mu}{\sigma} \right)}{1 - F\!\left( \frac{y_i - \mu}{\sigma} \right)} \frac{(y_i - \mu)}{\sigma^2} = 0.   (6.33)
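Rather than solving (6.32)–(6.33) directly, the log likelihood (6.31) can be maximized numerically. The following R sketch assumes vectors y (log lifetimes), delta (censoring indicators), nu (truncation indicators, 1 = not truncated), and tlt (left-truncation times on the log scale, set to -Inf where nu = 1); these names are illustrative only.

```r
# Negative log likelihood of Eq. (6.31) for left-truncated, right-censored lognormal data
negloglik <- function(par, y, delta, nu, tlt) {
  mu <- par[1]; sigma <- exp(par[2])   # sigma parameterized on the log scale
  z  <- (y - mu) / sigma
  ll <- sum(delta * (dnorm(z, log = TRUE) - log(sigma)) +            # uncensored terms
            (1 - delta) * pnorm(z, lower.tail = FALSE, log.p = TRUE)) # censored terms
  zt <- (tlt - mu) / sigma
  ll <- ll - sum((1 - nu) * pnorm(zt, lower.tail = FALSE, log.p = TRUE)) # truncation
  -ll
}

# Illustrative call (y, delta, nu, tlt assumed to exist in the workspace):
# fit <- optim(c(mean(y), log(sd(y))), negloglik,
#              y = y, delta = delta, nu = nu, tlt = tlt, hessian = TRUE)
# c(mu = fit$par[1], sigma = exp(fit$par[2]))
```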
6.5 Gamma Distribution

The two-parameter gamma probability density function for failure time data is

f(t \mid \lambda, \gamma) = \frac{1}{\Gamma(\gamma)} \lambda^{\gamma} t^{\gamma - 1} e^{-\lambda t}, \quad t \geq 0,   (6.33)

where \lambda > 0, \gamma > 0 and \Gamma(\gamma) = \int_{0}^{\infty} t^{\gamma - 1} e^{-t} \, dt.
Let us consider a Type II censored sample of size n drawn from a gamma distribution. The ordered failure times are t_{(1)} < t_{(2)} < \cdots < t_{(r)}, and the remaining (n − r) observations are censored at t_{(r)}, that is, t_{(r+1)} = t_{(r)}, \ldots, t_{(n)} = t_{(r)}. Following the approach suggested by Wilk et al. (1962), Gross and Clark (1975) showed the likelihood function to be

L(\tau, \gamma) = \frac{n!}{(n - r)!} \, \frac{\tau^{n\gamma} G^{r(\gamma - 1)} e^{-r\tau A}}{\Gamma(\gamma)^{n} \, t_{(r)}^{r}} \left[ \int_{1}^{\infty} t^{\gamma - 1} e^{-\tau t} \, dt \right]^{n - r},   (6.34)

where

\tau = \lambda t_{(r)}, \qquad A = \frac{\sum_{i=1}^{r} t_{(i)}}{r\, t_{(r)}}, \qquad G = \frac{\left( \prod_{i=1}^{r} t_{(i)} \right)^{1/r}}{t_{(r)}}.
Based on Wilk et al. (1962), and Gross and Clark (1975), Lawless (1982) suggested estimation procedures for τ and γ or, more specifically, for λ and γ. The procedure is tedious, and hence an alternative procedure can be used, as proposed by Mitra (2012) and Balakrishnan and Mitra (2012, 2013). Mitra (2012) proposed a likelihood function for left-truncated and right-censored data. To keep relevance with the pdf mentioned above, let us introduce \lambda = 1/\theta and \gamma = \kappa. Then, the form of the likelihood function for left-truncated and right-censored data is

L(\kappa, \theta) = \prod_{i \in S_1} \{ f(t_i) \}^{\delta_i} \{ 1 - F(t_i) \}^{1 - \delta_i} \times \prod_{i \in S_2} \left[ \frac{f(t_i)}{1 - F(t_i^L)} \right]^{\delta_i} \left[ \frac{1 - F(t_i)}{1 - F(t_i^L)} \right]^{1 - \delta_i},   (6.35)

where t_i^L denotes the left-truncation time of the ith item.
The corresponding log likelihood function can be written as

\ln L(\kappa, \theta) = \sum_{i=1}^{n} \left[ \delta_i \left\{ (\kappa - 1) \ln t_i - \frac{t_i}{\theta} - \kappa \ln \theta \right\} + (1 - \delta_i) \ln \Gamma(\kappa, t_i/\theta) \right] - \sum_{i=1}^{n} \left[ \nu_i \ln \Gamma(\kappa) + (1 - \nu_i) \ln \Gamma(\kappa, t_i^L/\theta) \right],   (6.36)

where \Gamma(\kappa, x) = \int_{x}^{\infty} u^{\kappa - 1} e^{-u} \, du is the upper incomplete gamma function and, as before, \nu_i = 1 if the ith item is not left truncated and \nu_i = 0 otherwise.
Because the censored lifetimes enter this expression only through the incomplete gamma function, Balakrishnan and Mitra (2013) used an EM algorithm in which the censored lifetimes are treated as missing data. The complete-data log likelihood is

\ln L_c(\kappa, \theta) = \sum_{i=1}^{n} \left[ (\kappa - 1) \ln t_i - \frac{t_i}{\theta} - \kappa \ln \theta \right] - \sum_{i=1}^{n} \left[ \nu_i \ln \Gamma(\kappa) + (1 - \nu_i) \ln \Gamma(\kappa, t_i^L/\theta) \right].   (6.38)
The E-step At the E-step, the conditional expectation of the complete-data log likelihood, given the observed data and the current estimate \lambda^{(r)} = (\kappa^{(r)}, \theta^{(r)}), is computed as

Q(\lambda, \lambda^{(r)}) = \left[ \sum_{i:\delta_i=1} (\kappa - 1) \log t_i + \sum_{i:\delta_i=0} (\kappa - 1) E_{1i}^{(r)} \right] - \left[ \sum_{i:\delta_i=1} \frac{t_i}{\theta} + \sum_{i:\delta_i=0} \frac{E_{2i}^{(r)}}{\theta} \right] - n\kappa \log \theta - \sum_{i=1}^{n} \left[ \nu_i \log \Gamma(\kappa) + (1 - \nu_i) \log \Gamma(\kappa, t_i^L/\theta) \right],   (6.39)

where E_{1i}^{(r)} = E_{\lambda^{(r)}}[\log T_i \mid T_i > y_i] and E_{2i}^{(r)} = E_{\lambda^{(r)}}[T_i \mid T_i > y_i]. See Balakrishnan and Mitra (2013) for further details.
The M-step At the M-step, maximizing Q(\lambda, \lambda^{(r)}) with respect to θ, we obtain

\theta = \frac{1}{n\kappa} \left[ \sum_{i:\delta_i=1} t_i + \sum_{i:\delta_i=0} E_{2i}^{(r)} - \sum_{i=1}^{n} (1 - \nu_i) \frac{(t_i^L)^{\kappa} e^{-t_i^L/\theta}}{\theta^{\kappa - 1} \Gamma(\kappa, t_i^L/\theta)} \right],   (6.40)
and differentiating Q with respect to κ gives

\frac{\partial Q}{\partial \kappa} = \sum_{i:\delta_i=1} \log t_i + \sum_{i:\delta_i=0} E_{1i}^{(r)} - n \log \theta - \sum_{i=1}^{n} \left[ \nu_i \psi(\kappa) + (1 - \nu_i) \frac{\frac{\partial}{\partial \kappa} \Gamma(\kappa, t_i^L/\theta)}{\Gamma(\kappa, t_i^L/\theta)} \right],   (6.41)

where \psi(\kappa) is the digamma function. Setting (6.41) equal to zero gives the updating equation for κ, and the E- and M-steps are iterated until convergence.
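As a computational note, the upper incomplete gamma function Γ(κ, x) appearing in (6.36)–(6.41), and the conditional expectations E_1i^(r) and E_2i^(r) required in the E-step, can be evaluated in R as sketched below (using the parameterization λ = 1/θ, γ = κ introduced above).

```r
# Upper incomplete gamma function Gamma(kappa, x) via the regularized survival function
inc.gamma <- function(kappa, x)
  gamma(kappa) * pgamma(x, shape = kappa, lower.tail = FALSE)

# Conditional expectations used in the E-step, Eq. (6.39), for a gamma(kappa, theta)
# lifetime truncated below at y: E[T | T > y] and E[log T | T > y]
E2 <- function(y, kappa, theta)   # E[T | T > y], closed form
  kappa * theta * pgamma(y, shape = kappa + 1, scale = theta, lower.tail = FALSE) /
  pgamma(y, shape = kappa, scale = theta, lower.tail = FALSE)

E1 <- function(y, kappa, theta)   # E[log T | T > y], by numerical integration
  integrate(function(t) log(t) * dgamma(t, shape = kappa, scale = theta), y, Inf)$value /
  pgamma(y, shape = kappa, scale = theta, lower.tail = FALSE)
```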
References
Balakrishnan N, Mitra D (2011) Likelihood inference for lognormal data with left truncation and
right censoring with an illustration. J Stat Plan Infer 141:3536–3553
Balakrishnan N, Mitra D (2012) Left truncated and right censored Weibull data and likelihood
inference with an illustration. Comput Stat Data Anal 56:4011–4025
Balakrishnan N, Mitra D (2013) Likelihood inference based on left truncated and right censored
data from a gamma distribution. IEEE Trans Reliab 62:679–688
Balakrishnan N, Mitra D (2014) Some further issues concerning likelihood inference for left trun-
cated and right censored lognormal data. Commun Stat Simul Comput 43:400–416
Gross AJ, Clark VA (1975) Survival analysis: reliability applications in the biomedical sciences.
Wiley, New York
Lawless JF (1982) Statistical models and methods for lifetime data. Wiley, New York
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, New Jersey
Mitra D (2012) Likelihood inference for left truncated and right censored lifetime data. Ph.D.
dissertation, McMaster University, Ontario
Sprott DA (1973) Normal likelihoods and their relation to large sample theory of estimation.
Biometrika 60:457–465
Weibull W (1951) A statistical distribution function of wide applicability. J Appl Mech 18:293–296
Wilk MB, Gnanadesikan R, Huyett MJ (1962) Separate maximum likelihood estimation of scale
or shape parameters of the gamma distribution using order statistics. Biometrika 50:217–221
Chapter 7
Regression Models
Abstract In both reliability and survival analyses, regression models are employed
extensively for identifying factors associated with probability, hazard, risk, or sur-
vival of units being studied. This chapter introduces some of the regression models
used in both reliability and survival analyses. The regression models include logistic
regression, proportional hazards, accelerated failure time, and parametric regression
models based on specific probability distributions.
7.1 Introduction
In both reliability and survival analyses, regression models are employed extensively
for identifying factors associated with probability, hazard, risk, or survival of units
being studied. The use of linear regression models assuming normality assumption
is very limited in reliability and survival analyses due to the fact that: (i) the lifetime
variables are non-negative and skewed and (ii) the relationship between lifetimes and
explanatory variables are not directly linear. However, if we consider the relationships
between probability, hazard, risk, or survival/failure of units, then regression models
can be used. In this chapter, some of the regression models used very extensively in
both reliability and survival analyses are introduced. The regression models include
logistic regression, proportional hazards, accelerated failure time, and parametric
regression models based on specific probability distributions.
The outline of the chapter is as follows. Section 7.2 presents the logistic regression
model. Section 7.3 explains the proportional hazards model. Section 7.4 discusses the
accelerated failure time model. Section 7.5 deals with parametric regression models,
including the exponential, Weibull, and lognormal regression models.
7.2 Logistic Regression Model

The logistic regression model is one of the most widely used models mainly due to
its simplicity and useful and natural interpretation of the estimates of the parameters.
This model is based on the survival status of a unit over time. Let us consider time
points; t0 denotes the starting time, and te denotes the endpoint of the study. In other
words, during the process of the study, we observe the survival or failure status at the
beginning and at the end of the study. Let us denote the outcome variable as follows:
Y = \begin{cases} 1, & \text{if the unit fails during the period } (t_0, t_e), \\ 0, & \text{if the unit survives during the period } (t_0, t_e). \end{cases}   (7.1)
Here Y is a binary variable. Let us consider a sample of size n. The outcomes can
be shown as a vector for n units as Y = (Y1 , . . . , Yn ). Now we may consider a
vector of p explanatory variables or risk factors for Yi , X i = X i1 , . . . , X i p , i =
1, 2, . . . , n. We want to know whether these explanatory variables are associated
with the outcome variable, Y. In other words, the outcome variable, survival status,
observed for each unit is associated with any or all of the corresponding covariates
included in the vector, X. The covariate vector values can be represented by the
following matrix for a sample of size n:
X = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix}.
To find the relationship between the outcome variable Y and the covariates X, let us define the following probability function:

P(Y_i = 1 \mid X_i = x_i) = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}},   (7.2)

where x_i = (1, x_{i1}, \ldots, x_{ip}) and \beta = (\beta_0, \beta_1, \ldots, \beta_p)'. We can also define

P(Y_i = 0 \mid X_i = x_i) = \frac{1}{1 + e^{x_i \beta}}.   (7.3)
It follows that the log odds is linear in the covariates, \ln \left[ P(Y_i = 1 \mid x_i) / P(Y_i = 0 \mid x_i) \right] = x_i \beta. This is known popularly as the logit function, and this regression model is known as the logistic regression model. In Chap. 8, this will be discussed again as a link function of the generalized linear models.
The interpretation of the parameters of a logistic regression model is very meaningful, and it can be linked with a well-known measure called the odds ratio. Let us consider the covariate value of an explanatory variable x_{ij} = 0 or 1. Keeping the values of all other explanatory variables constant, the odds for x_{ij} = 1 can be obtained as follows:

\frac{P(Y_i = 1 \mid x_i, X_{ij} = 1)}{P(Y_i = 0 \mid x_i, X_{ij} = 1)} = e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{j-1} x_{i,j-1} + \beta_j \times 1 + \cdots + \beta_p x_{ip}}.
Then the odds ratio is obtained by taking the ratio of the two odds, for x_{ij} = 1 and x_{ij} = 0, as shown below:

\frac{P(Y_i = 1 \mid x_i, X_{ij} = 1) / P(Y_i = 0 \mid x_i, X_{ij} = 1)}{P(Y_i = 1 \mid x_i, X_{ij} = 0) / P(Y_i = 0 \mid x_i, X_{ij} = 0)} = e^{\beta_j}.   (7.6)
For the likelihood, note that the probability of an event is P(Y_i = 1 \mid X_i = x_i) = e^{x_i \beta}/(1 + e^{x_i \beta}) and the probability of no event is P(Y_i = 0 \mid X_i = x_i) = 1/(1 + e^{x_i \beta}). The likelihood function can be shown as

L = \prod_{i=1}^{n} \{ P(Y_i = 1 \mid X_i = x_i) \}^{Y_i} \{ P(Y_i = 0 \mid X_i = x_i) \}^{1-Y_i} = \prod_{i=1}^{n} \left( \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} \right)^{Y_i} \left( \frac{1}{1 + e^{x_i \beta}} \right)^{1-Y_i}.   (7.7)
The log likelihood is

\ln L = \sum_{i=1}^{n} \left\{ y_i \left[ x_i \beta - \ln(1 + e^{x_i \beta}) \right] - (1 - y_i) \ln(1 + e^{x_i \beta}) \right\} = \sum_{i=1}^{n} \left[ y_i x_i \beta - \ln(1 + e^{x_i \beta}) \right].   (7.8)
Differentiating the log likelihood with respect to the parameters \beta_0 and \beta_j and setting the derivatives equal to zero, we obtain the estimating equations

\frac{\partial \ln L}{\partial \beta_0} = \sum_{i=1}^{n} \left[ y_i - \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} \right] = 0,

\frac{\partial \ln L}{\partial \beta_j} = \sum_{i=1}^{n} x_{ij} \left[ y_i - \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} \right] = 0, \quad j = 1, \ldots, p.   (7.9)
The second derivatives are

\frac{\partial^2 \ln L}{\partial \beta_j^2} = -\sum_{i=1}^{n} x_{ij}^2 \frac{e^{x_i \beta}}{\left(1 + e^{x_i \beta}\right)^2}, \quad j = 0, 1, \ldots, p;

\frac{\partial^2 \ln L}{\partial \beta_j \partial \beta_k} = -\sum_{i=1}^{n} x_{ij} x_{ik} \frac{e^{x_i \beta}}{\left(1 + e^{x_i \beta}\right)^2}, \quad j, k = 0, 1, \ldots, p, \; j \neq k.   (7.10)
(i) The likelihood ratio test of the null hypothesis H_0: \beta_1 = \cdots = \beta_p = 0 is based on

\Lambda = \frac{L(H_0)}{L(H_1)} = \frac{L_0}{L_1},

where L(H_0) = L_0 is the likelihood under the null hypothesis and L(H_1) = L_1 is the likelihood under the alternative hypothesis (the extended logistic regression model including the parameters \beta_1, \ldots, \beta_p). The log likelihood ratio test statistic is

-2 \ln \Lambda = -2 (\ln L_0 - \ln L_1),   (7.11)

which is asymptotically \chi^2_p.
(ii) The test for H_0: \beta_j = 0 can be conducted by using the following test statistic:

t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)},   (7.12)

which has a t distribution with (n - p - 1) degrees of freedom. For large n, this test statistic is asymptotically standard normal. For more on the logistic regression model, see, for example, Agresti (2002, 2015), Hosmer and Lemeshow (2000), and Kleinbaum and Klein (2010).
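In practice the estimating equations (7.9) are solved by iteratively reweighted least squares, as implemented in glm in R. A minimal sketch is given below; the data frame dat and the variable names y, x1, and x2 are hypothetical.

```r
# Logistic regression of failure status y on covariates x1 and x2 (hypothetical data frame)
fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = dat)

summary(fit)                 # Wald z tests for H0: beta_j = 0, cf. Eq. (7.12)
exp(coef(fit))               # odds ratios, cf. Eq. (7.6)
anova(fit, test = "Chisq")   # sequential likelihood ratio (chi-square) tests, cf. Eq. (7.11)
```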
7.3 Proportional Hazards Model

In a longitudinal study, we can observe or record the failure times at the time of
occurrence. It is frequently the case that time to failure is dependent on other random
variables—characteristics which are perhaps subject to natural variation, but may
also be under a certain amount of control. These explanatory variables or covariates
influence the lifetime model through the reliability function, and thus by implication
through the hazard function. Common to such models is the notion of a baseline
reliability function which corresponds to the lifetime behavior for some standard or
initializing condition.
Let T denote the failure time variable. The vector of failure times for a random sample of size n is represented by T = (T_1, \ldots, T_n). Let us represent the set of covariates by a vector X = (X_1, X_2, \ldots, X_p) and the corresponding parameter vector by \beta = (\beta_1, \beta_2, \ldots, \beta_p). The observed values of T and X are denoted by t = (t_1, \ldots, t_n) and x = (x_1, x_2, \ldots, x_p), respectively. The reliability (survivor) function conditional on the covariate vector X is defined as S(t; x) = P(T > t \mid X = x).
In the proportional hazards model, proposed by Cox (1972), the combined effect of the X variables is to scale the hazard function up or down (Islam and Shiha 2018). The hazard function satisfies

h(t; x) = h_0(t)\, g(x),   (7.14)

where h_0(t) is the baseline hazard function. For covariate values x_1 and x_2, the hazard ratio at failure time t is

\frac{h(t; x_1)}{h(t; x_2)} = \frac{h_0(t) g(x_1)}{h_0(t) g(x_2)} = \frac{g(x_1)}{g(x_2)},   (7.15)

which is independent of time t and depends only on the values of the covariates. This is the well-known proportionality assumption of a proportional hazards model.
Cox (1972) proposed the proportional hazards model of the following form:

h(t; x) = h_0(t)\, e^{x\beta}.   (7.16)

In other words, the hazard function h(t; x) depends on two components: (i) a baseline hazard function of time, h_0(t), independent of covariates, and (ii) a function of the covariates, e^{x\beta}. The hazard function is more sensitive to any change during a small period of time, which makes it suitable for defining the underlying regression model instead of the survivor or reliability function and the probability density function. However, the hazard function can be shown to have a relationship with both the probability density and survivor functions (see Chap. 2). We know that

h(t; x) = \frac{f(t; x)}{S(t; x)}

and

H(t; x) = \int_{0}^{t} h(\tau; x) \, d\tau.

Let S_0(t) = e^{-\int_0^t h_0(\tau)\, d\tau}; then the survivor function is

S(t; x) = [S_0(t)]^{e^{x\beta}}.
The partial likelihood, based on the r distinct ordered failure times t_{(1)} < \cdots < t_{(r)} and the corresponding risk sets R(t_{(i)}),1 is

L(\beta) = \prod_{i=1}^{r} \frac{e^{x_{(i)} \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}}.   (7.20)
1 Here R(t (i) ) denotes the risk set, not the reliability function. The reliability function in this section
is denoted by S(t (i) ).
The unspecified baseline hazard, h_0(t), cancels out of both the numerator and the denominator.
In the presence of ties, the partial likelihood is

L(\beta) = \prod_{i=1}^{r} \frac{e^{s_i \beta}}{\left[ \sum_{l \in R(t_{(i)})} e^{x_l \beta} \right]^{d_i}},   (7.21)

where d_i is the number of ties at time t_{(i)} and s_i = \sum_{l \in D_i} x_l is the sum of the covariate vectors over all the failures at time t_{(i)}. This partial likelihood represents the contribution to the likelihood well in the case of a small number of ties. The log partial likelihood function is

\ln L(\beta) = \sum_{i=1}^{r} \sum_{j=1}^{p} x_{(i)j} \beta_j - \sum_{i=1}^{r} \ln \left[ \sum_{l \in R(t_{(i)})} e^{x_l \beta} \right].   (7.22)
The estimating equations are obtained by taking the first derivative of the log partial likelihood function and equating it to 0, similar to the log likelihood for a parametric form:

U_j(\beta) = \frac{\partial \ln L(\beta)}{\partial \beta_j} = \sum_{i=1}^{r} x_{(i)j} - \sum_{i=1}^{r} \frac{\sum_{l \in R(t_{(i)})} x_{lj} e^{x_l \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}} = 0.   (7.23)

The elements of the observed information matrix are

I_{jk}(\beta) = -\frac{\partial^2 \ln L(\beta)}{\partial \beta_j \partial \beta_k} = \sum_{i=1}^{r} \frac{\sum_{l \in R(t_{(i)})} x_{lj} x_{lk} e^{x_l \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}} - \sum_{i=1}^{r} \left[ \frac{\sum_{l \in R(t_{(i)})} x_{lj} e^{x_l \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}} \right] \left[ \frac{\sum_{l \in R(t_{(i)})} x_{lk} e^{x_l \beta}}{\sum_{l \in R(t_{(i)})} e^{x_l \beta}} \right].   (7.24)
The variances and covariances of the estimators of β can be found approximately from the inverse of the information matrix:

Var(\hat{\beta}) \approx I^{-1}(\hat{\beta}).   (7.25)
The estimates are obtained by solving the estimating equations for β1 , . . . , β p using
an iterative method such as Newton–Raphson method of iteration.
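In R, the partial likelihood (7.20) is maximized by coxph in the survival package. The sketch below is illustrative only; the data frame dat and the variables time, status, x1, and x2 are assumed names.

```r
library(survival)

# Cox proportional hazards fit (dat, time, status, x1, x2 are hypothetical names)
fit <- coxph(Surv(time, status) ~ x1 + x2, data = dat)
summary(fit)    # Wald and likelihood ratio tests, hazard ratios exp(beta)

# Breslow-type estimate of the baseline cumulative hazard, cf. Eq. (7.26)
H0 <- basehaz(fit, centered = FALSE)

# Checking the proportionality assumption via scaled Schoenfeld residuals
cox.zph(fit)
```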
Sometimes we need an estimate of the survivor function, which requires estimation of the baseline hazard function. Breslow (1972a, b) proposed a method of estimating the survivor function using the estimates of the parameters of a proportional hazards model, as shown below:

\hat{H}_0(t) = \sum_{t_j \leq t} \frac{d_j}{\sum_{l \in R(t_j)} e^{x_l \hat{\beta}}},   (7.26)

where d_j is the number of events that occurred at time t_j, and the baseline survivor function is estimated by

\hat{S}_0(t) = e^{-\hat{H}_0(t)}.   (7.27)

Tests

Two tests are generally performed to test the null hypothesis H_0: \beta = \beta_0.
(i) Wald test
If we assume asymptotically normal estimators, then

\chi^2_W = (\hat{\beta} - \beta_0)' I(\hat{\beta}) (\hat{\beta} - \beta_0)   (7.28)

is asymptotically chi-square with p degrees of freedom. For a single coefficient, the hypothesis H_0: \beta_j = 0 can be tested using

W = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)},   (7.30)

which is asymptotically standard normal; similarly, for a parameter \gamma_j of an extended model, W = \hat{\gamma}_j / se(\hat{\gamma}_j) can be used.

(ii) Likelihood ratio test
The likelihood ratio statistic is

-2 \ln \Lambda = -2 (\ln L_0 - \ln L_1),

where L_0 is the likelihood under the null hypothesis and L_1 is the likelihood under the extended proportional hazards model including the parameters \gamma_1, \ldots, \gamma_p. The likelihood ratio test statistic is asymptotically \chi^2_p.
If there is a violation of the proportionality assumption due to a variable, say X_p, then a stratified proportional hazards model can be used, stratifying on the predictor X_p and keeping the (p − 1) remaining variables, which do not violate the assumption, in the model. The stratified partial likelihood for stratum s is

L_s(\beta) = \prod_{i=1}^{k_s} \frac{e^{x_{(i)} \beta}}{\sum_{l \in R_s(t_{(i)})} e^{x_l \beta}},   (7.31)

where k_s is the number of failures in stratum s, and the overall partial likelihood is

L(\beta) = \prod_{s=1}^{S} L_s(\beta).
The estimates are obtained by taking first derivatives with respect to parameters
of the model and equating to zero.
Prentice et al. (1978) and Farewell (1979) extended the proportional hazards model to competing causes. The cause-specific hazard function is defined by

h_s(t; x) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t, \text{cause } s \mid T \geq t; x)}{\Delta t}, \quad s = 1, 2, \ldots, S.

The function h_s(t; x) gives the instantaneous failure rate from cause s at time t, given the vector of explanatory variables X, in the presence of the other failure types. Assuming distinct failure types, the overall hazard function can be expressed in terms of the cause-specific hazard functions (Prentice et al. 1978) as

h(t; x) = \sum_{s=1}^{S} h_s(t; x),

and the probability density function for time to failure and cause of failure s is f_s(t; x) = h_s(t; x) \exp\left[ -\int_0^t h(u; x)\, du \right]. The proportional hazards model for cause s is

h_s(t; x) = h_{0s}(t)\, e^{x \beta_s},

or, with time-dependent covariates,

h_s(t; x) = h_{0s}(t)\, e^{x(t) \beta_s},
where \beta_s = (\beta_{s1}, \ldots, \beta_{sp}) is the vector of regression coefficients corresponding to the observed values of the covariate vector x_s for failures of type s (s = 1, 2, …, S). Let the ordered failure times for failures of type s (s = 1, …, S) be t_{(s1)} < \cdots < t_{(s k_s)}; then the partial likelihood is

L = \prod_{s=1}^{S} \prod_{i=1}^{k_s} \frac{e^{x_{si} \beta_s}}{\sum_{l \in R(t_{(si)})} e^{x_l \beta_s}},   (7.32)

where k_s is the total number of failures due to cause s and R(t_{(si)}) is the risk set for a failure due to cause s at time t_{(si)}.
Example 7.1 This example illustrates the ideas of construction of partial likelihood
and estimation of the parameter with hypothetical data. Let a group of five patients
with chronic kidney disease was observed for 5 years from their ages 60 years. The
hypothetical data given in Table 7.1 show the age, gender, and status of the patients.
Assume the hazard function, h(t; x) = h 0 (t)exβ , given in (7.16), for the data
where t denotes age, and x = {0 for males and 1 for females}. First, we derive the
partial likelihood for these observations and then find the MLE of the parameter β
based on the above model.
Since there are three deaths, the partial likelihood will be the product of three terms, one in respect of each age at which a death occurs. For the first death, the contribution to the partial likelihood is

L_1 = \frac{h_1(61 \mid z = 0)}{h_1(61 \mid z = 0) + h_2(61 \mid z = 1) + \cdots + h_5(61 \mid z = 0)}.
This gives the ratio of the hazard for the patient who dies at the youngest age to the total hazard for the patients still alive at that age. L_1 is equivalent to

L_1 = \frac{h_0(61)}{h_0(61) + h_0(61)e^{\beta} + h_0(61)e^{\beta} + h_0(61) + h_0(61)} = \frac{1}{3 + 2e^{\beta}}.
Similarly, for the second death, the contribution to the partial likelihood is

L_2 = \frac{h_3(63 \mid z = 1)}{h_3(63 \mid z = 1) + h_4(63 \mid z = 0) + h_5(63 \mid z = 0)} = \frac{e^{\beta}}{2 + e^{\beta}}.

Finally, for the third death, the contribution to the partial likelihood is

L_3 = \frac{h_4(64 \mid z = 0)}{h_4(64 \mid z = 0) + h_5(64 \mid z = 0)} = \frac{1}{2}.
The partial likelihood is therefore

L = L_1 \times L_2 \times L_3 = \frac{1}{3 + 2e^{\beta}} \times \frac{e^{\beta}}{2 + e^{\beta}} \times \frac{1}{2} = C \frac{e^{\beta}}{(2e^{\beta} + 3)(e^{\beta} + 2)},

where C is a constant. The log likelihood is

\ln L \propto \beta - \ln(2e^{\beta} + 3) - \ln(e^{\beta} + 2).

Differentiating with respect to β and equating to zero,

1 - \frac{2e^{\beta}}{2e^{\beta} + 3} - \frac{e^{\beta}}{e^{\beta} + 2} = 0

or, \frac{(2e^{\beta} + 3)(e^{\beta} + 2) - 2e^{\beta}(e^{\beta} + 2) - e^{\beta}(2e^{\beta} + 3)}{(2e^{\beta} + 3)(e^{\beta} + 2)} = 0

or, 2e^{2\beta} + 4e^{\beta} + 3e^{\beta} + 6 - 2e^{2\beta} - 4e^{\beta} - 2e^{2\beta} - 3e^{\beta} = 0

or, 6 - 2e^{2\beta} = 0,

which gives e^{2\beta} = 3, so that \hat{\beta} = \frac{1}{2}\ln 3 = 0.5493, e^{\hat{\beta}} = 1.7321, and e^{-\hat{\beta}} = 0.5774.
That is, these hypothetical data indicate that the hazard for a male patient is 42.26% lower than that of a female patient.
In the proportional hazards model, the baseline hazard function h_0(t) is left unspecified and is treated as a nuisance parameter. This restricts the use of the proportional hazards model for prediction purposes. Another limitation of the proportional hazards model is the need to satisfy the proportionality assumption, which is violated very often in reality.
An alternative to the proportional hazards model is the accelerated failure time (AFT)
model. In an accelerated failure time model, we consider the role of a covariate is
to accelerate or decelerate the lifetime by a constant in terms of hazard, probability
density, or survivor functions. This makes an accelerated failure time model more
attractive for direct interpretation of results meaningfully.
We know that lifetime is non-negative; hence, the linear relationship between log
lifetime and covariates can be written as
ln T = xβ + ε (7.33)
so that

T = e^{x\beta + \varepsilon} = T_0\, e^{x\beta},   (7.34)

where T_0 = e^{\varepsilon}. For a single binary covariate x (groups 0 and 1) with \gamma = e^{\beta}, the lifetime in group 1 can be written as

T_1 = T_0\, e^{\beta} = T_0 \gamma,

or, alternatively,

T_0 = T_1\, e^{-\beta} = T_1/\gamma.
A comparison can be made between the survivor functions for groups 0 and 1. For group 1, we can show that

S_1(t) = P(T_1 > t) = P(T_0 > t/\gamma) = S_0(t/\gamma).

Similarly, the density and hazard functions satisfy

f_1(t) = \frac{1}{\gamma} f_0(t/\gamma) \quad \text{and} \quad h_1(t) = \frac{f_1(t)}{S_1(t)} = \frac{\frac{1}{\gamma} f_0(t/\gamma)}{S_0(t/\gamma)} = \frac{1}{\gamma} h_0(t/\gamma).   (7.37)
Equivalently,

\gamma h_1(t) = h_0(t/\gamma),

implying that the hazard at time t for group 1 is 1/γ times the hazard at time t/γ for group 0. In the case of γ = 1, both remain the same. If γ = 3, then the risk at time t for group 1 will be one-third of the risk of group 0 items/subjects at one-third of the time. Another way to interpret this is that the risk of group 0 at one-third of the time is equivalent to three times the risk of the group 1 items/individuals at time t. This is clear from the relationship between group 1 and group 0, T_1 = T_0 e^{\beta}, which implies that the survival time increases in group 1 compared to group 0 if β > 0, so that a unit increase in the covariate results in an acceleration (stretching) of the survival or failure time.
To generalize the accelerated failure time model based on the background provided above, let us write

S(t; x) = S_0(t\, g(x)),

where 1/\gamma = g(x) = e^{x\beta}. Similarly, the probability density and hazard functions are f(t; x) = g(x) f_0(t\, g(x)) and h(t; x) = g(x) h_0(t\, g(x)), respectively. Using g(x) = e^{x\beta}, the survivor, probability density, and hazard functions for the accelerated failure time model are shown below:

S(t; x) = S_0(t\, g(x)) = S_0(t\, e^{x\beta}),   (7.38)

f(t; x) = e^{x\beta} f_0(t\, e^{x\beta}),   (7.39)

h(t; x) = e^{x\beta} h_0(t\, e^{x\beta}).   (7.40)
For the estimation and test for the accelerated failure time models, several infer-
ence procedures have been proposed. Both rank-based and least-squares-method-
based techniques have been suggested, but still the estimation procedure remains
difficult. In this section, the method proposed by Buckley and James (1979) is intro-
duced. The Buckley–James method is a usual least squares approach for censored
data. Stare et al. (2000) observed that the Buckley–James method provides consistent
130 7 Regression Models
estimators under usual regularity conditions and appears to be superior to other least
squares approaches for censored data.
Let us consider a model for the lifetime T (or some monotonic transformation of it) as follows:

T_i^{*} = \ln T_i = x_i \beta + \varepsilon_i, \quad i = 1, \ldots, n,   (7.41)

where the \varepsilon_i are iid with E(\varepsilon_i) = 0 and Var(\varepsilon_i) = \sigma^2. Here the distribution of \varepsilon_i is unspecified, with distribution function F. It is also assumed that ε and x are independent. We observe Y_i = \min(T_i^{*}, C_i^{*}), where C_i^{*} is the log censoring time, and \delta_i = I(T_i \leq C_i) is the censoring indicator.
According to Buckley and James,

Y_i^{*} = Y_i \delta_i + E(T_i^{*} \mid T_i^{*} > Y_i)(1 - \delta_i).

It can be shown that E(Y_i^{*}) = E(T_i^{*}). Then

E(T_i^{*} \mid T_i^{*} > Y_i) = x_i \beta + E(\varepsilon_i \mid \varepsilon_i > Y_i - x_i \beta),

where

E(\varepsilon_i \mid \varepsilon_i > Y_i - x_i \beta) = \int_{Y_i - x_i \beta}^{\infty} \varepsilon \, \frac{dF}{1 - F(Y_i - x_i \beta)}.

We can obtain the distribution function approximately from the survivor function using the product-limit method. Hence

Y_i^{*} = Y_i \delta_i + \left[ x_i \beta + \frac{\sum_{\varepsilon_j > \varepsilon_i} w_j \varepsilon_j}{1 - F(\varepsilon_i)} \right] (1 - \delta_i),   (7.42)

where the w_j denote the probability masses (jumps) of the product-limit estimate of F based on the residuals.
Let us denote by b the vector of initial values for the vector of parameters β; then the estimating equations are

U(\beta, b) = \sum_{i=1}^{n} (x_i - \bar{x}) \left[ \hat{Y}_i^{*}(b) - \bar{Y}^{*}(b) - (x_i - \bar{x}) \beta \right] = 0,   (7.44)

where \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i and \bar{Y}^{*}(b) = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i^{*}(b).
The iterative procedure provides a consistent and asymptotically normal estimator of β, and confidence intervals can be obtained by using the Wald method.
7.5 Parametric Regression Models

In both survival and reliability analyses, we need to fit regression models to identify factors associated with lifetimes. Due to the nature of the data, it is not a practical option to use a linear regression model, and we need alternative regression models. The logistic regression models are used for nominal or ordinal binary/polytomous outcomes. The proportional hazards and accelerated failure time models are semiparametric models, because the baseline hazard or survivor functions are not specified. However, the accelerated failure time models can be parametric as well if the underlying probability distributions are specified. The accelerated failure time model is \ln T = x\beta + \varepsilon, where ε is not specified in Sect. 7.4. A fully specified accelerated failure time model requires specification of ε, and this becomes a parametric regression model. In engineering, the analysis of the reliability of components may require parametric regression models.
The exponential distribution has probability density function f(t) = \lambda e^{-\lambda t}, t \geq 0, with the survivor, hazard, and cumulative hazard functions S(t) = e^{-\lambda t}, h(t) = \lambda, and H(t) = \lambda t. The expected value of the failure time under the exponential distribution is E(T) = \mu = 1/\lambda.
For the exponential regression model, let

\ln T = x\beta + \varepsilon,   (7.46)

so that T = e^{x\beta + \varepsilon} and \varepsilon = \ln T - x\beta. The link between the mean and the covariates is

g(\mu) = \ln \mu = x\beta,

or, equivalently, \lambda = e^{-x\beta}. The survivor, hazard, and cumulative hazard functions then become

S(t) = e^{-e^{-x\beta} t}, \qquad h(t) = \lambda = e^{-x\beta}, \qquad H(t) = \lambda t = e^{-x\beta} t.
Based on the models proposed by Glasser (1967), Cox (1972), Prentice (1973), and Breslow (1974), we can consider a special case in which the hazard function is expressed as h(t) = \lambda = e^{x\beta^{*}}, where \beta^{*} = -\beta. For the likelihood function for partially censored data, we define two variables (T_i, \delta_i), i = 1, \ldots, n, where T_i is the lifetime for the ith item/subject, \delta_i = 1 indicates that the lifetime is uncensored, and \delta_i = 0 indicates that the observed lifetime is censored, that is, observed only partially up to the time of censoring. The likelihood function is

L = \prod_{i=1}^{n} \left[ e^{x_i \beta^{*}} e^{-e^{x_i \beta^{*}} t_i} \right]^{\delta_i} \left[ e^{-e^{x_i \beta^{*}} t_i} \right]^{1-\delta_i},   (7.48)

and the log likelihood is

\ln L = \sum_{i=1}^{n} \left\{ \delta_i \left[ x_i \beta^{*} - e^{x_i \beta^{*}} t_i \right] + (1 - \delta_i) \left[ -e^{x_i \beta^{*}} t_i \right] \right\} = \sum_{i=1}^{n} \left[ \delta_i x_i \beta^{*} - e^{x_i \beta^{*}} t_i \right].   (7.49)
The estimating equations are obtained by differentiating the log likelihood function with respect to the regression parameters:

\frac{\partial \ln L}{\partial \beta_j} = \sum_{i=1}^{n} \delta_i x_{ij} - \sum_{i=1}^{n} x_{ij} e^{x_i \beta^{*}} t_i = 0, \quad j = 0, 1, \ldots, p.   (7.50)

The second derivatives are

\frac{\partial^2 \ln L}{\partial \beta_j \partial \beta_{j'}} = -\sum_{i=1}^{n} x_{ij} x_{ij'} e^{x_i \beta^{*}} t_i, \quad j, j' = 0, 1, \ldots, p.   (7.51)
The likelihood ratio test of the null hypothesis H_0: \beta_1 = \cdots = \beta_p = 0 uses

\Lambda = -2 [\ln L_0 - \ln L_1] \sim \chi^2_p,   (7.52)

where L_0 and L_1 are the likelihoods under the null and the full model, respectively.
For the Weibull regression model, the two-parameter Weibull probability density function is f(t \mid \alpha, \lambda) = \alpha \lambda (\lambda t)^{\alpha - 1} e^{-(\lambda t)^{\alpha}}, t \geq 0, where α is the shape parameter and λ is the scale parameter of the distribution. The hazard and survivor functions are h(t) = \alpha \lambda (\lambda t)^{\alpha - 1} and S(t) = e^{-(\lambda t)^{\alpha}}. Let \ln T = x\beta + \sigma\varepsilon, where σε is distributed as extreme value with scale parameter σ. Now let \lambda = e^{-x\beta}; replacing λ with e^{-x\beta}, we obtain

f(t \mid \alpha, \lambda, x) = \alpha e^{-x\beta} \left( e^{-x\beta} t \right)^{\alpha - 1} e^{-\left( e^{-x\beta} t \right)^{\alpha}}, \quad t \geq 0,

h(t; x) = \alpha e^{-x\beta} \left( e^{-x\beta} t \right)^{\alpha - 1},

S(t; x) = e^{-\left( e^{-x\beta} t \right)^{\alpha}}.   (7.54)
The estimating equation for \beta_j, obtained by differentiating the log likelihood for censored data, is

\frac{\partial \ln L}{\partial \beta_j} = -\alpha \sum_{i=1}^{n} x_{ij} \left[ \delta_i - \left( e^{-x_i \beta} t_i \right)^{\alpha} \right] = 0.   (7.58)
Working with the log lifetimes y_i = \ln t_i, the log likelihood can be written as

\ln L = -r \ln \sigma + \sum_{i=1}^{n} \left\{ \delta_i \left[ \frac{y_i - x_i \beta}{\sigma} - e^{(y_i - x_i \beta)/\sigma} \right] - (1 - \delta_i) e^{(y_i - x_i \beta)/\sigma} \right\} = -r \ln \sigma + \sum_{i=1}^{n} \left[ \delta_i \frac{y_i - x_i \beta}{\sigma} - e^{(y_i - x_i \beta)/\sigma} \right],   (7.60)

where r = \sum_{i=1}^{n} \delta_i.
The estimating equations for β and σ are

\frac{\partial \ln L}{\partial \beta_j} = \sum_{i=1}^{n} \left( \frac{-x_{ij}}{\sigma} \right) \left[ \delta_i - e^{(y_i - x_i \beta)/\sigma} \right] = 0, \quad j = 1, 2, \ldots, p,

\frac{\partial \ln L}{\partial \sigma} = -\frac{r}{\sigma} - \sum_{i=1}^{n} \frac{y_i - x_i \beta}{\sigma^2} \left[ \delta_i - e^{(y_i - x_i \beta)/\sigma} \right] = 0.
The elements of the observed information matrix are

\frac{\partial^2 \ln L}{\partial \beta_j \partial \beta_k} = -\frac{1}{\sigma^2} \sum_{i=1}^{n} x_{ij} x_{ik} e^{(y_i - x_i \beta)/\sigma} = -I(\hat{\beta}_j, \hat{\beta}_k), \quad j, k = 0, 1, \ldots, p,

\frac{\partial^2 \ln L}{\partial \beta_j \partial \sigma} = -\frac{1}{\sigma^2} \sum_{i=1}^{n} x_{ij} \frac{y_i - x_i \beta}{\sigma} e^{(y_i - x_i \beta)/\sigma} = -I(\hat{\beta}_j, \hat{\sigma}), \quad j = 0, 1, \ldots, p,

\frac{\partial^2 \ln L}{\partial \sigma \partial \sigma} = -\frac{r}{\sigma^2} - \sum_{i=1}^{n} \frac{1}{\sigma^2} \left( \frac{y_i - x_i \beta}{\sigma} \right)^2 e^{(y_i - x_i \beta)/\sigma} = -I(\hat{\sigma}).
For testing the null hypothesis H_0: \beta_j = 0, we can use the Wald test

W = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)},

which is asymptotically standard normal.
For the lognormal regression model, the probability density function of the lifetime T is

f(t) = \frac{1}{t\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(\ln t - \mu)^2}, \quad t > 0,   (7.62)

where the location parameter depends on the covariates through \mu_i = x_i \beta.
The likelihood equation for σ, using the same approach as in Sect. 6.4, is

\frac{\partial \ln L}{\partial \sigma} = -\frac{r}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{r} (\ln t_i - \mu_i)^2 + \frac{1}{\sigma} \sum_{i=r+1}^{n} \left( \frac{\ln t_i - \mu_i}{\sigma} \right) f\!\left( \frac{\ln t_i - \mu_i}{\sigma} \right) \Big/ S\!\left( \frac{\ln t_i - \mu_i}{\sigma} \right) = 0.   (7.66)
We can use the likelihood ratio test for testing the null hypothesis H_0: \beta^{**} = 0, where \beta^{**} = (\beta_1, \ldots, \beta_p), as follows:

\Lambda = -2 (\ln L_0 - \ln L_1),

which is asymptotically \chi^2_p.
7.5.4 Example
Example 7.2 This example is reproduced, with permission, from Blischke et al. (2011). Table 7.2 shows a part of the warranty claims data for an automobile component (20 observations out of 498).2 The data set includes age (in days), mileage (in kilometers), failure mode, region, type of automobile that used the unit, and other factors. Failure modes, the type of automobile that used the component, and the zone/region of use are shown in codes. We analyze the failure data (498 claims) using the parametric regression model.
parametric regression model.
The aim of the analysis is to investigate how the usage-based lifetime (used km) of
the component differs with respect to age (x 1 ) and other three categorical covariates:
region [(x 2 : Region1 (R1), Region2 (R2), Region3 (R3), Region4 (R4)], type of
automobiles that use the component [x 3 : Auto1 (A1), Auto2 (A2)], and failure modes
[x 4 : Mode1 (M1), Mode2 (M2), Mode3 (M3)]. The number of observed claims or
failures in R1, R2, R3, R4, A1, A2, M1, M2, and M3 are, respectively, 29, 105, 172,
192; 143, 355; 364, 106, and 28.
Without loss of generality, {R1, A1, M1} is taken as the reference or baseline
level, the level against which other levels (all possible combinations of the values
of three covariates) are compared. The covariate vector x = (1, x1 , x2 , x3 , x4 )
can then be rewritten as x = (1, x D , x R2 , x R3 , x R4 , x A2 , x M2 , x M3 ) under
the assumption that, except for x D , the other six dichotomous covariates take on
the values 1 or 0 to indicate the presence or absence of a characteristic. A Weibull
regression model f (y|x, β, σ ) is assumed for mileage Y, with scale parameter σ
and location parameter dependent on covariates x, namely
μ(x) = xβ = (β0 + x D β D + x R2 β R2 + x R3 β R3 + x R4 β R4
+x A2 β A2 + x M2 β M2 + x M3 β M3 ).
Table 7.3 summarizes the numerical results for the Weibull regression model obtained by using Minitab.3 In this table, the very small p-values for all of the regression coefficients except βM2 (p = 0.867) provide strong evidence of the dependence of average lifetime on those covariates.
The log likelihood of the final model is −5479.3, while the log likelihood of the
null model (with intercept only) is −5628.4. The likelihood ratio chi-square statistic
is −2[−5628.4 − (−5479.3)] = 298.2 with 7 degrees of freedom, and the associated
p-value is 0. Thus, we reject the null hypothesis that all regression parameters are
zero.
Comment: A set of models (smallest extreme value, exponential, Weibull, normal,
lognormal, logistic, and log logistic) were fitted to the data. It was found, based on
the Akaike Information Criterion (AIC) values and the plots of residuals, that the
Weibull is the best model for the data among these alternatives (Blischke et al. 2011).
The estimates of Table 7.3 can be used to estimate and compare other reliability-related quantities (e.g., B10 life, MTTF) at specified levels of the covariates.
2 The information regarding the names of the component and manufacturing company are not dis-
closed to protect the proprietary nature of the information.
3 This may also be done with S-plus and R-language.
Table 7.3 Estimates of parameters β and γ for the Weibull regression model for usage
Parameters ML estimates Standard error Z p 95% Normal CI
Lower Upper
β0 8.9713 0.1516 59.18 0.000 8.6741 9.2684
βD 0.0052 0.0003 16.03 0.000 0.0045 0.0058
β R2 0.3860 0.1432 2.70 0.007 0.1055 0.6666
β R3 0.5678 0.1377 4.12 0.000 0.2980 0.8376
β R4 0.5027 0.1319 3.81 0.000 0.2442 0.7613
β A2 −0.1638 0.0690 −2.37 0.018 −0.2991 −0.0286
β M2 0.0127 0.0758 0.17 0.867 −0.1359 0.1614
β M3 0.2593 0.1304 1.99 0.047 0.0037 0.5149
Shape γ = 1/σ 1.5376 0.0495 1.4435 1.6377
For example, when the covariates of age, region, auto type, and failure mode are fixed, respectively, at 365 days, Region1, Auto1, and Mode1, the ML estimate and 95% confidence interval of B10 life are 12,072.7 km and [9104.48, 16,008.7]. These estimates become 23,433.6 km and [16,890.3, 32,511.8] for covariate values of age 365 days, Region3, Auto2, and Mode3. Under the first combination of levels of covariates, the estimates imply that there is 95% confidence that 10% of the units are expected to fail by a usage between 9104 and 16,009 km. Estimates of Bp life for other values of the covariates may be estimated and interpreted similarly.
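As an illustrative sketch (not part of the original analysis), the Bp quantiles quoted above can be reproduced approximately from the reported estimates using the log-location-scale form of the Weibull model, t_p = exp{μ(x) + σ ln[−ln(1 − p)]} with σ = 1/γ. The coefficient values below are taken from Table 7.3; the function names and covariate coding are our own assumptions.

```python
import math

# ML estimates from Table 7.3 (Weibull regression in log-location-scale form)
beta = {"b0": 8.9713, "bD": 0.0052, "bR2": 0.3860, "bR3": 0.5678,
        "bR4": 0.5027, "bA2": -0.1638, "bM2": 0.0127, "bM3": 0.2593}
sigma = 1.0 / 1.5376  # scale sigma = 1 / shape

def location(age_days, region, auto, mode):
    """Location mu(x) = x*beta for given covariate levels (R1, A1, M1 are baseline)."""
    mu = beta["b0"] + beta["bD"] * age_days
    mu += {"R1": 0.0, "R2": beta["bR2"], "R3": beta["bR3"], "R4": beta["bR4"]}[region]
    mu += {"A1": 0.0, "A2": beta["bA2"]}[auto]
    mu += {"M1": 0.0, "M2": beta["bM2"], "M3": beta["bM3"]}[mode]
    return mu

def bp_life(p, age_days, region, auto, mode):
    """p-quantile of usage (km): t_p = exp(mu + sigma * ln(-ln(1 - p)))."""
    mu = location(age_days, region, auto, mode)
    return math.exp(mu + sigma * math.log(-math.log(1.0 - p)))

# B10 at age 365 days, Region1, Auto1, Mode1 (close to the reported 12,072.7 km;
# small differences arise from rounding of the printed coefficients)
print(round(bp_life(0.10, 365, "R1", "A1", "M1")))
print(round(bp_life(0.10, 365, "R3", "A2", "M3")))  # compare with the reported 23,433.6 km
```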
Fig. 7.1 Smallest extreme value probability plots for standardized residuals

The probability plot for standardized residuals (Fig. 7.1) is used to check the assumptions of a Weibull model with assumed parameters for the data. The plotted
points do not fall on the fitted line perfectly, but the fit appears to be adequate, with
the possibility of one or a few outliers. This suggests that the residual plot does not
represent any serious departure from the Weibull distributional assumption in the
model for the observed data (Blischke et al. 2011).
Abstract The concept of generalized linear models has become increasingly useful
in various fields including survival and reliability analyses. This chapter includes the
generalized linear models for various types of outcome data based on the underlying
link functions. The estimation and test procedures for different link functions are
also highlighted.
8.1 Introduction
1 Sectionsof this chapter draw from the co-author’s (M. Ataharul Islam) previous published work,
reused here with permissions (Islam and Chowdhury 2017).
The outline of the chapter is as follows. Section 8.2 presents the exponential fam-
ily and GLM. Section 8.3 explains the expected value and variance for exponential
family of distributions. Section 8.4 discusses the components of a GLM. Section 8.5
deals with estimating equations. Sections 8.6–8.11 discuss the deviances of models, including the exponential, gamma, Bernoulli, Poisson, and Weibull regression models.
f(y; p) = p^{y}(1 - p)^{1-y}, \quad y = 0, 1
= \exp[y \ln p + (1 - y)\ln(1 - p)]
= \exp\left[\frac{y \ln\frac{p}{1-p} - \{-\ln(1 - p)\}}{1}\right]  (8.2)

Here θ = \ln\frac{p}{1-p}, b(θ) = -\ln(1 - p), c(y, φ) = 0, a(φ) = 1.
Example 8.2 Exponential distribution. We can express the exponential form of the
exponential distribution as follows
f(t; λ) = λe^{-λt}, \quad t \ge 0
= \exp[-λt + \ln λ]
= \exp\left[\frac{-λt - (-\ln λ)}{1}\right]  (8.3)
f(y; μ, σ^2) = \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\frac{1}{2\sigma^{2}}(y - \mu)^{2}}, \quad -\infty < y < \infty
= \exp\left[-\frac{1}{2\sigma^{2}}\left(y^{2} - 2\mu y + \mu^{2}\right) - \frac{1}{2}\ln\left(2\pi\sigma^{2}\right)\right]
= \exp\left[\frac{y\mu - \frac{\mu^{2}}{2}}{\sigma^{2}} - \frac{y^{2}}{2\sigma^{2}} - \frac{1}{2}\ln\left(2\pi\sigma^{2}\right)\right]  (8.4)

where θ = μ, b(θ) = μ^2/2, φ = σ^2, c(y, φ) = -\left[\frac{y^{2}}{2\sigma^{2}} + \frac{1}{2}\ln\left(2\pi\sigma^{2}\right)\right].
The lognormal distribution can be shown as
f(t; μ, σ^2) = \frac{1}{t\sqrt{2\pi\sigma^{2}}} e^{-\frac{1}{2\sigma^{2}}(\ln t - \mu)^{2}}, \quad t \ge 0
= \exp\left[-\frac{1}{2\sigma^{2}}\left\{(\ln t)^{2} - 2\mu(\ln t) + \mu^{2}\right\} - \frac{1}{2}\ln\left(2\pi\sigma^{2} t^{2}\right)\right]
= \exp\left[\frac{(\ln t)\mu - \frac{\mu^{2}}{2}}{\sigma^{2}} - \frac{1}{2\sigma^{2}}(\ln t)^{2} - \frac{1}{2}\ln\left(2\pi\sigma^{2} t^{2}\right)\right].  (8.5)
The above formulation shows that a canonical link function does not exist for a
lognormal distribution because the distribution does not belong to the exponential
family. However, as the transformation Y = ln T belongs to the exponential family,
we can use the relationship to obtain estimates for the lognormal distribution based
on the GLM estimates for normal distribution.
The expected value and variance can be shown easily under regularity conditions for the exponential family of distributions. Differentiating f(t, θ) with respect to θ, we obtain

\frac{d f(t, θ)}{dθ} = \frac{1}{a(φ)}\left[t - b'(θ)\right] f(t, θ).  (8.6)

Integrating both sides over t and noting that the density integrates to one gives

E(T) = b'(θ).

Taking the second derivative with respect to the parameter and then integrating over the outcome variable, we obtain

\int \frac{d^{2} f(t, θ)}{dθ^{2}}\, dt = \frac{1}{a(φ)}\left[-b''(θ) + \frac{1}{a(φ)} E\left\{t - b'(θ)\right\}^{2}\right] = 0,

which gives Var(T) = a(φ)b''(θ), where a(φ) is known as the dispersion parameter and b''(θ) is the variance function or the function of the mean.
In a generalized linear model, there are three components: (i) random component, (ii)
systematic component, and (iii) link function. The random component specifies the
underlying distribution of the outcome variable, T ∼ f (t, θ, φ), and the systematic
component describes the linear function of selected covariates in the model, η = xβ.
The link function is the link between the random and the systematic component,
θ = g(μ) = η = xβ, where μ = E(T|x). Here the link function g(μ) provides the link between the random variable T and the systematic component η = xβ, such that E(T) = μ = g^{-1}(xβ). This implies that the expected value can also be expressed as a function of the regression parameters, μ(β). The link function is θ = g(μ(β)) = β_0 + β_1 x_1 + ··· + β_p x_p.
For binary outcome data, let the random variable, Y ∼ Bernoulli( p), which can
be shown as
f(y, θ) = p^{y}(1 - p)^{1-y}, \quad y = 0, 1,
θ = g[μ(β)] = η = Xβ.

Let us denote μ(β) = μ for brevity; then the logit link function is

g(μ) = \ln\frac{μ}{1 - μ} = Xβ,

which gives

μ = \frac{e^{Xβ}}{1 + e^{Xβ}}.
For an exponential outcome with the canonical link, θ = Xβ and

g(μ) = -\frac{1}{μ} = Xβ,

so that

μ = -\frac{1}{Xβ}.
In many instances, the negative reciprocal link function may fail to provide results
for ensuring non-negative values of the mean, and there may also be a problem with
convergence in the process of estimating parameters; alternatively, we can use the
following link function
θ = \ln λ = g(μ) = \ln\frac{1}{μ} = Xβ  (8.11)

and the model becomes μ = e^{-Xβ}. This relationship implies λ = e^{Xβ}, which is commonly used in many parametric and semiparametric regression models in reliability and survival analyses.
l(θ, φ, t) = \sum_{i=1}^{n} l(θ_i, φ, t_i) = \sum_{i=1}^{n}\left[\{t_iθ_i - b(θ_i)\}/a(φ) + c(t_i, φ)\right].  (8.13)
As it is shown that θ_i = η_i in the case of the canonical link, the chain rule reduces to

\frac{\partial l}{\partial β_j} = \sum_{i=1}^{n}\frac{\partial l_i}{\partial θ_i}\cdot\frac{\partial θ_i}{\partial β_j}, \quad j = 1, \ldots, p,

where

\frac{\partial l_i}{\partial θ_i} = \frac{t_i - b'(θ_i)}{a(φ)} = \frac{t_i - μ_i}{a(φ)},

and

θ_i = \sum_{j=1}^{p} X_{ij}β_j, \qquad \frac{\partial η_i}{\partial β_j} = X_{ij}.

Hence

\frac{\partial l}{\partial β_j} = \frac{1}{a(φ)}\sum_{i=1}^{n}(t_i - μ_i)X_{ij}, \quad j = 1, \ldots, p.  (8.15)

Setting these derivatives equal to zero gives

\frac{1}{a(φ)}\sum_{i=1}^{n}(t_i - μ_i)X_{ij} = 0,

and, since a(φ) does not depend on β, the estimating equations reduce to

\sum_{i=1}^{n}(t_i - μ_i)X_{ij} = 0.  (8.16)
It may be noted here that μ_i = μ_i(β) and, in the case of the canonical link, the relationship between the linear function and the canonical link function is θ_i = g[μ_i(β)]. Some examples are shown below:

(i) Identity link: θ_i = μ_i(β); hence, μ_i(β) = X_iβ. The estimating equations are:

\sum_{i=1}^{n}\left(t_i - X_iβ\right)X_{ij} = 0, \quad j = 1, \ldots, p.

(ii) Log link: θ_i = \ln μ_i(β); hence, μ_i(β) = e^{X_iβ}. The estimating equations are:

\sum_{i=1}^{n}\left(t_i - e^{X_iβ}\right)X_{ij} = 0, \quad j = 1, \ldots, p.

(iii) Logit link: θ_i = \ln\frac{μ_i(β)}{1 - μ_i(β)}; hence, μ_i(β) = \frac{e^{X_iβ}}{1 + e^{X_iβ}}. The estimating equations are:

\sum_{i=1}^{n}\left(t_i - \frac{e^{X_iβ}}{1 + e^{X_iβ}}\right)X_{ij} = 0, \quad j = 1, \ldots, p.
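As a hedged illustration of how these estimating equations can be solved in practice, the sketch below applies Newton–Raphson (equivalently, Fisher scoring) to the logit-link case (iii); the data, seed, and function names are hypothetical and not taken from the text.

```python
import numpy as np

def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Solve the logit-link estimating equations sum_i (y_i - mu_i) x_ij = 0
    by Newton-Raphson, where mu_i = exp(x_i b) / (1 + exp(x_i b))."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))        # mean function for the logit link
        score = X.T @ (y - mu)                  # estimating equations (8.16)
        W = mu * (1.0 - mu)                     # variance function b''(theta)
        info = X.T @ (X * W[:, None])           # observed information matrix
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# small artificial example: intercept plus one covariate
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))
y = rng.binomial(1, p)
print(fit_logit(X, y))   # estimates should be near (-0.5, 1.2)
```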
8.6 Deviance
Deviance is introduced with GLM to measure the goodness of fit for a model that
links the random component and systematic component through a link function. The
random component provides the probability distribution of the outcome variable,
and from its exponential form, we obtain the natural parameter that is used as a link
function as shown below:
For the Bernoulli distribution, for example, θ = \ln\frac{μ}{1-μ} and b(θ) = \ln(1 + e^{θ}), so that

E(Y) = b'(θ) = \frac{e^{θ}}{1 + e^{θ}} = μ

and

Var(Y) = a(φ)b''(θ) = \frac{e^{θ}}{1 + e^{θ}}\cdot\frac{1}{1 + e^{θ}} = μ(1 - μ).
The systematic component is η = Xβ, and the canonical link function can be rewritten as

θ = g(μ) = \ln\frac{μ}{1 - μ} = Xβ, \qquad μ = \frac{e^{Xβ}}{1 + e^{Xβ}}, \qquad b(θ) = \ln\left(1 + e^{Xβ}\right).
The likelihood and log likelihood functions are

L(θ, φ, t) = \prod_{i=1}^{n} e^{\{t_iθ_i - b(θ_i)\}/a(φ) + c(t_i, φ)},

l(θ, φ, t) = \ln L(θ, φ, t) = \sum_{i=1}^{n}\left[\{t_iθ_i - b(θ_i)\}/a(φ) + c(t_i, φ)\right],

and, in terms of the means,

l(μ, φ, t) = \ln L(μ, φ, t) = \sum_{i=1}^{n}\left[\{t_iθ_i(μ) - b(θ_i(μ))\}/a(φ) + c(t_i, φ)\right]  (8.19)
where θ = g(μ) = Xβ and hence b(θ ) is a function of Xβ. In this likelihood function,
we consider a model with (p + 1) parameters. Hence, the likelihood estimation
procedure involves (p + 1) parameters for estimating the expected value E(Ti ) =
μi . As n expected values are estimated using only a small number of parameters
compared to the sample size, the estimated means may deviate from the true values
and one of the ways to have an idea about such deviation is to compare with the
likelihood based on a saturated model. The saturated model for the observed sample
data is to replace the mean by its observed value; in other words, E(Ti ) is replaced
by Ti. This saturated model can be referred to as the full model. For the full model, the
canonical parameter can be defined as θ = g(t). The log likelihood function for the
saturated model is
l(t, φ, t) = \ln L(t, φ, t) = \sum_{i=1}^{n}\left[\{t_iθ_i(t) - b(θ_i(t))\}/a(φ) + c(t_i, φ)\right].  (8.20)

The deviance is then defined as D = 2[l(t, φ, t) − l(\hat{μ}, φ, t)], twice the difference between the saturated and fitted log likelihoods.
A small value of deviance may indicate good fit, but a large value may reflect
poor fit of the model to the data.
8.7 Exponential Regression Model

The exponential density can be written in exponential family form as

f(t; λ) = λe^{-λt} = e^{-λt + \ln λ}.  (8.23)

The log likelihood function is

\sum_{i=1}^{n} l(t_i; λ) = -\sum_{i=1}^{n} λ_i t_i + \sum_{i=1}^{n}\ln λ_i.
The deviance is

D = 2\sum_{i=1}^{n}\left[\ln\frac{\hat{μ}_i}{t_i} + \frac{t_i - \hat{μ}_i}{\hat{μ}_i}\right],

where \hat{μ}_i = 1/\hat{λ}_i denotes the estimated mean.

8.8 Gamma Regression Model
The two-parameter gamma probability density function for failure time data is

f(t|λ, γ) = \frac{1}{Γ(γ)} λ^{γ} t^{γ-1} e^{-λt}, \quad t \ge 0,

where λ > 0, γ > 0, and Γ(γ) = \int_0^{\infty} t^{γ-1} e^{-t}\, dt.
The exponential form is

f(t|λ, γ) = \exp\left[\frac{t(-λ/γ) - (-\ln λ)}{1/γ} + (γ - 1)\ln t - \ln Γ(γ)\right],

where

θ = -\frac{λ}{γ}, \quad b(θ) = -\ln λ = -\ln(-γθ), \quad a(φ) = 1/γ, \quad c(t, φ) = (γ - 1)\ln t - \ln Γ(γ).

The expected value of T is E(T) = μ = b'(θ) = -1/θ = γ/λ, the variance function is V(μ) = b''(θ) = 1/θ^{2} = γ^{2}/λ^{2} = μ^{2}, and the variance is V(T) = a(φ)b''(θ) = \frac{1}{γ}μ^{2} = \frac{γ}{λ^{2}}.
The log likelihood function is

l(λ, γ, t) = \ln L(λ, γ, t) = \sum_{i=1}^{n}\left[\frac{(-λ_i/γ)t_i + \ln λ_i}{1/γ} + (γ - 1)\ln t_i - \ln Γ(γ)\right]
= \sum_{i=1}^{n}\left[-λ_i t_i + γ\ln λ_i + (γ - 1)\ln t_i - \ln Γ(γ)\right].

In terms of the mean μ_i = γ/λ_i, this becomes

l(μ, γ, t) = \ln L(μ, γ, t) = \sum_{i=1}^{n}\left[-\frac{γ}{μ_i} t_i - γ\ln μ_i + γ\ln γ + (γ - 1)\ln t_i - \ln Γ(γ)\right].

For the saturated model, replacing μ_i by t_i,

l(t, γ, t) = \ln L(t, γ, t) = \sum_{i=1}^{n}\left[-\frac{γ}{t_i} t_i - γ\ln t_i + γ\ln γ + (γ - 1)\ln t_i - \ln Γ(γ)\right].
The deviance is

D = 2\left[l(t, γ, t) - l(μ, γ, t)\right] = 2\sum_{i=1}^{n}\left\{-γ\left[\ln\frac{t_i}{μ_i} - \frac{t_i - μ_i}{μ_i}\right]\right\}.  (8.26)
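As a small numerical sketch (ours, with made-up values), the deviance in (8.26) can be evaluated directly once fitted means are available; the function name and data are hypothetical, and γ is treated as known.

```python
import numpy as np

def gamma_deviance(t, mu_hat, gamma_shape):
    """Deviance of a gamma regression model, Eq. (8.26):
    D = 2 * sum_i { -gamma * [ ln(t_i / mu_i) - (t_i - mu_i) / mu_i ] }."""
    t, mu_hat = np.asarray(t, float), np.asarray(mu_hat, float)
    return 2.0 * np.sum(-gamma_shape * (np.log(t / mu_hat) - (t - mu_hat) / mu_hat))

# made-up failure times and fitted means
print(gamma_deviance([1.2, 0.8, 2.5, 3.1], [1.0, 1.0, 2.8, 2.8], gamma_shape=2.0))
```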
With the reciprocal link 1/μ_i = X_iβ*, the log likelihood can be written as

l(μ, γ, t) = \sum_{i=1}^{n}\left[-γ\left\{(X_iβ^{*})t_i - \ln(X_iβ^{*})\right\} + γ\ln γ + (γ - 1)\ln t_i - \ln Γ(γ)\right].
\frac{\partial l}{\partial β_j^{*}} = \frac{1}{a(φ)}\sum_{i=1}^{n}(t_i - μ_i)X_{ij} = 0, \quad j = 0, 1, \ldots, p,

which are

\frac{\partial l}{\partial β_j^{*}} = \frac{1}{a(φ)}\sum_{i=1}^{n}\left(t_i - \frac{1}{X_iβ^{*}}\right)X_{ij} = 0, \quad j = 0, 1, \ldots, p.
For the Bernoulli regression model with logit link, the estimating equations are

\sum_{i=1}^{n}\left(y_i - \frac{e^{X_iβ}}{1 + e^{X_iβ}}\right)X_{ij} = 0, \quad j = 1, \ldots, p,

and the deviance is

D = 2\sum_{i=1}^{n}\left[y_i\ln\frac{y_i}{\hat{μ}_i} + (1 - y_i)\ln\frac{1 - y_i}{1 - \hat{μ}_i}\right].  (8.28)
Here \hat{μ}_i = \frac{e^{X_i\hat{β}}}{1 + e^{X_i\hat{β}}}.
For the Poisson regression model with log link, the estimating equations are

\sum_{i=1}^{n}\left(y_i - e^{X_iβ}\right)X_{ij} = 0, \quad j = 1, \ldots, p.

Here \hat{μ}_i = e^{X_i\hat{β}}. If \sum_{i=1}^{n} y_i = \sum_{i=1}^{n}\hat{μ}_i, then the deviance for the log link is

D = 2\sum_{i=1}^{n} y_i\ln\frac{y_i}{\hat{μ}_i}.  (8.31)
8.11 Weibull Regression Model

The Weibull distribution is one of the most widely used distributions in reliability and survival analyses. The Weibull distribution belongs to the exponential family but does not have a canonical parameter; hence, a direct application of the generalized linear model is not straightforward. With θ_i = X_iβ, the log likelihood function can be written as

\ln L = \sum_{i=1}^{n}\left[t_i^{λ} X_iβ + \ln λ + \ln(-X_iβ) + (λ - 1)\ln t_i\right].  (8.34)
\frac{\partial \ln L}{\partial β_j} = \sum_{i=1}^{n}\left(t_i^{λ} - μ_i\right)x_{ij} = 0, \quad j = 0, 1, \ldots, p,

and

\frac{\partial^{2}\ln L}{\partial β_j\partial β_k} = -\sum_{i=1}^{n} x_{ij}x_{ik}\,μ_i^{2}, \quad j, k = 0, 1, \ldots, p,

where μ_i = -1/(X_iβ).
The first set of equations shown above are the scores used as estimating equations, and the second set of equations provides the elements of the information matrix when the negative of the second derivative is taken:

I_{jk} = -\frac{\partial^{2}\ln L}{\partial β_j\partial β_k}.
The iterative weighted least squares (IWLS) method can be easily applied to obtain the MLEs, b, iteratively using the following equation:

b^{(m)} = \left(X'W^{(m-1)}X\right)^{-1} X'W^{(m-1)}Z,

where W is an n × n diagonal matrix with elements w_{ii} = μ_i^{2}, and the modified dependent variable, Z, has elements z_i = X_i b^{(m-1)} + (t_i^{λ} - μ_i)/μ_i^{2}. An initial approximation, b^{(0)}, is used in an iterative algorithm to determine the subsequent estimates b^{(1)}, ..., and the iterations continue until convergence is obtained. The iterative procedure begins by setting λ = 1.
As discussed earlier, there may be problems in obtaining convergence if the negative reciprocal link is used; alternatively, the log link function is a good choice. In that case, θ = ln μ = Xβ.
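The sketch below (ours) illustrates the IWLS idea for the log-link alternative just mentioned, treating u_i = t_i^λ as exponential with mean μ_i = e^{X_iβ}; with this link the working weight is 1 and the working response is z_i = X_i b + (u_i − μ_i)/μ_i. All names and the simulated data are assumptions for illustration, not the text's own implementation.

```python
import numpy as np

def iwls_weibull_loglink(X, t, lam=1.0, n_iter=50, tol=1e-8):
    """IWLS for the Weibull GLM with log link: u_i = t_i**lam is treated as
    exponential with mean mu_i = exp(X_i b).  Working response:
    z_i = X_i b + (u_i - mu_i)/mu_i; working weight is 1 since V(mu) = mu^2."""
    u = np.asarray(t, float) ** lam
    b = np.zeros(X.shape[1])
    b[0] = np.log(u.mean())            # crude starting value for the intercept
    for _ in range(n_iter):
        eta = X @ b
        mu = np.exp(eta)
        z = eta + (u - mu) / mu         # modified dependent variable
        b_new, *_ = np.linalg.lstsq(X, z, rcond=None)   # weights all equal to 1 here
        if np.max(np.abs(b_new - b)) < tol:
            b = b_new
            break
        b = b_new
    return b

# artificial data: exponential times with log-linear mean
rng = np.random.default_rng(0)
x = rng.uniform(size=300)
X = np.column_stack([np.ones(300), x])
t = rng.exponential(np.exp(0.5 + 1.0 * x))
print(iwls_weibull_loglink(X, t, lam=1.0))   # roughly (0.5, 1.0)
```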
Sharmin and Islam (2017) showed a generalized Weibull distribution in the sense of Cox and Reid (1987) that transforms the parameters from (η, λ) to (α, λ). The parameters α and λ are globally orthogonal. This procedure is discussed and applied by Prudente and Cordeiro (2010). Let α = η e^{λ^{-1}Γ'(2)}; then, the link function is defined by

g(μ) = g(α) = \ln\left[\frac{μ\, e^{Γ'(2)/λ}}{Γ(1 + 1/λ)}\right],

and

λ^{(m)} = λ^{(m-1)}\left[1 + \frac{1}{ψ'(1)}\left(1 + ζ^{(m)}\right)\right].  (8.36)
References
Abramowitz M, Stegun IA (1970) Handbook of mathematical functions with formulas, graphs and
mathematical tables. National Bureau of Standards, Washington, DC
Agresti A (2015) Foundations of linear and generalized linear models. Wiley, New Jersey
Cox DR, Reid N (1987) Parameter orthogonality and approximate conditional inference. J Royal
Stat Soc B 49(1):1–39
Davis CS (2002) Statistical methods for the analysis of repeated measurements. Springer, New York
Dobson AJ, Barnett AG (2008) An introduction to generalized linear models, 3rd edn. Chapman
and Hall, London
Fahrmeir L, Tutz G (2001) Multivariate statistical modelling based on generalized linear models.
Springer, New York
Islam MA, Chowdhury RI (2017) Analysis of repeated measures data. Springer, Singapore
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, London
McCulloch CE, Searle SR, Neuhaus JM (2008) Generalized, linear, and mixed models, 2nd edn.
Wiley, New Jersey
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J Royal Stat Soc, Ser A (General)
135(3):370–384
Prudente AA, Cordeiro GM (2010) Generalized Weibull linear models. Commun Stat Theory Meth-
ods 39:3739–3755
Sharmin AA, Islam MA (2017) Generalized Weibull linear models with different link functions.
Adv Appl Stat 50:367–384
Stroup WW (2012) Generalized linear mixed models. CRC Press, Boca Raton
Wedderburn RWM (1974) Quasi-likelihood functions, generalized linear models and the Gaussian
Newton method. Biometrika 61:439–447
Chapter 9
Basic Concepts of System Reliability
9.1 Introduction
This chapter describes and illustrates the basic concepts of system reliability analysis
by focusing on the components within the system. A component is a part or element
of a larger whole, especially a part of a machine or vehicle. A system is a collection of
components, modules, assemblies, or subsystems that are interconnected to a specific
design in order to perform a given task. The types of components, their quantities
and qualities, and the way they are assembled within the system have a direct impact
on the reliability of the system. That is, the reliability of a system is related to the
types, quantities, and reliabilities of its components. One of the main objectives of
system reliability analysis is to derive or select a suitable probability distribution that
represents the lifetime of the entire system based on the probability distributions of
the lifetimes of its components.
Generally, a component is characterized in terms of two states—working or
failed—first it starts in its working state and changes to a failed state after a cer-
tain time and/or usage. The failure of a component occurs due to a complex set
of interactions between the material properties and other physical properties of the
component and the stresses that act on the component (Blischke and Murthy 2000).
The time to failure (with respect to age or usage) is a random variable, and it can be
modeled by a probability distribution. Subsequent failures of a component depend on
the type of implemented rectification actions. These, in turn, depend on whether the
component is repairable or nonrepairable. In the case of a nonrepairable component,
a failed component needs to be replaced by a new one. On the other hand, in the case
As stated in Blischke et al. (2011), almost all products are built using many com-
ponents, and the number of components used increases with the complexity of the
product. As such, a product can be viewed as a system of interconnected compo-
nents. The decomposition of a product or system involves several levels. The number
of levels that is appropriate depends on the system. The performance of the system
depends on the state of the system (working, failed, or in one of several partially
failed states), and this, in turn, depends on the state (working/failed) of the various
components (Blischke and Murthy 2000; Blischke et al. 2011).
A diagram that displays the relationships of components of a system showing all
logical connections of (functioning) components required for system operation is
called reliability block diagram (RBD). In a RBD, each component is represented
by a block with two endpoints. When the component is in its working state, there is
a connection between the two endpoints. This connection is broken when the com-
ponent is in a failed state (Blischke and Murthy 2000; Blischke et al. 2011; Murthy
and Jack 2014; Ben-Daya et al. 2016). The component block might have a single
input and a single output or multiple inputs and multiple outputs. By convention,
inputs are generally assumed to enter either from the left side or from the top side
of the box, and outputs exit either from the right side or from the bottom side of the
box (Myers 2010). Systems may be of different types, e.g., series structure, parallel
structure, and general structure (combination of series and parallel substructures).
RBD represents a system that uses interconnected blocks arranged in combinations
of series and/or parallel configurations. It can be used to estimate quantitatively the
reliability and availability (or unavailability) of a system by considering active and
standby states.
A system can be analyzed by decomposing it into smaller subsystems or compo-
nents and estimating reliability of each subsystem/component to assess the overall
reliability of the system by applying the rules of probability theory1 according to
the RBD. If a system is constructed for performing more than one function, each
function must be considered individually, and a separate RBD has to be established
for each function of the system. The construction of RBD for a very complex system
may be complicated.
1 Here the rules of probability theory mean, for example, the additive, multiplicative, and conditional probability rules.
In general, if a series system consists of k components and F_i(t) denotes the cdf of the ith component, i = 1, 2, …, k, then the cdf of the system can be expressed as

F_{sys}(t) = 1 - \prod_{i=1}^{k}\left[1 - F_i(t)\right], \quad t \ge 0.  (9.3)

If F_i(t) and F_{sys}(t) are replaced in terms of their reliability functions, we get

1 - R_{sys}(t) = 1 - \prod_{i=1}^{k} R_i(t), \quad t \ge 0,

or,

R_{sys}(t) = \prod_{i=1}^{k} R_i(t), \quad t \ge 0.  (9.4)
Thus, the reliability of a series system can be expressed as the product of the
reliabilities of individual components of the system. The reliability of a series system
is always lower than the reliability of any of its components. If the Ri (t)’s are estimated
from independent sets of data, with estimates denoted R̂i (t), the estimate of Rsys (t)
is calculated as the product of the R̂_i(t), i = 1, 2, …, k. Let h_i(t) denote the hazard function of the ith component, i = 1, 2, …, k; then, using (2.27) and (9.4), the hazard function of the system can be expressed as

h_{sys}(t) = -\frac{d}{dt}\ln R_{sys}(t) = -\frac{d}{dt}\ln\prod_{i=1}^{k} R_i(t) = \sum_{i=1}^{k}\left[-\frac{d}{dt}\ln R_i(t)\right] = \sum_{i=1}^{k} h_i(t), \quad t \ge 0.  (9.5)
This implies that the hazard function of a series system is equal to the sum of the
hazard functions of the individual components of the system.
If the estimators of the component reliabilities are unbiased, so is the estimator of
Rsys (t), since under independence the expectation of the product is the product of the
expectations (See Blischke et al. 2011). The variance of the estimator is somewhat
more complex. For k = 2, the result is
V\{R̂(t)\} = [E\{R̂_1(t)\}]^{2} V\{R̂_2(t)\} + [E\{R̂_2(t)\}]^{2} V\{R̂_1(t)\} + V\{R̂_1(t)\}V\{R̂_2(t)\}.  (9.6)
For larger values of k, the result becomes increasingly complex. The estimated
variance may be used to obtain asymptotic confidence intervals in the usual way.2
The reliability of components (especially in systems containing many compo-
nents) is often characterized by constant failure rate. In this case, the exponential
distribution (discussed in Chap. 4) can be applied, which assumes that the hazard
rate is constant with respect to component age. When a component is functioning
during its useful lifetime, a constant-hazard model is particularly suitable. The reliability function of a constant-hazard model for component i at age t is R_i(t) = exp(−λ_i t), and the reliability function of the series system becomes

R_{sys}(t) = \prod_{i=1}^{k}\exp(-λ_i t) = \exp(-λ t), \quad t \ge 0,  (9.8)

where λ = \sum_{i=1}^{k} λ_i. The MTTF of the system (assuming that each failed component is immediately replaced by an identical component) becomes

MTTF = \int_{0}^{\infty} R_{sys}(t)\,dt = \int_{0}^{\infty}\exp(-λ t)\,dt = \frac{1}{λ}.  (9.9)
Equation (9.8) indicates that if the distributions of times to failure of each com-
ponent of a series system follow an exponential distribution, then the distribution of
time to failure of the system is again exponential with the failure rate as the sum of
failure rates of individual components.
2 The variances of
F̂(t) and R̂(t) can also be computed by using the delta method (e.g., see Meeker
and Escobar 1998).
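As a short sketch (ours, with hypothetical component reliabilities and failure rates), Eqs. (9.4), (9.8), and (9.9) can be evaluated directly for a series system; the function names are ours.

```python
import math

def series_reliability(rel_values):
    """Reliability of a series system, Eq. (9.4): product of component reliabilities."""
    r = 1.0
    for ri in rel_values:
        r *= ri
    return r

def exp_series(lambdas, t):
    """Series system of exponential components, Eq. (9.8):
    R_sys(t) = exp(-(sum of lambdas) * t); MTTF = 1 / sum of lambdas, Eq. (9.9)."""
    lam = sum(lambdas)
    return math.exp(-lam * t), 1.0 / lam

# hypothetical inputs
print(series_reliability([0.90, 0.95, 0.99]))        # product of component reliabilities
print(exp_series([0.0005, 0.001, 0.002], t=100.0))    # (R_sys(100), MTTF) for exponential components
```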
R_{sys}(1) = \prod_{i=1}^{4} R_i(1) = 0.6 \times 0.7 \times 0.85 \times 0.8 = 0.2856.  (9.10)
Note that the reliability of the system, 0.2856, is less than that of the reliability of
the worst component, R1 (1) = 0.6.
In the reliability block diagram (given in Fig. 9.3), if we assume four different
systems such that the first system consists of only component 1, the second system
consists of components 1 and 2, the third system consists of components 1, 2, and 3,
and fourth system consists of all four components; then, the changes of the reliabilities
of assumed four systems would be as shown in Table 9.1.
Table 9.1 Changes of reliabilities for changing the number of components in the series system
System   No. of components in the system   Reliability of system              % change in the reliability
First    1                                 0.6                                –
Second   2                                 0.6 × 0.7 = 0.42                   (0.6 − 0.42)/0.6 × 100 = 30
Third    3                                 0.6 × 0.7 × 0.85 = 0.357           (0.6 − 0.357)/0.6 × 100 = 40.5
Fourth   4                                 0.6 × 0.7 × 0.85 × 0.8 = 0.2856    (0.6 − 0.2856)/0.6 × 100 = 52.4
Table 9.2 Changes of reliabilities for improving the reliabilities of each component one by one in the series system
R1(1)   R2(1)   R3(1)   R4(1)   Rsys(1)   % change in the reliability
0.6     0.7     0.85    0.8     0.2856    –
0.7     0.7     0.85    0.8     0.3332    (0.3332 − 0.2856)/0.2856 × 100 = 16.67
0.6     0.8     0.85    0.8     0.3264    (0.3264 − 0.2856)/0.2856 × 100 = 14.29
0.6     0.7     0.95    0.8     0.3192    (0.3192 − 0.2856)/0.2856 × 100 = 11.76
0.6     0.7     0.85    0.9     0.3213    (0.3213 − 0.2856)/0.2856 × 100 = 12.50
The fourth column of Table 9.1 shows the percentage decrease in system reliability compared with the system having a single component (Component 1). This column indicates that the reliability of a series system decreases as the number of components in it increases.
In the reliability block diagram (Fig. 9.3), if we increase the reliability of each component one at a time by 0.1, keeping the reliabilities of the remaining components unchanged, then the changes in the reliabilities of the systems are as shown in Table 9.2.
The sixth column of Table 9.2 shows the percentage increase in system reliability compared with the system in which the component reliabilities are left unchanged. It indicates that the improvement in system reliability (in percentage) is highest when the reliability of the weakest component (Component 1) is increased by 0.1, in comparison with the cases when the reliabilities of the other three components are increased one by one by the same amount (0.1) (see also eGyanKosh 2019). This suggests that, to improve the reliability of a series system, effort should first be directed at improving the reliability of the weakest component of the system.
where f_i(t) denotes the pdf of the ith failure mode, i = 1, 2, …, k. Equation (9.11) may be rewritten as

f_{sys}(t) = R_{sys}(t)\sum_{i=1}^{k}\frac{f_i(t)}{R_i(t)}, \quad t \ge 0.  (9.12)
Figure 9.4 displays a comparison of the reliability functions for failure mode 1, failure mode 2, and the product (combined failure modes 1 and 2) for 0 ≤ t ≤ 10,000 days. This figure can be used to assess the reliability of the component for given days. For example, the figure indicates that the reliabilities of the component at age 2000 days are 0.30 for failure mode 1, (R1(t)), 0.45 for failure mode 2, (R2(t)), and 0.14 for the product (Rsys(t)). The estimated MTTF of the product is found to be μ = \int_0^{\infty} R_{sys}(t)\,dt = 1/(λ_1 + λ_2) = 1000 days.
3 The competing risk model has also been called the compound model, series system model, and
of two mutually independent events equals the product of the probabilities of the occurrences of the two individual events. Therefore, the resultant cumulative distribution function of a parallel system through age t can be calculated as

F_{sys}(t) = \prod_{i=1}^{k} F_i(t), \quad t \ge 0.  (9.14)

By interchanging F_i(t) and F_{sys}(t) with their corresponding reliability functions, we get

R_{sys}(t) = 1 - \prod_{i=1}^{k}\left[1 - R_i(t)\right], \quad t \ge 0.  (9.15)
Thus, the reliability of a parallel system is equal to one minus the product of one minus the reliabilities of the individual components of the system. The reliability of a parallel system is greater than the reliability of any of its components. The variance of the estimator (9.15) may be obtained by applying the delta method.
If the reliabilities of individual components are characterized by constant failure rates (if λ_i denotes the hazard rate for component i, i = 1, 2, …, k), the reliability function of a parallel system changes to

R_{sys}(t) = 1 - \prod_{i=1}^{k}\left[1 - \exp(-λ_i t)\right], \quad t \ge 0.  (9.16)
This implies that, for a parallel system, the expression for the reliability function becomes more complex than in the case of a series system, even if the failure rates of the individual elements are constant (see also Menčík 2016). In a special case, where all k identical components have the same failure rate (say λ), the reliability function of the parallel system changes to

R_{sys}(t) = 1 - \left[1 - \exp(-λ t)\right]^{k}, \quad t \ge 0.  (9.17)
Under this specific situation, and also assuming that each failed component is immediately replaced by an identical component, the MTTF for the system can be expressed as

MTTF = \frac{1}{λ}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}\right).  (9.18)
R_{sys}(1) = 1 - \prod_{i=1}^{4}\left[1 - R_i(1)\right] = 1 - (1 - 0.6)(1 - 0.7)(1 - 0.85)(1 - 0.8) = 0.9964.  (9.19)
Note that the reliability of the system, 0.9964, is greater than the reliability of the best component, R3(1) = 0.85. This example also indicates that the reliability of this system is higher than the reliability of the series system of Example 9.1 having the same number of components with the same reliabilities.
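A brief sketch (ours) evaluating Eqs. (9.15) and (9.18); the first call reproduces the value 0.9964 obtained above, while the second uses a hypothetical failure rate for identical exponential components. Function names are ours.

```python
def parallel_reliability(rel_values):
    """Parallel system reliability, Eq. (9.15): 1 - prod(1 - R_i)."""
    q = 1.0
    for ri in rel_values:
        q *= (1.0 - ri)
    return 1.0 - q

def parallel_mttf_identical(lam, k):
    """MTTF of a parallel system of k identical exponential components, Eq. (9.18):
    (1/lam) * (1 + 1/2 + ... + 1/k)."""
    return sum(1.0 / j for j in range(1, k + 1)) / lam

print(parallel_reliability([0.6, 0.7, 0.85, 0.8]))   # 0.9964, as in the example above
print(parallel_mttf_identical(lam=0.001, k=3))        # hypothetical rate, 3 components
```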
For selecting a series or parallel system, we have to balance the costs of the components against the desired reliability of the system.
In a combined series and parallel system (also known as a mixed system), the components are connected in series and parallel arrangements to perform a required system
operation. Figure 9.7 displays two types of reliability block diagrams for combined
systems, top system with five components and bottom system with seven compo-
nents.
To assess the reliability of this system, the RBD is broken into series or parallel
subsystems. The formulas for estimating reliabilities for series and parallel systems
are used to obtain the reliability of each subsystem first, and then, the reliability of
the system can be obtained on the basis of the relationship among the subsystems.
Example 9.5 Suppose a system is constructed based on three components as shown
by the following diagram (Blischke et al. 2011) (Fig. 9.8).
If the lifetimes of components 1, 2, and 3 all follow exponential distributions with
λ = 0.001, 0.002, and 0.003 failures per hour, respectively, then the reliability of the
system for ten hours (t = 10) can be computed as follows:
Fig. 9.7 Diagrams of two combined (or mixed) systems, top with five and bottom with seven components
Component 1 and the subsystem with components 2 and 3 are in a series structure. From (9.4), the reliability of the system at t = 10 can be computed as illustrated in the sketch below.
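A minimal sketch (ours) of this computation, assuming from Fig. 9.8 that components 2 and 3 form a parallel subsystem connected in series with component 1; variable names are ours.

```python
import math

# Assumed structure from Fig. 9.8: components 2 and 3 in parallel, in series with component 1.
lam = {1: 0.001, 2: 0.002, 3: 0.003}   # failures per hour
t = 10.0

R = {i: math.exp(-lam[i] * t) for i in lam}        # exponential component reliabilities
R_23 = 1.0 - (1.0 - R[2]) * (1.0 - R[3])            # parallel subsystem, Eq. (9.15)
R_sys = R[1] * R_23                                 # series with component 1, Eq. (9.4)
print(round(R_sys, 4))   # about 0.9895 under this assumed structure
```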
9.6 K-Out-of-N System Reliability

A system that works if and only if at least k of its n components work is called a k-out-of-n structure. If k = 1, the system becomes a parallel structure, and if k = n, it becomes a series structure. The RBD of a 2-out-of-3 system is shown in Fig. 9.9. The 2-out-of-3 system is a system in which any two of the three components are required to work for the system to function. In Fig. 9.9, a 1-out-of-3 (where k = 1) system means a parallel structure and a 3-out-of-3 (where k = n) system gives a series structure.
To find the reliability of a k-out-of-n system, we consider first a 2-out-of-3 system as shown in Fig. 9.9. In general, the cumulative distribution function of a k-out-of-n system of independent components can be expressed as

F_{sys}(t) = \sum_{j=n-k+1}^{n}\;\sum_{\delta\in A_j}\;\prod_{i=1}^{n}\left[F_i(t)\right]^{\delta_i}\left[1 - F_i(t)\right]^{1-\delta_i}, \quad t \ge 0,  (9.20)

where A_j denotes the set of all vectors δ = (δ_1, …, δ_n), with δ_i = 1 if component i has failed and δ_i = 0 otherwise, for which exactly j components have failed.
In a special case, where all n identical components have the same and constant failure rate (say λ), the cumulative distribution function for a k-out-of-n configuration (9.21) changes to

F_{sys}(t) = \sum_{j=n-k+1}^{n}\binom{n}{j}\left(1 - e^{-λ t}\right)^{j}\left(e^{-λ t}\right)^{n-j}, \quad t \ge 0.  (9.22)
Example 9.6 For an illustration, consider a 4-out-of-6 system that works if at least four out of six components work. Let the lifetimes of the six components be independent and identically distributed Weibull random variables having reliability function

R_i(t) = \exp\left[-\left(\frac{t}{200}\right)^{1.5}\right], \quad t \ge 0, \quad i = 1, 2, \ldots, 6,

where the shape and scale parameters are 1.5 and 200 days, respectively. The reliability function of the system, using (9.21), becomes

R_{sys}(t) = 1 - \sum_{j=3}^{6}\binom{6}{j}\left[1 - \exp\left\{-(t/200)^{1.5}\right\}\right]^{j}\left[\exp\left\{-(t/200)^{1.5}\right\}\right]^{6-j}, \quad t \ge 0.  (9.23)
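The following sketch (ours) evaluates the system reliability at a few ages using a form equivalent to (9.23), namely the probability that at least four of the six components survive; function names are ours.

```python
import math
from math import comb

def weibull_rel(t, shape=1.5, scale=200.0):
    """Component reliability R_i(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))

def k_out_of_n_rel(t, k=4, n=6):
    """Reliability of a k-out-of-n system of identical components (equivalent to Eq. (9.23)):
    R_sys(t) = sum_{j=k}^{n} C(n, j) R(t)^j (1 - R(t))^(n - j)."""
    r = weibull_rel(t)
    return sum(comb(n, j) * r**j * (1.0 - r) ** (n - j) for j in range(k, n + 1))

for t in (50, 100, 200):
    print(t, round(k_out_of_n_rel(t), 4))
```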
References
Beichelt F, Tittmann P (2012) Reliability and maintenance: networks and systems. Taylor & Francis
Group, CRC Press
Ben-Daya M, Kumar U, Murthy DNP (2016) Introduction to maintenance engineering: modelling,
optimization and management. Wiley, New York
Beyersmann J, Allignol A, Schumacher M (2012) Competing risks and multi-state models with R.
Springer, New York
Blischke WR, Murthy DNP (2000) Reliability—modeling, prediction, and optimization. Wiley,
New York
Blischke WR, Karim MR, Murthy DNP (2011) Warranty data collection and analysis. Springer,
London Limited
Dhillon BS (2002) Engineering maintenance: a modern approach. CRC Press, USA
eGyanKosh (2019) A national digital repository. http://www.egyankosh.ac.in/bitstream/123456789/
35168/1/Unit-14.pdf. Accessed on 26 May 2019
Islam MA, Chowdhury RI (2012) Elimination of causes in competing risks: a hazards model
approach. World Appl Sci J 19(5):608–614
Karim MR (2012) Competing risk model for reliability data analysis. In: Proceedings of international
conference on data mining for bioinformatics, health, agriculture and environment, University of
Rajshahi, Bangladesh, pp. 555–562
Kececioglu D (1994) Reliability engineering handbook—V 2. Prentice-Hall, Englewood Cliffs, NJ
Klein JP, Houwelingen HCV, Ibrahim JG, Scheike TH (2014) Handbook of survival analysis. Taylor & Francis Group, CRC Press
Kuo W, Zhu X (2012) Importance measures in reliability, risk, and optimization: principles and
applications. Wiley, New York
Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley, New York
Menčík J (2016) Concise reliability for engineers. InTech
Murthy DNP, Jack N (2014) Extended warranties, maintenance service and lease contracts: modeling
and analysis for decision-making. Springer, London
Myers A (2010) Basic elements of system reliability. In: Complex system reliability. Springer Series
in Reliability Engineering, Springer, London
Pham H (2006) System software reliability. Springer, London
Rausand M, Høyland A (2004) System reliability theory: models, statistical methods, and applica-
tions, 2nd edn. Wiley, New York
Tobias PA, Trindade DC (2012) Applied reliability, 3rd edn. Taylor & Francis Group, CRC Press
Chapter 10
Quality Variation in Manufacturing
and Maintenance Decision
Abstract This chapter looks at the issues in modeling the effect of quality vari-
ations in manufacturing. It models the effects of assembly errors and component
nonconformance. This chapter constructs the month of production—month in ser-
vice (MOP-MIS) diagram to characterize the claims rate as a function of MOP and
MIS. It also discusses the determination of the optimum maintenance interval of an object.
10.1 Introduction
Quality variation in manufacturing is one of the main causes for the high infant (early)
failure rate of the product.1 For example, if some of the bearings are defective,
they can wear faster, causing washing machines with these defective bearings to
fail earlier. Failures during the infant failure period are highly undesirable because
they provide a negative impact on customers’ satisfaction, especially on customers’
first impression on the product. As such, proper monitoring and diagnosis of the
manufacturing process play an important role toward continuous improvement of
product quality.
Most consumer durables, industrial products, and commercial products are pro-
duced using either continuous or batch production.2 One may view the continuous
production as a batch production by dividing the time into suitable intervals (e.g.,
8-hour shifts, days, months, etc.). Similarly, components, whether produced inter-
nally or bought from external vendors, are produced in batches. Due to variations in
materials and/or production, the quality of components can vary from batch to batch.
This, combined with variations in assembly of the components, can result in quality
variations, at the product level, from batch to batch. For proper root cause analysis,
it is necessary to identify the batch number for each failed item at both the product
1 Sections of the chapter draw from the co-author’s (Md. Rezaul Karim) previous published work,
reused here with permissions (Blischke et al. 2011).
2 For certain products (e.g., automobiles, personal computers), each unit at the product level has a
unique identification number. For example, in the case of automobiles, each vehicle has a unique
identification number, referred to as Vehicle Identification Number (VIN).
and component levels. The ability to perform this identification is called traceability
(Blischke et al. 2011). The date of production is critical for traceability of the batch
in production.
Once the batch numbers of failed products or components are identified, the
analysis for assessing quality variation from batch to batch can be performed and the
results related to other variables of the production process can be utilized in order
to control or reduce the quality variations. The analysis results can be used to find
answers to the questions (Jiang and Murthy 2009), such as, (i) Is there a problem
with product reliability? (ii) If so, is it a design or a manufacturing problem? (iii)
If it is a manufacturing problem, is it an assembly or component nonconformance
problem?
Every object (product, plant, or infrastructure) degrades with its age and/or usage
and finally fails (when it is no longer capable of fulfilling the function for which it
was designed). Maintenance actions are used to control the degradation processes
and to reduce the likelihood of failure of an object. In order to preserve the function of
the system, it is vital to identify the maintenance strategies that are needed to manage
the associated failure modes that can cause functional failure (Ahmadi 2010). An
effective maintenance requires proper data management—collecting, analyzing, and
using models for decision making (Murthy et al. 2015). Finally, it determines an
appropriate lifetime (age, usage, etc.) interval for the preventive maintenance of an
object.
This chapter looks at the issues in modeling the effect of quality variations in
manufacturing and in determining the optimum maintenance interval of an object.
The outline of the chapter is as follows. Section 10.2 discusses reliability from a
product life cycle perspective. Section 10.3 explains the effect of quality variations
in manufacturing. This section models the effects of assembly errors and component
nonconformance. Section 10.4 deals with the construction of month of production—
month in service (MOP-MIS) diagram. Section 10.5 discusses maintenance and the optimum maintenance interval of an object.
Design Reliability At the design stage, the desired product reliability is determined
through a tradeoff between the cost of building in reliability and the corresponding
consequence of failures. Design reliability is assessed during the design stage linking
the reliability of the components.3 Manufacturers assess this reliability prior to the
launching of the product based on limited test data.
Inherent Reliability For standard products produced in volume, the inherent relia-
bility is the reliability of the produced item that can differ from the design reliability
due to quality variations in manufacturing (such as assembly errors and/or noncon-
forming components) (Jiang and Murthy 2009).
Reliability at Sale Reliability at sale is the reliability of the product that a customer
gets at the time of purchase. After production, the product must be transported to the
market and is often stored for some time before it is sold. The reliability of a unit at sale
depends on the mechanical load (resulting from vibrations during transport), impact
load (resulting from mishandling), duration of storage, and the storage environment
(temperature, humidity, etc.) (Blischke et al. 2011). As a result, the reliability at sale
can differ from the inherent reliability (Murthy et al. 2008).
Field Reliability The field reliability is the reliability of the product in operation.
The field reliability is calculated based on the recorded performance to the customers
during the use of the product. The reliability performance of a unit in operation
depends on the length and environment of prior storage and on operational factors
such as the usage intensity (which determines the load—electrical, mechanical, ther-
mal, and chemical on the unit), usage mode (whether used continuously or intermit-
tently), and operating environment (temperature, humidity, vibration, pollution, etc.)
and, in some instances, on the human operator (Murthy et al. 2008; Blischke et al.
2011). The field reliability is also called the actual reliability or actual performance
of the product.
3 The linking of component reliabilities to product reliability is discussed in Chap. 9 and in Blischke
et al. (2011).
10.3 Effect of Quality Variations in Manufacturing

Quality variation in manufacturing is one of the main causes for the high infant
(early) failure rate of the product. As such, proper monitoring and diagnosis of the
manufacturing process play an important role toward continuous improvement of
product quality. The reliability of manufactured products can differ from the desired
design reliability due to variations in manufacturing quality (Jiang and Murthy 2009).
Failure data collected from the field provide useful information to assess whether the
change in product reliability variation may be considered to have a reasonable impact
or not. This section looks at the issues in modeling the effect of quality variations in
manufacturing on product reliability.
Let F0 (t) denote the distribution function of the lifetime variable T for the products
with design reliability. Let R0 (t), f 0 (t), and h 0 (t) denote, respectively, the reliability
function, the density function, and the hazard function associated with F0 (t). Quality
variations in manufacturing can occur for a variety of causes. Two of the main causes
of variations are (i) assembly error and (ii) component nonconformance.
Generally, a simple product consists of several components that are made separately
and then assembled together in production. There may occur errors in the individual
components or in the assembly of the components. If the errors occur during the
assembly operation of the components, the errors are known as assembly errors.
Assembly errors occur in products due to improper assembly during manufacturing. The type of assembly operation depends on the product. For an electronic
product, one of the assembly operations is soldering. If the soldering is not done
properly (called dry solder), then the connection between the components can break
within a short period, leading to a premature failure. For a mechanical component, a
premature failure can occur if the alignment is not correct or the tolerance limits are
violated (Blischke et al. 2011).
Failures resulting from assembly errors can be viewed as a new mode of failure
that is different from other failure modes that one examines during the design process
(Jiang and Murthy 2009). Let F1 (t) denote the distribution function associated with
this new failure mode, and R1 (t), f 1 (t), and h 1 (t) denote, respectively, the reliability
function, density function, and hazard function associated with F1 (t). The hazard
function h 1 (t) is a decreasing function of t, implying that failure will occur sooner
rather than later, and that the MTTF under this new failure mode is much smaller
than the design MTTF. Let q, 0 ≤ q ≤ 1, denote the probability that an item has an assembly error. If R_a(t) denotes the reliability function of the produced items, then according to Jiang and Murthy (2009) and Blischke et al. (2011), R_a(t) can be expressed as

R_a(t) = (1 - q)R_0(t) + q\,R_0(t)R_1(t), \quad t \ge 0.  (10.1)
Items that are produced with nonconforming components will not meet design speci-
fications and will tend to have an MTTF that is much smaller than the intended design
MTTF. Let F2 (t) denote the distribution function of items that have nonconforming
components, and R2 (t), f 2 (t), and h 2 (t) denote, respectively, the reliability, density,
and hazard functions associated with F2 (t). Here, h 2 (t) is an increasing function of
t, with h 2 (t) > h 0 (t) for all t. Let p, 0 ≤ p ≤ 1, denote the probability that an
item produced has nonconforming components, so that its distribution function is
given by F2 (t). Then (1 − p) is the probability that the item is conforming and has
distribution function F0 (t). As a result, the reliability of the items produced is given
by

R(t) = (1 - p)R_0(t) + p\,R_2(t), \quad t \ge 0.  (10.2)
If items are produced with both assembly errors and component nonconformance,
the reliability function of the items is given by6
4 A general k-fold competing risk model means the competing risk model derived based on k failure modes.
7 In some cases, it can be a shift, if a company operates more than one shift per day.

Fig. 10.2 Reliability, distribution, density, and hazard functions for the models of assembly errors, component nonconformance, and combined effects

10.4 Month of Production—Month in Service (MOP-MIS) Diagram
Note that I represents the total number of production periods, J is the total number
of sale periods and K is the total number of claim periods with I ≤ J ≤ K .
MOP Month of production (indexed by subscript i),
MOS Month of sale (indexed by subscript j),
MIS Month in service (duration for which the item is in use—indexed by t = k− j),
and
n it Number of items from MOP i that fail at age t.
Let the number of items under warranty (or at risk of a claim) at the beginning of age group t (MIS t) for MOP i be denoted by RC_3(i, t); it is given by
RC_3(i, t) =
\begin{cases}
\displaystyle\sum_{j=i}^{\min(J, K-t+1)} S_{ij}, & \text{if } t = 1,\\[2mm]
\displaystyle\sum_{j=i}^{\min(J, K-t+1)}\left[S_{ij} - \sum_{k=j}^{j+t-2} n_{ijk}\right], & \text{if } t > 1.
\end{cases}  (10.4)
After we calculate the number of warranty claims (WC) and the warranty claims rate (WCR), we can construct MOP-MIS plots or tables. The age-based number of warranty claims at age t, WC(t) or n_t, defined in (5.11), does not distinguish between months of production. Another way of defining this is

WC_2(i, t) = n_{it} = \sum_{j=i}^{\min(J, K-t+1)} n_{i, j, j+t-1}, \quad i = 1, 2, \ldots, I;\; t = 1, 2, \ldots, \min(K, W),  (10.5)

WCR_3(i, t) = \frac{WC_2(i, t)}{RC_3(i, t)} = \frac{n_{it}}{RC_3(i, t)}, \quad i = 1, 2, \ldots, I;\; t = 1, 2, \ldots, \min(W, K),  (10.6)
which indicates the age-based or MIS-based claim rates for each production month.
For a particular month of production, if the warranty claims rate shows a sudden
change such as a noticeable increase compared to other months of production, then
it may indicate a quality-related problem in that month of production. Definition and
the procedure of estimation of age-based WCR are discussed in detail in Sect. 5.4.
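As an illustrative sketch (ours), Eqs. (10.4)–(10.6) can be computed from two hypothetical data structures: S[(i, j)], the number of units from MOP i sold in MOS j, and n[(i, j, k)], the number of claims in month k for units produced in month i and sold in month j. All function names and the toy data are assumptions.

```python
def rc3(S, n, i, t, J, K):
    """Number of units at risk of a claim at the start of MIS t for MOP i, Eq. (10.4)."""
    total = 0
    for j in range(i, min(J, K - t + 1) + 1):
        at_risk = S.get((i, j), 0)
        if t > 1:
            at_risk -= sum(n.get((i, j, k), 0) for k in range(j, j + t - 1))
        total += at_risk
    return total

def wc2(S, n, i, t, J, K):
    """Number of claims at MIS t for MOP i, Eq. (10.5)."""
    return sum(n.get((i, j, j + t - 1), 0) for j in range(i, min(J, K - t + 1) + 1))

def wcr3(S, n, i, t, J, K):
    """MIS-based warranty claims rate for MOP i, Eq. (10.6)."""
    risk = rc3(S, n, i, t, J, K)
    return wc2(S, n, i, t, J, K) / risk if risk > 0 else float("nan")

# tiny made-up example: one production month, two sale months, claims observed over four months
S = {(1, 1): 100, (1, 2): 50}
n = {(1, 1, 1): 2, (1, 1, 2): 1, (1, 2, 2): 1, (1, 2, 3): 2}
for t in range(1, 4):
    print(t, round(wcr3(S, n, i=1, t=t, J=2, K=4), 4))
```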
Example 10.1 This example (reproduced with permission from Blischke et al. 2011)
considers warranty claims data for an automobile component. The data relate to components manufactured over a 12-month period (I = 12). The component is non-
repairable and the automobiles on which it is used are sold with a nonrenewing
two-dimensional free replacement warranty. The warranty region is a rectangle with
W = 18 months (age limit) and U = 100,000 km (usage limit). The components were
sold over 28 months (J = 28) and the data of the study were on claims that were
filed over an observation interval of 29 months (K = 29). The number of warranty
claims over the 29-month period from the start of production was 2230.
The supplementary data required for performing the analysis are the monthly
production amounts (M i ) over 12 months and monthly sales amounts (S ij ) over
28 months, i = 1, 2,…, 12, j = 1, 2,…, 28. For reasons of confidentiality, details of
monthly production and sales are not disclosed. The total number of units produced
over the 12-month period was 75,700. The total number sold was 75,666, implying
that nearly all the items produced were sold, but with a lag8 between production and
sale dates. The objective of the analysis is to investigate production-related problems
by looking at the reliability of components produced in different batches (monthly).
The monthly sales data (S ij ) and failures as a function of MIS(t) and MOS(j) for
a particular MOP (September, i = 9) are given in Table 10.1.
The number of items under warranty at the beginning of time period t, RC 3 (i, t),
the number of warranty claims, WC 2 (i, t), and the warranty claims rates WCR3 (i, t)
can be calculated using Eqs. (10.4), (10.5), and (10.6), respectively. These values are
given in Table 10.2.
The estimates of WCR3 (i, t) for the other eleven MOP can be calculated similarly.
All these results are displayed in Table 10.3.
Based on Table 10.3, the MOP-MIS plot of WCR3 (i, t) for all MOP (i = 1, 2,
…, 12) and MIS (t = 1, 2, …, 18) is shown in Fig. 10.3. This figure is useful in
determining if the failure rates are related to the month in service and/or month of
production. The figure indicates that the warranty claims rates are initially decreasing
with respect to the month of production and that there is a significant increase for
the sixth month (June). In MOP June, the high claims rates are for 10, 12, 13, 15, 16,
17, and 18 MIS.
Figure 10.4 shows the estimates of WCR for each MIS separately.9 This figure
indicates that the production period July–November, MOP(7)–MOP(11), is the best
in the sense that the claim rates are low and stable for all MIS in this period. For MIS
from 1 to 10, the claim rates are low and approximately constant in all MOPs. The
variation in claim rates in different MOP increases as MIS increases.
The MOP-MIS charts given above are useful in determining whether or not there
are problems in production. If the WCR for a MOP is unexpectedly high, this indicates
that there may be production-related problems in that MOP. Figures 10.3 and 10.4
indicate that there are some problems with the January, February, June, and possibly
December MOP and these MOPs need further investigation.
8 More on sales lag can be found in Karim and Suzuki (2004) and Karim (2008).
9 The library ggplot2 of R-language is used to create Figs. 10.1 and 10.3. These figures can also be
generated after importing the estimated WCR3 (i, t) in a Minitab worksheet and choosing graph →
scatterplot → with connect and group.
Table 10.1 Monthly sales (Si j ) and failures (n it ) indexed by MOS(j) and MIS(t) for a particular MOP(i = 9)
j Si j Failures {nit } in MIS (t) under warranty Tot
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1064 1 1 2 1 1 2 1 1 1 1 12
2 3600 4 5 1 1 3 5 6 4 5 4 3 4 3 3 5 7 5 68
3 2113 1 1 2 3 2 1 1 2 5 2 3 2 1 3 29
4 720 1 1 1 1 1 1 1 1 1 1 1 11
5 442 1 3 2 1 1 8
6 235 1 2 1 1 1 1 7
7 168 0
8 94 0
9 74 1 2 3
10 90 1 1 2
11 51 1 1 2
12 16 0
13 63 0
14 82 1 1 2 4
15 51 0
16 27 0
17 8 0
18 20 0
19 12 0
20 8 1 1
Tot 8938 6 8 3 5 6 7 9 12 8 10 10 13 8 8 9 6 9 10 147
but also on its design and operation and on the servicing and maintenance during
its operational lifetime. Thus, proper functioning over an extended lifetime requires
proper actions (e.g., changing the oil in an engine) on a regular basis, adequate repair
or replacement of failed components, proper storage when not in service, and so
forth (Blischke and Murthy 2003). These actions are parts of maintenance that are
applied to control the degradation processes, to reduce the likelihood of failure, to
restore a failed object to an operational state, etc. Maintenance is the combination
of technical, administrative, and managerial actions carried out during the life cycle
of an item and intended to retain it in, or restore it to, a state in which it can perform
the required function (SS-EN 13306:2010).
There are two primary types of maintenance actions: (i) Preventive maintenance
(PM) is comprised of actions to control the degradation processes and to reduce
the likelihood of failure of an item (component or object). This provides system-
atic inspection, detection, and correction of incipient failures before they occur. (ii)
Corrective maintenance (CM) is comprised of actions to restore a failed product or
system to a specified operational state. It is carried out to restore a failed object to
the operational state by rectification actions (repair or replace) on the failed compo-
nents necessary for the successful operation of the object. Corrective maintenance
actions are unscheduled actions, whereas preventive maintenance actions are sched-
uled actions.
Table 10.3 MOP-MIS table of WCR3 (i, t) of automobile component for all MOP
The main objectives of a good preventive maintenance program are to (i) minimize
the overall costs, (ii) minimize the downtime of the object, and (iii) have the reliability
of the object at a specified level. In order to achieve these objectives, it is important to
determine an appropriate lifetime (age, usage, etc.) interval for the preventive mainte-
nance of the object. Here, we consider the optimal age-based preventive replacement
policy. Under this policy, to determine the optimal replacement time, an objective
function is formulated that explains the associated costs and risks.
Fig. 10.4 MOP-MIS chart of WCR3(i, t) for each MIS separately

The objective function is the asymptotic expected cost per unit time,

J(T) = \frac{E[C(T)]}{E[L(T)]},  (10.7)

where C(T) and L(T) denote the cost per cycle and length of a cycle at T, respectively.
Let the time to failure, X, be a random variable with cumulative distribution function F(x).
A PM action results if X ≥ T in which case the cycle length is T with probability
R(T ), the reliability at T. A CM action results when X < T and the cycle length is X
(Sultana and Karim 2015).
Let C f and C p denote the average cost of a CM and PM replacement, respectively.
At time T, the cost per cycle C(T ) is a random variable, which takes values C p with
probability R(T) and C_f with probability F(T). As a result, E[C(T)] is given by

E[C(T)] = C_p R(T) + C_f F(T).  (10.8)
Again, at time T, the length of a cycle L(T ) is another random variable having
both discrete value T with probability R(T ) and continuous value t with probability
f (t)dt, which implies P[L(T ) ∈ (t, t + dt)] = f (t)dt for t ≤ T. As a result, E[L(T )]
is given by
E[L(T)] = T R(T) + \int_{0}^{T} t f(t)\,dt.  (10.9)
Recall that \frac{d}{dt}R(t) = -f(t); applying integration by parts, we get

\int_{0}^{T} t f(t)\,dt = -\int_{0}^{T} t\,\frac{d}{dt}R(t)\,dt = -\left[t R(t)\Big|_{0}^{T} - \int_{0}^{T} R(t)\,dt\right]
= -\left[T R(T) - \int_{0}^{T} R(t)\,dt\right] = \int_{0}^{T} R(t)\,dt - T R(T).

Hence,

E[L(T)] = \int_{0}^{T} R(t)\,dt.  (10.10)
Thus, using (10.8) and (10.10), the objective function can be expressed as

J(T) = \frac{C_p R(T) + C_f F(T)}{\int_{0}^{T} R(t)\,dt}.  (10.11)
The optimum replacement time can be found by minimizing the expected cost
per unit time J(T ) with respect to T.
Example 10.2 Optimum Replacement Time: Suppose that the lifetime random variable X (age in hours) of a component follows the Weibull distribution with shape parameter β = 2.5 and scale parameter η = 1000 h. Therefore, F(x) = 1 − exp[−(x/1000)^{2.5}], x ≥ 0, and R(x) = exp[−(x/1000)^{2.5}], x ≥ 0. Assume that the cost for a corrective maintenance is C_f = $5 and the cost for a preventive maintenance is C_p = $1. We estimate the optimum replacement age in order to minimize the objective function, J(T), which depends on preventive and corrective maintenance costs defined in (10.11).
Note that the component has an increasing failure rate with age (since the value of the shape parameter of the Weibull distribution is greater than 1) and the cost for preventive maintenance is less than the cost of corrective maintenance. This implies that the conditions for the optimum age replacement policy are satisfied for the component. From (10.11), the asymptotic expected maintenance cost per unit time (the objective function) is given by

J(T) = \frac{5\left[1 - \exp\left\{-(T/1000)^{2.5}\right\}\right] + \exp\left\{-(T/1000)^{2.5}\right\}}{\int_{0}^{T}\exp\left\{-(x/1000)^{2.5}\right\}dx}.  (10.12)

Fig. 10.5 Optimum replacement age when lifetime follows the Weibull distribution with parameters (shape = 2.5, scale = 1000) assuming Cp = $1 and Cf = $5
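As a hedged sketch (ours), the optimum replacement age can be located numerically by evaluating J(T) of (10.12) over a grid of candidate ages; the integral in the denominator is approximated here by the trapezoidal rule, and all function names are assumptions.

```python
import math

def weibull_rel(x, shape=2.5, scale=1000.0):
    """Reliability R(x) = exp(-(x/scale)**shape)."""
    return math.exp(-((x / scale) ** shape))

def expected_cost_rate(T, cp=1.0, cf=5.0, dx=0.5):
    """Objective function J(T) of Eq. (10.12): expected maintenance cost per unit time.
    The denominator integral of R(t) over [0, T] uses the trapezoidal rule."""
    n = max(int(T / dx), 1)
    xs = [i * T / n for i in range(n + 1)]
    integral = sum((weibull_rel(a) + weibull_rel(b)) / 2 * (b - a)
                   for a, b in zip(xs[:-1], xs[1:]))
    r = weibull_rel(T)
    return (cp * r + cf * (1.0 - r)) / integral

# grid search for the replacement age minimizing J(T)
candidates = [(expected_cost_rate(T), T) for T in range(50, 2001, 10)]
best_cost, best_T = min(candidates)
print(best_T, round(best_cost, 5))
```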
Abstract In survival and reliability analysis, the role of Markov chain models is
quite useful in solving problems where transitions are observed over time. It is very
common in survival analysis that a subject suffering from a disease at a time point
will recover at a later time. Similarly, in reliability, a machine may change state from
nondefective to defective over time. This chapter discusses the Markov chain model,
Markov chain model with covariate dependence, and Markov model for polytomous
outcome data.
11.1 Introduction
In survival analysis and reliability, Markov chain models are quite useful for solving
problems where transitions are observed over time.¹ Transitions may occur longitudinally,
and we observe changes in the state space. It is very common in survival analysis that a
subject suffering from a disease at one time point will recover at a later time. Similarly,
in reliability, a machine may change state from nondefective to defective over time. There
may also be transitions from the normal state of health to a disease state, or from a
defective state to a nondefective state after appropriate recovery procedures are applied.
In reliability analysis, we often have to construct block diagrams for analyzing
complex systems that become relatively convenient with the application of Markov
models. Similarly, in analyzing the repair process and availability problems, the
Markov models can provide useful insights. The underlying concepts of system state
and transition of states represent both the functioning and failed states over time
without making the representation complex.
The use of stochastic processes, particularly Markov models, has been increasing rapidly
in quality of service management in reliability, in both safety- and business-critical
applications (Rafiq 2015). Discrete time Markov models are used extensively in
reliability-related quality of service applications, while continuous time Markov
processes are generally used in analyzing the performance of quality of services that
depend on time, such as response time and throughput, along with several other issues of
concern in the field of reliability. The use of Markov models in statistical software
testing has been quite old (Whittaker and Thomason 1994).

¹ Sections of this chapter draw heavily from the co-author's (M. Ataharul Islam) previously published work.
Markov models have also been applied to predict the severity of risk for workers in industry (Okechukwu et al. 2016). For reinforced concrete structures, service life has been studied by modeling the degradation of concrete over time using Markov models (Possan and Andrade 2014).
In analyzing the reliability of components of both parallel and series systems, the
applications of Markov models can be very useful. In obtaining system performance
measures of reliability, the long-run steady-state properties of Markov models can
provide helpful insights. Similarly, the importance of Markov models in analyzing
survival data emerging from time series, panel, or longitudinal data is well docu-
mented. In epidemiological and survival studies there are extensive applications as well, including chronic diseases with well-defined phases such as cancer and autoimmune diseases, dementia due to neurodegenerative disease (for possible prediction and treatment strategies), depression status of the elderly, disease status over time, etc.
The outline of the chapter is as follows. Section 11.2 discusses the Markov chain.
Section 11.3 deals with the higher-order Markov chains. The first-order Markov chain
model with covariate dependence and the second-order Markov chain model with
covariate dependence are presented, respectively, in Sects. 11.4 and 11.5. Section 11.6
explains the Markov model for polytomous outcome data. Section 11.7 illustrates an
application of Markov model to analyze the health and retirement study (HRS) data.
11.2 Markov Chain

A stochastic process {Y(t)} is described by the joint distribution of its values at different time points, for example the joint cumulative distribution function of $Y(t_1)$ and $Y(t_2)$. The process is a Markov process if, for all t, the conditional distribution of a future state, given the present and all past states, depends only on the present state. For a discrete-time Markov chain, the one-step transition probability from state i at time k − 1 to state j at time k is

$$P_{ij}^{k-1,k} = P(Y_k = j \mid Y_{k-1} = i). \qquad (11.5)$$
The Markov property states that the future status of a Markov chain or a Markov process depends only on the current status, irrespective of past behavior. This implies that additional information on past states does not change the one-step transition probability of a first-order Markov chain.
The transition probability matrix for two time points t = 0, 1 can be defined as

$$P = \begin{pmatrix} P_{00}^{01} & P_{01}^{01} \\ P_{10}^{01} & P_{11}^{01} \end{pmatrix}. \qquad (11.6)$$

A simpler way of displaying the above one-step transition probability matrix is

$$P = \begin{pmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{pmatrix}. \qquad (11.7)$$
In this transition probability matrix, the probabilities are independent of time; such probabilities are called stationary transition probabilities. A stochastic process has stationary transition probabilities if

$$P(Y_{t+1} = j \mid Y_t = i) = P(Y_1 = j \mid Y_0 = i)$$

for all t and all states i, j. Hence, as the transition probabilities do not depend on time, we can simply write $P_{ij} = P(Y_{t+1} = j \mid Y_t = i)$. This can be generalized for k + 1 states as shown below for two consecutive time points:
$$P = \begin{pmatrix} P_{00} & \cdots & P_{0k} \\ P_{10} & \cdots & P_{1k} \\ \vdots & \ddots & \vdots \\ P_{k0} & \cdots & P_{kk} \end{pmatrix}, \qquad (11.9)$$

where $P_{ij} \ge 0$, $i, j = 0, 1, 2, \ldots$, and $\sum_{j=0}^{\infty} P_{ij} = 1$, $i = 0, 1, 2, \ldots$
Example 11.1 Suthaharan (2004) used the Markov model for solving the congestion control problem for the Transmission Control Protocol (TCP). For categorizing queue size, let th_min denote the minimum threshold used for queue management, th_max the maximum threshold used for queue management, and buffer size the total number of packets in the queue. Let us consider three states at time t, based on the average queue size:

State 0: average queue size in the interval [0, th_min],
State 1: average queue size in the interval (th_min, th_max),
State 2: average queue size in the interval [th_max, buffer size].

Let the successive observations of average queue size be denoted by $X_0, X_1, \ldots, X_i, \ldots$, where $X_i$ is a random variable. Then, we can define

$$p_j(t) = P(X_t = j), \qquad p_{jk}(1) = P(X_{t+1} = k \mid X_t = j)$$

for any t ≥ 0, where j, k = 0, 1, 2.
Then, the transition probability matrix of the Markov chain with state space {0, 1, 2} is
$$P(1) = \begin{pmatrix} P_{00} & P_{01} & P_{02} \\ P_{10} & P_{11} & P_{12} \\ P_{20} & P_{21} & P_{22} \end{pmatrix}.$$
For example,

$$P(1) = \begin{pmatrix} 0.90 & 0.05 & 0.05 \\ 0.05 & 0.90 & 0.05 \\ 0.05 & 0.05 & 0.90 \end{pmatrix}.$$
The goal is to keep the average queue size in the middle range between the
thresholds th_min and th_max. The transition probability matrix shows the one-step
transition probabilities during the interval [ti−1 , ti ], i = 0, 1, . . .
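As an illustration (not from the text), the following minimal R sketch computes multi-step transition probabilities for this matrix; repeated multiplication shows each row converging to the same limiting probabilities:

P <- matrix(c(0.90, 0.05, 0.05,
              0.05, 0.90, 0.05,
              0.05, 0.05, 0.90), nrow = 3, byrow = TRUE)
P2 <- P %*% P                          # two-step transition matrix P^(2)
Pn <- Reduce(`%*%`, rep(list(P), 50))  # P^n for large n approximates the limiting behavior
round(P2, 4)
round(Pn[1, ], 4)                      # each row converges to the stationary probabilities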
It appears clearly from a transition probability matrix that the transition probabilities are conditional probabilities in which the given values are assumed to be known. The joint probability of $Y_0 = y_0, \ldots, Y_k = y_k$ can be expressed for the first order as

$$P(Y_0 = y_0, Y_1 = y_1, \ldots, Y_k = y_k) = P(Y_0 = y_0) \times P(Y_1 = y_1 \mid Y_0 = y_0) \times \cdots \times P(Y_k = y_k \mid Y_{k-1} = y_{k-1}).$$
The initial probability of P(Y0 = y0 ) is required for the full specification of the
joint probability.
We can write a two-step transition probability of a Markov chain for k + 1 states as

$$P_{ij}^{(2)} = P(Y_{m+2} = j \mid Y_m = i). \qquad (11.10)$$

Then

$$P_{ij}^{(2)} = \sum_{s=0}^{k} P(Y_{m+2} = j, Y_{m+1} = s \mid Y_m = i) = \sum_{s=0}^{k} P(Y_{m+2} = j \mid Y_{m+1} = s, Y_m = i)\, P(Y_{m+1} = s \mid Y_m = i)$$
$$= \sum_{s=0}^{k} P(Y_{m+2} = j \mid Y_{m+1} = s)\, P(Y_{m+1} = s \mid Y_m = i) = \sum_{s=0}^{k} P_{is} P_{sj}.$$
More generally, the n-step transition probability is

$$P_{ij}^{(n)} = P(Y_{n+m} = j \mid Y_m = i), \qquad (11.11)$$

which shows the probability of making a transition from state i to state j in n steps. If we assume that the Markov chain is homogeneous with respect to time, then the above probability is invariant of time. The n-step transition probabilities then satisfy the Chapman–Kolmogorov relationship

$$P_{ij}^{(n)} = \sum_{s=0}^{k} P_{is}^{(l)} P_{sj}^{(n-l)}, \qquad P_{ij}^{(0)} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{otherwise}, \end{cases}$$

and in matrix form $P^{(n)} = P \times P \times \cdots \times P = P^n$. For the limiting behavior of a state i,

$$\lim_{n \to \infty} P_{ii}^{(n)} = \frac{1}{\mu_i},$$

where $\mu_i$ is the mean recurrence time of state i. The unconditional state probabilities are obtained from the initial distribution $p^{(0)}$ as

$$p_i^{(1)} = P(Y_1 = i) = \sum_{s=0}^{k} P(Y_1 = i \mid Y_0 = s)\, P(Y_0 = s) = \sum_{s=0}^{k} P_{si}\, p_s^{(0)},$$

or, in matrix notation, $p^{(1)} = p^{(0)} P$ and $p^{(n)} = p^{(0)} P^n$. As $n \to \infty$,

$$p^{(n)} \to \pi, \qquad (11.15)$$

the stationary (limiting) distribution.
11.3 Higher-Order Markov Chains

The first-order Markov chain depends only on the current state to determine the transition to a future state. We can extend this to second and higher orders. Let us consider three time points T = 0, 1, 2 with corresponding states $Y_0, Y_1, Y_2$, respectively. A second-order Markov chain can then be defined by

$$P(Y_k = y_k \mid Y_0 = y_0, \ldots, Y_{k-1} = y_{k-1}) = P(Y_k = y_k \mid Y_{k-1} = y_{k-1}, Y_{k-2} = y_{k-2}),$$

where the process depends only on the outcomes at times k − 1 and k − 2, and the outcomes at times 0, 1, …, k − 3 are ignored and assumed to have no contribution to the future outcomes. In the above expression, the Markov property is satisfied if we redefine the outcomes by partitioning as follows: $Y_k \mid Y_{k-1}, Y_{k-2}$, which can be viewed as $Z_k \mid Z_{k-1}$. In other words, $Z_{k-1} = (Y_{k-1}, Y_{k-2})$ and $Z_k = (Y_k)$ satisfy the Markov property, by shifting the time to k − 2 instead of defining it for k − 1. Thus, the higher-order dependence can be viewed as a first-order dependence without making the theory complex.
The second-order transition probabilities for time points t = (0, 1, 2) are displayed below:

$$\begin{array}{cc|cc}
Y_0 & Y_1 & Y_2 = 0 & Y_2 = 1 \\ \hline
0 & 0 & P_{000} & P_{001} \\
0 & 1 & P_{010} & P_{011} \\
1 & 0 & P_{100} & P_{101} \\
1 & 1 & P_{110} & P_{111}
\end{array} \qquad (11.17)$$
Similarly, for three previous time points the transition probabilities are:

$$\begin{array}{ccc|cc}
Y_0 & Y_1 & Y_2 & Y_3 = 0 & Y_3 = 1 \\ \hline
0 & 0 & 0 & P_{0000} & P_{0001} \\
0 & 0 & 1 & P_{0010} & P_{0011} \\
0 & 1 & 0 & P_{0100} & P_{0101} \\
0 & 1 & 1 & P_{0110} & P_{0111} \\
1 & 0 & 0 & P_{1000} & P_{1001} \\
1 & 0 & 1 & P_{1010} & P_{1011} \\
1 & 1 & 0 & P_{1100} & P_{1101} \\
1 & 1 & 1 & P_{1110} & P_{1111}
\end{array} \qquad (11.18)$$
The likelihood function for estimating the transition probabilities is

$$L = \prod_{i=0}^{k} \frac{n_i!}{n_{i0}!\, n_{i1}! \cdots n_{ik}!}\, P_{i0}^{n_{i0}} P_{i1}^{n_{i1}} \cdots P_{ik}^{n_{ik}}, \qquad (11.20)$$

where $n_{ij}$ denotes the number of transitions from the ith to the jth state at consecutive time points and $n_i$ denotes the total number in state i, i = 0, 1, …, k. The estimates obtained using the maximum likelihood method are

$$\hat{P}_{ij} = \frac{n_{ij}}{n_i}, \qquad i, j = 0, \ldots, k.$$

To test the null hypothesis that the transition probabilities equal specified values $P_{ij}^0$, we can use the chi-squared statistic

$$\chi^2 = \sum_{i=0}^{k} \sum_{j=0}^{k} \frac{n_i \left(\hat{P}_{ij} - P_{ij}^0\right)^2}{P_{ij}^0}. \qquad (11.21)$$
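As a minimal illustration (not from the text), the maximum likelihood estimates $\hat{P}_{ij} = n_{ij}/n_i$ can be computed in R from an observed state sequence; the data below are hypothetical:

set.seed(1)
y <- sample(0:2, 200, replace = TRUE)        # hypothetical observed chain with states 0, 1, 2
counts <- table(head(y, -1), tail(y, -1))    # n_ij: transition counts from i to j
P.hat <- prop.table(counts, margin = 1)      # divide each row by n_i
round(P.hat, 3)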
11.4 First-Order Markov Model with Covariate Dependence

The transition probabilities discussed in the previous sections can be further generalized by introducing covariate dependence. For covariate dependence, let us consider a two-state Markov model as shown below:

$$\begin{array}{c|cc}
 & Y_2 = 0 & Y_2 = 1 \\ \hline
Y_1 = 0 & \pi_{00}(x) & \pi_{01}(x) \\
Y_1 = 1 & \pi_{10}(x) & \pi_{11}(x)
\end{array} \qquad (11.22)$$
The transition probabilities are modeled as

$$P(Y_2 = 1 \mid Y_1 = 0, x) = \frac{e^{x\beta_{01}}}{1 + e^{x\beta_{01}}}, \qquad (11.23)$$

and

$$P(Y_2 = 1 \mid Y_1 = 1, x) = \frac{e^{x\beta_{11}}}{1 + e^{x\beta_{11}}}, \qquad (11.24)$$

where $\beta_{01} = (\beta_{010}, \ldots, \beta_{01p})$, $\beta_{11} = (\beta_{110}, \ldots, \beta_{11p})$, and $x = (1, x_1, \ldots, x_p)$.
It can be shown that

$$P(Y_2 = 0 \mid Y_1 = 0, x) = \frac{1}{1 + e^{x\beta_{01}}} \quad \text{and} \quad P(Y_2 = 0 \mid Y_1 = 1, x) = \frac{1}{1 + e^{x\beta_{11}}}.$$
The likelihood function for the sample of size n essentially represents the product of two separate likelihoods for the conditional probabilities based on the given values of $Y_1$. Let us denote the likelihood from state $Y_1 = 0$ by $L_0$ and the likelihood from state $Y_1 = 1$ by $L_1$. Then the log likelihood is $\ln L = \ln L_0 + \ln L_1$, where

$$\ln L_0 = \sum_{l=1}^{n_0} \left[ \delta_{01l}\, x_l \beta_{01} - \ln\!\left(1 + e^{x_l \beta_{01}}\right) \right] \qquad (11.26)$$

and

$$\ln L_1 = \sum_{l=1}^{n_1} \left[ \delta_{11l}\, x_l \beta_{11} - \ln\!\left(1 + e^{x_l \beta_{11}}\right) \right]. \qquad (11.27)$$
Differentiating with respect to the parameters and solving the following equations
we obtain the likelihood estimates for 2(p + 1) parameters:
$$\frac{\partial \ln L_0}{\partial \beta_{01q}} = \sum_{l=1}^{n_0} \left[ \delta_{01l}\, x_{lq} - \frac{x_{lq}\, e^{x_l \beta_{01}}}{1 + e^{x_l \beta_{01}}} \right] = 0, \qquad q = 0, 1, \ldots, p, \qquad (11.28)$$

and

$$\frac{\partial \ln L_1}{\partial \beta_{11q}} = \sum_{l=1}^{n_1} \left[ \delta_{11l}\, x_{lq} - \frac{x_{lq}\, e^{x_l \beta_{11}}}{1 + e^{x_l \beta_{11}}} \right] = 0, \qquad q = 0, 1, \ldots, p. \qquad (11.29)$$

The corresponding second derivatives are

$$\frac{\partial^2 \ln L_0}{\partial \beta_{01q}\, \partial \beta_{01q'}} = -\sum_{l=1}^{n_0} x_{lq} x_{lq'}\, \pi_{00}(x_l)\, \pi_{01}(x_l), \qquad (11.30)$$

$$\frac{\partial^2 \ln L_1}{\partial \beta_{11q}\, \partial \beta_{11q'}} = -\sum_{l=1}^{n_1} x_{lq} x_{lq'}\, \pi_{10}(x_l)\, \pi_{11}(x_l). \qquad (11.31)$$
To test the hypotheses H01: β011 = … = β01p = 0 and H02: β111 = … = β11p = 0, we can use likelihood ratio tests. For an individual parameter, the Wald statistic is

$$W = \frac{\hat{\beta}_{i1q}}{\widehat{se}\!\left(\hat{\beta}_{i1q}\right)}, \qquad (11.37)$$

which is asymptotically N(0, 1).
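In practice, the two conditional likelihoods can be maximized with standard logistic regression software by fitting one model for each value of the previous state. The following minimal R sketch is an illustration only; the data frame and variable names (y1, y2, x) are hypothetical:

set.seed(123)
dat <- data.frame(y1 = rbinom(500, 1, 0.4), x = rnorm(500))
dat$y2 <- rbinom(500, 1, plogis(-0.5 + 0.8 * dat$x + 1.2 * dat$y1))
fit0 <- glm(y2 ~ x, family = binomial, data = subset(dat, y1 == 0))  # estimates beta_01
fit1 <- glm(y2 ~ x, family = binomial, data = subset(dat, y1 == 1))  # estimates beta_11
summary(fit0)$coefficients   # the reported z values play the role of the Wald W-values
summary(fit1)$coefficients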
11.5 Second-Order Markov Model with Covariate Dependence

For a second-order Markov model with covariate dependence, the transition probabilities can be expressed as

$$P(Y_3 = 1 \mid Y_1 = 0, Y_2 = 0, x) = \frac{e^{x\beta_{001}}}{1 + e^{x\beta_{001}}}, \qquad
P(Y_3 = 1 \mid Y_1 = 0, Y_2 = 1, x) = \frac{e^{x\beta_{011}}}{1 + e^{x\beta_{011}}},$$

$$P(Y_3 = 1 \mid Y_1 = 1, Y_2 = 0, x) = \frac{e^{x\beta_{101}}}{1 + e^{x\beta_{101}}}, \qquad
P(Y_3 = 1 \mid Y_1 = 1, Y_2 = 1, x) = \frac{e^{x\beta_{111}}}{1 + e^{x\beta_{111}}},$$

where $\beta_{001} = (\beta_{0010}, \ldots, \beta_{001p})$, $\beta_{011} = (\beta_{0110}, \ldots, \beta_{011p})$, $\beta_{101} = (\beta_{1010}, \ldots, \beta_{101p})$, $\beta_{111} = (\beta_{1110}, \ldots, \beta_{111p})$, and $x = (1, x_1, \ldots, x_p)$.
We can now construct four likelihood functions for the given values of $Y_1 = 0, 1$ and $Y_2 = 0, 1$ as shown below:

$$L_{ij} = \prod_{k=0}^{1} \prod_{l=1}^{n_{ij}} \left[ \pi_{ijk}(x_l) \right]^{\delta_{ijkl}}, \qquad (11.39)$$
where $\delta_{ijkl} = 1$ if a transition of type i-j-k occurs for item l, and $\delta_{ijkl} = 0$ otherwise. It can be shown that $\sum_{l} \delta_{ijkl} = n_{ijk}$, $\sum_{k=0}^{1} n_{ijk} = n_{ij}$, and $\sum_{i=0}^{1}\sum_{j=0}^{1} n_{ij} = n$. Let us denote $\pi_{ijk}(x_l) = \pi_{ijkl}$; then

$$\pi_{ij1l} = \frac{e^{x_l \beta_{ij1}}}{1 + e^{x_l \beta_{ij1}}} \quad \text{and} \quad \pi_{ij0l} = \frac{1}{1 + e^{x_l \beta_{ij1}}}, \qquad i, j = 0, 1.$$

The log likelihood functions are

$$\ln L_{ij} = \sum_{l=1}^{n_{ij}} \left[ \delta_{ij1l}\, x_l \beta_{ij1} - \ln\!\left(1 + e^{x_l \beta_{ij1}}\right) \right], \qquad i, j = 0, 1. \qquad (11.40)$$
There are four different likelihoods, $L_{00}$, $L_{01}$, $L_{10}$, and $L_{11}$, for the four models based on the given values of $Y_1$ and $Y_2$. The estimating equations are

$$\frac{\partial \ln L_{ij}}{\partial \beta_{ij1q}} = \sum_{l=1}^{n_{ij}} \left[ \delta_{ij1l}\, x_{lq} - \frac{x_{lq}\, e^{x_l \beta_{ij1}}}{1 + e^{x_l \beta_{ij1}}} \right] = 0, \qquad i, j = 0, 1;\ q = 0, 1, \ldots, p, \qquad (11.41)$$

and the second derivatives are

$$\frac{\partial^2 \ln L_{ij}}{\partial \beta_{ij1q}\, \partial \beta_{ij1q'}} = -\sum_{l=1}^{n_{ij}} x_{lq} x_{lq'}\, \pi_{ij0l}\, \pi_{ij1l}, \qquad i, j = 0, 1;\ q, q' = 0, 1, \ldots, p. \qquad (11.42)$$
The information matrix $I_{ij1}$ for the model of transition type i-j-1 is obtained from the negative of the second derivatives, and the approximate variance–covariance matrix of the estimators is

$$V = \begin{bmatrix} I_{001} & 0 & 0 & 0 \\ 0 & I_{011} & 0 & 0 \\ 0 & 0 & I_{101} & 0 \\ 0 & 0 & 0 & I_{111} \end{bmatrix}^{-1}. \qquad (11.44)$$
The standard errors of the estimators for transition type i-j-1 are obtained from the corresponding diagonal elements of V. The Wald test for the null hypothesis H0: βij1q = 0 is

$$W = \frac{\hat{\beta}_{ij1q}}{\widehat{se}\!\left(\hat{\beta}_{ij1q}\right)}, \qquad (11.46)$$

which is asymptotically N(0, 1).
11.6 Markov Model for Polytomous Outcome Data

In the previous sections, two-state Markov models were shown with covariate dependence of the transition probabilities. In many instances, the number of outcomes is more than two and a further generalization is needed. In this section, a covariate-dependent multistate Markov model is discussed.
Let us consider the m-state transition probability matrix

$$\pi = \begin{bmatrix} \pi_{00} & \cdots & \pi_{0,m-1} \\ \pi_{10} & \cdots & \pi_{1,m-1} \\ \vdots & \ddots & \vdots \\ \pi_{m-1,0} & \cdots & \pi_{m-1,m-1} \end{bmatrix}, \qquad (11.47)$$

where $\pi_{us} = P(Y_j = s \mid Y_{j-1} = u)$ and $\sum_{s=0}^{m-1} \pi_{us} = 1$, $u = 0, 1, \ldots, m-1$.
Let $X_i = (1, X_{i1}, \ldots, X_{ip})$ denote the vector of covariates for the ith item, where $X_{i0} = 1$, and let $\beta_{us} = (\beta_{us0}, \beta_{us1}, \ldots, \beta_{usp})$ be the vector of parameters corresponding to the covariates for the transition from u to s. The transition probabilities are defined as follows:

$$\pi_{us}(Y_j = s \mid Y_{j-1} = u, X) = \frac{e^{g_{us}(X)}}{\sum_{k=0}^{m-1} e^{g_{uk}(X)}}, \qquad u, s = 0, 1, \ldots, m-1, \qquad (11.48)$$

where

$$g_{us}(X) = \begin{cases} 0, & \text{if } s = 0, \\[4pt] \ln \dfrac{\pi_{us}(Y_j = s \mid Y_{j-1} = u, X)}{\pi_{us}(Y_j = 0 \mid Y_{j-1} = u, X)}, & \text{if } s = 1, \ldots, m-1. \end{cases}$$
The likelihood function for transitions out of state u is

$$L_u = \prod_{i=1}^{n_u} \prod_{s=0}^{m-1} \left[ \pi_{us}(X_i) \right]^{\delta_{usi}}, \qquad u = 0, 1, \ldots, m-1, \qquad (11.49)$$

where $\delta_{usi} = 1$ if a transition of type u–s is observed for the ith item and 0 otherwise; $\sum_{i=1}^{n_u} \delta_{usi} = n_{us}$, $\sum_{s=0}^{m-1} n_{us} = n_u$, and $\sum_{u=0}^{m-1} n_u = n$. It may be noted that $\sum_{s=0}^{m-1} \delta_{usi} = 1$. The estimates of the parameters can be obtained using the overall likelihood

$$L = \prod_{u=0}^{m-1} L_u.$$
The estimating equations for transition type u to s are

$$\frac{\partial \ln L}{\partial \beta_{usk}} = \sum_{i=1}^{n} X_{ki} \left\{ \delta_{usi} - \frac{e^{\beta_{us0} + \beta_{us1} X_{1i} + \cdots + \beta_{usp} X_{pi}}}{1 + \sum_{s=1}^{m-1} e^{\beta_{us0} + \beta_{us1} X_{1i} + \cdots + \beta_{usp} X_{pi}}} \right\} = 0, \qquad (11.52)$$

or, equivalently,

$$\frac{\partial \ln L}{\partial \beta_{usk}} = \sum_{i=1}^{n} X_{ki} \left\{ \delta_{usi} - \pi_{us}(X_i) \right\} = 0. \qquad (11.53)$$

The second derivatives are

$$\frac{\partial^2 \ln L}{\partial \beta_{usk}\, \partial \beta_{usk'}} = -\sum_{i=1}^{n} X_{ki} X_{k'i}\, \pi_{us}(X_i)\left[1 - \pi_{us}(X_i)\right] = -I_{us}\!\left(\hat{\beta}_{usk}, \hat{\beta}_{usk'}\right),$$

$$\frac{\partial^2 \ln L}{\partial \beta_{usk}\, \partial \beta_{us'k'}} = \sum_{i=1}^{n} X_{ki} X_{k'i}\, \pi_{us}(X_i)\, \pi_{us'}(X_i) = -I_{us,us'}\!\left(\hat{\beta}_{usk}, \hat{\beta}_{us'k'}\right), \qquad (11.54)$$

where $u = 0, 1, \ldots, m-1$; $s = 1, 2, \ldots, m-1$; $k, k' = 0, 1, \ldots, p$.
The variance–covariance matrix of the estimators is

$$\mathrm{Var}\!\left(\hat{\beta}\right) = \left[ I\!\left(\hat{\beta}\right) \right]^{-1}, \qquad (11.55)$$

where

$$\hat{\beta}_{us} = \begin{pmatrix} \hat{\beta}_{us0} \\ \vdots \\ \hat{\beta}_{usp} \end{pmatrix}, \qquad \hat{\beta}_u = \begin{pmatrix} \hat{\beta}_{u1} \\ \vdots \\ \hat{\beta}_{u,m-1} \end{pmatrix}, \qquad \hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \vdots \\ \hat{\beta}_{m-1} \end{pmatrix},$$

and $I(\hat{\beta})$ is an $m(m-1)(p+1) \times m(m-1)(p+1)$ matrix.
To test the significance of the kth parameter for transition type u to s, the null
hypothesis is H0 : βusk = 0 and the corresponding Wald test is
$$W = \frac{\hat{\beta}_{usk}}{\widehat{se}\!\left(\hat{\beta}_{usk}\right)}, \qquad (11.57)$$
where W is asymptotically N(0, 1). Further details on basic theories, methods, and
applications of stochastic models can be found in Bailey (1964), Cox and Miller
(1965), Bhat (1971), Hoel et al. (1972), Taylor and Karlin (1984), Lawler (2006),
Ross (2007), and Islam et al. (2009, 2012).
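A covariate-dependent multistate Markov model of this form can be fitted by running one multinomial logistic regression for each previous state. The following minimal R sketch is an illustration only (not the authors' code; the data frame and variable names are hypothetical), using the nnet package:

library(nnet)
set.seed(7)
dat <- data.frame(prev = sample(0:2, 1000, replace = TRUE),
                  x1 = rnorm(1000), x2 = rbinom(1000, 1, 0.5))
dat$curr <- sample(0:2, 1000, replace = TRUE)        # hypothetical current state
fits <- lapply(0:2, function(u) {
  multinom(factor(curr) ~ x1 + x2, data = subset(dat, prev == u), trace = FALSE)
})
lapply(fits, coef)   # one (m - 1) x (p + 1) coefficient matrix per previous state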
11.7 Application
The Health and Retirement Study (HRS) data are applied to illustrate a Markov model
with three states (Islam et al. 2012). The Health and Retirement Study is conducted by
the University of Michigan nationwide on individuals over age 50 and their spouses.
The panel data used in this application have been collected every two years since 1992.
We have considered data from the first six rounds in this application. The outcome
variable is self-reported emotional health status (perceived emotional health) among
the elderly people in the USA. The perceived health statuses considered in this
application are: State 1: Poor, State 2: Fair/Good, and State 3: Very Good/Excellent.
The number of respondents in 1992 was 9772 elderly people and respondents in
subsequent follow-ups are 8039 in 1994, 7823 in 1996, 7319 in 1998, 6824 in 2000,
and 6564 in 2002. The selected explanatory variables of the models are: gender (male
= 1, female = 0), marital status (unmarried = 0, married = 1), vigorous physical
activity (3 or more days per week) (yes = 1, no = 0), ever drank any alcohol (yes =
1, no = 0), ever smoked (yes = 1, no = 0), felt depressed during the past week (yes
= 1, no = 0), felt lonely during the past week (yes = 1, no = 0), race (white = 1,
else 0; black = 1, else 0; others = reference category), and age (less than or equal to
60 years = 0 and more than 60 years = 1).
The transition counts and transition probabilities are displayed in Table 11.1. All
the transitions made by the respondents during 1992–2002 are considered and a first-
order model is assumed. It appears from Table 11.1 that about 57% remained in the
poor state of perceived emotional health status starting from poor but a considerable
percentage (40%) made transition from poor to good/fair and a small percentage
(3%) made the transition from poor to very good/excellent status. From the fair/good
perceived emotional health status, 7% reported a move to worse (poor health status),
72% reported the same status, and 21% reported improved status in subsequent
follow-up (very good/excellent). It is also important to note that nearly 1% reported
transition from very good/excellent to poor perceived emotional health status, while
25% reported a transition to good/fair, and the remaining 74% stayed in the same status of perceived emotional health (very good/excellent).
The estimates of parameters, standard errors, W-values, p-values, and confidence
intervals for the models are shown in Table 11.2. The number of models is m(m −
1) = 3 × 2 = 6. Each model contains 11 parameters. The first-order Markov
models are fitted in this application for the following transition types: (i) poor →
fair/good, (ii) poor → very good/excellent, (iii) fair/good → poor, (iv) fair/good →
very good/excellent, (v) very good/excellent → poor, and (vi) very good/excellent
→ fair/good.
In the case of the model for the transition type Poor → Fair/Good, we observe: (i) positive association with physical activity and drinking alcohol and (ii) negative association with feeling depressed. The transition type Poor → Very Good/Excellent appears to have a positive association with physical activity and with elderly blacks. The transition type Fair/Good → Poor shows negative association with marital status, physical activity, drinking alcohol, and whites and blacks as compared to Asians or other races, and positive association with smoking, feeling depressed, and feeling lonely. On the other hand, improvement to very good/excellent is associated with marital status (p < 0.10), physical activity, and drinking alcohol, and negatively associated with age, smoking, feeling depressed, and feeling lonely. The transition from very good/excellent status of perceived emotional health to poor status shows positive association with smoking and feeling depressed and negative association with marital status, physical activity, and drinking alcohol. It is also seen that the transition type Very Good/Excellent to Good/Fair is positively associated with gender, smoking, feeling depressed, feeling lonely, and blacks compared to Asians and other groups, but negatively associated with marital status, physical activity, and drinking alcohol.
Table 11.2 Estimates of three-state Markov model for perceived emotional health
Variables   Coeff.   Std. err.   W-value   p-value   95% C.I. LL   95% C.I. UL
Transition type poor → fair/good
Constant −0.520 0.189 −2.76 0.006 −0.890 −0.151
Gender −0.141 0.092 −1.53 0.127 −0.322 0.040
Marital status 0.157 0.090 1.74 0.082 −0.020 0.333
Physical activity 0.326 0.115 2.84 0.005 0.101 0.551
Drink 0.288 0.095 3.03 0.002 0.102 0.474
Smoke 0.000 0.094 0.000 0.999 −0.184 0.184
Felt depression −0.310 0.098 −3.18 0.001 −0.502 −0.119
Felt lonely −0.080 0.103 −0.77 0.439 −0.282 0.122
White 0.264 0.171 1.54 0.123 −0.071 0.599
Black 0.221 0.182 1.21 0.225 −0.136 0.578
Age −0.031 0.087 −0.35 0.724 −0.202 0.140
Transition type poor → very good/excellent
Constant −4.674 1.054 −4.44 0.000 −6.739 −2.609
Gender 0.212 0.267 0.80 0.426 −0.310 0.735
Marital status 0.264 0.269 0.98 0.327 −0.264 0.793
Physical activity 1.076 0.272 3.96 0.000 0.544 1.609
Drink 0.307 0.274 1.12 0.263 −0.231 0.845
Smoke 0.143 0.288 0.50 0.618 −0.420 0.707
Felt depression −0.380 0.280 −1.36 0.175 −0.928 0.168
Felt lonely −0.145 0.300 −0.48 0.628 −0.733 0.443
White 1.666 1.023 1.63 0.103 −0.338 3.670
Black 1.992 1.031 1.93 0.053 −0.030 4.014
Age 0.266 0.245 1.08 0.278 −0.215 0.747
Transition type fair/good → poor
Constant −1.602 0.147 −10.87 0.000 −1.891 −1.313
Gender 0.006 0.070 0.08 0.933 −0.130 0.142
Marital status −0.155 0.070 −2.22 0.027 −0.292 −0.018
Physical activity −0.293 0.075 −3.92 0.000 −0.440 −0.147
Drink −0.375 0.067 −5.60 0.000 −0.507 −0.244
Smoke 0.308 0.071 4.33 0.000 0.169 0.448
Felt depression 0.586 0.081 7.26 0.000 0.427 0.744
Felt lonely 0.262 0.086 3.02 0.002 0.092 0.431
(continued)
References
Bailey NTJ (1964) The elements of stochastic processes: with applications to the natural sciences.
Wiley, New York
Bhat UN (1971) Elements of applied stochastic processes. Wiley, New York
Cox DR, Miller HD (1965) The theory of stochastic processes. Methuen & Co Ltd, London
Hoel PG, Port SC, Stone CJ (1972) Introduction to stochastic processes. Houghton-Mifflin, Boston
Islam MA, Chowdhury RI, Huda S (2009) Markov models with covariate dependence for repeated
measures. Nova Science, New York
Islam MA, Chowdhury RI, Singh KP (2012) A Markov model for analyzing polytomous outcome
data. Pak J Stat Oper Res 8:593–603
Lawler GF (2006) Introduction to stochastic processes, 2nd edn. Chapman and Hall/CRC, Boca Raton
Okechukwu OM, Nwaoha TC, Garrick O (2016) Application of Markov theoretical model in pre-
dicting risk severity and exposure levels of workers in the oil and gas sector. Int J Mech Eng Appl
4:103–108
Possan E, Andrade JJO (2014) Markov chains and reliability analysis for reinforced concrete structure service life. Mater Res 17:593–602
Rafiq Y (2015) Online Markov chain learning for quality of service engineering in adaptive computer
systems. Ph.D. dissertation, Computer Science, University of York
Ross SM (2007) Introduction to probability models, 9th edn. Academic Press, New York
Suthaharan S (2004) Markov model based congestion control for TCP. In: 37th annual simulation
symposium, Hyatt Regency Crytal City, Arlington, VA, 18–22 Apr 2004
Taylor HM, Karlin S (1984) An introduction to stochastic modeling. Academic Press, Orlando
Whittaker JW, Thomason MG (1994) A Markov chain model for statistical software testing. IEEE
Trans Softw Eng 20(10):812–824
Chapter 12
Analysis of Big Data Using GLM
Abstract The application of the generalized linear models to big data is discussed
in this chapter using the divide and recombine (D&R) framework. In this chapter, the
exponential family of distributions for binary, count, normal, and multinomial out-
come variables and the corresponding sufficient statistics for parameters are shown
to have great potential in analyzing big data where traditional statistical methods
cannot be used for the entire data set.
12.1 Introduction
During the past decade, we have observed a rapidly growing demand for tools for analyzing big
data in every field, including reliability and survival analysis. According to Gartner
IT Glossary, big data is defined as high-volume, high-velocity and high-variety infor-
mation assets that demand cost-effective, innovative forms of information process-
ing for enhanced insight and decision making.1 The definition of big data in terms
of three Vs includes: volume: data arising from transactions, social media, medi-
cal problems, Internet, usage of Facebook, Twitter, Google and YouTube, genome
sequences, CCTV, mobile financing, sales in supermarkets, online business, etc.;
velocity: in recent times, we observed an unprecedented speed in generating data
that need to be dealt with timely, or more specifically, very quickly; variety: data are
generated from various structured or unstructured formats such as text, email, audio,
video, financial transactions, etc. There are two additional dimensions being consid-
ered by many: variability and complexity. It may be noted that big data analysis is useful for cost reduction, time reduction, new product development, the development of new strategies, and optimum decision making.
Although big data is not a new phenomenon, the volume, velocity, and variety
of big data arising from increasingly developed new frontiers of generating data in
various fields including industry, medical, and biological sciences, human behavior,
socio-economic patterns, etc., pose difficult challenges to statisticians and computer
scientists. The size of data is so big, in terms of number of cases or number of variables
or both, that we need to address new challenges by developing new techniques.
1 https://www.gartner.com/it-glossary/big-data.
The development of new techniques can be categorized into two broad classes: (i)
statistical learning and (ii) machine learning. The important techniques include the
following: (i) regression models and subset selection, (ii) classification and tree-
based methods, (iii) resampling and shrinkage methods, (iv) dimension reduction,
and (v) support vector machines and unsupervised learning.
Major steps in big data analysis are: (i) acquisition of data: needs to be filtered
using data reduction techniques taking into account heterogeneous scales of mea-
surement and dependence in measurements, (ii) extraction/cleaning of structured
and unstructured data, (iii) integration/aggregation/representation of data by inte-
grating data from heterogeneous sources using suitable database designs, (iv) anal-
ysis/modeling of data stemming from heterogeneous sources, which requires developing appropriate or suitable statistical techniques for big data, (v) dealing with the fact that big data are noisy, dynamic, heterogeneous, interrelated, and untrustworthy, which makes the work more challenging for prediction purposes, (vi) developing appropriate visualization tech-
niques, and (vii) interpretation of big data for addressing specific targets.
Donoho (2015, 2017) suggested six divisions of data science which are: (i) data
gathering, preparation, and exploration, (ii) data representation and transformation,
(iii) computing with data, (iv) data modeling, (v) data visualization and presentation,
and (vi) science about data science. It is noteworthy that the fast emergence of the
utility of big data from both private and public sectors in various fields of applications
including reliability and survival analysis has created a new research paradigm (Einav
and Levin 2014). This new paradigm involves statistical analysis with big data when
analysis on the basis of a single combined data set is not feasible using traditional
techniques due to storage limitation of data in a single computer. Another limitation
of extensive use of big data arises from the privacy concerns of unit records. Lee
et al. (2017) indicate that the appropriate use of sufficiency and summary statistics
can provide a very strong statistical base to overcome the concerns about the violation
of privacy associated with the use of unit records with identification. In addition, the use of sufficiency provides the basis for dividing the big data into smaller subsets, where the sufficiency principle can be applied to recombine the essential statistics and obtain the estimates needed for the full data set.
In recent years, statisticians have been facing new challenges, with great implications for
methodology, theory, and computation, arising from big data. The sources of these
data are not from traditional sample survey as a single combined data set but from
data generated through overwhelming use of information technology such as Face-
book, Twitter, and various other websites, in addition to data generated in very large
sizes in sectors such as tele-communication, banking, medical statistics, genetics,
environment, business, engineering, prediction, and forecasting based on big data,
etc. These challenges have manifold implications due to: (i) the nature of data emerg-
ing from wide variety of sources mostly not from the domains defined in statistical
terms, (ii) the variety, velocity, volume, and veracity are the characteristics of big
data, hence size and complexity pose difficult challenges to statisticians, and (iii) the
need to process the data and make decisions as fast as possible.
In a special issue of Statistics and Probability Letters on the role of statistics in the era of big data (Cox et al. 2018), the divergence stemming from the major issues of concern and debate is clearly visible. On one hand, there is a strong argument for the centrality of statistical modeling supported by theoretical knowledge of the phenomenon under study; on the other hand, there is a strongly opposed view held by some others (mostly computer scientists) that, with the advent of big data, models and theory are useless (Sangalli 2018). These are extreme views from the viewpoint of users. The central issue of statistics is always data, and this is challenged to a large extent by the size and complexity of data in the new paradigm; at the same time, the dimension and complexity pose formidable difficulties for the use of traditional statistical techniques, so the increasing role of computer scientists and mathematicians presents an emerging challenge. This is among the most difficult challenges statisticians have faced since the inception of the discipline, and it leads to a turning point that will either bring about new frontiers of theoretical and computational development or make statistical applications narrower in scope, limited to small or moderate sample sizes, leaving big data modeling and applications in the hands of computer scientists or, more specifically, machine learning.
big data analytics can be found in Chen and Xie (2014), Buhlmann et al. (2016),
Zomaya and Sakr (2017), Dunson (2018), Härdle et al. (2018), and Reid (2018).
In this chapter, the divide and recombine (D&R) technique proposed in recent
past (Guha et al. 2012) is studied with special focus on statistical issues of concern
such as sufficiency and dimension reduction, modeling, and estimation procedure.
The outline of the chapter is as follows. Section 12.2 presents a short note on
sufficiency and dimensionality. Section 12.3 discusses the generalized linear models.
Section 12.4 explains the divide and recombine technique for different link functions.
Finally, Sect. 12.5 presents some comments on the chapter.
12.2 Sufficiency and Dimensionality

The notion of sufficient statistics was first introduced by Fisher (1920, 1922, 1925)
and studied extensively by many others (Pitman 1936, Koopman 1936, Halmos and
Savage 1949, Lehmann 1959, Bahadur 1954). Fraser (1961, 1963) showed that the
likelihood function can be used to analyze the effect of sampling on the dimension-
ality of the sufficient statistics. It was shown that fixed dimension for the sufficient
statistic is restricted to the exponential family.
Let us consider n observations $(x_1, \ldots, x_n)$; then the reduced statistic can be obtained from the log likelihood, where

$$l(\theta \mid x_1, \ldots, x_n) = \sum_{i=1}^{n} l(\theta \mid x_i). \qquad (12.1)$$

For a sufficient statistic of fixed dimension r, the log likelihood of a single observation can be expressed as

$$l(\theta \mid x) = \phi_0(\theta) + \sum_{j=1}^{r} a_j(x)\, \phi_j(\theta), \qquad (12.2)$$

where $(\phi_1(\theta), \ldots, \phi_r(\theta))$ are assumed to be the basis generating the space of r minimum dimensions. Then the probability density takes the form

$$f(x \mid \theta) = f(x \mid \theta_0)\, e^{\phi_0(\theta) + \sum_{j=1}^{r} a_j(x)\, \phi_j(\theta)}, \qquad (12.3)$$

where the fixed-dimensional sufficient statistics (of dimension r) are indexed by the variables $(u_1, \ldots, u_r)$.
12.3 Generalized Linear Models

In a univariate generalized linear model for the outcome variable Y, the exponential form is

$$f(y; \theta, \phi) = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)}, \qquad (12.5)$$

where θ is the canonical parameter, φ is the dispersion parameter, and the linear predictor is g(μ) = Xβ. Using the Koopman–Darmois–Pitman form, the data reduction can be shown for a sample of size 1, where $r_0(y, \phi) = e^{c(y,\phi)}$ and the sufficient statistic is y. For a sample of size n, the likelihood function of this exponential form is

$$L(\theta, \phi, y) = r_0(y, \phi)\, e^{\sum_{i=1}^{n} a(y_i)\, c(\theta_i)}. \qquad (12.7)$$

The maximum likelihood estimates of the parameters can be found by solving the system of equations

$$\frac{1}{a(\phi)} \sum_{i=1}^{n} \left[ y_i - \mu_i \right] x_{ij} = 0, \qquad j = 0, 1, \ldots, p. \qquad (12.11)$$
More on generalized linear models is given in Chap. 8. Fahrmeir and Tutz (2001) explain multivariate statistical modelling based on generalized linear models.
12.4 Divide and Recombine

The generalized linear models and the estimation procedures for their parameters are discussed in Sects. 12.2 and 12.3. These estimation procedures are quite attractive if the sample size is small, moderate, or large in the usual statistical sense.
However, in the case of big data, it would not be possible to use the same procedure
due to the amount of data being captured and stored. We need to review the theory,
methodology, and computation techniques for big data analysis. Let us consider
a situation where we have to consider 1,000,000 observations with 100 variables
including both outcome and explanatory variables for each item resulting in a total
of 1,000,000 × 100 observations. In reality, the data would be much more complex
and bigger. This is not a typical statistical challenge simply due to the size of data
and hence we need to find a valid way to use all the data without sacrificing statistical
rigor. In this case, one logical solution is to divide and recombine data (see Guha
et al. 2012, Cleveland and Hafen 2014, Hafen 2016, Liu and Li 2018). The idea is
simple: we have to divide the big data into subsets, each analytic method is applied
to subsets and the outputs are recombined in a statistically valid manner. In the
process of dividing and recombining, the big data set is partitioned into manageable
subsets of smaller data and analytic methods such as fitting of models are performed
independently for each subset. One way to recombine is to use the average of the
estimated model coefficients obtained from each subset (Guha et al. 2012). The
resulting estimates may not be exact, due to the choice of the recombining procedure, but they are statistically valid. The advantage is obvious: we can make use of statistical techniques, using R or other available statistical packages, without any constraint arising from the size of the big data.
Lee et al. (2017) summarized D&R steps as follows: (i) the subsets are obtained
by dividing the original big data into manageable smaller groups; (ii) the estimates
or sufficient statistics are obtained for the subsets; and (iii) the results from subsets
are combined by using some kind of averaging to obtain the estimate for the whole
data set. According to Hafen (2016), the division into subsets can be performed by
either replicate division or conditioning variable division. Replicate division takes
into account random sampling without replacement and the conditioning variable
division considers stratification of the data based on one or more variables included
in the data. A feasible measure of a good fit is the least discrepancy with the estimate
obtained from the entire data set. Other than a few exceptions, D&R results are
approximate (Lee et al. 2017).
The partitioning of big data may be performed using either the random sampling
without replacement or applying some conditioning variables. It may be noted that
replicate division divides the data based on random sampling without replacement
while conditioning variable division stratifies the data according to one or more
variables in the data (Lee et al. 2017). Guha et al. (2012) defined the replicate division
for the data with n observations and p variables under the same experiment and
conditions. The random-replicate division uses random sampling of observations
without replacement to create subsets. This is attractive and computationally fast but
it makes no effort to create subsets each of which is representative of the data set.
Lee et al. (2017) referred to two procedures for recombining the results: (i) summary statistics D&R and (ii) horizontal D&R. They illustrated the procedures with the regression model, i.e., the GLM with identity link function,

$$\theta = g(\mu) = \mu = E(Y) = X\beta,$$

or alternatively

$$Y = X\beta + \varepsilon. \qquad (12.12)$$
For big data, let us consider partitioning into S subsets. Then the estimates can be
obtained by (i) summary statistics D&R and (ii) horizontal D&R.
Case 1 (Summary Statistics) The summary statistics D&R, shown in Fig. 12.1, includes the following steps (Lee et al. 2017):

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: Compute $X_s'X_s$ and $X_s'Y_s$, s = 1, 2, …, S.
Step III: Recombine using $\left( \sum_{s=1}^{S} X_s'X_s \right)^{-1} \sum_{s=1}^{S} X_s'Y_s$. Chen et al. (2006) referred to this as the regression cube technique. This result provides the same estimate as the one obtained from the entire data set, since

$$X'X = \sum_{s=1}^{S} X_s'X_s \quad \text{and} \quad X'Y = \sum_{s=1}^{S} X_s'Y_s. \qquad (12.14)$$

Fig. 12.1 Flowchart displaying D&R method for linear regression (summary statistics)

Fig. 12.2 Flowchart displaying D&R method for linear regression (horizontal)
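As a minimal illustration of the summary statistics D&R idea (not code from the text; the simulated data are hypothetical), the subset cross-product matrices can be accumulated and recombined in R as follows:

set.seed(1)
n <- 10000; p <- 3; S <- 10
X <- cbind(1, matrix(rnorm(n * p), n, p))
Y <- X %*% c(2, 1, -1, 0.5) + rnorm(n)
idx <- split(seq_len(n), rep(1:S, length.out = n))         # replicate division into S subsets
XtX <- Reduce(`+`, lapply(idx, function(i) crossprod(X[i, ])))        # sum of X_s' X_s
XtY <- Reduce(`+`, lapply(idx, function(i) crossprod(X[i, ], Y[i])))  # sum of X_s' Y_s
beta.DR <- solve(XtX, XtY)                                 # recombined estimate
cbind(beta.DR, coef(lm(Y ~ X - 1)))                        # identical to the full-data fit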
Case 2 (Horizontal) The horizontal D&R, shown in Fig. 12.2, includes the following steps (Lee et al. 2017):

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: Compute $X_s'X_s$ and $X_s'Y_s$, s = 1, 2, …, S, and compute

$$\hat{\beta}_s = \left( X_s'X_s \right)^{-1} X_s'Y_s. \qquad (12.15)$$
Xi et al. (2008) showed a way to divide and recombine the big data for analyzing and
predicting categorical attributes in a data cube format. For binary outcome variable
Y and explanatory variable vector X, the logit link function is
$$\ln \frac{\mu}{1 - \mu} = x\beta, \qquad (12.17)$$

and the estimating equations are

$$\frac{\partial l}{\partial \beta_j} = \sum_{i=1}^{n} \left[ y_i - \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} \right] x_{ij} = 0, \qquad j = 1, \ldots, p,$$
where l is the log likelihood for the entire data. The estimate of β based on all the data in the big data set is denoted by $\hat{\beta}$. If we divide the data set into S strata, each of size $n_s$, s = 1, 2, …, S, then $n = \sum_{s=1}^{S} n_s$. For the partitioned data sets s = 1, 2, …, S, we can define

$$\ln \frac{\mu_s}{1 - \mu_s} = x_s \beta_s, \qquad s = 1, 2, \ldots, S. \qquad (12.18)$$
The independent estimating equations for each partitioned set of data are

$$\frac{\partial l_s}{\partial \beta_{sj}} = \sum_{i=1}^{n_s} \left[ y_{si} - \frac{e^{x_{si} \beta_s}}{1 + e^{x_{si} \beta_s}} \right] x_{sij} = 0, \qquad j = 1, \ldots, p;\ s = 1, 2, \ldots, S,$$
where $l_s$ denotes the log likelihood for subset s, s = 1, 2, …, S. As the data are drawn independently, it may be shown that

$$l = \sum_{s=1}^{S} l_s,$$

so that

$$\frac{\partial l}{\partial \beta} = \sum_{s=1}^{S} \frac{\partial l_s}{\partial \beta} = 0$$

provides the maximum likelihood estimators for the overall as well as for the partitioned sets of data, as the whole data set is assumed to be drawn from the same population. This implies that the maximum likelihood estimates obtained from

$$\frac{\partial l_s}{\partial \beta} = 0$$

are equivalent to those obtained from

$$\frac{\partial l_s}{\partial \beta_s} = 0$$

under the assumption of independent and identically distributed big data, because $\beta_1 = \beta_2 = \cdots = \beta_S = \beta$. The estimating equations under this assumption are

$$\frac{\partial l_s}{\partial \beta_{sj}} = \sum_{i=1}^{n_s} \left[ y_{si} - \frac{e^{x_{si}\beta_s}}{1 + e^{x_{si}\beta_s}} \right] x_{sij} = \sum_{i=1}^{n_s} \left[ y_{si} - \frac{e^{x_{si}\beta}}{1 + e^{x_{si}\beta}} \right] x_{sij} = 0, \qquad j = 1, \ldots, p;\ s = 1, 2, \ldots, S.$$

Since $l = \sum_{s=1}^{S} l_s$,

$$\sum_{s=1}^{S} \frac{\partial l_s}{\partial \beta_{sj}} = \sum_{s=1}^{S} \sum_{i=1}^{n_s} \left[ y_{si} - \frac{e^{x_{si}\beta}}{1 + e^{x_{si}\beta}} \right] x_{sij} = 0. \qquad (12.19)$$
Approach 12.1
Zuo and Li (2018) used the logistic regression model

$$Y_i = \mu_i + \varepsilon_i, \qquad (12.20)$$

where

$$\mu_i = \pi_i = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}},$$

β is a (p + 1) × 1 vector of coefficients, X is the n × (p + 1) data matrix with $x_i$ the ith row of X, and $\varepsilon_i$ is an error variable with mean zero and variance $\pi_i(1 - \pi_i)$. The maximum likelihood estimator of β is

$$\hat{\beta} = C^{-1} X' \hat{W} Z, \qquad (12.21)$$

where $C = X' \hat{W} X$, $\hat{W} = \mathrm{diag}\{\hat{\pi}_i (1 - \hat{\pi}_i)\}$, and Z is a column vector whose ith element is

$$\ln \frac{\hat{\pi}_i}{1 - \hat{\pi}_i} + \frac{y_i - \hat{\pi}_i}{\hat{\pi}_i (1 - \hat{\pi}_i)}.$$

Here, $\hat{\beta}$ is an asymptotically unbiased estimator of β. For the sth partitioned subset, the estimator of the logistic regression parameters is obtained in the same way from $X_s$, $\hat{W}_s$, and $Z_s$.

Fig. 12.3 Flowchart displaying D&R method for logistic regression (Approach 12.1)
Approach 12.2
Alternatively, it is also possible to divide and recombine using the following steps:

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: For the sth partitioned subset (s = 1, 2, …, S), compute the subset estimate $\hat{\beta}_s$ from the subset estimating equations.
Fig. 12.4 Flowchart displaying D&R method for logistic regression (Approach 12.2)

Approach 12.3
Xi et al. (2008) proposed a method based on a first-order Taylor approximation of the first derivative of the subset log likelihood with respect to β, expanded at $\hat{\beta}_s$, as shown below:

$$l_s^* = -\sum_{i=1}^{n_s} \hat{\mu}_{si}\!\left(\hat{\beta}_s\right) x_{si}' x_{si} \left(\beta - \hat{\beta}_s\right),$$

where $\hat{\mu}_{si}(\hat{\beta}_s) = \dfrac{e^{x_{si}\hat{\beta}_s}}{1 + e^{x_{si}\hat{\beta}_s}}$. The above expression can be rewritten as $l_s^* = -A_s\left(\beta - \hat{\beta}_s\right)$, where

$$A_s = \sum_{i=1}^{n_s} \hat{\mu}_{si}\!\left(\hat{\beta}_s\right) x_{si}' x_{si}.$$
The recombined estimates are obtained from $l^* = \sum_{s=1}^{S} l_s^* = 0$. Let $A = \sum_{s=1}^{S} A_s$. Then the estimators are

$$\hat{\beta} = \left( \sum_{s=1}^{S} A_s \right)^{-1} \sum_{s=1}^{S} A_s \hat{\beta}_s. \qquad (12.26)$$
The steps for obtaining estimates using the divide and recombine technique are described below.

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S), respectively.
Step II: For the sth partitioned subset (s = 1, 2, …, S), compute

$$A_s = \sum_{i=1}^{n_s} \hat{\mu}_{si}\!\left(\hat{\beta}_s\right) x_{si}' x_{si} \quad \text{and} \quad \hat{\beta}_s,$$

where the parameters for subset s, $\hat{\beta}_s$, are obtained from the estimating equations

$$\frac{\partial l_s}{\partial \beta} = \sum_{i=1}^{n_s} \left[ y_{si} - \mu_{si}(\beta) \right] x_{si} = 0.$$

Step III: Recombine using (12.26).

Fig. 12.5 Flowchart displaying D&R method for logistic regression (Approach 12.3)
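A minimal R sketch of this type of recombination is given below (an illustration only, not the authors' implementation; the simulated data are hypothetical). Each subset is fitted with glm(), and the subset estimates are recombined with the weight matrices A_s as in (12.26):

set.seed(2)
n <- 20000; S <- 20
x <- rnorm(n); y <- rbinom(n, 1, plogis(-1 + 0.7 * x))
subsets <- split(data.frame(y, x), rep(1:S, length.out = n))
A.list <- list(); b.list <- list()
for (s in seq_along(subsets)) {
  d <- subsets[[s]]
  fit <- glm(y ~ x, family = binomial, data = d)
  Xs <- model.matrix(fit)
  mu <- fitted(fit)
  A.list[[s]] <- crossprod(Xs * mu, Xs)      # A_s = sum_i mu_hat_si * x_si' x_si
  b.list[[s]] <- A.list[[s]] %*% coef(fit)   # A_s %*% beta_hat_s
}
A <- Reduce(`+`, A.list)
beta.DR <- solve(A, Reduce(`+`, b.list))     # recombined estimate, Eq. (12.26)
cbind(beta.DR, coef(glm(y ~ x, family = binomial)))  # close to the full-data fit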
In previous Sects. (12.4.1 and 12.4.2), we have shown D&R applications for identity
and logit link functions. In this section, the method is illustrated for count data. Let
us consider count variables Y = (Y1 , . . . , Yn ) with observations y = (y1 , . . . , yn )
where n is very large. Let us define the data matrix comprising the observed values of the explanatory variables $X = (x_1, \ldots, x_n)'$, where $x_i = (1, x_{i1}, \ldots, x_{ip})$, and the vector of regression coefficients $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$. The Poisson distribution for count data is

$$f(y, \lambda) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \qquad y = 0, 1, \ldots \qquad (12.27)$$

With the log link $\ln \lambda_i = x_i \beta$, the maximum likelihood estimates of the parameters are obtained by solving

$$\frac{1}{a(\phi)} \sum_{i=1}^{n} \left[ y_i - e^{x_i \beta} \right] x_{ij} = 0, \qquad j = 0, 1, \ldots, p, \qquad (12.29)$$

where $x_{i0} = 1$.
The estimates can be obtained iteratively from

$$\left( X' W X \right) \hat{\beta}^{(m)} = X' W z^{(m-1)},$$

where

$$z_i = \sum_{j=0}^{p} x_{ij}\, \hat{\beta}_j^{(m-1)} + (y_i - \mu_i) \frac{\partial \eta_i}{\partial \mu_i},$$

$\mu_i$ and $\partial \eta_i / \partial \mu_i$ are evaluated at $\hat{\beta}^{(m-1)}$, and W is a diagonal matrix with ith element

$$w_{ii} = \frac{1}{\mathrm{Var}(Y_i)} \left( \frac{\partial \mu_i}{\partial \eta_i} \right)^2.$$
The estimators for the S subsets are obtained by performing the iterative process until convergence is attained, for s = 1, 2, …, S:

$$\left( X_s' W_s X_s \right)^{(m-1)} \hat{\beta}_s^{(m)} = X_s' W_s z_s^{(m-1)}, \qquad s = 1, 2, \ldots, S, \qquad (12.31)$$

evaluated at $\hat{\beta}_s^{(m-1)}$, where

$$z_{si} = \sum_{j=0}^{p} x_{sij}\, \hat{\beta}_{sj}^{(m-1)} + (y_{si} - \mu_{si}) \frac{\partial \eta_{si}}{\partial \mu_{si}},$$

$\mu_{si}$ and $\partial \eta_{si} / \partial \mu_{si}$ are evaluated at $\hat{\beta}_s^{(m-1)}$, and $W_s$ is a diagonal matrix with ith element

$$w_{ii} = \frac{1}{\mathrm{Var}(Y_{si})} \left( \frac{\partial \mu_{si}}{\partial \eta_{si}} \right)^2.$$
The steps for D&R are discussed below, and Fig. 12.6 displays them graphically for the Poisson regression model.

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: For the sth partitioned subset (s = 1, 2, …, S), compute $X_s'W_sX_s$, $X_s'W_sz_s$, and $\hat{\beta}_s$, s = 1, 2, …, S.
Step III: Recombine using the estimates obtained in Step II as follows:

$$\hat{\beta} = \left( \sum_{s=1}^{S} X_s' W_s X_s \right)^{-1} \sum_{s=1}^{S} X_s' W_s z_s. \qquad (12.32)$$

Fig. 12.6 Flowchart displaying D&R method for Poisson regression for count data
For the generalized linear model with log link function, the estimating equations based on the entire data set are

$$\frac{1}{a(\phi)} \sum_{i=1}^{n} \left[ y_i - e^{x_i \beta} \right] x_{ij} = 0, \qquad j = 0, 1, \ldots, p. \qquad (12.33)$$
The steps for D&R are discussed below, and Fig. 12.7 illustrates the computational procedure for the generalized linear model with log link function.

Step I: Divide the data into S subsets of similar structure, with $Y_s$ and $X_s$ being the vector of responses and design matrix in subset s (s = 1, 2, …, S).
Step II: For the sth partitioned subset (s = 1, 2, …, S), compute $\hat{\beta}_s$, s = 1, …, S, by solving the estimating equations

$$\frac{1}{a_s(\phi)} \sum_{i=1}^{n_s} \left[ y_{si} - e^{x_{si} \beta_s} \right] x_{sij} = 0, \qquad s = 1, 2, \ldots, S;\ j = 0, 1, \ldots, p.$$
Fig. 12.7 Flowchart displaying D&R method for Poisson regression with log link for count data
Step III: Recombine by averaging the subset estimates:

$$\hat{\beta}_R = \frac{\sum_{s=1}^{S} \hat{\beta}_s}{S}. \qquad (12.34)$$
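A minimal R sketch of this horizontal D&R for the Poisson model is given below (an illustration only; the simulated data are hypothetical):

set.seed(3)
n <- 50000; S <- 25
x <- runif(n); y <- rpois(n, exp(0.5 + 1.2 * x))
subsets <- split(data.frame(y, x), rep(1:S, length.out = n))
betas <- sapply(subsets, function(d) coef(glm(y ~ x, family = poisson, data = d)))
beta.R <- rowMeans(betas)                          # recombined estimate, Eq. (12.34)
cbind(beta.R, coef(glm(y ~ x, family = poisson)))  # compare with the full-data fit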
For multinomial outcome data with J categories, cell counts $Y_1, \ldots, Y_J$, and cell probabilities $\pi_1, \ldots, \pi_J$, the multinomial distribution is

$$P(Y_1 = y_1, \ldots, Y_J = y_J) = \frac{n!}{y_1! \cdots y_J!}\, \pi_1^{y_1} \cdots \pi_J^{y_J}. \qquad (12.35)$$

If $Y_1, \ldots, Y_J$ are independent Poisson variables with means $\mu_1, \ldots, \mu_J$ and $\mu = \sum_{j=1}^{J} \mu_j$, then, conditionally on the total $\sum_{j=1}^{J} Y_j = n$,

$$P\!\left(Y_1 = y_1, \ldots, Y_J = y_J \,\middle|\, \sum_{j=1}^{J} Y_j = n\right) = \frac{\prod_{j=1}^{J} e^{-\mu_j} \mu_j^{y_j} / y_j!}{e^{-\mu} \mu^{n} / n!} = n! \prod_{j=1}^{J} \frac{(\mu_j/\mu)^{y_j}}{y_j!}, \qquad (12.36)$$

which is equivalent to the multinomial form with $\pi_j = \mu_j/\mu$. We can express the above distribution in exponential form:

$$P\!\left(Y_1 = y_1, \ldots, Y_J = y_J \,\middle|\, \sum_{j=1}^{J} Y_j = n\right) = e^{\sum_{j=1}^{J} y_j \ln(\mu_j/\mu) + \ln(n!) - \sum_{j=1}^{J} \ln(y_j!)}.$$

The link functions for $Y_1, \ldots, Y_J$ are $\ln(\mu_{ij}/\mu_{i1}) = x_{ij}\beta_j$, i = 1, 2, …, n, where $x_{ij} = (1, x_{ij1}, \ldots, x_{ijp})$ and $\beta_j = (\beta_{j0}, \beta_{j1}, \ldots, \beta_{jp})'$.
The log likelihood function is

$$l = \sum_{i=1}^{n} \left[ \sum_{j=1}^{J} y_{ij} \ln \frac{\mu_{ij}}{\mu_i} + \ln(n!) - \ln\!\left( \prod_{j=1}^{J} y_{ij}! \right) \right]
= \sum_{i=1}^{n} \left[ \sum_{j=2}^{J} y_{ij} (x_{ij}\beta_j) - \ln\!\left( 1 + \sum_{j=2}^{J} e^{x_{ij}\beta_j} \right) + \ln(n!) - \ln\!\left( \prod_{j=1}^{J} y_{ij}! \right) \right]. \qquad (12.37)$$
The estimating equations are

$$\frac{\partial l}{\partial \beta_{jk}} = \sum_{i=1}^{n} \left[ y_{ij} - \pi_{ij}(\beta) \right] x_{ijk} = 0, \qquad j = 2, \ldots, J;\ k = 0, 1, 2, \ldots, p,$$

and the information matrix is

$$I(\beta) = \sum_{i=1}^{n} \pi_i(\beta)\left[1 - \pi_i(\beta)\right] x_i x_i'.$$

For the sth partitioned subset, the estimating equations are

$$\frac{\partial l_s}{\partial \beta_{sjk}} = \sum_{i=1}^{n_s} \left[ y_{sij} - \pi_{sij}(\beta_s) \right] x_{sijk} = 0, \qquad s = 1, \ldots, S;\ j = 2, \ldots, J;\ k = 0, 1, 2, \ldots, p.$$
The role of statistics in big data analysis has become a focal issue in the recent debate
on data science. For big data, the statisticians need to address some formidable chal-
lenges that require developing new theories, methods, and tools for data integration
and visualization in dealing with volume, velocity, and variability of big structured
or unstructured data. The role of sufficiency in reducing the dimension of data is well
known in statistics. The exponential family provides sufficient statistics and GLM
can be applied as a possible modeling approach for analyzing big data. It is shown in
this chapter that the D&R technique can be used for analyzing big data by employ-
ing a data reduction strategy. Using the D&R approach, the big data set is partitioned into
smaller subsets and sufficient statistics from each subset are used to obtain recom-
bined estimates. It can be shown that the recombined estimates are very close to the
aggregate values if appropriate GLM technique is used. The D&R method through
the use of GLM can be a very useful modeling approach in big data.
References
Bahadur RR (1954) Sufficiency and statistical decision functions. Ann Math Stat 25:423–462
Buhlmann P, Petros D, Michael K, van der Mark L (2016) Handbook of big data. Routledge, London
Chen Y, Dong G, Han J, Pei J, Wah BW, Wang J (2006) Regression cubes with lossless compression
and aggregation. IEEE Trans Knowl Data Eng 18:1–15
Chen X, Xie M (2014) A split-and-conquer approach for analysis of extraordinarily large data. Stat
Sinica 24:1655–1684
Cleveland S, Hafen R (2014) Divide and recombine (D&R): data science for large complex data.
Stat Anal Data Min 7:425–433
Cox DR, Kartsonaki C, Keogh RH (2018) Big data: some statistical issues. Stat Probab Lett 136:111–115
Dobson AJ, Barnett AG (2018) An introduction to generalized linear models, 4th edn. CRC Press,
Boca Raton
Donoho D (2015) 50 Years of data science. Presentation at the Tukey Centennial Workshop, Prince-
ton, New Jersey, Sep 2015
Donoho D (2017) 50 Years of data science. J Comput Graph Stat 26(4):745–766
Dunson DB (2018) Statistics in the big data era: failures of the machine. Stat Probab Lett 136:4–9
Einav L, Levin J (2014) Economics in the age of big data. Science 346:1243089-1, -5
Fahrmeir L, Tutz G (2001) Multivariate statistical modelling based on generalized linear models,
2nd edn. Springer, New York
Fisher RA (1920) A mathematical examination of the method of determining the accuracy of an
observation by the mean error and by the mean square error. Mon Not R Astron Soc 80(8):758–770
Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philos Trans R Soc
Lond A 222:309–368
Fisher RA (1925) Theory of statistical estimation. Proc Camb Philos Soc 22:700–725
Fraser DAS (1961) Invariance and the fiducial method. Biometrika 48:261–280
Fraser DAS (1963) On sufficiency and the exponential family. J R Stat Soc Ser B 25:115–123
Guha S, Hafen R, Rounds J, Xia J, Li J, Xi B, Cleveland WS (2012) Large complex data: divide
and recombine (D&R) with RHIPE. Stat 1(1):53–67
Hafen R (2016) Divide and recombine: approach for detailed analysis and visualization of large
complex data. Handbook of big data. Chapman and Hall, Boca Raton
Halmos PR, Savage LJ (1949) Application of the radon-nikodym theorem to the theory of sufficient
statistics. Ann Math Stat 20:225–241
Härdle WK, Lu HHS, Shen X (eds) (2018) Handbook of big data analytics. Springer
Koopman BO (1936) On distribution admitting a sufficient statistic. Trans Am Math Soc 39:399–409
Lee JYL, Brown JJ, Ryan MM (2017) Sufficiency revisited: rethinking statistical algorithms in the
big data era. Am Stat 71(3):202–208
Lehmann EL (1959) Theory of hypothesis testing. Wiley, New York
Liu W, Li Y (2018) A new stochastic restricted Liu estimator for the logistic regression model.
Open J Stat 8:25–37
Pitman EJG (1936) Sufficient statistics and intrinsic accuracy. Proc Camb Philos Soc 32:567–579
Reid N (2018) Statistical science in the world of big data. Stat Probab Lett 136:42–45
Sangalli LM (2018) The role of statistics in the era of big data. Stat Probab Lett 136:1–3
Xi R, Lin N, Chen Y (2008) Compression and aggregation for logistic regression analysis in data
cubes. IEEE Trans Knowl Data Eng 1(1):1–14
Zomaya AY, Sakr S (eds) (2017) Handbook of big data technologies. Springer
Zuo W, Li Y (2018) A new stochastic restricted Liu estimator for the logistic regression model. Open J Stat 8:25–37
Appendix A
Programming Codes in R
summary(Age.days)
mean(Age.days, trim = 0.05)
sd(Age.days)
sd(Age.days)/mean(Age.days)*100
IQR(Age.days, type=6)
summary(Usage.km)
mean(Usage.km, trim = 0.05)
sd(Usage.km)
sd(Usage.km)/mean(Usage.km)*100
IQR(Usage.km, type=6)
cor(Age.days,Usage.km)
Example 3.3 Weibull distribution fitting for uncensored data. The variable usage
(in km at failure) is denoted by Usage.km.
library(survival)
delta <- rep(1, length(Usage))
Weib.fit <- survreg(Surv(Usage.km, delta) ~ 1, dist='weibull')
eta.hat <- exp(Weib.fit$coefficient)
Example 3.4 Lognormal distribution fitting for uncensored data. The variable usage
(in km at failure) is denoted by Usage.km.
library(survival)
delta <- rep(1, length(Usage))
lognorm.fit <- survreg(Surv(Usage.km, delta) ~ 1, dist='lognormal')
mu.hat <- lognorm.fit$coefficients
sigma.hat <- lognorm.fit$scale
MTTF <- exp(mu.hat + sigma.hat^2/2)
Example 5.3 Comparison of two survival functions. In this example, the data frame
named bat consists of three variables time, status, and x, where time denotes the time in
months, status denotes the censoring indicator for time (1 means failure and 0 means
censored), and x represents a factor with two levels Maintained and Nonmaintained.
library(survival)
test <- survdiff(Surv(time, status) ~ x, data = bat)
test$chisq
survdiff(Surv(time, status) ~ x, data = bat, rho=0)
survdiff(Surv(time, status) ~ x, data = bat.IPS, rho=1)
S.group <- survfit(Surv(time, status) ~ x, data = bat)
summary(S.group)
plot(S.group, main="S(t) for two groups for IPS Battery data", xlab="Time in months",
  ylab="Proportion surviving", lty = 2:3, col=1:2, lwd=2)
legend("topright", c("Maintained", "Nonmaintained"), lty = 2:3,
  col=1:2, text.col=c(1,2), lwd=2)
Example 6.2 Exponential distribution fitting for failure and censored data. The data and variables are the same as in Example 5.2.

library(survival)
battery.exp <- survreg(Surv(time, status) ~ 1, data = batterydata, dist='exponential')
battery.exp$coefficients
lambda.hat <- 1/exp(battery.exp$coefficients)
exp.mean <- 1/lambda.hat
par(mfrow=c(2,2))
t <- seq(0, 4500, 0.5)
f <- dexp(t, rate = lambda.hat, log = FALSE)
plot(t, f, main="Probability density function", xlab="Days, t",
  ylab="f(t)", col=1, lty=1, type="l", lwd=2)
F <- pexp(t, rate = lambda.hat, lower.tail = TRUE, log.p = FALSE)
plot(t, F, main="Cumulative distribution function", xlab="Days, t",
  ylab="F(t)", col=1, lty=1, type="l", lwd=2)
R <- 1 - pexp(t, rate = lambda.hat, lower.tail = TRUE, log.p = FALSE)
plot(t, R, main="Reliability function", xlab="Days, t", ylab="R(t)",
  col=1, lty=1, type="l", lwd=2)
h <- rep(lambda.hat, length(t))
plot(t, h, main="Hazard function", ylim=c(0,0.0020), xlab="Days, t",
  ylab="h(t)", col=1, lty=1, type="l", lwd=2)
Example 6.3 Weibull distribution fitting for failure and censored data. The data and variables are the same as in Example 5.2.

battery.Weib <- survreg(Surv(time, status) ~ 1, data=batterydata, dist='weibull')
# scale parameter
lambda.hat <- exp(battery.Weib$coefficient)
# shape parameter
alpha.hat <- 1/battery.Weib$scale
MTTF <- lambda.hat*gamma(1 + 1/alpha.hat)
par(mfrow=c(2,2))
t <- seq(0, 2500, 0.5)
f <- dweibull(t, shape=alpha.hat, scale = lambda.hat, log = FALSE)
plot(t, f, main="Probability density function", xlab="Days, t",
  ylab="f(t)", col=1, lty=1, type="l", lwd=2)
F <- pweibull(t, shape=alpha.hat, scale = lambda.hat, lower.tail = TRUE, log.p = FALSE)
plot(t, F, main="Cumulative distribution function", xlab="Days, t",
  ylab="F(t)", col=1, lty=1, type="l", lwd=2)
R <- 1 - pweibull(t, shape=alpha.hat, scale = lambda.hat, lower.tail = TRUE, log.p = FALSE)
plot(t, R, main="Reliability function", xlab="Days, t", ylab="R(t)",
  col=1, lty=1, type="l", lwd=2)
h <- f/R
plot(t, h, main="Hazard function", ylim=c(0,0.007), xlab="Days, t",
  ylab="h(t)", col=1, lty=1, type="l", lwd=2)
Example 7.1 Fitting of proportional hazard (PH) model based on hypothetical data.
library(survival)
data7.1 <- list(age=c(61, 62, 63, 64, 65),
status=c(1,0,1,1,0),
gender=c(0,1,1,0,0))
ph.fit <- coxph(Surv(age, status) ~ gender, data7.1)
ph.fit
summary(ph.fit)
# or
beta.hat <- log(3)/2
I <- 6*exp(beta.hat)/(2*exp(beta.hat)+3)^2 +2*exp(beta.hat)/(exp(beta.
hat)+2)^2
1/I
exp(-beta.hat)
Example 7.2 Weibull regression model. In this example, the data frame named
auto.data consists of the following variables:
Example 9.2 Reliability function of a competing risk model (or series system).
t <- seq(1,10000)
lambda1 <- 0.0006; lambda2 <- 0.0004;
lambda <- lambda1+lambda2
R.FM1 <- 1 - pexp(t, rate=lambda1, lower.tail=T, log.p=F)
R.FM2 <- 1 - pexp(t, rate=lambda2, lower.tail=T, log.p=F)
TT = seq(100, 1500, 1)
TT.no <- length(TT); JT.out <- array()
for(j in 1 : TT.no){
JT.out[j] <- JT(TT[j])
}
JT.star.opt = min(JT.out)
for(i in 1:TT.no){
if(JT.out[i] == JT.star.opt) {T.est <- TT[i]}
}
Index
A interval, 61
Accelerated failure time model, 128 left, 61
Age-based analysis, 79 progressive Type II, 60
Age-based claim rate, 79 right, 58
Age-specific death rate, 24 Type I, 58
Assembly error, 182, 183 Type II, 59
Association Censoring time, 4
negative, 117 Chain rule, 149
no, 117 Chapman-Kolmogorov equation, 202
positive, 117 Chi-squared distribution, 85
Assumption of proportionality Claim rate, 79
assessing, 124 Coefficient
At sale reliability, 181 correlation, 16
Average, 15 of variation, 16
Average failure rate, 25 rank correlation, 16
Combined system, 173
B Comparison
Baseline hazard function, 120 reliability function, 83
Baseline level, 139 survival function, 83
Baseline reliability function, 119 Competing risk model, 167, 168, 183
Bernoulli distribution, 117 Conditional cdf, 23
Bernoulli regression model, 155 Conditional reliability function, 23
Big data, 219, 224 Conditioning variable division, 224, 225
Big data analysis Confidence interval
steps, 220 normal approximation, 76
Binary outcome, 205 Constant failure rate, 24
Binary variable, 116 Constant hazard function, 35
B ten life, 29 Corrective maintenance, 189
Correlation
C rank, 17
Canonical link function, 149, 151 Cost per unit time, 193
Canonical parameter, 222 Count data, 231
Censored observation, 57 Covariance, 17
Censoring, 57 Covariate dependence, 205