Considering Fault Removal Efficiency in Software Reliability Assessment
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 33, no. 1, January 2003
Abstract: Software reliability growth models (SRGMs) have been developed to estimate software reliability measures such as the number of remaining faults, software failure rate, and software reliability. Issues such as imperfect debugging and the learning phenomenon of developers have been considered in these models. However, most SRGMs assume that faults detected during tests will eventually be removed. Consideration of fault removal efficiency in the existing models is limited. In practice, fault removal efficiency is usually imperfect. This paper aims to incorporate fault removal efficiency into software reliability assessment. Fault removal efficiency is a useful metric in software development practice, and it helps developers to evaluate the debugging effectiveness and estimate the additional workload. In this paper, imperfect debugging is considered in the sense that new faults can be introduced into the software during debugging and the detected faults may not be removed completely. A model is proposed to integrate fault removal efficiency, failure rate, and fault introduction rate into software reliability assessment. In addition to traditional reliability measures, the proposed model can provide some useful metrics to help the development team make better decisions. Software testing data collected from real applications are utilized to illustrate both the descriptive and the predictive power of the proposed model. The expected number of residual faults and software failure rate are also presented.

Index Terms: Akaike's information criterion (AIC), maximum-likelihood estimate (MLE), nonhomogeneous Poisson process (NHPP), software reliability growth.

Manuscript received January 3, 2000; revised February 28, 2003. This research was supported in part by the FAA William J. Hughes Technical Center under Grant 98-G-006 and by the NSF under Grant INT-0107755. This paper was recommended by Associate Editor L. Fang.
The authors are with the Department of Industrial Engineering, Rutgers University, New Brunswick, NJ 08903, USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/TSMCA.2003.812597
1083-4427/03$17.00 © 2003 IEEE

I. INTRODUCTION

As software systems get larger and more complex, the software development process inevitably becomes more complicated. Powerful metrics play an important role in assisting management decision making for a complicated process like software development. For instance, reliability is a significant factor in quantitatively characterizing quality and determining when to stop testing and release software on the basis of predetermined reliability objectives. Software reliability growth models [2]–[5], [8]–[14], [19], [22], [23] have been proposed to estimate reliability metrics such as the number of residual faults, failure rate, and reliability of software. Perfect [3], [5], [11], [12], [21], [22] and imperfect debugging [14], [15] are considered in the NHPP models. In some of these models, the learning phenomenon of the software developers [11], [12], [17] has also been studied. Other reliability measures, such as the mean time until next failure [18], are also investigated.

Although some software reliability studies addressed the imperfect debugging phenomenon, most of them only considered the possibility of adding new faults while removing the existing ones. However, imperfect debugging also means that detected faults are removed with an imperfect removal efficiency other than 100%. Jones [7] pointed out that the defect-removal efficiency is an important factor for software quality and process management. It can provide software developers with an estimate of testing effectiveness and a prediction of additional effort. Moreover, fault removal efficiency is usually below 100% (e.g., it ranges from 15% to 50% for unit test, 25% to 40% for integration test, and 25% to 55% for system test [7]). Goel and Okumoto [4] also considered a similar concept in their Markov model. They assumed that after a failure the number of residual faults remains the same with some probability and is reduced by one with the complementary probability. In other words, fault removal is not always 100% effective. Kremer [8] applied a birth–death process to software reliability modeling, considering both imperfect fault removal probability (death process) and fault introduction (birth process).

In practice, software fault debugging is a very complex process. Usually, when testers detect a deviation from the requirements, they create a modification request. Then the review board members will assign this request to a particular developer. After the developer studies the software fault, he/she will submit a code change to fix it. The changed code has to go through the various tests (unit test, integration test, and system test) again to make sure that it fixes the reported problem. The fix may not pass these tests, and sometimes even if it passes these tests the fault may not be completely removed, because the test environment may not be the same as the customer environment. It is not unusual for the software development team to find that a software fault has been reported multiple times before it is finally removed. Some faults can only be encountered in the customer field trials. Therefore, fault removal efficiency is an important factor for software reliability estimation and software project management.

In this paper, we propose a methodology to integrate fault removal efficiency into software reliability growth models. Section II presents the formulation of the NHPP model addressing fault removal efficiency and fault introduction rate. The explicit solution of the mean value function (MVF) for the proposed NHPP model is derived. This model considers the learning phenomenon using an S-shaped fault detection rate function and introduces a constant fault introduction rate. Section III evaluates the proposed model and compares and contrasts it to the
ZHANG et al.: CONSIDERING FAULT REMOVAL EFFICIENCY IN SOFTWARE RELIABILITY ASSESSMENT 115
other existing NHPP models using two sets of data collected from real software applications. Software reliability metrics including the expected number of remaining faults and software failure rate are estimated using the proposed fault removal efficiency model. The results show that the proposed model has the following technical merits: it improves software reliability assessment and provides additional metrics for development project evaluation and management. Section IV summarizes the conclusions.

Notation
$N(t)$: Counting process for the total number of failures in $[0, t]$.
$m(t)$: Expected number of software failures by time $t$, $m(t) = E[N(t)]$.
$a(t)$: Total fault content rate function, i.e., the sum of the expected number of initial software faults and introduced faults by time $t$.
$b(t)$: Failure detection rate function, which also represents the average failure rate of a fault.
$p$: Fault removal efficiency, i.e., percentage of faults eliminated by reviews, inspections, and tests.
$\beta(t)$: Fault introduction probability at time $t$.
$\lambda(t)$: Intensity function or fault detection rate per unit time, $\lambda(t) = dm(t)/dt$.
$R(x \mid t)$: Software reliability function by time $t$ for a mission time $x$.

II. SOFTWARE RELIABILITY MODELING

In the family of software reliability models, NHPP software reliability models have been widely used in analyzing and estimating the reliability-related metrics of software products in many applications, such as telecommunications [6], [24]. These models consider the debugging process as a counting process, which follows a Poisson distribution with a time-dependent intensity function. Existing NHPP software reliability models can be unified into a general NHPP function proposed by Pham et al. [16]. The primary task of using the NHPP models to estimate software reliability metrics is to determine the Poisson mean, which is known as the MVF.

In this section, an NHPP model with fault removal efficiency is presented. The following are the assumptions for this model:

1) The occurrence of software failures follows an NHPP.
2) The software failure rate at any time is a function of the fault detection rate and the number of remaining faults present at that time.
3) When a software failure occurs, a debugging effort will be initiated immediately with probability $p$. This debugging is s-independent at each location of the software failures.
4) For each debugging effort, whether the fault is successfully removed or not, some new faults may be introduced into the software system with probability $\beta(t)$, $\beta(t) \ll p$.

Assumption 1 is widely accepted. Assumption 2 can be interpreted as follows: software failure rate is the product of the number of residual faults (which incorporates the concept of fault removal efficiency) and the average failure rate of a fault. In practice, once a software failure is reported, the review board members will assign a developer to look into the code. Although the fault that causes the failure may not be removed immediately, the debugging effort is still initiated. When the developer tries to modify the code, new faults could be introduced to the software. This is captured by assumptions 3 and 4.

A. General NHPP Software Reliability Model With Fault Removal Efficiency

In this section, fault removal efficiency and fault introduction rate are integrated into the MVF of an NHPP model. Fault removal efficiency is defined as the percentage of bugs eliminated by reviews, inspections, and tests. This section also presents an explicit solution to the differential equation of the proposed model. The MVF that incorporates both fault removal efficiency and fault introduction phenomena can be obtained by solving the system of differential equations as follows:

$$\frac{dm(t)}{dt} = b(t)\,[a(t) - p\,m(t)] \tag{1}$$

$$\frac{da(t)}{dt} = \beta(t)\,\frac{dm(t)}{dt} \tag{2}$$

where $p$ represents the fault removal efficiency, which means that $p$% of detected faults can be eliminated completely during the development process. Therefore, in (1), $m(t)$ represents the expected number of faults detected by time $t$, and then $p\,m(t)$ represents the expected number of faults that can be successfully removed. Existing models usually assume that $p$ is 100%.

The marginal conditions for the differential equations (1) and (2) are as follows:

$$m(0) = 0 \tag{3}$$

$$a(0) = a \tag{4}$$

where $a$ is the number of initial faults in the software system before testing starts.

Most existing NHPP models assume that the fault failure rate is proportional to the total number of residual faults. Equation (1) can be deduced directly from assumptions 2 and 3. Software system failure rate is a function of the number of residual faults at any time and the fault detection rate (which can also be interpreted as the average failure rate of a fault). The expected number of residual faults is given by

$$x(t) = a(t) - p\,m(t) \tag{5}$$

Notice that when $p = 1$, the proposed model can be reduced to an existing NHPP model [17].

Equation (2) can also be directly deduced from assumptions 3 and 4. The fault content rate in software at time $t$ is proportional to the rate of debugging efforts to the system, which equals $dm(t)/dt$ because of assumption 3.

Equation (5) can be used to derive explicit solutions of (1) and (2). By taking derivatives on both sides of (5), we obtain

$$\frac{dx(t)}{dt} = \frac{da(t)}{dt} - p\,\frac{dm(t)}{dt}$$

or

$$\frac{dx(t)}{dt} = -\,[p - \beta(t)]\,b(t)\,x(t) \tag{6}$$

Solving (6) with the marginal conditions (3) and (4), i.e., $x(0) = a$, gives

$$x(t) = a\,\exp\!\left(-\int_0^t [p - \beta(\tau)]\,b(\tau)\,d\tau\right) \tag{7}$$

From (1), the failure rate function can be expressed as follows:

$$\lambda(t) = \frac{dm(t)}{dt} = b(t)\,x(t) = a\,b(t)\,\exp\!\left(-\int_0^t [p - \beta(\tau)]\,b(\tau)\,d\tau\right) \tag{8}$$

Therefore, the explicit expression of the MVF can be obtained as follows:

$$m(t) = \int_0^t a\,b(u)\,\exp\!\left(-\int_0^u [p - \beta(\tau)]\,b(\tau)\,d\tau\right) du \tag{9}$$

Using the result in (8), one can also obtain the solution for the fault content rate function by taking the integral of (2). The fault content rate function is given by

$$a(t) = a\left[1 + \int_0^t \beta(u)\,b(u)\,\exp\!\left(-\int_0^u [p - \beta(\tau)]\,b(\tau)\,d\tau\right) du\right] \tag{10}$$
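As a numerical cross-check of the general model, the system (1)-(2) can also be integrated directly. The following is a minimal Python sketch rather than the authors' implementation: the function name `integrate_model`, the classical fourth-order Runge-Kutta scheme, and the constant sample values for the detection rate and the introduction probability are assumptions chosen only for illustration. For a constant detection rate $b$ and a constant $\beta$, (9) reduces to $m(t) = \frac{a}{p-\beta}\,[1 - e^{-(p-\beta) b t}]$, which the sketch uses as a reference value.

```python
import math

def integrate_model(a0, p, b, beta, t_end, steps=10_000):
    """Integrate dm/dt = b(t)[a(t) - p m(t)], da/dt = beta(t) dm/dt with RK4.

    b and beta are callables of time; a0 is the initial fault content.
    Returns (m(t_end), a(t_end)). Hypothetical helper, for illustration only.
    """
    def deriv(t, m, a):
        dm = b(t) * (a - p * m)      # equation (1): failure intensity
        return dm, beta(t) * dm      # equation (2): fault introduction

    h = t_end / steps
    t, m, a = 0.0, 0.0, float(a0)    # marginal conditions (3) and (4)
    for _ in range(steps):
        k1m, k1a = deriv(t, m, a)
        k2m, k2a = deriv(t + h / 2, m + h / 2 * k1m, a + h / 2 * k1a)
        k3m, k3a = deriv(t + h / 2, m + h / 2 * k2m, a + h / 2 * k2a)
        k4m, k4a = deriv(t + h, m + h * k3m, a + h * k3a)
        m += h / 6 * (k1m + 2 * k2m + 2 * k3m + k4m)
        a += h / 6 * (k1a + 2 * k2a + 2 * k3a + k4a)
        t += h
    return m, a

# Assumed illustration values, not estimates from any data set.
a0, p, b0, beta0, T = 100.0, 0.9, 0.05, 0.01, 50.0
m_num, a_num = integrate_model(a0, p, lambda t: b0, lambda t: beta0, T)

# Closed form of (9) for constant b and beta, used as a reference.
m_ref = a0 / (p - beta0) * (1.0 - math.exp(-(p - beta0) * b0 * T))

# Expected residual faults x(T) = a(T) - p m(T), equation (5).
residual = a_num - p * m_num
print(m_num, m_ref, residual)
```

Passing `b` and `beta` as callables keeps the sketch usable for any time-dependent detection rate or introduction probability, at the cost of a fixed step size that has to be chosen small enough for the rates involved.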
B. NHPP Model

In this section, we derive a new NHPP model from the general class of models presented in the previous section. The fault detection rate function in this model, $b(t)$, is a nondecreasing function with an inflection S-shaped curve [11], [12], which captures the learning process of the software developers. In the existing models [11], [12], however, the upper bound of the fault detection rate is assumed to be the same as the learning curve increasing rate, for calculation convenience. In this paper, we relax this assumption and use a different parameter for the upper bound of the fault detection rate [see (11)]. The model also addresses imperfect debugging by assuming faults can be introduced during debugging with a constant fault introduction probability, $\beta$. That is

$$b(t) = \frac{c}{1 + \alpha e^{-bt}}, \qquad \beta(t) = \beta \tag{11}$$

Substituting (11) into (9), we obtain the MVF for the proposed model as follows:

$$m(t) = \frac{a}{p - \beta}\left[1 - \left(\frac{(1 + \alpha)\,e^{-bt}}{1 + \alpha e^{-bt}}\right)^{\frac{c}{b}(p - \beta)}\right] \tag{12}$$

Note that when the testing time goes to infinity, $m(t)$ converges to its upper bound $a/(p - \beta)$. The expected number of residual faults is given by

$$x(t) = a\left(\frac{(1 + \alpha)\,e^{-bt}}{1 + \alpha e^{-bt}}\right)^{\frac{c}{b}(p - \beta)} \tag{13}$$

and the software failure rate is

$$\lambda(t) = \frac{c}{1 + \alpha e^{-bt}}\; a\left(\frac{(1 + \alpha)\,e^{-bt}}{1 + \alpha e^{-bt}}\right)^{\frac{c}{b}(p - \beta)} \tag{14}$$

Table I summarizes the features of the proposed model and the existing ones.

C. Parameter Estimation and Model Comparison

Parameter Estimation: Once the analytical expression for the MVF is derived, the parameters in the MVF need to be estimated.
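The closed-form expressions of the proposed model can be evaluated directly once parameter values are available. The sketch below is illustrative only. It assumes the S-shaped detection rate $b(t) = c/(1 + \alpha e^{-bt})$ with a constant introduction probability $\beta$; the function names and the parameter values are hypothetical. It also includes the standard NHPP log-likelihood for exactly observed failure times, $\sum_i \ln \lambda(t_i) - m(T)$, as one plausible objective for a maximum-likelihood fit. The text here does not spell out its likelihood, so that part is an assumption.

```python
import math

def _ratio(t, alpha, b):
    """Decaying factor (1 + alpha) e^{-bt} / (1 + alpha e^{-bt})."""
    e = math.exp(-b * t)
    return (1.0 + alpha) * e / (1.0 + alpha * e)

def mvf(t, a, p, beta, alpha, b, c):
    """Mean value function m(t) of the proposed model."""
    k = (c / b) * (p - beta)
    return a / (p - beta) * (1.0 - _ratio(t, alpha, b) ** k)

def residual_faults(t, a, p, beta, alpha, b, c):
    """Expected number of residual faults x(t) = a(t) - p m(t)."""
    k = (c / b) * (p - beta)
    return a * _ratio(t, alpha, b) ** k

def failure_rate(t, a, p, beta, alpha, b, c):
    """Failure intensity lambda(t) = b(t) x(t)."""
    detection = c / (1.0 + alpha * math.exp(-b * t))
    return detection * residual_faults(t, a, p, beta, alpha, b, c)

def log_likelihood(failure_times, t_end, theta):
    """NHPP log-likelihood for exact failure times observed on [0, t_end].

    Assumed form: sum of log-intensities at the failure times minus m(t_end).
    """
    log_intensity = sum(math.log(failure_rate(t, **theta)) for t in failure_times)
    return log_intensity - mvf(t_end, **theta)

# Hypothetical parameter values and failure times, for illustration only.
theta = dict(a=100.0, p=0.9, beta=0.01, alpha=2.0, b=0.1, c=0.2)
upper_bound = theta["a"] / (theta["p"] - theta["beta"])
print(mvf(1e4, **theta), upper_bound)      # m(t) approaches a / (p - beta)
print(log_likelihood([2.0, 5.0, 9.0, 15.0], 20.0, theta))
```

In a real fit, the negative of `log_likelihood` would be handed to a numerical optimizer with constraints such as $0 < \beta < p \le 1$; which optimizer and which constraints to use are left open here.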
TABLE II
REAL-TIME CONTROL SYSTEM DATA
Parameter estimation is usually carried out by using the maximum likelihood estimate (MLE) method.

Model Comparison: Two criteria are used for model comparison. In this section, we evaluate the performance of the models using the sum of squared errors (SSE) and Akaike's information criterion (AIC) [1]. Both the descriptive and the predictive power of the models are considered. The SSE is usually used as a criterion for comparison of goodness-of-fit and predictive power. SSE can be calculated as follows:

$$\mathrm{SSE} = \sum_{i} \left[y_i - \hat{m}(t_i)\right]^2 \tag{15}$$

where
$y_i$: observed number of faults by time $t_i$;
$\hat{m}(t_i)$: expected number of faults by time $t_i$ estimated by a model;
$i$: fault index.

Another criterion used for model comparison is AIC, which can be calculated as follows:

$$\mathrm{AIC} = -2\,\log(\text{likelihood function at its maximum value}) + 2N \tag{16}$$

where $N$ represents the number of parameters in the model. The AIC measures the ability of a model to maximize the likelihood function, which is directly related to the degrees of freedom during fitting. Increasing the number of parameters will usually result in a better fit, so the AIC takes the degrees of freedom into consideration by assigning a larger penalty to a model with more parameters. The lower the SSE and AIC values, the better the model performs.

III. MODEL EVALUATION AND COMPARISON

A. Case 1: Data From a Real Time Control System

In this section, we examine the goodness-of-fit and predictive power of the proposed model and compare it with the existing models. The first set of data is documented in Lyu [9]. A total of 136 faults are reported, and the times between failures (TBF), in seconds, are listed in Table II.

We need to separate the data set into two subsets for the goodness-of-fit test and the predictive power evaluation. An extremely long TBF is observed from the 122nd fault to the 123rd fault, and the TBFs after the 123rd fault increase tremendously, which implies reliability growth and indicates that the system has stabilized. In this study, we therefore use the first 122 data points for the goodness-of-fit evaluation and the remaining data points for the predictive power test. The SSE and AIC values for goodness-of-fit and prediction are listed in Table III.

As seen from Table III, the proposed model provides the best fit and prediction for this data set (both the SSE and the AIC values are the lowest among all models). Furthermore, some instrumental information can be obtained from the parameter estimation provided by the proposed model. For example, the fault removal efficiency $p$ is 90%, which is relatively high according to [7]. The number of initial faults $a$ is estimated to be 135; together with the 90% fault removal efficiency, the expected number of total detected faults is then 152. Therefore, at the assumed stopping point of 57 042 s, there are about 30 ($152 - 122$) faults remaining in the software. The fault introduction probability $\beta$ is 0.012, that is, on average, about one fault will be introduced when 100 faults are removed. Some models underestimate the expected number of total faults.

Software failure rate can be predicted after the parameters are estimated. Fig. 1 shows the trend of failure rate for the test
118 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 33, NO. 1, JANUARY 2003
TABLE III
PARAMETER ESTIMATION AND MODEL COMPARISON