Predicting and Estimating Nov 06
13
Linda M. Laird 2004
All Rights Reserved
So how do we predict what the
reliability will be?
Prediction Model steps
Can either make prediction for each step or
use actual data if that step has already
occurred.
Multiple methodologies for each step
The issue remains of predicting failures from
faults. Be careful: it is a weak link.
[Diagram: Fault Profile & Defect Density -> Initial Failure Rate -> Delivered and On-going Failure Rate]
Predicting Fault Density and Distribution
Typical Distribution of Faults
Defect Prediction Models
Dynamic
Rayleigh, Exponential, S-Curve Models
(Fault Injection)
Static
COQUALMO Model
Based on Process
RL-TR-92-52
Industry Data
Such as SEI Delivered Fault Data
Local Models Historical Data
Note: Much of this
Material is from CS533 --
Included as a review
Typical Fault Distributions
Defect Dynamics and Behaviors
Defects have certain dynamics,
behaviors, and patterns which are
important to understand in order to
understand the dynamics of software
development
Projected Software Defects
In general, defect arrivals follow a Rayleigh distribution curve. Based upon
project size and past defect densities, you can predict the curve, along with the
upper and lower control bounds.
[Figure: defects vs. time, showing the projected curve with its upper-limit and lower-limit bounds]
$F(t) = 1 - e^{-(t/c)^2}$
$f(t) = (2t/c^2)\, e^{-(t/c)^2}$
Recall that F(t) is the cumulative distribution function, f(t) is the
probability density function, t is time, and c is a constant.
Defects Detected Tend to Be Similar to Staffing Curves
[Figure: people and defects vs. time]
Source: Industrial Strength Software,
Putnam & Myers, IEEE, 1997
Which is related to Code Production Rate
[Figure: people, defects, and code production rate vs. time, with the TEST period marked]
All tend to follow Rayleigh curves.
Note: the period during test is similar to an exponential curve.
Source: Putnam & Myers
Defect Prediction/Estimation Models
Total number of defects
Distribution of defects over time
Defect Model Types: Static and Dynamic
Dynamic is usually based on statistical distributions of faults
found (aka estimated)
Two types
One that models the entire development process: Rayleigh distributions
One that models the testing/deployment process: Exponential and
S-Curve models
Work better in the large, on projects, when you need to estimate
when/if the project will fail.
Static uses attributes of the program to estimate number of
defects (aka predicted)
Typically of form y = f(a,b,c,d,e), where y is the defect rate or # of
defects, and a through e are attributes of the product, process, and/or
project
COQUALMO, the RL-TR-92-52 model, industry data, and local historical data
are all static models
Usually work better at the module level to provide an indication to
engineers of where to focus
Total Defects and Defect Distribution
If you have fault data, you can estimate the
total number of faults and the distribution.
Via Calculations or Tools, using predictive models
Method for the three primary distributions:
Rayleigh
Exponential
S-Curves
If you don't have fault data, you use historical
data from other projects and static models
Development Phase Model Applicability
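[Diagram: development phase timeline with milestones "Start tracking defects" and "Start Independent Testing", showing the ranges where Static Models, the Rayleigh Model, and the Exponential (Reliability Growth) Model & S-Curves apply]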
Exponential and S-Shaped Distributions
[Figure: S-shaped curve and exponential curve; cumulative failures found (e.g., F(t)) vs. time]
Exponential and S-Shaped Distributions
[Figure: S-shaped arrival curve and exponential curve; defects found (arrival distribution f(t)) vs. time]
S curves: Overview
Resemble an S, with a slow start, then a much
quicker discovery rate, and then a slow tail-off at the
end
Based upon the view that the software defect removal
process consists of defect detection, defect isolation, and
defect correction, and all of them take time.
Multiple S-curve models, all based upon the non-
homogeneous Poisson process for the arrival
distribution
One equation:
$M(t) = K\,[1 - (1 + \lambda t)\, e^{-\lambda t}]$
Where M(t) is the expected number of failures by time t,
K is the total number of failures, and $\lambda$ is the error detection rate
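A minimal Python sketch of one S-curve, assuming the delayed S-shaped form M(t) = K[1 - (1 + lambda*t) e^(-lambda*t)]. The values of K and lam below are hypothetical, chosen only to show the slow start, fast middle, and slow tail.

```python
import math

def s_curve_cumulative(t, K, lam):
    """Expected cumulative failures by time t (delayed S-shaped form, assumed)."""
    return K * (1.0 - (1.0 + lam * t) * math.exp(-lam * t))

# Hypothetical parameters, for illustration only
K, lam = 100.0, 0.5

curve = [s_curve_cumulative(t, K, lam) for t in range(0, 21)]
# Per-period arrivals: small at first, then larger, then tailing off
arrivals = [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]

for t, m in enumerate(curve):
    print(f"t={t:2d}  M(t)={m:6.2f}")
```

The per-period arrivals rise and then fall, which is what distinguishes this curve from the exponential model, where arrivals are highest at the start.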
Rayleigh & Exponential Curves
In the family of Weibull curves, which have the form:
$F(t) = 1 - e^{-(t/c)^m}$
$f(t) = (m/t)\,(t/c)^m\, e^{-(t/c)^m}$
For m = 1: Exponential Distribution
For m = 2: Rayleigh Distribution
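The Weibull form can be checked numerically. The sketch below (function names and sample values are illustrative assumptions) evaluates the general density and confirms that m = 1 and m = 2 reduce to the exponential and Rayleigh densities.

```python
import math

def weibull_cdf(t, c, m):
    """F(t) = 1 - exp(-(t/c)^m)."""
    return 1.0 - math.exp(-((t / c) ** m))

def weibull_pdf(t, c, m):
    """f(t) = (m/t) * (t/c)^m * exp(-(t/c)^m), valid for t > 0."""
    return (m / t) * (t / c) ** m * math.exp(-((t / c) ** m))

t, c = 2.0, 4.0  # arbitrary sample point and scale constant

# m = 1: exponential density (1/c) * exp(-t/c)
exp_pdf = weibull_pdf(t, c, 1)
exp_direct = (1.0 / c) * math.exp(-t / c)

# m = 2: Rayleigh density (2t/c^2) * exp(-(t/c)^2)
ray_pdf = weibull_pdf(t, c, 2)
ray_direct = (2.0 * t / c**2) * math.exp(-((t / c) ** 2))

print(exp_pdf, exp_direct)
print(ray_pdf, ray_direct)
```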
Rayleigh Model
Defect Arrival Rate (PDF), the number of defects to arrive at time t:
$f(t) = K \cdot (2t/c^2) \cdot e^{-(t/c)^2}$
Cumulative Defects (CDF), the total number of defects to arrive by time t:
$F(t) = K \cdot (1 - e^{-(t/c)^2})$
Where:
K = total number of injected defects
c is a function of the time $t_{max}$ at which the curve reaches its peak:
$c = t_{max} \cdot \sqrt{2}$
Note: at $t_{max}$, ~40% of the defects should have been found
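A short Python sketch of the Rayleigh model, assuming the forms F(t) = K(1 - e^(-(t/c)^2)) and f(t) = K(2t/c^2)e^(-(t/c)^2) with c = t_max * sqrt(2). The function names and the t_max value are illustrative. It checks the "~40% found at t_max" note.

```python
import math

def rayleigh_c(t_max):
    """Scale constant c from the time of peak arrivals."""
    return t_max * math.sqrt(2)

def rayleigh_cdf(t, K, t_max):
    """Cumulative defects arrived by time t."""
    c = rayleigh_c(t_max)
    return K * (1.0 - math.exp(-((t / c) ** 2)))

def rayleigh_pdf(t, K, t_max):
    """Defect arrival rate at time t."""
    c = rayleigh_c(t_max)
    return K * (2.0 * t / c**2) * math.exp(-((t / c) ** 2))

t_max = 7.0  # hypothetical peak time
# With K = 1 the CDF is the fraction of defects found; at t_max it is 1 - exp(-1/2)
fraction_at_peak = rayleigh_cdf(t_max, 1.0, t_max)
print(f"fraction found at t_max: {fraction_at_peak:.3f}")
```

The fraction at the peak does not depend on t_max, which is why the 40% rule of thumb holds for any Rayleigh curve.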
Using Rayleigh Model
Simple extensions of the model provide
other useful information.
For example, defect priority classes can
be specified as percentages of the total
curve.
This allows the model to predict defects
by severity categories over time
Plotting the graphs/looking at the functions
If K = 1, F(t) = probability of 1 defect arriving by time t
f(t) = probability of defect arriving at time t
So, what do these charts mean?
[Figure: Rayleigh distribution, c = 2; F(t) and f(t) plotted as probability vs. time, 0 to 15]
[Figure: Rayleigh distribution, c = 10; F(t) and f(t) plotted as probability vs. time, 0 to 20]
Plotting the graphs/working with the functions
These are all for K = 1.
For case 1, $t_{max} \approx 1.4$, so $c \approx 1.4 \cdot 1.4 = 1.96$ (close to 2)
For example, the probability that the defect will arrive at
time 2 is ~.37, and the probability that it has arrived by time
2 is ~.63
For case 2, $t_{max} \approx 7$, so $c = 7 \cdot 1.4 = 9.8$ (almost 10)
[Figure: Rayleigh distribution, c = 2; F(t) and f(t) plotted as probability vs. time, 0 to 15]
[Figure: Rayleigh distribution, c = 10; F(t) and f(t) plotted as probability vs. time, 0 to 20]
Predicting defects analytically
You can, assuming a distribution and with defect
data collected from early in the project,
mathematically determine the curve and
the equation, as long as you've hit the
maximum.
With Rayleigh, you need a maximum
With Exponential, you need enough data to see the slope starting
to change
Method for using the Rayleigh Distribution
Given n data points, plot them
Determine $t_m$ (the time t at which f(t) is at its maximum)
Then, since you have the formulae
$F(t) = K\,[1 - e^{-t^2/(2 t_m^2)}]$
$f(t) = K\,[(1/t_m)^2 \cdot t \cdot e^{-t^2/(2 t_m^2)}]$
where F(t) is the cumulative number of defects arrived, f(t)
is the arrival rate for defects, and K is the
total number of defects,
you can use these to predict the
later arrival of defects.
Example: 594 Faults found by day 9
[Figure: Faults vs. Days; faults found (0 to 100) plotted against days (0 to 10)]
What is the arrival function f(t)?
Need $t_{max}$ and K to determine f(t).
From the plot, $t_{max} \approx 7$, so
$f(t) = K \cdot (1/7)^2 \cdot t \cdot e^{-t^2/(2 \cdot 7^2)}$
Then what would you do?
Solve for K (you can pick any point; I use
t = 1, defects = 20 for simplicity):
$K = 20 \cdot 49 / e^{-1/98} \approx 990$
You now have an equation:
$f(t) = (990/49)\, t\, e^{-t^2/98} \approx 20.2\, t\, e^{-0.01 t^2}$
Then plot out the equation and use it to
predict arrival rates (and also see how well it
matches the data)
Note: this is an extremely simplistic way to solve for the equation. Using more than 1 point,
or tools, would be a good idea
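The single-point solution can be reproduced in a few lines of Python. This is only a sketch of the simplistic method described here, using the slide's values (t_max = 7, 20 defects at t = 1); the function names are illustrative, and a least-squares fit over all points would be the more robust choice.

```python
import math

def rayleigh_rate(t, K, t_max):
    """Arrival rate f(t) = K * (1/t_max)^2 * t * exp(-t^2 / (2 * t_max^2))."""
    return K * t / t_max**2 * math.exp(-(t**2) / (2 * t_max**2))

def solve_K(t, observed, t_max):
    """Solve f(t) = observed for K using one data point."""
    return observed * t_max**2 / (t * math.exp(-(t**2) / (2 * t_max**2)))

t_max = 7
K = solve_K(1, 20, t_max)  # roughly 990, as on the slide

# Predicted arrivals for days 1..20, to plot against the actual data
predicted = [rayleigh_rate(t, K, t_max) for t in range(1, 21)]
for day, rate in enumerate(predicted, start=1):
    print(f"day {day:2d}: {rate:5.1f}")
```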
Using this data
Remember that K is the expected total
number of faults to be found
You can determine the # of defects found so far by
taking the sum of the points on the chart, which happens to
equal 594.
This chart and analysis says that
You expect ~990 faults
Therefore, you have found ~60% of faults so far
If you wanted to predict when at least 95% had
been found, you could either
Solve for (K - F(t))/K <= .05
Use the equations with an Excel model
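For the first option, the Rayleigh CDF can be inverted in closed form: the remaining fraction is (K - F(t))/K = e^(-t^2/(2 t_max^2)), so t = t_max * sqrt(-2 ln(remaining)). A minimal Python sketch follows; the helper name is hypothetical, and t_max = 7 is taken from the example.

```python
import math

def time_to_fraction(fraction_found, t_max):
    """Time at which the given fraction of total defects has arrived (Rayleigh)."""
    remaining = 1.0 - fraction_found
    return t_max * math.sqrt(-2.0 * math.log(remaining))

t_max = 7
t95 = time_to_fraction(0.95, t_max)

# Sanity check: the remaining fraction at t95 should be 5%
remaining_at_t95 = math.exp(-(t95**2) / (2 * t_max**2))
print(f"t95 = {t95:.2f} days, remaining = {remaining_at_t95:.3f}")
```

Note that K cancels out, so the answer depends only on t_max.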
What did we just do?
We figured out how to predict the total
number of defects and the distribution based
upon the defects found to date, and assuming
a Rayleigh distribution
What would you do if you missed some of the
initial data (for example, no one tracked
defects found in requirements)? Would this
method be useless?
NO. You'd use the maximum,
and then project the faults found
initially as well.
Rayleigh Model Implementation
SPSS (Regression Module)
SAS
SLIM (by Quantitative Software
Management)
STEER (by IBM)
Now let's look at an exponential distribution
$F(t) = 1 - e^{-\lambda t}$
$f(t) = \lambda e^{-\lambda t}$
Or, given K total defects,
$F(t) = K\,(1 - e^{-\lambda t})$
$f(t) = K \lambda e^{-\lambda t}$
Exponential Distributions
[Figure: exponential curve of cumulative failures found vs. time, approaching K, the total number of defects]
Exponential Distributions: what is K?
[Figure: exponential curve of cumulative failures found vs. time; K, the total number of defects, is the level the curve approaches]
Solve equations
Either
Solve for a few points (ok)
Draw in your own K (not so good)
Let Excel figure it out for you with trendlines
(better)
OK, now you try one by hand
Given the following data, what are
K and $\lambda$?
t              1   2   3   4   5   6   7   8   9
defects found  17  16  15  14  14  13  12  12  12
Answer
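Since $f(t) = K \lambda e^{-\lambda t}$, then
$f(a)/f(b) = (K \lambda e^{-\lambda a}) / (K \lambda e^{-\lambda b}) = e^{\lambda (b-a)}$
And $K = f(t) / (\lambda e^{-\lambda t})$
Therefore, selecting a = 1 and b = 5, we have
$f(1)/f(5) = e^{\lambda (5-1)}$
$17/14 = e^{4\lambda}$
$\ln(17/14) = 4\lambda$
$\lambda \approx .048$
And then pick a few points to determine K; try 1
and 5 again:
$K_1 = 17 / (.048 \cdot e^{-.048}) \approx 372$
$K_5 = 14 / (.048 \cdot e^{-.048 \cdot 5}) \approx 371$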
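The same two-point solution can be sketched in Python. The function name is illustrative. Because the code keeps the unrounded lambda, its K comes out a little lower than the hand calculation, which rounds lambda to .048.

```python
import math

t_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
defects_found = [17, 16, 15, 14, 14, 13, 12, 12, 12]

def fit_exponential_two_points(a, f_a, b, f_b):
    """Solve f(t) = K * lam * exp(-lam * t) through two observed points."""
    lam = math.log(f_a / f_b) / (b - a)      # from f(a)/f(b) = exp(lam * (b - a))
    K = f_a / (lam * math.exp(-lam * a))     # from K = f(t) / (lam * exp(-lam * t))
    return lam, K

# Points t = 1 and t = 5, as in the worked answer
lam, K = fit_exponential_two_points(1, defects_found[0], 5, defects_found[4])
print(f"lambda = {lam:.4f}, K = {K:.0f}")

# Compare the fitted arrival rates with the observed data
fitted = [K * lam * math.exp(-lam * t) for t in t_values]
```

Comparing `fitted` with `defects_found` shows how well a two-point fit tracks the remaining observations.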
Using the Rayleigh Model instead of the
exponential distribution
Predicting arrival rates
If you have a projection of the total
number of defects (using static models
or historical data) you can also predict
the arrival rates of defects using the
Rayleigh model
Then use it as a plan to manage against
If there are significant deviations, this
would cause the manager to investigate
and potentially take remedial action
Using The Rayleigh Distribution
This model, given total number of defects
expected, spreads them out over the life-
cycle of the project in a Rayleigh Curve.
Use: to compare projected with actual faults found
to determine project performance
Input is:
Td Total duration of project (to operational
delivery)
Er Total expected # of faults for lifetime of
project
Errors for each time period are
Em = (6 * Er/Td^2)*t*exp(-3t^2/Td^2)
NOTE: this assumes ~95% of faults found
before delivery
Source: Putnam and Myers
Rayleigh Model - example
Given a 26-week project and 1000 expected
faults, then, using the formula:
[Figure: Defects Per Week; defects found (0 to 60) vs. week (0 to 30)]
Try another problem
Suppose you expect to have 100 defects, and you
think that the time it takes to shipment is 10
weeks. You've found 60 defects by week 6.
Are you in good shape or not?
Since the errors you should be finding for
each time period are
Em = (6 * 100/10^2)*t*exp(-3t^2/10^2)
= 6t*exp(-3t^2/100)
You should have found
Total for weeks 1 to 6 = Sum(Em) for m = 1 to 6
= 71.45 (I used a spreadsheet to calculate).
So either your software is better than you
expected, or you are behind in finding defects.
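The spreadsheet calculation can also be sketched in Python, using the Putnam and Myers form Em = (6 * Er/Td^2) * t * exp(-3t^2/Td^2). The function and variable names are illustrative assumptions.

```python
import math

def expected_faults(t, total_faults, duration):
    """Expected faults found in period t for a project of the given duration."""
    return (6.0 * total_faults / duration**2) * t * math.exp(-3.0 * t**2 / duration**2)

total_faults, duration = 100, 10   # Er and Td from the problem
plan = [expected_faults(t, total_faults, duration) for t in range(1, duration + 1)]

expected_by_week_6 = sum(plan[:6])
actual_by_week_6 = 60
shortfall = expected_by_week_6 - actual_by_week_6

print(f"expected by week 6: {expected_by_week_6:.2f}")
print(f"actual by week 6:   {actual_by_week_6}")
print(f"shortfall:          {shortfall:.2f}")
```

Changing `total_faults` and `duration` to 1000 and 26 reproduces the plan curve for the 26-week example.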
Other Tools
If you don't like the calculations, there
are tools (such as SLIM and STEER)
which, given the arrival rate data, will
help you predict the remaining defects
and arrival patterns.
Experiential Data and Recommendations:
Putnam and Myers (1992) found total defects
projected using Rayleigh curves were within 5% to
10% of the actual totals
Others were not as close, but may have had dirty data.
With small projects, there are fewer data
points and, therefore, less confidence.
Using their STEER software tool, IBM Federal
Systems in Gaithersburg, MD estimated latent
defects for 8 projects and compared the estimates with
actual data collected for the first year in the
field: very close.
Some data suggests that m = 1.8 for Weibull curves
may be best
Kan's recommendation: Use as many models as
possible to predict, compare them with each other, track
results, and see what works best.
Dynamic Model Distribution Summary
Formal Parametric Models for projecting
latent software defects when
development is complete and the
project is ready to ship
Encompasses both defect prevention
and early defect removal
Predicting Fault Density and Distribution
Typical Distribution of Faults
Defect Prediction Models
Dynamic
Rayleigh, Exponential, S-Curve Models
Static
COQUALMO Model
Based on Process
RL-TR-92-52
Industry Data
Such as SEI Delivered Fault Data
Local Models Historical Data
COQUALMO by Chulani and Boehm
Defect Analysis Tool from USC
Extension to the COCOMO estimation model
(Software Sizing Model developed by Boehm and
others at USC)
Based on the Defect Insertion/Removal model
Tool/Paper available on our course website
COQUALMO is a model which predicts Delivered
Defect Density (per KLOC or per FP)
[Diagram: Defects In -> Defects Out -> Delivered Defect Density]
Based upon a variety of
factors, which you can
tune based on your own
experience.
COQUALMO Models
2 separate models (Source: COCOMO II)
Defect Introduction
Inputs: software size; software platform, product, personnel, and project attributes
Output: number of non-trivial reqmts, design, and coding defects introduced
Defect Removal
Inputs: the defects introduced; defect removal activities (automated analysis, reviews, testing and tools)
Output: number of defects per KLOC
Linda M. Laird 2004
All Rights Reserved
Input Parameters
For defect introduction, it uses the COCOMO II project
descriptors (size, personnel capability and experience,
platform characteristics, project practices, and product
characteristics such as complexity and required
reliability) to estimate the number of requirements,
design, and code defects introduced into the project.
For defect removal, it uses ratings of a project's level
of use of analysis tools, peer reviews, and execution
testing, to determine what fraction of the introduced
defects are removed. Its estimates to date are consistent
with general project experience and a small number of
detailed project data points.
COQUALMO: More Detail
Quantitative model for defect introduction and
removal
Acronym for Constructive Quality Model
Chulani and Boehm at USC, 1999
Consistent with COCOMO model by Boehm
Current data is from the COCOMO clients and expert
opinion
Needs additional data from more projects to tune the model
Defects Introduced (DI) = $\sum_{j=1}^{3} A_j \cdot (Size)^{B_j} \cdot QAF_j$
Where j indexes the three artifact types (rqmts, design, coding)
$A_j$ is the multiplicative constant (for rqmts, design,
coding)
$B_j$ is initially set to 1 and accounts for economies of scale
$QAF_j$ is the quality factor that takes into account 21 defect
introduction factors (Platform, Product, Personnel, and
Project)
DI equation in English
What does that equation say?
That the number of defects introduced is the sum
of the number of defects introduced in each of
requirements, design, and coding
The number of defects introduced in a given
phase = A * (size) ^ B * QAF where
A is based upon which phase
B is based upon size
QAF is based upon the quality of the process, platform,
etc.
Example (Using dummy data):
Assume that you calculated the QAF for each phase,
that you have the following values, and that the model
has given you the values for A as shown:
Phase    QAF   A
Rqmts    1.2   10
Design   1     20
Coding   0.5   30
This says that the Defects Introduced by phase would be:
Phase    QAF   A    DI
Rqmts    1.2   10   12
Design   1     20   20
Coding   0.5   30   15
Note that the QAFs imply a
requirements activity worse than
average and a coding activity
better than average
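The dummy-data example can be sketched in Python. The DI column equals A * QAF, which corresponds to the DI equation with Size = 1 KSLOC and B = 1; both of those values are assumptions made here to match the table, and the function name is illustrative.

```python
def defects_introduced(size_ksloc, phases, B=1.0):
    """DI per phase = A * size^B * QAF; phases maps name -> (QAF, A)."""
    return {name: A * size_ksloc**B * qaf for name, (qaf, A) in phases.items()}

# Dummy data from the slide: phase -> (QAF, A)
phases = {
    "Rqmts": (1.2, 10),
    "Design": (1.0, 20),
    "Coding": (0.5, 30),
}

# Assumed: 1 KSLOC and B = 1, so DI reduces to A * QAF
di = defects_introduced(1.0, phases)
total_di = sum(di.values())

for name, value in di.items():
    print(f"{name:7s} {value:5.1f}")
print(f"Total   {total_di:5.1f}")
```

Passing a larger `size_ksloc` scales every phase by the same factor when B = 1.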
QAF: Quality Adjustment Factor
The QAF is a factor which is the product
of 21 defect introduction drivers such
as analyst capability, programmer
capability, required reliability of the
system, etc.
Defects Introduced
Nominal values, per KSLOC are:
DI(requirements) = 10;
DI (design) = 20
DI (coding) = 30
DI(total) = 60
E.g., for every 1K lines of code, the model
predicts that, assuming a nominal situation, there
would typically be 60 defects injected into the code,
10 of which were requirements defects, 20 design
defects, and 30 coding defects.
Process Maturity had the highest impact on defect
introduction: with everything else held constant, it
varies the result by a factor of 2.5, which says that if you
have a very good process, you significantly reduce
the number of defects introduced
COQUALMO Defect Removal
Initial values determined by experts using a two-round
Delphi technique
Looked at three different removal techniques:
Automated Analysis, People Reviews, Execution
Testing and Tools
Rated % DRE (defect removal efficiency) for 6 levels of
effectiveness of each technique for each phase (rqmts,
design, coding)
Computed residual defects:
If all techniques are at Very Low effectiveness: 60 defects
per KSLOC
If all techniques are at Extra High effectiveness: 1.57 defects
per KSLOC
If all techniques are at Nominal: 14.3 defects per KSLOC
Summary on COQUALMO model
Mathematical model which takes as input
Your view of your defect injection drivers
Your view of your defect removal drivers
Gives you a projection of # of defects
remaining in your system at any phase
Can be used to estimate impact of driver
changes on defect density
what if analysis
improvement investment analysis
Other Similar Models available
RL-TR-92-52
Seems to be a primary reference and model
for both defect density and fault density
Could not obtain a copy of the report; I believe it is
very similar to COQUALMO
Key Fault Parameters for predicting defect
density are:
Application Type & Difficulty: 2 to 14
Development Organization: .5 to 2
Software Complexity: .8 to 1.5
Compliance with Design Rules: .75 to 1.5
Note: <1 takes out defects, >1 adds them
Predicting Fault Density and Distribution
Typical Distribution of Faults
Defect Prediction Models
Dynamic
Rayleigh, Exponential, S-Curve Models
Static
COQUALMO Model
Based on Process
RL-TR-92-52
Industry Data
Such as SEI Delivered Fault Data
Local Models Historical Data
SEI Defect Removal
Cumulative % of defects removed thru
acceptance test:
SEI Level 2: 25.5%
SEI Level 3: 41.5%
SEI Level 4: 62.3%
SEI Level 5: 87.3%
Diaz & King,
2002 (in Kan)
Industry data
CMM Approach
Measure Average defects/
function points
Typical defect potential and delivered defects
for SEI CMM Level 1
5.0 potential
.75 delivered
Typical defect potential and delivered defects
for SEI CMM Level 2
4.0 potential
.44 delivered
Typical defect potential and delivered defects
for SEI CMM Level 3
3.0 potential
.27 delivered
Typical defect potential and delivered defects
for SEI CMM Level 4
2.0 potential
.14 delivered
Typical defect potential and delivered defects
for SEI CMM Level 5
1.0 potential
.05 delivered
Source:
Capers Jones, 1995
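The potential and delivered figures imply a removal efficiency for each level, computed as 1 - delivered/potential. A minimal Python sketch follows; the dictionary layout and the 1000-function-point project size are illustrative assumptions, not part of the source data.

```python
# (defect potential, delivered defects) per function point, by SEI CMM level,
# as listed on the slide (Capers Jones, 1995)
cmm_defects_per_fp = {
    1: (5.0, 0.75),
    2: (4.0, 0.44),
    3: (3.0, 0.27),
    4: (2.0, 0.14),
    5: (1.0, 0.05),
}

def removal_efficiency(potential, delivered):
    """Fraction of potential defects removed before delivery."""
    return 1.0 - delivered / potential

efficiency = {lvl: removal_efficiency(p, d) for lvl, (p, d) in cmm_defects_per_fp.items()}

# Hypothetical project size, used to turn the rates into defect counts
project_fp = 1000
delivered_estimate = {lvl: d * project_fp for lvl, (p, d) in cmm_defects_per_fp.items()}

for lvl in sorted(efficiency):
    print(f"Level {lvl}: removal {efficiency[lvl]:.0%}, "
          f"~{delivered_estimate[lvl]:.0f} delivered defects in {project_fp} FP")
```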
Industry Data
Industry Approach
Measure Average defects/ function
points
Delivered defects per industry:
System Software - .4
Commercial Software - .5
Information Software - 1.2
Military Software - .3
Overall average - .65
Source:
Capers Jones, 1995
Defect Data By Application Domain - Reifer
Application Domain     Number of  Error Range       Normative Error Rate  Notes
                       Projects   (Errors/KESLOC)   (Errors/KESLOC)
Automation             55         2 to 8            5                     Factory automation
Banking                30         3 to 10           6                     Loan processing, ATM
Command & Control      45         0.5 to 5          1                     Command centers
Data Processing        35         2 to 14           8                     DB-intensive systems
Environment/Tools      75         5 to 12           8                     CASE, compilers, etc.
Military - All         125        0.2 to 3          < 1.0                 See subcategories
  Airborne             40         0.2 to 1.3        0.5                   Embedded sensors
  Ground               52         0.5 to 4          0.8                   Combat center
  Missile              15         0.3 to 1.5        0.5                   GNC system
  Space                18         0.2 to 0.8        0.4                   Attitude control system
Scientific             35         0.9 to 5          2                     Seismic processing
Telecom                50         3 to 12           6                     Digital switches
Test                   35         3 to 15           7                     Test equipment, devices
Trainers/Simulations   25         2 to 11           6                     Virtual reality simulator
Web Business           65         4 to 18           11                    Client/server sites
Other                  25         2 to 15           7                     All others
Domain Data Comments
Defect rates in military systems are much
smaller due to the safety requirements
Defect rates after delivery tend to be cyclical
with each version released. They initially are
high, and then stabilize around 1 to 2 defects
per KLOC in systems with longer lifecycles (>
5 years). Web Business systems tend to
have shorter lifecycles (<=2 years) and may
never hit the stabilization point.
Local History
Simplest: Defect Densities and Defect
Removal Efficiencies from other projects
Remember from 533 what DRE is -- the %
of defects removed in each development
phase
Prediction Model 2nd Step
Now at 2nd step -- predicting failure rate from defects
[Diagram: Fault Profile & Defect Density -> Initial Failure Rate -> Delivered and On-going Failure Rate]
Predicting Failure Rate from Fault Density
Issues
Musa Model
Using Past Projects Data
Issue is that failures are a function of
Faults
Environment
System Usage & Mix
If you can make
these the same as your
target environment,
then the
projection should
work out better
Typically you can't have those similar to the
operational environment until
operational testing, so before that time,
use empirical data.
The Musa Prediction Method
For predicting failure rate given a fault
density -- developed for predicting
expected failure rate in system test
Caveat: This method seems like magic
to me. But it does have an empirical
basis.
Musa Model Underlying Concepts
Each fault is embodied in machine instructions
There is a probability that the faulty machine
instructions will cause a failure
Therefore, if you know the number of faults
remaining, the number of machine
instructions for the program, the speed of the
machine, and the probability, you can predict
the arrival rate of failures.
Musa Prediction Model I/O
Input: Fault Density, Size of Program,
Processor Speed and
A probability that a given faulty line of code
will cause a failure when it is
executed, e.g., a ratio of failures to faults;
can either be from past history, or can
use the default (4.2*10^-7)
Output: Expected Failure Rate
Musa Model for Failure Rate
Let w = number of faults
Let I = number of object code instructions
Let r = processor speed in instructions per sec
Let L = expected failure rate (i.e., lambda)
K=magic constant = 4.2*10^-7 --- the
probability that a given faulty line of code will
cause a failure when it is executed
Then L = r*K*w/I
The Key is obviously K
And where does it come from?
If you have other similar programs/projects,
generate it from those (K = L*I/(r*w))
Interestingly, Musa's data across
multiple projects has K varying only
slightly, with a range of 1*10^-7 to
7.5*10^-7
Example: Musa Model
Let w = number of faults
Let I = number of object code instructions
Let r = processor speed in instructions per sec
Let L = expected failure rate (i.e., lambda)
K = magic constant = 4.2*10^-7 failures per fault
Then L = r*K*w/I
Assume a 100 MIPS machine; 5 defects
per KLOC, 100K source lines, C++
Then what is the expected number of failures per
execution second?
Class Example: Musa Model
Let w = number of faults
Let I = number of object code instructions
Let r = processor speed in instructions per sec
K = magic constant = 4.2*10^-7 failures per fault
Then L = r*K*w/I
Assume a 100 MIPS machine; 5 defects per KLOC,
100K source lines, C++
Then w = 5*100 = 500 total faults
I = 100K*6 (from table in Rome Notebook) = 600K
object code instructions
L = (100*10^6)*(4.2*10^-7)*500/(6*10^5)
= (10^8*10^-7*10^2*4.2*5)/(10^5*6) = 10^-2*3.5 = .035
= .035 failures per execution sec
= 2.1 failures per execution minute
So this says that the initial failure rate is estimated to be 2.1
failures per EXECUTION minute.
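The class example can be checked with a short Python sketch of L = r*K*w/I, together with the inverse used to calibrate K from a past project. The function names are illustrative. The expansion ratio of 6 object instructions per C++ source line is the value quoted from the Rome Notebook table.

```python
def musa_failure_rate(faults, instructions, instr_per_sec, K=4.2e-7):
    """Expected failures per execution second: L = r * K * w / I."""
    return instr_per_sec * K * faults / instructions

def calibrate_K(failure_rate, faults, instructions, instr_per_sec):
    """Back out K from a past project's observed rate: K = L * I / (r * w)."""
    return failure_rate * instructions / (instr_per_sec * faults)

ksloc = 100                      # 100K source lines
w = 5 * ksloc                    # 5 defects per KLOC -> 500 faults
I = ksloc * 1000 * 6             # 6 object instructions per source line -> 600K
r = 100e6                        # 100 MIPS

L = musa_failure_rate(w, I, r)
per_minute = L * 60

print(f"{L:.3f} failures per execution second")
print(f"{per_minute:.1f} failures per execution minute")
```

Calling `calibrate_K(L, w, I, r)` returns the default 4.2e-7, which is the round trip you would use with measured failure rates from similar past projects.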
Musa Model Summary
The theory behind Musa's model is that
the faults are embedded in the code,
and that the probability of the faults
becoming failures is dependent upon
the fault density, and the frequency of
the code being executed.
Kan's Empirical Data
For system platforms to have 99.9+%
availability, the defect level has to be
<= .01 defects per KLOC in the field
Projecting Reliability Summary
Prediction vs. Estimation
Model Overview
Predicting Defect Densities
Predicting Failures From Defect Densities
Estimating Reliability from Testing
Execution Time vs. Calendar Time
Estimating Failure Models
Reliability Growth
Reliability Estimation
Tools
Homework
For the Rayleigh curve example, solve for t such that
95% of the defects have been found.
Play with Coqualmo so you can actually use it (on
website in tools)
Understand parameters (may need to look up COCOMO
model to understand them)
Read articles on website
Do project on website.