5 • 2003
Bogdan Stępień
SOFTWARE DEVELOPMENT
COST ESTIMATION METHODS
AND RESEARCH TRENDS
Early estimation of project size and completion time is essential for successful project
planning and tracking. Multiple methods have been proposed to estimate software size
and cost parameters. The suitability of an estimation method depends on many factors,
such as the software application domain, product complexity, availability of historical
data, team expertise, etc. The most common and widely used estimation techniques are
described and analyzed. Current research trends in software cost estimation are also
presented.
Keywords: software project effort, size estimation, software cost
METODY ESTYMACJI KOSZTÓW PRODUKCJI OPROGRAMOWANIA
Early estimation of project size and completion time is crucial for effective project
planning and progress tracking. Many methods have been developed to address the
problem of estimating software size and production cost. The usefulness of the various
estimation methods depends on many factors, such as the software application domain,
product complexity, availability of historical data, team experience, etc. This article
presents and analyzes the most commonly used estimation techniques, as well as the
latest research directions.
Keywords: software production cost, size estimation
1. Introduction
The emphasis on software cost estimation has been increasing gradually over the last
three decades. Today it is especially strong and visible, since cost estimation provides
the link between the general concepts of economic analysis and the world of software
engineering. Software cost estimation techniques are also an essential part of the
foundation for good software management.
There are several approaches used by models to estimate software development
cost (the effort to produce the software). Some are based on analogy, some on theory,
and others on statistics, but all of them consider the size of the software product to be
the most influential factor in predicting effort. Other factors that also affect effort
include product complexity, the experience of the development team, development
tool support, project coordination complexity, and the maturity of the technology in
which the software product is to be produced.
The aim of this article is to provide an overview of some software size and cost
estimation techniques and to define their strengths and weaknesses. At the end of
the article, some of the areas of current and future research in software estimation
techniques will also be described.
Table 1
Strengths and weaknesses of the LOC method

Strengths:
– Simple metric, directly related to the size
– Suitable for estimation using the Wideband Delphi process

Weaknesses:
– Gives an indication of the construction methods used
– LOC cannot be estimated reliably in the early phases of the development cycle
  unless data are available from similar, completed projects
– Counting must always follow the same rules defining the LOC measure and must be
  independent of factors like coding style
Lines of Code (LOC) counting is the simplest way to estimate the size of a software
product. It provides a simple and well-understood metric. Usually LOC estimation
is performed by software developers experienced in similar projects, and the method
is suitable for the Wideband Delphi process (described in Section 3.2.1). The experts
analyze the work packages and, based on their own experience, derive the LOC estimate
needed to fulfill the requirements of each work package.
Defining a line of code is difficult (see Tab. 1) due to conceptual differences involved
in accounting for executable statements and data declarations in different programming
languages. The goal is to measure the amount of intellectual work put into program
development, but difficulties arise when trying to define consistent measures across
different languages. To minimize these problems, the Software Engineering Institute
(SEI) definition checklist for a logical source statement is used in defining the line
of code measure. Pragmatically, there seems to be no real reason to choose one
definition over another, as long as the same definition is used consistently.
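The point about consistent counting rules can be sketched in code. The rule set below (skip blank and comment-only lines) is illustrative only; it is not the SEI checklist, and a real counter would have to handle each language's statement syntax.

```python
# Minimal LOC counter: counts logical source lines under one fixed,
# illustrative rule set (non-blank, non-comment lines). The point is not
# the rules themselves, but applying the same rules to everything counted.

def count_loc(source: str) -> int:
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:              # skip blank lines
            continue
        if stripped.startswith("#"):  # skip comment-only lines
            continue
        count += 1
    return count

sample = """
# configuration parser
import json

def load(path):
    with open(path) as f:
        return json.load(f)
"""
print(count_loc(sample))  # 4
```

Whatever rules are chosen, comparing LOC counts across projects is only meaningful if every project was counted with the same rules.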
• External Input (Inputs) – Count each unique user data or user control input
type that (i) enters the external boundary of the software system being measured
and (ii) adds or changes data in a logical internal file.
• External Output (Outputs) – Count each unique user data or control output
type that leaves the external boundary of the software system being measured.
• Internal Logical File (Files) – Count each major logical group of user data or
control information in the software system as a logical internal file type. Include
each logical file (e.g., each logical group of data) that is generated, used, or
maintained by the software system.
• External Interface Files (Interfaces) – Files passed or shared between software
  systems should be counted as external interface file types within each system.
• External Inquiry (Queries) – Count each unique input-output combination, where
  an input causes and generates an immediate output, as an external inquiry type.
Each instance of these function types is then classified by complexity level. The
complexity levels determine a set of weights, which are applied to their corresponding
function counts to determine the Unadjusted Function Points quantity (see Fig. 1).
This is the Function Point sizing metric used as input by the COCOMO II estimation
model (described in Section 3.4.1).
For EO and EQ:

Record           Data Elements
Elements      1–5       6–19      20+
0–1           low       low       avg
2–3           low       avg       high
4+            avg       high      high

For EI:

Record           Data Elements
Elements      1–4       5–15      16+
0–1           low       low       avg
2–3           low       avg       high
4+            avg       high      high
Step 3. Apply complexity weights. Weight the number in each cell using the
following scheme. The weights reflect the relative value of the function
to the user.

                                Complexity Weight
Function Type               Low    Average    High
Internal Logical Files       7       10        15
External Interface Files     5        7        10
External Inputs              3        4         6
External Outputs             4        5         7
External Inquiries           3        4         6
Step 4. Compute Unadjusted Function Points. Add all the weighted function
counts to get one number, the Unadjusted Function Points.
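Steps 3 and 4 can be sketched directly. The weight table is the one given above; the project counts below are invented for illustration.

```python
# Unadjusted Function Points from classified function counts, using the
# complexity weights of the table above. counts[ftype][level] holds the
# number of instances of each function type at each complexity level.

WEIGHTS = {
    "ILF": {"low": 7, "avg": 10, "high": 15},  # Internal Logical Files
    "EIF": {"low": 5, "avg": 7,  "high": 10},  # External Interface Files
    "EI":  {"low": 3, "avg": 4,  "high": 6},   # External Inputs
    "EO":  {"low": 4, "avg": 5,  "high": 7},   # External Outputs
    "EQ":  {"low": 3, "avg": 4,  "high": 6},   # External Inquiries
}

def unadjusted_fp(counts: dict) -> int:
    return sum(WEIGHTS[ftype][level] * n
               for ftype, levels in counts.items()
               for level, n in levels.items())

counts = {"ILF": {"low": 2}, "EI": {"low": 5, "avg": 3}, "EO": {"avg": 4}}
print(unadjusted_fp(counts))  # 2*7 + 5*3 + 3*4 + 4*5 = 61
```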
The standard Function Point procedure involves assessing the degree of influence
(DI) of fourteen application characteristics on the software project, determined
according to a rating scale from 0.00 to 0.05 for each characteristic. The sum of the
14 ratings is added to a base level of 0.65 to produce a general characteristics
adjustment factor that ranges from 0.65 to 1.35. Each of these fourteen characteristics,
such as distributed functions, performance, and reusability, thus contributes at most
5% to the estimated effort.
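The adjustment described above can be written as a two-line computation. The fourteen ratings below are invented; only the 0.00–0.05 range per characteristic and the 0.65 base come from the procedure.

```python
# Value adjustment from the fourteen general characteristics: each rating
# contributes 0.00-0.05, the sum is added to the 0.65 base, and the result
# scales the Unadjusted Function Points. The ratings here are invented.

def adjusted_fp(ufp: float, ratings: list) -> float:
    assert len(ratings) == 14
    assert all(0.0 <= r <= 0.05 for r in ratings)
    vaf = 0.65 + sum(ratings)   # ranges from 0.65 to 1.35
    return ufp * vaf

ratings = [0.05, 0.03, 0.0, 0.02, 0.05, 0.0, 0.01,
           0.04, 0.0, 0.02, 0.03, 0.0, 0.05, 0.05]
print(round(adjusted_fp(100, ratings), 2))  # 100 * (0.65 + 0.35) = 100.0
```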
Table 2
Strengths and weaknesses of the function points

Strengths:
– Not dependent on the construction method
– Can be used early in the project life cycle
– A normalized size

Weaknesses:
– Not a technology-independent measure of size
– The calculation of function points is complex and tends to take a black-box
  view of the system
– Suitability varies for different classes of software systems
Albrecht’s original work [1] has grown and mutated over the years – function-point
counting now has its own standards group, the International Function Point Users
Group (IFPUG). IFPUG offers classes on function-point counting and reference manuals
with all of the rules. A function-point counting spreadsheet and other resources are
available from [13]. Strengths and weaknesses of the method are summarized in
Table 2.
The Use-case Point counting procedure starts with determining, for each actor,
whether it is simple, average, or complex. You count how many actors of each kind you
have and multiply each count by its weighting factor. Adding these products gives the
total unadjusted actor weights (UAW). Then, for each use case, you determine whether
it is simple (three or fewer transactions), average (four to seven transactions), or
complex (eight or more transactions) by counting its transactions, including secondary
scenarios. Each use-case count is multiplied by its weighting factor, and adding these
products gives the unadjusted use-case weights (UUCW). The sum of the UAW and
the UUCW gives the unadjusted use-case points (UUCP):

    UUCP = UAW + UUCW
The Use-case Points method employs a technical and environmental factors multiplier
that attempts to quantify areas such as ease of use and programmer motivation.
These factors, when multiplied by the unadjusted use-case points, produce the adjusted
use-case points, an estimate of the size of the software.
To estimate effort, Karner proposed a factor of 20 staff hours per use-case point,
although many other factors can affect such a rate, including time pressure, uniqueness
of the architectural solution, and programming language.
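The counting procedure above can be sketched as follows. The actor weights (1/2/3) and use-case weights (5/10/15) are the commonly quoted values, not given in the text, and the project data is invented; Karner's 20 staff hours per point is the rate mentioned above.

```python
# Use-case point sizing: weighted actor and use-case counts, optionally
# scaled by a technical/environmental factor. The weights below are the
# commonly quoted ones (an assumption); the counts are invented.

ACTOR_W = {"simple": 1, "average": 2, "complex": 3}
UC_W = {"simple": 5, "average": 10, "complex": 15}

def use_case_points(actors: dict, use_cases: dict,
                    tech_env_factor: float = 1.0) -> float:
    uaw = sum(ACTOR_W[k] * n for k, n in actors.items())
    uucw = sum(UC_W[k] * n for k, n in use_cases.items())
    uucp = uaw + uucw
    return uucp * tech_env_factor   # adjusted use-case points

ucp = use_case_points({"simple": 2, "complex": 1},
                      {"simple": 4, "average": 3})
print(ucp)       # (2*1 + 1*3) + (4*5 + 3*10) = 55.0
print(ucp * 20)  # effort in staff hours at Karner's rate: 1100.0
```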
• NOP: New Object Points (Object Point count adjusted for reuse);
• srvr: number of server (mainframe or equivalent) data tables used in conjunction
  with the SCREEN or REPORT;
• clnt: number of client (personal workstation) data tables used in conjunction
  with the SCREEN or REPORT;
• %reuse: the percentage of screens, reports, and 3GL modules reused from previous
  applications, pro-rated by degree of reuse.
For Screens:

Number of               # and source of data tables
views contained   Total < 4              Total < 8              Total 8+
                  (< 2 srvr, < 3 clnt)   (2–3 srvr, 3–5 clnt)   (> 3 srvr, > 5 clnt)
< 3               simple                 simple                 medium
3–7               simple                 medium                 difficult
8+                medium                 difficult              difficult

For Reports:

Number of               # and source of data tables
views contained   Total < 4              Total < 8              Total 8+
                  (< 2 srvr, < 3 clnt)   (2–3 srvr, 3–5 clnt)   (> 3 srvr, > 5 clnt)
0 or 1            simple                 simple                 medium
2 or 3            simple                 medium                 difficult
4+                medium                 difficult              difficult
Step 3. Weigh the number in each cell using the following scheme. The weights
reflect the relative effort required to implement an instance of that
complexity level:

                        Complexity Weight
Object Type       Simple    Medium    Difficult
Screen              1         2           3
Report              2         5           8
3GL Component       –         –          10

Step 4. Determine Object Points: add all the weighted object instances to get
one number, the Object-Point count.
Step 5. Estimate the percentage of reuse you expect to be achieved in this
project. Compute the New Object Points to be developed:
NOP = (Object Points) × (100 − %reuse) / 100.
Step 6. Determine a productivity rate, PROD = NOP / person-month, from
the following scheme:
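Steps 3–6 can be sketched end to end. The weights are those of the Step 3 table; the counts are invented, and since the productivity-rate scheme is not reproduced here, PROD = 13 is an assumed placeholder value.

```python
# Object-point sizing following steps 3-6: weight the classified object
# instances, adjust for reuse, then divide by a productivity rate. The
# productivity rate used here is an assumed placeholder.

WEIGHTS = {"screen": {"simple": 1, "medium": 2, "difficult": 3},
           "report": {"simple": 2, "medium": 5, "difficult": 8},
           "3gl":    {"difficult": 10}}

def object_point_effort(counts: dict, pct_reuse: float, prod: float) -> float:
    op = sum(WEIGHTS[t][lvl] * n
             for t, levels in counts.items()
             for lvl, n in levels.items())
    nop = op * (100 - pct_reuse) / 100   # New Object Points
    return nop / prod                    # effort in person-months

counts = {"screen": {"simple": 6, "medium": 2},
          "report": {"medium": 3}, "3gl": {"difficult": 1}}
print(round(object_point_effort(counts, pct_reuse=20, prod=13), 2))  # 2.15
```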
Putnam used productivity to link the basic Rayleigh manpower distribution model
to software size and technology factors. Productivity has been defined as the size
of the software product, S, divided by the development effort, E:

    P = S / E                                        (2)
To find E in the Rayleigh model, Putnam made the assumption that the peak staffing
level (the top of the curve) corresponded to the development time. With this
assumption, the area under the curve up to that point represented the development
effort, E. E was found to be approximately 40% of K, the total life-cycle effort,
which is the total area under the curve.
Putnam observed from project data that the more productive projects had an
initial slower staff buildup and the less productive projects had an initial faster staff
buildup. He associated the initial staff buildup of a project with the difficulty of
the project, D. The difficulty is represented on the Rayleigh curve as the slope of
the curve at time t = 0. By taking the derivative of the Rayleigh equation and
setting t = 0, difficulty is defined as:

    D = K / t_d^2                                    (3)
Putnam links the Rayleigh manpower distribution and software development effort.
He assumes that there must be a relation between difficulty, D, and productivity, P,
and finds this relationship to be:

    P = α D^(-2/3)                                   (4)
By combining the equations (2), (3), (4) and the assumption that E = 0.4K, we get
the cube root of total life-cycle effort K:
    S / (0.4 K) = α (K / t_d^2)^(-2/3)               (5)

    S = 0.4 α K^(1/3) t_d^(4/3)                      (6)

    K^(1/3) = S / (0.4 α t_d^(4/3))                  (7)
Equation (8) introduces a technology factor, C, which is the product of 0.4 and
α. The technology factor accounts for differences among projects such as hardware
constraints, personnel experience, and programming environment. Putnam suggests
using 20 different values for C, ranging from 610 to 57,314.
    K = S^3 / (C^3 t_d^4)                            (8)
Development effort, E, is found by substituting E = 0.4K:
    E = 0.4 S^3 / (C^3 t_d^4)                        (9)
Some Rayleigh curve assumptions do not always hold in practice (e.g. flat staffing
curves for incremental development; less than t_d^4 effort savings for long schedule
stretch-outs). Putnam has developed several model adjustments for these situations.
It can be seen from Equation (9) that the effort E increases as the third power of
the size S if the schedule remains constant. For a fixed program size, the effort E
varies with the inverse of the fourth power of t_d. The optimum development schedule
can be calculated from Equation (10), and it agrees with most statistical models used
in practice today.
    t_d = 2.4 E^(1/3)                                (10)
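The Putnam equations can be exercised numerically. The input values below are illustrative only; the technology factor C = 5000 merely lies inside the 610–57,314 range that Putnam tabulates.

```python
# Putnam's software equation in code: given size S, technology factor C,
# and development time td, compute total life-cycle effort K (eq. 8) and
# development effort E (eq. 9). Input values are illustrative.

def putnam_effort(S: float, C: float, td: float):
    K = (S ** 3) / (C ** 3 * td ** 4)   # total life-cycle effort
    E = 0.4 * K                         # development effort
    return K, E

S = 100_000    # size in LOC
C = 5000       # technology factor (Putnam tabulates 610 to 57,314)
td = 2.0       # development time in years
K, E = putnam_effort(S, C, td)
print(round(K, 1), round(E, 1))  # 500.0 200.0 (person-years)
```

Note how strongly the schedule drives the result: halving td here would multiply the effort by sixteen, in line with the t_d^4 term.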
Strengths and weaknesses of the theoretical models in general are summarized in
Table 3.
Table 3
Strengths and weaknesses of the theoretical models

Strengths:
– Objective, repeatable, analyzable formula
– Efficient, good sensitivity analysis
– Objectively calibrated to experience

Weaknesses:
– Subjective inputs
– Limited assessment of exceptional circumstances
– Calibrated to the past, not the future
Table 4
Strengths and weaknesses of the expertise-based techniques

Strengths:
– Easily incorporates knowledge of differences between past project experiences
– Handles assessment of exceptional circumstances and interactions, and is
  representative

Weaknesses:
– The output is no better than the experts producing it
– Estimates can be biased due to incomplete experience or human nature
– Subjective estimates that may not be analyzable
Participants’ estimates are collected by a central coordinator and sent back to the
respondents in a synthesized form. Then the process is repeated. The aim of each
iteration is to gradually produce a consensus amongst the group, or alternatively
for responses to become stable, since there is no guarantee that a consensus will
result, and a range of opinions or responses may be produced instead of a single
answer.
Farquhar performed an experiment at the Rand Corporation in 1970, in which he gave
four groups the same software specification and asked them to estimate the effort
needed to develop the product [9]. Two groups used the Delphi technique and two
groups had meetings. The groups that had meetings came up with a considerably more
accurate estimate than the groups that used the Delphi technique. To improve the
estimate consensus obtained by the Delphi technique, Boehm and Farquhar formulated
an alternative method, the wideband Delphi technique [4].
The wideband Delphi approach can be described with the following steps:
1. The coordinator provides the Delphi instrument to each of the participants for
   review.
2. The coordinator conducts a group meeting to discuss related issues.
3. Participants complete the Delphi forms anonymously and return them to the
   coordinator.
4. The coordinator feeds back the results of the participants’ responses.
5. The coordinator conducts another group meeting to discuss variances in the
   participants’ responses and to reach a possible consensus.
6. The coordinator asks participants for re-estimates, again anonymously, and
   steps 4–6 are repeated as many times as appropriate.
This technique of software estimating involves breaking the product to be developed
down into smaller and smaller components until each component can be independently
estimated. The estimates can be based on analogy with an existing database of
completed components, produced by experts, or obtained using the Delphi technique
described above. Once all the components have been estimated, a project-level
estimate can be derived by rolling up the estimates.
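The roll-up in the last step is a simple recursive sum over the component tree. The tree below, with its component names and person-month figures, is invented for illustration.

```python
# Bottom-up roll-up over a work breakdown structure: leaf components carry
# independently obtained estimates (person-months), inner nodes sum their
# children. The tree and the numbers are invented.

def rollup(node) -> float:
    if isinstance(node, (int, float)):   # leaf: an estimated component
        return float(node)
    return sum(rollup(child) for child in node.values())

wbs = {
    "subcomponent B1": {"parser": 3.0, "validator": 2.5},
    "subcomponent B2": {"ui": 4.0, "storage": {"schema": 1.0, "dao": 2.0}},
}
print(rollup(wbs))  # 12.5
```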
As discussed in [4], a software Work Breakdown Structure (Fig. 4, 5) consists of
two hierarchies, one representing the software product itself, and the other
representing the activities needed to build that product. The product hierarchy
describes the fundamental structure of the software, showing how the various software
components fit into the overall system. The activity hierarchy indicates the
activities that may be associated with a given software component.
[Figs 4, 5: example work breakdown structures – an activity hierarchy rooted in
Development Activities, and a product hierarchy rooted in Software Application with
subcomponents B1 and B2]
[Figure: neural network estimation model – data inputs (project size, complexity,
languages, skill levels) feed the estimation algorithm, which produces an effort
estimate as model output; actuals drive the training algorithm]
Table 5
Strengths and weaknesses of the neural networks

Strengths:
– Accuracy compares favorably with other methods
– The method is objective and repeatable
– Can be applied when only partial information about the project is available

Weaknesses:
– Requires large training sets in order to give good predictions
– Accuracy is sensitive to decisions regarding the net topology
– Little explanation value: such models do not help us understand the needed
  software effort
Neural networks operate as “black boxes” and do not provide any information or
reasoning about how the outputs are derived. And since software data is not
well-behaved, it is hard to know whether the well-known relationships between
parameters are satisfied by the neural network or not. For example, both theory and
other data sources agree that if you are developing a software product for future
reuse, more effort is required to make the components less dependent on other
components.
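As a deliberately tiny illustration of the train-on-actuals loop described above (inputs such as size and complexity, a training algorithm driven by actuals), here is a single linear neuron fitted by gradient descent on invented data. Real effort-estimation networks use hidden layers and far larger training sets; this sketch only shows the mechanics.

```python
# A single linear neuron trained by stochastic gradient descent on
# invented (normalized size, complexity) -> effort pairs. This is the
# degenerate case of a neural network; it shows the training loop only.

def train(data, epochs=2000, lr=0.01):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = w[0] * x[0] + w[1] * x[1] + b
            err = pred - y
            w[0] -= lr * err * x[0]   # gradient step per weight
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

# (normalized size, complexity) -> effort, invented "actuals"
data = [((0.1, 0.2), 1.0), ((0.5, 0.4), 3.0),
        ((0.9, 0.8), 6.0), ((0.3, 0.9), 3.5)]
w, b = train(data)
mse = sum((w[0] * x[0] + w[1] * x[1] + b - y) ** 2
          for x, y in data) / len(data)
print(round(mse, 3))   # small residual error after training
```

Even in this toy form, the black-box criticism is visible: the fitted weights say nothing about why a given project should cost what the model predicts.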
Table 6
Strengths and weaknesses of the analogy estimation

Strengths:
– Based on representative experience
– High accuracy in the case of very similar projects

Weaknesses:
– Historical data and experience may not be representative
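One simple form of analogy estimation is a nearest-neighbor lookup over a base of completed projects. The project base, feature encoding, and numbers below are invented; real analogy tools also adjust the retrieved effort for differences between the projects.

```python
# Estimation by analogy as nearest-neighbor lookup: find the completed
# project most similar to the new one in normalized feature space and take
# its actual effort as the estimate. The project base is invented.

def estimate_by_analogy(history, features):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = min(history, key=lambda p: dist(p["features"], features))
    return nearest["effort"], nearest["name"]

history = [
    {"name": "billing v1", "features": (0.8, 0.3, 0.5), "effort": 14.0},
    {"name": "portal",     "features": (0.2, 0.9, 0.4), "effort": 9.0},
    {"name": "billing v2", "features": (0.7, 0.4, 0.6), "effort": 16.0},
]
effort, source = estimate_by_analogy(history, (0.78, 0.33, 0.52))
print(source, effort)  # billing v1 14.0
```

The weakness listed in Table 6 shows up directly: if no project in the base resembles the new one, the nearest neighbor is still returned, however poor the analogy.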
3.4.1. COCOMO II
The COCOMO II effort estimation model is based on regression. It consists of three
sub-models, each aiming to offer increased fidelity the further along one is in the
project planning and design process (Tab. 7).
Table 7
Strengths and weaknesses of the COCOMO II estimation

Strengths:
– Objective and not influenced by politics
– Repeatable, versatile, and initially calibrated

Weaknesses:
– Size-dependent estimation method
– Needs to be calibrated to achieve better predictability
The original COCOMO (Constructive Cost Model) model was first published in [4] and
reflected the software development practices of the day. In the last two decades,
software development techniques have changed dramatically: software components became
reusable, and new systems can be built using commercial off-the-shelf software. That
is why the authors formulated a new version of the model, called COCOMO II, which
provides the following three sub-models for estimating software project cost:
1. Application Composition model involves prototyping efforts to resolve poten-
tial high-risk issues such as user interfaces, software/system interaction, perfor-
mance, or technology maturity. It uses object points for sizing.
2. Early Design model involves exploration of alternative software/system archi-
tectures and concepts of operation. It involves use of function points for software
product sizing and a small number of additional cost drivers.
The effort equation has the form:

    Effort = A × (Size)^E × Π_i EM_i                 (11)

where A is a multiplier that scales the effort according to the specific project
conditions, Size is the estimated size of the project in Kilo Source Lines Of Code
(KSLOC) or Unadjusted Function Points (UFP), E is an exponential factor that accounts
for the relative economies or diseconomies of scale encountered as a software project
increases in size, and EM_i are the effort multipliers. The scale exponent E is
determined by weighting the predefined scale factors SF_i and summing them via the
following formula:

    E = 0.91 + 0.01 × Σ_i SF_i                       (12)
Five scale factors have been defined: precedentedness, development flexibility,
architecture/risk resolution, team cohesion, and process maturity. The number of
effort multipliers depends on the model and varies from 7 in the Early Design model
to 17 in the Post-Architecture model. Example effort multipliers are reliability,
complexity, reuse, experience, schedule acceleration, and others.
The development time TDEV is derived from the effort according to the following
formula:

    TDEV = C × (Effort)^F                            (13)

The latest calibration of the method shows that the multiplier C is equal to 3.67,
and the coefficient F is determined in a similar way as the scale exponent:

    F = 0.28 + 0.002 × Σ_i SF_i                      (14)
When all the factors and multipliers are taken with their nominal values, the
equations for effort and schedule are as follows:

    Effort = 2.94 × (Size)^1.1                       (15)

    TDEV = 3.67 × (Effort)^0.318                     (16)
This method can correctly handle the imprecision and uncertainty present when
describing a software project. Fuzzy Analogy is also applicable when the variables
are numeric (no uncertainty). First software prototypes and empirical validation of
the approach have just started.
4.4. Automation
The automation of the estimation process reduces the measurement costs and speeds up
the process. There are two main areas of research in the automation of the software
functional size measurement process. The first one covers methods based on source
code analysis (reverse engineering). An example framework for automating Function
Point counting from source code can be found in [20].
The other one includes methods based on specifications and CASE tools. The functional
size measure can be automatically generated from designs in UML (see Section 4.1)
once a mapping between the UML elements and the estimation method rules is defined.
A formalization of the IFPUG definition of function points using the formal
specification language B was proposed in [10]. The goals of the formalization were
to provide an objective definition of function points (which should reduce variance
due to interpretation) and to automate function point counts for B specifications.
Table 8
Characteristics of the size estimation methods

Size estimation    Complexity of the    Construction-method    Suitable for early
method name        metric and method    independent            project phases
Lines Of Code      Low                  No                     No
Function Points    High                 Yes                    Yes
Use-case Points    Medium               Yes                    Yes
Object Points      Medium               Yes                    No
Although new techniques based on rule systems, agents, or neural networks have been
developed, they are not widely used in real-life projects. They have not gained
popularity with the software engineering community, either because of limited
applicability or because of poor results and their black-box approach to estimation.
Chosen characteristics of widely used effort estimation methods are presented in
Table 9.
Table 9
Characteristics of the effort estimation methods

Effort estimation    Repeatable    Objective    Historical data used
method name
Putnam Model         Yes           Yes          No
Wideband Delphi      No            No           Yes
COCOMO II            Yes           Yes          Only by recalibration
Analogy              Yes           No           Yes
The main conclusion we can draw from this article is that the key to arriving at
solid estimates is to use a variety of methods and tools and then to investigate the
reasons why the estimates obtained using one method might differ significantly from
those provided by another. Also, during a project, the estimates should be revised
often to help keep the software project on track.
References
[1] Albrecht A. J.: Measuring Application Development Productivity. Proceedings of
the Joint SHARE, GUIDE, and IBM Application Development Symposium, Oct.
14–17, 1979
[2] Albrecht A. J., Gaffney J. E.: Software Function, Source Lines of Code, and
Development Effort Prediction: A Software Science Validation. IEEE Transactions
on Software Engineering, vol. 9, No. 6, November 1983
[3] Banker R., Kauffman R., Kumar R.: An Empirical Test of Object-Based Out-
put Measurement Metrics in a Computer Aided Software Engineering (CASE)
Environment. Journal of Management Information Systems, 1994
[4] Boehm B. W.: Software Engineering Economics. Englewood Cliffs, New Jersey,
Prentice-Hall 1981
[5] Boehm B. W., Clark B., Horowitz E., Westland C.: Cost Models for Future Software
Life Cycle Processes: COCOMO 2.0. Annals of Software Engineering Special
Volume on Software Process and Product Measurement, Arthur J. D., Henry
S. M. (Eds.), Amsterdam, The Netherlands, J.C. Baltzer AG, Science Publishers
1995
[6] Bévo V., Lévesque G., Abran A.: Application of FFP method from a specification
with UML notation: First test and questions raised. International Workshop on
Software Measurement 1999
[7] Conte S., Dunsmore H., Shen V.: Software Engineering Metrics and Models.
Benjamin/Cummings, Menlo Park, Ca. 1986
[8] Common Software Measurement International Consortium, COSMIC-FFP Measurement
Manual, version 2.1, 2001
[9] Farquhar J. A.: A Preliminary Inquiry Into the Software Estimation Process.
RM-6271-PR, The Rand Corporation, 1970
[10] Diab H., Frappier M., St-Denis R.: Counting Function Points From B
Specifications. International Workshop on Software Measurement, 1999
[11] Gray A. R., MacDonell S. G.: A Comparison of Techniques for Developing
Predictive Models for Software Metrics. Information and Software Technology 39,
1997
[12] Idri A., Abran A., Khoshgoftaar T. M.: Fuzzy Analogy: A New Approach for Soft-
ware Cost Estimation. International Workshop on Software Measurement, 2001
[13] International Function Point Users Group, http://www.ifpug.org
[14] Kitchenham B.: Software Development Cost Models. [in:] R. Rook (Ed.),
Software Reliability Handbook, London, U.K., Elsevier 1990
[15] Madachy R.: Heuristic Risk Assessment Using Cost Factors. IEEE Software,
May/June 1997
[16] Pierre D., Maya M., Abran A., Desharnais J.: Adapting Function Points to Real
Time Software. IFPUG Conference, Fall 1997
[17] Putnam L.H.: A General Empirical Solution to the Macro Software Sizing and
Estimating Problem. IEEE Transactions on Software Engineering, July 1978,
345–361
[18] Stutzke R. D.: Using UML Elements To Estimate Feature Points. International
Workshop on Software Measurement, 1999
[19] Wittig G. E., Finnie G. R.: Using Artificial Neural Networks and Function
Points to Estimate 4GL Software Development Effort. Australian Journal of
Information Systems, 1994
[20] Ho V. T., Abran A.: A Framework for Automating Function Points Counting
from Source Code. International Workshop on Software Measurement, 1999