Itec513 Fall20172018 Estimation
26/12/2016 1
Project Management and Mr. Murphy
1. Logic is a systematic method of coming to the
wrong conclusion with confidence.
2. Technology is dominated by those who manage
what they do not understand.
3. Nothing ever gets built on schedule or within
budget.
4. If mathematically you end up with the incorrect
answer, try multiplying by the page number.
Motivation
Software cost estimation provides:
• The vital link between the general concepts and
techniques of economic analysis and the
particular world of software engineering.
• An essential part of the foundation for good
software management.
Planning Prerequisites
• The planning process requires the following
inputs:
– Required human effort (man-months)
– Project duration (months)
– Project costs ($)
• We would like our estimates to be perfectly
precise and accurate
– But this requirement is impossible until the project is
over
Cost of a project
• The cost in a project is due to the requirements for
software, hardware and human resources
• The cost of software development is due to the human
resources needed
• Most cost estimates are measured in person-months
(PM)
• At any point, the accuracy of the estimate will depend
on the amount of reliable information we have about
the final product.
• The cost of the project depends on the nature and
characteristics of the project
Software Cost Estimation
Importance of Estimates
• In the early days of computing,
– Software costs were a small part of the total system
cost
• Even large errors (an order of magnitude) had little
impact on the total system cost
– Today, software costs are the largest component of total
system cost
• Large errors in estimating cost equate to
– The difference between profit and loss or
– Survival and demise
Reasons for Inaccuracy in Estimates
Expert Judgment
• Several experts in the application domain
independently prepare estimates
• Estimates are compared (together with the
rationale for the estimate)
• Differences are resolved by discussion
• It is not
– Estimation by committee
– An averaging of the independent estimates
Estimation by Analogy
• Cost of a new project is estimated by analogy to
similar systems previously developed
– Identify differences and estimate cost of these
differences
• What if no similar systems have been developed
previously?
• What about changes in the development environment?
– How to handle employee turnover?
– How to handle a new language, CASE tools, …
Pricing to Win
• The cost is what you believe the customer is
willing to spend
• What circumstances would lead you to price a
project this way?
Parkinson Pricing
• Parkinson’s Law
– The work expands to fill the time
available
• Cost is determined by available resources rather
than by objective analysis
• Example
– If the software is needed in 1 year and you have 5
developers available to work on the project, the effort is
60 man-months
Importance of Deviation
• It is important to identify changes from previous
projects, especially when employing
– Expert Judgment or
– Estimation by Analogy
• Failure to identify change and account for its
influence
– Distorts the estimate
• Perhaps to the point that the estimate is of little value
Algorithmic Models
• A formula (or set of formulae) is evaluated to
provide an estimate
• Size or functionality metrics are the independent
variables
• Constants in the formula are based upon historic
cost data
Algorithmic Cost Modeling
• The most systematic approach to cost modeling
– The most precise method, but
– Not necessarily the most accurate; do not confuse precision with accuracy
• A formula or set of formulae is used to predict
cost based on project size, and sometimes other
project factors
• Most algorithmic cost models have an exponential
component
– This reflects the fact that cost does not scale linearly with size
Algorithmic Modeling (cont)
• The simplest model is a static single-variable model
Productivity
• Productivity equation
– Productivity = (DSI) / (PM)
• where DSI = number of delivered source instructions and
PM = number of person-months (1 PM = 152 working hours)
Schedule
• Schedule equation
– TDEV = C * (PM)^n (months)
• where TDEV = number of months estimated for
software development.
Average Staffing
• Average Staffing Equation
– Average staffing = (PM) / (TDEV) (FSP)
• where FSP means Full-time-equivalent Software
Personnel.
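The three equations above (productivity, schedule, average staffing) can be sketched together. This is a minimal illustration only: the function names, the project figures, and the constants C = 2.5 and n = 0.38 are assumptions chosen for the example, not values given on these slides.

```python
def productivity(dsi: float, pm: float) -> float:
    """Delivered source instructions produced per person-month."""
    return dsi / pm


def schedule_months(pm: float, c: float, n: float) -> float:
    """Schedule equation: TDEV = C * (PM)^n, in months."""
    return c * pm ** n


def average_staffing(pm: float, tdev: float) -> float:
    """Average full-time-equivalent software personnel (FSP)."""
    return pm / tdev


# Hypothetical project: 32,000 DSI delivered with 91 person-months of effort.
dsi, pm = 32_000, 91.0
c, n = 2.5, 0.38  # illustrative constants; calibrate from historic data

tdev = schedule_months(pm, c, n)
print(f"productivity     = {productivity(dsi, pm):.0f} DSI/PM")
print(f"TDEV             = {tdev:.1f} months")
print(f"average staffing = {average_staffing(pm, tdev):.1f} FSP")
```

Dividing effort by schedule gives only an average; real staffing usually ramps up and down over the project.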
Cost Estimation Process
Cost = SizeOfTheProject x Productivity
Cost Estimation Process
[Figure: the estimation process, relating size measures (size table, lines of code, number of use cases) to effort, development time, and number of personnel]
Project Size - Metrics
1. Number of functional requirements
2. Cumulative number of functional and non-functional requirements
3. Number of Customer Test Cases
4. Number of ‘typical sized’ use cases
5. Number of inquiries
6. Number of files accessed (external, internal, master)
7. Total number of components (subsystems, modules, procedures,
routines, classes, methods)
8. Total number of interfaces
9. Number of System Integration Test Cases
10. Number of input and output parameters (summed over each interface)
11. Number of Designer Unit Test Cases
12. Number of decisions (if, case statements) summed over each routine or
method
13. Lines of Code, summed over each routine or method
Project Size – Metrics(.)
Availability of Size Estimation Metrics:
f Implementation 12, 13
LOC Metric
• There are two different ways of implementing
LOC
– Lines of Code (LOC or KLOC)
• Count all lines
– Thousands of delivered source instructions (KDSI)
• Count of the physical source statements, includes:
– Format statements
– Data declarations
• Excludes
– Comments
– Unmodified utilities
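The difference between the two counts can be illustrated with a small sketch. It assumes a language whose comments start with `#`, treats blank lines as non-statements (an assumption; the slide only names comments and unmodified utilities as exclusions), and does not model the exclusion of unmodified utilities. Function names are hypothetical.

```python
def count_loc(source: str) -> int:
    """Count every physical line, including blanks and comments."""
    return len(source.splitlines())


def count_dsi(source: str, comment_prefix: str = "#") -> int:
    """Count physical source statements, skipping blank and comment-only lines."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


sample = """# compute a total
total = 0

for x in [1, 2, 3]:
    total += x  # a trailing comment does not hide the statement
"""

print("LOC:", count_loc(sample))
print("DSI:", count_dsi(sample))
```

The same file yields different sizes under the two rules, which is why the counting convention must be fixed before sizes are compared.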
Problems associated with lines of code as a metric
1. Lack of Accountability
2. Lack of Cohesion with Functionality
3. Adverse Impact on Estimation
4. Developer’s Experience
5. Difference in Languages
6. Advent of GUI Tools
7. Problems with Multiple Languages
8. Lack of Counting Standards
9. Psychology
• Lack of Accountability:
– Not useful to measure the productivity of a project using only results from the coding phase, which
usually accounts for only 30% to 35% of the overall effort
• Lack of Cohesion with Functionality:
– Effort may be highly correlated with LOC, but functionality is not so much!
– Skilled developers may be able to develop the same functionality with far less code.
– A developer who writes only a few lines may still be more productive than a developer creating
more lines of code.
• Adverse Impact on Estimation:
– Because coding is only part of the overall effort (point 1), estimates based on lines of code can go badly wrong
• Developer’s Experience:
– Implementation of a specific logic differs based on the level of experience of the developer.
Hence, number of lines of code differs from person to person.
– An experienced developer may implement certain functionality in fewer lines of code than another
developer of relatively less experience does, though they use the same language.
• Difference in Languages:
– Consider two applications that provide the same functionality (screens, reports, databases). One of
the applications is written in C++ and the other application written in a language like COBOL. The
number of function points would be exactly the same, but aspects of the application would be
different. The lines of code needed to develop the application would certainly not be the same. As a
consequence, the amount of effort required to develop the application would be different
• Advent of GUI Tools:
– GUI-based programming languages and tools such as Visual Basic, allow programmers to write
relatively little code and achieve high levels of functionality.
– A user of a GUI tool can use drag-and-drop and other mouse operations to place components on a
workspace.
• Problems with Multiple Languages:
– Software is often developed in more than one language, depending on the complexity and
requirements.
– Tracking and reporting of productivity and defect rates poses a serious problem in this case since
defects cannot be attributed to a particular language subsequent to integration of the system.
• Lack of Counting Standards:
– There is no standard definition of what a line of code is. Do comments count? Are data
declarations included? What happens if a statement extends over several lines?
– Although organizations such as SEI and IEEE have published guidelines in an attempt to standardize
counting, it is difficult to put these into practice, since new languages are introduced every
year.
• Psychology:
– A programmer whose productivity is being measured in lines of code will have an incentive to
write unnecessarily verbose code.
– This is undesirable since increased complexity can lead to increased cost of maintenance and
increased effort required for bug fixing.
FFP
• Proposed by van der Poel and Schach
– Medium-size projects (1-10 man-years)
– Identify and score 3 basic structural elements
• Files, Flows, and Processes
Structural elements
– Files
• Permanent files only
– Do not count temporary or transaction files
– Flows
• Interfaces between the product and the environment
– Input / Output Screens
– Reports
– Processes
• Functionally coherent manipulations of data
– Sorting
– Validating
– Transforming
FFP (cont)
• Size
– The size is the sum of the Files, Flows
and Processes
Size = Files + Flows + Processes
• Cost
– The product of Size and a constant d
• Constant varies from organization to organization
• Based on historic cost and size data
Cost = d * Size
FFP (cont)
• Note:
– This metric is based upon the functionality of the
application
• High level property of the system
• Can be more accurate earlier in the life-cycle than LOC metrics
Class Exercise
• An application maintains 8 files: a sorted master
data file, 3 index files, 1 transaction file and 3
temporary files. It has 3 data input screens, 3
display screens, generates 4 printed reports, and 6
error message boxes. The processing includes
sorting the master file, updating transactions,
calculating report data from master file data.
Assume a value of 800 for d .
Determine the Size and Cost using FFP.
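One possible way to work the exercise is sketched below. It rests on two assumptions that the exercise itself does not settle: the index files are counted as permanent files alongside the master file, and the error message boxes are counted as flows. The function names are hypothetical.

```python
def ffp_size(files: int, flows: int, processes: int) -> int:
    """FFP size: the sum of files, flows, and processes."""
    return files + flows + processes


def ffp_cost(size: int, d: float) -> float:
    """FFP cost: the organization-specific constant d times the size."""
    return d * size


# Permanent files only: the transaction and temporary files are not counted.
files = 1 + 3              # master file + 3 index files (assumed permanent)
flows = 3 + 3 + 4 + 6      # input screens, display screens, reports, error boxes (assumed flows)
processes = 3              # sorting, updating transactions, calculating report data

size = ffp_size(files, flows, processes)
cost = ffp_cost(size, d=800)
print("Size =", size)
print("Cost =", cost)
```

If error message boxes are not treated as flows, the flow count drops by 6 and the cost changes accordingly.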
FFP Summary
• Advantages
– A simple algorithmic model
• Based on easy-to-count characteristics of a high level design
• Disadvantages
– All items are equally weighted
– Requires historic data based upon a particular
organization
– Has not been extended to correctly count databases
– Something unsettling about adding unlike quantities
Function Points
• A similar approach taken by Albrecht
• Based on 5 functionality characteristics
– Input items, output items, inquiries, master files,
and interfaces
• First calculate the number of unadjusted
function points
UFP = C1*Inp + C2*Out + C3*Inq + C4*Maf + C5*Inf
Function Points
Function points measure size in terms of the amount of functionality in a system.
Function points are computed by first calculating an unadjusted
function point count (UFP). Counts are made for the following
categories:
• External inputs – those items provided by the user that describe
distinct application-oriented data (such as file names and menu
selections)
• External outputs – those items provided to the user that generate
distinct application-oriented data (such as reports and messages, rather
than the individual components of these)
• External inquiries – interactive inputs requiring a response
• External files – machine-readable interfaces to other systems
• Internal files – logical master files in the system
Function Points (cont)
• The constants C1 ... C5 are determined from the
following table
[Table not reproduced: values of C1 ... C5 for simple, average, and complex items]
Function Points (cont)
• The next step is to calculate a technical
complexity factor
• Each of 14 technical factors is assigned a value
from 0 to 5
– 0 - Not present or no influence
– 5 - Strong influence throughout
• The degree of influence DI is obtained by summing
the above values
Function Points (cont)
• The 14 technical factors are:
1 Data communication
2 Distributed data processing
3 Performance criteria
4 Heavily utilized hardware
5 Online data entry
6 End-user efficiency
7 Transaction rate
8 Online updating
9 Complex computations
10 Reusability
11 Ease of installation
12 Ease of operation
13 Maintainability
14 Multiple sites
Function Points (cont)
• Calculate the technical complexity factor TCF
TCF = 0.65 + 0.01 * DI
Function Point Calculations
• You may find the following template useful
Class Exercise
• An application has 5 simple inputs, 4 complex
inputs, 30 average outputs, 5 simple queries, 10
average master files and 8 complex interfaces. The
degree of influence is 50. Calculate the number of
unadjusted function points and the number of
function points.
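A sketch of how the exercise might be worked follows. The weight table is an assumption: it uses the commonly published Albrecht weights, because the table of constants C1 ... C5 does not appear in the text. The names `WEIGHTS`, `unadjusted_fp`, and `technical_complexity_factor` are hypothetical.

```python
# Assumed weights (commonly published Albrecht values), not taken from the slides.
WEIGHTS = {
    "input":     {"simple": 3, "average": 4,  "complex": 6},
    "output":    {"simple": 4, "average": 5,  "complex": 7},
    "inquiry":   {"simple": 3, "average": 4,  "complex": 6},
    "file":      {"simple": 7, "average": 10, "complex": 15},
    "interface": {"simple": 5, "average": 7,  "complex": 10},
}


def unadjusted_fp(counts):
    """Sum weight * count over (category, complexity, count) triples."""
    return sum(WEIGHTS[cat][level] * n for cat, level, n in counts)


def technical_complexity_factor(di: int) -> float:
    """TCF = 0.65 + 0.01 * DI, with DI between 0 and 70 (14 factors rated 0-5)."""
    if not 0 <= di <= 70:
        raise ValueError("DI must lie between 0 and 70")
    return 0.65 + 0.01 * di


exercise = [
    ("input", "simple", 5),
    ("input", "complex", 4),
    ("output", "average", 30),
    ("inquiry", "simple", 5),
    ("file", "average", 10),
    ("interface", "complex", 8),
]

ufp = unadjusted_fp(exercise)
fp = ufp * technical_complexity_factor(50)
print("UFP =", ufp)
print("FP  =", round(fp, 1))
```

With a different weight table the structure of the calculation stays the same; only the entries of `WEIGHTS` change.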
Solution
• Technical Complexity Factors:
– 1. Data Communication 3
– 2. Distributed Data Processing 0
– 3. Performance Criteria 4
– 4. Heavily Utilized Hardware 0
– 5. High Transaction Rates 3
– 6. Online Data Entry 3
– 7. Online Updating 3
– 8. End-user Efficiency 3
– 9. Complex Computations 0
– 10. Reusability 3
– 11. Ease of Installation 3
– 12. Ease of Operation 5
– 13. Portability 3
– 14. Maintainability 3
» DI =30 (Degree of Influence)
Solution
• Function Points
– FP=UFP*(0.65+0.01*DI)= 55*(0.65+0.01*30)=52.25
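The arithmetic of this solution can be checked with a short sketch. The helper names are hypothetical. DI is passed in as the stated value of 30; note that the fourteen ratings exactly as listed above sum to 36, so the ratings and the stated DI should be reconciled before either is reused.

```python
def degree_of_influence(ratings):
    """Sum the 14 technical factor ratings, each of which must lie in 0..5."""
    if len(ratings) != 14:
        raise ValueError("exactly 14 technical factors are required")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("each rating must lie between 0 and 5")
    return sum(ratings)


def function_points(ufp: float, di: int) -> float:
    """FP = UFP * (0.65 + 0.01 * DI)."""
    return ufp * (0.65 + 0.01 * di)


# Values as stated in the solution: UFP = 55, DI = 30.
print("FP with stated DI:", round(function_points(55, 30), 2))

# Ratings in the order listed in the solution.
listed = [3, 0, 4, 0, 3, 3, 3, 3, 0, 3, 3, 5, 3, 3]
print("DI from listed ratings:", degree_of_influence(listed))
```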
Relation between LOC and FP
• Relationship:
– where
• LOC (Lines of Code)
• FP (Function Points)
Relation between LOC and FP (.)
Assuming the following LOC per FP:
Java = 53,
C++ = 64
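A minimal sketch of the conversion, assuming the relationship is simply LOC = FP times the LOC-per-FP figure for the language (the formula itself does not appear in the text). The function name and the 100 FP input are hypothetical.

```python
# LOC per function point, as given above.
LOC_PER_FP = {"Java": 53, "C++": 64}


def estimate_loc(fp: float, language: str) -> float:
    """Estimate lines of code from function points for a given language."""
    return fp * LOC_PER_FP[language]


fp = 100  # hypothetical application size in function points
for language in LOC_PER_FP:
    print(f"{language}: {estimate_loc(fp, language):.0f} LOC")
```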
Simple Object-Oriented Estimation
The Four Steps
1. Determine the number of problem domain classes
in the application
2. Determine the interface and the associated weight
3. Calculate the number of total classes by
multiplying the number of problem domain classes
by the interface weight and add it to the number of
problem domain classes
4. Calculate the number of man-days by multiplying
the total number of classes by a productivity
constant in the range of 15 - 20
Interface Weights
[Table not reproduced: weight assigned to each interface type]
Class Exercise
• An object-oriented application has an estimated
50 problem domain classes and a graphical user
interface. Assuming a productivity constant of 18,
calculate the number of man-days that will be needed
to develop the application.
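The four steps can be sketched as a single function. The interface weight of 2.5 used for a graphical user interface is an assumption, since the interface weight table does not appear in the text; substitute the value from the table. The function name is hypothetical.

```python
def oo_man_days(domain_classes: int, interface_weight: float,
                productivity: float) -> float:
    """Simple object-oriented estimate in man-days.

    total classes = domain classes + domain classes * interface weight
    man-days      = total classes * productivity constant (typically 15-20)
    """
    total_classes = domain_classes + domain_classes * interface_weight
    return total_classes * productivity


# Exercise values: 50 problem domain classes, productivity constant 18.
# The weight 2.5 for a GUI is assumed, not taken from the slides.
print(oo_man_days(domain_classes=50, interface_weight=2.5, productivity=18))
```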
COCOMO
• COCOMO is a static single variable model
• COCOMO is an acronym for Constructive Cost
Model that was developed by Barry Boehm
• The COCOMO models are defined for three
classes of software projects.
(1) organic mode
(2) semi-detached mode
(3) embedded mode
Introduction to COCOMO models
• The COnstructive COst MOdel (COCOMO) is the
most widely used software estimation model.
• The COCOMO model predicts the effort and
duration of a project based on inputs relating to
the size of the resulting system and a number of
"cost drivers" that affect productivity.
COCOMO
• COCOMO is actually a hierarchy of models of the
following form
– Basic COCOMO – estimates software development effort
and cost as a function of program size in lines of code
– Intermediate COCOMO – estimates software development
effort and cost as a function of program size in lines of code and a
set of “cost drivers”
– Advanced COCOMO – incorporates the characteristics of
intermediate COCOMO with an assessment of the cost driver
impact on each phase of the software development cycle
• The more complex models account for more factors that
influence software projects, and can therefore give more accurate estimates.
• The most important factor contributing to a project's duration
and cost is the development mode
Project Types/Levels of COCOMO
The level of difficulty was broken into three modes
•Organic Mode
– Constraints on development are mild
– Many similar projects previously developed by the organization
– Relatively small, simple software projects in which small teams with good
application experience work to a set of less than rigid requirements (e.g., a thermal
analysis program developed for a heat transfer group)
•Semi-detached Mode
– More constraints on development, but some flexibility remains
– Few similar projects previously developed by the organization
– An intermediate (in size and complexity) software project in which teams with
mixed experience levels must meet a mix of rigid and less than rigid requirements
(e.g., a transaction processing system with fixed requirements for terminal
hardware and data base software)
•Embedded Mode
– Very tight constraints
– No similar projects previously developed
– A software project that must be developed within a set of tight hardware, software
and operational constraints (e.g., flight control software for aircraft).
Modes
[Table not reproduced: features compared across the organic, semidetached, and embedded modes]
Effort Computation
• The Basic COCOMO model computes effort as a
function of program size. The Basic COCOMO equation
is:
– Effort = a * (KLOC)^b
• Effort for three modes of Basic COCOMO.
[Table not reproduced: constants a and b for each mode]
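A sketch of the Basic COCOMO effort equation follows. The constants are the commonly published Basic COCOMO values, assumed here because the values of a and b do not appear in the text; the 32 KLOC project size is hypothetical.

```python
# Assumed (a, b) per mode: commonly published Basic COCOMO constants.
BASIC_COCOMO = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}


def basic_effort(kloc: float, mode: str) -> float:
    """Effort in person-months: a * (KLOC)^b."""
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


# Hypothetical 32 KLOC project evaluated under each mode.
for mode in BASIC_COCOMO:
    print(f"{mode:14s} {basic_effort(32, mode):7.1f} PM")
```

Because b is greater than 1 in every mode, doubling the size more than doubles the estimated effort.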
COCOMO (cont)
• The intermediate model was calibrated on
– 40 software development projects
• Further work revealed certain difficulty factors
that dramatically influenced the effort estimates
and schedule
Intermediate COCOMO Calculations
• Intermediate COCOMO calculations proceed as follows:
– First, determine the mode (organic, semi-detached,
embedded)
– This determines the constants A - D
Intermediate COCOMO (cont)
– Second, using the appropriate constants from the
previous table, calculate the nominal effort and
schedule from
E_nominal = A * (KDSI)^B (Nominal Effort)
– Third, calculate a difficulty multiplier that depends upon
the cost drivers (these include subjective assessments of
attributes in the following general areas):
• Product
• Hardware
• Personnel
• Project
Effort Computation
• The intermediate COCOMO model computes effort as a
function of program size and a set of cost drivers. The
Intermediate COCOMO equation is:
– E = a * (KLOC)^b * EAF
• Effort for three modes of intermediate COCOMO.
[Table not reproduced: constants a and b for each mode]
Each of the 15 attributes receives a rating on a six-point scale that ranges from
"very low" to "extra high" (in importance or value).
The product of all effort multipliers results in an effort adjustment factor (EAF).
Effort Computation (..)
Total EAF = Product of the selected factors
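The intermediate calculation can be sketched as below. The (a, b) pairs are the commonly published Intermediate COCOMO constants, assumed here because the table does not appear in the text. The three multiplier values are purely hypothetical ratings, not entries from the cost driver table.

```python
from math import prod

# Assumed (a, b) per mode: commonly published Intermediate COCOMO constants.
INTERMEDIATE_COCOMO = {
    "organic":       (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (2.8, 1.20),
}


def effort_adjustment_factor(multipliers) -> float:
    """EAF: the product of the selected effort multipliers (1.0 if none)."""
    return prod(multipliers)


def intermediate_effort(kloc: float, mode: str, multipliers=()) -> float:
    """Effort in person-months: a * (KLOC)^b * EAF."""
    a, b = INTERMEDIATE_COCOMO[mode]
    return a * kloc ** b * effort_adjustment_factor(multipliers)


# Hypothetical ratings: two drivers above nominal, one below.
selected = [1.15, 1.08, 0.91]
print("EAF    =", round(effort_adjustment_factor(selected), 3))
print("Effort =", round(intermediate_effort(32, "semi-detached", selected), 1), "PM")
```

Drivers rated nominal contribute a multiplier of 1.0 and can be left out of the product.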
Software Development Time
• Development Time Equation Parameter Table:
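A sketch of the development time calculation, assuming the schedule equation has the form TDEV = c * (Effort)^d. The parameters are the commonly published COCOMO values, assumed here because the parameter table does not appear in the text; the 100 PM effort figure is hypothetical.

```python
# Assumed (c, d) per mode: commonly published COCOMO schedule parameters.
TDEV_PARAMS = {
    "organic":       (2.5, 0.38),
    "semi-detached": (2.5, 0.35),
    "embedded":      (2.5, 0.32),
}


def development_time(effort_pm: float, mode: str) -> float:
    """Development time in months: c * (effort in person-months)^d."""
    c, d = TDEV_PARAMS[mode]
    return c * effort_pm ** d


# Hypothetical effort of 100 person-months under each mode.
for mode in TDEV_PARAMS:
    print(f"{mode:14s} {development_time(100, mode):5.1f} months")
```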
COCOMO II (cont)
• Major differences
– COCOMO was based upon lines-of-code estimates
• COCOMO II allows the use of other metrics, e.g. function
points
– COCOMO had a constant exponent, depending upon
which of three modes is selected
• COCOMO II allows the exponent to continuously vary
between 1.01 and 1.26
– COCOMO assumes that savings due to reuse are
directly proportional to the amount of reuse
• COCOMO II uses a non linear model – even a small amount
of reuse may incur a huge effort in understanding the code
COCOMO II (cont)
– COCOMO II modified the difficulty factors
– COCOMO II was calibrated with 83 projects
Distribution of Effort
• A development process typically consists of
the following stages:
– Requirements Analysis
– Design (High Level + Detailed)
– Implementation & Coding
– Testing (Unit + Integration)
Distribution of Effort (.)
The following table gives the recommended percentage
distribution of Effort (APM) and TDEV for these stages:
Error Estimation
• Calculate the estimated number of errors in your design, i.e. the total errors found in
requirements, specifications, code, user manuals, and bad fixes:
– Adjust the Function Point count calculated in step 1
AFP = FP ^ 1.25
– Use the following table for calculating error estimates
Classes*(2Function Points)
KLOC=Max[aKLOC, bKLOC]
Number of personnel: NP=APM/TDEV