Identifying Factors Affecting Software Development Cost
Robert Lagerström, Liv Marcks von Würtemberg, Hannes Holm and Oscar Luczak
Industrial Information and Control Systems
The Royal Institute of Technology
Stockholm, Sweden
Email: {robertl, livm, hannesh, oscarl}@ics.kth.se
Abstract: Software systems of today are often complex, making development costs difficult to estimate. This paper uses data from 50 projects performed at one of the largest banks in Sweden to identify factors that have an impact on software development cost.
The relationship between factor states and project costs was assessed using ANOVA and regression analysis.
Ten of the original 32 factors turned out to have an impact on software development project cost at the Swedish bank, including the number of function points and the involved risk.
Some of the factors found to have an impact on cost, for instance function points and software platform, are already included in estimation models such as COCOMO II and SEER-SEM. Thus, this paper validates these well-known factors for cost estimation.
However, several of the factors found in this study are not included in established models for software development cost estimation. Thus, this paper also provides indications of possible extensions of these models.
Keywords: cost estimation; software development cost; estimation models; factors; function points
I. INTRODUCTION
A great majority of today's companies are highly dependent on software systems supporting their business processes. When the business processes are modified to meet new requirements, the software systems have to undergo a corresponding change, or new supporting software has to be developed. This is also the case when a company introduces new business processes.
As software systems of today are often interconnected, developing a new system frequently involves modifications of old systems as well. This makes software development costs difficult to predict, and it makes methods for development cost estimation useful as decision support. The development of completely new software systems, as well as the modification of old systems, is often performed in projects. According to [1], 23% of all software projects are cancelled before completion. Furthermore, only 28% of the completed projects are delivered on time, and the average project overruns its budget by 45% [1]. Despite the existence of several established cost estimation models, the Standish Group reports that this situation has worsened during the last couple of years [2].
Evidently, the need for a well-founded decision basis for cost estimation is as great as ever. Therefore, the research question addressed in this paper is: What factors have an effect on software development project costs?
This paper analyzes data from 50 projects performed at one of the largest banks in Sweden. In total, 32 factors are studied. Using one-way ANOVA and bivariate regression analysis, ten factors that significantly affect project costs are identified. Several of these factors are not included in established models for software development cost estimation, such as COCOMO II and SEER-SEM. The contribution of this paper is therefore a set of proposed extensions to established cost estimation models. Some of the factors studied are already used in the existing models. This study further validates some of these and challenges others.
A. Outline
The remainder of the paper is structured as follows: Section II describes different methods used for software development cost estimation. The factors included in the analysis are listed in section III, whereas the statistical methods used in the analysis are described in section IV. Results from the analysis are presented in section V and further discussed in section VI. Finally, section VII summarizes the paper with conclusions.
II. RELATED WORK
Numerous researchers have elaborated many different techniques for software development cost estimation. This section briefly presents some of these methods, with a focus on factor-based models. A majority of the established models use function points [3], and function points are part of the study presented in this paper. This factor is therefore addressed separately in section II-A.
A. Function points
Function point analysis is a way of measuring the size and extent of a software system by looking at which functions the system delivers to the user. Function points were first introduced by Albrecht and Gaffney in 1983 [4] and have since been improved and tested in many studies. Today people often use the function point based method called
COSMIC [5]. If used correctly, function points are considered to be of considerable assistance when estimating software development costs. However, counting function points can be resource-demanding and difficult. A good result also requires accurate approximations in the subsequent regression analysis [6].
B. Factor-based models
The COnstructive COst MOdel (COCOMO), originally published by [7] and upgraded to COCOMO II in 1995 [3], uses probabilistic Bayesian networks to turn accessible data into an estimated cost [8]. Factors considered by COCOMO II include size attributes, such as the number of lines of source code or function points. They also include additional cost drivers, each related to one of four categories:
1) Product: required software reliability, database size, required reusability, documentation match to life-cycle needs and product complexity.
2) Platform: execution time constraint, main storage constraint and platform volatility.
3) Personnel: analyst capability, programmer capability, personnel continuity, applications experience, platform experience and language and tool experience.
4) Project: use of software tools, multisite development, required development schedule, precedentedness, development flexibility, architecture/risk resolution, team cohesion and process maturity [3].
A newly developed method called The Enterprise Architecture Modifiability Analysis Tool (TEAMATe) also uses probabilistic Bayesian networks, with the additional feature of being coupled with enterprise architecture models. TEAMATe consists of the following factors: change management process maturity, documentation quality, software system understandability, software system size, software system internal and external coupling, software system change size, software system change difficulty, quality of tools for software system changes, quality of infrastructure for software system changes, project team expertise, project members' time on project, number of project members and software system change activity synchronization need [9], [10].
System Evaluation and Estimation of Resources - Software Estimation Model (SEER-SEM) is offered by Galorath Inc. of El Segundo, California [11]. SEER-SEM is based on the Jensen model first published in 1983 [3] and takes a parametric approach to estimation. Given parameters such as size, personnel, complexity, environment and constraints, the model produces estimates of, for instance, project cost, risk and schedule. SEER-SEM covers all phases of a project life-cycle and handles several environmental and application configurations [3], [8].
In [3], [8], several other more or less well-known factor-based models are presented and discussed. These include PRICE-S [12], ESTIMACS [13], Checkpoint [14], Softcost [15], the Putnam Software Life Cycle Model (SLIM) [16], the Jensen Model [17] and the Bailey-Basili Model [18]. Furthermore, Jorgensen presents a systematic review of software development cost estimation studies in [19].
C. Dynamic models
Dynamic techniques take into account circumstances that change during a project, including changes that have a non-linear influence on the overall cost. One example is that project personnel often become more efficient after working in a project for a long while. Another is that adding personnel to a project increases the time spent on coordination.
System dynamics simulation models use sets of first-order differential equations to estimate software development costs [3].
D. Expert-based methods
Expertise-based techniques use knowledge and experience from participants in former projects rather than empirical data [3]. Two common examples of expertise-based techniques are:
1) WBS (Work Breakdown Structure), where the products and processes are divided into smaller, more manageable parts [3], [7].
2) Planning Poker, where project team members first estimate the cost separately and then discuss the result until consensus is reached [20].
III. STUDIED PROJECTS & FACTORS
A pre-study of project documentation at one of the largest banks in Sweden was carried out in order to assemble a list of factors with a potential impact on software development cost. All project data used in the analysis was taken from the project database at the bank. The study only includes development and integration projects involving more than 200 man-hours and/or 500 000 SEK. For reference, 7.28 SEK equals 1.00 USD and 10.11 SEK equals 1.00 EUR (2010.02.04). The data sample consists of 50 projects: every project fulfilling the criteria stated above that the bank performed from summer 2006 until summer 2008. Since the sample covers every qualifying project in this period, the data can be considered representative, at least for the bank in question.
In the data collection phase, plans, follow-up documentation and scorecards for each project were used. All factors included in this study come from internal documentation that the bank produces continuously as part of its normal procedures for executing projects. This study is thus delimited to factors documented by the bank.
The response variable used in the study, software development project cost, was measured in Swedish kronor (SEK), the currency used in Sweden.
A total of 32 factors were identified during the pre-study. These factors and their corresponding states are presented in this section of the paper according to the following syntax:
Factor name: Factor states
Factor description.
Evaluated factors:
Function points: Real numbers ≥ 0
Number of function points in a project.
Primary platform: DW / TDE / ZOS / other
Main software platform affected by a project. Possible states are DataWarehouse (DW), Touchdown Europe (TDE) and a 64-bit operating system for mainframes (ZOS).
Secondary platform: DW / TDE / ZOS / other
Secondary software platform affected by a project.
Presentation interface: 3270 / Web browser / Windows / None
Main interface of the delivered software in a project.
Risk classification: High / Medium / Low
Overall assessed risk associated with a project.
Existence of overall schedule: Yes / No
Whether a project has included an overall plan or not.
Existence of overall testing plan: Yes / No
Whether a project has included an overall plan for testing or not.
Existence of testing conductor: Yes / No
Whether a project has included a testing conductor or not.
Length of pre-study: Real numbers ≥ 0
Length of a project's pre-study phase, measured as a fraction of the total time spent on a project.
Cost of pre-study: Real numbers ≥ 0
Cost of a project's pre-study phase, measured as a fraction of the total cost of a project.
Project type: Development / Integration
Type of project being carried out.
Project priority: High priority / Low priority
Internal prioritization of a project.
Business manager: Name
Individual responsible as business manager in a project.
Project manager: Name
Individual responsible for managing a project.
Liable for delivery: Name
Individual responsible for the delivery of results in a project.
Architect: Name
Individual assigned the role of system architect in a project.
Final deadline revisions: Natural numbers
Number of changes made to a project's final deadline.
Budget revisions: Natural numbers
Number of budget revisions during a project.
Method for debit: Continuous / Fixed / Manual
How the method for debit has been handled in a project.
Project participants: Natural numbers
Number of individuals who have spent more than 40 hours within a project.
Duration: Natural numbers
Total number of workdays used by a project.
External parts: Natural numbers
Number of external units involved in a project.
Consultants: Natural numbers
Total number of consultants participating in a project.
Cooperation: 1 (very poorly) - 5 (very well)
How well the involved project participants cooperated.
Integration testing: Yes / No
Whether integration testing has been performed in a project or not.
Commissioning body: Name
Project sponsor.
Commissioning unit: One of nine units
Unit to which the commissioning body belongs.
Competence performing assignment: 1 (very low) - 5 (very high)
How the commissioning body assessed the competence of the involved project group.
Performance of estimation- and prognosis efforts: 1 (very poorly) - 5 (very well)
How the commissioning body assessed the performance of estimation and prognosis efforts.
Quality of delivery: 1 (very low) - 5 (very high)
How the commissioning body assessed the quality of the delivery.
Conformance to requirements: 1 (very low) - 5 (very high)
How the commissioning body assessed the final delivery compared to the original requirements in a project.
Implementation efficiency: 1 (very low) - 5 (very high)
How the commissioning body assessed the efficiency of implementation.
IV. ANALYSIS METHOD
Each set of attributes connected to the factors described in section III belongs to one of four different measurement
scales: nominal, ordinal, interval or ratio. The inherent properties of these scales call for different statistical tools when assessing the impact of the factors on cost.
For reasons described in [21], factors on nominal or ordinal scales are assessed using one-way between-subjects analysis of variance. Following the recommendation of [21], factors on interval or ratio scales are assessed using bivariate regression analysis.
A. One-Way Between-Subjects Analysis of Variance
One-way between-subjects analysis of variance, or one-way ANOVA, is used when the purpose is to compare the means of a quantitative outcome variable Y across two or more groups [21]. 'One-way' implies that only one factor is involved in the analysis.
Apart from quantitative measurements, one of the key assumptions of one-way ANOVA is that the scores are normally distributed across the entire sample and within each group, with no extreme outliers [21]. If the data seem to follow a distribution other than the normal distribution, it might be possible to remedy the problem through a transformation or by removing outliers [21].
The null hypothesis of an ANOVA test, H0, is that the differences between the observed groups can be explained by chance. H0 is rejected if the differences are systematic and large enough to justify rejection [21]. The boundary for rejecting a null hypothesis is generally expressed as a probability, p. Datasets whose groups differ enough to reject the null hypothesis are called statistically significant. A commonly used boundary is p < 0.05. This means that, if H0 were true, differences at least as large as those observed would occur with less than 5% probability [21]. The reader is referred to [21], [22] for the technical aspects of identifying actual boundaries.
B. Bivariate regression
Bivariate regression models the outcome variable as a linear function of a single predictor X:
$Y' = b_0 + bX$,
where $Y'$ is the predicted outcome variable, $b_0$ is the intercept, and $b$ is the slope. The regression coefficient $b$ and the intercept $b_0$ are identified using the techniques found in [21], [22] to obtain the best fit to the data. A key concept in regression analysis is to determine how well the identified equation actually models the variation in a dataset. The spread of the points around the regression line is summarized by the coefficient of determination, $R^2$, where $0 \le R^2 \le 1$.
In other words, $R^2$ is the proportion of the variation in the dataset that is explained by the chosen model, reaching 100% when $R^2 = 1$. The calculation framework for assessing $R^2$ can be found in [21], [22]. While $R^2$ has its critics, it is widely considered a convenient goodness-of-fit statistic [23].
V. RESULTS
The analyzed data indicate that ten of the 32 factors evaluated in the study are determinants of software project costs at the Swedish bank. The one-way ANOVA assessment indicates that six factors have a significant impact on project costs, given the boundary p < 0.05 recommended by [21]. In addition, four of the factors that could provide regression models for describing project costs show a reasonably high fit to the data ($R^2 \ge 0.4$). These four can thus be used to describe the data.
A. Factors having an impact on software development cost
Using one-way ANOVA, the following six factors were identified as having an impact on software development project costs at the Swedish bank:
1) Risk classification (p = 0.00016)
A comparison of group means displayed a strong relationship between project cost and assessed project risk. The costs were roughly four times greater for high risk projects than for low risk projects, cf. Fig. 1.
[Figure 1. Project costs illustrated using Risk classification (groups: High, Medium, Low).]
2) Budget revisions (p = 0.0033)
Projects with many budget revisions seem to end up more expensive than projects with few revisions (cf. Fig. 2).
[Figure 2. Project costs illustrated using Budget revisions (x-axis: 0 to 3 revisions).]
[Figure: Platform (x-axis: other, DW, TDE, ZOS).]
[Figure 5. Project costs illustrated using Commissioning body's unit (x-axis: Units 1 to 7).]
[Figure: Commissioning body (x-axis: bodies 1 to 11).]
4) Project priority (p = 0.018)
A project with high priority received approximately three times the resources of a project with low priority (cf. Fig. 4).
[Figure 4. Project costs illustrated using Project priority.]
5) Commissioning body's unit (p = 0.048)
Project costs varied considerably between the different commissioning body units (cf. Fig. 5).
It can be argued that the sixth factor, 'Commissioning body', should have been excluded, since it did not reject H0 using the chosen boundary. However, the difference from the boundary was marginal (0.2 percentage points), which provided a reason for inclusion.
The bivariate regression analysis showed that four factors can be used to model project costs:
1) Project participants ($R^2 = 0.52$, $Y = 316366 + 311375X$)
The number of project participants provided a reasonably good linear model describing project costs. As shown in Fig. 7, every added project participant resulted in an added cost of approximately 300 000 SEK.
[Figure 7. Project costs modeled using Project participants (fitted line y = 311375x + 316366, R² = 0.5165).]
2) Duration ($R^2 = 0.48$, $Y = 26992X - 3000000$, with $Y \ge 0$ for all $X$)
The correlation between project costs and the number of workdays was strong enough to support a model. Every added workday added about 27 000 SEK to the project cost (cf. Fig. 8).
[Figure 8. Project costs modeled using Duration (fitted line y = 26992x - 3E+06, R² = 0.4757).]
3) Function points ($R^2 = 0.46$, $Y = 977861 + 11053X$)
The number of function points showed a reasonable correlation with project cost, with approximately 11 000 SEK for every added function point (cf. Fig. 9).
[Figure 9: Function points (fitted line y = 11053x + 977861, R² = 0.4641).]
4) Consultants ($R^2 = 0.40$, $Y = 2000000 + 1000000X$)
As shown in Fig. 10, every additional consultant increased project costs by about 1 000 000 SEK, with a sufficient degree of correlation.
[Figure 10. Project costs modeled using Consultants (fitted line y = 1E+06x + 2E+06, R² = 0.4008).]
C. Non-significant factors
Factors that did not show any significant correlation with project costs at the Swedish bank are presented in Table I and Table II.
Table I
INSIGNIFICANT FACTORS (ANOVA)
Factor                                              p
Cooperation                                         0.17
Architect                                           0.21
Final deadline revisions                            0.22
Secondary platform                                  0.22
Liable for delivery                                 0.25
Competence performing assignment                    0.26
Existence of testing conductor                      0.28
Existence of overall schedule                       0.29
Existence of overall testing plan                   0.29
Performance of estimation- and prognosis efforts    0.30
Presentation interface                              0.31
Integration testing                                 0.34
Project manager                                     0.35
Quality of delivery                                 0.43
Conformance to requirements                         0.50
External parts                                      0.62
Implementation efficiency                           0.66
Business manager                                    0.76
Method for debit                                    0.83
Area of delivery                                    0.86
Project type                                        0.91
Table II
INSIGNIFICANT FACTORS (REGRESSION ANALYSIS)
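The two procedures from Section IV can be illustrated with a short sketch. The Python below is a minimal, hypothetical illustration, not the analysis code used in the study. The function names, the REPORTED_MODELS table and the sample costs in the usage section are assumptions introduced here. Only the coefficients in REPORTED_MODELS are taken from the regression results reported in Section V.

The sketch covers three things:
- It computes the one-way ANOVA F statistic (between-group mean square divided by within-group mean square).
- It fits $Y' = b_0 + bX$ by ordinary least squares and reports its $R^2$.
- It evaluates the four reported regression models.

To keep the sketch dependency-free, it stops at the F statistic. A p-value could be obtained with, for example, scipy.stats.f.sf(F, k - 1, N - k), and scipy.stats.f_oneway performs the whole test.

```python
"""Hypothetical sketch of the Section IV procedures (not the study's code)."""
from statistics import mean


def one_way_anova_f(groups):
    """One-way between-subjects ANOVA F statistic.

    groups: one list of project costs per factor state (e.g. High/Medium/Low).
    F = (SS_between / (k - 1)) / (SS_within / (N - k)).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual costs around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def bivariate_regression(xs, ys):
    """Ordinary least-squares fit of Y' = b0 + b*X; returns (b0, b, r2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    b0 = y_bar - b * x_bar
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((y - (b0 + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return b0, b, 1 - ss_res / ss_tot


# (intercept b0, slope b) of the four regression models reported in Section V.
REPORTED_MODELS = {
    "project_participants": (316366, 311375),
    "duration": (-3000000, 26992),
    "function_points": (977861, 11053),
    "consultants": (2000000, 1000000),
}


def predicted_cost(model, x):
    """Predicted project cost in SEK, clipped at zero (as stated for Duration)."""
    b0, b = REPORTED_MODELS[model]
    return max(0.0, b0 + b * x)


if __name__ == "__main__":
    # Hypothetical costs (SEK) per risk class; NOT the bank's data.
    high = [9.0e6, 7.5e6, 11.0e6]
    medium = [4.0e6, 5.5e6, 3.0e6]
    low = [2.0e6, 2.5e6, 1.5e6]
    print("F =", one_way_anova_f([high, medium, low]))

    # Hypothetical (participants, cost) pairs; NOT the bank's data.
    participants = [3, 5, 8, 12, 20]
    costs = [1.4e6, 1.9e6, 2.6e6, 4.2e6, 6.5e6]
    b0, b, r2 = bivariate_regression(participants, costs)
    print(f"Y' = {b0:.0f} + {b:.0f}X, R^2 = {r2:.2f}")

    print("Reported participants model, X = 10:", predicted_cost("project_participants", 10))
```

Clipping predictions at zero mirrors the non-negativity constraint stated for the Duration model. The other three reported models have positive intercepts and slopes, so the clip never applies to them for non-negative X.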
VI. DISCUSSION
A. Significant factor results
Some significant factors have quite obvious logical associations with project cost, so the scientific contribution of these results is debatable. These factors are duration, consultants and project participants.
More interesting is the probable impact on cost of the involved risk, primary software platform, function points, commissioning body and commissioning body's unit.
The collected empirical data indicate that high risk projects need significantly larger budgets than low risk projects. The very nature of high risk projects implies that they contain difficult parts. Nevertheless, this study indicates that awareness of the risk status of a project could be important when estimating the project cost. None of the models in section II covers this factor, at least not according to the terminology of [3]. The importance of careful risk handling has, however, been discussed in other studies, such as [24] and [25].
Regarding the factor primary software platform, software systems developed at the Swedish bank on the so-called TDE platform are deemed relatively more expensive than those on the other possible platforms. Computer and program attributes similar to this are vital parts of available estimation models such as COCOMO II, SEER-SEM and TEAMATe [3], [9]. This study validates the importance of including this factor.
The relation between the number of function points and cost has been discussed before, see for instance [6]. Also, the function point measure is a vital part of both COCOMO II and SEER-SEM [3]. This study points in the same direction. That increased system functionality makes projects more expensive is a rational, but no less interesting, conclusion.
The Standish Group states the ownership of a project as one of ten project success criteria. This study shows that the commissioning body and the commissioning body's unit significantly affect the project cost, which is in line with the Standish Group's conclusions. A reason for this might be that different commissioning bodies emphasize the requirement specification to varying extents [26]. The requirement specification is another success factor stated by the Standish Group.
The significant correlation between cost and the number of budget revisions is open to question, since the increased expenses are, with little doubt, driven by other factors. Drawing firm conclusions about this requires further analysis of the underlying causes, see section VI-C. However, one possible explanation is that a budget in need of several big changes was poorly prepared from the beginning, which might imply shortcomings in the project plan. If this is the case, a way of measuring the quality of the project plan could be useful when estimating the software development cost, though such measurements are often difficult to perform. Nonetheless, none of the models summarized in [3] takes this aspect into account.
B. Non-significant factor results
Equally interesting as the significant results is that some factors often assumed to affect software development cost failed to show a significant impact on it in the study at the Swedish bank. Among these are:
• Implementation efficiency (p = 0.66)
• Conformance to requirements (p = 0.50)
• Existence of overall schedule (p = 0.29)
• Project manager (p = 0.35)
• Cooperation (p = 0.17)
• Cost of pre-study ($R^2 = 0.023$)
Since the implementation phase of a project typically requires large amounts of resources [25], efficient effort during this phase was expected to result in less expensive projects. This study shows some signs that this hypothesis is true, though the difference was too small to yield a strong correlation. One reason for this could be the nature of the data collection: the factor was graded according to the subjective judgment of the commissioning body connected to each project.
Meeting the specified requirements is an integral part of successful projects [26]. The case study did not find any cost differences associated with conformance to requirements. However, since projects were not evaluated with respect to success, this is hard to assess. It is also possible that conformance to requirements correlates with the commissioning body, which is often responsible for the requirement specification and did turn out to have a significant effect on development cost, see section VI-A. This could be further investigated in a multiple factorial experiment.
The existence of an overall schedule is often thought of as a key to keeping costs low [3]. The study of the Swedish bank points towards the opposite. A reason for this could be that the schedules used in the projects at hand were of poor quality. However, further research is needed in order to draw any such conclusions.
A good project manager can be the difference between success and failure [27] and often determines the expenses of a project [3]. The case study of the Swedish bank did not, however, find that different project managers affected the project costs. This could, for instance, be because all project managers at the bank are equally competent, or equally lacking in experience.
In [3], cooperation is included as one of the key factors affecting software development costs. The study presented in this paper could not find any significant correlation between cooperation and costs. Nevertheless, the p-value (0.17) was fairly low, and the factor could well turn out to be significant in a follow-up study. Another possible reason for the lack of correlation is that cooperation was
subjectively graded by each commissioning body. A third possibility is that all projects had equally functional levels of cooperation.
A common expectation is that large projects have more expensive pre-study phases than small projects [25] and that a well-performed pre-study is a good investment of time and money [26]. This study is in line with the research performed by the Standish Group [2], [26], [27]. It indicates that the quality of the pre-study, rather than its length or cost, affects the project cost, with a well-performed pre-study reducing the project cost. This confirms the reasoning in section VI-A that taking the quality of the pre-study into consideration could increase the accuracy of established development cost estimation models.
C. Validity and reliability
A potential problem for external validity and reliability is that most evaluated factors were not chosen from theory. Their significance might therefore only be evident in the data collected during this particular case study at the Swedish bank. It is, however, interesting that so many factors without theoretical grounding showed a correlation with project costs.
Some factors had states that were graded subjectively by each commissioning body, which could affect the internal validity and reliability of the results. Another possible problem is that the factors were evaluated individually. There could in fact be correlations between factors that account for the true impacts on cost. Further, an increased value of one factor state variable affects the states of other variables. Thus, increased costs are undoubtedly the result of a combination of many changes in factor states, rather than of a single change, which is how this study models them.
VII. CONCLUSIONS
Software systems of today are often complex, making development costs difficult to estimate. This paper uses data from 50 projects performed at one of the largest banks in Sweden to identify factors that have an impact on software development cost. In total, 32 factors were studied. The relationship between factor states and project costs was assessed using one-way ANOVA and bivariate regression analysis.
This study validates the use of factors such as function points and primary software platform, which can be found in established software development cost estimation models such as COCOMO II and SEER-SEM. The risk classification of a project was another factor found to affect the project cost. Accordingly, this study indicates that including information about the risk classification would increase the accuracy of established models, as they do not yet consider it explicitly.
A somewhat more speculative suggestion is to consider the quality of the pre-study in the cost estimation. This, however, requires further research regarding the intergroup dependencies.
Two factors often assumed to affect the project cost are the efficiency of the implementation and the cost of the pre-study. In this study, though, these factors failed to display a significant correlation with project cost.
A. Future work
To reinforce the results of this study and further investigate their underlying causes, multiple factorial experiments on the data could be performed. Further research on the suggested extensions of the established cost estimation models would also be a useful validation of this study.