Experts' Estimates of Task Durations in Software Development
Abstract

This paper reports a case study of the accuracy of experts' subjective estimates of task durations in software projects. The data available included the estimated task durations given by the experts and the subsequent actual durations. The case study shows that although the majority of tasks were overestimated, the mean error is an underestimate of about 1%. The experts could, however, do even better by taking more cognisance of the number of subtasks that make up a task, and hence by using the WBS at a lower level when estimating durations. © 1999 Elsevier Science Ltd and IPMA. All rights reserved.
1. Introduction

The effectiveness of project management techniques depends heavily on the accuracy of task duration estimates. Most of the project management tools (identifying critical activities, baseline schedules, milestone determination, resource schedules, cost/time charts) depend on accurate duration estimates. These can be particularly difficult to make in research and development situations such as software development. Although the Program Evaluation and Review Technique (PERT) methodology allows for errors in the duration estimates, it too is heavily affected by the most likely duration times of the tasks.

Although project management has been in use for more than 30 years, there are few published case studies reporting on the over- and underestimates that occurred in task duration estimates in actual projects. Even in software development, although there is a large literature on methods of estimating the total cost or effort of the whole development (Kemerer [1], Jeffery and Low [2], Finnie et al. [3]), there is little work reported on the estimates of the individual tasks that make up the project. It is these estimates that are required if project management techniques are to be used to plan and control the project, as opposed to an initial pricing or scoping estimate. This paper reports the results of the task duration estimation procedure used by the software development experts of an international organisation on all its software development projects over a three-year period. The experts estimated the duration of the various tasks making up each project, and these estimates are compared with the actual time those tasks took. The results suggest that there may be biases in such estimates, and that one might be able to develop an automatic correction which allows for them. This could improve project management of software development considerably. In a survey of information systems managers and software developers in over 100 organisations, Lederer and Prasad [4] found that only 25% of such projects were completed within estimated time and budget. The most frequent cause of overrun was users changing the scope of the project. However, the issues most highly correlated with overruns were individuals' performance reviews not considering whether their estimates were met, the lack of ways of setting standard durations for use in estimating common tasks, and the lack of project reviews to compare estimates with actuals.
J. Hill et al. / International Journal of Project Management 18 (2000) 13–21
It is hoped that the results presented here will alert other companies to the need to look at the relationship between their estimates and the actual duration times.

The next section outlines some of the ways in which task, total project duration and effort are estimated in software development. Section 3 describes the case study and how the data on estimates of task duration and the subsequent actual duration times were obtained. Section 4 analyses the errors in estimation found in the data, while the final section draws some conclusions from this case study.

2. Methods of task estimation

Developing accurate estimates of the duration and effort of the project overall, and of its separate tasks, is critical to the usefulness of project management ideas, both in the planning and in the monitoring of projects. Thus project managers have used a number of different ideas to aid their estimation of the duration and effort of the project and its tasks, particularly in the case of software engineering and software development projects. This is partly due to the development of this industry at the same time as project management techniques were coming to the fore, and partly because the skills needed by engineers in this industry mean they find it easy to use project management ideas. It is also the nature of software development, which is essentially a research and development activity with all the uncertainty that involves, but where many of the tasks are not dissimilar to tasks that have been performed before.

The techniques that have been used to estimate project effort and task duration include expert judgement, analogy, parametric models (the most widely used of which were COCOMO and BANG), Function Point Analysis and, recently, neural nets and case-based reasoning.

Perhaps the most common approach to estimating effort is to consider the opinions of experts. This does not require the existence of historic data and is particularly useful at the start of system development, when requirements are vague and changing and it is ballpark figures that are required. Experiments suggest expert judgement can be very accurate, but it fails to provide an objective and quantitative analysis of the factors that affect effort and duration, and it is hard to separate real experience from the expert's subjective view [5]. The reliability of the estimate depends on how closely the project correlates with past experience and on the ability of the expert to recall all the facets of historic projects. Boehm [6] suggests it is easier to remember the functions developed in a project than the interfaces involved in the development, while it is true that reasons for large overestimates of tasks tend to be recalled more easily than reasons why tasks were completed more quickly than expected. To overcome these failings of the expert, some authors [7, 8] have suggested using the Delphi technique, which allows a group of experts to arrive at a consensus. This approach could also be used to set the parameters for a PERT analysis, where for each task one wants estimates of the fastest possible duration, the slowest possible duration and the most likely duration [9].

Estimating by analogy is a powerful technique if there is a stable technological environment with some degree of historical data available. One identifies the most similar previous project or task and takes its actual time as the estimate for the new project or task. The advantages of analogy include low cost, relative simplicity and reasonable accuracy. The problem is to find suitable analogies [7] and, as Yeates [10] emphasises, it is unlikely that a previous case will exactly match the new project, so expert judgement will be needed to adjust for the differences.

A great deal of work, and not a little hype, has gone into developing algorithmic or parametric estimation methods. These all identify the effort or cost drivers in a project and then seek to use data on past projects to fit the parameters of a model based on these cost drivers. The cost drivers tend to be measures of system size and complexity, plus perhaps personnel capabilities and experience, hardware constraints and the availability of software development tools. The measures of size vary from the likely number of lines of executable code to the number of functions, modules or program features required. The models themselves use either arithmetic formulae or regression approaches, and even the latter tend to split into those where the dependent variable is effort (additive ones), the log of effort or time (multiplicative ones), or some other function of effort (analytic ones). Boehm [6], in his classification of such models, adds composite models, which are combinations of the previous models. Jorgensen [5] gives a review and critique of these approaches.

Some of these parametric approaches have become established methodologies in their own right. These include COCOMO and SLIM. The COCOMO model, proposed by Boehm [6], can be considered a composite model since it provides a combination of functional forms made accessible to the user in a structured manner. Its purpose is to predict the effort and duration of the total project, but not an estimate of size, since the major factors in the model are the estimated number of delivered source instructions and the environment. The latter recognises three types of development mode, each with its own equation, where the mode depends on the experience of the team and the innovative nature of the project.
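To make the COCOMO idea concrete, the following is a minimal sketch of Boehm's basic model, in which effort in person-months is a·(KDSI)^b for a mode-dependent pair (a, b). The coefficients are the published basic-model values from Boehm [6]; the function itself is our own illustration and is not part of the case study.

```python
# Sketch of Boehm's basic COCOMO model: effort = a * (KDSI ** b), where
# KDSI is thousands of delivered source instructions and the pair (a, b)
# depends on the development mode. Coefficients are the textbook
# basic-model values; the function is purely illustrative.
BASIC_COCOMO = {
    "organic":      (2.4, 1.05),  # small teams, familiar problem domain
    "semidetached": (3.0, 1.12),  # intermediate case
    "embedded":     (3.6, 1.20),  # tight constraints, innovative work
}

def basic_cocomo_effort(kdsi: float, mode: str = "organic") -> float:
    """Return the estimated effort in person-months."""
    a, b = BASIC_COCOMO[mode]
    return a * kdsi ** b

print(round(basic_cocomo_effort(10, "organic"), 1))  # about 26.9 person-months
```

The same 10 KDSI system estimated in embedded mode comes out at more than double the organic-mode effort, which is the point of making the mode explicit.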
SLIM is another composite model, outlined by Putnam [11], also based on lines of code but using Rayleigh curves to modify the estimates.

Function Point Analysis (FPA) was developed by Albrecht at IBM [12, 13] for quantifying the size of a software system, particularly in business applications. Function points are an alternative to source lines of code for measuring the size of a system, capturing things like the number of input transaction types and the number of different types of reports to be available. Thus one first counts the number of user functions (external inputs, external outputs, internal files, external interface files and external inquiries) and then adjusts to allow for processing complexity. The original Albrecht approach [12] was a two-step approach in which function points were used to estimate lines of code, which were then used to estimate the effort needed. Kemerer [1] tries to estimate project duration directly from the function point count. Neither approach was particularly accurate, and so Symons [13] revised the approach, first by making the method compatible with structured systems analysis techniques, which made it easier to count logical transactions rather than user functions, and then by recognising that the model must be calibrated at the level of the organisation which is building the system, not at a general industry level. This made it closer to the approach adopted by DeMarco [14] in his BANG system. He had also recognised that at the design stage of a software system lines of code is not an appropriate metric, as one only has information on the business requirement. He therefore sought to develop metrics of the level of complexity of the business requirement by considering the latter as a network of functional primitives, such as calculating the interest on a loan. The review by MacDonell [15] examines how difficult it is to find such suitable metrics and comments on why reports of successful implementation of this method are so few. He felt the main reason was that it was hard to calibrate the weightings at an organisation level.

The idea of using Artificial Intelligence techniques to aid estimation has only been tried since 1990. There have been two approaches. Case-based reasoning follows the analogy approach by trying to compare the proposed project with similar completed ones and then identifying the differences and their implications for the effort levels; see Mukhopadhyay et al. [16]. The second approach is to use neural networks whose inputs are the measures of size and complexity, together with the descriptions of the programming environment, that are used in the algorithmic approaches [3].

There have been some comparisons of these methods of estimating total project effort, even if there has been little reported work on task duration estimates. Kemerer [1] performed an empirical validation of four algorithmic models (SLIM, COCOMO, FPA and ESTIMACS, a proprietary system with similar features to FPA), using data on completed projects to construct estimates of the completion times and then comparing these with the actual times. Jeffery and Low [2] conducted a similar investigation but allowed for the models to be calibrated at both the industry and the organisation level. Mukhopadhyay et al. [16] compared a case-based reasoning approach with COCOMO, FPA and expert judgement, and found that the other methods did not outperform expert judgement. Finnie et al. [3] compared FPA with case-based reasoning and neural network methods on 299 projects and found the AI models superior. Their main point, however, was that good practical estimation demands good record keeping of estimates and actual outcomes on previous projects.

Heemstra [17] surveyed 364 organisations and found that fewer than 155 used models to estimate software development effort, and also that model users made no better estimates than non-model users. He also found that firms did not recalibrate their models in the light of their results.

It should be noted, though, that all these comparisons concern the total project effort, cost or duration needed for initially scoping and pricing the project. None of them consider the task estimates needed to make the project management approach successful.

3. Data on task durations and their estimates

Data on task durations was obtained from the information systems development department of a major international financial organisation. The organisation uses no formal estimation models but relies on the expert opinion of project supervisors and managers to estimate the duration of the tasks that make up each project. They in turn probably use some form of analogy, and though estimates may be discussed informally between project team members, there is no formal use of Delphi techniques.

Minor projects are planned at the supervisory level. In this planning exercise, the tasks and subtasks involved are identified, the effort needed for each task is estimated, and the tasks are allocated to individuals. For more major projects, senior management set guidelines regarding project estimates following agreement with the user steering group. The effort required for each task making up the project is then estimated by the project managers as before, and the total effort of all tasks must be within the management guidelines. If it is not, the project is referred back to the user steering group either to scale down the scope of the project or to increase the guidelines.
Once a project is agreed, the data on each task are loaded on the computer. These data include the initial task effort estimate (recorded in workdays, with a minimum input of 0.1 day), the project manager who made the estimate, the type of task, the number of subtasks that make up the task, the total number of staff allocated to the task, and whether the task is being tracked (tracking means the project's progress is being controlled by a project management package). During development the actual effort needed to complete the task is also recorded.

The database on tasks has been in existence since 1989 and includes details of over 16,600 tasks. A subset of these was chosen using the following filters. Only tasks which were part of enhancement or development projects were included, which removed projects coded as fixes and training, while tasks which were part of the administration of the project (meetings, walkthroughs, etc.) were also excluded. Only tasks undertaken in the three-year period 1993–1995 and estimated by one of the six project managers employed throughout that period were included. Lastly, only tasks with at least 0.5 days of estimated effort were included. A sample of over 500 tasks meeting these criteria was then analysed. There were some extreme values. For example, one task that was estimated to take 126 days took only 1.6 days, while another was estimated to take 1 day and took 150.9 days. These and other outliers were examined by the senior managers to see if they were true records of the estimates and actual times. Twenty tasks attracted comment, but only the two alluded to above were felt to be spurious by the senior managers, and these were excluded from the sample, leaving 506 tasks. The comments were themselves interesting, in that in some cases it was felt that very low duration estimates had initially been allocated for strategic reasons.

Lastly, the subtasks making up each task were examined to determine the type of the task. These types were classified as:

T1: setting objectives and requirements;
T2: external design;
T3: internal design;
T4: construction, i.e. programme development and implementation;
T5: analysis;
T6: combinations of analysis and construction.

The year the project was planned and undertaken was also recorded in case there was a learning effect by the estimators.

4. Analysis of data

The descriptive statistics for all the tasks, outlined in Tables 1 and 2, show that although 60% of the tasks were overestimated, on average the estimated durations were slightly less than the actual durations, i.e., the error is less than 1%. This implies that the errors when tasks take longer than estimated are significantly larger than when they take less time than estimated. This is reasonable, since there is a limit on how much quicker than the estimated time a task can be completed, but no limit on how much longer it can take.

Table 1
Summary statistics of estimates and actual durations

All tasks            Estimated days (E)   Actual days (A)   Est−Act (E−A)
Mean                 13.51                13.63             −0.12
Stand. deviation     27.83                27.2              20.43
Minimum              0.5                  0.1               −214.4
Maximum              200                  255.4             140.1
Median               5.0                  3.9               0.3

Table 2 shows that for almost all the different ways of grouping the tasks, the majority of tasks are overestimated. There are two groups in which the majority of tasks are underestimated. The first group is tasks involving analysis and construction, and the second is more complex tasks, including ones with more than three staff or with more than five subtasks. The underestimating of joint construction and analysis tasks suggests that perhaps the requirements were not fully understood before the work started, and it might be better to split such tasks into separate analysis and construction tasks. The underestimating of tasks with large teams and large numbers of subtasks suggests a common cause: it may be that estimators tend not to understand the true complexity of larger systems development tasks.

The distribution of the error (estimated time − actual time), as displayed in Fig. 1, is unimodal. There are far too many errors (25% of the sample) in the region [−1.0, 1.0] for the distribution of errors to be normal (i.e., the tails are too thin), and 10% of the cases have no error at all. The standard tests for normality confirm this, as does the extremely high kurtosis of the distribution.

Looking in more detail at the errors in the estimates for the different types of tasks confirms the results of Table 2. Table 3 gives the mean, median and standard deviation of the errors (estimate − actual) and of the actual times for each of the task types. Again, the major differences occur with tasks of large size, be it by the number of subtasks or staff allocated, and with the life cycle option.
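The coexistence of a majority of overestimates with a small mean underestimate is easy to reproduce: a few large overruns outweigh many small overestimates. The toy figures below are invented for illustration and are not the study's data.

```python
# Toy illustration (invented numbers): six tasks finish half a day early
# (overestimated), four overrun by a full day (underestimated).
# Errors are E - A in days, as in the paper.
errors = [0.5] * 6 + [-1.0] * 4

share_overestimated = sum(e > 0 for e in errors) / len(errors)
mean_error = sum(errors) / len(errors)

print(share_overestimated)  # 0.6 -> 60% of tasks overestimated
print(mean_error)           # -0.1 -> yet the mean error is an underestimate
```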
Table 3
Means and medians of errors and actual times

Description of group   Est−Act mean   Est−Act st. dev.   Est−Act median   Actual mean   Actual st. dev.   Actual median

The regression of the actual durations (A) on all the variables gives

A = 0.177E + 0.07 STAFF + 1.33 SUBTASKS − 1.68 Dtracking
    + 0.21 DM1 − 2.13 DM2 − 3.30 DM3 + 0.23 DM4 − 6.15 DM5
    − 0.70 DT1 + 2.31 DT2 − 0.55 DT4 + 2.65 DT5 + 2.04 DT6 + 1.79        (2)

where STAFF is the number of staff used, SUBTASKS the number of subtasks, Dtracking = 1 if tracking was used, DMi is the effect of manager i (compared with manager 6) and DTj is the effect of task type Tj (compared with type T3). This equation has an R² value of 0.845, which means that the right-hand side of Eq. (2) is quite a good estimate of the actual durations. Two of the most significant variables are SUBTASKS, with a t-ratio of 6.21, and E, with a t-ratio of 2.05. The coefficient of E is 0.177, which means that less than 20% of the estimated time is carried into the equation, whereas the number of subtasks is multiplied up by 1.33 when all the factors are combined to get the best estimate of the duration. The details are given in Table 4, which shows that the other significant factors are whether the estimates were made by manager 3 or manager 5.

Since the subtasks turn out to be so important, it is worth looking at what happens if one uses only those to estimate the actual durations. The regression equation in this case is Eq. (3), and the R² = 0.82. This high R² for Eq. (3) means the actual times are almost as well explained by the number of subtasks alone as when all the variables are put together, and the fit is better than just using the estimated time.

This suggests that if the experts counted subtasks and incorporated this more extensively into their estimates, they would get better results. So there is the question: do the experts consciously or subconsciously already do this? One way of checking is to regress the estimated times against the task variables. Doing so, one finds that the regression has an R² of 0.49 and the equation is

E = 1.18 SUBTASKS − 0.11 STAFF + 2.92 Dtracking
    + 22.93 DM1 + 1.18 DM2 + 0.94 DM3 + 1.92 DM4 − 3.375 DM5
    + 3.87 DA + 3.81 DB − 0.37 DD + 12.42 DE + 2.19        (4)

Again, if one looks only at the relationship between the estimated times and the number of subtasks, one gets the regression equation

E = 1.22 SUBTASKS + 3.18        (5)

with R² = 0.48.
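As an illustration, Eq. (2) can be read as a simple prediction rule. The sketch below just evaluates the fitted coefficients reported above (baselines: manager 6 and task type T3); the function name and the encoding of the dummy variables are our own.

```python
# Illustrative evaluation of Eq. (2): predicted actual duration (days)
# from the fitted coefficients in the text. Manager 6 and task type T3
# are the baseline categories, so they contribute nothing.
MANAGER_COEF = {1: 0.21, 2: -2.13, 3: -3.30, 4: 0.23, 5: -6.15, 6: 0.0}
TASKTYPE_COEF = {"T1": -0.70, "T2": 2.31, "T3": 0.0, "T4": -0.55,
                 "T5": 2.65, "T6": 2.04}

def predicted_actual(estimate, staff, subtasks, tracked, manager, task_type):
    return (0.177 * estimate + 0.07 * staff + 1.33 * subtasks
            - 1.68 * (1 if tracked else 0)
            + MANAGER_COEF[manager] + TASKTYPE_COEF[task_type] + 1.79)

# A 10-day estimate, 2 staff, 3 subtasks, untracked, manager 6, type T3:
print(round(predicted_actual(10, 2, 3, False, 6, "T3"), 2))  # 7.69
```

Note how little weight the 10-day estimate itself carries (0.177 × 10 = 1.77 days) compared with the three subtasks (1.33 × 3 = 3.99 days), which is the paper's central point.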
Table 4
Results of the regression in Eq. (2). Ordinary least squares regression applying White's corrections for heteroskedasticity.

Actual time = a + b1·estimate + b2·staff + b3·subtasks + b4·tracking + b5·M1 + b6·M2 + b7·M3 + b8·M4 + b9·M5 + b10·T1 + b11·T2 + b12·T4 + b13·T5 + b14·T6 + e

Variable name   Estimated coefficient   Standard error   t-ratio   p-value   Partial corr.   Standardized coefficient   Elasticity at means

Notes: 506 observations; dependent variable = ACTUALTIME. Using White's heteroskedasticity-consistent covariance matrix: R-square = 0.8453, R-square adjusted = 0.8408; sum of squared errors (SSE) = 57,850; mean of dependent variable = 13.643; 491 DF; Durbin–Watson = 1.8597; log of the likelihood function = −1916.97.
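The idea behind the "White's corrections" in the Table 4 notes can be sketched for a single regressor: the slope's variance is estimated with a sandwich that weights each squared residual by its squared (centred) regressor value, so it remains valid when the error variance is not constant. This pure-Python version is our own simplified illustration of the HC0 form, not the package output used in the paper.

```python
import math

def ols_with_white_se(x, y):
    """Simple regression y = a + b*x with an HC0 (White) standard error for
    the slope: var(b) = sum((x_i - xbar)^2 * e_i^2) / (sum((x_i - xbar)^2))^2."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    se_b = math.sqrt(sum(((xi - xbar) ** 2) * (ei ** 2)
                         for xi, ei in zip(x, resid))) / sxx
    return a, b, se_b

# On exactly linear data the residuals, and hence the robust SE, are zero:
a, b, se = ols_with_white_se([1, 2, 3, 4], [3, 5, 7, 9])  # y = 2x + 1
print(a, b, se)  # 1.0 2.0 0.0
```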
What these results seem to say is that, of all the factors described, it is the number of subtasks that has the biggest impact on the estimated time. However, since the R² is only 0.48, there is a lot of variation in the estimated times that cannot be explained by the number of subtasks. It is not possible to say from the data what else affects the experts' estimates. All that can be said, after examining Eq. (4) and noting that it is no better an estimate than Eq. (5), is that it is nothing to do with the type of task or the manager who is doing the estimating.

Although almost all of the explanation of the estimated times lies in their relationship with subtasks, the estimates are still not putting enough emphasis on subtasks, since the relationship of subtasks with actual times is even stronger than that of subtasks with estimated times.

Since the real error is A − E, regressions that estimate the real errors in terms of the estimated times and the other factors lead to an equation which is just a rearrangement of Eq. (2), though of course the R² will be larger because the values of the dependent variable are so much smaller. Looking at the relative error, so that the dependent variable is (E − A)/A or (E − A)/E, does not give very good fits (R² of 0.13 and 0.26 respectively), and although the effect of subtasks remains significant in these regressions, it is not as great as in the straight (unscaled) error regressions. Other variables, like the number of staff, the type of task and which manager does the estimating, become significant at the 5% level in these regressions, as well as the estimated time.
5. Conclusions

The results suggest that in this case study the expert project managers all tended to overestimate the majority of the tasks (around 60%), but when they underestimated, the errors tended to be greater, so the mean error was an underestimate of about 1%. The distribution of errors was a mixture, with some tasks being estimated completely accurately, others taking one day less than estimated, and the rest having an almost normal distribution. One should be cautious in giving explanations for this, but it could be that some tasks are routine and there is an understanding in the organisation, known by project managers and analysts, of the length of time they are to take. Hence such tasks tend to come in on time or a little early. The others, whose development is less clear, are the ones that can be approximated by a normal distribution, with a slightly longer underestimating tail than an overestimating one.

Looking at the estimates in more detail, it seemed that estimators made the largest average underestimates on tasks that were not clearly defined (category T6) or were complex, involving a number of staff and a large number of subtasks. The major overestimates came in tasks involving setting requirements (T1) or analysis (T5). Tracking of the tasks seemed to have little effect, and the learning effect over time was to tighten up the estimates, so that the average estimated task times, instead of being 1.7 days more than the actual times, became 2.3 days less. This effect might have as much to do with pressure within the organisation for projects to be done as efficiently as possible as with any learning about what really happens in projects. One way this pressure exerted itself was in a drive to squeeze `more value for money' as a way of improving the returns to the clients.

The main conclusion, though, is the strong relation between task time and the number of subtasks involved in the task. This was by far and away the best estimate of the likely time of the task, better even than the estimated time. What this suggests is that a careful use of work breakdown structure approaches to identify the packets of work at the lowest level is not just useful for smooth management of the project, but is also one of the most useful things experts can do to estimate the times of the tasks better. It is clear that the managers in this case study understated the full effect that the number of subtasks has on the time of a task. This may be because such tasks were poorly thought through and the dependencies between the subtasks not really understood, or it could be that there is poor project management at the subtask level.

Acknowledgements

The paper was written while one of us (LT) was a visiting professor at Edith Cowan University. We wish to acknowledge the University's support in funding this visit.

References

[1] Kemerer CF. An empirical validation of software cost estimation models. Commun. ACM 1987;30:416–29.
[2] Jeffery DR, Low GC. Calibrating estimation tools for software development. Software Engineering Journal 1990;5:215–21.
[3] Finnie GR, Wittig GE, Desharnais JM. A comparison of software effort estimation techniques: using function points with neural networks, case-based reasoning and regression models. J. Systems and Software 1997;39:281–9.
[4] Lederer AL, Prasad J. Causes of inaccurate software-development cost estimates. J. Systems and Software 1995;31:125–34.
[5] Jorgensen M. Experience with the accuracy of software maintenance task effort prediction models. IEEE Transactions on Software Engineering 1995;21:674–81.
[6] Boehm BW. Software engineering economics. Englewood Cliffs: Prentice Hall, 1981.
[7] Goodman PA. Application of cost estimation techniques: industrial perspective. Information and Software Technology 1992;34:379–82.
[8] Shepperd M. Foundations of software measurement. Englewood Cliffs, New Jersey: Prentice-Hall, 1995.
[9] Kerzner H. Project management: a systems approach to planning, scheduling and control. New York: Van Nostrand Reinhold, 1995.
[10] Yeates D. Systems project management. London: Pitman, 1986.
[11] Putnam LH. A general empirical solution to the macro-software sizing and estimation problem. IEEE Trans. on Software Engineering 1978;4.
[12] Albrecht AJ, Gaffney JE. Software function, source lines of code and development effort prediction: a software science validation. IEEE Trans. on Software Engineering 1983;9:639–48.
[13] Symons CR. Software sizing and estimating MkII FPA. Chichester: Wiley, 1991.
[14] DeMarco T. Controlling software projects: management, measurement and estimation. New York: Yourdon, 1982.
[15] MacDonell SG. Comparative review of functional complexity assessment methods for effort estimation. Software Engineering Journal 1994;9:107–16.
[16] Mukhopadhyay T, Vicinanza SS, Prietula MJ. Examining the feasibility of a case-based reasoning model for software effort estimation. MIS Quarterly 1992;16(2):155–71.
[17] Heemstra FJ. Software cost estimation. Information and Software Technology 1992;34:627–39.

L. C. Thomas is Professor of Management Science at the University of Edinburgh. His undergraduate degree and his D.Phil were in Mathematics from the University of Oxford. He has authored and edited seven books and over 100 papers in the Management Science area.
J. Hill obtained a Masters degree in Business Administration from the University of Edinburgh. He is a senior manager in the research and development department of an international financial organisation, and has been overseeing software development projects for a number of years.

D. E. Allen is the foundation Professor of Finance at Edith Cowan University, having previously been at Curtin University, the University of Western Australia and the University of Edinburgh. He received a degree in economics from the University of St Andrews, an M.Phil from Leicester University and his Ph.D. in finance from the University of Western Australia. His research interests include a number of areas of business economics and finance, portfolio analysis, estimation of risk and the statistical estimation of financial and other data.