
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 21, NO. 2, FEBRUARY 1995

Machine Learning Approaches to Estimating Software Development Effort
Krishnamoorthy Srinivasan and Douglas Fisher, Member, IEEE

Abstract- Accurate estimation of software development effort is critical in software engineering. Underestimates lead to time pressures that may compromise full functional development and thorough testing of software. In contrast, overestimates can result in noncompetitive contract bids and/or over-allocation of development resources and personnel. As a result, many models for estimating software development effort have been proposed. This article describes two methods of machine learning, which we use to build estimators of software development effort from historical data. Our experiments indicate that these techniques are competitive with traditional estimators on one dataset, but also illustrate that these methods are sensitive to the data on which they are trained. This cautionary note applies to any model-construction strategy that relies on historical data. All such models for software effort estimation should be evaluated by exploring model sensitivity on a variety of historical data.

Index Terms- Software development effort, machine learning, decision trees, regression trees, and neural networks.

Manuscript received October 1992; revised October 1993 and October 1994. Recommended by D. Wile. D. Fisher's work was supported by NASA Ames Grant NAG 2-834. K. Srinivasan is with Personal Computer Consultants, Inc., Washington, D.C. D. Fisher is with the Department of Computer Science, Vanderbilt University, Nashville, Tennessee (e-mail: [email protected]). IEEE Log Number 9408517.

I. INTRODUCTION

ACCURATE estimation of software development effort has major implications for the management of software development. If management's estimate is too low, then the software development team will be under considerable pressure to finish the product quickly, and hence the resulting software may not be fully functional or tested. Thus, the product may contain residual errors that need to be corrected during a later part of the software life cycle, in which the cost of corrective maintenance is greater. On the other hand, if a manager's estimate is too high, then too many resources will be committed to the project. Furthermore, if the company is engaged in contract software development, then too high an estimate may fail to secure a contract.

The importance of software effort estimation has motivated considerable research in recent years. Parametric models such as COCOMO [3], FUNCTION POINTS [2], and SLIM [16] "calibrate" prespecified formulas for estimating development effort from historical data. Inputs to these models may include the experience of the development team, the required reliability of the software, the programming language in which the software is to be written, and an estimate of the final number of delivered source lines of code (SLOC). In contrast, many methods of machine learning make no or minimal assumptions about the form of the function under study (e.g., development effort), but as with other approaches they depend on historical data. In particular, over a known set of training data, the learning algorithm constructs "rules" that fit the data, and which hopefully fit previously unseen data in a reasonable manner as well. This article illustrates machine learning approaches to estimating software development effort using an algorithm for building regression trees [4], and a neural-network learning approach known as BACKPROPAGATION [19]. Our experiments, using established case libraries [3], [11], indicate possible advantages of the approach relative to traditional models, but also point to limitations that motivate continued research.

II. MODELS FOR ESTIMATING SOFTWARE DEVELOPMENT EFFORT

Many models have been developed to estimate software development effort. Many of these models are parametric, in that they predict development effort using a formula of fixed form that is parameterized from historical data. In preparation for later discussion we summarize three such models that were highlighted in a previous study by Kemerer [11].

Putnam [16] developed an early model known as SLIM, which estimates the cost of software by using SLOC as the major input. The underlying assumption of this model is that resource consumption, including personnel, varies with time and can be modeled with some degree of accuracy by the Rayleigh distribution:

R_t = (K t / k²) e^(−t² / (2k²)),

where R_t is the instantaneous resource consumption, t is the time into the development effort, k is the time at which consumption is at its peak, and K is a scale parameter reflecting total effort. The parameter k and other "management parameters" are estimated from characteristics of a particular software project, notably estimated SLOC. The general relationship between inputs such as SLOC and management parameters can be determined from historical data.
The Constructive Cost Model (COCOMO) was developed by Boehm [3] based on a regression analysis of 63 completed projects. COCOMO relates the effort required to develop a software project (in terms of person-months) to Delivered Source Instructions (DSI). Thus, like SLIM, COCOMO assumes SLOC as a major input. If the software project is judged to be straightforward, then the basic COCOMO model (COCOMO-basic) relates the nominal development effort (N) and DSI as follows:

N = 3.2 x (KDSI)^1.05,

where KDSI is the DSI in 1000s. However, the prediction of the basic COCOMO model can be modified using cost drivers. Cost drivers are classified under four major headings relating to attributes of the product (e.g., required software reliability), computer platform (e.g., main memory limitations), personnel (e.g., analyst capability), and the project (e.g., use of modern programming practices). These factors serve to adjust the nominal effort up or down. These cost drivers and other considerations extend the basic model to intermediate and final forms.

The Function Point method was developed by Albrecht [2]. Function points are based on characteristics of the project that are at a higher descriptive level than SLOC, such as the number of input transaction types and number of reports. A notable advantage of this approach is that it does not rely on SLOC, which facilitates estimation early in the project life cycle (i.e., during requirements definition), and by nontechnical personnel. To count function points requires that one count user functions and then make adjustments for processing complexity. There are five types of user function that are included in the function point calculation: external input types, external output types, logical internal file types, external interface file types, and external inquiry types. In addition, there are 14 processing complexity characteristics such as transaction rates and online updating. A function point count is calculated based on the number of transactions and complexity characteristics. The development effort estimate given the function point count, F, is: N = 54 x F - 13390.
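These parametric forms are simple to evaluate directly. The Python sketch below, with illustrative input values that are not taken from any project discussed in this paper, computes the COCOMO-basic and Function Point estimates exactly as given by the formulas above; the Rayleigh staffing curve is included under the assumption that K is the total-effort scale parameter.

    import math

    def cocomo_basic(kdsi):
        # Nominal person-months for a "straightforward" project (COCOMO-basic).
        return 3.2 * (kdsi ** 1.05)

    def function_point_effort(f):
        # Effort estimate from a function point count F, in the units of the fitted equation.
        return 54.0 * f - 13390.0

    def rayleigh_staffing(t, k, total_effort):
        # Instantaneous resource consumption R_t at time t, peaking at t = k.
        return (total_effort * t / k ** 2) * math.exp(-t ** 2 / (2 * k ** 2))

    # Hypothetical project: 32 KDSI, 1500 function points, staffing peak at month 24.
    print(cocomo_basic(32.0))
    print(function_point_effort(1500.0))
    print(rayleigh_staffing(12.0, 24.0, 600.0))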
Recently, a case-based approach called ESTOR was developed for software effort estimation. This model was developed by Vicinanza et al. [23] by obtaining protocols from a human expert. From a library of cases developed from expert-supplied protocols, an instance called the source is retrieved that is most "similar" to the target problem to be solved. The solution of the most similar problem retrieved from the case library is adapted to account for differences between the source problem and the target problem using rules inferred from analysis of the human expert's protocols. An example of an adjustment rule is:

    IF   staff size of Source project is small, AND
         staff size of Target is large
    THEN increase effort estimate of Target by 20%.

Vicinanza et al. have shown that ESTOR performs better than COCOMO and FUNCTION POINTS on restricted samples of problems.
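ESTOR itself is defined by its expert-derived case library and adjustment rules, which are not reproduced here; the following Python sketch only illustrates the general retrieve-and-adjust pattern described above, using a made-up distance measure, hypothetical project records, and a single adjustment rule in the spirit of the example rule.

    def nearest_case(target, case_library, features):
        # Retrieve the source case whose feature values are closest to the target.
        def distance(case):
            return sum(abs(case[f] - target[f]) for f in features)
        return min(case_library, key=distance)

    def adjust_estimate(source, target):
        # Apply an ESTOR-style adjustment rule to the retrieved case's effort.
        estimate = source["effort"]
        # Hypothetical rule: small source staff and large target staff -> raise estimate by 20%.
        if source["staff"] <= 5 and target["staff"] >= 20:
            estimate *= 1.20
        return estimate

    # Hypothetical projects described by staff size and KDSI (not from the paper's data).
    library = [
        {"staff": 4, "kdsi": 10.0, "effort": 48.0},
        {"staff": 25, "kdsi": 120.0, "effort": 900.0},
    ]
    target = {"staff": 22, "kdsi": 15.0}
    source = nearest_case(target, library, features=("staff", "kdsi"))
    print(adjust_estimate(source, target))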
In sum, there have been a variety of models developed for estimating development effort. With the exception of ESTOR these are parametric approaches that assume that an initial estimate can be provided by a formula that has been fit to historical data.

III. MACHINE LEARNING APPROACHES TO ESTIMATING DEVELOPMENT EFFORT

This section describes two machine learning strategies that we use to estimate software development effort, which we assume is measured in development months (M). In many respects this work stems from a more general methodology for developing expert systems. Traditionally, expert systems have been developed by extracting the rules that experts apparently use by an interview process or protocol analysis (e.g., ESTOR), but an alternate approach is to allow machine learning programs to formulate rulebases from historical data. This methodology requires historical data on which to apply learning strategies.

There are several aspects of software development effort estimation that make it amenable to machine learning analysis. Most important, previous researchers have identified at least some of the attributes relevant to software development effort estimation, and historical databases defined over these relevant attributes have been accumulated. The following sections describe two very different learning algorithms that we use to test the machine learning approach. Other research using machine learning techniques for software resource estimation is found in [5], [14], [15], [22], which we will discuss throughout the paper. In short, our work adds to the collection of machine learning techniques available to software engineers, and our analysis stresses the sensitivity of these approaches to the nature of historical data and other factors.

A. Learning Decision and Regression Trees

Many learning approaches have been developed that construct decision trees for classifying data [4], [17]. Fig. 1 illustrates a partial decision tree over Boehm's original 63 projects from which COCOMO was developed. Each project is described over dimensions such as AKDSI (i.e., adjusted delivered source instructions), TIME (i.e., the required system response time), and STOR (i.e., main memory limitations). The complete set of attributes used to describe these data is given in Appendix A. The mean of actual project development months labels each leaf of the tree. Predicting development effort for a project requires that one descend the decision tree along an appropriate path, and the leaf value along that path gives the estimate of development effort of the new project. The decision tree in Fig. 1 is referred to as a regression tree, because the intent of categorization is to generate a prediction along a continuous dependent dimension (here, software development effort).

[Fig. 1, not reproduced here: a regression tree over Boehm's 63 software project descriptions, with internal nodes testing attributes such as AKDSI, TIME, VEXP, STOR, and AAF, and leaves labeled by mean development months. Numbers in square brackets represent the number of projects classified under a node.]
There are many automatic methods for constructing decision and regression trees from data, but these techniques are typically variations on one simple strategy. A "top-down" strategy examines the data and selects an attribute that best divides the data into disjoint subpopulations. The most important aspect of decision and regression tree learners is the criterion used to select a "divisive" attribute during tree construction. In one variation the system selects the attribute with values that maximally reduce the mean squared error (MSE) of the dependent dimension (e.g., software development effort) observed in the training data. The MSE of any set, S, of training examples taking on values y_k in the continuous dependent dimension is:

MSE(S) = Σ_{k ∈ S} (y_k − ȳ)² / |S|,

where ȳ is the mean of the y_k values exhibited in S. The values of each attribute, A_i, partition the entire training data set, T, into subsets, T_ij, where every example in T_ij takes on the same value, say V_j, for attribute A_i. The attribute, A_i, that maximizes the difference

ΔMSE = MSE(T) − Σ_j MSE(T_ij)

is selected to divide the tree. Intuitively, the attribute that minimizes the error over the dependent dimension is used. While MSE values are computed over the training data, the inductive assumption is that selected attributes will similarly reduce error over future cases as well.

This basic procedure of attribute selection is easily extended to allow continuously-valued attributes: all ordered 2-partitions of the observed values in the training data are examined. In essence, the dimension is split around each observed value. The effect is to 2-partition the dimension in k − 1 alternate ways (where k is the number of observed values), and the binary "split" that is best according to ΔMSE is considered along with other possible attributes to divide a regression-tree node. Such "splitting" is common in the tree of Fig. 1; see AKDSI, for example. Approaches have also been developed that split a continuous dimension into more than two ranges [9], [15], though we will assume 2-partitions only. Similarly, techniques that 2-partition all attribute domains, for both continuous and nominally-valued (i.e., finite, unordered) attributes, have been explored (e.g., [24]). For continuous attributes this bisection process operates as we have just described, but for a nominally-valued attribute all ways to group values of the attribute into two disjoint sets are considered. Suffice it to say that treating all attributes as though they had the same number of values (e.g., 2) for purposes of attribute selection mitigates certain biases that are present in some attribute selection measures (e.g., ΔMSE). As we will note again in Section IV, we ensure that all attributes are either continuous or binary-valued at the outset of regression-tree construction.

The basic regression-tree learning algorithm is summarized in Fig. 2. The data set is first tested to see whether tree construction is worthwhile; if all the data are classified identically or some other statistically-based criterion is satisfied, then expansion ceases. In this case, the algorithm simply returns a leaf labeled by the mean value of the dependent dimension found in the training data. If the data are not sufficiently distinguished, then the best divisive attribute according to ΔMSE is selected, the attribute's values are used to partition the data into subsets, and the procedure is recursively called on these subsets to expand the tree. When used to construct predictors along continuous dimensions, this general procedure is referred to as recursive-partitioning regression. Our experiments use a partial reimplementation of a system known as CART [4]. We refer to our reimplementation as CARTX.

    FUNCTION CARTX(Instances)
      IF termination-condition(Instances)
      THEN RETURN a leaf labeled with the mean among Instances
      ELSE set Best-Attribute to the most informative attribute among the Instances;
           FOR each value V_i of Best-Attribute,
               recursively call CARTX({I | I is an Instance with value V_i});
           RETURN the subtree rooted at the test on Best-Attribute.

Fig. 2. Decision/regression-tree learning algorithm.

Previously, Porter and Selby [14], [15], [22] have investigated the use of decision-tree induction for estimating development effort and other resource-related dimensions. Their work assumes that if predictions over a continuous dependent dimension are required, then the continuous dimension is "discretized" by breaking it into mutually-exclusive ranges. More commonly used decision-tree induction algorithms, which assume discrete-valued dependent dimensions, are then applied to the appropriately classified data. In many cases this preprocessing of a continuous dependent dimension may be profitable, though regression-tree induction demonstrates that the general tree-construction approach can be adapted for direct manipulation of a continuous dependent dimension. This is also the case with the learning approach we describe next.
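For readers who prefer running code to pseudocode, the following Python sketch implements the recursive-partitioning idea just described: continuous attributes are 2-partitioned around observed values, the split retained is the one that most reduces the summed MSE of the resulting subsets (the paper's ΔMSE criterion), and each leaf is labeled with the mean development effort of the projects it covers. It is a minimal illustration, not the authors' CARTX implementation; the attribute names and the tiny data set are hypothetical.

    def mse(values):
        # Mean squared error of a set of dependent-dimension values.
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    def best_split(projects, attributes, target):
        # Find the (attribute, threshold) 2-partition that most reduces MSE.
        parent = mse([p[target] for p in projects])
        best = None
        for attr in attributes:
            for threshold in sorted({p[attr] for p in projects})[:-1]:
                left = [p[target] for p in projects if p[attr] <= threshold]
                right = [p[target] for p in projects if p[attr] > threshold]
                gain = parent - (mse(left) + mse(right))
                if best is None or gain > best[0]:
                    best = (gain, attr, threshold)
        return best

    def grow(projects, attributes, target, min_size=2):
        # Return a leaf (mean effort) or an internal node, as a nested dict.
        efforts = [p[target] for p in projects]
        split = best_split(projects, attributes, target) if len(projects) > min_size else None
        if split is None or split[0] <= 0.0:
            return {"leaf": sum(efforts) / len(efforts), "n": len(projects)}
        _, attr, threshold = split
        left = [p for p in projects if p[attr] <= threshold]
        right = [p for p in projects if p[attr] > threshold]
        return {"attr": attr, "threshold": threshold,
                "left": grow(left, attributes, target, min_size),
                "right": grow(right, attributes, target, min_size)}

    def predict(tree, project):
        # Descend the tree along the path selected by the project's attribute values.
        while "leaf" not in tree:
            branch = "left" if project[tree["attr"]] <= tree["threshold"] else "right"
            tree = tree[branch]
        return tree["leaf"]

    # Hypothetical projects (attribute values are illustrative, not Boehm's data).
    data = [
        {"AKDSI": 10.0, "TIME": 1.0, "effort": 40.0},
        {"AKDSI": 12.0, "TIME": 1.0, "effort": 55.0},
        {"AKDSI": 30.0, "TIME": 1.1, "effort": 240.0},
        {"AKDSI": 35.0, "TIME": 1.6, "effort": 410.0},
        {"AKDSI": 90.0, "TIME": 1.6, "effort": 1200.0},
    ]
    tree = grow(data, attributes=("AKDSI", "TIME"), target="effort")
    print(predict(tree, {"AKDSI": 33.0, "TIME": 1.5}))

As in the text's ΔMSE formula, the gain above compares the parent set's MSE against the unweighted sum of the two subsets' MSEs; a CART-style implementation could just as well weight each subset by its size.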

B. A Neural Network Approach to Learning

A learning approach that is very different from that outlined above is BACKPROPAGATION, which operates on a network of simple processing elements as illustrated in Fig. 3. This basic architecture is inspired by biological nerve nets, and is thus called an artificial neural network. Each line between processing elements has a corresponding and distinct weight. Each processing unit in this network computes a nonlinear function of its inputs and passes the resultant value along as its output. The favored function is

f = 1 / (1 + e^(−Σ_i w_i I_i)),

where Σ_i w_i I_i is a weighted sum of the inputs, I_i, to a processing element [19], [25].

[Fig. 3, not reproduced here: a feedforward network architecture for software development effort estimation, with an input layer of project attributes such as AKDSI, a layer of hidden units, and one output unit for estimated effort.]

The network generates output by propagating the initial inputs, shown on the lefthand side of Fig. 3, through subsequent layers of processing elements to the final output layer. This net illustrates the kind of mapping that we will use for estimating software development effort, with inputs corresponding to various project attributes, and the output line corresponding to the estimated development effort. The inputs and output are restricted to numeric values. For numerically-valued attributes this mapping is natural, but for nominal data such as LANG (implementation language), a numeric representation must be found. In this domain, each value of a nominal attribute is given its own input line. If the value is present in an observation then the input line is set to 1.0, and if the value is absent then it is set to 0.0. Thus, for a given observation the input line corresponding to an observed nominal value (e.g., COB) will be 1.0, and the others (e.g., FTN) will be 0.0. Our application requires only one network output, but other applications may require more than one.

Details of the BACKPROPAGATION learning procedure are beyond the scope of this article, but intuitively the goal of learning is to train the network to generate appropriate output patterns for corresponding input patterns. To accomplish this, comparisons are made between a network's actual output pattern and an a priori known correct output pattern. The difference or error between each output line and its correct corresponding value is "backpropagated" through the net and guides the modification of weights in a manner that will tend to reduce the collective error between actual and correct outputs on training patterns. This procedure has been shown to converge on accurate mappings between input and output patterns in a variety of domains [21], [25].

C. Approximating Arbitrary Functions

In trying to approximate an arbitrary function like development effort, regression trees approximate a function with a "staircase" function. Fig. 4 illustrates a function of one continuous, independent variable. A regression tree decomposes this function's domain so that the mean at each leaf reflects the function's range within a local region. The "hidden" processing elements that reside between the input and output layers of a neural network do roughly the same thing, though the approximating function is generally smoothed. The granularity of this partitioning of the function is modulated by the depth of a regression tree or the number of hidden units in a network.

[Fig. 4, not reproduced here: an example of function approximation by a regression tree, showing a smooth function f(X) of one variable X approximated by a piecewise-constant "staircase."]

Each learning approach is nonparametric, since it makes no a priori assumptions about the form of the function being approximated. There are a wide variety of parametric methods for function approximation such as regression methods of statistics and polynomial interpolation methods of numerical analysis [10]. Other nonparametric methods include genetic algorithms [7] and nearest neighbor approaches [1], though we will not elaborate on any of these alternatives here.
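As a concrete illustration of the kind of network and training procedure sketched in Section III-B, the Python code below builds a small fully-connected network of sigmoid units and adjusts its weights by gradient descent on the squared output error, which is the essence of BACKPROPAGATION. The layer sizes, learning rate, and toy training pairs are hypothetical choices for this sketch, not the configuration used in the paper's experiments (that configuration is described in Section IV-A).

    import math
    import random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    class TwoLayerNet:
        # Inputs -> hidden sigmoid units -> one sigmoid output unit (no bias terms).

        def __init__(self, n_inputs, n_hidden, seed=0):
            rng = random.Random(seed)
            small = lambda: rng.uniform(-0.5, 0.5)   # small random initial weights
            self.w_hidden = [[small() for _ in range(n_inputs)] for _ in range(n_hidden)]
            self.w_output = [small() for _ in range(n_hidden)]

        def forward(self, inputs):
            hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in self.w_hidden]
            output = sigmoid(sum(w * h for w, h in zip(self.w_output, hidden)))
            return hidden, output

        def train_pattern(self, inputs, target, rate=0.5):
            # One backpropagation step: push the output error back to all weights.
            hidden, output = self.forward(inputs)
            delta_out = (target - output) * output * (1.0 - output)
            for j, h in enumerate(hidden):
                delta_hidden = delta_out * self.w_output[j] * h * (1.0 - h)
                self.w_output[j] += rate * delta_out * h
                for i, x in enumerate(inputs):
                    self.w_hidden[j][i] += rate * delta_hidden * x
            return (target - output) ** 2

    # Toy training pairs: inputs and targets already scaled to [0, 1] (see Section IV).
    patterns = [([0.1, 0.2, 1.0, 0.0], 0.1), ([0.8, 0.9, 0.0, 1.0], 0.9)]
    net = TwoLayerNet(n_inputs=4, n_hidden=3)
    for _ in range(2000):                      # repeated presentations of the training data
        for inputs, target in patterns:
            net.train_pattern(inputs, target)
    print(net.forward([0.8, 0.9, 0.0, 1.0])[1])  # moves toward the 0.9 target after training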
D. Sensitivity to Configuration Choices

Both BACKPROPAGATION and CARTX require that the analyst make certain decisions about algorithm implementation. For example, BACKPROPAGATION can be used to train networks with differing numbers of hidden units. Too few hidden units can compromise the ability of the network to approximate a desired function. In contrast, too many hidden units can lead to "overfitting," whereby the learning system fits the "noise" present in the training data, as well as the meaningful trends that we would like to capture. BACKPROPAGATION is also typically trained by iterating through the training data many times. In general, the greater the number of iterations, the greater the reduction in error over the training sample, though there is no general guarantee of this. Finally, BACKPROPAGATION assumes that weights in the neural network are initialized to small, random values prior to training. The initial random weight settings can also impact learning success, though in many applications this is not a significant factor. There are other parameters that can affect BACKPROPAGATION's performance, but we will not explore these here.

In CARTX, the primary dimension under control by the experimenter is the depth to which the regression tree is allowed to grow. Growth to too great a depth can lead to overfitting, and too little growth can lead to underfitting. Experimental results of Section IV-B illustrate the sensitivity of each learning system to certain configuration choices.

IV. OVERVIEW OF EXPERIMENTAL STUDIES

We conducted several experiments with CARTX and BACKPROPAGATION for the task of estimating software development effort. In general, each of our experiments partitions historical data into samples used to train our learning systems, and disjoint samples used to test the accuracy of the trained classifier in predicting development effort.

For purposes of comparison, we refer to previous experimental results by Kemerer [11]. He conducted comparative analyses between SLIM, COCOMO, and FUNCTION POINTS on a database of 15 projects.¹ These projects consist mainly of business applications with a dominant proportion of them (12/15) written in the COBOL language. In contrast, the COCOMO database includes instances of business, scientific, and system software projects, written in a variety of languages including COBOL, PL1, HMI, and FORTRAN. For comparisons involving COCOMO, Kemerer coded his 15 projects using the same attributes used by Boehm.

¹ We thank Professor Chris Kemerer for supplying this dataset.

One way that Kemerer characterized the fit between the predicted (M_est) and actual (M_act) development person-months was by the magnitude of relative error (MRE):

MRE = |M_est − M_act| / M_act.

This measure normalizes the difference between actual and predicted development months, and supplies an analyst with a measure of the reliability of estimates by different models. However, when using a model developed at one site for estimation at another site, there may be local factors that are not modeled, but which nonetheless impact development effort in a systematic way. Thus, following earlier work by Albrecht [2], Kemerer did a linear regression/correlation analysis to "calibrate" the predictions, with M_est treated as the independent variable and M_act treated as the dependent variable. The R² value indicates the amount of variation in the actual values accounted for by a linear relationship with the estimated values. R² values close to 1.0 suggest a strong linear relationship and those close to 0.0 suggest no such relationship. Our experiments will characterize the abilities of BACKPROPAGATION and CARTX using the same dimensions as Kemerer: MRE and R².
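Both evaluation measures are easy to compute. The sketch below, which uses made-up prediction/actual pairs rather than any of the paper's results, computes the mean MRE and the R² of the linear calibration regression of M_act on M_est described above.

    def mean_mre(predicted, actual):
        # Mean magnitude of relative error over paired estimates and actuals.
        return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

    def calibration_r_squared(predicted, actual):
        # Fit actual = b0 + b1 * predicted by least squares and report R^2.
        n = len(actual)
        mean_p = sum(predicted) / n
        mean_a = sum(actual) / n
        sxy = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
        sxx = sum((p - mean_p) ** 2 for p in predicted)
        b1 = sxy / sxx
        b0 = mean_a - b1 * mean_p
        ss_res = sum((a - (b0 + b1 * p)) ** 2 for p, a in zip(predicted, actual))
        ss_tot = sum((a - mean_a) ** 2 for a in actual)
        return b0, b1, 1.0 - ss_res / ss_tot

    # Hypothetical person-month estimates and actuals (not Kemerer's projects).
    est = [120.0, 90.0, 300.0, 45.0, 210.0]
    act = [150.0, 70.0, 340.0, 60.0, 180.0]
    print("mean MRE:", mean_mre(est, act))
    b0, b1, r2 = calibration_r_squared(est, act)
    print("M_act ~= %.1f + %.2f * M_est, R^2 = %.2f" % (b0, b1, r2))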
As we noted, each system imposes certain constraints on the representation of data. There are a number of nominally-valued attributes in the project databases, including implementation language. BACKPROPAGATION requires that each value of such an attribute be treated as a binary-valued attribute that is either present (1) or absent (0) in each project. Thus, each value of a nominal attribute corresponded to a unique input to the neural network as noted in Section III-B. We represent each nominal attribute as a set of binary-valued attributes for CARTX as well. As we noted in Section III-A this mitigates certain biases in attribute selection measures such as ΔMSE. In contrast, each continuous attribute identified by Boehm corresponded to one input to the neural network. There was one output unit, which reflected a prediction of development effort and was also continuous. Preprocessing for the neural network normalized these values between 0.0 and 1.0. A simple scheme was used where each value was divided by the maximum of the values for that attribute in the training data. It has been shown empirically that neural networks converge relatively quickly if all the values for the attributes are between zero and one [12]. No such normalization was done for CARTX, since it would have no effect on CARTX's performance.
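A minimal version of this preprocessing, assuming a small hypothetical attribute set rather than the full COCOMO scheme, looks as follows: nominal attributes are expanded into one 0/1 input per observed value, and continuous attributes are divided by their training-set maxima.

    def encode(projects, continuous, nominal):
        # Scale continuous attributes by their maxima and expand nominal attributes
        # into one binary input per observed value.
        maxima = {a: max(p[a] for p in projects) for a in continuous}
        values = {a: sorted({p[a] for p in projects}) for a in nominal}
        encoded = []
        for p in projects:
            row = [p[a] / maxima[a] for a in continuous]
            for a in nominal:
                row.extend(1.0 if p[a] == v else 0.0 for v in values[a])
            encoded.append(row)
        return encoded

    # Hypothetical project records (illustrative attribute names and values only).
    projects = [
        {"AKDSI": 10.0, "TIME": 1.0, "LANG": "COB"},
        {"AKDSI": 90.0, "TIME": 1.6, "LANG": "FTN"},
        {"AKDSI": 35.0, "TIME": 1.1, "LANG": "PL1"},
    ]
    for row in encode(projects, continuous=("AKDSI", "TIME"), nominal=("LANG",)):
        print(row)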
A. Experiment 1: Comparison with Kemerer's Results

Our first experiment compares the performance of machine learning algorithms with standard models of software development estimation using Kemerer's data as a test sample. To test CARTX and BACKPROPAGATION, we trained each system on COCOMO's database of 63 projects and tested on Kemerer's 15 projects. For BACKPROPAGATION we initially configured the network with 33 input units, 10 hidden units, and 1 output unit, and required that the training set error reach 0.00001 or continue for a maximum of 12 000 presentations of the training data. Training ceased after 12 000 presentations without converging to the required error criterion. The experiment was done on an AT&T PC 386 under DOS. It required about 6-7 hours for 12 000 presentations of the training patterns. We actually repeated this experiment 10 times, though we only report the results of one run here; we summarize the complete set of experiments in Section IV-B.
In our initial configuration of CARTX, we allowed the regression tree to grow to a "maximum" depth, where each leaf represented a single software project description from the COCOMO data. We were motivated initially to extend the tree to singleton leaves, because the data is very sparse relative to the number of dimensions used to describe each data point; our concern is not so much with overfitting, as it is with underfitting the data. Experiments with the regression tree learner were performed on a SUN 3/60 under UNIX, and required about a minute. The predictions obtained from the learning algorithms (after training on the COCOMO data) are shown in Table I with the actual person-months of Kemerer's 15 projects.

TABLE I
CARTX AND BACKPROPAGATION ESTIMATES ON KEMERER'S DATA

    Actual      CARTX       BACKPROP
    287.00      1893.30       86.45
    82.50        162.03       14.14
    1107.31    11400.00     1000.43
    86.90        243.00       88.37
    336.30      6600.00      540.42
    84.00        129.17       13.16
    23.20        129.17       45.38
    130.30       243.00       78.92
    116.00      1272.00      113.18
    72.00        129.17       15.72
    258.70       243.00       80.87
    230.70       243.00       28.65
    157.00       243.00       44.29
    246.90       243.00       39.17
    69.90        129.17      214.71

We note that some predictions of CARTX do not correspond to exact person-month values of any COCOMO (training set) project, even though the regression tree was developed to singleton leaves. This stems from the presence of missing values for some attributes in Kemerer's data. If, during classification of a test project, we encounter a decision node that tests an attribute with an unknown value in the test project, both subtrees under the decision node are explored. In such a case, the system's final prediction of development effort is a weighted mean of the predictions stemming from each subtree. The approach is similar to that described in [17].
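The weighted-mean treatment of unknown test-time values can be added to the earlier regression-tree sketch with a few lines. The version below is a sketch under the assumption that subtrees are weighted by the number of training projects at the leaves they reach (the paper does not spell out the weighting); prediction recurses down both branches whenever the tested attribute is missing.

    def predict_with_missing(tree, project):
        # Returns (prediction, weight); the weight is the number of training projects
        # at the leaves reached, so parent nodes can form a weighted mean.
        if "leaf" in tree:
            return tree["leaf"], tree["n"]
        value = project.get(tree["attr"])          # None when the attribute is unknown
        if value is not None:
            branch = tree["left"] if value <= tree["threshold"] else tree["right"]
            return predict_with_missing(branch, project)
        left_pred, left_n = predict_with_missing(tree["left"], project)
        right_pred, right_n = predict_with_missing(tree["right"], project)
        total = left_n + right_n
        return (left_pred * left_n + right_pred * right_n) / total, total

    # With the hypothetical tree grown in the Section III-A sketch, a project that
    # lacks TIME still gets an estimate:
    # predict_with_missing(tree, {"AKDSI": 33.0})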

TABLE II
A COMPARISON OF LEARNING AND ALGORITHMIC APPROACHES. THE REGRESSION EQUATIONS GIVE M_act AS A FUNCTION OF M_est (x)

    Model         MRE (%)    R-Square    Regress. Eq.
    CARTX           364        0.83      102.5 + 0.075x
    BACKPROP         70        0.80      78.13 + 0.88x
    FUNC. PTS.      103        0.58      -37 + 0.96x
    COCOMO          610        0.70      27.7 + 0.156x
    SLIM            772        0.89      49.9 + 0.082x

Table II summarizes the MRE and R² values resulting from a linear regression of M_est and M_act values for the two learning algorithms, and results obtained by Kemerer with COCOMO-BASIC, FUNCTION POINTS, and SLIM.² These results indicate that CARTX's and BACKPROPAGATION's predictions show a strong linear relationship with the actual development effort values for the 15 test projects.³ On this dimension, the performance of the learning systems is less than SLIM's performance in Kemerer's experiments, but better than the other two models. In terms of mean MRE, BACKPROPAGATION does strikingly well compared to the other approaches, and CARTX's MRE is approximately one-half that of SLIM and COCOMO.

² Results are reported for COCOMO-BASIC (i.e., without cost drivers), which was comparable to the intermediate and detailed models on this data. In addition, Kemerer actually reported adjusted R², which is R² adjusted for degrees of freedom, and which is slightly lower than the unadjusted R² values that we report. Adjusted R² values reported by Kemerer are 0.55, 0.68, and 0.88 for FUNCTION POINTS, COCOMO, and SLIM, respectively.

³ Both the slope and R value are significant at the 99% confidence level. The t coefficients for determining the significance of slope are 8.048 and 7.25 for CARTX and BACKPROPAGATION, respectively.

In sum, Experiment 1 illustrates two points. In an absolute sense, none of the models does particularly well at estimating software development effort, particularly along the MRE dimension, but in a relative sense both learning approaches are competitive with traditional models examined by Kemerer on one dataset. In general, even though MRE is high in the case of all models, Kemerer argues that high R² suggests that by "calibrating" a model's predictions in a new environment, the adjusted model predictions can be reliably used. Along the R² dimension learning methods provide significant fits to the data. Unfortunately, a primary weakness of these learning approaches is that their performance is sensitive to a number of implementation decisions. Experiment 2 illustrates some of these sensitivities.

B. Experiment 2: Sensitivity of the Learning Algorithms

We have noted that each learning system assumes a number of important choices such as the depth to which to "grow" regression trees, or the number of hidden units included in the neural network. These choices can significantly impact the success of learning. Experiment 2 illustrates the sensitivity of our two learning systems relative to different choices along these dimensions. In particular, we repeated Experiment 1 using BACKPROPAGATION with differing numbers of hidden units and using CARTX with differing constraints on regression-tree growth.

Table III illustrates our results with BACKPROPAGATION. Each cell summarizes results over 10 experimental trials, rather than one trial, which was reported in Section IV-A for presentation purposes. Thus, Max and Min values of R² and MRE in each cell of Table III suggest the sensitivity of BACKPROPAGATION to initial random weight settings, which were different in each of the 10 experimental trials. The experimental results of Section IV-A reflect the "best" among the 10 trials summarized in Table III's 10-hidden-unit column. In general, however, for 5, 10, and 15 hidden units, MRE scores are still comparable or superior to some of the other models summarized in Table II, and mean R² scores suggest that significant linear relationships between predicted and actual development months are often found. Poor results obtained with no hidden units indicate the importance of these for accurate function approximation.

The performance of CARTX can vary with the depth to which we extend the regression tree. The results of Experiment 1 are repeated here, and represent the case where required accuracy over the training data is 0%, that is, the tree is decomposed to singleton leaves. However, we experimented with more conservative tree expansion policies, where CARTX extended the tree only to the point where an error threshold (relative to the training data) is satisfied. In particular, trees were grown to leaves where the mean MRE among projects at a leaf was less than or equal to a prespecified threshold that ranged from 0% to 500%. The MRE of each project at a leaf
was calculated by

MRE = |M̄ − M_act| / M_act,

where M̄ is the mean person-months development effort of projects at that node.

[Table III, not reproduced here: BACKPROPAGATION results with varying numbers of hidden nodes; the individual cell values were not recoverable from the source.]

[Table IV, not reproduced here: CARTX results with varying training error thresholds of 0%, 25%, 50%, 100%, 200%, 300%, 400%, and 500%; the MRE and R² entries were not recoverable from the source.]

Table IV shows CARTX's performance when we vary the required accuracy of the tree over the training data. Table entries correspond to the MRE and R² scores of the learned trees over the Kemerer test data. In general, there is degradation in performance as one tightens the requirement for regression-tree expansion, though there are applications in which this would not be the case. Importantly, other design decisions in decision and regression-tree systems, such as the manner in which continuous attributes are "split" and the criteria used to select divisive attributes, might also influence prediction accuracy. Selby and Porter [22] have evaluated different design choices along a number of dimensions on the success of decision-tree induction systems using NASA software project descriptions as a test-bed. Their evaluation of decision trees, not regression trees, limits the applicability of their findings to the evaluation reported here, but their work sets an excellent example of how sensitivity to various design decisions can be evaluated.

The performance of both systems is sensitive to certain configuration choices, though we have only examined sensitivity relative to one or two dimensions for each system. Thus, it seems important to posit some intuition about how learning systems can be configured to yield good results on new data, given only knowledge of performance on training data. In cases where more training data is available a holdout method can be used for selecting an appropriate network or regression-tree configuration. The holdout method divides the available data into two sets; one set, generally the larger, is used to build decision/regression trees or train networks under different configurations. The second subset is then classified using each alternative configuration, and the configuration yielding the best results over this second subset is selected as the final configuration. Better yet, a choice of configuration may rest on a form of resampling that exploits many randomized holdout trials. Holdout could have been used in this case by dividing the COCOMO data, but the COCOMO dataset is very small as is. Thus, we have satisfied ourselves with a demonstration of the sensitivity of each learning algorithm to certain configuration decisions. A more complete treatment of resampling and other strategies for making configuration choices can be found in Weiss and Kulikowski [24].

[Table V, not reproduced here: sensitivity over 20 randomized trials on the combined COCOMO and Kemerer data; the individual cell values were not recoverable from the source.]

[Table VI, not reproduced here: sensitivity over 20 randomized trials on Kemerer's data; the individual cell values were not recoverable from the source.]

C. Experiment 3: Sensitivity to Training and Test Data

Thus far, our results suggest that using learning algorithms to discover regularities in a historical database can facilitate predictions on new cases. In particular, comparisons between our experimental results and those of Kemerer indicate that, relatively speaking, learning system performance is competitive with some traditional approaches on one common data set. However, Kemerer found that performance of algorithmic approaches was sensitive to the test data. For example, when a selected subset of 9 of the 15 cases was used to test the models, each considerably improved along the R² dimension. By implication, performance on the other 6 projects was likely poorer. We did not repeat this experiment, but we did perform similarly-intended experiments in which the COCOMO and Kemerer data sets were combined into a single dataset of 78 projects; 60 projects were randomly selected for training the learning algorithms and the remaining 18 projects were used for test. Table V summarizes the results over 20 such randomized trials. The low average R² should not mask the fact that many runs yielded strong linear relationships. For example, on 9 of the 20 CARTX runs, R² was above 0.80.

We also ran 20 randomized trials in which 10 of Kemerer's cases were used to train each learning algorithm, and 5 were used for test. The results are summarized in Table VI. This experiment was motivated by a study with ESTOR [23], a case-based approach that we summarized in Section II: an expert's protocols from 10 of Kemerer's projects were used to construct
a "case library" and the remaining 5 cases were used to test the model's predictions; the particular cases used for test were not reported, but ESTOR outperformed COCOMO and FUNCTION POINTS on this set.

We do not know the robustness of ESTOR in the face of the kind of variation experienced in our 20 randomized trials (Table VI), but we might guess that rules inferred from expert problem solving, which ideally stem from human learning over a larger set of historical data, would render ESTOR more robust along this dimension. However, our experiments and those of Kemerer with selected subsets of his 15 cases suggest that care must be taken in evaluating the robustness of any model with such sparse data. In defense of Vicinanza et al.'s methodology, we should note that the creation of a case library depended on an analysis of expert protocols and the derivation of expert-like rules for modifying the predictions of best matching cases, thus increasing the "cost" of model construction to a point that precluded more complete randomized trials. Vicinanza et al. also point out that their study is best viewed as indicating ESTOR's "plausibility" as a good estimator, while broader claims require further study.

In addition to experiments with the combined COCOMO and Kemerer data, and the Kemerer data alone, we experimented with the COCOMO data alone for completeness. When experimenting with Kemerer's data alone, our intent was to weakly explore the kind of variation faced by ESTOR. Using the COCOMO data we have no such goal in mind. Thus, this analysis uses an N-fold cross validation or "leave-one-out" methodology, which is another form of resampling. In particular, if a data sample is relatively sparse, as ours is, then for each of N (i.e., 63) projects, we remove it from the sample set, train the learning system with the remaining N − 1 samples, and then test on the removed project. MRE and R² are computed over the N tests. CARTX's R² value was 0.56 (144.48 + 0.74x, t = 8.82) and MRE was 125.2%. In this experiment we only report results obtained with CARTX, since a fair and comprehensive exploration of BACKPROPAGATION across possible network configurations is computationally expensive and of limited relevance. Suffice it to say that over the COCOMO data alone, which probably reflects a more uniform sample than the mixed COCOMO/Kemerer data, CARTX provides a significant linear fit to the data with markedly smaller MRE than its performance on Kemerer's data.
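The resampling procedures used in these experiments, randomized holdout trials and leave-one-out, are straightforward to script. The sketch below is a generic illustration rather than the paper's experimental harness: train_fn and eval_fn stand for any of the learners and error measures discussed above, and the project list is whatever data set is available.

    import random

    def holdout_trials(projects, train_fn, eval_fn, n_train, trials=20, seed=0):
        # Repeated randomized train/test splits (e.g., 60 training and 18 test projects).
        rng = random.Random(seed)
        scores = []
        for _ in range(trials):
            shuffled = projects[:]
            rng.shuffle(shuffled)
            model = train_fn(shuffled[:n_train])
            scores.append(eval_fn(model, shuffled[n_train:]))
        return scores

    def leave_one_out(projects, train_fn, eval_fn):
        # N-fold cross validation with one held-out project per fold.
        scores = []
        for i in range(len(projects)):
            training = projects[:i] + projects[i + 1:]
            model = train_fn(training)
            scores.append(eval_fn(model, [projects[i]]))
        return scores

    # Example wiring, using the earlier regression-tree and MRE sketches:
    # train = lambda sample: grow(sample, attributes=("AKDSI", "TIME"), target="effort")
    # evaluate = lambda model, test: mean_mre([predict(model, p) for p in test],
    #                                         [p["effort"] for p in test])
    # print(leave_one_out(data, train, evaluate))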
In sum, our initial results indicating the relative merits of a learning approach to software development effort estimation must be tempered. In fact, a variety of randomized experiments reveal that there is considerable variation in the performance of these systems as the nature of historical training data changes. This variation probably stems from a number of factors. Notably, there are many projects in both the COCOMO and Kemerer datasets that differ greatly in their actual development effort, but are very similar in other respects, including SLOC. Other characteristics, which are currently unmeasured in the COCOMO scheme, are probably responsible for this variation.

V. GENERAL DISCUSSION

Our experimental comparisons of CARTX and BACKPROPAGATION with traditional approaches to development effort estimation suggest the promise of an automated learning approach to the task. Both learning techniques performed well on the R² and MRE dimensions relative to some other approaches on the same data. Beyond this cursory summary, our experimental results and the previous literature suggest several issues that merit discussion.

A. Limitations of Learning from Historical Data

There are well-known limitations of models constructed using historical data. In particular, attributes used to predict software development effort can change over time and/or differ between software development environments. Mohanty [13] makes this point in comparisons between the predictions of a wide variety of models on a single hypothetical software project. In particular, Mohanty surveyed approximately 15 models and methods for predicting software development effort. These models were used to predict software development effort of a single hypothetical software project. Mohanty's main finding was that estimated effort on this single project varied significantly over models. Mohanty points out that each model was developed and calibrated with data collected within a unique software environment. The predictions of these models, in part, reflect underlying assumptions that are not explicitly represented in the data. For example, software development sites may use different development tools. These tools are constant within a facility, and thus not represented explicitly in data collected by that facility, but this environmental factor is not constant across facilities.

Differing environmental factors not reflected in data are undoubtedly responsible for much of the unexplained variance in our experiments. To some extent, the R² derived from linear regression is intended to provide a better measure of a model's "fit" to arbitrary new data than MRE in cases where the environment from which a model was derived is different from the environment from which new data was drawn. Even so, these environmental differences may not be systematic in a way that is well accounted for by a linear model. In sum, great care must be taken when using a model constructed from data from one environment to make predictions about data from another environment. Even within a site, the environment may evolve over time, thus compromising the benefits of previously-derived models. Machine learning research has recently focussed on the problem of tracking the accuracy of a learned model over time, which triggers relearning when experience with new data suggests that the environment has changed [6]. However, in an application such as software development effort estimation, there are probably explicit indicators that an environmental change is occurring or will occur (e.g., when new development tools or quality control practices are implemented).

B. Engineering the Definition of Data

If environmental factors are relatively constant, then there is little need to explicitly represent these in the description of data. However, when the environment exhibits variance along some dimension, it often becomes critical that this variance be codified and included in data description. In this way,

differences across data points can be observed and used in model construction. For example, Mohanty argues that the desired quality of the finished product should be taken into account when estimating development effort. A comprehensive survey by Scacchi [20] of previous software production studies leads to considerable discussion on the pros and cons of many attributes for software project representation.

Thus, one of the major tasks is deciding upon the proper codification of factors judged to be relevant. Consider the dimension of response time requirements (i.e., TIME) which was included by Boehm in project descriptions. This attribute was selected by CARTX during regression-tree construction. However, is TIME an "optimal" codification of some aspect of software projects that impacts development effort? Consider that strict response time requirements may motivate greater coupling of software modules, thereby necessitating greater communication among developers and in general increasing development effort. If predictions of development effort must be made at the time of requirements analysis, then perhaps TIME is a realistic dimension of measurement, but better predictive models might be obtained and used given some measure of software component coupling.

In sum, when building models via machine learning or statistical methods, it is rarely the case that the set of descriptive attributes is static. Rather, in real-world success stories involving machine learning tools the set of descriptive attributes evolves over time as attributes are identified as relevant or irrelevant, the reasons for relevance are analyzed, and additional or replacement attributes are added in response to this analysis [8]. This "model" for using learning systems in the real world is consistent with a long-term goal of Scacchi [20], which is to develop a knowledge-based "corporate memory" of software production practices that is used for both estimating and controlling software development. The machine-learning tools that we have described, and other tools such as ESTOR, might be added to the repertoire of knowledge-acquisition strategies that Scacchi suggests. In fact, Porter and Selby [14] make a similar proposal by outlining the use of decision-tree induction methods as tools for software development.

C. The Limitations of Selected Learning Methods

Despite the promising results on Kemerer's common database, there are some important limitations of CARTX and BACKPROPAGATION. We have touched upon the sensitivity to certain configuration choices. In addition to these practical limitations, there are also some important theoretical limitations, primarily concerning CARTX. Perhaps the most important of these is that CARTX cannot estimate a value along a dimension (e.g., software development effort) that is outside the range of values encountered in the training data. Similar limitations apply to a variety of other techniques as well (e.g., nearest neighbor approaches of machine learning and statistics). In part, this limitation appears responsible for a sizable amount of error on test data. For example, in the experiment illustrating CARTX's sensitivity to training data using 10/5 splits of Kemerer's projects (Section IV-C), CARTX is doomed to being at least a factor of 3 off the mark when estimating the person-month effort required for the project requiring 23.20 M or the project requiring 1107.31 M; the projects closest to each among the remaining 14 projects are 69.90 M and 336.30 M, respectively.

The root of CARTX's difficulties lies in its labeling of each leaf by the mean of development months of projects classified at the leaf. An alternative approach that would enable CARTX to extrapolate beyond the training data would label each leaf by an equation derived through regression, e.g., a linear regression. After classifying a project to a leaf, the regression equation labeling that leaf would then be used to predict development effort given the object's values along the independent variables. In addition, the criterion for selecting divisive attributes would be changed as well. To illustrate, consider only two independent attributes, development team experience and KDSI, and the dependent variable of software development effort. CARTX would undoubtedly select KDSI, since lower (higher) values of KDSI tend to imply lower (higher) means of development effort. In contrast, development team experience might not provide as good a fit using CARTX's error criterion. However, consider a CART-like system that divides data up by an independent variable, finds a best fitting linear equation that predicts development effort given development team experience and KDSI, and assesses error in terms of the differences between predictions using this best fitting equation and actual development months. Using this strategy, development team experience might actually be preferred; even though lesser (greater) experience does not imply lesser (greater) development effort, development team experience does imply subpopulations for which strong linear relationships might exist between independent and dependent variables. For example, teams with lesser experience may not adjust as well to larger projects as do teams with greater experience; that is, as KDSI increases, development effort increases are larger for less experienced teams than more experienced teams. Recently, machine learning systems have been developed that have this flavor [18]. We have not yet experimented with these systems, but the approach appears promising.
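A small sketch makes the alternative concrete: if each leaf stores a fitted linear equation instead of a mean, predictions can extrapolate beyond the effort values seen in training. The single-variable least-squares fit below is only an illustration of the idea discussed above, not an implementation of the systems cited in [18], and it reuses the leaf conventions of the earlier regression-tree sketch; the KDSI values are hypothetical.

    def fit_line(xs, ys):
        # Least-squares fit y = b0 + b1 * x over the projects at a leaf.
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
        return mean_y - b1 * mean_x, b1

    def linear_leaf(projects, x_attr, target):
        # A leaf labeled by a regression equation rather than a mean.
        b0, b1 = fit_line([p[x_attr] for p in projects], [p[target] for p in projects])
        return {"b0": b0, "b1": b1, "x_attr": x_attr, "n": len(projects)}

    def predict_linear_leaf(leaf, project):
        return leaf["b0"] + leaf["b1"] * project[leaf["x_attr"]]

    # Even if all training projects at this leaf required fewer than 300 person-months,
    # a sufficiently large KDSI value yields a larger (extrapolated) estimate.
    leaf = linear_leaf(
        [{"KDSI": 10, "effort": 60.0}, {"KDSI": 30, "effort": 180.0}, {"KDSI": 50, "effort": 290.0}],
        x_attr="KDSI", target="effort")
    print(predict_linear_leaf(leaf, {"KDSI": 120}))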
The success of CARTX, and decision/regression-tree learners generally, may also be limited by two other processing characteristics. First, CARTX uses a greedy attribute selection strategy: tree construction assesses the informativeness of a single attribute at a time. This greedy strategy might overlook attributes that participate in more accurate regression trees, particularly when attributes interact in subtle ways. Second, CARTX builds one classifier over a training set of software projects. This classifier is static relative to the test projects; any subsequent test project description will match exactly one conjunctive pattern, which is represented by a path in the regression tree. If there is noise in the data (e.g., an error in the recording of an attribute value), then the prediction stemming from the regression-tree path matching a particular test project may be very misleading. It is possible that other conjunctive patterns of attribute values matching a particular test project, but which are not represented in the regression tree, could ameliorate CARTX's sensitivity to errorful or otherwise noisy project descriptions.
SRINIVASAN AND FISHER: ESTIMATING SOFlWARE DEVELOPMENT EFFORT 135

The Optimized Set Reduction (OSR) strategy of Briand, VI. CONCLUDINGREMARKS


Basili, and Thomas [5] is related to the CARTX approach This article has compared the CARTX and
in several important ways, but may mitigate problems asso- BACKPROPAGATION learning methods to traditional
ciated with CARTX-OSR conducts a more extensive search approaches for software effort estimation. W e found that the
for multiple patterns that match each test observation. In learning approaches were competitive with SLIM, COCOMO,
contrast to CARTX’S construction of a single classifier that and FUNCTION POINTS as represented in a previous study
is static relative to the test projects, OSR can be viewed as by Kemerer. Nonetheless, further experiments showed the
dynamically building a different classifier for each test project. sensitivity of learning to various aspects of data selection and
The specifics of OSR are beyond the scope of this paper, but representation. Mohanty and Kemerer indicate that traditional
suffice it to say that OSR looks for multiple patterns that models are quite sensitive as well.
are statistically justified by the training project descriptions A primary advantage of learning systems is that they are
and that match a given test project. The predictions stemming adaptable and nonparametric; predictive models can be tailored
from different patterns (say, for software development effort) to the data at a particular site. Decision and regression trees
are then combined into a single, global prediction for the test are particularly well-suited to this task because they make
project. explicit the attributes (e.g., TIME) that appear relevant to
OSR was also evaluated in [5] using Kemerer’s data for the prediction task. Once implicated, a process that engineers
test, and COCOMO data as a (partial) training sample.4 The the data definition is often required to explain relevant and
authors report an average MRE of 9 4 % on Kemerer’s data. irrelevant aspects of the data, and to encode it accordingly.
However, there are important differences in experimental This process is best done locally, within a software shop,
design that make a comparison between results with OSR, where the idiosyncrasies of that environment can be factored
BACKPROPAGATION, and CARTX unreliable. In particular, when in or out. In such a setting analysts may want to investigate
OSR was used to predict software development effort for a the behavior of systems like BACKPROPAGATION, CART, and
particular Kemerer project, the COCOMOdata and the remain- related approaches [5], [14], [ 151, [22] over a range of permis-
ing 14 Kemerer projects were used as training examples. In sible configurations, thus obtaining performance that is optimal
addition, recognizing that Kemerer’s projects were selected in their environment.
from the same development environment, OSR was configured
to weight evidence stemming from these projects more heavily
than those in the Cocomo data set. The sensitivity of results
to this “weighting factor” is not described.
W e should note that the experimental conditions assumed APPENDIX A
in [5] are quite reasonable from a pragmatic standpoint, DATA DESCRIPTIONS
particularly the decision to weight projects more heavily that The attributes defining the COCOMOand Kemerer databases
are drawn from the same environment as the test project. These were used to develop the COCOMO model. The following
different training assumptions simply confound comparisons is a brief description of the attributes and some of their
between experimental results, and OSR’s robustness across suspected influences on development effort. The interested
differing training and test sets is not reported. In addition, reader is referred to [3] for a detailed exposition of them.
like the work of Porter and Selby [14], [15], [22], OSR as- These attributes can be classified under four major headings.
sumes that the dependent dimension of software development They are Product Attributes; Computer Attributes; Personnel
effort is nominally-valued for purposes of learning. Thus, Attributes; and Project Attributes.
this dimension is partitioned into a number of collectively-
exhaustive and mutually-exclusive ranges prior to learning.
Neither BACKPROPAGATION nor CARTX requires this kind of A. Product Attributes
preprocessing. In any case, OSR appears unique relative to 1) Required SofhYare Reliability (RELY): This attribute
other machine learning systems in that it does not learn a measures how reliable the software should be. For example,
static classifier; rather, it combines predictions from multiple, if serious financial consequencesstem from a software fault,
dynamically-constructed patterns. Whether one is interested in then the required reliability should be high.
software development effort estimation or not, this latter facil- 2) Database Size (DATA): The size of the database to
ity appears to have merits that are worth further exploration. be used by software may effect development effort. Larger
In sum, CARTX suffers from certain theoretical limitations: it cannot extrapolate beyond the data on which it was trained, it uses a greedy tree expansion strategy, and the resultant classifier generates predictions by matching a project against a single conjunctive pattern of attribute values. However, there appear to be extensions that might mitigate these problems.
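The inability to extrapolate can be seen in how a regression-tree prediction is formed. The sketch below is a simplification of a CART-style tree, not the CARTX implementation itself, and the attribute name is illustrative: a project is routed down a single conjunctive path of attribute tests to one leaf, and the leaf predicts the mean effort of the training projects that reached it, so no prediction can lie outside the range of effort values seen during training.

```python
class Leaf:
    def __init__(self, training_efforts):
        # A CART-style leaf predicts a summary of the training cases routed
        # to it (here, their mean effort), so predictions are bounded by the
        # effort values observed during training.
        self.prediction = sum(training_efforts) / len(training_efforts)


class Split:
    def __init__(self, attribute, threshold, left, right):
        self.attribute = attribute      # e.g., "adjusted_KDSI" (illustrative)
        self.threshold = threshold
        self.left = left                # subtree for attribute <= threshold
        self.right = right              # subtree for attribute >  threshold


def predict(node, project):
    """Follow one conjunctive path of attribute tests to a single leaf."""
    while isinstance(node, Split):
        node = node.left if project[node.attribute] <= node.threshold else node.right
    return node.prediction


tree = Split("adjusted_KDSI", 50.0, Leaf([12.0, 23.0]), Leaf([120.0, 300.0]))
print(predict(tree, {"adjusted_KDSI": 400.0}))  # 210.0, even for a very large project
```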
the prediction task. Once implicated, a process that engineers the data definition is often required to explain relevant and irrelevant aspects of the data, and to encode it accordingly. This process is best done locally, within a software shop, where the idiosyncrasies of that environment can be factored in or out. In such a setting analysts may want to investigate the behavior of systems like BACKPROPAGATION, CART, and related approaches [5], [14], [15], [22] over a range of permissible configurations, thus obtaining performance that is optimal in their environment.

APPENDIX A
DATA DESCRIPTIONS

The attributes defining the COCOMO and Kemerer databases were used to develop the COCOMO model. The following is a brief description of the attributes and some of their suspected influences on development effort. The interested reader is referred to [3] for a detailed exposition of them. These attributes can be classified under four major headings. They are Product Attributes; Computer Attributes; Personnel Attributes; and Project Attributes.

A. Product Attributes

1) Required Software Reliability (RELY): This attribute measures how reliable the software should be. For example, if serious financial consequences stem from a software fault, then the required reliability should be high.

2) Database Size (DATA): The size of the database to be used by the software may affect development effort. Larger databases generally suggest that more time will be required to develop the software product.

3) Product Complexity (CPLX): The application area has a bearing on the software development effort. For example, communications software will likely have greater complexity than software developed for payroll processing.

4) Adaptation Adjustment Factor (AAF): In many cases software is not developed entirely from scratch. This factor reflects the extent that previous designs are reused in the new project.
B. Computer Attributes

1) Execution Time Constraint (TIME): If there are constraints on processing time, then the development time may be greater.

2) Main Storage Constraint (STOR): If there are memory constraints, then the development effort will tend to be high.

3) Virtual Machine Volatility (VIRT): If the underlying hardware and/or system software change frequently, then development effort will be high.

C. Personnel Attributes

1) Analyst Capability (ACAP): If the analysts working on the software project are highly skilled, then the development effort of the software will be less than that of projects with less-skilled analysts.

2) Applications Experience (AEXP): The experience of project personnel influences the software development effort.

3) Programmer Capability (PCAP): This is similar to ACAP, but it applies to programmers.

4) Virtual Machine Experience (VEXP): Programmer experience with the underlying hardware and the operating system has a bearing on development effort.

5) Language Experience (LEXP): Experience of the programmers with the implementation language affects the software development effort.

6) Personnel Continuity Turnover (CONT): If the same personnel work on the project from beginning to end, then the development effort will tend to be less than that of similar projects experiencing greater personnel turnover.

D. Project Attributes

1) Modern Programming Practices (MODP): Modern programming practices like structured software design reduce the development effort.

2) Use of Software Tools (TOOL): Extensive use of software tools like source-line debuggers and syntax-directed editors reduces the software development effort.

3) Required Development Schedule (SCED): If the development schedule of the software project is highly constrained, then the development effort will tend to be high.
Apart from the attributes mentioned above, other attributes that influence development effort are the programming language and the estimated lines of code (unadjusted and adjusted for the use of existing software).
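For readers unfamiliar with how such attributes are used by COCOMO, most of them act as cost drivers: each rating maps to a numeric effort multiplier that scales a size-based nominal estimate. The sketch below is our own illustration of that scheme; the constants and multiplier values are placeholders rather than the calibrated figures tabulated in [3].

```python
# Placeholder constants; the calibrated, mode-dependent values appear in [3].
A, B = 3.0, 1.12


def estimate_effort(kdsi, multipliers):
    """COCOMO-style estimate (person-months) from size and cost-driver multipliers.

    kdsi        -- estimated size in thousands of delivered source instructions
    multipliers -- numeric effort multipliers derived from attribute ratings
                   such as RELY, CPLX, ACAP, or TOOL
    """
    nominal = A * kdsi ** B
    adjustment = 1.0
    for value in multipliers.values():
        adjustment *= value
    return nominal * adjustment


# Made-up ratings, already mapped to multiplier values.
print(estimate_effort(32.0, {"RELY": 1.15, "CPLX": 1.30, "ACAP": 0.86, "TOOL": 0.91}))
```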

ACKNOWLEDGMENT

The authors would like to thank the three reviewers and the action editor for their many useful comments.

REFERENCES

[1] D. Aha, D. Kibler, and M. Albert, "Instance-based learning algorithms," Machine Learning, vol. 6, pp. 37-66, 1991.
[2] A. Albrecht and J. Gaffney Jr., "Software function, source lines of code, and development effort prediction: A software science validation," IEEE Trans. Software Eng., vol. 9, pp. 639-648, 1983.
[3] B. W. Boehm, Software Engineering Economics. Englewood Cliffs, NJ: Prentice-Hall, 1981.
[4] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth International, 1984.
[5] L. Briand, V. Basili, and W. Thomas, "A pattern recognition approach for software engineering data analysis," IEEE Trans. Software Eng., vol. 18, pp. 931-942, Nov. 1992.
[6] C. Brodley and E. Rissland, "Measuring concept change," in AAAI Spring Symp. Training Issues in Incremental Learning, 1993, pp. 98-107.
[7] K. DeJong, "Learning with genetic algorithms," Machine Learning, vol. 3, pp. 121-138, 1988.
[8] B. Evans and D. Fisher, "Overcoming process delays with decision tree induction," IEEE Expert, vol. 9, pp. 60-66, Feb. 1994.
[9] U. Fayyad, "On the induction of decision trees for multiple concept learning," Doctoral dissertation, EECS Dept., Univ. of Michigan, 1991.
[10] L. Johnson and R. Riess, Numerical Analysis. Reading, MA: Addison-Wesley, 1982.
[11] C. F. Kemerer, "An empirical validation of software cost estimation models," Commun. ACM, vol. 30, pp. 416-429, May 1987.
[12] A. Lapedes and R. Farber, "Nonlinear signal prediction using neural networks: Prediction and system modeling," Los Alamos National Laboratory, Tech. Rep. LA-UR-87-2662, 1987.
[13] S. Mohanty, "Software cost estimation: Present and future," Software-Practice and Experience, vol. 11, pp. 103-121, 1981.
[14] A. Porter and R. Selby, "Empirically-guided software development using metric-based classification trees," IEEE Software, vol. 7, pp. 46-54, Mar. 1990.
[15] A. Porter and R. Selby, "Evaluating techniques for generating metric-based classification trees," J. Syst. Software, vol. 12, pp. 209-218, July 1990.
[16] L. H. Putnam, "A general empirical solution to the macro software sizing and estimating problem," IEEE Trans. Software Eng., vol. 4, pp. 345-361, 1978.
[17] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
[18] J. R. Quinlan, "Combining instance-based and model-based learning," in Proc. 10th Int. Machine Learning Conf., 1993, pp. 236-243.
[19] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing. Cambridge, MA: MIT Press, 1986.
[20] W. Scacchi, "Understanding software productivity: Toward a knowledge-based approach," Int. J. Software Eng. and Knowledge Eng., vol. 1, pp. 293-320, 1991.
[21] T. J. Sejnowski and C. R. Rosenberg, "Parallel networks that learn to pronounce English text," Complex Systems, vol. 1, pp. 145-168, 1987.
[22] R. Selby and A. Porter, "Learning from examples: Generation and evaluation of decision trees for software resource analysis," IEEE Trans. Software Eng., vol. 14, pp. 1743-1757, 1988.
[23] S. Vicinanza, M. J. Prietula, and T. Mukhopadhyay, "Case-based reasoning in software effort estimation," in Proc. 11th Int. Conf. Info. Syst., 1990, pp. 149-158.
[24] S. Weiss and C. Kulikowski, Computer Systems that Learn. San Mateo, CA: Morgan Kaufmann, 1991.
[25] J. Zurada, Introduction to Artificial Neural Networks. St. Paul, MN: West, 1992.

Krishnamoorthy Srinivasan received the M.B.A. in management information systems from the Owen Graduate School of Management, Vanderbilt University, and the M.S. in computer science from Vanderbilt University. He also received the Post Graduate Diploma in industrial engineering from the National Institute for Training in Industrial Engineering, Bombay, India, and the B.E. from the University of Madras, Madras, India.
He is currently working as a Principal Software Engineer with Personal Computer Consultants, Inc. Before joining PCC, he worked as a Senior Specialist with ... Inc., Cambridge, MA. His primary research interests include applications of machine learning techniques to real-world problems.

Douglas Fisher (M'92) received his Ph.D. in information and computer science from the University of California at Irvine in 1987.
He is currently an Associate Professor in computer science at Vanderbilt University. He is an Associate Editor of Machine Learning and IEEE Expert, and serves on the editorial board of the Journal of Artificial Intelligence Research. His research interests include machine learning, cognitive modeling, data analysis, and cluster analysis. An electronic addendum to this article, which reports any subsequent analysis, can be found at (http://www.vuse.vanderbilt.edu/~dfisher/dfisher.html).
Dr. Fisher is a member of the ACM and AAAI.