Bayesian Network Model For Task Effort Estimation in Agile Software Development
Article info

Article history:
Received 11 December 2015
Revised 24 January 2017
Accepted 30 January 2017
Available online 31 January 2017

Keywords:
Bayesian network
Effort prediction
Agile software development

Abstract

Even though the use of agile methods in software development is increasing, the problem of effort estimation remains quite a challenge, mostly due to the lack of many standard metrics to be used for effort prediction in plan-driven software development. The Bayesian network model presented in this paper is suitable for effort prediction in any agile method. Simple and small, with inputs that can be easily gathered, the suggested model has no practical impact on agility. This model can be used as early as possible, during the planning stage. The structure of the proposed model is defined by the authors, while the parameter estimation is automatically learned from a dataset. The data are elicited from completed agile projects of a single software company. This paper describes various statistics used to assess the precision of the model: mean magnitude of relative error, prediction at level m, accuracy (the percentage of successfully predicted instances over the total number of instances), mean absolute error, root mean squared error, relative absolute error and root relative squared error. The obtained results indicate very good prediction accuracy.

© 2017 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jss.2017.01.027
0164-1212
110 S. Dragicevic et al. / The Journal of Systems and Software 127 (2017) 109–119
success. The success of a software project depends on project effort, cost and the quality of the final product. When the project is completed, it is easy to determine its success. What we would like to confirm is that before the final validation of a project success, one can use this metric to predict the success.

Traditional software project prediction models are proven either to be unreliable, or require sophisticated metrics to be rendered reliable (Borade and Khalkar, 2013), both representing a problem in agile development. Many metrics used in traditional software development project planning simply cannot be used in agile development project planning.

Agile development teams usually use story points to measure the effort needed to implement a user story. Story points are useful to compare technical complexity, effort and uncertainty of different user stories, as well as to measure project velocity. Sometimes, project managers assign x hours for y story points and estimate the hours required to complete the task. But this technique is not appropriate to define effort in hours or days absolutely, because story points correspond to a time distribution, and an equivalence such as 1 story point = 5 h is not valid for all times, all teams and all projects. Instead of that, a project manager can use velocity to estimate how many story points can be implemented in a sprint. But velocity needs three or four sprints to stabilize, and it is not suitable for use at the beginning of a project.

Consequently, agile teams use a technique called ideal days/hours (Cohn, 2005). The developers estimate how many days/hours are required to finish the task if they are focused exclusively on that task, without interruptions such as meetings, drop-ins, phone calls, etc. The task that requires eight ideal hours can take two or three days because of interruptions. It can take even more days if a developer works on several tasks at the same time, not to mention that developers usually underestimate the needed effort.

Moreover, Jorgensen (2013) shows that effort estimation depends on the direction of comparison. When developers compare story A to story B, the effort estimation is different from that when they compare story B to story A. The results are particularly inaccurate when comparing a large user story with a much smaller one.

Therefore, the main objective of this research is to find a technique that will facilitate the assessment of the required effort. This technique should be suitable for use even in the planning stage, and help project managers in further agile software development. It should not affect the agility and it should be suitable for any agile method.

Typical problems of traditional effort, cost and quality prediction models can be overcome by using the BN models (Fenton and Neil, 1999; Fenton et al., 2008) due to the:

• Flexibility of the BN building process (based purely on expert judgment, empirical data, or the combination of both).
• Ability to reflect causal relationships.
• Explicit incorporation of uncertainty as a probability distribution for each variable.
• Graphical representation that makes the model clear.
• Ability to perform both forward and backward inference.
• Ability to run the model with missing data.

It has been shown (Celar et al., 2012; Jorgensen, 2010, 2014) that relevant empirical data can significantly increase the accuracy of predictions. Consequently, data from real agile projects are used for building this BN model. The model is intended for the prediction of smaller parts of projects (project tasks) and not for their scheduling. For that reason, the terms “effort” and “duration” are used interchangeably in this paper.

Various statistics are used to determine the accuracy of prediction in software estimation. The most commonly used metrics are: the Magnitude of Relative Error (MRE), the Mean Magnitude of Relative Error (MMRE), and the Prediction at Level m (Pred. (m)), although some authors suggest that other statistics represent more appropriate metrics (Foss et al., 2003; Kitchenham et al., 2001; Korte and Port, 2008).

The remainder of this paper is structured as follows: the investigation of current usages of Bayesian network (BN) models for software effort prediction is described in Section 2, BN is explained in Section 3, conditions which the proposed BN model should meet are given in Section 4, and the building process is described in detail in Section 5. The conclusions as well as the outlines of future work are presented in Section 6.

2. Related works

Effort estimation is inherently inaccurate. There are many reasons that cause imprecision: the lack of relevant information, the subjective nature of some metrics, the complex interaction between metrics, and the amount of effort required to gather metrics. Bayesian networks have been proposed to reduce these uncertainties, since a BN automatically deals with uncertainty and risk due to its own statistical nature.

Mendes et al. (2012) depict the successful use of BN models for web development effort estimation and mention previous papers which describe how hybrid Bayesian network models (structure expert-driven and probabilities data-driven) have outperformed the mean and median-based effort, multivariate regression, case-based reasoning, and classification and regression trees.

This is the reason why BN is used in software development projects for effort estimation (Bibi and Stamelos, 2004; Mendes et al., 2012), reliability evaluation (Si et al., 2014), quality prediction (Jeet et al., 2011; Schulz et al., 2010), risk assessment (Chatzipoulidis et al., 2015; Lee et al., 2009), testing-effort estimation (Wooff et al., 2002), and so on. But only a few uses of BN have been reported in agile software development projects. A Systematic Literature Review (Usman et al., 2014) confirms these results. The most commonly used assessment methods in agile software development are expert judgment, planning poker and use case points, even though these methods do not result in good prediction accuracy.

Hearty et al. (2009) describe the use of a Dynamic Bayesian Network causal model for the Extreme Programming (XP) methods. A Dynamic Bayesian Network (DBN) is a BN expanded by a temporal dimension, so that the changes over time can be modelled. The changes of XP’s key Project Velocity metric are modelled to make effort estimation and risk assessments. The model is validated against a real-world XP project.

Abouelela and Benedicenti (2010) also use BN for modelling the XP software development process. This model consists of two models: one for the estimation of the project duration, and the other for the estimation of the expected defect rate. The estimations are based on the use of three XP practices: Pair Programming, Test Driven Development and Onsite Customer. The model is validated against two XP projects.

Perkusich et al. (2013) use a BN model for Scrum project modelling to provide information to the Scrum Master for problem detection. It is validated with data from ten different scenarios.

Nagy et al. (2010) create another BN model to assist a project manager in decision making. This BN model evaluates several key factors which influence the development of software in order to detect problems as early as possible. The model is not validated.

Due to a small number of papers on the use of the BN model in agile software development projects, we also examine the use of BN in iterative development. Torkar et al. (2010) depict the use of a DBN for test effort estimation. The DBN model is based on process and resource measurements. Process measurement is related to software activities such as development and support. Resource […]
100% (Radlinski, 2010). A prediction accuracy of 80% (significantly over all predictions) or more is usually satisfactory.

The Mean Magnitude of Relative Error (MMRE) and the Prediction at Level m (Pred. (m)) are the two most commonly used metrics to assess the accuracy of prediction in software estimation. These measures are also used in this research.

But, in our research, there is only one prediction per one task. Therefore, other statistical measures are also used to assess the ac- […]

[…] where $n$ is the number of the predictions, $f(x_i)$ is a predicted value, and $y_i$ is an observed value. All the errors are weighted equally due to the linear score.

RMSE is another measure of deviation between the predicted $f(x_i)$ and the real value $y_i$:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( f(x_i) - y_i \right)^2}. \]

Large errors are weighted more heavily because the errors are
averaged after they are squared. The error variance can be detected
if RMSE and MAE are used together. The variation is greater if the
difference between them is larger. If all the errors have the same
magnitude, MAE = RMSE, otherwise, RMSE > MAE.
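
The relation between MAE and RMSE described above can be checked with a small Python sketch. It assumes the usual definitions (MAE as the mean of the absolute differences, RMSE as the square root of the mean squared difference); the numbers are hypothetical and are not taken from the study's dataset.

```python
import math


def mae(predicted, observed):
    """Mean absolute error: every error contributes linearly."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean squared error: errors are squared before averaging."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Hypothetical observed values (illustration only).
observed = [10.0, 10.0, 10.0, 10.0]
equal_errors = [12.0, 8.0, 12.0, 8.0]      # every error has magnitude 2
unequal_errors = [10.0, 10.0, 10.0, 18.0]  # a single large error of magnitude 8

print(mae(equal_errors, observed), rmse(equal_errors, observed))
print(mae(unequal_errors, observed), rmse(unequal_errors, observed))
```

Both prediction sets have the same MAE, but the set with one large error has the larger RMSE, which is why a growing gap between the two statistics signals a larger error variance.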
Both MAE and RMSE are useful for the comparison of the pre-
diction errors of different models for a particular variable and not
for the comparison between variables. They show errors in the
same unit and scale as the parameter itself, so they are scale-
dependent.
We also include measures that can be used for the comparison
of models whose errors are measured in different units. Such mea-
sures include:
• Relative Absolute Error (RAE):

\[ \mathrm{RAE} = \frac{\sum_{i=1}^{n} \left| f(x_i) - y_i \right|}{\sum_{i=1}^{n} \left| \bar{y} - y_i \right|}. \]
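
To illustrate why a relative measure such as RAE allows comparison across units, the following Python sketch computes RAE for the same hypothetical predictions expressed in hours and in minutes. It assumes that the reference value in the denominator is the mean of the observed values; the data are invented for the example.

```python
def rae(predicted, observed):
    """Relative absolute error: total absolute error of the model divided by
    the total absolute error of always predicting the observed mean."""
    mean_observed = sum(observed) / len(observed)
    model_error = sum(abs(p - o) for p, o in zip(predicted, observed))
    baseline_error = sum(abs(mean_observed - o) for o in observed)
    return model_error / baseline_error


# Hypothetical task efforts in hours (illustration only).
hours_observed = [1.0, 3.0, 9.0, 15.0]
hours_predicted = [2.0, 3.0, 8.0, 18.0]

# The same values expressed in minutes.
minutes_observed = [60 * h for h in hours_observed]
minutes_predicted = [60 * h for h in hours_predicted]

print(rae(hours_predicted, hours_observed))
print(rae(minutes_predicted, minutes_observed))
```

The two calls print the same ratio, whereas MAE and RMSE for the minute-based data would be sixty times larger than for the hour-based data.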
This step focuses on gathering the criteria specific for software effort estimation from the existing literature (Abouelela and Benedicenti, 2010; Hearty et al., 2009; Mendes et al., 2012; Misirli and Bener, 2014; Nagy et al., 2010; Perkusich et al., 2013; Radlinski, 2010; Torkar et al., 2010; Usman et al., 2014).

The survey of BN models for software effort prediction (Radlinski, 2010) shows that the criteria can be grouped in four categories based on the measured characteristics:

• Project scope – determined by the use of various measures, e.g., use cases or user stories, function points, lines of code, requirements, etc.
• Other project characteristics – considered in relation to type, complexity and stability of the project.
• Staff factors – which include, among others, developer skills, experience and motivation.
• Process characteristics – considered in relation to the organization and the maturity of the development process.

The elements of set V (BN nodes) are selected by applying a Goal Question Metric (GQM) approach to the collected criteria (Basili et al., 1994; Differding et al., 1996). The GQM plan consists of a goal and a set of questions and measures. The plan describes precisely why the measures are defined and how they are going to be used. The asked questions help to identify information required to fulfil the goal. The measures define the data to be collected to answer the questions.

The most important goal of the task effort prediction is to determine the time needed for task completion. Hence, the first element of set V is defined: Working Hours. Task effort depends on the complexity of requirements and on the developer skills (including motivation and experience). So, the next two elements of V are defined: Requirements Complexity and Developer Skills. The effort of the task depends largely on whether the programmer is familiar with this type of task or has to use new technologies and new knowledge. Thus, the next element is: a New Task Type. The complexity of the requirements depends on the number and the complexity of reports, user interfaces (forms) and functions that should be created in a task, as well as on the quality of the requirements specifications. In the first iteration, set V is completed by elements: Form Complexity, Report Complexity, Function Complexity and Specification Quality.

The GQM approach ensures that all the relevant domain variables are included. The authors also checked and assured themselves that the variables were named conveniently.

5.2. Node values

To fully define a set of tuples $V(G) = \{(s_1, V_1), \ldots, (s_n, V_n)\}$, it is necessary to define $s_i$, the set of all possible values for each $V_i$.

To define node values, the authors first checked the past data. As mentioned before, the authors have access to a database of 160 tasks, but agile development software projects are famous for very limited documentation. The existing data are usually unstructured, meaning that they are not suitable for direct use in the BN. Even the structured data should be ranked. For example, a variable Working Hours expresses the number of hours spent on an individual task. The range of values is too large in terms of too many potential outcomes. To simplify possibilities, the outcome values should be intervals instead of point values. Therefore, a new node Working Hours Classification is added, with possible outcomes ranked in five intervals (Table 1). Instead of adding a new node, the values of Working Hours can be split in five intervals. Furthermore, the authors hope that it will be possible to refine values in more intervals after the model is used, and more data become available. In that case, it will be easier to change the Working Hours Classification only.

Table 1
Nodes description.

Node name                      Description
Form Low_No                    Number of simple user interfaces (forms)
Form Medium_No                 Number of moderate complexity user interfaces (forms)
Form High_No                   Number of complex user interfaces (forms)
Function Low_No                Number of simple functions
Function Medium_No             Number of moderate complexity functions
Function High_No               Number of complex functions
Report Low_No                  Number of simple reports
Report Medium_No               Number of moderate complexity reports
Report High_No                 Number of complex reports
Form Complexity                Total rating of user interface (form) complexity
Function Complexity            Total rating of function complexity
Report Complexity              Total rating of report complexity
Specification Quality          Quality of specification
New Task Type                  Type of task (new or familiar one)
Requirements Complexity        Overall rating of requirements complexity
Developer Skills               Overall rating of developer experience, motivation and skills
Working Hours                  Number of hours spent on the task
Working Hours Classification   Intervals of spent working hours:
                               0–2 h – very simple task (56 instances)
                               2.1–10 h – simple task (70 instances)
                               10.1–25 h – moderate task (22 instances)
                               25.1–40 h – complex task (6 instances)
                               >40 h – very complex task (6 instances)

The values of nodes are defined in two steps:

• The first step defines the types of the selected variables and identifies the values for each variable. Although BN allows the use of both discrete and continuous variables, in this paper we use discrete values, because the experimental data are discrete, and because the available BN tools require the discretization of the continuous variables.
• The second step is extremely time-consuming. An experienced project manager in agile development checks all the projects, evaluates the unstructured data and creates a database suitable for BN. As task evaluation is time-consuming, tasks are processed in batches: first 40, then 50, and finally 70 tasks. All the values in the newly created database are checked for rank and accuracy. In some cases, it is necessary to go back to the first step and refine the values of the nodes.

5.3. DAG structure construction (Node connections)

The GQM approach used for the definition of the elements of set V is also used for the definition of set E. The causal relationships between the nodes are built based on variables and measures selected by using GQM. The building process includes d-separation (d-separation dependencies are used to identify variables influenced by evidence coming from other variables in the BN), as well as a new node definition.

For example, a variable Report Complexity represents the complexity of the reports that should be created in a task. The variable can take one of three states (low, medium or high), and its value can be estimated by the project manager. Instead of that, three new parent nodes (variables) are added: Report Low_No (number of simple reports), Report Medium_No (number of medium complexity reports) and Report High_No (number of complex reports). This is important because with parent nodes the state of variable Report Complexity is defined automatically, thus avoiding a situation where one project manager estimates the same 10 simple reports once as low, and at other times as moderately complex. Consistency significantly affects the accuracy of prediction of the BN model.

For the same reason, parent nodes are also added to nodes Form Complexity (defines complexity of user interfaces) and
Function Complexity (defines complexity of function). The values of both nodes can be expressed as high, medium or low. Consequently, the Form Complexity is a child node of Form Low_No (number of simple forms), Form Medium_No (number of medium complexity forms) and Form High_No (number of complex forms). On the other side, Function Low_No (number of simple functions), Function Medium_No (number of medium complexity functions) and Function High_No (number of complex functions) are parents of the Function Complexity node.

The complexity of the reports as well as the complexity of the forms and functions is defined on the basis of the elements to be constructed, their number, and their comparison with historical data on similar elements (analogy). The report evaluation is also influenced by the database query complexity used to obtain the result. The assessment of the function complexity also depends on the complexity of the processing algorithm.

The specification quality is determined by a level of requirements decomposition: definition of technical demands and business clarity.

The estimation of skills and knowledge as well as experience and motivation of each developer is rated by the Personal Capability Assessment Method (Celar et al., 2014) and then ranked into one of 5 grades. The evaluation of a developer is performed once or twice a year.

A new iteration of the model building process starts each time a new node is added. A list of all the nodes with explanations of their meaning is given in Table 1. The final topology is shown in Fig. 4.

Table 2
Empirical data prepared for use in the BN model (Part).

Task  New Task  Specification  Form    Form       Form     Function  Function   Function  Report  Report     Report   Developer  Working
ID    Type      Quality        Low_No  Medium_No  High_No  Low_No    Medium_No  High_No   Low_No  Medium_No  High_No  Skills     Hours
1     Yes       2              2       0          2        0         10         0         0       0          0        2          16.5
2     Yes       1              3       2          0        0         2          0         0       0          2        4          9
3     Yes       4              0       2          2        3         3          3         0       0          0        3          8.5
4     No        4              5       0          0        5         0          0         0       0          0        2          28
5     Yes       3              0       0          5        0         0          0         0       0          5        4          46.5
6     No        3              0       1          0        0         0          0         1       0          0        2          1
7     No        2              0       1          0        0         1          0         1       0          0        3          1.5
8     Yes       3              2       2          5        0         3          3         1       0          0        3          9
9     Yes       3              0       0          0        0         0          0         0       4          0        2          3.75
10    No        4              1       0          0        0         2          0         0       0          1        2          10
11    No        2              0       1          1        0         1          0         0       0          0        3          0.5
12    Yes       5              1       1          1        0         3          0         0       0          0        2          15.5

5.4. Parameter estimation

Conditional and a priori probabilities are learned from the data using the WEKA² machine learning suite.

As we already mentioned, the data used in this research originate from agile projects of a small software company. These data are not suitable for direct use in the BN model. They must be prepared, but, as the process of preparation is time-consuming, the data are separated in three datasets. The first dataset consists of 40 tasks, the second of 50 tasks, and the third of 70 tasks. Tasks are […]

² Waikato Environment for Knowledge Analysis (WEKA) 3.6.11, http://www.cs.waikato.ac.nz/ml/weka/.
[…] developer who performs the task. All the datasets include tasks of different duration and complexity, created by different developers.

A set of empirical data for the twelve tasks is shown in Table 2. Empirical data are not available for all the nodes. Nodes Form Complexity, Function Complexity, Report Complexity, Requirements Complexity and Working Hours Classification are added to simplify the possible outcomes, as well as to provide better model accuracy. The values of Working Hours Classification are ranked in five non-linear intervals based on the authors’ experience. The manual definition of NPTs (node probability tables) can be a lengthy and error-prone process. Consequently, the values of these nodes are evaluated on the basis of empirical values of their parents. The probabilities are automatically learned both for empirical and added nodes.

An example of a table with complete data for parameter estimation in the BN model is shown in Table 3. It consists of empirical data from Table 2, completed with the data estimated by the authors.

5.5. BN model validation

The model validation is performed using empirical data. WEKA provides k-fold cross-validation and summary statistics (prediction accuracy, MAE, RMSE) which are used to verify the accuracy of the generated model. The WEKA error statistics are normalized. The predicted distribution for each class is matched against the expected distribution for that class. All the mentioned WEKA errors are computed by summing over all classes of an instance, not just a true class (WEKA, 2007a; WEKA, 2007b).

In this case, a 10-fold cross-validation is used. The dataset is randomly divided into 10 equally sized subsets. Of these 10 subsets, one is taken as the validation dataset, and the other nine subsets are used as training data. The model learned from the nine training subsets is evaluated against the validation subset to calculate the percentage of the model accuracy. The cross-validation process is repeated ten times, each of the ten subsets being used exactly once as a validation set.

The accuracy is above 90% for all datasets. For a dataset of 160 instances, only the effort of one task is wrongly classified. The MAE values indicate that the expected effort will be within 2.6% of the true effort for the last set of data. Small differences between the MAE and the RMSE values indicate that the error variance is relatively small.

The Pred. (m) and the MMRE metrics are also applied to this dataset. Since the BN model estimates effort as a set of probability distributions for all possible classes, a Conversion method […]

\[ \mathrm{Effort} = \sum_{i=1}^{n} \mu_{class_i} \, \rho_{class_i} \]

where $\mu_{class_i}$ is the mean of class $i$, and $\rho_{class_i}$ is its respective class probability.

The MMRE values suggest that the prediction error is relatively […]

The predictions for all the sets are within 25% of the actual values. Even in the case of the stricter criterion (m = 10), 90% of the estimates for all the sets remain within a 10% tolerance. Moreover, we should note that this is the result of the set with minimum […]

Table 3
Complete data used for parameter learning (12 task example).
Columns: Task ID, New Task Type, Specification Quality, Form Low_No, Form Medium_No, Form High_No, Form Complexity, Function Low_No, Function Medium_No, Function High_No, Function Complexity, Report Low_No, Report Medium_No, Report High_No, Report Complexity, Requirements Complexity, Developer Skills, Working Hours, Working Hours Classification.
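
The discretization of Working Hours and the validation statistics discussed in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the interval bounds follow Table 1 and are treated as inclusive upper limits, the class means are hypothetical (rough interval midpoints, with 50 h chosen arbitrarily for the open-ended class), MRE is taken as |actual − predicted| / actual, and the sample predictions are invented. None of these values come from the study's dataset, and the functions return fractions rather than percentages.

```python
# Inclusive upper bounds of the Working Hours Classification intervals (Table 1).
# Treating the bounds as inclusive is an assumption made for this sketch.
CLASS_BOUNDS = [(2.0, "0-2"), (10.0, "2.1-10"), (25.0, "10.1-25"), (40.0, "25.1-40")]

# Hypothetical class means in hours (not given in the text).
CLASS_MEANS = {"0-2": 1.0, "2.1-10": 6.0, "10.1-25": 17.5, "25.1-40": 32.5, ">40": 50.0}


def classify_hours(hours):
    """Map spent working hours to one of the five interval labels."""
    for upper, label in CLASS_BOUNDS:
        if hours <= upper:
            return label
    return ">40"


def expected_effort(class_probabilities):
    """Conversion method: sum over classes of class mean times class probability."""
    return sum(CLASS_MEANS[label] * p for label, p in class_probabilities.items())


def mre(actual, predicted):
    """Magnitude of relative error for a single prediction."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all predictions."""
    return sum(mre(a, p) for a, p in zip(actuals, predictions)) / len(actuals)


def pred(actuals, predictions, m):
    """Fraction of predictions whose MRE does not exceed m percent."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= m / 100)
    return hits / len(actuals)


print(classify_hours(16.5))  # hours of task 1 in Table 2

# Hypothetical class probability distribution for one task.
distribution = {"0-2": 0.0, "2.1-10": 0.75, "10.1-25": 0.25, "25.1-40": 0.0, ">40": 0.0}
print(expected_effort(distribution))

# Hypothetical actual and predicted efforts in hours.
actual = [10.0, 20.0, 40.0]
predicted = [9.5, 24.0, 52.0]
print(mmre(actual, predicted), pred(actual, predicted, 25), pred(actual, predicted, 10))
```

With these inputs, two of the three predictions fall within the 25% tolerance and only one within the 10% tolerance, which shows how Pred. (m) tightens as m decreases while MMRE summarizes the average relative deviation.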
Table 4
Results.

Number of tasks                            40 (first)  50 (next)  70 (last)  90 (40+50)  160
Accuracy (Correctly classified instances)  90%         96%        97.14%     96.67%      99.375%
MAE                                        0.1065      0.0533     0.0531     0.0469      0.026
RMSE                                       0.199       0.1301     0.1127     0.117       0.065
RAE                                        35.28%      24.23%     19.89%     17.69%      9.71%
RRSE                                       51.27%      40.09%     31.03%     32.31%      17.81%
Pred. (25)%                                100%        100%       100%       100%        100%
Pred. (10)%                                90%         100%       100%       100%        100%
MMRE                                       12.80       3.29       4.30       7.69        6.21

These good results (Table 4) make this simple model applicable in practice. According to the cone of uncertainty, the results are better than expected. The high accuracy of the model is confirmed by comparison with the results listed in the literature (Radlinski, 2010).

Such a good prediction accuracy is mainly based on the following:

• The BN model outcomes are probability distributions for only five intervals. This decreases the prediction precision because all the values in an interval are treated equally. For example, values 45 and 61 from the interval ‘>40 hours’ have the same probabilities.
• A priori and conditional probabilities are automatically attained from the experimental data, which is more reliable than elicitation from scratch. The estimation accuracy of these probabilities has a notable effect on the outcome quality, and either a pessimistic or an optimistic approach can spoil the results.
• The consistency in assessment significantly increases the accuracy of the prediction. Most experimental data were not suitable for direct application to the BN model, e.g., values that should be ranked. Two of the authors have independently evaluated all the data to make sure that the values are consistently assessed.

6. Conclusions and future work

This paper develops a BN model for effort prediction in agile software development projects.

The proposed model is relatively small and simple and all the input data are easily elicited, so that the impact on agility is minimal. The model predicts task effort, and it is independent of agile methods used. It is also suitable for use in the early project phase.

The model is validated using a database of 160 tasks from real agile projects. The prediction accuracy is measured by the percentage of correct over all predictions. The model results in very good accuracy: only one misclassified value. Pred. (m = 25) equals 100% – all predictions are classified within 25% tolerance. The MMRE values show that there are no occasional large estimation errors. All the other statistical metrics used in this research support these results.

This BN model is presently used in one software company, and the project manager considers it very useful.

The proposed BN model is currently being expanded with a new subnet regarding the existing node Developer Skills and with a new outcome variable/node Product Quality. We plan to use this model for quality prediction of software products in the early software project phase.

Acknowledgments

This work has been supported in part by the PIVIS project MIB PIVAC, Vrgorac, Croatia, the Croatian Science Foundation under the project INSENT Innovative Smart Enterprise (1353), Zagreb, Croatia, and the Program of Technological Development, Research and Application of Innovations of Split-Dalmatia County (1012-14), Split, Croatia. Special thanks belong to Prof. Sanda Halas for her language advice.

References

Abouelela, M., Benedicenti, L., 2010. Bayesian network based XP process modelling. Int. J. Softw. Eng. Appl. 1 (3), 1–15.
Basili, V.R., Caldiera, G., Rombach, H.D., 1994. The goal question metric approach. In: The Encyclopedia of Software Engineering, 1. John Wiley & Sons, New York, USA, pp. 469–476.
Beck, K., Beedle, M., van Bennekum, A., Cockburn, A., Cunningham, W., Fowler, M., et al., 2001. The Agile Manifesto, http://www.agileAlliance.org/.
Bibi, S., Stamelos, I., 2004. Software process modeling with bayesian belief networks. In: Proceedings of the 10th International Software Metrics Symposium (Metrics 2004). Chicago, USA.
Borade, J.G., Khalkar, V.R., 2013. Software project effort and cost estimation techniques. Proceedings of the International Journal of Advanced Research in Computer Science and Software Engineering 3 (8), 730–739. ISSN: 2277 128X.
Celar, S., Turic, M., Vickovic, L., 2014. Method for personal capability assessment. In: Proceedings of the 22nd Telecommunications Forum Agile Teams Using Personal Points. Beograd, Serbia. IEEE, pp. 1134–1137.
Celar, S., Vickovic, L., Mudnic, E., 2012. Evolutionary measurement-estimation method for micro, small and medium-sized enterprises based on estimation objects. Adv. Prod. Eng. Manag. 7 (2), 81–92.
Charniak, E., 1991. Bayesian networks without tears: making Bayesian networks more accessible to the probabilistically unsophisticated. AI Mag. 12 (4), 50–63.
Chatzipoulidis, A., Michalopoulos, D., Mavridis, I., 2015. Information infrastructure risk prediction through platform vulnerability analysis. J. Syst. Softw. 106, 28–41.
Cohn, M., 2005. Agile Estimating and Planning, 3–4. Prentice Hall, Upper Saddle River, USA, pp. 43–47.
Differding, C., Joisl, B., Lott, C. M., 1996. Technology Package for the Goal Question Metric Paradigm, Technical Report 281/96, University of Kaiserslautern, Germany.
Dragicevic, S., Celar, S., 2013. Method for elicitation, documentation and validation of software user requirements (MEDoV). In: Proceedings of the 18th IEEE International Symposium on Computers and Communications (ISCC). Split, Croatia.
Dragicevic, S., Celar, S., Novak, L., 2011. Roadmap for requirements engineering process improvement using BPM and UML. Adv. Prod. Eng. Manag. 6 (3), 221–231.
Dragicevic, S., Celar, S., Novak, L., 2014. Use of method for elicitation, documentation and validation of software user requirements (MEDoV) in agile methods. In: Proceedings of 6th International Conference on Computational Intelligence, Communication Systems and Networks/(CICSyN). Tetovo, Macedonia. IEEE.
Fenton, N., Hearty, P., Neil, M., Radliński, Ł., 2008. Software project and quality modelling using Bayesian networks. In: Meziane, F., Vadera, S. (Eds.), Artificial Intelligence Applications for Improved Software Engineering Development: New Prospects. Information Science Reference, New York, USA, pp. 1–25.
Fenton, N., Neil, M., 1999. A critique of software defect prediction models. IEEE Trans. Softw. Eng. 25 (5), 675–689.
Foss, T., Stensrud, E., Kitchenham, B., Myrtveit, I., 2003. A simulation study of the model evaluation criterion MMRE. IEEE Trans. Softw. Eng. 29 (11), 985–995. doi:10.1109/TSE.2003.1245300.
Hearty, P., Fenton, N., Marquez, D., Neil, M., 2009. Predicting project velocity in XP using a learning dynamic Bayesian network model. IEEE Trans. Softw. Eng. 35 (1), 124–137.
Helmy, W., Kamel, A., Hegazy, O., 2012. Requirements engineering methodology in agile environment. IJCSI 9 (5), 293–300 No. 3, ISSN (Online): 1694–0814.
Hernandez-Orallo, J., Flach, P., Ferri, C., 2012. A unified view of performance metrics: translating threshold choice into expected classification loss. J. Mach. Learn. Res. 13 (1), 2813–2869.
Jeet, K., Bhatia, N., Minhas, R.S., 2011. A Bayesian network based approach for software defects prediction. ACM SIGSOFT Softw. Eng. Notes 36 (4), 1–5.
Jorgensen, M., 2014. What we do and don’t know about software development effort estimation. IEEE Softw. 31 (2), 37–40.
Jørgensen, M., 2013. Relative estimation of software development effort: it matters with what and how you compare. IEEE Softw. 30 (2), 74–79. doi:10.1109/MS.2012.70.
Jorgensen, M., 2010. Selection of strategies in judgment-based effort estimation. J. Syst. Softw. 83, 1039–1050. doi:10.1016/j.jss.2009.12.028.
Kan, S.H., 2002. Metrics and Models in Software Quality Engineering, 2nd ed. Addison-Wesley Longman Publishing Co., Inc., Boston, USA ISBN: 0201729156.
Kim, S.T., Hong, S.R., Kim, C.O., 2014. Product attribute design using an agent-based simulation of an artificial market. Int. J. Simul. Model. 13 (3), 288–299.
Kitchenham, B.A., Pickard, L.M., MacDonell, S.G., Shepperd, M.J., 2001. What accuracy statistics really measure. IEEE Proc. Softw. 148 (3), 81–85. doi:10.1049/ip-sen:20010506.
Korte, M., Port, D., 2008. Confidence in software cost estimation results based on MMRE and PRED. In: Proceedings of the 4th international workshop on Predic-
(1904-10), technological project at FESB funded by the enterprise tor models in software engineering, pp. 63–70. doi:10.1145/1370788.1370804.
Lee, E., Park, Y., Shin, J.G., 2009. Large engineering project risk management using a Bayesian belief network. Expert Syst. Appl. Int. J. 36 (3), 5880–5887.
McConnell, S., 2006. Software Estimation: Demystifying the Black Art. Microsoft Press, WA, USA.
Mendes, E., 2008. The use of Bayesian networks for web effort estimation: further investigation. In: Proceedings of the Eighth International Conference on Web Engineering, Proceedings of ICWE’08, pp. 203–216.
Mendes, E., Abu Talib, M., Counsell, S., 2012. Applying knowledge elicitation to improve web effort estimation: a case study. In: Proceedings of the 2012 IEEE 36th Annual Computer Software and Applications Conference, COMPSAC ’12. Izmir, Turkey, pp. 461–469.
Misirli, A.T., Bener, A.B., 2014. A Mapping Study on Bayesian Networks for Software Quality Prediction. RAISE’14 PROGRAM, Hyderabad, India.
Nagy, A., Njima, M., Mkrtchyan, L., 2010. A Bayesian based method for agile method software development release planning and project health monitoring. In: Proceedings of the 2010 IEEE International Conference on Intelligent Networking and Collaborative Systems. Thessaloniki, Greece.
Nawrocki, J., Jasiński, M., Walter, B., Wojciechowski, A., 2002. Extreme programming modified: embrace requirements engineering practices. In: Proceedings of IEEE Joint International Requirements Engineering Conference. IEEE CS Press, pp. 303–310.
Pendharkar, P.C., Subramanian, G.H., Rodger, J.A., 2005. A probabilistic model for predicting software development effort. IEEE Trans. Softw. Eng. 31 (7), 615–624.
Perkusich, M., Oliveira de Almeida, H., Perkusich, A., 2013. A model to detect problems on scrum-based software development projects. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, pp. 1037–1042.
Radlinski, L., 2010. A survey of Bayesian net models for software development effort prediction. Int. J. Softw. Eng. Comput. 2 (2) ISSN: 2229-7413.
Sarwar, B., Karypis, G., Konstan, J., Riedl, J., 2001. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web WWW10. Hong Kong, pp. 285–295.
Schulz, T., Radliński, Ł., Gorges, T., Rosenstiel, W., 2010. Defect Cost Flow Model – A Bayesian Network for Predicting Defect Correction Effort. PROMISE 2010, Timisoara, Romania.
Scopus, 2016a. https://www.scopus.com/results/results.uri?sort=plf-f&src=s&sid=639877A6D7D0DCF9E076107E54EA6E4C.wsnAw8kcdt7IPYLO0V48gA%3a210&sot=a&sdt=a&sl=85&s=TITLE-ABS-KEY+%28+bayesian+%29+AND+%28+DOCTYPE+%28+ar+%29+OR+DOCTYPE+%28+cp+%29+OR+DOCTYPE+%28+ip+%29+%29&origin=searchadvanced&editSaveSearch=&txGid=639877A6D7D0DCF9E076107E54EA6E4C.wsnAw8kcdt7IPYLO0V48gA%3a21, (viewed 13 December 2016).
Scopus, 2016b. https://www.scopus.com/results/results.uri?sort=plf-f&src=s&nlo=&nlr=&nls=&sid=639877A6D7D0DCF9E076107E54EA6E4C.wsnAw8kcdt7IPYLO0V48gA%3a330&sot=a&sdt=cl&cluster=scosubjabbr%2c%22COMP%22%2cf%2c%22ENGI%22%2cf%2c%22MATH%22%2cf&sl=85&s=TITLE-ABS-KEY+%28+bayesian+%29+AND+%28+DOCTYPE+%28+ar+%29+OR+DOCTYPE+%28+cp+%29+OR+DOCTYPE+%28+ip+%29+%29&origin=resultslist&zone=leftSideBar&editSaveSearch=&txGid=639877A6D7D0DCF9E076107E54EA6E4C.wsnAw8kcdt7IPYLO0V48gA%3a33, (viewed 13 December 2016).
Si, G., Xu, J., Yang, J., Wen, S., 2014. An evaluation model for dependability of internet-scale software on basis of Bayesian networks and trustworthiness. J. Syst. Softw. 89, 63–75.
Sillitti, A., Succi, G., 2005. Requirements Engineering for Agile Methods Engineering and Managing Software Requirements, Part 2. Springer Berlin Heidelberg, pp. 309–326. doi:10.1007/3-540-28244-0_14.
Standish Group, 2009. Project Smart, http://www.projectsmart.co.uk/the-curious-case-of-the-chaos-report-2009.html.
Tierno, I. A. P., 2013. Assessment of Data-driven Bayesian Networks in Software Effort Prediction, http://hdl.handle.net/10183/71952.
Torkar, R., Awan, N.M., Alvi, A.K., Afzal, W., 2010. Predicting software test effort in iterative development using a dynamic Bayesian network. In: Proceedings of the 21st IEEE International Symposium on Software Reliability Engineering. San Jose, USA.
Usman, M., Mendes, E., Weidt, F., Britto, R., 2014. Effort estimation in agile software development: a systematic literature review. In: Proceedings of the 10th International Conference on Predictive Models in Software Engineering, PROMISE ’14. Torino, Italy, pp. 82–91.
Version One, 2007. 2nd Annual Survey "The State of Agile Development care", http://www.versionone.com/pdf/StateOfAgileDevelopmet2_FullDataReport.pdf.
WEKA, 2007a. Mean Absolute Error in Classification, http://weka.8497.n7.nabble.com/Mean-absolute-error-in-classification-td9440.html.
WEKA, 2007b. Root Mean Squared Error Calculation, http://weka.8497.n7.nabble.com/root-mean-squared-error-calculation-td19651.html.
Williams, L., 2010. Agile software development methodologies and practices. Adv. Comput. 80, 1–44.
Wooff, D.A., Goldstein, M., Coolen, F.P.A., 2002. Bayesian graphical models for software testing. IEEE Trans. Softw. Eng. 28 (5), 510–525. doi:10.1109/TSE.2002.1000453.
Srdjana Dragicevic received her M.Sc. degree in electrical engineering from the Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia. She is currently
an Honorary Assistant at the Department of Electronics and Computing of the Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture (FESB),
University of Split, Croatia. She is enrolled in the PhD program at FESB, University of Split, Croatia. Her research interests include requirements engineering, decision support,
uncertain reasoning, business processes and project management.
Stipe Celar received his B.Sc. degree in electrical engineering from the Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture (FESB), University of
Split, Croatia and his Ph.D. degree in technical sciences from TU Wien, Austria. After years of professional work in IT companies and honorary lecturing he is currently an
Associate Professor at the Department of Electronics and Computing of FESB, University of Split, Croatia. His research interests include software engineering, software metrics
and business information systems.
Mili Turic received his B.Sc. degree in computer science from the Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture (FESB), University of Split,
Croatia. He is currently an Honorary Assistant at the Department of Electronics and Computing of FESB, University of Split, Croatia. He is enrolled in the PhD program at
FESB, University of Split, Croatia. His research interests include application of software engineering to cost estimation of software projects, and to software project planning
in general.