Software Requirements Prioritisation Using Machine Learning

Arooj Fatima, Anthony Fernandes, David Egan and Cristina Luca
School of Computing and Information Science, Anglia Ruskin University, Cambridge, U.K.
Abstract: Prioritisation of requirements for a software release can be a difficult and time-consuming task, especially when the number of requested features far outweighs the capacity of the software development team and difficult decisions have to be made. The task becomes more difficult when there are multiple software product lines supported by a software release, and yet more challenging when there are multiple business lines orthogonal to the product lines, creating a complex set of stakeholders for the release, including product line managers and business line managers. This research focuses on software release planning and aims to use Machine Learning models to understand the dynamics of the various parameters which affect whether software requirements are included in a software release plan. Five Machine Learning models were implemented and their performance evaluated in terms of accuracy, F1 score and K-Fold Cross Validation (Mean).
use of engineering time. SPL engineering is primarily an engineering solution to enable tailored software variants and to manage software product variability, customisation and complexity (Grüner et al., 2020), (Abbas et al., 2020). From an engineering point of view, product line requirements are handled in the domain engineering process, while business line requirements are managed in the application engineering process.

The problem of prioritisation must be looked at from the point of view of business owners and product managers. Thus the focus is on making business decisions rather than optimising operational efficiency, to establish business priorities in a complex software product environment. When planning a software release to address SPL/MBL, the challenges come from the absolute number of requirements to be prioritised as well as the complexity of the software release in terms of the number of product lines and number of stakeholders. When multiple product lines are included in a single software release, one inevitable challenge is scale: the number of requirements increases as more products are included in the release. A second challenge is complexity, as product lines become dependent on shared assets and therefore shared requirements for those assets. There is also potential for dependencies between requirements for different product lines, which adds further to the complexity. Additional challenges arise when multiple business lines are involved in the process (Pronk, 2002). Building a robust product line platform while also creating customer or target market specific applications (Metzger and Pohl, 2014) means satisfying a matrix of stakeholders with inconsistent or even opposing views on priority, based on their specific product line or market segment interest. These three challenges of scale, complexity and inconsistency of stakeholders must be considered by any prioritisation method that is to be used with SPL/MBL.

Simple prioritisation methods work best when there are small numbers of requirements to prioritise. For instance, a simple pair-wise comparison (Sadiq et al., 2021), which requires that each requirement is assessed against all other requirements, takes about 12 hours to execute with just 40 requirements (Carlshamre et al., 2001). More advanced prioritisation and decision-making methods employ simple prioritisation methods as a foundation; for example, the Analytic Hierarchy Process (Saaty, 1977) uses pair-wise comparison.

The topic of requirements prioritisation covers the analysis of the role software release planning plays in software development processes, suggestions for various RP strategies, and an expanding area of empirical research focused on comparisons that take benefits and disadvantages into consideration. (Perini et al., 2013) differentiate RP techniques into basic ranking techniques, which typically permit prioritisation along a single evaluation criterion, and RP methods, which incorporate ranking techniques inside a requirements engineering process. Relevant project stakeholders, such as customers, users and system architects, conduct rank elicitation, which can be done in a variety of ways. A fundamental strategy is ranking each requirement in a group of candidates in accordance with a predetermined criterion (e.g., development cost, value for the customer). A requirement's rank can be stated either as an absolute measure of the assessment criterion for the requirement, as in Cumulative Voting (Avesani et al., 2015), or as a relative position with regard to the other requirements in the collection, as in bubble sort or binary search methods. A prioritisation technique's usefulness depends on the kind of rank elicitation. For example, pair-wise evaluation reduces cognitive effort when there are just a few dozen requirements to be assessed, but with a high number of requirements it becomes expensive (or perhaps impracticable) due to the quadratic growth in the number of pairings that must be elicited. The rankings produced by the various methods include requirements listed on an ordinal scale (Bubble Sort, Binary Search), requirements listed on a ratio scale (Analytic Hierarchy Process (AHP), 100 Points), and requirements grouped on an ordinal scale (groups or classes), as in Numerical Assignment (Perini et al., 2013).

The scalability of these strategies is directly linked with the proportional increase in human effort. The computational complexity also depends on the number of criteria (n) to be prioritised, ranging from a linear function in n for Numerical Assignment or Cumulative Voting to a quadratic function for AHP. In order to handle numerous priority criteria, more organised software requirements prioritisation approaches employ ranking mechanisms (Perini et al., 2013).

The systematic review in (Svahnberg et al., 2010) investigated 28 papers that dealt with strategic RP models. 24 of the 28 papers propose models of strategic release planning, whereas the remaining investigations are concerned with validating some of the offered models. The EVOLVE family of release planning models makes up sixteen of these. Most techniques place a heavy emphasis on strict limitations and a small number of requirements selection variables. In around 58% of the models, soft variables have also been included. The study lacks a validation on large-scale industrial projects.
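To make the quadratic growth in pair-wise comparisons mentioned above concrete, the following minimal Python sketch (illustrative only, not part of the original study) counts the comparisons a simple pair-wise method would require for requirement sets of different sizes:

```python
# Illustration of the quadratic growth in pair-wise comparisons: n requirements
# need n * (n - 1) / 2 comparisons, which is what makes simple pair-wise methods
# (and AHP, which builds on them) impractical for large requirement sets.
def pairwise_comparisons(n: int) -> int:
    return n * (n - 1) // 2

for n in (10, 40, 100, 283):
    print(f"{n:4d} requirements -> {pairwise_comparisons(n):6d} comparisons")

# With 40 requirements this gives 780 comparisons; at the roughly 12 hours
# reported by Carlshamre et al. (2001) that is close to a minute per comparison.
```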
Machine Learning (ML) based data analysis, estimation and prediction techniques have grown in popularity in recent years as a result of improvements in algorithms, computing power and the availability of data. Traditional methods of requirements prioritisation are cumbersome since there can be too many patterns to understand and program. Machine Learning has been used in many areas to analyse large datasets and identify patterns. Once it is trained to identify patterns in the data, it can construct an estimation or a classification model. The trained model can detect, predict or recognise similar patterns or probabilities.

Duan et al. (Duan et al., 2009) propose partial automation of software requirements prioritisation using data mining and machine learning techniques. They used feature set clustering with unsupervised learning and prioritised requirements mainly based on business goals and stakeholders' concerns.

Perini et al. (Perini et al., 2013) compared the Case-Based Ranking (CBRank) requirements prioritisation method (combined with machine learning techniques) with the Analytic Hierarchy Process (AHP) and concluded that their approach provided better results than AHP in terms of accuracy.

Tonella et al. (Tonella et al., 2013) proposed an Interactive Genetic Algorithm (IGA) for requirements prioritisation and compared it with the Incomplete Analytic Hierarchy Process (IAHP). They used IAHP to avoid the scalability issues of AHP and concluded that IGA outperforms IAHP in terms of effectiveness, efficiency and robustness to user errors.

A number of other researchers have also explored clustering techniques combined with existing prioritisation methods, i.e. case-based ranking (Avesani et al., 2015), (Qayyum and Qureshi, 2018), (Ali et al., 2021).

Most of the machine learning based techniques reviewed in this study build on some existing prioritisation technique and partially automate the process using different clustering methods. A requirements prioritisation technique that fully automates the prioritisation process for large-scale systems with sufficient accuracy is lacking.

3 PROPOSED APPROACH

We have followed a simple methodology introduced by (Kuhn and Johnson, 2013) for their research on predictive modelling. The methodology is a standard process for most machine learning projects. It includes data analysis, pre-processing of the data including feature selection, model selection including the train/test split, fitting various models and tuning parameters, and evaluation to find the model which generalises better than others.

The performance of the algorithms has been evaluated using accuracy (the percentage of correctly classified data), speed (the amount of time needed for computation) and comprehensibility (how difficult an algorithm is to understand).

3.1 Dataset

This project uses real data produced by a company in the semiconductor business producing IoT wireless microchips. The data relates to the software requirements for the bi-annual software release cycles of calendar year 2020 (20Q2 and 20Q4). The data has 283 samples, each representing a software requirement requested to be included in the software release. Each sample has various feature values, some of which were inputs to the original software release planning cycle, some were outputs of that cycle and others were calculated or derived during the release planning process. During the original release planning cycle, these values were considered and discussed with stakeholders before the actual software release was finalised.

A key element of the original planning process was the use of themes to abstract and collate requirements into cohesive business initiatives. This served two purposes: a) reduce the number of items to be discussed by business stakeholders; and b) provide business stakeholders with something that they could comprehend.

Out of three available subsets of requirements, the most recent and focused data was selected in an attempt to get the best results.

3.2 Exploratory Data Analysis

In the exploratory analysis, detailed information about the main characteristics of the dataset is provided. The dataset has 40+ features that were carefully analysed. Table 1 presents a description of the key features.

Various statistical analyses were carried out to evaluate feature quality and predictability in relation to the target value. They provided us with a more thorough knowledge of the data.

The raw dataset had some inconsistencies, i.e. redundant features, zero values and missing values. Most of the features have multiple values for each sample, which require further processing. With respect to both zeros and missing values, the data is inevitably incomplete for a number of reasons, including: the process does not insist on complete data before starting the planning cycle, and secondary versions of a field may not be used for many requirements.
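The kind of exploratory checks described in Section 3.2 could be sketched as follows; this is a minimal illustration assuming the requirement samples have been exported to a CSV file, and the file name and exact column names are hypothetical:

```python
import pandas as pd

# Hypothetical export of the 283 requirement samples described in Section 3.1.
df = pd.read_csv("requirements_2020.csv")

print(df.shape)                                                # expected (283, 40+): samples x features
print(df.isna().sum().sort_values(ascending=False).head(10))   # features with the most missing values
print(df["Release Commitment"].value_counts(dropna=False))     # target categories (Q2, Complete, other)
print(df.describe())                                           # spread of the numerical features
```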
Table 1: Exploratory Data Analysis.

Feature: Description
Issue Key: Unique identifier for each requirement in the Jira database.
Release Commitment: Output of the prioritisation process; it has three categories, i.e. Q2 (requirement was included), Complete (included and completed) and any other value indicating not included.
Estimate (wks): The total estimated time in weeks to complete the task. This feature was added to the data after the original prioritisation process.
(New) MoSCoW: Stakeholder assessment of the dependency of the theme on this requirement: Must, Should, Could or Won't.
(New) MoSCoW multiplier: Multiplier associated with the MoSCoW value.
Theme Category Divisor: Themes are categorised to indicate the type of strategic or tactical initiative. The highest ranked categories have a divisor of 1, whereas the lower ranked categories have higher divisors.
AOP/LTR Theme Rank: A ranking for themes based on the lifetime revenue (LTR) linked to that theme.
Cost: Cost of the requirement.

3.3 Data Pre-Processing

A number of steps were taken to transform the sample features to make the data machine processable.

Data Transformation: The numerical features were extracted from the main dataset, special characters were removed from the numerical data, and categorical values (such as Release Commitment) were mapped to numerical values.

Missing Values: After the initial transformation of the data, the next step was to handle missing and null values. All rows where the data was missing or null were reviewed carefully. The rows were removed where it was not ideal to perform feature engineering to fill in the missing values. Other missing values (where the data was a numerical spread and suitable for feature engineering) were filled in with the mean value of the given feature.

Calculating Feature Importance: We have used a Decision Tree classifier to learn the feature importance in our dataset. To calculate the feature importance, the Decision Tree model computes the impurity metric of the node (feature) and subtracts the impurity metric of any child nodes. The mean decrease in the impurity of a feature across all trees gives us the score of how important that feature is (Scornet, 2020). Table 2 presents the importance ranking for the features produced by the model.

Table 2: Feature Importance Score by Decision Tree.

Feature                        Value
Theme Category Divisor         0.483655
AOP LTR$ Theme Rank            0.191668
Cost                           0.121553
Theme Value                    0.054635
(New) MoSCoW Multiplier        0.046509
Reqs per Theme                 0.034841
Estimate (wks)                 0.034282
Dependent on                   0.021122
Category Theme Rank            0.011735
(New) MoSCoW 2 Multiplier      0.000000

Based on the feature importance results, the dataset was tuned. We tested our models on the full dataset as well as on the tuned dataset.

3.4 Visual Analysis

Various statistical and visual analysis methods were used to learn patterns in the data and to understand the relation of features to other features and to the target value.

The target variable Release Commitment has categorical values, which were converted to numeric data to make two classes, i.e. 1 (requirement included in the release) and 0 (requirement not included).

An analysis of the class distribution (see Figure 1) showed that the dataset has a moderate degree of imbalance. Since the degree of imbalance was not too high and our aim was to learn patterns for both classes, we chose to train our models on the true distribution.

Figure 1: Class Distribution.
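The pre-processing and impurity-based feature importance steps described in Section 3.3 could look roughly like the sketch below. The column names follow Tables 1 and 2, but the file name, cleaning rules and feature subset are simplifying assumptions rather than the authors' exact pipeline:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("requirements_2020.csv")  # hypothetical file name

# Binary target: 1 if the requirement made it into the release, 0 otherwise.
df["target"] = df["Release Commitment"].isin(["Q2", "Complete"]).astype(int)

# A subset of the (assumed already numeric) features from Tables 1 and 2.
features = ["Theme Category Divisor", "AOP LTR$ Theme Rank", "Cost",
            "Theme Value", "(New) MoSCoW Multiplier", "Estimate (wks)"]
X = df[features].fillna(df[features].mean())   # mean-impute the remaining gaps
y = df["target"]

# Impurity-based (mean decrease in impurity) feature importance, as in Table 2.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
for name, score in sorted(zip(features, tree.feature_importances_),
                          key=lambda p: p[1], reverse=True):
    print(f"{name:28s} {score:.6f}")
```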
The Correlation Matrix has been built to identify how the features are correlated to each other. It can be seen from Figure 2 that Cost and Estimate are highly correlated; Theme Category Divisor is heavily linked with Category Theme Rank. Applicable to, (New) MoSCoW 2 multiplier and Theme Value2 are heavily correlated to Applicable to2. Theme Value 2 is also heavily correlated to Applicable to and (New) MoSCoW 2 multiplier. Theme Value seems to be inversely correlated to AOP/LTR$ Theme rank, LTR$ Theme rank and Category Theme Rank. Based on these observations, the features Issue Key, Release Commitment, First Requested Version, (New) MoSCoW 2 Multiplier, Dependent on2, Applicable to2, Tactical Value, Applicable to and Category Theme Rank were dropped when running the experiments on the tuned dataset for the different models.

4 EXPERIMENTS AND RESULTS

The goal of this research was to experiment with the application of Machine Learning models to the problem of software requirements prioritisation, to understand the dynamics of the various parameters included in a software release plan and to evaluate the results obtained. The models considered for the experiment were put through rigorous testing using the baseline dataset produced by the pre-processing techniques.

The dataset was split into 80% training and 20% testing data. Experiments were done in a series of iterations, aiming to tune the dataset and improve the results.

Five different ML models have been used for this research: Decision Tree Classifier, K-Nearest Neighbours (KNN), Random Forest, Logistic Regression and Support Vector Machine. Five metrics have been used to evaluate the ML models implemented: accuracy, F1 score, precision, recall and k-fold cross validation (mean). For an overall comparison of the results, we only considered accuracy, F1 score and k-fold cross validation mean. All the models have been trained on the full as well as the tuned datasets.

In this section we present the results for each implemented model.

4.1 Decision Tree Classifier

Table 3 presents the results on the full and tuned datasets using the decision tree model. The accuracy and F1 score dropped after tuning the dataset; however, the k-fold cross validation score improved for the tuned dataset.

Table 3: Decision Tree - full and tuned datasets.

Performance Metric              Full dataset    Tuned dataset
Accuracy                        0.96            0.94
F1 score                        0.96            0.94
Precision                       0.97            0.95
Recall                          0.96            0.94
K-fold cross validation mean    0.89            0.92

Cross validation is an important metric since it can flag problems like selection bias and over-fitting. Despite a drop in accuracy, tuning the dataset has a visible impact on the cross validation score.

4.2 K-Nearest Neighbours (KNN)

Table 4 presents the results of KNN for the full and tuned datasets. The accuracy, precision, recall and F1 score dropped after tuning the dataset; however, the k-fold cross validation (mean) increased.

Table 4: K-Nearest Neighbours - full and tuned datasets.

Performance Metric              Full dataset    Tuned dataset
Accuracy                        0.94            0.92
F1 score                        0.94            0.92
Precision                       0.95            0.92
Recall                          0.94            0.92
K-fold cross validation mean    0.80            0.82

4.3 Random Forest

Table 5 presents the results of Random Forest performance on the full and tuned datasets. The accuracy and F1 score were the same after tuning the dataset; however, the k-fold cross validation (mean) increased. The precision and recall scores also remained the same, indicating that the removal of features has limited impact on the scores of Random Forest.

The Random Forest model generalised very well to the data. We did some further experiments with this model, which are detailed in Section 5.
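A minimal sketch of the evaluation set-up described at the start of Section 4 (the 80/20 split, the five classifiers and the reported metrics) is given below; the hyper-parameters, feature subset and file name are assumptions rather than the authors' configuration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Same hypothetical data preparation as in the earlier sketch.
df = pd.read_csv("requirements_2020.csv")
df["target"] = df["Release Commitment"].isin(["Q2", "Complete"]).astype(int)
features = ["Theme Category Divisor", "AOP LTR$ Theme Rank", "Cost",
            "Theme Value", "(New) MoSCoW Multiplier", "Estimate (wks)"]
X = df[features].fillna(df[features].mean())
y = df["target"]

# 80% training / 20% testing split, as described in Section 4.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    cv_mean = cross_val_score(model, X, y, cv=5).mean()  # k-fold cross validation (mean)
    print(f"{name:20s} acc={accuracy_score(y_test, pred):.2f} "
          f"f1={f1_score(y_test, pred):.2f} "
          f"prec={precision_score(y_test, pred):.2f} "
          f"rec={recall_score(y_test, pred):.2f} "
          f"cv={cv_mean:.2f}")
```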
Table 5: Random Forest - full and tuned datasets.

Table 6: Logistic Regression - full and tuned datasets.
to explore and evaluate the results and derive further conclusions.

REFERENCES

Abbas, M., Jongeling, R., Lindskog, C., Enoiu, E. P., Saadatmand, M., and Sundmark, D. (2020). Product line adoption in industry: An experience report from the railway domain. In Proceedings of the 24th ACM Conference on Systems and Software Product Line. Association for Computing Machinery.

Ali, S., Hafeez, Y., Hussain, S., Yang, S., and Jamal, M. (2021). Requirement prioritization framework using case-based reasoning: A mining-based approach. Expert Systems, 38(8):e12770.

Ashton, K. (2009). The 'internet of things' thing.

Avesani, P., Perini, A., Siena, A., and Susi, A. (2015). Goals at risk? Machine learning at support of early assessment. In 2015 IEEE 23rd International Requirements Engineering Conference (RE), pages 252-255.

Carlshamre, P., Sandahl, K., Lindvall, M., Regnell, B., and Natt och Dag, J. (2001). An industrial survey of requirements interdependencies in software product release planning. In Proceedings 5th IEEE International Symposium on Requirements Engineering, pages 84-91.

Devroey, X., Perrouin, G., Cordy, M., Samih, H., Legay, A., Schobbens, P.-Y., and Heymans, P. (2017). Statistical prioritization for software product line testing: an experience report. Software & Systems Modeling, 16(1):153-171.

Duan, C., Laurent, P., Cleland-Huang, J., and Kwiatkowski, C. (2009). Towards automated requirements prioritization and triage. Requirements Engineering, 14(2):73-89.

Grüner, S., Burger, A., Kantonen, T., and Rückert, J. (2020). Incremental migration to software product line engineering. In Proceedings of the 24th ACM Conference on Systems and Software Product Line, pages 1-11.

Kuhn, M. and Johnson, K. (2013). Applied Predictive Modeling. Springer, London.

Metzger, A. and Pohl, K. (2014). Software product line engineering and variability management: Achievements and challenges. In Future of Software Engineering Proceedings.

Montalvillo, L. and Diaz, O. (2016). Requirement-driven evolution in software product lines: A systematic mapping study. Journal of Systems and Software, 122:110-143.

Perini, A., Susi, A., and Avesani, P. (2013). A machine learning approach to software requirements prioritization. IEEE Transactions on Software Engineering, 39(4):445-461.

Pohl, K., Böckle, G., and Van Der Linden, F. (2005). Software Product Line Engineering: Foundations, Principles, and Techniques, volume 1. Springer.

Pronk, B. J. (2002). Product line introduction in a multi-business line context. International Workshop on Product Line Engineering: The Early Steps: Planning, Modelling and Managing.

Qayyum, S. and Qureshi, A. (2018). A survey on machine learning based requirement prioritization techniques. In Proceedings of the 2018 International Conference on Computational Intelligence and Intelligent Systems, pages 51-55.

Saaty, T. L. (1977). A scaling method for priorities in hierarchical structures. Journal of Mathematical Psychology, 15(3):234-281.

Sadiq, M., Sadim, M., and Parveen, A. (2021). Applying statistical approach to check the consistency of pair-wise comparison matrices during software requirements prioritization process. International Journal of System Assurance Engineering and Management, pages 1-10.

Scornet, E. (2020). Trees, forests, and impurity-based variable importance. arXiv preprint arXiv:2001.04295.

Sommerville, I. (2016). Software Engineering. Pearson Education, Boston, 10th edition.

Svahnberg, M., Gorschek, T., Feldt, R., Torkar, R., Saleem, S. B., and Shafique, M. U. (2010). A systematic review on strategic release planning models. Information and Software Technology, 52(3):237-248.

Tonella, P., Susi, A., and Palma, F. (2013). Interactive requirements prioritization using a genetic algorithm. Information and Software Technology, 55(1):173-187.

Wiegers, K. and Beatty, J. (2013). Software Requirements. Microsoft Press, Redmond, Washington, 3rd edition.