A Clustering Based Technique For Large Scale Prioritization During Requirements Elicitation
1 Introduction
T. Herawan et al. (eds.), Recent Advances on Soft Computing and Data Mining 623
SCDM 2014, Advances in Intelligent Systems and Computing 287,
DOI: 10.1007/978-3-319-07692-8_59, © Springer International Publishing Switzerland 2014
624 P. Achimugu, A. Selamat, and R. Ibrahim
2 Related Work
Different prioritization techniques have been proposed in the literature. According to
the research documented in [8], existing prioritization techniques are classified under
two main categories: techniques that are applied to a small number of requirements
(small-scale) and techniques that are applied to a larger number of requirements
(medium-scale or large-scale). Examples of small-scale techniques include
round-the-group prioritization, the multi-voting system, pair-wise analysis,
weighted criteria analysis, and the quality function deployment approach.
Techniques for prioritizing larger numbers of requirements include MoSCoW, the binary
priority list, the planning game, case-based ranking and Wiegers's matrix approach.
A further classification of existing prioritization techniques was provided by [9].
They similarly divided existing techniques into two main categories: (1) techniques
which enable values or weights to be assigned by project stakeholders against each
requirement to determine their relative importance and (2) methods that include
negotiation approaches in which requirements priorities result from an agreement
among subjective evaluation by different stakeholders. Examples of techniques that
apply to the first category are analytical hierarchy process (AHP), cumulative voting,
numerical assignment, the planning game and Wiegers's method. Examples of the second
category are the win-win approach and multi-criteria preference analysis
requirement negotiation (MPARN).
The most widely adopted and reliable prioritization technique reported in the
literature is AHP, although it too suffers scalability problems as the number of
requirements increases. An in-depth analysis and description of existing
prioritization techniques and their limitations can be found in [10].
The limitations that cut across existing techniques range from rank
reversals to scalability problems, inaccurate rank results, increased computational
complexity and the unavailability of efficient support tools, among others. This
research seeks to address most of these limitations with the aid of clustering algorithms.
sets (R), constructed clusters (k), and attributes (A) are relatively large. K-means
uses a two-phase iterative algorithm to reduce the sum of point-to-centroid
distances over all k clusters, described as follows. The first phase employs
"batch" updates to re-assign points to their nearest cluster centroid, which triggers
the re-calculation of the cluster centroids. The second phase uses "online" updates to
re-assign points so as to reduce the sum of distances, which causes the re-computation
of cluster centroids after each reassignment. In this research, the former was adopted
because the clusters are updated based on the minimum distance rule. That is, for each
entity i in the data table, its distances to all centroids are calculated and the entity is
assigned to the nearest centroid. This process continues until all the clusters remain
unchanged. Before loading the datasets for the algorithm to run, they need to be
pre-processed or standardized.
K-means is an unsupervised clustering method applicable to a dataset of N entities,
forming a set I, each described by M features forming a set V. The entity-to-feature
matrix Y is given by (yiv), where yiv is the value of feature v ∈ V at entity
i ∈ I. The process generates a partition S = {S1, S2, …, SK} of I into K non-overlapping
classes Sk, referred to as clusters. Each cluster has a specific centroid
denoted ck = (ckv), an M-dimensional vector in the feature space (k = 1, 2, …, K).
The centroids form the set C = {c1, c2, …, cK}. The criterion minimized by this method is the
within-cluster summary distance to the centroids. A partition clustering can be
characterized by (1) the number of clusters, (2) the cluster centroids, and (3) the
cluster contents. Thus, we used criteria based on comparing each of these
characteristics in the generated data with those in the resulting clustering, while the
centroids are calculated as the average of the entries within clusters.
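A sketch of the batch variant described above: each entity is assigned to its nearest centroid by the minimum distance rule, the centroids are recomputed as cluster means, and the loop stops when the clusters remain unchanged. The function name and the simple initialisation are illustrative assumptions, not from the paper.

```python
import numpy as np

def kmeans_batch(Y, K, max_iter=100):
    """Batch k-means: assign each entity to its nearest centroid
    (the minimum distance rule), then recompute each centroid as the
    mean of its cluster, until the clusters no longer change."""
    # Simple deterministic initialisation: the first K entities.
    centroids = Y[:K].astype(float).copy()
    labels = np.full(len(Y), -1)
    for _ in range(max_iter):
        # Distance of every entity to every centroid.
        d = np.linalg.norm(Y[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # clusters unchanged: stop
            break
        labels = new_labels
        for k in range(K):
            if np.any(labels == k):             # guard against empty clusters
                centroids[k] = Y[labels == k].mean(axis=0)
    return labels, centroids
```

The minimised quantity is the within-cluster summary distance to the centroids, as in the description above.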
During requirements prioritization, the project stakeholders converge to assign
weights to requirements. Before weight assignment takes place, the elicited
requirements are described to the relevant stakeholders so that each requirement,
and the implication of weighting one requirement over another, is understood. The
main aim of this research is therefore to propose a technique for prioritizing
requirements based on the preference weights provided by the stakeholders. A metric
distance function is used to approximate the distances between requirement weights.
These requirements can thus be considered as points in a K-dimensional Euclidean
space. The aim of clustering in this research is to minimize the intra-cluster
diversity (distortion) when ranking or prioritizing large sets of requirements.
The case presented in this paper concerns the calculation of the relative
importance of requirement sets across relevant stakeholders, based on the preferential
weights of the attributes contained in each set. These weights are partitioned into clusters
with the help of centroids to determine the final clusters of requirement sets based on
the Euclidean space of each attribute weight. The cluster centroids are responsible for
attracting requirements to their respective clusters based on a defined criterion.
Prioritization can therefore be achieved by finding the average weights across
attributes in all the clusters. For instance, if we have requirement sets
as R = {r1, r2, …, rN}, i = 1, …, N, with K-dimensional attributes A, defined by
(a1, a2, …, aK), over 5 stakeholders. Prioritization will mean computing all the
relative weights of attributes provided by stakeholders based on a weighting scale
over each requirement set. These requirement sets are partitioned into various
clusters given as K = {k1, k2, …, kM}. Each cluster will contain the relative
weights of all the stakeholders for a particular requirement set. The algorithm
is described below:
The distance between the attribute weights of two requirement sets i and j is taken
as the Euclidean distance in the K-dimensional attribute space:

d = √( Σ_{k=1}^{K} ( a_k^(i) − a_k^(j) )² )                (3)

Equation 4, which is the square root of the variance, is used to prioritize requirements:

P = √( (1/K) Σ_{k=1}^{K} ( a_k^(i) − a_k^(j) )² )          (4)
10) with respect to the number of stakeholders S (Algorithm 2). The average weights
of each cluster are obtained and normalized. Given a cluster K, the smallest W(S, A) is
subtracted from the largest and the square root of the difference is taken to reflect
the overall relative weight of each requirement set (Equations 3 and 4).
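Read literally from the text, Equation 3 is a distance between the attribute weights of two requirement sets, and Equation 4 the square root of a variance-style mean over the K attributes. A small sketch under that reading (the function names are illustrative, not from the paper):

```python
import math

def distance(a_i, a_j):
    """Equation 3: Euclidean distance between the attribute-weight
    vectors of two requirement sets i and j."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a_i, a_j)))

def priority_score(a_i, a_j):
    """Equation 4: square root of the mean squared difference over
    the K attributes, used as the prioritization score."""
    K = len(a_i)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a_i, a_j)) / K)
```

Requirement sets can then be ranked by sorting on the resulting scores.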
4 Experimental Setup
The experiments described in this research investigated the possibility of computing
preference weights of requirements across all stakeholders in a real-world software
project using k-means algorithm. As mentioned previously, the metrics evaluated in
this experiment are (1) the number of generated clusters, (2) the cluster centroids, and
(3) the cluster contents. The RALIC dataset was used to validate the proposed
approach. The PointP, RateP and RankP aspects of the requirement datasets were used,
which consist of about 262 weighted attributes spread across 10 requirement sets from
76 stakeholders. RALIC stands for replacement access, library and ID card [12]. It
was a large-scale software project initiated to replace the existing access control
system at University College London. The datasets are available at:
http://www.cs.ucl.ac.uk/staff/S.Lim/phd/dataset.html. Attributes were ranked on a
5-point scale, ranging from 5 (highest) to 1 (lowest). As a pre-processing step,
attributes with missing weights were given a rating of zero [13].
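This pre-processing step can be illustrated with a small sketch; the function, and the strict check of the 5-point scale, are assumptions made for the example, not taken from the paper.

```python
def preprocess(ratings):
    """Replace missing attribute weights (None) with a rating of zero
    and check the remaining weights lie on the 5-point (1-5) scale."""
    cleaned = []
    for w in ratings:
        if w is None:
            cleaned.append(0)  # missing weight -> zero rating
        elif 1 <= w <= 5:
            cleaned.append(w)
        else:
            raise ValueError(f"weight {w} is outside the 1-5 scale")
    return cleaned
```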
For the experiment, a Gaussian Generator was developed, which computes the
mean and standard deviation of given requirement sets. It uses the Box-Muller
transform to generate relative values for each cluster based on the stakeholders'
input weights. The experiment was initiated by specifying a minimum and
maximum number of clusters, and a minimum and maximum size for attributes. It
then generates a random number of attributes with random mean and variance
between the inputted parameters. Finally, it combines all the attributes into one and
computes the overall score of attributes across the number of clusters k. The
algorithms defined earlier attempt to use these combined weights of attributes in each
cluster to rank each requirement set. To run the k-means algorithm, we filled in the
variables/observations table covering the three aspects of the RALIC dataset
that were utilized (PointP, RateP and RankP), then specified the clustering
criterion (Determinant W) and the number of classes. The initial partition was
generated randomly and the procedure was run 50 times; each run iterated for up to
500 cycles with a convergence threshold of 0.00001.
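A minimal sketch of the core of such a generator, assuming the basic form of the Box-Muller transform (the function name and interface are illustrative):

```python
import math
import random

def box_muller(mean, std, rng=random.random):
    """Draw one value from N(mean, std^2) via the Box-Muller transform."""
    u1 = 1.0 - rng()  # in (0, 1], so log(u1) is defined
    u2 = rng()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mean + std * z
```

Repeated draws with a requirement set's mean and standard deviation then give the relative values for a cluster.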
5 Experimental Results
The results displayed in Table 1 show the summary statistics of the 50 experimental
runs. Across the 10 requirement sets, the total number of attributes was 262; the size
of each cluster varied from 1 to 50, while the mean and standard deviation of each
cluster spanned 1-30 and 15-30, respectively.
Table 1. Summary statistics

Variables  Obs.  Obs. with missing data  Obs. without missing data  Min    Max  Mean    Std. deviation
Rate P     262   0                       262                        0.000  262  5.123   15.864
Point P    262   0                       262                        2.083  262  28.793  24.676
Rank P     262   0                       262                        0.000  262  1.289   16.047
*Obs. = Objects
Also, Figure 1 shows the results of running the clustering algorithm on the dataset
when searching for 10 clusters. It displays the 10 generated clusters, which represent
10 sets of requirements with various numbers of weighted attributes, together with the
within-class variance. Figure 2 shows the summary statistics of the experimental iteration.
The error function value was within 3.5.
Analysis of multiple runs of this experiment also showed encouraging results. Over
500 trials, the algorithm consistently classified the requirement sets correctly.
This is reflected in Table 2, where the centroids for each variable were computed
based on the stakeholders' weights. The sum of weights and the variance for each
requirement set were also calculated: the former aided the prioritization of the
requirement sets, while the latter shows the variance existing between the requirement sets.
Table 2. Class centroids
Class Rate P Point P Rank P Sum of Within-class
weights variance
1 4.604 17.347 0.276 53.00 7.302
2 4.230 7.6520 0.277 61.00 8.283
3 4.258 52.831 0.346 31.00 37.89
4 3.714 85.639 0.270 14.00 172.8
5 4.370 24.396 0.368 27.00 2.393
6 4.172 39.844 0.302 29.00 12.69
7 1.276 19.435 0.290 12.00 3.607
8 4.167 30.188 0.302 30.00 1.992
9 4.410 27.635 0.437 8.000 1.190
10 262.0 262.00 262.0 1.000 0.000
6 Discussion
The aim of this research was to develop an enhanced prioritization technique that
addresses the limitations of existing ones. It was confirmed that existing techniques
suffer from scalability problems, rank reversals, large disparity or disagreement
between ranked weights, and unreliable results. These limitations were addressed at
various points during the course of this research. The method used consisted of a
clustering algorithm, with specific focus on k-means. Various algorithms and models
were formulated to enhance the viability of the proposed technique. The evaluation of the proposed
approach was executed with relevant datasets. The performance of the proposed
technique was evaluated using ANOVA. The results showed a high correlation between
the mean weights, which finally yielded the prioritized results. Overall, the
proposed technique performed well with respect to the evaluation criteria described
in Section 4. It was also able to classify ranked requirements through the
calculation of maximum, minimum and mean scores. This will help software engineers
determine the most and least valued requirements, which will aid in planning
software releases and avoiding breaches of contracts, trust or agreements. Based on
the presented results, this research can be considered an improvement in the field
of computational intelligence.
References
1. Perini, A., Susi, A., Avesani, P.: A machine learning approach to software requirements
prioritization. IEEE Transactions on Software Engineering 39(4), 445–461 (2013)
2. Tonella, P., Susi, A., Palma, F.: Interactive requirements prioritization using a genetic
algorithm. Information and Software Technology 55(1), 173–187 (2013)
3. Ahl, V.: An experimental comparison of five prioritization methods. Master’s Thesis,
School of Engineering, Blekinge Institute of Technology, Ronneby, Sweden (2005)
4. Berander, P., Andrews, A.: Requirements prioritization. In: Engineering and Managing
Software Requirements, pp. 69–94. Springer, Heidelberg (2005)
5. Kobayashi, A., Maekawa, M.: Need-based requirements change management. In:
Proceedings of the Eighth Annual IEEE International Conference and Workshop on the
Engineering of Computer Based Systems, ECBS 2001, pp. 171–178. IEEE (2001)
6. Kassel, N.W., Malloy, B.A.: An approach to automate requirements elicitation and
specification. In: International Conference on Software Engineering and Applications
(2003)
7. Perini, A., Ricca, F., Susi, A.: Tool-supported requirements prioritization: Comparing the
AHP and CBRank methods. Information and Software Technology 51(6), 1021–1032
(2009)
8. Racheva, Z., Daneva, M., Herrmann, A., Wieringa, R.J.: A conceptual model and process
for client-driven agile requirements prioritization. In: 2010 Fourth International
Conference on Research Challenges in Information Science (RCIS), pp. 287–298. IEEE
(2010)
9. Berander, P., Khan, K.A., Lehtola, L.: Towards a research framework on requirements
prioritization. SERPS 6, 18–19 (2006)
10. Achimugu, P., Selamat, A., Ibrahim, R., Mahrin, M.N.R.: A systematic literature review of
software requirements prioritization research. Information and Software Technology
(2014)
11. Kaur, J., Gupta, S., Kundra, S.: A kmeans clustering based approach for evaluation of
success of software reuse. In: Proceedings of International Conference on Intelligent
Computational Systems, ICICS 2011 (2011)
12. Lim, S.L., Finkelstein, A.: StakeRare: using social networks and collaborative filtering for
large-scale requirements elicitation. IEEE Transactions on Software Engineering 38(3),
707–735 (2012)
13. Lim, S.L., Harman, M., Susi, A.: Using Genetic Algorithms to Search for Key
Stakeholders in Large-Scale Software Projects. In: Aligning Enterprise, System, and
Software Architectures, pp. 118–134 (2013)