Tarimliq: A New Internal Metric For Software Clustering Analysis
I. INTRODUCTION
Lehman points out in one of his laws of software evolution that software systems continue to evolve over time [1]. Maintenance and improvement consume a major portion of the total life-cycle cost of a software system. It is estimated that a large part of the software budget in large organizations is allocated to maintaining existing software systems. According to [1], approximately 90% of software costs are evolution costs. Although this rate may not be exactly right, the fact remains that a large percentage of software costs is spent on the software maintenance process. A proper understanding of the software has a major impact on the maintenance and development of a software system. One of the ways to greatly help the process of understanding a large application from its source code is to form a meaningful partition of its structure into smaller, more controllable subsystems [2]. To fulfill this aim, clustering methods are utilized. Clustering techniques make a program easier to understand by partitioning it.
The purpose of software clustering is to overcome the complexity of a large program by replacing a set of artifacts (e.g. files, functions, classes, etc.) with a cluster, a representative abstraction of all the artifacts grouped within it. Consequently, the obtained partition is straightforward to understand. This resulting decomposition is called the software architecture (or software structure). Different clustering algorithms for this purpose are presented in the literature. Figure 1 represents a graph constructed for a software system such that the nodes of this graph represent artifacts and the edges between the nodes represent the relationships between the artifacts. Figure 2 shows a sample clustering for Figure 1. According to the principles of software engineering, clustering should be done in such a way that the relationships within the clusters are maximized and the relationships between the clusters are minimized.
Fig. 2. An obtained clustering for Fig. 1
Clustering is an unsupervised process, which means that the user does not interfere in the process of clustering. Usually, there exist no predefined classes or examples showing whether the results are credible. Hence, various criteria for assessing the obtained clusters have been presented. These criteria are categorized into two classes: external and internal criteria. In the external metrics, the clustering obtained by an algorithm is compared with a predetermined clustering. This predetermined clustering is created by an expert and is also called the ground-truth architecture [3]. For example, the ground-truth
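The intra- versus inter-cluster principle above can be made concrete with a small sketch. The graph, artifact names, and cluster assignment below are invented for illustration; they do not come from the paper:

```python
# Illustrative sketch: count intra- and inter-cluster edges of a
# directed artifact-dependency graph. Artifacts and clusters are invented.
edges = [("a", "b"), ("b", "a"), ("a", "c"),
         ("c", "d"), ("d", "e"), ("e", "d")]
clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

# An edge is intra-cluster when both endpoints share a cluster label.
intra = sum(1 for u, v in edges if clusters[u] == clusters[v])
inter = len(edges) - intra
# A good clustering maximizes intra and minimizes inter.
```

On this toy graph the clustering {a, b, c} / {d, e} yields five intra-cluster edges and a single inter-cluster edge, which is exactly the structure a software clustering algorithm tries to find.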
978-1-7281-1508-5/19/$31.00 © 2019 IEEE
27th Iranian Conference on Electrical Engineering (ICEE2019)
architecture of the mtunis academic operating system is outlined in its documentation. However, most software does not have a ground-truth structure; on the other hand, it is still necessary to validate the obtained clustering. In these cases, internal criteria are used. The aim of the internal criteria is to determine how well the clusters are separated. It is important to note that the outcome of the external metrics is more reliable than that of the internal criteria in evaluating a clustering algorithm. Therefore, it is essential to provide an internal measure that can simulate the behavior of the external metrics.
In this paper, we present an internal metric for software cluster validity. The results of our experiments on Mozilla Firefox demonstrate that it can be a good alternative to the external criteria. Hence, it can be used to evaluate the clustering achieved by the algorithms.
The structure of this paper is organized as follows: Section 2 addresses the related works. Section 3 presents the proposed internal metric. Section 4 evaluates the proposed metric, and Section 5 concludes the paper.
II. RELATED WORKS
This section is organized into two sub-sections. In the first part, we examine the criteria for clustering assessment, and in the second part, we examine the clustering algorithms.
A. Clustering Validity Criteria
In most cases, the number of clusters is an unknown parameter because clustering is unsupervised and the user has very little knowledge about the data. Thus, the evaluation of different clustering algorithms is an important research problem in cluster analysis. The evaluation of clustering results can be examined from two perspectives: internal index and external index [4].
Internal Index. The purpose of an internal evaluation is to examine the achievement of the clustering goals. Validation of clustering results is as difficult as clustering itself. The overall goal of the internal criteria is to evaluate the obtained clusters from the two aspects of compactness and separation [4]. Some of the internal criteria are Homogeneity, Separation, the Dunn Index, Davies-Bouldin, and the Silhouette coefficient. We have selected three internal criteria, i.e. Homogeneity, Separation, and the Dunn Index, for our comparisons. The reason for this choice is to use comparison criteria that evaluate the clusters from different perspectives. The following is a brief overview of these methods.
Cluster Homogeneity index: This index displays the pairwise similarity of the cluster artifacts. Intracluster homogeneity is a concept related to the degree of similarity between artifacts in the same cluster.
Dunn Index: If a data set contains well-separated clusters, the distances among the clusters are usually large and the diameters of the clusters are expected to be small [3]. Therefore, a higher Dunn Index indicates better clustering.
Separation: This index measures how well separated a cluster is from the other clusters. Separation is measured by the between-cluster sum of squares.
External validity. External validity indexes are preferable when ground-truth labels are available. External measures are used to compare the similarity of two clustering results. The most important external criteria in the evaluation of software architecture are Precision [1], Recall [1], FM [1], MoJo [5] and MoJoFM [6]. Here is a brief overview of these criteria.
Precision: This criterion is the intersection of the extracted and ground-truth architectures divided by the extracted architecture. Precision concerns the positive class (i.e. true positives and false positives).
Recall: This criterion is the intersection of the extracted and ground-truth architectures divided by the ground-truth architecture (also known as sensitivity).
F-measure: This criterion, denoted by FM, is the weighted harmonic mean of precision and recall.
MoJo: This metric gauges how "close" two different clusterings are. It counts the minimum number of operations (move and join operations) one needs to perform in order to transform one partition into the other. Because this criterion does not produce an answer in a fixed interval and only calculates the number of movements, MoJoFM was introduced.
MoJoFM: Let C1 and C2 indicate the clustering achieved by a clustering algorithm and an authoritative decomposition, respectively. mno(C1, C2) indicates the least number of move and join operations required to transform C1 into C2. The MoJoFM measure is given by Eq. 1. The number produced by this criterion lies in the range of 0 to 100, such that a larger number indicates a higher similarity between the two clusterings.

MoJoFM(A, B) = (1 − mno(A, B) / max(mno(∀A, B))) × 100%   (1)

B. Clustering Algorithms
Due to the NP-hardness of the clustering problem, most algorithms presented in this field use evolutionary methods. The algorithms E-CDGM [7], Bunch [8], DAGC [9], SAHC [10], NAHC [10], Multiple-HC [11], MCA [12], ECA [12], GA-SMCP [13], PSOMC [14], HSBRA [15], MAE [16] and BCA [17] are a number of software clustering algorithms that use evolutionary techniques to perform clustering. The following is a brief explanation of some of these algorithms, which are of interest in the community.
Bunch algorithm: One of the most popular algorithms for software clustering is provided by Mitchell in his Ph.D. thesis. This algorithm uses a genetic algorithm with a vector-based encoding for software clustering. It is a single-objective algorithm, and by providing a quality function called TurboMQ, it creates clusters with maximum cohesion and minimum coupling.
DAGC algorithm: This algorithm is similar to the Bunch algorithm, with the difference that the encoding used is a permutation-based encoding.
Hill climbing algorithm: This algorithm uses a local search for clustering, and there are two different versions, called NAHC
and SAHC. The difference between these two versions is in how neighbors are searched.
MCA: This algorithm uses a multi-objective function to cluster a software system. The objectives used include: maximizing the sum of intra-edges of all clusters, minimizing the sum of inter-edges of all clusters, maximizing the number of clusters, maximizing TurboMQ, and minimizing the number of isolated clusters.
ECA: The goals used in this algorithm are the same as those used in MCA, with the difference that instead of the last one, the difference between the maximum and the minimum number of modules in a cluster (to be minimized) is used.
III. PROPOSED INTERNAL METRIC
This section proposes a metric that can measure a clustering obtained by an algorithm. According to software maintenance principles, a good software system should include as many understandable and independent functional clusters as possible. To this end, the following requirements should hold:
1) The relationship between different clusters should be minimized.
source. Of course, Z can be empty. To calculate the common power, the neighboring artifacts between the two artifacts x and y are identified. Then the relationships between the artifacts x and y are calculated considering the common neighboring artifacts. Equation 5 shows the dissimilarity between the two artifacts x and y. If x and y do not communicate, directly or indirectly, they have the maximum dissimilarity and the value of 1 is assigned to them. Equation 6 calculates the total dissimilarity for cluster i. In Eq. 7, the total external dissimilarity for the artifacts located in cluster i is calculated. In Eqs. 6 and 7, the direction of the relationships is important for dissimilarity. In Eqs. 8 and 9, the Tarimliq for each cluster and for an achieved clustering are calculated, respectively.

Adj(x) = {y ∈ V | (x, y) ∈ E}   (2)

power(x) = E(x, Adj(x)),  x ∈ V   (3)

power(x, y) = E(x, Z) + E(Z, y),  x, y ∈ V   (4)
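Eqs. 2–4 can be sketched on a toy directed graph. Here E(X, Y) is taken to be the number of edges from node set X to node set Y, and Z is passed in explicitly; both are assumptions, since the formal definitions of E and Z fall outside this excerpt:

```python
# Toy directed dependency graph; E(X, Y) is ASSUMED to count edges from
# node set X to node set Y (the paper's formal definition is not in this
# excerpt). The graph itself is invented for illustration.
edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")}
V = {"a", "b", "c"}

def adj(x):
    # Eq. 2: Adj(x) = {y in V | (x, y) in E}
    return {y for y in V if (x, y) in edges}

def E(X, Y):
    # Assumed semantics: number of edges going from X into Y.
    return sum(1 for u, v in edges if u in X and v in Y)

def power(x):
    # Eq. 3: power(x) = E(x, Adj(x))
    return E({x}, adj(x))

def common_power(x, y, Z):
    # Eq. 4: power(x, y) = E(x, Z) + E(Z, y), with Z the (possibly empty)
    # set of common neighboring artifacts between x and y.
    return E({x}, Z) + E(Z, {y})
```

For instance, artifact a reaches both b and c directly, so power(a) = 2, and routing through the intermediary set Z = {b} gives common_power(a, c, {b}) = 2 via the edges (a, b) and (b, c).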
[Bar chart: mean of the external metrics (Precision, Recall, FM, MoJoFM) for Bunch, DAGC, ECA, MCA, NAHC, and SAHC; x-axis: Algorithm Name]

algorithms. The remarkable point is that these four algorithms are single-objective. We also selected two multi-objective algorithms, namely ECA and MCA. Figures 3 and 4 represent, respectively, the average of ten runs of these six algorithms on the Mozilla folders in terms of the external and internal metrics.
One of the one-way ANOVA techniques, called the Duncan test, has been used to compare the averages of the different criteria to evaluate the efficiency of the algorithms [18]. This technique categorizes the algorithms according to the different indexes and according to priorities. The algorithms are grouped and prioritized by this

[Bar chart: mean of the internal metrics (Homogeneity, Separation, Dunn Index, Tarimliq) for Bunch, DAGC, ECA, MCA, NAHC, and SAHC; x-axis: Algorithm Name]

Fig. 5. Ranking and grouping of six algorithms in terms of Precision
Fig. 8. Ranking and grouping of six algorithms in terms of MoJoFM
Fig. 11. Ranking and grouping of six algorithms in terms of Separation
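The ANOVA step behind this grouping can be sketched with a plain one-way F-statistic. The run values below are invented, and Duncan's multiple range grouping itself (the post-hoc step the paper uses) is not reproduced here:

```python
# One-way ANOVA F-statistic over per-algorithm runs. The data are
# synthetic stand-ins for e.g. MoJoFM values over repeated runs;
# Duncan's post-hoc grouping is omitted from this sketch.
def f_statistic(groups):
    k = len(groups)                              # number of algorithms
    n = sum(len(g) for g in groups)              # total observations
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    # F = between-group mean square / within-group mean square
    return (ss_between / (k - 1)) / (ss_within / (n - k))

runs = [[62.1, 60.8, 61.5],   # hypothetical runs of algorithm 1
        [55.2, 54.9, 56.0],   # hypothetical runs of algorithm 2
        [58.3, 57.7, 59.1]]   # hypothetical runs of algorithm 3
F = f_statistic(runs)
```

A large F indicates that the mean criterion values differ across algorithms, which is the precondition for a post-hoc ranking-and-grouping test such as Duncan's.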
V. CONCLUSION
Due to the NP-hardness of the clustering problem, the existing algorithms produce different clusterings. Data clustering uses internal criteria to evaluate the obtained clustering. Given that the purpose of software clustering is different from that of data clustering, the internal metrics provided for data clustering may not be able to properly gauge a software clustering. The purpose of software clustering is to partition the software into clusters that, in addition to having maximum cohesion and minimum coupling, should be well understood. To this end,