Employing Clustering For Assisting Source Code Maintainability Evaluation According To ISO/IEC-9126
This document proposes a methodology that combines clustering and multi-criteria decision techniques to assist in evaluating the maintainability of software systems according to ISO/IEC-9126. It extracts source code metrics and elements, assigns them weights using analytical hierarchy process, and applies k-Attractors clustering to group systems and provide overviews. The methodology is evaluated on Apache Geronimo, an open-source application server, and results are discussed.
Employing Clustering for Assisting Source Code Maintainability Evaluation according to ISO/IEC-9126

Panagiotis Antonellis1, Dimitris Antoniou1, Yiannis Kanellopoulos1,2, Christos Makris1, Evangelos Theodoridis1, Christos Tjortjis2, Nikos Tsirakis1
1 University of Patras, Computer Engineering and Informatics Department, Greece
2 The University of Manchester, School of Computer Science, U.K.
Abstract

This paper elaborates on how to use clustering for the evaluation of a software system's maintainability according to the ISO/IEC-9126 quality standard. More specifically, it proposes a methodology that combines clustering and multicriteria decision aid techniques for knowledge acquisition by integrating groups of data from source code with the expertise of a software system's evaluators. A process for the extraction of elements from source code and the Analytic Hierarchy Process for assigning weights to these data are provided; the k-Attractors clustering algorithm is then applied on these data in order to produce system overviews and deductions. The methodology is evaluated on Apache Geronimo, a large open source application server; results are discussed and conclusions are presented together with directions for future work.

1. Introduction

Software maintenance is considered the most difficult stage in the software lifecycle. According to the National Institute of Standards and Technology (NIST), it costs the U.S. economy $60 billion per year [12]. Given this high cost, maintenance processes can be considered an area of competitive advantage. There are several studies on evaluating a system's maintainability and controlling the effort required to carry out maintenance activities [2], [14], [18]. According to ISO/IEC-9126, maintainability is the capability of a software product to be modified. Evaluating such a characteristic is a difficult process, as many contradictory criteria must be considered in order to reach a decision.

This paper presents a methodology that facilitates the evaluation of a software product's maintainability according to the ISO/IEC-9126 software engineering quality standard. The intuition of this methodology is to integrate groups of measurement data extracted from source code elements with the expertise of a system's evaluators, by giving them the ability to define a number of attributes suitable for such an evaluation. For this reason:
• Metrics are extracted from elements of the system's source code.
• Relative weights are assigned to these metrics by employing the Analytic Hierarchy Process, reflecting their importance in evaluating maintainability.
• The k-Attractors clustering algorithm [4] is then applied on the derived ISO/IEC-9126 maintainability values, in order to provide the evaluator with a quick and rough grasp of the system.

We attempt to evaluate the usefulness of this methodology by employing as a test-bed Geronimo 1.0, an open source application server used in real-life industrial applications. The remainder of this paper is organized as follows: Section 2 reviews existing work in the area of data mining and software evaluation. Section 3 outlines the logic behind the main parts of the proposed methodology. Section 4 assesses the accuracy of the output of the proposed framework, analyses its results and outlines deductions from its application. Finally, conclusions and directions for future work are presented in Section 5.

2. Background

Data mining [3] is the process that extracts implicit, previously unknown and potentially useful information from data, by searching large volumes of data for patterns and by employing techniques such as classification, association rules mining and clustering. It is a complex topic with links to multiple core fields of computer science, and it adds value to established computational techniques from statistics, information retrieval, machine learning and pattern recognition. Its ability to deal with vast amounts of data has made it a suitable solution for assisting software maintenance, often with remarkable results [1], [7], [8], [10], [20]. As previous studies have shown, data mining is capable of obtaining useful knowledge about the structure of large systems.

Sartipi et al. used data mining for architectural design recovery [16]. They proposed a model for the evaluation of the architectural design of a system based on associations among system components, and used a system modularity measurement as an indication of design quality and of the system's decomposition into subsystems. Besides association rules, the clustering data mining technique has been used to support software maintenance and software systems knowledge discovery [21], [15]. The work in [15] proposes a methodology for grouping Java code elements together according to their similarity, and focuses on achieving a high-level system understanding.

Understanding low/medium level concepts and relationships among components at the function, paragraph or even line of code level, by mining the source code of C and COBOL legacy systems, was addressed in [19]. For C programs, functions were used as entities, and attributes were defined according to the use and types of parameters and variables and the types of returned values. Clustering was then applied to identify subsets of source code that were grouped together according to custom-made similarity metrics [19]. An approach for the evaluation of dynamic clustering is presented in [22]. The scope of this solution is to evaluate the usefulness of providing dynamic dependencies as input to software clustering algorithms. Finally, clustering over a Module Dependency Graph (MDG) [9] uses a collection of algorithms that facilitate the automatic recovery of the modular structure of a software system from its source code. The method creates a hierarchical view of the system architecture as subsystems, based on the components and the relationships between components that can be detected in source code.

Recently, [6] presented an approach that examines the evolution of code stored in source control repositories. This technique identifies Change Clusters, which can help managers classify different code change activities as either software maintenance or new development. On the other hand, [20] analyzes whether a change coupling between source code entities is significant or whether only minor textual adjustments have been checked in, and an approach for analyzing and classifying change types based on code revisions has been developed. Finally, in [8] language processing techniques are applied to extend human judgment into situations where obtaining direct human judgment is impractical due to the volume of information that must be considered.

What differentiates this work from the approaches presented above is that we do not cluster raw software measurement data. Instead, we give the evaluator the ability to employ a Multicriteria Analysis (MA) method, the Analytic Hierarchy Process (AHP), for assigning relative weights to the extracted metrics in order to reflect their importance in evaluating maintainability. This helps combine the evaluator's domain expertise with the measurement data extracted from source code, which may lead to more accurate and interesting clustering results.

3. Description of the Methodology

The proposed methodology is supported by the Code4Thought tool [24]. Our main purpose when implementing this tool was to use open source and portable technologies. Thus, we decided to use the Java programming language for implementing the main functionality of our tool, the MySQL database for storing our data and the PHP scripting language for designing the user interface of our tool.

This section presents the logic behind the following modules that constitute the Code4Thought tool:
• Data extraction and preparation
• Weights assignment
• Data analysis

3.1. Data Extraction and Preparation

The objective of data extraction and preparation is two-fold:
• First, to collect appropriate elements that describe the software architecture and its characteristics. These elements include native source code attributes and metrics.
• Then, to analyze the collected elements, choose a refined subset of them and store them in a relational database system for further analysis.

Native attributes include definition files, classes, structure blocks etc. Metrics, on the other hand, provide additional system information and describe the system's characteristics and behaviour more effectively. Every metric is associated with a native source code attribute, e.g. the lack of cohesion is associated with a class member method. All of the collected attributes and metrics are stored in appropriately structured XML files. We have chosen XML because of its interoperability and its wide acceptance as a de facto standard for data representation and exchange. Storing the metrics in XML files enables further processing and analysis with a variety of tools.

For simplicity, we chose to analyse a refined subset of the most important collected elements. This subset should be small enough to be easily analyzed and large enough to contain all the necessary system information. Based on this requirement, we stored and further analyzed only the metrics and their associated native attributes.

The chosen elements need to be extracted from the XML files and stored permanently in a relational database. For this reason we used tools that map XML elements and nodes into any relational database, keeping the extraction method independent of the underlying database.

Figure 1 depicts the general architecture of the data extraction and preparation module.

Figure 1. Architecture of data extraction and preparation module

3.2. Weights Assignment

As mentioned above, we have adopted the Analytic Hierarchy Process (AHP) for the weights assignment. AHP is a decision-making technique that allows consideration of both qualitative and quantitative aspects of decisions [25]. It reduces complex decisions to a series of one-on-one comparisons and then synthesizes the results. Compared to other techniques, like ranking or rating techniques, AHP emulates the human ability to compare single properties of alternatives. It not only helps decision makers choose the best alternative, but also provides a clear rationale for the choice.

AHP compares a list of objectives or alternatives in a systematic way. When used in the systems engineering process, AHP can be a powerful tool for comparing alternative design concepts. Assume that a set of objectives has been established and that we are trying to establish a normalized set of weights to be used when comparing alternatives using these objectives. AHP forms a pairwise comparison matrix A, where the number in the i-th row and j-th column gives the relative importance of objective O(i) as compared with O(j). The values are usually taken from a 1–9 scale, with a(i,j) = 1 if the two objectives are equal in importance, a(i,j) = 3 if O(i) is weakly more important than O(j), a(i,j) = 5 if O(i) is strongly more important than O(j), a(i,j) = 7 if O(i) is very strongly more important than O(j), and a(i,j) = 9 if O(i) is absolutely more important than O(j). After this procedure the comparison matrix is normalized and its principal eigenvector is computed. The components of this eigenvector play the role of coefficients/weights when evaluating the alternatives against the examined objectives.

In our case, where we aim to evaluate maintainability (see Figure 2) from a set of employed metrics, we apply the AHP procedure at each level of the maintainability metrics hierarchy. At the first level we evaluate the characteristics (analyzability, changeability, etc.) from the extracted metrics, and at the second level we evaluate maintainability from the characteristics by applying the AHP procedure again. So at the first level we construct a pairwise comparison table for each of the characteristics, reflecting the expert's knowledge of how much each metric influences that characteristic. Then, by normalizing each matrix and extracting its principal eigenvector, we find the weight of each metric for calculating a score for each characteristic. At the higher level a pairwise comparison table is constructed as well, reflecting the expert's knowledge of how much each characteristic influences maintainability, and the weights are calculated by the same normalization and eigenvector extraction.

Figure 2: Weights Assignment Hierarchy

3.3. Data Analysis

As depicted in Figure 3, the k-Attractors algorithm accepts data from the source code analyzer by performing queries on the database where the data reside. The outcome of the analysis is stored in XML files, in order to be visualized by the corresponding module.

Figure 3. Data analysis module

In the case of software maintainability evaluation, clustering produces overviews of systems by creating mutually exclusive groups of classes, member data or methods, according to their similarities in terms of technical (source code) measurements [16]. This helps reduce the time required to understand and evaluate the overall system. Another contribution of clustering is that it helps discover programming patterns and "unusual" or outlier cases which may require attention.

For this purpose the k-Attractors algorithm was employed, which is tailored for numerical data such as measurements from source code [4]. The main characteristics of k-Attractors are:
o It defines the desired number of clusters (i.e. the number k) without user intervention.
o It locates the initial attractors of cluster centers with great precision.
o It measures similarity based on a composite metric that combines the Hamming distance and the inner product of transactions and clusters' attractors.

The k-Attractors algorithm employs maximal frequent itemset discovery and partitioning in order to define the number of desired clusters and the initial attractors of the centers of these clusters. The intuition is that a frequent itemset, in the case of software metrics, is a set of measurements that occur together in a minimum part of a software system's classes. Classes with similar measurements are expected to be in the same cluster. The term attractor is used instead of centroid, as it is not determined randomly, but by its frequency in the whole population of a software system's classes.

4. Application - Results Evaluation

The evaluation of Apache Geronimo's maintainability according to ISO/IEC-9126 involved the study of 1440 classes. Figure 4 depicts the clusters derived from clustering the maintainability values of Geronimo's classes. The higher the value on the X axis, the less maintainable the class is. Table 1 presents statistics for the derived clusters.

Figure 4: Apache Geronimo ISO/IEC-9126 Maintainability Clusters

Table 1: Clusters Statistics
S/N   Population   Percentage   Mean    Standard Deviation
0     419          29%          1.10    0.29
1     130          9%           2.45    0.60
2     7            0.004%       13.75   2.27
3     856          59%          0.39    0.16
4     28           1.996%       5.02    1.55

Cluster 3, which has the biggest population, contains classes whose maintainability values range between 0 and 0.9. This shows that the vast majority of Geronimo's classes are highly maintainable. Furthermore, clusters 0, 1 and 4 contain classes whose maintainability values range from 0.9 to 2, 2 to 4 and 4 to 9.2 respectively, which can be considered good in terms of maintainability.

However, outliers are detected in cluster 2, which consists of only seven (7) classes. These are the least maintainable classes, that is, the ones with the highest values in Table 1:
1. KernelManagementHelper.java, a class of 1024 Lines Of Code (LOC).
2. TradeDirect.java, a class of 2312 LOC.
3. ClientApp.java, a class of 1633 LOC.
4. CdrInputStream.java, a class of 1569 LOC.
5. CdrOutputStream.java, a class of 1241 LOC.
6. ASN1Encodable.java, a class of only 62 LOC.
7. DERObject.java, a class of only 38 LOC.

Table 2 presents the metric values for the classes in cluster 2. A further study of these values indicates that the classes in cluster 2 fall into two categories:
• The first category includes the first five classes, which have the following characteristics:
o They do not follow the principle of low coupling/high cohesion. On the contrary, they exhibit low cohesion and high coupling.
o They are highly complex.
o All of them have polymorphic methods, which indicates that encapsulation is not applied in these classes.
• The second category includes the classes ASN1Encodable and DERObject, which are difficult to maintain for different reasons. More specifically, these two classes have the following characteristics:
o Interestingly, they are not complex, and their size is very small, unlike the classes in the first category. They also follow the principle of low coupling/high cohesion.
o They have an excessive number of children. This probably indicates that these classes are fundamental elements of Apache Geronimo's structure.
o The number of classes depending on them (Ca) is big.

Table 3 presents statistics for the metrics of Apache Geronimo's classes in clusters 0, 1, 3 and 4. This table indicates that:
• The lower the metric values, the more likely a class is to have a low maintainability value, that is, to be highly maintainable.
• There is limited use of inheritance, as shown by the low DIT and NOC values.
• The majority of the classes follow the low coupling/high cohesion principle.
• Most of the classes exhibit low complexity.
• The design property of encapsulation is applied to most of the classes.

5. Conclusions and Future Work

The application of the proposed methodology has proved to be time and performance efficient. The extraction process, which is the most time-consuming part of this methodology, analyzed the 1440 classes of Apache Geronimo 1.0 and stored the corresponding metrics and elements in a limited amount of time. A domain expert previewed the stored metrics and easily and efficiently assigned the corresponding weights, according to his priorities and concerns. After clustering, the resulting clusters proved to be representative of the code artifacts, helping the domain expert to identify relations between specific metrics and global maintainability, as well as to spot individual outlier classes that may need reconsideration.

As future work, we intend to enhance our extraction method by calculating metrics for other languages like C++, C and COBOL, which were used for the development of the majority of legacy systems, a category of software systems that is very interesting in terms of program comprehension and maintainability evaluation.

Acknowledgements

This research work has been partially supported by the Greek General Secretariat for Research and Technology (GSRT) and Dynacomp S.A. within the program "P.E.P. of Western Greece Act 3.4".
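As an illustration of the weights assignment described in Section 3.2, the following is a minimal sketch of how AHP weights could be derived from a pairwise comparison matrix. It is not the Code4Thought implementation. The function names `ahp_weights` and `weighted_score`, the three-metric comparison matrix and the sample metric values are all hypothetical, and power iteration is only one common way of approximating the principal eigenvector.

```python
def ahp_weights(matrix, max_iter=1000, tol=1e-12):
    """Approximate the principal eigenvector of a pairwise comparison
    matrix by power iteration and scale it so the weights sum to 1."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(max_iter):
        # Multiply the matrix by the current weight vector.
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        updated = [value / total for value in product]
        converged = max(abs(u - w) for u, w in zip(updated, weights)) < tol
        weights = updated
        if converged:
            break
    return weights


def weighted_score(values, weights):
    """Weighted sum, as used at each level of the hierarchy (metrics to a
    characteristic, then characteristics to maintainability)."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical expert judgments on the 1-9 scale for three metrics:
# metric 0 is weakly more important than metric 1 (3) and strongly more
# important than metric 2 (5); metric 1 is weakly more important than
# metric 2 (3). The lower triangle holds the reciprocals.
COMPARISONS = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]

if __name__ == "__main__":
    w = ahp_weights(COMPARISONS)
    print([round(x, 3) for x in w])
    # Hypothetical normalized metric values for one class.
    print(round(weighted_score([2.0, 1.0, 0.5], w), 3))
```

With these assumed judgments the weights come out at roughly 0.64, 0.26 and 0.10. The same two functions can be reused at the second level of the hierarchy, with characteristics in place of metrics.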
Table 2: Cluster 2 Metrics
S/N   WMC     NPM     DAM    CBO     POM     DIT    NOC      LCOM    Ca
1     9.15    11.13   1.62   17.40   40.00   0.72   0.00     42.69   0.00
2     11.58   4.52    1.62   35.65   30.00   0.72   0.00     45.21   0.51
3     10.68   0.32    1.62   2.99    2.50    0.72   0.00     81.97   2.53
4     18.38   11.45   1.62   14.37   20.00   0.72   0.00     64.96   9.61
5     14.77   11.29   1.62   13.14   12.50   0.72   0.00     47.82   9.61
6     0.42    0.81    0.00   0.33    0.00    0.72   149.49   0.27    26.30
7     0.28    0.48    0.00   0.00    0.00    1.44   76.27    0.18    52.10
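The two categories of cluster 2 classes discussed in Section 4 can be read directly off Table 2. The sketch below transcribes the table and separates the rows with a simple rule on the NOC column. The rule, the threshold and the name `split_by_children` are illustrative assumptions, not part of the methodology.

```python
# Column order of Table 2.
COLUMNS = ["WMC", "NPM", "DAM", "CBO", "POM", "DIT", "NOC", "LCOM", "Ca"]

# Rows of Table 2, keyed by serial number, transcribed as published.
TABLE_2 = {
    1: [9.15, 11.13, 1.62, 17.40, 40.00, 0.72, 0.00, 42.69, 0.00],
    2: [11.58, 4.52, 1.62, 35.65, 30.00, 0.72, 0.00, 45.21, 0.51],
    3: [10.68, 0.32, 1.62, 2.99, 2.50, 0.72, 0.00, 81.97, 2.53],
    4: [18.38, 11.45, 1.62, 14.37, 20.00, 0.72, 0.00, 64.96, 9.61],
    5: [14.77, 11.29, 1.62, 13.14, 12.50, 0.72, 0.00, 47.82, 9.61],
    6: [0.42, 0.81, 0.00, 0.33, 0.00, 0.72, 149.49, 0.27, 26.30],
    7: [0.28, 0.48, 0.00, 0.00, 0.00, 1.44, 76.27, 0.18, 52.10],
}


def split_by_children(table, noc_threshold=0.0):
    """Split rows into (no children, has children) using the NOC column.

    The threshold is an assumption made for this illustration only.
    """
    noc = COLUMNS.index("NOC")
    with_children = sorted(sn for sn, row in table.items()
                           if row[noc] > noc_threshold)
    without_children = sorted(sn for sn in table if sn not in with_children)
    return without_children, with_children


if __name__ == "__main__":
    first, second = split_by_children(TABLE_2)
    print("first category:", first)    # rows 1 to 5
    print("second category:", second)  # rows 6 and 7
```

Rows 6 and 7 (ASN1Encodable and DERObject) are the only ones with a non-zero NOC value, and they also have the highest Ca values, which matches the description of the second category.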
Table 3: Cluster 0, 1, 3 and 4 Metrics Statistics
Metric   Min.   Max.    Mean   Median   Standard Deviation
WMC      0.07   12.55   0.96   0.55     1.20
NPM      0.00   8.71    0.98   0.65     1.17
DAM      0.00   1.62    1.00   1.62     0.76
CBO      0.00   16.54   0.95   0.41     1.54
POM      0.00   37.50   0.93   0.00     2.88
DIT      0.72   3.60    1.00   0.72     0.49
NOC      0.00   70.17   0.85   0.00     3.87
LCOM     0.00   26.84   0.81   0.11     2.43
Ca       0.00   81.94   0.93   0.00     3.28

References

[1] N. Anquetil and T. C. Lethbridge, "Experiments with Clustering as a Software Remodularization method", Proc. 6th Working Conf. Reverse Engineering (WCRE 99), IEEE Comp. Soc. Press, (1999) 235-255.
[2] Erik Arisholm, Lionel C. Briand, Audun Foyen, "Dynamic Coupling Measurement for Object-Oriented Software", IEEE Transactions on Software Engineering, vol. 30, No. 8, August 2004, pp. 491-506.
[3] Dunham, M. H. Data Mining: Introductory and Advanced Topics. Prentice Hall PTR, 2002.
[4] Kanellopoulos Y., Antonellis P., Tjortjis C., Makris C., "k-Attractors, A Clustering Algorithm for Software Measurement Data Analysis", In Proceedings of IEEE 19th International Conference on Tools for Artificial Intelligence (ICTAI 2007), IEEE Computer Society Press, 2007.
[5] Kan, S. H. Metrics and Models in Software Quality Engineering. Addison-Wesley. Second Edition. 2002.
[6] Jay Kothari, Ali Shokoufandeh, Spiros Mancoridis, Ahmed E. Hassan, "Studying the Evolution of Software Systems Using Change Clusters," ICPC, pp. 46-55, 14th IEEE International Conference on Program Comprehension (ICPC'06), 2006.
[7] T. Kunz and J. P. Black, "Using Automatic Process Clustering for Design Recovery and Distributed Debugging", IEEE Transactions on Software Engineering, 21(6), (1995) 515-527.
[8] Dawn J. Lawrie, Henry Feild, David Binkley, "Leveraged Quality Assessment using Information Retrieval Techniques," ICPC, pp. 149-158, 14th IEEE International Conference on Program Comprehension (ICPC'06), 2006.
[9] S. Mancoridis, B.S. Mitchell, Y. Chen and E.R. Gansner, "Bunch: A Clustering Tool for the Recovery and Maintenance of Software System Structures", Proc. Int'l Conf. Software Maintenance (ICSM 99), IEEE Comp. Soc. Press, (1998) 50-59.
[10] O. Maqbool, H.A. Babri, A. Karim, and M. Sarwar, "Metarule-guided association rule mining for program understanding", Software, IEEE Proceedings, 152(6) (2005) 281-296.
[11] Storey Margaret-Anne: "Theories, Methods and Tools in Program Comprehension: Past, Present and Future", Proc. IEEE 13th Int'l Workshop Program Comprehension (IWPC 2005), 2005.
[12] National Institute of Standards and Technology (NIST), "The Economic Impacts of Inadequate Infrastructure for Software Testing.", Washington D.C. 2002.
[13] C. M. de Oca and D. L. Carver, "Identification of Data Cohesive Subsystems Using Data Mining Techniques", Proc. Int'l Conf. Software Maintenance (ICSM 98), IEEE Comp. Soc. Press, (1998) 16-23.
[14] Rajendra K. Bandi, Vijay K. Vaishnavi, Daniel E. Turk, "Predicting Maintenance Performance Using Object Oriented Design Complexity Metrics", IEEE Transactions on Software Engineering, vol. 29, No. 1, January 2003, pp. 77-87.
[15] D. Rousidis and C. Tjortjis, "Clustering Data Retrieved from Java Source Code to Support Software Maintenance: A Case Study", Proc IEEE 9th European Conf. Software Maintenance and Reengineering (CSMR 05), IEEE Comp. Soc. Press, (2005) 276-279.
[16] K. Sartipi, K. Kontogiannis and F. Mavaddat, "Architectural Design Recovery Using Data Mining Techniques", Proc. 2nd European Working Conf. Software Maintenance Reengineering (CSMR 00), IEEE Comp. Soc. Press, (2000) 129-140.
[17] Spinellis D: "Code Quality: The Open Source Perspective", Addison-Wesley, 2006.
[18] Yong Tan, Vijay S. Mookerjee, "Comparing Uniform and Flexible Policies for Software Maintenance and Replacement", IEEE Transactions on Software Engineering, vol. 31, No. 3, March 2005, pp. 238-255.
[19] C. Tjortjis, N. Gold, P.J. Layzell and K. Bennett, "From System Comprehension to Program Comprehension", Proc. IEEE 26th Int'l Computer Software Applications Conf. (COMPSAC 02), IEEE Comp. Soc. Press, (2002) 427-432.
[20] C. Tjortjis, L. Sinos and P.J. Layzell, "Facilitating Program Comprehension by Mining Association Rules from Source Code", Proc. IEEE 11th Int'l Workshop Program Comprehension (IWPC 03), IEEE Comp. Soc. Press, (2003) 125-132.
[21] V. Tzerpos and R. Holt, "Software Botryology: Automatic Clustering of Software Systems", Proc. 9th Int'l Workshop Database Expert Systems Applications (DEXA 98), IEEE Comp. Soc. Press, (1998) 811-818.
[22] C. Xiao and V. Tzerpos, "Software Clustering on Dynamic Dependencies", Proc. IEEE 9th European Conf. Software Maintenance and Reengineering (CSMR 05), IEEE Comp. Soc. Press, (2005) 124-133.
[23] http://geronimo.apache.org/downloads.htm
[24] http://www.code4thought.org
[25] Saaty T. Multicriteria Decision Making: The Analytic Hierarchy Process, Vol. 1, AHP Series, RWS Publications, 502 pp., 1990.