Search-Based Refactoring for Software Maintenance

Article in Journal of Systems and Software · April 2008


DOI: 10.1016/j.jss.2007.06.003 · Source: DBLP



Author's personal copy

Available online at www.sciencedirect.com

The Journal of Systems and Software 81 (2008) 502–516


www.elsevier.com/locate/jss

Search-based refactoring for software maintenance


Mark O’Keeffe *, Mel Ó Cinnéide
School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland

Available online 13 June 2007

Abstract

The high cost of software maintenance could be reduced by automatically improving the design of object-oriented programs without
altering their behaviour. We have constructed a software tool capable of refactoring object-oriented programs to conform more closely
to a given design quality model, by formulating the task as a search problem in the space of alternative designs. This novel approach is
validated by two case studies, where programs are automatically refactored to increase flexibility, reusability and understandability as
defined by a contemporary quality model. Both local and simulated annealing searches were found to be effective in this task.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Search-based software engineering; Automated design improvement; Refactoring

* Corresponding author. Tel.: +353 1 7162911. E-mail address: mark.okeeff[email protected] (M. O’Keeffe).
0164-1212/$ - see front matter © 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.jss.2007.06.003

1. Introduction

One measure of the quality of an object-oriented design is the level of difficulty encountered in carrying out maintenance programming. This is because the goal of the object-oriented approach is to produce understandable, modular designs in order to minimise the cognitive complexity of programming tasks. However, it is not uncommon to encounter designs that have become weakened as a side-effect of the repeated addition of functionality during development (a problem referred to as design erosion), or have not been properly maintained in the past. Such designs can require significant refactoring in order to increase their maintainability to an acceptable level, thus increasing the cost of carrying out maintenance tasks.

The ideal solution to this problem would be the automation of some portion of the refactoring step by the application of an automated design improvement tool. Such a tool would take the current set of program entities as input and output a set with the same external behaviour, but having a design that conforms more closely to a given quality model. Maintenance programming or manual refactoring could then begin from a more advantageous point, thus reducing the costs involved.

Our novel approach to automated design improvement is the formulation of the refactoring task as a search problem; given a design quality function we apply automated refactorings to a program in order to move through the space of alternative designs and search for those of highest quality. The effectiveness of the search can be measured in terms of the change in quality function, but the effectiveness of the approach itself can only be judged in terms of the actual changes made to the program, and to what extent it is more maintainable than the original. For this reason, choice of design quality function is a key facet of this work.

While there exists a large body of work dealing with the measurement of design quality in terms of a set of metrics (see Section 2.3), there are few examples of attempts to capture complex properties such as maintainability as a single value, as required for an evaluation function. This is perhaps not surprising, given that comparison of the design of unrelated programs with different purposes has little meaning. However, for the purpose of search-based software maintenance the evaluation function need only be capable of ranking alternative designs of the same program.
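Because the evaluation function only has to rank alternative designs of the same program, one convenient form normalises every metric against its value in the original design, so that a quotient of 1.0 means "unchanged". This mirrors the weighted sum of metric change quotients that Section 4.3 attributes to the QMOOD functions. The Python sketch below is illustrative only: the metric names (`coupling`, `cohesion`, `abstraction`), the weights and the helper names are hypothetical and are not the QMOOD metrics or weights, and every original metric value is assumed to be non-zero.

```python
from typing import Dict, List

# Metric values for one design, keyed by (hypothetical) metric name.
Metrics = Dict[str, float]


def change_quotients(original: Metrics, candidate: Metrics) -> Metrics:
    """Divide each metric of the candidate design by the original design's value.

    Assumes every metric is non-zero in the original design; a quotient of 1.0
    means that metric is unchanged by refactoring.
    """
    return {name: candidate[name] / original[name] for name in original}


def evaluate(original: Metrics, candidate: Metrics, weights: Metrics) -> float:
    """Weighted sum of metric change quotients.

    A positive weight rewards a metric that should increase; a negative weight
    rewards a metric that should decrease.
    """
    quotients = change_quotients(original, candidate)
    return sum(weights[name] * q for name, q in quotients.items())


def rank(original: Metrics, candidates: List[Metrics], weights: Metrics) -> List[Metrics]:
    """Order alternative designs of the same program from best to worst."""
    return sorted(candidates, key=lambda c: evaluate(original, c, weights), reverse=True)


# Hypothetical metrics and weights, chosen only for illustration.
weights = {"coupling": -0.25, "cohesion": 0.25, "abstraction": 0.5}
original = {"coupling": 10.0, "cohesion": 0.5, "abstraction": 0.2}
less_coupled = {"coupling": 8.0, "cohesion": 0.5, "abstraction": 0.2}
more_abstract = {"coupling": 10.0, "cohesion": 0.5, "abstraction": 0.3}

ranking = rank(original, [original, less_coupled, more_abstract], weights)
```

Since every quotient of the original design is 1.0, the original always scores exactly the sum of the weights; any candidate scoring above that value is an improvement under this model, which is all a ranking-based search requires. The absolute score says nothing about how this program compares with an unrelated one.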

One model of software quality that incorporates suitable evaluation functions is Bansiya’s ‘Hierarchical Model for Object-Oriented Design Quality Assessment’ (Bansiya and Davis, 2002), or QMOOD, which defines evaluation functions for such quality attributes as flexibility, reusability and understandability, based on eleven object-oriented design metrics. We have examined these evaluation functions through experimentation described in this article and determined that their level of suitability for this approach varies considerably. In the process we have demonstrated a secondary function of the search-based software maintenance approach: by refactoring programs to comply with a given quality model we gain a valuable mechanism for validation of that model.

While our primary goal in this work was to demonstrate that object-oriented programs can be automatically refactored to conform more closely to a given quality model using a search-based approach, we also report here on several other significant contributions:

• An assessment of the suitability and effectiveness of several contemporary evaluation functions for the purpose of search-based software maintenance.
• A comparison of the performance of a set of search techniques in the context of automated refactoring of Java programs guided by contemporary evaluation functions.
• A subjective assessment of the performance of a prototype automated refactoring tool from a software engineer’s perspective.

This article expands on the CSMR 2006 paper ‘Search-Based Software Maintenance’ (O’Keeffe and Ó Cinnéide, 2006), in which we report similar success in automatically refactoring Java packages to increase maintainability. While the same metric suite is employed here, we have substantially expanded the refactoring capability of the prototype design improvement tool, with the addition of six refactorings described in Section 4.1. We also report here on larger case studies involving complete open-source programs, rather than individual packages, and have employed a wider variety of search techniques. Our research on this novel concept was first published in 2003 (O’Keeffe and Ó Cinnéide, 2003).

The remainder of this article is structured as follows: in Section 2 we survey related work in the fields of search-based software engineering, automated and semi-automated design improvement, and design quality measurement. In Section 3 we discuss the limitations of our approach. In Section 4 we describe our experimental methodology, with particular emphasis on the prototype design-improvement tool CODe-Imp. In Section 5 we present the results of search-based refactoring of two Java programs, with regard to overall quality function gain and relative performance of search techniques. In Section 6 we present our observations on search-based maintenance of the same two Java programs, discussed in terms of individual metrics and direct code examination, and compare the differing effects of three evaluation functions. We describe directions for future work in Section 7 and conclude in Section 8.

2. Related work

Work related to this project can be divided broadly into three areas: Search-Based Software Engineering, Automated Design Improvement and Design Quality Measurement. These topics are discussed below.

2.1. Search-based software engineering

Search-Based Software Engineering (SBSE) can be defined as the application of search-based approaches in solving optimisation problems in software engineering (Harman and Clark, 2004). Such problems include module clustering, where a software system is reorganised into loosely coupled clusters of highly cohesive modules to aid reengineering (Doval et al., 1999; Harman et al., 2002; Mancoridis et al., 1999; Mitchell and Raverso, 2001), test data generation (Michael et al., 2001), automated testing (Wegener et al., 2001) and project management problems such as requirements scheduling (Bagnall et al., 2001) and project cost estimation (Burgess and Lefley, 2001; Dolado, 2000, 2001). An overview of such work and comprehensive recent references can be found in Clark et al. (2003) and Harman and Clark (2004), respectively. Of particular relevance to this work is ‘Metrics Are Fitness Functions Too’ (Harman and Clark, 2004), in which Harman states that any product or process metric can be used as the evaluation function driving a search-based optimisation.

Once a software engineering task is framed as a search problem there are numerous approaches that can be applied to solving that problem, from local searches such as exhaustive search and hill-climbing to meta-heuristic searches such as genetic algorithms (GAs) and ant colony optimisation. Module clustering, for example, has been addressed using exhaustive search (Mancoridis et al., 1998), hill-climbing (Harman et al., 2002; Mahdavi et al., 2003; Mancoridis et al., 1998; Mitchell and Mancoridis, 2002), genetic algorithms (Doval et al., 1999; Harman et al., 2002; Mancoridis et al., 1998; Mitchell and Mancoridis, 2002) and simulated annealing (SA) (Mitchell and Mancoridis, 2002). In those studies that compared search techniques, hill-climbing was, perhaps surprisingly, found to produce better results than meta-heuristic GA searches (Harman et al., 2002; Mitchell, 2002). These results were echoed in search-based auto-parallelisation (Williams, 1998), where local searches similarly out-performed GA. In software clustering the meta-heuristic simulated annealing search was found by Mitchell and Mancoridis (2002) to perform similarly to hill-climbing in terms of solution quality, but better in terms of search efficiency.
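The difference between the local and meta-heuristic searches compared in these studies can be seen in a few lines. The following Python sketch runs over an abstract search space and is not code from CODe-Imp or any cited tool: `neighbours`, `random_neighbour` and `quality` are hypothetical callables standing in for "apply one refactoring" and "evaluate the design". The first-ascent and steepest-ascent functions correspond to the HC1 and HC2 searches of Section 4.2, and the annealing loop uses the acceptance rule p = e^(-dq/T) with a geometric cooling schedule (factor f, M tentative changes per temperature) as described there. Returning the best solution seen, the stopping temperature `t_min` and the toy integer problem are assumptions of this sketch.

```python
import math
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")  # a candidate solution, e.g. one alternative design


def first_ascent(start: S, neighbours: Callable[[S], List[S]],
                 quality: Callable[[S], float]) -> S:
    """Move to the first improving neighbour found; stop at a local optimum."""
    current, q = start, quality(start)
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(current):
            qc = quality(candidate)
            if qc > q:
                current, q, improved = candidate, qc, True
                break
    return current


def steepest_ascent(start: S, neighbours: Callable[[S], List[S]],
                    quality: Callable[[S], float]) -> S:
    """Examine every neighbour and move to the best one while it improves quality."""
    current, q = start, quality(start)
    while True:
        best = max(neighbours(current), key=quality, default=None)
        if best is None or quality(best) <= q:
            return current
        current, q = best, quality(best)


def simulated_annealing(start: S, random_neighbour: Callable[[S, random.Random], S],
                        quality: Callable[[S], float], t_start: float, f: float,
                        m: int, t_min: float, rng: random.Random) -> S:
    """Accept a quality drop dq > 0 with probability exp(-dq / T); assumes 0 < f < 1."""
    current, q = start, quality(start)
    best, best_q = current, q
    t = t_start
    while t > t_min:
        for _ in range(m):  # M tentative changes at each temperature
            candidate = random_neighbour(current, rng)
            qc = quality(candidate)
            dq = q - qc  # positive when the candidate is worse
            if dq <= 0 or rng.random() < math.exp(-dq / t):
                current, q = candidate, qc
                if q > best_q:
                    best, best_q = current, q
        t *= f  # geometric cooling: T is reduced by a constant factor
    return best


# Toy problem (hypothetical): integers stand in for designs; quality peaks at x = 7.
def quality(x: int) -> float:
    return -float((x - 7) ** 2)


def neighbours(x: int) -> List[int]:
    return [x - 1, x + 1]


def random_neighbour(x: int, rng: random.Random) -> int:
    return x + rng.choice((-1, 1))


hc1 = first_ascent(0, neighbours, quality)
hc2 = steepest_ascent(0, neighbours, quality)
sa = simulated_annealing(0, random_neighbour, quality, t_start=1.0, f=0.9975,
                         m=1, t_min=0.01, rng=random.Random(42))
```

Both hill-climbers stop as soon as no neighbour improves on the current solution, which is why they are cheap but can stall at a local optimum; the annealing loop trades extra evaluations for the occasional accepted downhill move.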

2.2. Automated design improvement

Previous approaches to the fully automated restructuring of software have focussed on improving one particular aspect of design, such as method reuse or code factorisation. Examples of such work include that of Casais (1992), who proposed algorithms to restructure class hierarchies in order to maximise abstraction, and Moore (1996), who proposed a system where existing classes are discarded and replaced with a new set with optimal method factorisation, meaning code duplication is minimised. However, since object-oriented design involves numerous trade-offs, this narrow focus can result in overall quality loss.

Our approach has two main advantages over previous fully automated restructuring work. Firstly, and most significantly, the use of evaluation functions consisting of combinations of multiple metric values allows us to employ much richer quality models than the single-goal approaches mentioned above, which do not take into account the numerous trade-offs involved in object-oriented design. Secondly, by careful choice and definition of the refactorings employed we can make design-quality-affecting changes to an object-oriented program without loss of domain-specific information such as class and member names, a particular disadvantage of Moore’s work (Moore, 1996).

While the term refactoring was popularised by Opdyke (1992) as a verb meaning ‘to improve the design of a program without altering its behaviour’, the word has subsequently come to be used as a noun meaning ‘a code change that can be made in order to improve design while preserving behaviour’. An example of a refactoring is Pull Up Method, meaning the repositioning of a method at a higher level in an inheritance hierarchy. Catalogues of refactorings such as Fowler’s ‘Refactoring: Improving the Design of Existing Code’ (Fowler, 1999) are available that have provided a useful standard for reference and communication.

The application of many of the refactorings prescribed by Fowler and others can be automated to some extent, given user interaction. Roberts’ Refactoring Browser (Roberts, 1999) was one of the first software tools to provide automated assistance for the application of refactorings; today most IDEs provide some form of automated refactoring support. While such tools reduce the effort involved in refactoring, they do not assist the programmer in the vital task of determining where it is advantageous to apply refactorings. Some semi-automated approaches to design improvement, however, attempt to do just that.

Semi-automated approaches to design improvement mainly involve the use of metric-based rules to identify areas in need of improvement, the onus then being on the programmer to determine precisely what changes should be made. Such ‘bad smell’ detection has been proposed by van Emden (2002), and by Tahvildari and Kontogiannis (2004), whose system also recommends ‘meta-pattern transformations’ that can be applied to ameliorate the defect. The proviso of such tools is, of course, that they reduce the need for programmer intervention rather than eliminate it.

2.3. Design quality measurement

In order to treat object-oriented design as a search problem, it is necessary to define a quality evaluation function that will serve to rank alternative designs. Furthermore, in order for an effective search to be carried out this quality function must be automatically computable from the design model at a minimal cost. We have conducted a survey of metric-based object-oriented quality models and selected three of the most prominent, which are described below and assessed as to their suitability for the task in hand. The principal criteria for assessment were: firstly, that the model comes as close as possible to providing complete evaluation functions, and secondly, that the constituent metrics are well-defined and well-established.

2.3.1. CK

The Metrics Suite for Object-Oriented Design (known as CK) of Chidamber and Kemerer (1994) is a seminal work in object-oriented quality measurement and is still frequently cited today. Metrics are defined for properties such as complexity, inheritance, coupling, cohesion and messaging. The CK metrics and subsequent modifications by Li (1998) have been independently validated as indicators of such characteristics as fault-proneness (Basili et al., 1996), but no attempt has been made to combine them in the form of an evaluation function. Several interpretations exist of some CK metrics, such as Lack of Cohesion of Methods (LCOM).

2.3.2. MOOD2

The MOOD (Metrics for Object-Oriented Design) metrics suite (Brito e Abreu, 1994) was introduced by Fernando Brito e Abreu et al., and was subsequently evaluated by the author (Brito e Abreu, 1996) and others (Harrison et al., 1998). Because some deficiencies were identified, namely the lack of measures of reuse, polymorphism and external coupling, the MOOD suite was superseded by the MOOD2 metrics suite in 1998 (Brito e Abreu, 1998). The MOOD2 metrics are also defined in an English-language paper (Brito e Abreu, 2001) through extended OCL and the GOODLY design language (Brito e Abreu et al., 1999).

The MOOD2 suite is a comprehensive, modern metrics suite including several measures each of coupling, reuse, polymorphism, data-hiding and inheritance. MOOD2 metrics are formally defined, and hence can be directly implemented without resolution of ambiguity. However, nowhere in the literature are evaluation functions defined that combine MOOD2 metric values to give an overall quality index. As a result MOOD2 does not provide a
complete quality model suitable for use in search-based refactoring.

2.3.3. QMOOD

The QMOOD (Quality Model for Object-Oriented Design) model of Bansiya and Davis (2002) consists of a hierarchy of four levels. The levels in descending order are: Design Quality Attributes such as ‘understandability’, Object-Oriented Design Properties such as ‘encapsulation’, Object-Oriented Design Metrics, and Object-Oriented Design Components such as ‘class’.

For the purpose of search-based refactoring, the QMOOD model has the advantage that it defines functions from metric values to Quality Attribute Indices (QAIs) for such design attributes as flexibility, reusability and understandability. This provides an excellent foundation for experimentation in automatically refactoring a design to conform to this quality model. However, while QMOOD provides a detailed model of object-oriented design quality, it is lacking in the area of effective metric definition. Metrics in the QMOOD literature (Bansiya and Davis, 2002) are defined in natural language, and are in some cases ambiguous. In order to implement the QMOOD metrics for replicable studies it is necessary to define them more precisely.

The QMOOD quality model was selected for this work as it includes pre-defined evaluation functions. Were another metric suite selected it would have been necessary to define evaluation functions on those metrics, thus reducing the independence of our work from the field of software product measurement. In order to ameliorate the problem of ambiguous metric definitions in QMOOD, we precisely define the QMOOD metrics, as implemented in CODe-Imp, later in this article. It should be noted that other metrics for individual design properties such as cohesion could be substituted for the corresponding QMOOD metrics without alteration to the approach as a whole.

3. Limitations of this approach

In common with all fully automated refactoring tools, CODe-Imp has the drawback that changes in the design must be communicated to the programmer. Several issues, such as the relocation of program comments and the need to introduce new identifiers such as class names, can complicate this task (Calliss, 1988). These issues could be addressed by a programmer review of the refactored code, but our assumption that this would be less expensive than completely manual refactoring is unproven.

As automatic refactoring with CODe-Imp improves program design with respect to a well-defined quality model rather than in an absolute sense, the effectiveness of the approach hinges on how accurately that quality model reflects the refactoring goals of the user. Quality models of sufficient detail to be employed in our case studies are extremely rare in the literature, largely because the definition of quality varies not only between software domains but also between refactoring tasks of differing motivation. Early in the software life-cycle, for example, the main motivation for refactoring may be to preserve flexibility so that further functionality can be easily added, while later in the cycle reusability may be paramount. For this reason, we ultimately see the automated refactoring approach described here being employed by software companies that have developed their own domain-specific quality models.

4. Experimental methodology

In order to test the thesis that object-oriented programs can be automatically refactored so that their design conforms more closely to a given quality model, we have constructed a prototype search-based design improvement tool called CODe-Imp.¹ CODe-Imp can be configured to operate using various subsets of its available automated refactorings, various search techniques, and various evaluation functions based on combinations of established metrics. In the remainder of this section we describe the configuration of CODe-Imp that yielded the results reported in this article.

¹ Combinatorial Optimisation Design-Improvement.

4.1. Refactorings

The refactoring configuration of CODe-Imp was constant throughout the case studies reported here, and consisted of the fourteen refactorings described below. Complementary pairs of refactorings were selected so that changes made to the input design during the course of the search could be reversed. This is necessary for some search techniques (e.g. simulated annealing) to move freely through the space of alternative designs.

Push Down Field moves a field from some class to those subclasses that require it. This refactoring is intended to simplify the design by reducing the number of classes that have access to the field.

Pull Up Field moves a field from some class(es) to the immediate superclass. This refactoring is intended to eliminate duplicate field declarations in sibling classes.

Push Down Method moves a method from some class to those subclasses that require it. This refactoring is intended to simplify the design by reducing the size of class interfaces.

Pull Up Method moves a method from some class(es) to the immediate superclass. This refactoring is intended to help eliminate duplicate methods among sibling classes, and hence reduce code duplication in general.

Extract Hierarchy adds a new subclass to a non-leaf class C in an inheritance hierarchy. A subset of the subclasses of C will inherit from the new class. This refactoring is intended to help improve class cohesion and
modularity by increasing abstraction in the class hierarchy.

Collapse Hierarchy removes a non-leaf class from an inheritance hierarchy. This refactoring is intended to reduce design complexity by removing superfluous classes from the design.

Increase Field Security increases the security of a field from public to protected or from protected to private. This refactoring increases data encapsulation.

Decrease Field Security decreases the security of a field from private to protected or from protected to public. This refactoring reduces data encapsulation.

Replace Inheritance with Delegation replaces an inheritance relationship between two classes with a delegation relationship; the former subclass will have a field of the type of the former superclass. This refactoring is used to rectify a situation where a subclass does not use enough of a superclass’s features to justify the specialisation relationship (Fowler, 1999).

Replace Delegation with Inheritance replaces a delegation relationship between two classes with an inheritance relationship; the delegating class becomes a subclass of the former delegate class. This refactoring can be used in a situation where a delegating class is using enough features of a delegate class that a specialisation relationship would be more appropriate (Fowler, 1999).

Increase Method Security increases the security of a method from protected to private or from public to protected. This refactoring can reduce the size of the public interface of a class.

Decrease Method Security decreases the security of a method from protected to public or from private to protected. This refactoring can increase the size of the public interface of a class.

Make Superclass Abstract declares a constructorless class explicitly abstract. This increases some measures of abstraction, and facilitates other refactorings.

Make Superclass Concrete removes the explicit ‘abstract’ declaration of an abstract class without abstract methods. This decreases some measures of abstraction.

We have deliberately chosen refactorings that operate at the method/field level of granularity and higher because our focus is on the automatic improvement of the design encapsulated in a program rather than implementation issues such as correct factorisation of methods.

During the search process alternative designs are repeatedly generated by the application of a refactoring to the existing design, evaluated for quality, and either accepted as the new current design or rejected. As the current design changes, the number of points at which each refactoring can be applied will also change. One of the functions of CODe-Imp’s Java Program Model (JPM), an abstraction of the AST automatically enriched with program facts at runtime, is to determine where refactorings can legally be applied; in other words, where the corresponding code alterations can be made without altering program behaviour. In order to achieve this we have employed a system of conservative precondition checking similar to that developed by Roberts for the Smalltalk ‘Refactoring Browser’ (Roberts, 1999) and subsequently extended by Ó Cinnéide (2000), but using static rather than dynamic analysis. Further details are omitted here due to space constraints.

4.2. Search techniques

In order to provide an insight into which search techniques are most effective in a search-based software maintenance context, we have replicated the case studies reported here across four search techniques: three local and one meta-heuristic. The search techniques selected were the following:

First-ascent hill-climbing (HC1): A local search algorithm where the search examines neighbouring solutions until a higher quality solution is discovered. This neighbour then becomes the current solution. Local search algorithms were selected as they have been shown to produce good results in other SBSE applications, as discussed in Section 2.1.

Steepest-ascent hill-climbing (HC2): A second local search algorithm, where the search examines all neighbouring solutions and moves to the solution of highest quality.

Multiple-restart hill-climbing (HCM): A variation of first-ascent hill-climbing where a number of ‘restarts’ are made when the search reaches an apparent optimum. In the experiments described here three restarts of a depth of five random refactorings were made in each case.

Low-temperature simulated annealing (SA): A meta-heuristic search technique described below. Simulated annealing was selected as it has previously been found to perform well in the context of software clustering (Mitchell and Mancoridis, 2002).

A simulated annealing (Kirkpatrick et al., 1983) search essentially involves making a series of tentative changes to some solution of a combinatorial optimisation problem. Changes which increase the quality of the solution are accepted, and the changed solution becomes the starting point for the next series of tentative changes. In addition, some changes which reduce the quality of the solution are accepted in order to allow the search to escape from local optima. Such (negative) changes are accepted with a probability that decreases steadily during the annealing process, as given by Eq. (1), where p is the probability of accepting a given solution, dq is the magnitude of quality reduction relative to the current solution, and T is the temperature value:

$$p = e^{-dq/T} \qquad (1)$$

In common with other search techniques, simulated annealing requires an evaluation function and a problem
representation with a means of altering solutions. In addition, a cooling schedule is required that determines how quickly the annealing runs, and hence how likely the solution is to be of high quality. CODe-Imp employs the standard geometric cooling schedule, meaning the temperature is reduced by a constant factor after each step in the annealing process.

The parameters of a geometric cooling schedule are: T_start, the starting value for the temperature variable; Markov chain length (M), the number of tentative changes that will be made at each temperature; and f, the geometric cooling factor. Theoretically, a simulated annealing search yields optimum results when M tends towards infinity and f towards one; in practice the cooling schedule should be as slow as possible within the time available.

A number of different cooling schedules were tested in order to establish a useful candidate for the experiments described later in this article. Mean quality increase over three runs for Markov chain lengths 1 and 2 and various cooling factors in the range 0.990–0.999 was recorded for input B under the QMOOD Understandability evaluation function, which we describe in Section 4.3. A cooling factor of f = 0.9975 was found to be most effective, while Markov chain lengths of 1 and 2 were equally effective for that value. A cooling schedule of f = 0.9975 and M = 1 was therefore used in the experiments described in the remainder of this article.

It should also be noted that a low-temperature simulated annealing was employed, meaning that the value of T_start was adjusted to give large quality drops a lower than normal chance of being accepted. This is because standard simulated annealing is usually employed where the starting solution is of extremely low quality, such as a timetable with a large number of clashes in the context of a timetabling problem. In contrast, object-oriented designs which are of low quality from a programmer’s perspective are nonetheless of relatively high quality when we consider the potentially infinite size of the search space of all functionally equivalent designs. In the experiments reported in this article that used simulated annealing, initial acceptance probabilities of approximately 0.2 were observed for large quality drops, whereas a standard annealing schedule would result in initial probabilities of approximately 0.8.

4.3. Evaluation functions

The evaluation functions employed in the CODe-Imp prototype described here are the Flexibility, Reusability and Understandability functions defined as part of the QMOOD hierarchical design quality model (Bansiya and Davis, 2002). Each evaluation function in the model is based on a weighted sum of quotients on the 11 metrics

corresponding value for the original design D to give the metric change quotient. Metric weights for each evaluation function are shown in Table 2; a positive weight corresponds to a metric that should be increased in order to enhance the design property in question, while a negative weight corresponds to a metric that should be decreased.

Of course, the concept of design quality is quite ephemeral and even subconcepts such as understandability cannot easily be defined. We consider the QMOOD evaluation functions examples of how some desirable design property can be precisely expressed, rather than definitive metrics for the design properties they are named for. We therefore attempt to optimise these values not in order to guarantee an improvement in terms of the subjective concepts of flexibility, reusability and understandability, but rather to demonstrate that in the general case a well-defined design property can be optimised using our approach. However, in Section 6 we do subjectively assess whether the QMOOD evaluation functions lead to improvements in the corresponding design properties, in the context of our approach.

4.4. Input and hardware

Input consisted of one program from the Spec Benchmarks² standard performance evaluation framework and one program taken from SourceForge³ via java-source.net. These programs were selected because a large number of refactorings could be applied to them. Input A (SpecCheck) consisted of 41 classes to which 351 distinct refactorings could initially be applied, while input B (Beaver) consisted of 30 classes to which 190 distinct refactorings could initially be applied.

Experiments were carried out on a 2.2 GHz AMD Athlon powered PC with 1 GB RAM. Mean processing time per solution examined was approximately one second, including model building, metric extraction, quality assessment, discovery of legal refactorings, and actual (Abstract Syntax Tree) refactoring. For local searches, total run times varied between approximately one hour and almost two hours depending on input size and search algorithm, with simulated annealing notably requiring a disproportionately long run time of up to ten hours due to processing overheads required by the algorithm. However, CODe-Imp was designed with robustness rather than speed as a priority and makes no use of concurrent processes, so there is potential to greatly decrease these run-times.

In the following two sections we present case studies of two facets of the search-based refactoring of Java programs. In Section 5 we present the results of refactoring of two programs with regard to quality gain and relative performance of search techniques, while in Section 6 we
described in Table 1. QMOOD evaluation functions deter- present our observations on search-based maintenance of
mine the relative quality attributes of two designs, pre-
sumed to be similar in purpose. For this reason, each 2
http://www.spec.org/.
metric value for the refactored design D 0 is divided by the 3
http://sourceforge.net/.
Author's personal copy

508 M. O’Keeffe, M. Ó Cinnéide / The Journal of Systems and Software 81 (2008) 502–516

Table 1
QMOOD metrics (Bansiya and Davis, 2002)

Metric | Acronym | Description | Design property
Design size in classes | DSC | A count of the total number of classes in the design. Interpreted as excluding imported library classes | Design Size
Number of hierarchies | NOH | A count of the number of class hierarchies in the design. Interpreted as excluding hierarchies that consist of a specialised class within the design and a generalised class outside | Hierarchies
Average number of ancestors | ANA | The average number of classes from which each class inherits information | Abstraction
Number of polymorphic methods | NOP | A count of the number of the methods that can exhibit polymorphic behaviour. Interpreted as the average across all classes, where a method can exhibit polymorphic behaviour if it is overridden by one or more descendent classes | Polymorphism
Class interface size | CIS | A count of the number of public methods in a class. Interpreted as the average across all classes in a design | Messaging
Number of methods | NOM | A count of all the methods defined in a class. Interpreted as the average across all classes in a design | Complexity
Data access metric | DAM | The ratio of the number of private (protected) attributes to the total number of attributes declared in the class. Interpreted as the average across all design classes with at least one attribute, of the ratio of non-public to total attributes in a class | Encapsulation
Direct class coupling | DCC | A count of the different number of classes that a class is directly related to. The metric includes classes that are directly related by attribute declarations and message passing (parameters) in methods. Interpreted as an average over all classes when applied to a design as a whole; a count of the number of distinct user-defined classes a class is coupled to by method parameter or attribute type. We exclude standard Java library classes from the computation | Coupling
Cohesion among methods of class | CAM | The relatedness among methods of a class, computed using the summation of the intersection of parameters of a method with the maximum independent set of all parameter types in the class. We have excluded constructors and implicit ‘this’ parameters from the computation | Cohesion
Measure of aggregation | MOA | A count of the number of data declarations whose types are user-defined classes. Interpreted as the average value across all design classes. We define ‘user defined classes’ as non-primitive types that are not included in the Java standard libraries | Composition
Measure of functional abstraction | MFA | The ratio of the number of methods inherited by a class to the number of methods accessible by member methods of the class. Interpreted as the average across all classes in a design of the ratio of the number of methods inherited by a class to the total number of methods available to that class, i.e. inherited and defined methods | Inheritance

the same two Java programs, discussed in terms of individual metrics and direct code examination, and compare the differing effects of the three evaluation functions.

5. Case studies I: Comparison of search techniques

In this section we present the quality changes observed using each of the four search techniques described in Section 4.2. Two input programs and the QMOOD Flexibility, Reusability and Understandability evaluation functions described in Section 4.3 were examined, giving a total of six distinct cases for comparison. These results indicate the level of success achieved in refactoring the input programs to improve design as measured by the evaluation functions, and allow us to compare the performance of the four search techniques employed.
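As a rough illustration of the simulated annealing variant described in Section 4.2, the sketch below applies a geometric cooling schedule (the temperature is multiplied by the cooling factor f after every Markov chain of M candidate moves) and derives T_start from a target initial acceptance probability for a large quality drop Δ: since p = exp(-Δ/T_start), T_start = -Δ / ln p. This is a minimal sketch under stated assumptions, not CODe-Imp's implementation: the function names, the stopping temperature t_min, the uniform choice of neighbour and the toy integer problem in the example are hypothetical stand-ins for designs, legal refactorings and a QMOOD evaluation function.

```python
import math
import random
from typing import Callable, List, Optional, Tuple, TypeVar

S = TypeVar("S")  # a candidate solution (hypothetically, a refactored design)


def start_temperature(large_drop: float, initial_accept_prob: float) -> float:
    """Choose T_start so that a quality drop of size `large_drop` is initially
    accepted with probability `initial_accept_prob`:
    p = exp(-drop / T)  =>  T = -drop / ln(p)."""
    return -large_drop / math.log(initial_accept_prob)


def simulated_annealing(
    initial: S,
    neighbours: Callable[[S], List[S]],
    quality: Callable[[S], float],
    t_start: float,
    f: float = 0.9975,    # geometric cooling factor used in this article
    m: int = 1,           # Markov chain length: moves tried per temperature
    t_min: float = 1e-4,  # hypothetical stopping temperature
    rng: Optional[random.Random] = None,
) -> Tuple[S, float, int]:
    """Maximise `quality`; returns (best solution, its quality, evaluations)."""
    rng = rng or random.Random(0)
    current, q_current = initial, quality(initial)
    best, q_best = current, q_current
    evaluations, t = 1, t_start
    while t > t_min:
        for _ in range(m):
            options = neighbours(current)
            if not options:  # no legal move is available
                return best, q_best, evaluations
            candidate = rng.choice(options)
            q_candidate = quality(candidate)
            evaluations += 1
            delta = q_candidate - q_current
            # Always accept improvements; accept a drop with probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, q_current = candidate, q_candidate
                if q_current > q_best:
                    best, q_best = current, q_current
        t *= f  # cool geometrically after each chain of length m
    return best, q_best, evaluations
```

For example, start_temperature(1.0, 0.2) gives T_start ≈ 0.62, so a unit quality drop is initially accepted with probability 0.2, matching the low-temperature setting described in Section 4.2; aiming for the standard 0.8 would instead require T_start ≈ 4.48.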

Table 2
Metric weights of QMOOD evaluation functions (Bansiya and Davis, 2002)
Function           DSC    NOH    ANA    DAM    DCC    CAM    MOA    MFA    NOP    CIS    NOM
Flexibility        0      0      0      0.25   -0.25  0      0.5    0      0.5    0      0
Reusability        0.5    0      0      0      -0.25  0.25   0      0      0      0.5    0
Understandability  -0.33  0      -0.33  0.33   -0.33  0.33   0      0      -0.33  0      -0.33
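To make the weighted-sum structure concrete, the sketch below scores a refactored design against the original with one QMOOD function. It assumes, as the phrase "metric quotient" and the normalisation sentence in Section 4.3 suggest, that each metric of the refactored design D′ is divided by the corresponding metric of the original design D before weighting. The weights are those of Table 2, with a minus sign on the metrics the text describes as negatively weighted (DCC in all three functions, and DSC, ANA, NOP and NOM in Understandability), following Bansiya and Davis (2002). The dictionary layout, the function names and the rule for zero-valued metrics are illustrative assumptions.

```python
from typing import Dict

# Signed weights from Table 2 (zero weights omitted). Negative signs mark the
# metrics the text calls "negatively weighted", as in Bansiya and Davis (2002).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "Flexibility": {"DAM": 0.25, "DCC": -0.25, "MOA": 0.5, "NOP": 0.5},
    "Reusability": {"DSC": 0.5, "DCC": -0.25, "CAM": 0.25, "CIS": 0.5},
    "Understandability": {
        "DSC": -0.33, "ANA": -0.33, "DAM": 0.33, "DCC": -0.33,
        "CAM": 0.33, "NOP": -0.33, "NOM": -0.33,
    },
}


def qmood_score(function: str, original: Dict[str, float],
                refactored: Dict[str, float]) -> float:
    """Weighted sum of metric quotients D'/D (assumed normalisation)."""
    total = 0.0
    for metric, weight in WEIGHTS[function].items():
        d, d_prime = original[metric], refactored[metric]
        if d == 0:
            if d_prime != 0:
                raise ValueError(f"{metric} is zero in the original design")
            quotient = 1.0  # assumption: 0 -> 0 counts as "unchanged"
        else:
            quotient = d_prime / d
        total += weight * quotient
    return total


def quality_gain(function: str, original: Dict[str, float],
                 refactored: Dict[str, float]) -> float:
    """Change relative to the unrefactored design, whose quotients are all 1."""
    return (qmood_score(function, original, refactored)
            - qmood_score(function, original, original))
```

With the input A values reported in Section 6.1.2 (DAM 0.82 to 0.90, MOA 0.61 to 0.63, DCC 0.83 to 0.85) and NOP held at an arbitrary placeholder value that does not change, quality_gain("Flexibility", ...) comes to about 0.035: the encapsulation and aggregation gains outweigh the penalty for the extra coupling.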

The results described in this section are mean values of at least three replications of each run, the only variation being in random decisions required by the search algorithms. Figures show standard deviation ‘error’ bars; where these are absent no deviation from the mean was observed. Statistical significance was established in all cases by performing Student's t-test for unpaired data assuming unequal variance, in order to establish that perceived differences were not due to chance alone. A 95% confidence level was used.

5.1. Input A – Spec-Check

Fig. 1 shows the mean overall quality changes observed for each search technique and evaluation function for input A. An increase in evaluation function value was observed for all four search techniques for each of the three evaluation functions Flexibility, Reusability and Understandability. The magnitude of quality function change was greatest in the case of the Understandability function and smallest in the case of the Flexibility function, but as QMOOD evaluation functions are based on a weighted sum of metric quotients this does not necessarily mean that more extensive changes were made to the design in any particular case.

The four search techniques yielded identical results in terms of mean solution quality increase across the Flexibility and Understandability evaluation functions. For the Reusability function HC2 produced significantly greater quality increases than HC1, HCM or SA. No statistically significant difference in quality increase was observed between the other three search techniques.

Fig. 2 shows the mean number of solutions examined in each of the cases graphed in Fig. 1. For all three evaluation functions, HC1 and SA examined the fewest solutions, the difference between the two being statistically significant only for the Reusability function, where SA examined the fewest. For all three functions HC2 examined the greatest number of solutions, with HCM examining significantly fewer than HC2 but significantly more than HC1 or SA. So, while the four search techniques performed similarly in terms of mean quality gain for this input, the number of solutions examined in the search process varied considerably, with HC2 examining approximately twice as many as SA in each case.

For this input all four search techniques produced quality improvements, but HC1 and SA were the most efficient in terms of number of solutions examined. Since SA incurred high processing overheads, as mentioned in Section 4.4, HC1 must be considered the most efficient search technique for this input. As neither HCM nor SA, both of which can escape local optima, discovered solutions of higher quality, it is likely that the search space was smooth for these three combinations of input and evaluation function.

5.2. Input B – Beaver

Fig. 3 shows the mean overall quality changes observed for each search technique and evaluation function for input B. An increase in evaluation function value was observed for all four search techniques for each of the three evaluation functions Flexibility, Reusability and Understandability.

Fig. 1. Mean quality change – input A.
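Student's t-test for unpaired data assuming unequal variance is usually called Welch's t-test. The sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom, then compares |t| with a two-sided 95% critical value. Because only a handful of replications per run are involved, a small lookup table of critical values stands in for a t-distribution quantile function; in practice a statistics library would normally be used, for example SciPy's scipy.stats.ttest_ind(a, b, equal_var=False). The function names, the table cut-off at 10 degrees of freedom and the treatment of zero-variance samples are assumptions of this sketch.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

# Two-sided 95% critical values of Student's t distribution for df = 1..10.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.
    Each sample needs at least two values."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def significantly_different(a: Sequence[float], b: Sequence[float]) -> bool:
    """Two-sided test at the 95% confidence level (small samples only)."""
    if variance(a) == 0 and variance(b) == 0:
        # No spread in either sample: treat any gap in means as significant
        # (an assumption; the t statistic is undefined here).
        return mean(a) != mean(b)
    t, df = welch_t(a, b)
    # Round df down, which is conservative; beyond the table, reuse the
    # df = 10 value, which overstates the critical value (also conservative).
    key = min(max(int(math.floor(df + 1e-9)), 1), 10)
    return abs(t) > T_CRIT_95[key]
```

With three runs per technique and hypothetical quality gains of [1, 2, 3] against [4, 5, 6], the test gives t ≈ -3.67 with 4 degrees of freedom, which exceeds the critical value 2.776, so the difference would be reported as significant.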



Fig. 2. Mean solutions examined – input A.

The four search techniques yielded varying results in terms of mean solution quality increase across the three evaluation functions. For the Flexibility function, SA produced the greatest quality increase by a clear margin, while no statistically significant difference was observed between the other three search techniques. For the Reusability function, HC1 and HCM produced the greatest quality increases, with no statistically significant difference between them, while HC2 and SA produced lesser quality increases, again with no statistically significant difference between them. For the Understandability function, HC2 produced the greatest mean quality increase. Although HCM produced the highest quality individual solution observed, the large variation in solution quality in this case caused its mean quality increase to be significantly lower than that of HC2. HC1 and SA also performed significantly worse than HC2 for this evaluation function.

Fig. 4 shows the mean number of solutions examined in each of the cases graphed in Fig. 3. For all three evaluation functions HC1 examined the fewest solutions, although SA did not examine significantly more in the case of the Flexibility function. For the Flexibility and Understandability functions HC2 examined the greatest number of solutions, while for the Reusability function SA examined more by a small but statistically significant amount. In all three cases HCM examined the second-highest number of solutions, although in the case of the Reusability function it did not examine significantly more than HC2.

For this input all four search techniques produced quality improvements, but HC1 was the most efficient in terms of number of solutions examined. In the case of the Flexibility function SA produced the greatest quality increase while examining the second-fewest solutions on average, but still required the greatest run-time due to the processing overheads mentioned in Section 4.4. In the case of the Understandability function HC2 produced the greatest quality increase, but at the cost of examining approximately twice as many solutions as each of the other search techniques.

In summary, the search techniques employed all demonstrated strengths in this experiment: first-ascent hill climbing consistently produced quality improvements at a

Fig. 3. Mean quality change – input B.



Fig. 4. Mean solutions examined – input B.

relatively low cost, steepest-ascent hill climbing produced the greatest mean quality improvements in two of the six cases, multiple-restart hill climbing produced individual solutions of highest quality in two cases, and simulated annealing produced the greatest mean quality improvement in one case.

6. Case studies II: Comparison of evaluation functions

In this section we present the observed changes in the metric values that comprise each of the three QMOOD evaluation functions Flexibility, Reusability and Understandability, for two Java programs after search-based refactoring. These results demonstrate the differing effects of the various evaluation functions on the output design and, along with an examination of the output code, allow us to discuss the effectiveness of the evaluation functions in actually increasing design quality. For clarity, we present the results of the best single run for each input in each of the following sections. As can be seen from Figs. 1 and 3, little variation was observed in mean quality increase in most cases.

For each evaluation function we will present the observed changes in metric values and describe how they contributed to the increase in quality function value. We also report the refactorings that were actually applied to the design in order to achieve these changes, and discuss the impact on the design from a programmer's perspective, having examined the refactored source code. The relevant subsections will be titled ‘metric quotient changes’, ‘concrete design changes’ and ‘analysis of refactored designs’, respectively.

6.1. Flexibility

The Flexibility quality attribute in QMOOD is defined as the readiness of a design for adaptation to provide functionally related capabilities (Bansiya and Davis, 2002). Metric changes resulting from use of the Flexibility function in CODe-Imp are shown in Fig. 5. Input A values

Fig. 5. Metric quotient changes, Flexibility function.



are from one solution obtained using HC1; input B values are from one solution obtained using SA. It should be noted that these are metric quotient changes; differences from the identity value of 1 are graphed, so a graphed value of 1 would equate to a doubling of the metric value from input to output. The actual metric weights comprising the Flexibility function are shown in Table 2 and are repeated in Fig. 5. Details of the individual metrics can be found in Table 1.

6.1.1. Metric quotient changes

In the case of input A, use of the QMOOD Flexibility function resulted in increases in the positively weighted DAM (0.25) and MOA (0.5) metrics, as well as in the negatively weighted DCC (0.25) metric. A decrease was also observed in the unweighted ANA metric.

In the case of input B, use of the Flexibility function resulted in increases in the positively weighted DAM (0.25) metric and the unweighted MFA metric, and decreases in the negatively weighted DCC (0.25) metric and the unweighted CAM metric.

6.1.2. Concrete design changes

In more concrete terms, for input A the DAM metric value increased from 0.82 to 0.90, so the output solution consisted of classes with a higher average ratio of non-public to public attributes. MOA for the program increased from 0.61 to 0.63, while DCC increased from 0.83 to 0.85. ANA decreased from 0.17 to 0.12. Examination of the refactored code revealed that nine applications of the Increase Field Security refactoring had increased encapsulation in eight different classes, while one Pull Up Field and one Replace Inheritance With Delegation refactoring had increased aggregation for one class at the cost of one additional coupling link.

In the case of input B the DAM metric value increased from 0.58 to 0.81, so again the output solution consisted of classes with a considerably higher average ratio of non-public to public attributes. In addition, DCC fell from 2.70 to 2.63 and CAM fell from 0.67 to 0.62, meaning that average coupling and average cohesion decreased slightly, while MFA increased from 0.03 to 0.10. Examination of the source code revealed that eleven applications (net) of the Pull Up Method refactoring and seven applications (net) of the Pull Up Field refactoring had reduced coupling in six different classes but increased coupling in three, for a net decrease of three classes coupled to one other. One class displayed slightly reduced cohesion. In addition, application of the Increase Field Security refactoring had improved encapsulation in seven classes.

6.1.3. Analysis of refactored designs

The refactored design in the case of input A exhibited improved data-hiding, with seven classes having a greater proportion of non-public attributes compared to the input design. In addition, use of the inheritance mechanism had decreased by one instance, which had been replaced by aggregation. Coupling had also increased slightly. Examination of the refactored code revealed that, where the inheritance relationship had been replaced by delegation, the former subclass made no use of any of the former superclass's features, so the change was justified. The refactored design in this case was superior in terms of general object-oriented design principles such as the maximisation of encapsulation and the use of inheritance only where it is suitable, so there was some evidence that general maintainability had increased. There was no conclusive evidence that the refactored design would be more flexible in particular.

In the case of input B, the refactored design exhibited improved data-hiding, with seven classes having a greater proportion of non-public attributes compared to the input design. In addition, coupling had been reduced for a net total of three classes, at the cost of slightly reduced cohesion for one class. While high cohesion is valued in object-oriented design, low coupling is perhaps a greater priority when flexibility of design is paramount. Therefore, the refactored design in this case was not only better in terms of general object-oriented principles, but could also be regarded as more flexible than the input design.

Although the NOP metric is positively weighted in the QMOOD Flexibility function, no increase in NOP was observed for either input, nor was any increase observed for the positively weighted MOA metric in the case of input B. There are two possible explanations for this: firstly, there may not have been any legal refactorings that would increase these values or, secondly, any refactoring that increased these values may also have caused an undesired change of greater magnitude in other weighted metrics and hence been rejected.

6.2. Reusability

Metric changes resulting from use of the Reusability function in CODe-Imp are shown in Fig. 6. Input A values are from one solution obtained using HC2; input B values are from one solution obtained using HC1. Again, these are metric quotient changes; differences from the identity value of 1 are graphed. The actual metric weights comprising the Reusability function are shown in Table 2 and repeated in Fig. 6. It should be noted that it was necessary to impose a limit on the number of classes in refactored designs considered in the search space; this is discussed further below.

6.2.1. Metric quotient changes

In the case of input A, use of the QMOOD Reusability function resulted in increases in the positively weighted metrics DSC (0.5) and CAM (0.25) and the unweighted metrics ANA, DAM and MFA. Decreases were observed for the positively weighted metric CIS (0.5), the negatively weighted metric DCC (0.25) and the unweighted metrics MOA and NOP.
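The two numeric conventions used in this section can be sketched briefly: the graphed "metric quotient change" is D′/D minus the identity value 1, and candidate designs are rejected once they exceed 1.2 times the number of input classes. This is a hypothetical helper pair, not code from CODe-Imp; skipping metrics that are zero in the input design, and using an exact fraction for the 1.2 ratio to avoid floating-point edge cases at the limit, are assumptions of the sketch.

```python
from fractions import Fraction
from typing import Dict


def quotient_changes(original: Dict[str, float],
                     refactored: Dict[str, float]) -> Dict[str, float]:
    """Metric quotient D'/D minus the identity value 1, per metric:
    0.0 means unchanged, 1.0 means the metric doubled, -0.5 means it halved.
    Metrics that are zero in the input design are skipped (quotient undefined)."""
    return {m: refactored[m] / original[m] - 1.0
            for m in original if original[m] != 0}


def within_size_limit(n_classes: int, n_input_classes: int,
                      ratio: Fraction = Fraction(6, 5)) -> bool:
    """Accept a design only if it has at most `ratio` times the number of
    input classes (1.2 in the text), computed exactly with a Fraction."""
    return n_classes <= ratio * n_input_classes
```

For input A under the Flexibility function, DAM rising from 0.82 to 0.90 graphs as about 0.098 and ANA falling from 0.17 to 0.12 as about -0.29. With 41 and 30 input classes the size limits are 49 and 36 classes respectively, consistent with the eight and six new classes added to inputs A and B under the Reusability function.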

Fig. 6. Metric quotient changes, Reusability function.

In the case of input B, use of the QMOOD Reusability function resulted in increases in the positively weighted metrics DSC (0.5), CAM (0.25) and CIS (0.5), the negatively weighted metric DCC (0.25), and the unweighted metrics ANA, DAM, MFA and NOM. Decreases were observed in the unweighted metrics NOH, MOA and NOP.

In the cases of both input A and input B the most prominent changes are the increases in ANA and MFA (truncated on the graph), which corresponded to more than two-fold increases in these metric values for each input. However, as these metrics are unweighted in the QMOOD Reusability function, their values had no impact on the search process.

6.2.2. Concrete design changes

Examination of the output code for input A revealed that major changes had been made to the design, including the addition of eight new classes within one inheritance hierarchy by means of the Extract Hierarchy refactoring. The addition of subclasses affected not only the DSC and ANA metrics, but also all metrics that are taken as an average over the number of design classes. The observed changes in the DCC, NOP and CIS metrics were due solely to this effect. The observed increase in the CAM metric was due to three applications of the Pull Up Method refactoring, with one input class exhibiting greater cohesion after methods had been repositioned. Three of the eight new classes also exhibited high CAM values, but in each case this was due to the class defining only one method.

Examination of the output code for input B revealed that major changes had also been made to this design by three applications of the Replace Inheritance With Delegation refactoring and the addition of six new classes by means of the Extract Hierarchy refactoring. In contrast to the results for input A, none of the observed changes occurred solely as a side-effect of the increase in the size of the design. Significant reductions were observed for this input in the case of the DCC metric, a result which was not observed for input A: due to repositioning of methods in inheritance hierarchies, four input classes were found to be dependent on fewer other classes after refactoring, though one input class gained a dependency. Improvements were also observed for the CAM metric in five of the input classes, while decreases were observed in two. High CAM values were observed for six of the ten new classes, but in all instances this was a result of the class defining only one method.

6.2.3. Analysis of refactored designs

The striking changes made to the two input designs under the QMOOD Reusability function provide a good illustration of the capacity of CODe-Imp to discover designs that conform more closely to a given quality model. However, in subjectively assessing the level of design improvement in this case we find fault with the evaluation function. Firstly, as mentioned above, it was necessary to impose a limit on the number of classes in the solution design. The reason for this is the large positive weight on the Design Size in Classes metric, which makes it likely that any addition of a class to the design will be interpreted as an improvement in the evaluation function. A runaway search process that adds an unbounded number of empty classes to the design therefore becomes a possibility. In the cases of both input A and B, the ‘best’ solutions observed reached the imposed design size limit of 1.2 times the number of input classes, even though this meant including featureless classes in the output design. Secondly, putting aside the featureless classes problem, it is hard to see how a real increase in the reusability of a design is effected by reorganising an inheritance hierarchy to favour classes with only one method. While such classes can be said to be highly cohesive and loosely coupled, it is unlikely that they would represent meaningful domain abstractions. We conclude that although some genuine improvements to the input design were observed, the QMOOD Reusability function in its published form is not well-suited to the task of

search-based software refactoring with the set of refactorings we have employed.

6.3. Understandability

Metric quotient changes resulting from use of the Understandability function in CODe-Imp are shown in Fig. 7. Input A values are from one solution obtained using HC1; input B values are from one solution obtained using HCM. Again, these are metric quotient changes; differences from the identity value of 1 are graphed. The actual metric weights comprising the Understandability function are shown in Table 2 and repeated in Fig. 7.

6.3.1. Metric quotient changes

In the case of input A the Understandability function produced increases in the positively weighted metric DAM (0.33) and the negatively weighted metric DCC (0.33), and a decrease in the negatively weighted metric ANA (0.33). A very small increase in the positively weighted metric CAM (0.33) was also observed, which is not visible on the graph. In the case of input B the Understandability function produced increases in the positively weighted metrics DAM (0.33) and CAM (0.33), the negatively weighted metrics DCC (0.33) and NOM (0.33), and the unweighted metrics MOA and CIS. Decreases were observed for the negatively weighted metrics ANA (0.33) and NOP (0.33), and the unweighted metrics NOH and MFA.

6.3.2. Concrete design changes

The average DAM value for classes in the input A design rose from 0.82 to 0.90, so the proportion of non-public to public attributes was considerably higher in the output design, indicating a greater level of encapsulation achieved by nine applications of the Increase Field Security refactoring. The average ANA value dropped from 0.17 to 0.12 as a result of one application of the Replace Inheritance with Delegation refactoring, and a slight increase in the average CAM value resulted from an application of the Pull Up Method refactoring.

For input B the average DAM value rose from 0.58 to 0.78, so a considerable improvement in terms of encapsulation was achieved as a result of eighteen applications of the Increase Field Security refactoring. Examination of the output code revealed that the decreases in ANA (0.40 to 0.33) and NOP (0.007 to 0.004), and the increases in DCC (2.70 to 2.83) and NOM (175 to 189), resulted from two applications of the Replace Inheritance with Delegation refactoring. Two applications of Pull Up Field and one of Push Down Method resulted in the increase in CAM (0.66 to 0.67), and helped prevent a greater increase in DCC.

In each case there was no decrease in DSC, despite the negative weight on this metric. This is because we do not allow CODe-Imp to destroy input classes completely, even where it can be done legally. The reason for this is simple: destroying a class that a programmer has identified as an enduring domain abstraction is undesirable because it represents a loss of domain-specific information, and could result in refactored designs containing a large proportion of automatically named classes if classes were repeatedly removed and replaced.

6.3.3. Analysis of refactored designs

From a programmer's perspective, the changes to the two designs observed after running CODe-Imp under the Understandability evaluation function were positive. The increases in encapsulation and method cohesion seen in both cases, as well as the decrease in polymorphism seen in one case, mean an improvement in the understandability of the designs from our subjective viewpoint. In addition, for both inputs unjustified inheritance relationships were replaced with delegation; for input A a subclass using none of its superclass's features and for input B a subclass using

Fig. 7. Metric quotient changes, Understandability function.



only one of its superclass's ten methods were refactored.

This shows that a design can be improved in ways that are not directly measured; in this case the redefinition of the relationships between several classes was a product of reducing inheritance and polymorphism while maintaining cohesion and encapsulation. Furthermore, these improvements were made without incurring any loss of domain-specific information encapsulated in the design, such as class or member names. For these reasons, the results reported here suggest that the QMOOD Understandability function is a quality model suitable for search-based software refactoring.

7. Future work

In common with all fully automated refactoring tools, CODe-Imp has the drawback that changes in the design must be communicated to the programmer. Several issues, such as the relocation of program comments and the need to introduce new identifiers such as class names, can complicate this task (Calliss, 1988). These issues could be addressed by a programmer review of the refactored code, but our assumption that this would be less expensive than completely manual refactoring has not yet been proved.

To date, the refactoring capacity of CODe-Imp has mainly focussed on transforming the structure of inheritance hierarchies. In order for this technique to become part of software engineering practice it will be necessary to increase the power of the tool. However, only certain refactorings can be fully automated. Further research is required to determine whether the refactoring capacity of CODe-Imp can become diverse enough to satisfy the maintenance programmer.

While we have demonstrated that object-oriented programs can be automatically refactored to improve their design with respect to a given quality model, much work remains to be done in order to provide a quality model that is of use in the general case. Although the QMOOD Flexibility and Understandability evaluation functions appear sufficient to produce genuine improvements in the case studies described here, larger studies with independent assessment of the refactored designs are required in order to fully establish this approach. Further studies are also required to establish the level of resources needed to successfully apply this approach in an industrial setting.

In the future we envisage the search-based software maintenance approach forming a synergistic relationship with operational research in domain-specific quality models. Where a specific model is proposed it would be possible to validate and refine it using the search-based approach; a quality model of sufficient accuracy could then be used to drive the search-based design improvement process. It would also be possible to give the programmer more control over the automated refactoring process, for example by protecting certain methods, fields or classes from alteration. In this way, portions of the design known to be of

8. Conclusion

The results reported here support the thesis that object-oriented programs can be automatically refactored to improve quality as measured by well-defined quality models, and partially validate the search-based software maintenance approach. We have shown that evaluation function increases can be obtained in all cases examined using simple search techniques, and that variation in weights on evaluation function components has a significant effect on the overall refactoring process.

To elaborate on the contributions stated in Section 1: inspection of output code and analysis of solution metrics provided some evidence in favour of use of the QMOOD Flexibility function, and strong evidence in favour of use of the Understandability function. The QMOOD Reusability function in its present form was not found to be suited to the requirements of search-based software maintenance because it resulted in solutions including a large number of featureless classes.

The search techniques employed all demonstrated strengths in this experiment: first-ascent hill climbing consistently produced quality improvements at a relatively low cost, steepest-ascent hill climbing produced the greatest mean quality improvements in certain cases, multiple-restart hill climbing produced individual solutions of highest quality in certain cases, and simulated annealing produced the greatest mean quality increase by a clear margin for one input and evaluation function pair. We conclude that both local search and simulated annealing are effective in the context of search-based software refactoring, as did Mitchell and Mancoridis (2002) for the related problem of module clustering. Perhaps the most significant observation here was that quality improvements were obtained using simple search techniques with manageable run-times, such as first-ascent hill climbing, which bodes well for the scalability of the approach.

In the case of the Understandability function, genuine improvements were made to the design of both of the programs studied here. In addition, the fact that these improvements were made using local search techniques indicates that the search landscape is not as difficult as might be imagined. We conclude that a search-based software maintenance tool based on the QMOOD Understandability evaluation function has the potential to be of real use to the software engineer faced with a difficult reengineering task.

References

Bagnall, Anthony J., Rayward-Smith, Victor J., Whittley, I.M., 2001. The next release problem. Inform. Software Technol. 43 (14), 883–890.
Bansiya, Jagdish, Davis, Carl G., 2002. A hierarchical model for object-oriented design quality assessment. IEEE Trans. Software Eng. 28 (1), 4–17.
Basili, Victor R., Briand, Lionel C., Melo, Walcélio L., 1996. A validation of object-oriented design metrics as quality indicators. IEEE Trans.
high quality could be preserved. Software Eng. 22 (10), 751–761.
516 M. O’Keeffe, M. Ó Cinnéide / The Journal of Systems and Software 81 (2008) 502–516

Brito e Abreu, Fernando, Melo, Walcélio L., 1996. Evaluating the impact of object-oriented design on software quality. In: IEEE METRICS, pp. 90–99.
Brito e Abreu, Fernando, 1998a. The MOOD2 metrics set (in Portuguese). Report r7/98, April. Technical report, Grupo de Engenharia de Software, INESC.
Brito e Abreu, Fernando, 2001. Using OCL to formalize object oriented metrics definitions. Technical report, Grupo de Engenharia de Software, INESC.
Brito e Abreu, Fernando, Ochoa, Luis, Goulão, Miguel, 1994. Candidate metrics for object oriented software within a taxonomy framework. Journal of Systems and Software, Vol. 26. North-Holland, Elsevier Science, July.
Brito e Abreu, Fernando, Ochoa, Luis, Goulão, Miguel, 1999. The GOODLY design language for MOOD2 metrics collection. In: ECOOP Workshops, pp. 328–329.
Burgess, Colin J., Lefley, Martin, 2001. Can genetic programming improve software effort estimation? A comparative evaluation. Inform. Software Technol. 43 (14), 863–873.
Calliss, Frank W., 1988. Problems with automatic restructurers. SIGPLAN Notices 23 (3), 13–21.
Casais, Eduardo, 1992. An incremental class reorganization approach. In: Lehrmann Madsen, O. (Ed.), Proceedings of the European Conference on Object-Oriented Programming, LNCS, Utrecht, June, pp. 114–131.
Chidamber, S., Kemerer, C.F., 1994. A metrics suite for object oriented design. IEEE Trans. Software Eng. 20 (6), 476–493.
Ó Cinnéide, Mel, 2000. Automated Application of Design Patterns: a Refactoring Approach. Ph.D. dissertation, University of Dublin, Trinity College, Department of Computer Science. Available from: <http://www.cs.ucd.ie/staff/meloc/home/papers/thesis>.
Clark, John A., Dolado, José J., Harman, Mark, Hierons, Robert M., Jones, B., Lumkin, M., Mitchell, Brian S., Mancoridis, Spiros, Rees, K., Roper, Marc, Shepperd, Martin J., 2003. Formulating software engineering as a search problem. IEE Proc. – Software 150 (3), 161–175.
Dolado, José Javier, 2000. A validation of the component-based method for software size estimation. IEEE Trans. Software Eng. 26 (10), 1006–1021.
Dolado, José Javier, 2001. On the problem of the software cost function. Inform. Software Technol. 43 (1), 61–72.
Doval, D., Mancoridis, S., Mitchell, B.S., 1999. Automatic clustering of software systems using a genetic algorithm. In: International Conference on Software Tools and Engineering Practice (STEP'99).
van Emden, Eva, Moonen, Leon, 2002. Java quality assurance by detecting code smells. In: WCRE, p. 97.
Fowler, Martin, 1999. Refactoring: Improving the Design of Existing Code. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
Harman, Mark, Clark, John A., 2004. Metrics are fitness functions too. In: IEEE METRICS, pp. 58–69.
Harman, Mark, Hierons, Robert M., Proctor, Mark, 2002. A new representation and crossover operator for search-based optimization of software modularization. In: GECCO, pp. 1351–1358.
Harrison, R., Counsell, S., Nithi, R., 1998. An evaluation of the MOOD set of object-oriented software metrics. IEEE Trans. Software Eng. 24 (6), 491–496.
Kirkpatrick, Scott, Gelatt Jr., D., Vecchi, M.P., 1983. Optimization by simulated annealing. Science 220 (4598), 671–680.
Li, Wei, 1998. Another metric suite for object-oriented programming. J. Syst. Software 44 (2), 155–162.
Mahdavi, Kiarash, Harman, Mark, Hierons, Robert M., 2003. A multiple hill climbing approach to software module clustering. In: ICSM, pp. 315–324.
Mancoridis, Spiros, Mitchell, Brian S., Rorres, C., Chen, Yih-Farn, Gansner, Emden R., 1998. Using automatic clustering to produce high-level system organizations of source code. In: IWPC, p. 45.
Mancoridis, Spiros, Mitchell, Brian S., Chen, Yih-Farn, Gansner, Emden R., 1999. Bunch: A clustering tool for the recovery and maintenance of software system structures. In: ICSM, p. 50.
Michael, Christoph C., McGraw, Gary, Schatz, Michael, 2001. Generating software test data by evolution. IEEE Trans. Software Eng. 27 (12), 1085–1110.
Mitchell, Brian S., 2002. A Heuristic Search Approach to Solving the Software Clustering Problem. Ph.D. thesis, Drexel University, Philadelphia, USA.
Mitchell, Brian S., Mancoridis, Spiros, 2002. Using heuristic search techniques to extract design abstractions from source code. In: GECCO, pp. 1375–1382.
Mitchell, Brian S., Traverso, Martin, Mancoridis, Spiros, 2001. An architecture for distributing the computation of software clustering algorithms. In: WICSA, pp. 181–190.
Moore, Ivan, 1996. Automatic inheritance hierarchy restructuring and method refactoring. In: OOPSLA, pp. 235–250.
O'Keeffe, M., Ó Cinnéide, M., 2003. A stochastic approach to automated design improvement. In: Power, James F., Waldron, John T. (Eds.), Proceedings of the 2nd International Conference on the Principles and Practice of Programming in Java, ACM SIGAPP. Computer Science Press, Trinity College Dublin, Ireland, pp. 59–62.
O'Keeffe, M., Ó Cinnéide, M., 2006. Search-based software maintenance. In: Proceedings of the 10th European Conference on Software Maintenance and Reengineering (CSMR 2006), pp. 249–260.
Opdyke, William, 1992. Refactoring: a program restructuring aid in designing object-oriented application frameworks. Ph.D. thesis, Department of Computer Science, University of Illinois at Urbana-Champaign.
Roberts, Donald Bradley, 1999. Practical analysis for refactoring. Ph.D. thesis, Department of Computer Science, University of Illinois at Urbana-Champaign. Adviser: Ralph Johnson.
Tahvildari, Ladan, Kontogiannis, Kostas, 2004. Improving design quality using meta-pattern transformations: a metric-based approach. J. Software Mainten. 16 (4–5), 331–361.
Wegener, Joachim, Baresel, André, Sthamer, Harmen, 2001. Evolutionary test environment for automatic structural testing. Inform. Software Technol. 43 (14), 841–854.
Williams, K.P., 1998. Evolutionary algorithms for automatic parallelization. Ph.D. thesis, Department of Computer Science, University of Reading, UK, September.
