3344 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 49, NO. 6, JUNE 2023

Dependent or Not: Detecting and Understanding Collections of Refactorings

Thiago Ferreira, James Ivers, Jeffrey J. Yackley, Marouane Kessentini, Ipek Ozkaya, Senior Member, IEEE, and Khouloud Gaaloul

Manuscript received 23 December 2021; revised 7 December 2022; accepted 23 January 2023. Date of publication 17 February 2023; date of current version 14 June 2023. This work was supported by the Department of Defense under Grant FA8702-15-D-0002, with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. Recommended for acceptance by L. Tan. (Corresponding author: Marouane Kessentini.)
Thiago Ferreira and Jeffrey J. Yackley are with the College of Innovation & Technology, University of Michigan-Flint, Flint, MI 48502 USA (e-mail: [email protected]; [email protected]).
James Ivers and Ipek Ozkaya are with the Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: [email protected]; [email protected]).
Marouane Kessentini and Khouloud Gaaloul are with the Department of Computer Science and Engineering, Oakland University, Rochester, MI 48309 USA (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/TSE.2023.3244123

Abstract—Refactoring is a program transformation to improve the internal structure of a program while preserving its external behavior. Developers frequently apply multiple refactorings that depend on each other to achieve goals such as improving code reusability. Although manually applying a sequence of dependent refactorings is a common practice, existing refactoring recommendation tools treat refactorings in isolation without revealing the dependencies among them to developers. One reason is that these relationships among refactorings are poorly understood. Current approaches treat refactoring recommendations as a strictly ordered sequence, limiting developers’ ability to understand, validate, and apply recommended refactorings. To address this gap, this paper describes a theory for reasoning about collections of refactorings through defining an ordering dependency relation among refactorings and organizing collections of refactorings as a set of refactoring graphs. We propose an algorithm for identifying refactoring dependencies and illustrate these concepts with a tool for visualizing such refactoring dependencies and refactoring graphs. Our validation results demonstrate that 43% of the 1,457,873 recommended refactorings from 9,595 projects that we studied are part of dependent refactoring graphs. Furthermore, refactorings are not only commonly involved in dependent relations, but also, when applied, dependent refactoring graphs improve all of the quality attribute metrics in our experiments more than individual refactorings.

Index Terms—Dependency, refactoring, search based software engineering.

I. INTRODUCTION

Even for the most competent organizations, building and maintaining high performing software applications with high quality-code is a challenging and expensive endeavor [1]. Working in a fast-paced environment that demands frequent releases across several products and deployment environments, developers are often forced to compromise high quality standards in favor of meeting deadlines [2]. Lack of robust automated tools results in buggy and poor quality software that causes financial losses, high maintenance costs, increased fault-proneness, and delayed or canceled projects [3]. Software refactoring is widely recognized as an effective approach for maintaining high quality software by restructuring existing code without changing its external behavior [4].

Resolving code smells and broader code quality improvements across a project often requires applying multiple refactorings, which has led to the creation of tools that recommend collections of refactorings to developers [5]. Manually applying a sequence of refactorings is common practice [5], [6], [7]; however, these tools treat each refactoring in the sequence in isolation. For instance, Cinnéide et al. [8] investigate the impact of only individual refactorings on quality attribute metrics, such as using Move Method to reduce the coupling of a class, without studying the impact of a sequence of refactorings. Bibiano et al. [6] evaluate the relationships between refactoring types (e.g., Move Class, Extract Interface) and code smells. That study, however, is based on the assumption that refactorings are only related if applied to the same code location, e.g., a class. In fact, most refactoring types modify multiple code fragments, e.g., Move Method modifies two classes, the source and target of the move.

Fig. 1. Output example of a refactoring recommendation tool: JDeodorant.

Current tools generate extensive lists of refactorings as recommendations to developers. However, these lists present refactorings as a sequence, with an implied strict ordering among refactorings. In practice, many of these orderings are not significant while others retain significant meaning. Developers lack a theory to tell the difference, which (among other things) makes refactoring recommendations harder to understand than is necessary. Fig. 1 shows an output example of the refactoring
0098-5589 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Institute of Software. Downloaded on June 07,2024 at 06:51:20 UTC from IEEE Xplore. Restrictions apply.
FERREIRA et al.: DETECTING AND UNDERSTANDING COLLECTIONS OF REFACTORINGS 3345

recommendations of JDeodorant [9] where, similar to existing automated refactoring tools, the dependencies between the refactorings are not revealed, thus leaving the challenging task of interpreting the sequence of refactoring recommendations to the programmers.

Knowing the dependencies between refactorings is critical when developers select which refactorings to apply, as it helps them understand how the refactorings in a sequence depend on one another and what impact changing some of the recommended refactorings would have. The integration of refactoring dependencies in recommender tools is still lacking in existing research.

To close this gap, in this paper we describe a theory for reasoning about collections of refactorings through a definition of ordering dependencies among refactorings and an algorithm for identifying these dependencies. We aim to improve the accuracy of refactoring recommendation tools by detecting refactoring dependencies, which allows the developers to efficiently interact with such refactoring recommendation tools. We propose defining refactoring recommendations as sets of refactoring graphs rather than as refactoring sequences. We illustrate these concepts with a tool for visualizing refactoring dependencies and refactoring graphs.

Refactorings, when formalized, have clear pre-conditions defining the circumstances in which they can be applied and post-conditions defining the effects of applying them. For instance, one of the pre-conditions of a Move Method refactoring is that the method exists in the class from which it will be moved and one of its post-conditions is that the method must exist in the target class afterward. Therefore, a refactoring dependency exists when any post-condition of one refactoring matches any pre-condition of another refactoring, e.g., the method exists in the relevant class. These linked refactorings then can be organized into groups based upon their dependencies. We represent these groups as directed acyclic graphs, where the nodes are the refactorings and the edges are the dependencies. This approach offers three main benefits: 1) developers can quickly and intuitively understand the dependencies among refactorings that constrain recommendations, e.g., which refactorings must be done together; 2) developers can more easily compare recommendations by focusing on essential, rather than accidental, differences; and 3) tool builders can easily integrate new features to detect invalid refactorings and improve their recommendation algorithms.

We validated our proposed theory based on 1,457,873 refactorings recommended for 9,595 Java projects publicly available on GitHub. We considered 14 types of refactorings that are most commonly used in practice [10], [11]. We also developed a web tool, DPRef, that implements the proposed ordering dependency detection algorithm. It transforms a refactoring sequence recommended by existing refactoring tools [9], [12] into refactoring graphs based on the detected dependencies. We conducted experiments to evaluate the correctness of detected dependencies, discover what portion of refactorings in recommendations are actually dependent rather than independent, and estimate the potential impact of dependent refactorings on several quality attribute metrics. Finally, we conducted human validation with 27 developers to manually evaluate the correctness of the detected dependencies and their relevance.

Our implemented algorithm achieved 100% correctness in detecting dependencies between refactorings and identifying invalid refactorings. Furthermore, our findings demonstrate that 43% of the 1,457,873 recommended refactorings are part of dependent refactoring graphs. This finding confirms that refactorings are commonly involved in dependent relations and cannot be applied truly independently. In addition, dependent refactorings improve all six QMOOD quality attribute metrics [13] in our experiments better than independent refactorings. The manual validation of the refactorings by 27 developers shows that all the identified dependencies are correct for a sample of 233 refactorings after applying them directly on the code of 61 open source projects based on the order proposed by DPRef. The post-study survey with the developers confirmed the relevance of detecting the dependencies to help them understand the sequence of recommended refactorings.

The authors also provide a replication package¹ that includes the refactoring dependency detection tool and necessary data for our large scale validation. The replication package will enable researchers and tool builders to integrate the refactoring dependency feature into existing refactoring recommendation and detection tools and further investigate the relationships among refactorings.

¹ https://sites.google.com/umich.edu/refactoring-dependency/home

The rest of the paper is organized as follows: Section II discusses the related work and presents a motivating example; Section III provides our definitions of refactoring dependencies and an algorithm to detect them; Section IV presents and discusses the obtained results; Section V highlights the threats to validity; Section VI summarizes our research agenda; and Section VII concludes.

II. RELATED WORK AND MOTIVATING EXAMPLE

Catalogs like Fowler’s [4] have identified many types of refactorings, e.g., Move Method, Extract Class, Pull Up Field, each of which is a semantics preserving code transformation that improves code structure. Developers routinely apply such refactorings in their day-to-day work, with modern development environments providing limited support for applying selected refactorings as directed by a developer. In this section, we survey two categories of refactoring research: recommendation tools and refactoring dependencies. We then summarize the challenges addressed in this paper through a motivating example.

A. Related Work

1) Refactoring Recommendations: There has been both industry and research interest in developing automated and semi-automated refactoring tools to support developers [14]. One representative example is JDeodorant, the tool proposed by


TABLE I
REFACTORING TYPES AND THEIR PRE- AND POST-CONDITION RULES

Tsantalis and Chatzigeorgiou [9]. JDeodorant and similar recommendation tools [11], [15], [16], [17], [18], [19] generate recommendations as sequences of refactoring instances. The experiments described in this paper take this form of refactoring recommendation as input. Thus, our discussion in this section focuses on this category of studies. We point the interested reader to the survey by Bavota et al. [20] for an overview of approaches supporting code refactoring recommendations. In another refactoring recommendation tool, O’Keeffe and Cinnéide [21] formulate refactoring tasks as a multi-objective search problem to generate alternative designs by applying a sequence of refactoring operations. Such a search is guided by a quality evaluation function based on eleven object-oriented design metrics that reflect refactoring goals. Harman and Tratt [17] were the first to introduce the concept of Pareto optimality to search-based refactoring. They used it to combine two metrics, namely CBO (Coupling Between Objects) and SDMPC (Standard Deviation of Methods Per Class), into a fitness function and showed its superior performance as compared to a mono-objective technique [17]. The two aforementioned studies [17], [21] paved the way for several search-based approaches aimed at recommending refactorings [12], [15], [22], [23], [24], [25].

A representative example of these search-based refactoring techniques is the work by Ouni et al. [12], who propose a multi-criteria code refactoring approach aimed at optimizing contrasting objectives: (i) minimizing the number of code smells; (ii) minimizing the refactoring cost (i.e., the number of recommended refactorings); (iii) preserving the design semantics (that is, considering textual information embedded in code identifiers and comments in the refactoring recommendation); and (iv) maximizing the consistency with code changes performed over the system’s change history. In this study, we use the refactoring recommendations generated by this tool based on 1) its superior performance compared to the state of the art [12]; 2) the large number of supported refactoring types; and 3) its being publicly available. The contribution of this paper is not generating refactoring recommendations. Any refactoring recommendation approach can be used to generate the refactorings (input of our proposed approach) if it supports some or all of the refactoring types summarized in Table I.

2) Refactoring Dependencies: Chavez et al. [26] investigated how refactoring types affect five quality attributes based on the version history of twenty-three open source projects. They found that 94% of refactorings are applied to code with at least one low quality attribute value, with 65% of refactorings improving attributes and 35% of all refactorings being neutral on the system. Similarly, Cinnéide et al. [8] studied the impact of individual refactorings on quality attributes, such as using


Move Method to reduce the coupling of a class. None of these studies considered the impact of a sequence of refactorings on the quality attributes.

Murphy-Hill et al. [7] investigated refactoring tool usage through both sampling developers’ code and manually checking if their refactorings were performed with tool support and looked at 240,000 tool-assisted refactorings to find assumptions on how developers, in general, refactor code. They ultimately concluded that refactoring tools are rarely used by developers in practice, with 90% of refactorings being performed manually, and that 40% of refactorings occur in batches.

Bibiano et al. [6] analyzed batch refactoring characteristics and their effects on code smells in open and closed source projects and concluded that 57% of batches/patterns are simple compositions of only two types of refactorings. They highlight lack of tool support to automatically detect refactoring dependencies as a barrier. However, this study is based on the assumption that refactorings are only related if applied to the same code location, which often is not the case for types of refactorings that modify multiple code fragments.

Mens et al. [27], [28] define and detect mutual exclusions, sequential dependencies and asymmetric conflicts between refactorings. These studies analyze dependencies at the model level, working with UML, and they use graph transformation techniques to detect invalid refactorings. The detection of conflicts between refactorings at the model level (UML) is based on a set of manually defined rules (a matrix where the lines and columns are model refactoring types). The types of refactorings are different and simplified compared to code-level ones. Furthermore, the authors were looking for mutually exclusive UML refactorings rather than detecting dependencies.

Liu et al. [29], [30] propose a conflict-aware scheduling approach, which schedules refactorings according to the conflict matrix of refactorings and effects of each individual refactoring using a multi-objective optimisation model. In this work, the authors focused on identifying the best schedule to apply refactorings where the conflicts are defined based on which code smells are to be fixed. Thus, the same refactorings fixing different code smells or applied in the same locations are grouped together. Our notion of dependencies differs from that of Liu et al.: ours concerns the conflicts between the refactorings themselves rather than their goals.

Sousa et al. [31] identify and analyze composite refactorings within and across commits from the commit history of 48 GitHub software projects. Their concept of dependency differs from ours: in their work, a dependency groups the refactorings applied within the same commits/locations.

Overall, existing studies do not provide a rigorous definition of ordering dependencies among refactorings. They mainly define what might better be considered similarity relations, such as a collection of refactorings that have similar effects (fixing a code smell) or similar context (applied by the same developer or to the same code location) [32], [33].

B. Motivating Example

The key to applying refactorings successfully is the decision of which refactorings to apply and where to apply them. In essence, this requires a developer to instantiate a refactoring by supplying the parameters that allow a type of refactoring to be unambiguously applied to code. For example, to instantiate a Move Method, a developer must supply parameters that indicate which method to move and where to move it. Throughout this paper, when we talk about refactorings and dependencies among refactorings, we are talking about refactoring instances.

While refactoring recommendations generated by tools to mimic this activity are typically represented as sequences, not all orderings in these sequences are significant. That is, the same code could be generated by two solutions that contain the same refactorings, but simply apply them in a different order. This is because while many refactorings are independent of one another, other refactorings are dependent on each other such that removing a refactoring from a solution, or reordering it, could make other refactorings invalid. These refactoring recommendation tools typically do not intend to imply a strict ordering of dependencies among elements of the sequence. The sequence is simply a concise means to communicate multiple steps. In that sense, many tools that report a sequence of refactorings today may communicate an unintended meaning (strict sequential order) that this work can help clarify for users of such tools. With the current growth of interactive tools to support refactoring [11], [18], developers are offered solutions that contain dozens to hundreds of refactorings and the option to selectively apply elements of a solution. Without a theory for reasoning about refactoring dependencies, developers can inadvertently make decisions (e.g., ignoring part of a solution) that result in failure (code cannot be successfully refactored) and enter a tedious trial and error loop. Making refactoring dependencies visible improves developers’ understanding of how refactorings work together and allows them to make sound inferences regarding their application.

Fig. 2 shows a simplified example of a solution composed of 5 refactorings to be applied to the JFreeChart project. Three of the refactorings (#3, #4, #5) depend on another refactoring (#2) because the Extract Super Class refactoring (#2) creates a new class (Class_7), on which refactorings #3, #4, and #5 operate. If the new class is not created first, then refactorings #3, #4, and #5 will fail. Thus, there exists an ordering dependency from each of #3, #4, #5 to #2. Refactoring #1, however, has distinct parameters, indicating that it operates on different code elements; thus it has no ordering dependencies on any others in this solution. Presenting these dependencies to a developer clarifies the options that the developer has. For example, the developer could choose not to apply any refactoring except for refactoring #2 without consequences; if the developer chooses not to apply refactoring #2, then refactorings #3, #4, and #5 cannot be applied either. Detecting ordering dependency relationships among refactorings is essential to more effectively applying refactorings.
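As an illustration of the reasoning in the motivating example, the following minimal Python sketch computes which refactorings become inapplicable when a developer rejects part of a solution. It is not taken from the paper or its tool: the function name `invalidated_by` and the dictionary `DEPENDS_ON` are hypothetical, and the dependency data simply encodes the relations stated above (#3, #4, and #5 each depend on #2).

```python
from collections import defaultdict


def invalidated_by(rejected, depends_on):
    """Return every refactoring that can no longer be applied once the
    refactorings in `rejected` are dropped (directly or transitively)."""
    # Invert the relation: prerequisite -> refactorings that require it.
    dependents = defaultdict(set)
    for rf, prereqs in depends_on.items():
        for prereq in prereqs:
            dependents[prereq].add(rf)

    invalid = set()
    stack = list(rejected)
    while stack:
        current = stack.pop()
        for rf in dependents[current]:
            if rf not in invalid and rf not in rejected:
                invalid.add(rf)
                stack.append(rf)
    return invalid


# Ordering dependencies of the motivating example (assumed encoding):
# each key maps to the refactorings that must be applied before it.
DEPENDS_ON = {1: set(), 2: set(), 3: {2}, 4: {2}, 5: {2}}

print(sorted(invalidated_by({2}, DEPENDS_ON)))           # [3, 4, 5]
print(sorted(invalidated_by({1, 3, 4, 5}, DEPENDS_ON)))  # []
```

Rejecting #2 invalidates #3, #4, and #5, whereas applying only #2 leaves nothing invalid, which matches the two options described in the example.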


Fig. 2. A simplified solution of 6 refactorings for the JFreeChart project.

III. REFACTORING DEPENDENCY THEORY

The refactoring dependency theory for reasoning about collections of refactorings is built upon two concepts. The first is the definition of an ordering dependency relation among refactorings in a collection of refactorings. Refactoring dependencies are detected using the pre- and post-conditions of refactoring types, i.e., sets of predicates associated with each refactoring that reflect changes. The second is the organization of a collection of refactorings as a set of refactoring graphs. Together, these concepts improve our ability to understand the meaning of collections of refactorings, allowable operations on them, and their composition in practice.

In this section we describe the elements of our proposed theory, the algorithm for detecting refactoring dependencies, and an associated web tool that implements this detection algorithm.

A. Definitions

Our proposed dependency relation captures an ordering dependency between pairs of refactoring instances. Specifically, an ordering dependency (rf2 → rf1) between two refactoring instances (rf1 and rf2) exists when rf2 can only be successfully applied after rf1 has been applied. That is, rf1 makes a change to code that is necessary in order to apply rf2. This condition can be evaluated based on the combination of pre- and post-conditions of the types of refactorings involved and the parameters of each refactoring instance. For example, to apply Move Method (a type of refactoring) to move method m1 from class c1 to class c2 (m1, c1, and c2 being the parameters of the refactoring instance), several pre-conditions must hold (e.g., m1, c1, and c2 must all exist and m1 must be defined on c1). The pre- and post-conditions of each type of refactoring will be described in the next sub-section.

Building on this ordering dependency definition, we organize collections of refactorings as sets of refactoring graphs rather than as sequences of refactorings. A refactoring graph is a weakly connected directed acyclic graph composed of refactoring instance vertices and ordering dependency edges. Using the ordering dependencies as the basis for forming refactoring graphs (Algorithm 1) results in a set of graphs with the following traits:
• Each refactoring instance is an element of exactly one refactoring graph.
• Some graphs contain a single refactoring instance because that refactoring is truly independent of all others. We call these trivial graphs, each comprising a single refactoring instance node.
• The remaining graphs contain multiple refactoring instances, each of which is part of one or more dependencies. We call these non-trivial graphs.
• Each refactoring graph is independent of every other graph in the solution.

Refactoring recommendations typically comprise a collection of compatible refactorings, and as such positive dependencies are more relevant to common use cases. The idea of negative dependencies would be more applicable if a recommendation contained mutually exclusive advice (e.g., three Move Method refactorings that move the same method to three different locations). This is not the common use case, but this work could be easily adapted. Identifying a refactoring that precludes (or invalidates) another could be performed using the same pre- and post-conditions, but with a modification to check for differences rather than commonalities (e.g., refactoring #1’s post-condition moves the location of a method to class A and refactoring #2’s pre-condition requires that same method to reside in class B). It may require additional work to consider the initial state of a program, but the same principles would likely apply. existing tools that do not use refactoring dependencies.

In this paper, we plan to use refactoring dependencies to present recommendations in terms of refactoring graphs that clearly convey when an order is required and when it is not. Most recommendations based on this work will not imply a strict sequential order for all refactorings. For instance, consider the non-trivial graph shown in Fig. 2. In an approach based on refactoring dependencies, #4 and #5 can only be applied after #2, but no ordering is implied between #4 and #5. The sequential order is critical only when the refactorings are dependent on each other.
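The difference between a required order and an implied one can be sketched with a topological sort. The snippet below is an illustrative Python sketch only, not part of the paper's tooling: it uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+), the `depends_on` mapping encodes the Fig. 2 dependencies as described in Section II-B, and the helper `respects_dependencies` is a hypothetical name.

```python
from graphlib import TopologicalSorter

# depends_on[rf] = refactorings that must be applied before rf
# (assumed encoding of the non-trivial graph of Fig. 2).
depends_on = {3: {2}, 4: {2}, 5: {2}}

# Group the refactorings into batches: everything inside one batch is
# unordered, while each batch must follow the batches before it.
sorter = TopologicalSorter(depends_on)
sorter.prepare()
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)
print(batches)  # [[2], [3, 4, 5]]


def respects_dependencies(order, depends_on):
    """Check that every refactoring appears after all its prerequisites."""
    position = {rf: i for i, rf in enumerate(order)}
    return all(
        position[prereq] < position[rf]
        for rf, prereqs in depends_on.items()
        for prereq in prereqs
    )


print(respects_dependencies([2, 5, 4, 3], depends_on))  # True
print(respects_dependencies([4, 2, 3, 5], depends_on))  # False
```

Any sequence that places #2 first is acceptable, so a tool that reports one particular sequence conveys more ordering than the dependencies actually require.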


Algorithm 1: Dependency Detection Algorithm.

1) Refactoring Pre- and Post-Conditions: As our approach for detecting ordering dependencies relies on the pre- and post-conditions of types of refactorings, we began with validated conditions in the current literature [4], [12], [21], [32] related to 14 types of refactorings. We selected the refactoring types summarized in Table I since they were those most frequently used in practice based on existing studies [7], [10], [26] and since our work focuses more on complex/composite refactoring operations, which have more complex/sophisticated pre/post conditions. The sets of pre- and post-conditions published in current literature were extensively validated for correctness and completeness [4], [12], [21], [32]. The complete list of updated pre- and post-conditions for the 14 supported types of refactorings is organized in Table I. In this table, the post-conditions that are presented are only those that represent a change. This is important for the dependency detection algorithm, allowing it to efficiently identify changes that enable pre-conditions of dependent refactorings. Also, the functions used for describing the pre- and post-conditions and their meanings are defined in [12].
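To make the role of pre- and post-conditions concrete, the following Python sketch checks and applies a Move Method on a toy program model. It is a simplified illustration under stated assumptions, not the rule set of Table I: the program is modeled as a dictionary from class names to method names, the functions `unmet_preconditions` and `move_method` are hypothetical, the predicate name `defines` is invented for the example, and only the Move Method pre-conditions mentioned in Section III-A are covered (the existence of the method is folded into the `defines` check).

```python
# Toy program model (an assumption of this sketch):
# class name -> names of the methods the class defines.
Program = dict[str, set[str]]


def unmet_preconditions(program, method, source, target):
    """List the Move Method pre-conditions that do not hold in `program`."""
    unmet = []
    if source not in program:
        unmet.append(f"exist({source})")
    if target not in program:
        unmet.append(f"exist({target})")
    if method not in program.get(source, set()):
        unmet.append(f"defines({source}, {method})")
    return unmet


def move_method(program, method, source, target):
    """Apply Move Method if its pre-conditions hold; return a new model."""
    unmet = unmet_preconditions(program, method, source, target)
    if unmet:
        raise ValueError("unmet pre-conditions: " + ", ".join(unmet))
    result = {cls: set(methods) for cls, methods in program.items()}
    result[source].discard(method)
    result[target].add(method)  # post-condition: target now defines method
    return result


program = {"c1": {"m1"}, "c2": set()}
after = move_method(program, "m1", "c1", "c2")
print(after)                                         # {'c1': set(), 'c2': {'m1'}}
print(unmet_preconditions(after, "m1", "c1", "c2"))  # ['defines(c1, m1)']
```

The second call shows why only post-conditions that represent a change matter for dependency detection: after the move, a refactoring requiring m1 on c2 becomes applicable, while one requiring m1 on c1 no longer is.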

B. Algorithm for Detecting Refactoring Dependencies


Algorithm 1 describes the process for detecting refactoring
ordering dependencies. These dependencies are detected based
on comparisons between pre- and post-conditions of refactoring
Fig. 3. Execution of Algorithm 1 on the example of Fig. 2.
instances. The proposed algorithm takes a list of refactoring
instances as input and generates a set of refactoring graphs as
output.
Fig. 3 illustrates the application of the algorithm to this
Lines 1 and 2 initialize the lists of refactoring instances (nodes
example. In Step 1, the post-conditions of refactoring #1 are
of the graph, V ) and refactoring dependencies (edges of the
compared to the pre-conditions of all remaining refactorings.
graph, E). Then, the post-conditions of each refactoring instance
Since there is no match among pre- and post-conditions for any
of the solution C (collection of refactorings) are evaluated for
of them, no dependency is added. Next, the post-conditions of
matching with the remaining refactoring instances in C (Lines
refactoring #2 are compared in Step 2 to the pre-conditions
3–13). Specifically, the algorithm looks for any match between
predicates of pre- and post-conditions from Table I. That is, if any predicate of the post-condition of one refactoring (any element of P) matches any predicate of the pre-condition of another refactoring (any element of Q), then a dependency has been detected and an edge is added to the graph between those refactorings (Lines 5–10). We repeat this process until all the refactorings have been visited. Then, Line 14 identifies the different trivial and non-trivial graphs that are formed based on the detected dependencies.

To illustrate Algorithm 1, consider the motivating example described in Fig. 2. This example contains six refactoring instances:

#1 ExtractClass(OverwriteDataSet;Class_6;[x];[addChangeListener])
#2 MoveField(PowerFunction2D;BooleanList;[b];[])
#3 ExtractSuperClass(CSV;Class_7;[fieldDelimiter];[extractRowKeyAndData])
#4 MoveField(Class_7;EventObject;[fieldDelimiter];[])
#5 PullUpField(CSV;Class_7;[textDelimiter];[])
#6 PullUpMethod(CSV;Class_7;[];[readCategoryDataset])

of the next refactorings in the sequence (#3 - #6). Again, no match is found. In Step 3, the post-conditions of refactoring #3 are compared to the pre-conditions of refactorings #4, #5, and #6. For each, a match exist(Class_7) is found. Thus, three dependencies are added, from each of #4, #5, and #6 to #3. The algorithm continues, but no additional matches are found. Thus, Algorithm 1 transforms this sequence into three refactoring graphs (two trivial graphs and one non-trivial graph that includes four refactorings).

When developers interact with the tool by modifying or rejecting some of the refactorings, the dependency detection algorithm is re-executed to check the impact of those changes on the graphs. Thus, all the graphs are updated instantly during those interactions.

C. DPRef

To validate our definitions and algorithm and to make our refactoring dependency detection approach available to the community, we implemented DPRef [34], a free and open-source web platform that allows users to provide a sequence of refactorings to be applied as input and generates a set of refactoring graphs as output.
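
The matching step of Algorithm 1 and the six-refactoring example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DPRef's implementation: the `Refactoring` record, the `exist(...)` predicate strings, and the pre-/post-condition sets attached to each refactoring are hypothetical (Table I defines the real ones), and only predicates about classes created within the sequence are modeled. Edges point from the dependent (head) refactoring to the one it depends on (tail), and refactoring graphs are taken to be the connected components of the result.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Refactoring:
    """Hypothetical record: an id plus pre- and post-condition predicates."""
    rid: int
    name: str
    pre: frozenset = field(default_factory=frozenset)
    post: frozenset = field(default_factory=frozenset)


def detect_dependencies(sequence):
    """Return edges (head, tail): head depends on an earlier tail."""
    edges = set()
    for i, tail in enumerate(sequence):
        for head in sequence[i + 1:]:
            # A dependency exists if any post-condition predicate of the
            # earlier refactoring matches a pre-condition of a later one.
            if tail.post & head.pre:
                edges.add((head.rid, tail.rid))
    return edges


def refactoring_graphs(sequence, edges):
    """Group refactorings into connected components (trivial = size 1)."""
    parent = {r.rid: r.rid for r in sequence}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for head, tail in edges:
        parent[find(head)] = find(tail)

    groups = {}
    for r in sequence:
        groups.setdefault(find(r.rid), []).append(r.rid)
    return sorted(groups.values())


# Assumed predicates: only the newly created Class_6 and Class_7 are modeled.
NEW_6 = frozenset({"exist(Class_6)"})
NEW_7 = frozenset({"exist(Class_7)"})
sequence = [
    Refactoring(1, "ExtractClass", post=NEW_6),
    Refactoring(2, "MoveField"),
    Refactoring(3, "ExtractSuperClass", post=NEW_7),
    Refactoring(4, "MoveField", pre=NEW_7),
    Refactoring(5, "PullUpField", pre=NEW_7),
    Refactoring(6, "PullUpMethod", pre=NEW_7),
]

edges = detect_dependencies(sequence)
graphs = refactoring_graphs(sequence, edges)
print(sorted(edges))
print(graphs)
```

Under these assumed predicates the sketch yields three edges into #3 and three graphs, two of them trivial, which is the outcome described for the example. A union-find structure is one convenient way to group the refactorings; the paper does not state how Line 14 of Algorithm 1 is implemented.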


TABLE II
QMOOD QUALITY METRICS

Fig. 4. DPRef, a tool for detecting refactoring dependencies.

Fig. 4 shows DPRef's output, which includes the generated set of refactoring graphs and associated information like the impact on quality attributes for each graph or individual refactoring. Our tool automatically applies the refactoring graphs to the source code, and the quality metrics are then calculated on the refactored code. Developers are then able to change the generated refactoring graphs and apply filters to show, for instance, just the graphs that significantly improve specific quality attributes. DPRef also includes an Eclipse plug-in to execute any refactorings selected by the user on the actual source code. The tool is available as part of our replication package.

IV. EMPIRICAL STUDY

In this section, we present our research questions, validation methodology, and experimental setup, and discuss our findings. Our implementation, data and results are publicly available [34].

A. Research Questions

The following research questions guide the evaluation of our refactoring dependency theory:

RQ1. (Precision) To what extent are the detected refactoring ordering dependencies correct?
RQ2. (Relation) To what extent are refactorings dependent?
RQ3. (Improvement) To what extent do non-trivial refactoring graphs improve quality attributes compared to trivial refactoring graphs (independent refactorings)?

We collected data from 9,595 open source repositories to evaluate the correctness of the detected ordering dependency relationships among refactorings. For each project, we executed existing refactoring recommendation tools [9], [12] to find relevant refactorings to be applied on the code of those projects. In this study, we use the refactoring recommendations generated by those tools based on their superior performance compared to the state of the art, with over 90% precision, recall and manual correctness on large open source and industry projects; the large number of supported refactoring types; and their public availability. We describe the parameter settings used by such refactoring recommendation tools in the next section. We then detected dependencies among refactorings and generated refactoring graphs based on Algorithm 1 for all generated refactoring solutions to understand if and how refactorings are applied together rather than in isolation. Finally, we studied and compared the impact of refactoring graphs on different well-defined quality attributes, based on the QMOOD model [13] detailed in Table II.

To answer RQ1, we use two methods: automated and manual correctness. We used these complementary methods because the manual evaluation can be error-prone, time-consuming and not scalable, while the automated evaluation may lack insights and feedback from developers.

For the automated correctness, we removed refactorings from valid non-trivial graphs and determined whether the removal invalidated the graph. We define a valid graph as a refactoring graph for which all pre-conditions of all refactorings hold (Table I). As defined earlier, if an ordering dependency exists between two refactorings, for instance, rf2 is the head refactoring (the refactoring depending on another) and rf1 is the tail refactoring (the refactoring on which another depends) for a refactoring dependency (rf2 → rf1), then the head refactoring can only be successfully applied after the tail; removing any tail refactoring should then invalidate at least one pre-condition of a head refactoring. As such, our specific test was to remove one tail refactoring from each valid non-trivial graph with more than two refactorings and then count the number of valid and invalid non-trivial graphs. To answer RQ1, we calculated the Rate of Correctness (RC) after removing one tail refactoring from all non-trivial graphs as follows:

$$\mathrm{RC} = \frac{\#\ \text{of Invalid Non-trivial Graphs}}{\#\ \text{of Non-trivial Graphs}} \tag{1}$$

Our assertion is that the refactoring dependency detection algorithm is correct if all non-trivial graphs become invalid when at least one tail refactoring is removed from each, as above.

For the manual evaluation, we asked 27 full-time developers to manually check the correctness of 50 valid non-trivial graphs


totaling 233 refactorings for 5 open source projects (footnotes 2-6) that contained at least 5K LOC and involved significant refactorings in the last 2 years. The graphs are a sequence of refactorings and some of them are large in size. During the sampling process, we used the following criteria to avoid any bias in the refactorings selected for the manual validation: refactoring types, project size, project domain, and locations of the refactorings (files).

2 https://github.com/phunware/maas-ads-android-sdk
3 https://github.com/solita/query-utils
4 https://github.com/forge/roaster
5 https://github.com/goobi/goobi-ugh
6 https://github.com/kongchen/swagger-maven-example

The participants were asked to use our tool to identify refactoring dependencies, assess the correctness of those dependencies, and apply and compile the refactorings. First, developers checked the applicability and ability to properly compile the refactorings without producing any conflicts. Each refactoring that caused the code to fail to compile was deemed invalid and was discarded. Then, they identified the refactorings for which the set of dependencies was both correct and complete among all the applicable and compilable refactorings. When checking correctness, developers were asked to evaluate each refactoring independently to determine whether each identified dependency was necessary and whether there were any missing dependencies, in order to evaluate the accuracy of both tail-related dependencies and those not connected to the tail. The participants examined the generated dependency graph of the refactorings using our visual representation in the DPRef tool. Then, they reviewed the code before and after applying any selected refactorings. In case of any doubt, the participants could select the refactorings from the graph and have them automatically executed on the code. Conflicting refactorings may generate errors in the code, which may confirm the missing dependencies. In this case, the refactoring is considered invalid in this exercise. Otherwise, a refactoring for which the set of dependencies was both correct and complete is considered a valid refactoring. We define a manual correctness score (MC) as

$$\mathrm{MC} = \frac{\#\ \text{of Valid Refactorings}}{\#\ \text{of Evaluated Refactorings}} \tag{2}$$

To answer RQ2, we calculated the number of dependencies (edges) and graphs (trivial and non-trivial) for all projects. We also counted the number of refactorings in non-trivial graphs and the most frequently occurring refactoring types in them, as well as the Non-Trivial Rate (NTR) defined as follows:

$$\mathrm{NTR} = \frac{\#\ \text{of Refactorings in Non-trivial Graphs}}{\#\ \text{of Refactorings}} \tag{3}$$

These evaluation metrics allow us to understand the extent of refactoring dependencies. Furthermore, we can evaluate the refactoring types that are less commonly applied in isolation and also understand the complexity of the non-trivial graphs based on their sizes. To answer RQ3, we consider all the trivial and non-trivial graphs to evaluate their impact on the quality attributes and the design metrics from QMOOD. We compared the number of graphs that improve the quality attributes and design metrics from QMOOD [13] for both trivial and non-trivial graphs. We also considered the rates of improvement, in percentage, for all graphs, taking into consideration the reusability, flexibility, understandability, functionality, extendibility, and effectiveness quality attributes captured by QMOOD metrics and available in Table II, as well as basic metrics such as coupling, cohesion, etc. We also calculated a Total Quality Index (TQI), aggregating all the metrics, after normalization, with equal weights into one metric.

These evaluation metrics are useful to understand the impact of collections versus individual refactorings on improving the quality and which quality attributes are more likely to be significantly improved using non-trivial graphs or independent refactorings. We also want to highlight that differences observed between non-trivial graphs and trivial graphs could be the result of the number of refactoring instances, rather than the result of dependency.

B. Experimental Settings

We considered a total of 9,595 open-source Java projects provided by [35] to address the above research questions. The selection process limited consideration to projects with ≥ 5K LOC and at least 2 collaborators. We also eliminated any duplicate (cloned) projects from consideration. We applied these criteria to the list of one million GitHub projects. We performed this selection process in an attempt to eliminate small projects, such as student projects and small hobby/learner programs, that were not likely to be good candidates for refactorings.

Table III shows the min, average, and max for the number of collaborators, code size (in LOC), # of classes, and # of recommended refactorings generated. The list of subject projects is also available in the replication package, along with all results (e.g., refactorings, quality metrics, refactoring graphs, etc.).

TABLE III
STATISTICS OF THE SUBJECT PROJECTS

The total number of refactorings collected from recommendations for these 9,595 projects is almost 1.5 million (1,457,873 refactorings). We used the parameter settings recommended by the authors of the refactoring recommendations tool [12]: Single Point crossover with probability = 0.7, Bit Flip mutation with probability = 0.4, and a stopping criterion of 100,000 evaluations. We also set the initial population size to 100 and utilized a tournament selection operator with n=2. The minimum and maximum number of refactorings per solution are limited to 150 and 200, respectively.

For the manual validation of the refactoring dependencies, we recruited 27 full-time developers from our networks, each of


whom was unaware of our algorithm. These participants were first asked to fill out a pre-study questionnaire containing six questions. The questionnaire helped to collect background information such as their role within the company, their programming experience, and their familiarity with software refactoring. The list of the pre- and post-study questions of all the questionnaires, the validation data and the obtained results can be found in the online appendix, which is available on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TSE.2023.3244123. Although the vast majority of participants were already familiar with refactoring as part of their jobs and graduate studies, all the participants attended a two-hour lecture on refactoring by two of the organizers of the experiments. The details of the selected participants can be found in Table IV, including their years of programming experience, familiarity with refactoring, etc.

TABLE IV
SELECTED PARTICIPANTS

All participants had a minimum of 6 years of experience and work as active programmers with strong backgrounds in refactoring, Java, and software quality metrics. We divided the participants into 5 datasets where each dataset contains 10 samples of valid non-trivial graphs. We selected these samples based on the distribution of the refactoring types and number of refactorings in each graph as described in Fig. 5.

Fig. 5. Number of refactorings per non-trivial graph in each dataset.

Each participant was asked to assess the correctness of the refactoring dependencies and to identify missing dependencies between the refactorings using our DPRef tool. They were asked to execute the sequence of refactorings using our Eclipse plug-in and compile the code after applying the refactorings of each valid non-trivial graph. In addition to evaluating the refactorings, the participants were asked to configure, run, and interact with the tool on the different systems. We assigned tasks to the participants according to the datasets and developers' experience. Each participant was given a post-study survey. This second survey was more general as it collected the practitioners' opinions and their perception of the importance and relevance of detecting refactoring dependencies along with the usability of DPRef. The samples used in the manual validation study as well as the full details of our pre-study survey results can be found in the replication package.

C. Results and Discussions

1) Results for RQ1: Fig. 6 summarizes the distribution of the results of removing one tail refactoring from all non-trivial graphs with more than two refactorings. One important outcome is that all non-trivial graphs become invalid after removing a tail refactoring (RC), which confirms that the proposed algorithm accurately identifies refactoring dependencies. The RC evaluation metric has a value of 1.0 (or 100%) across all the 9,595 projects, i.e., for all non-trivial graphs. The total number of non-trivial graphs is 257,725 with an average of 26.8 per project.

Fig. 6. Boxplots of refactoring dependency correctness for the 9,595 projects.

Another interesting result was that our algorithm detected many invalid refactorings among the solutions generated by the tool of Ouni et al. [12]. An average of 6.2 invalid non-trivial graphs were identified per project. The main reason is that the crossover operator used to exchange refactorings between solutions did not ensure that refactoring pre-conditions would remain satisfied after the exchange. As discussed in the future work section, the theory proposed in this paper can be integrated into existing refactoring recommendation tools to improve their correctness and contribute to the definition of intelligent change operators (including crossover) for search-based refactoring. Non-trivial graphs that were initially invalid were excluded from the calculation of RC and the removal of refactorings shown in Fig. 6.

For the manual evaluation, all the non-trivial graphs were correctly executed by the participants on the open source projects and they agreed that the dependencies were correctly and completely identified for each refactoring; thus the MC scores were 100% on all the selected datasets as shown in Fig. 6. The manual correctness results therefore confirm the automated method.

The fact that a refactoring was recommended by Ouni et al. [12] and was not previously applied by developers does not mean that the recommendation is incorrect, but simply that the developers may not have thought about that refactoring or may not have had time to do it. The manual validation scores of [12] are more than 90% on large scale systems, which means that the manual check by developers confirmed that the vast majority of the recommendations are correct and useful. We clarified these observations in the validation section when explaining our choices.
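
The automated check behind RC (remove one tail refactoring from each valid non-trivial graph, then test whether the graph is still valid) can be illustrated with a short Python sketch. It is a simplification under stated assumptions rather than the authors' procedure: validity is approximated as "every tail that a remaining head depends on is still present", standing in for re-evaluating the Table I pre-conditions, and the function names and the second graph are hypothetical.

```python
def is_valid(refactorings, edges):
    """Treat a graph as valid when every tail a present head needs is present.

    This stands in for re-checking the Table I pre-conditions, which the
    sketch does not model.
    """
    present = set(refactorings)
    return all(tail in present for head, tail in edges if head in present)


def rate_of_correctness(graphs):
    """RC = invalid non-trivial graphs / tested non-trivial graphs, after
    removing one tail refactoring from each graph with more than two
    refactorings."""
    tested = invalid = 0
    for refactorings, edges in graphs:
        # Skip graphs that are too small, edge-free, or initially invalid.
        if len(refactorings) <= 2 or not edges:
            continue
        if not is_valid(refactorings, edges):
            continue
        tail_to_remove = sorted(tail for _, tail in edges)[0]
        remaining = [r for r in refactorings if r != tail_to_remove]
        tested += 1
        if not is_valid(remaining, edges):
            invalid += 1
    return invalid / tested if tested else float("nan")


graphs = [
    ([3, 4, 5, 6], {(4, 3), (5, 3), (6, 3)}),  # non-trivial graph from Fig. 2
    ([7, 8, 9], {(8, 7), (9, 8)}),             # hypothetical dependency chain
]
rc = rate_of_correctness(graphs)
print(rc)
```

Because validity is defined here directly in terms of the detected edges, the sketch always reports RC = 1.0 for well-formed input; the study's check is stronger in that it re-tests the actual pre-conditions on the code.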


Maturation of automated tools to assist developers with complex tasks such as refactoring is an important gap to fill. The developers that we surveyed also agree with this. All the participants rated the importance of detecting refactoring dependencies as important or rather important, which confirms the need for tools to detect dependencies in a refactoring sequence. The vast majority of participants (24 out of 27) also rated the task of manually identifying these dependencies as difficult or very difficult based on their experiences in using existing refactoring tools. These developers reported that the most important motivations for automatically detecting dependencies are to understand how they can fix code smells (typically fixed using a sequence of refactorings) and to reduce the refactoring effort by applying refactorings incrementally.

We also asked the participants to rate their agreement on a Likert scale from 1 (complete disagreement) to 5 (complete agreement) with the following question on how important it is to detect refactoring dependencies automatically.

As shown in Fig. 7, all the participants rated this feature as important or rather important, which confirms the need for tools to detect dependencies in a refactoring sequence. The majority of participants (14 out of 27) also rated the task of manually identifying these dependencies as difficult/very difficult based on their experiences in using existing refactoring tools. Fig. 7 also shows that developers think that the most important motivations for automatically detecting dependencies are to understand how they can fix code smells (typically fixed using a sequence of refactorings) and to reduce the refactoring effort by applying refactorings in an incremental way. The figure also shows that at least 18 out of the 27 participants found DPRef easy to use and were able to understand the dependencies without complications.

Fig. 7. Participant survey.

Key findings: To answer RQ1, the proposed algorithm for detecting refactoring dependencies achieved 100% correctness on all projects.

2) Results for RQ2: Fig. 8 shows that while truly independent refactorings are more common, the mean NTR shows that more than 40% of all recommended refactorings are part of non-trivial graphs and, for some projects, all refactorings are part of a single non-trivial graph (NTR=1.0). The portion of refactorings that are part of refactoring dependencies is significant.

Fig. 8. Distribution of refactorings in trivial versus non-trivial graphs based on the 9,595 projects.

Across all projects, non-trivial graphs have a mean number of dependencies of almost 40 (recall that this is based on solutions with a mean of roughly 150 refactorings), with nearly 100 dependencies observed for some projects. This indicates a range of connectedness in non-trivial graphs, with the more highly dependent/coupled graphs likely being difficult for developers to understand and having a high risk of generating large numbers of invalid refactorings if they are not applied together. We noticed that the most connected refactoring graphs are more likely to be associated with small projects in which many refactorings are applied to common code locations.

Comparing the range and distribution of the number of graphs, most projects have a mean of more than 100 graphs per solution, in which the number of trivial graphs is greater than the number of non-trivial graphs. However, as shown by NTR, the number of refactorings in non-trivial graphs is very similar to or exceeds the number of refactorings in trivial graphs.

Fig. 9. Size of non-trivial refactoring graphs in the 9,595 projects.

Fig. 9 shows the distribution of the size of the non-trivial graphs; notice that there is a small number of non-trivial graphs including 31+ refactorings. However, the vast majority of non-trivial graphs include 2-5 refactorings. This may confirm that collections of dependent refactorings tend to be small, which


can offer flexibility to developers if they want to modify these collections. Projects vary, however, with several including non-trivial graphs with 10-50 refactorings and with one including a graph of 138 refactorings.

Regarding refactoring types, Fig. 10 shows that the most common refactoring types found in non-trivial graphs are Extract Class, Extract Super Class and Extract Sub Class. This finding confirms that modifying the hierarchy of code requires a combination of several refactorings and cannot be done with one isolated refactoring. Furthermore, the figure also shows that Decrease Method Security, Encapsulate Field, and Increase Field Security are the least common refactoring types in non-trivial graphs. This most likely indicates that these refactorings can be applied independently without requiring major restructuring effort.

Fig. 10. Distribution of the refactoring types among non-trivial graphs for the 9,595 projects.

Key findings: To answer RQ2, while more refactorings appear in trivial graphs than non-trivial graphs, the difference is not large. The mean value of NTR is 43%, indicating nearly half of all refactorings participate in refactoring dependencies and as such cannot be applied without consideration of other refactorings. Some refactoring types are more likely to be applied with other dependent refactorings than others.

3) Results for RQ3: Fig. 11 shows how trivial and non-trivial refactoring graphs improve quality attribute and design metrics. The number of trivial graphs that improve each metric is greater than the number of non-trivial graphs that do so. However, the impact of each kind of graph on the improvement of each quality metric can vary considerably. We also analysed the improvements caused by the two kinds of refactoring graphs for a subset of the quality attribute metrics. To statistically compare the distributions of the number of graphs that improved the quality metrics across the quality metrics for trivial and non-trivial graphs, we used the Wilcoxon rank sum test [36] with the level of significance (α) set to 0.05. In all cases, the test rejected the null hypothesis (p-values < 0.005).

Fig. 12 shows the rate of improvement in % for Effectiveness, Extendibility, Flexibility, Functionality, Reusability and Understandability. This data shows that, for most projects, the improvement caused by the non-trivial graphs is greater than for trivial graphs. The implication is that non-trivial graphs may be more useful in practice for developers than marginal improvements obtained by individual refactorings. Making refactoring dependencies explicit allows users to see, for example, that disregarding a particular refactoring may invalidate several others in the same refactoring graph. As non-trivial graphs often result in greater improvements, that decision may be more significant to the user. Furthermore, some metrics, like Extendibility (note the different scale) and Reusability, are more significantly improved using dependent refactorings. Even small changes are significant when considered as aggregate measures across an entire code base. This result is consistent with the fact that Extract [Super/Sub] Class refactorings are more likely to occur within a non-trivial graph and that these refactorings are natural choices for improving the Extendibility and Reusability of software.

The results are also consistent with the test, which returned p-values lower than 0.05 for Functionality and Reusability, while the p-value is greater than 0.05 for the remaining metrics. However, quality metrics may be conflicting, hence the execution of a refactoring could increase some metrics and deteriorate others. The cumulative impact will depend on the types of the refactorings in the non-trivial graph. Indeed, several studies examined relationships and correlations between quality metrics and confirmed the possibility of a conflict between two quality metrics, such as coupling and cohesion, which means that improving one may seriously degrade the other. For instance, it has been found that maintainability and efficiency are negatively correlated, as are functionality and understandability, performance and evolvability, or performance and reliability. Negative relationships between extensibility and understandability, for example, result from the fact that systems with extensional features are more challenging to learn and maintain. Most of the time, a solution entails prioritization and a compromise when applying refactoring, which could increase some metrics and deteriorate others.

Key findings: To answer RQ3, non-trivial refactoring graphs improve all six quality attribute metrics in our experiments better than independent refactorings. In particular, the improvement from the application of non-trivial graphs over trivial graphs is particularly significant for Functionality and Reusability.

V. THREATS TO VALIDITY

Conclusion Validity. We used Design of Experiments (DoE) [37] to mitigate the internal threat related to parameter tuning used in our experiments. DoE is a methodology for systematically applying statistics to experimentation and is one of the most efficient techniques for parameter settings of evolutionary algorithms for the used refactoring recommendation tool. Each parameter has been uniformly discretized in some


intervals. Values from each interval have been tested for our application and we chose the best values.

Fig. 11. The number of graphs that improved the quality metrics.

Fig. 12. Rate of quality improvement (%) for the refactoring graphs per metric.

Internal Validity. We collected data from a very large number of repositories resulting in about 1.5 million refactorings for 9,595 projects. A possible internal threat is the diversity of these projects in terms of domains, size, etc. To mitigate this threat, we used several criteria (e.g., more than one contributor per project, over 5K lines of code, etc.) to select the projects and eliminate redundant ones or those with small size, to avoid considering student projects on GitHub and so on. Also, different tools with different recommendation strategies might be more or less dominated by non-trivial graphs. Historical refactorings from commit histories could likewise have a different composition. When we conclude that dependencies are common, the source of the data matters. We make a replication package available including all the collected data that can be used and improved by the community.

Another possible internal threat is the technique used for the manual validation. We did not inform the participants that the non-trivial graphs are all valid based on our tool and we wanted them to confirm that by validating the code after refactoring and executing it. The invalid graphs can be easily validated as the code will not even compile after refactoring. As described in the pre-study questionnaire, we asked the participants about their experience in interacting with the tool and measured the time that they spent as well. Thus, our questions were not just to ask them about the importance of detecting dependencies but more on how they could be useful for them in understanding the refactoring recommendations and so on.

Construct Validity. The refactorings used in our experiments are generated using an existing refactoring tool [12]. Thus, it is possible that some of them are not relevant (e.g., small impact on quality). However, our goal is to evaluate the dependencies among the refactorings independently from their relevance. In addition, our approach can take as input any sequence of refactorings.

External Validity. The types of refactorings considered in our experiments may threaten the generalizability of our results. Besides, our study was limited to the use of specific quality attributes to measure the impact of the application of refactoring graphs. Future replications of this study are necessary to further confirm our findings. Also, the number of participants can be extended in our future work to validate more refactoring dependencies. Moreover, there is no consensus in refactoring studies about the most representative types. Several existing works [38], [39], [40], [41] show that the refactoring types used in this study are the most frequently used by developers.

VI. IMPLICATIONS AND FUTURE WORK

Our proposed theory of organizing collections of refactoring instances as a set of refactoring graphs offers several advantages that address the challenges confirmed by developers:

• explainability: each refactoring graph is smaller and more coherent than a long sequence of refactorings. As each


can be explained independently, the cognitive burden on a developer is much lower (e.g., contrast with determining which refactorings scattered across a sequence of dozens or hundreds of refactorings are related).
• comparability: search-based refactoring recommendation tools typically generate multiple recommendations on a Pareto front, leaving developers to choose one. Identifying common elements of different recommendations is simplified by comparing sets of graphs that do not contain the spurious orderings found in sequence representations.
• search efficiency: search-based refactoring recommendation tools that use genetic algorithms gain new options. Specifically, crossover operations can be more reliable (reducing failures) when using dependency analysis; graphs may also be better genomes for crossover than individual refactorings.

Consequently, there are several directions for future work:

Refactoring Pattern Extraction. One important implication of the proposed refactoring dependency theory is the ability to extract common refactoring patterns by mining software repositories using tools such as RefMiner [16]. These patterns are the common non-trivial graphs that can be extracted on different commits/pull-requests of the same project or multiple projects. Such patterns of non-trivial graphs can be linked to refactoring opportunities such as resolving different types of code smells repeatably. In the future, we plan to use the refactoring dependencies to understand the common refactoring patterns from the history of commits and pull requests of software repositories using existing refactoring detection tools such as RefMiner.

Refactoring Collaborations Between Developers. Studying the collaborations among multiple developers when refactoring code is a promising next step. Refactoring graphs extracted from commit histories can be linked to the authors of those commits. Then, a graph of collaborations among developers can be generated based on the dependencies among the applied refactorings. This can lead to new insights into why and when developers collaborate for refactoring.

Change Operator in Search-Based Refactoring. Random selection and application of crossover and mutation when evolving a population of solutions is a challenge in search-based refactoring. Refactoring dependency analysis can be used to

recommendation/detection tools treat refactorings in isolation. In this paper, we proposed a definition for ordering dependencies among refactorings and an algorithm for detecting these dependencies. We also proposed defining refactoring recommendations as sets of refactoring graphs rather than as refactoring sequences and illustrated these concepts with a tool for visualizing refactoring dependencies and sets of refactoring graphs. We elaborated our research agenda for future work in Section VI.

We validated the proposed approach on 1,457,873 refactorings recommended for 9,595 projects. Our results show that the proposed approach achieved 100% in correctly detecting all dependencies among refactorings. Furthermore, we found that 43% of the 1,457,873 recommended refactorings are part of dependent refactoring graphs, which confirms that refactorings are commonly involved in dependent relations and cannot be applied truly independently. These concepts advance a theory for reasoning about refactorings collectively, rather than individually, and offer clear benefits to developers applying refactoring recommendations (explainability and comparability) and authors of tools for recommending refactorings (search efficiency and improving correctness of recommendations).

ACKNOWLEDGMENTS

Copyright 2021 IEEE. References herein to any specific commercial product, process, or service by trade name, trade mark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute. DM21-0546

REFERENCES

[1] E. Tom, A. Aurum, and R. Vidgen, “An exploration of technical debt,” J. Syst. Softw., vol. 86, no. 6, pp. 1498–1516, 2013.
[2] M. Kuutila, M. Mäntylä, U. Farooq, and M. Claes, “Time pressure in software engineering: A systematic review,” Inf. Softw. Technol., vol. 121, 2020, Art. no. 106257.
[3] S. A. Slaughter, D. E. Harter, and M. S. Krishnan, “Evaluating the cost of software quality,” Commun. ACM, vol. 41, no. 8, pp. 67–73, 1998.
[4] M. Fowler, Refactoring: Improving the Design of Existing Code. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999.
[5] G. Bavota, A. D. Lucia, M. D. Penta, R. Oliveto, and F. Palomba, “An
avoid destroying good patterns in refactoring solutions and make experimental investigation on the innate relationship between quality and
change operators more intelligent, which can lead to better refactoring,” J. Syst. Softw., vol. 107, pp. 1–14, 2015.
[6] A. C. Bibiano et al., “A quantitative study on characteristics and effect
solutions and faster convergence. of batch refactoring on code smells,” in Proc. IEEE/ACM 13th Int. Symp.
Interactive Refactoring Tool Support. Developers can more Empir. Softw. Eng. Meas., Brazil, 2019, pp. 1–11.
easily understand the implications of selecting which refactor- [7] E. Murphy-Hill, C. Parnin, and A. P. Black, “How we refactor, and how we
know it,” IEEE Trans. Softw. Eng., vol. 38, no. 1, pp. 5–18, Jan./Feb. 2012.
ings from a recommendation to apply, improving the interactive [8] M. Ó. Cinnéide, L. Tratt, M. Harman, S. Counsell, and I. H. Moghadam,
process and increasing their confidence in the recommendation “Experimental assessment of software metrics using automated refactor-
tool. The only restriction in applying non-trivial refactoring ing,” in Proc. IEEE-ACM Int. Symp. Empir. Softw. Eng. Meas., Lund,
Sweden, 2012, pp. 49–58.
graphs is that a refactoring can only be applied if every other [9] N. Tsantalis and A. Chatzigeorgiou, “Identification of move method refac-
refactoring that it depends on (transitively) is also applied. Thus, toring opportunities,” IEEE Trans. Softw. Eng., vol. 35, no. 3, pp. 347–367,
invalid refactorings can be detected and highlighted on the fly. May/Jun. 2009.
[10] M. Kim, T. Zimmermann, and N. Nagappan, “An empirical study of
refactoring challenges and benefits at Microsoft,” IEEE Trans. Softw. Eng.,
vol. 40, no. 7, pp. 633–649, Jul. 2014.
VII. CONCLUSION [11] V. Alizadeh, M. Kessentini, W. Mkaouer, M. O. Cinnéide, A. Ouni, and
Y. Cai, “An interactive and dynamic search-based approach to software
Although manually applying a collection of refactorings is refactoring recommendations,” IEEE Trans. Softw. Eng., vol. 46, no. 9,
common practice, existing empirical studies and refactoring pp. 932–961, Sep. 2020.


[12] A. Ouni, M. Kessentini, H. Sahraoui, K. Inoue, and K. Deb, “Multi-criteria code refactoring using search-based software engineering: An industrial case study,” ACM Trans. Softw. Eng. Methodol., vol. 25, no. 3, 2016, Art. no. 23.
[13] J. Bansiya and C. G. Davis, “A hierarchical model for object-oriented design quality assessment,” IEEE Trans. Softw. Eng., vol. 28, no. 1, pp. 4–17, Jan. 2002.
[14] C. Abid, V. Alizadeh, M. Kessentini, T. D. N. F. Ferreira, and D. Dig, “30 years of software refactoring research: A systematic literature review,” 2020, arXiv:2007.02194.
[15] M. W. Mkaouer, M. Kessentini, S. Bechikh, K. Deb, and M. O. Cinnéide, “Recommendation system for software refactoring using innovization and interactive dynamic optimization,” in Proc. IEEE-ACM 29th Int. Conf. Autom. Softw. Eng., Vasteras, Sweden, 2014, pp. 331–336.
[16] N. Tsantalis, M. Mansouri, L. M. Eshkevari, D. Mazinanian, and D. Dig, “Accurate and efficient refactoring detection in commit history,” in Proc. ACM 40th Int. Conf. Softw. Eng., Gothenburg, Sweden, 2018, pp. 483–494.
[17] M. Harman and L. Tratt, “Pareto optimal search based refactoring at the design level,” in Proc. 9th ACM Annu. Conf. Genet. Evol. Comput., London, England, 2007, pp. 1106–1113.
[18] Y. Lin, X. Peng, Y. Cai, D. Dig, D. Zheng, and W. Zhao, “Interactive and guided architectural refactoring with search-based recommendation,” in Proc. ACM SIGSOFT Int. Symp. Found. Softw. Eng., Seattle, USA, 2016, pp. 535–546.
[19] T. Sharma and D. Spinellis, “A survey on software smells,” J. Syst. Softw., vol. 138, pp. 158–173, 2018.
[20] G. Bavota, A. D. Lucia, A. Marcus, and R. Oliveto, “Recommending refactoring operations in large software systems,” in Recommendation Systems in Software Engineering, Berlin, Germany: Springer, 2014, pp. 387–419.
[21] M. O’Keeffe and M. O. Cinnéide, “A stochastic approach to automated design improvement,” in Proc. ACM 2nd Int. Conf. Princ. Pract. Program. Java, Kilkenny City, Ireland, 2003, pp. 59–62.
[22] O. Seng, J. Stammel, and D. Burkhart, “Search-based determination of refactorings for improving the class structure of object-oriented systems,” in Proc. ACM 8th Annu. Conf. Genet. Evol. Comput., Seattle, USA, 2006, pp. 1909–1916.
[23] M. Kessentini, W. Kessentini, H. Sahraoui, M. Boukadoum, and A. Ouni, “Design defects detection and correction by example,” in Proc. IEEE 19th Int. Conf. Prog. Comprehension, Kingston, Canada, 2011, pp. 81–90.
[24] A. Ouni, M. Kessentini, and H. Sahraoui, “Search-based refactoring using recorded code changes,” in Proc. IEEE 17th Eur. Conf. Softw. Maintenance Reengineering, Genova, Italy, 2013, pp. 221–230.
[25] W. Mkaouer et al., “Many-objective software remodularization using NSGA-III,” ACM Trans. Softw. Eng. Methodol., vol. 24, no. 3, pp. 17:1–17:45, 2015.
[26] A. Chávez, I. Ferreira, E. Fernandes, D. Cedrim, and A. Garcia, “How does refactoring affect internal quality attributes? A multi-project study,” in Proc. ACM 31st Braz. Symp. Softw. Eng., Fortaleza, Brazil, 2017, pp. 74–83.
[27] T. Mens, G. Taentzer, and O. Runge, “Analysing refactoring dependencies using graph transformation,” Softw. Syst. Model., vol. 6, no. 3, pp. 269–285, 2007.
[28] T. Mens, G. Taentzer, and O. Runge, “Detecting structural refactoring conflicts using critical pair analysis,” Electron. Notes Theor. Comput. Sci., vol. 127, no. 3, pp. 113–128, 2005.
[29] H. Liu, G. Li, Z. Ma, and W. Shao, “Conflict-aware schedule of software refactorings,” IET Softw., vol. 2, no. 5, pp. 446–460, 2008.
[30] H. Liu, Z. Ma, W. Shao, and Z. Niu, “Schedule of bad smell detection and resolution: A new way to save effort,” IEEE Trans. Softw. Eng., vol. 38, no. 1, pp. 220–235, Jan./Feb. 2012.
[31] L. Sousa et al., “Characterizing and identifying composite refactorings: Concepts, heuristics and patterns,” in Proc. 17th Int. Conf. Mining Softw. Repositories, 2020, pp. 186–197.
[32] H. Melton and E. Tempero, “Identifying refactoring opportunities by identifying dependency cycles,” in Proc. ACM 29th Australas. Comput. Sci. Conf., Australia, 2006, pp. 35–41.
[33] N. Yoshida, Y. Higo, T. Kamiya, S. Kusumoto, and K. Inoue, “On refactoring support based on code clone dependency relation,” in Proc. IEEE 11th Int. Softw. Metrics Symp., Como, Italy, 2005, pp. 10–pp.
[34] DPRef, 2022. [Online]. Available: https://github.com/iselab-dearborn/dpref-refactoring-dependencies
[35] N. Munaiah, S. Kroh, C. Cabrey, and M. Nagappan, “Curating GitHub for engineered software projects,” Empir. Softw. Eng., vol. 22, no. 6, pp. 3219–3253, 2017.
[36] J. H. McDonald, Handbook of Biological Statistics, vol. 2. Baltimore, MD, USA: Sparky House Publishing, 2009.
[37] J. R. Koehler and A. B. Owen, “Computer experiments,” in Handbook of Statistics, Amsterdam, Netherlands: Elsevier Science, 1996, pp. 261–308.
[38] M. Alshayeb and Mohammad, “Empirical investigation of refactoring effect on software quality,” Inf. Softw. Technol., vol. 51, pp. 1319–1326, Sep. 2009.
[39] F. Palomba, A. Zaidman, R. Oliveto, and A. De Lucia, “An exploratory study on the relationship between changes and refactoring,” in Proc. IEEE/ACM 25th Int. Conf. Prog. Comprehension, 2017, pp. 176–185.
[40] C. Vassallo, G. Grano, F. Palomba, H. C. Gall, and A. Bacchelli, “A large-scale empirical exploration on refactoring activities in open source software projects,” Sci. Comput. Program., vol. 180, pp. 1–15, 2019.
[41] G. Szőke, G. Antal, C. Nagy, R. Ferenc, and T. Gyimóthy, “Empirical study on refactoring large-scale industrial systems and its effects on maintainability,” J. Syst. Softw., vol. 129, no. C., pp. 107–126, Jul. 2017, doi: 10.1016/j.jss.2016.08.071.

Thiago Ferreira received the PhD degree in computer science from the Federal University of Parana, in 2019. He is an assistant professor with the College of Innovation & Technology (CIT), University of Michigan-Flint. His research interests focus on the use of user preferences, optimization algorithms, and artificial intelligence techniques to address several software engineering problems such as software requirements, software testing, and software refactoring. For more information, see [email protected].

James Ivers is the lead of the Carnegie Mellon University Software Engineering Institute’s software architecture group, which develops and matures tools and practices to support software architects. He is also the co-author of the Documenting Software Architectures book. For more information, see [email protected].

Jeffrey J. Yackley is currently working toward the PhD degree with the University of Michigan - Dearborn. He is co-advised by Dr. Marouane Kessentini in the ISE Lab and Dr. Bruce R. Maxim in the GAME Lab. His research focuses on search based software engineering and machine learning in order to address problems with software architecture, refactoring, and testing in addition to his research on computer science education where he focuses on applying active learning techniques in the classroom. For more information, see [email protected].


Marouane Kessentini received the PhD degree from the University of Montreal, Canada, in 2012. He is a full professor, chair of the CSE Department, Oakland University, and director of the NSF IUCRC Center on Pervasive AI. He is a recipient of the prestigious 2018 President of Tunisia distinguished research award, the University of Michigan-Dearborn distinguished teaching award, the University of Michigan-Dearborn distinguished digital education award, the University of Michigan-Dearborn/College of Engineering and Computer Science distinguished research award, and 4 best paper awards including the prestigious IEEE 10 Year Most Influential Paper award (2011–2021). His AI-based software refactoring invention, licensed and deployed by Fortune 500 companies, was selected as one of the Top 8 inventions with the University of Michigan for 2018 (including the three campuses), among more than 500 inventions, by the UM Technology Transfer Office. He received various multi-million grants from both industry and federal agencies and published more than 180 papers in top journals and conferences. He has extensive collaborations with the industry in different areas related to Edge AI, AI/MLOps, AI and cyber-physical systems, intelligent software bots, etc. He is the co-founder of many workshops, general chair of SSBSE16 and ASE22, and PC chair of MODELS19, SANER 2021, GECCO, etc. He served as a keynote speaker with various venues including ICSR, SSBSE, GECCO, WCCI, etc. He graduated more than 18 PhD students and served as associate editor in 7 journals and PC member of more than 200 conferences.

Ipek Ozkaya (Senior Member, IEEE) received the PhD degree in computational design from CMU. She is a technical director with the Carnegie Mellon University Software Engineering Institute, where she develops methods and practices for software architectures, agile development, and managing technical debt in complex systems. She coauthored the book Managing Technical Debt: Reducing Friction in Software Development (2019). She is the 2019–2021 editor-in-chief of IEEE Software Magazine. For more information, see [email protected].

Khouloud Gaaloul received the PhD degree from The University of Luxembourg. She is a postdoctoral researcher with Oakland University under the supervision of Dr. Marouane Kessentini in the ISELab. She held the position of a post-doctoral researcher with The SnT Centre for Security, Reliability, and Trust, University of Luxembourg and the position of a post-doctoral researcher with the University of Michigan-Dearborn. Her research interests include model-based software development and analysis of Cyber-Physical Systems, search-based testing and machine learning. She has been conducting her research in close collaboration with industry partners in the aerospace sector. For more information, see [email protected].
