DeMIMA A Multilayered Approach For Design Pattern Identification

This paper presents an approach called DeMIMA to identify design patterns in source code. DeMIMA uses a multilayered approach with three layers: the first two layers recover an abstract model of source code including class relationships, and the third layer identifies design patterns in the abstract model. On average, DeMIMA achieves 34% precision on 12 design patterns in 5 open-source systems, and ensures 100% recall on all systems. DeMIMA was also applied to 33 industrial components.


IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 34, NO. 5, SEPTEMBER/OCTOBER 2008

DeMIMA: A Multilayered Approach for Design Pattern Identification
Yann-Gaël Guéhéneuc, Member, IEEE, and Giuliano Antoniol, Member, IEEE

Abstract—Design patterns are important in object-oriented programming because they offer design motifs, elegant solutions to
recurrent design problems, which improve the quality of software systems. Design motifs facilitate system maintenance by helping
maintainers to understand design and implementation. However, after implementation, design motifs are spread throughout the source
code and are thus not directly available to maintainers. We present DeMIMA, an approach to semiautomatically identify
microarchitectures that are similar to design motifs in source code and to ensure the traceability of these microarchitectures between
implementation and design. DeMIMA consists of three layers: two layers to recover an abstract model of the source code, including
binary class relationships, and a third layer to identify design patterns in the abstract model. We apply DeMIMA to five open-source
systems and, on average, we observe 34 percent precision for the 12 design motifs considered. Through the use of explanation-based
constraint programming, DeMIMA ensures 100 percent recall on the five systems. We also apply DeMIMA on 33 industrial
components.

Index Terms—Maintenance traceability, design patterns, interclass relationships.

• Y.G. Guéhéneuc is with the Département d’Informatique et Recherche Opérationnelle, Université de Montréal, C.P. 6128, succ. Centre Ville, Montréal, Québec, H3C 3J7, Canada. E-mail: [email protected].
• G. Antoniol is with the Département d’Informatique, Ecole Polytechnique de Montréal, C.P. 6079, succ. Centre Ville, Montréal, Québec, H3C 3A7, Canada. E-mail: [email protected].

Manuscript received 18 Apr. 2007; revised 1 Apr. 2008; accepted 29 May 2008; published online 27 June 2008.
Recommended for acceptance by R. Taylor.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TSE-2007-04-0133.
Digital Object Identifier no. 10.1109/TSE.2008.48.
0098-5589/08/$25.00 © 2008 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION
Maintainers must be aware of design choices in order to modify an object-oriented software system appropriately. Design choices include all decisions made by developers when designing and implementing the system: the structures of classes and the relationships among them. However, design choices are often scattered in the source code of systems after implementation because, with available object-oriented programming languages, they do not transcribe directly into source code; developers must write several lines of code using constructs of the languages to implement their choices. Moreover, documentation is often obsolete, if it even exists, and these choices are thus lost.

However, design choices are often implemented with recurring patterns, “a form or model proposed for imitation” [1], to facilitate writing and understanding the source code. Idioms and design patterns are two types of patterns; architectural patterns and micropatterns are others. Idioms are low-level patterns specific to some programming languages and to the implementation of particular characteristics of classes or their relationships. They are intraclass patterns describing typical implementation of, for example, relationships, object containment, and collection traversal. Design patterns [2] are recurring interclass patterns that define solutions to common design problems in the organization of classes. They are “tactics” that generate the structure and behavior of classes and their relationships [3]. They influence the design of modules and classes but not the overall architecture. They are defined in terms of classes and relationships; thus their implementation uses idioms.

We use the term motif to express the solution of a pattern as “a reliable sample of traits, acts, tendencies, or other observable characteristics” [1]. We distinguish between patterns and motifs because patterns often encompass information that is not readily available for their identification. For example, the Composite design pattern [2, p. 163] also includes information about its intent, motivation, applicability, and consequences, which are not observable characteristics. Only its structure, its participants, and their collaborations are observable in the source code. Thus, strictly speaking, we cannot use the terms design pattern “identification,” “detection,” or “instantiation” but rather the instantiation and identification of microarchitectures similar to some motifs; thus, we use the term “design motif identification” for the process traditionally called design pattern identification.

We define the term microarchitectures as concrete manifestations of some motifs in the implementation of a system. A microarchitecture is composed of classes, methods, fields, and relationships having structure and organization similar to one or more motifs. A microarchitecture can be similar to more than one motif because only developers may decide intent, motivation, and consequences.

Developers usually search for some kinds of patterns in order to understand a system [4]; by recognizing concrete manifestations of these patterns, they deduce, from their experience, the design choices underlying the presence of motifs in the source code. During maintenance and evolution, maintainers would greatly benefit from knowing the design choices made during implementation, see, for example, [5].

To support design pattern identification and program comprehension, we combine and extend our previous work [6], [7], [8] in a new multilayered approach named the Design Motif Identification Multilayered Approach (DeMIMA). DeMIMA makes it possible to recover two kinds of design
choices from source code: idioms pertaining to the relationships among classes and design motifs characterizing the organization of the classes. DeMIMA is extensible and scalable; it ensures traceability between motifs and source code by first identifying idioms related to binary class relationships to obtain an idiomatic model of the source code and then using this model to identify design motifs and generate a design model of the system. On average, we observe 34 percent precision for the 12 design motifs considered and the five open-source systems on which we apply our approach. DeMIMA ensures 100 percent recall on the five systems. We also apply DeMIMA on industrial system source code and designs.

The remainder of the paper is organized as follows: In Section 2, we give an overview of the approach and justify its rationale. In Section 3, we summarize related work and present essential characteristics of the identification steps. In Sections 4.2 and 4.3, we describe our approach and discuss its characteristics. In Section 6, we apply the approach on a testbed of open source and industrial systems. In Section 7, we summarize our work and discuss future challenges.

2 DESIGN MOTIF IDENTIFICATION
2.1 Context
We have broken down the comprehension process that maintainers use to identify recurring motifs in the source code into three tasks.

1. Identifying a microarchitecture $A$ similar to some motifs from a set of known patterns $S_{DP}$. Maintainers analyze a system source code $S$, either manually or using tools, and identify subsets of the source code that are similar to known motifs.
2. Contextualizing $A$ to keep a unique motif from $S_{DP}$ using semantic data extrinsic to $S$. Maintainers choose in $S_{DP}$ the pattern $DP$ whose corresponding motif $DM$ is embodied by $A$. Contextualization depends on the system domain and on the maintainers’ experience and understanding of the system.
3. Comprehending $S$. Maintainers deduce from $DP$, whose motif $DM$ was manifested by $A$ during the implementation of $S$, the design choice behind $A$, including the intent and motivation of the developers and the consequences on the overall system design.

Because subtasks 2 and 3 depend on the maintainers’ experience and the system domain, they are difficult to automate. In contrast, task 1, which is tedious and error prone [9], [10], is a good candidate for automation.

2.2 Problem
Design motifs are described with UML-like class and sequence diagrams,¹ which represent different aspects of software systems [11]. Class diagrams are global models of systems, representing their entities and the relationships among entities, while sequence diagrams specify local interactions in entities and sequences of method calls among entities.

1. Design motifs notation borrows from OMT class diagrams, OBJECTORY interaction diagrams, and the BOOCH method [2].

In the rest of this paper, we only consider class diagrams because they are most frequently used to describe design motifs [12]. Also, class diagrams are often produced early in the development cycle and are the sole reliable documentation because they can be reverse engineered with reasonable accuracy. We will use other information in future work.

DeMIMA assists maintainers in task 1 by providing a three-step identification process of a design motif $DM$ in the source code $S$ of a system based on UML-like class diagram models:

1. Model the source code $S$ as a model $M_S$ using a subset of the language used to describe models of motifs and including all of the constituents corresponding to constructs of $S$, as explained in Section 4.1.
2. Enrich model $M_S$ with idioms that reveal binary class relationships to obtain a model $M_I$, which uses the same language used to describe models of motifs, as detailed in Section 4.2.
3. Enrich the model $M_I$ through the following three substeps, as shown in Section 4.3:
   a. Build a model $M_{DM}$ of a motif $DM$ as a class diagram with the formalism used to describe $M_I$.
   b. Identify microarchitectures similar to $M_{DM}$ in $M_I$. A microarchitecture $A$ might be either a complete form if its entities and their relationships match one to one the entities and relationships in $M_{DM}$ or an approximate form if they do not, e.g., if a suggested relationship between two entities does not exist.
   c. Instantiate a model $M_D$ based on $M_I$ and enriched with models $M_A$ of the identified microarchitectures.

Any approach to design motif identification should maintain a traceability link between the different layers from source code up to the identified microarchitectures:

$$S \xrightarrow{1} M_S \xrightarrow{2} M_I \xrightarrow{3} M_D \supset \{M_A\}, \qquad (1)$$

where $\xrightarrow{x}$ describes the $x$th layer to produce the next model.

Example. In the rest of this paper, we use the simple example taken from [6] and shown in Fig. 1 to illustrate the different steps performed by DeMIMA. The example uses two classes, C1 and C2, linked by an aggregation relationship. The aggregation relationship exists through the field C2 c2 and the void operation1() method body.

We want to identify in this source code any microarchitecture similar to the design motif represented by the UML-like class diagram in Fig. 2a. Thus, we need to first recover a model $M_S$ of the system, then refine this model into $M_I$, which includes the aggregation relationship, and, finally, model and match the motif $M_{DM}$ against $M_I$ to create a model $M_D$, which includes the result of the matching, $M_A$, as shown in Fig. 2.

2.3 Our Solution
In DeMIMA, we characterize the constituents of class diagrams and propose algorithms to identify these constituents in source code. Basically, class diagrams consist of classes, fields, methods, interfaces, inheritance, and implementation relationships. We concur with Dave Thomas that “Every model needs a metamodel” [13]. Thus, we define a metamodel, Pattern and Abstract-level Description
Language (PADL), to express these constituents. New constituents may be added to PADL using inheritance to enrich the descriptions of systems. The methods of the constituents define the semantics of models obtained from the metamodel. Our objective in defining PADL is to have a simple and extensible language to describe and reason about abstractions pertinent to our problem, namely, $M_S$, $M_I$, $M_D$, $M_{DM}$, and $M_A$.
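
A minimal Java sketch may help fix ideas about such a metamodel. The names Entity, Element, Method, Field, and ProgramModel follow the constituents listed in Section 4.1; everything else (ClassEntity, InterfaceEntity, the constructors, `add`, `describe`, `runningExample`) is hypothetical and should not be read as PADL's actual API. The sketch builds a model of the running example's two classes, C1 and C2.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and methods are assumptions, not PADL's actual API.
public class MetamodelSketch {
    /** An element of an entity: a method or a field. */
    public static abstract class Element {
        public final String name;
        Element(String name) { this.name = name; }
        public abstract String describe();
    }

    public static class Field extends Element {
        public final String type;
        public Field(String type, String name) { super(name); this.type = type; }
        @Override public String describe() { return "field " + type + " " + name; }
    }

    public static class Method extends Element {
        public Method(String name) { super(name); }
        @Override public String describe() { return "method " + name + "()"; }
    }

    /** An entity of a system: a class or an interface. */
    public static abstract class Entity {
        public final String name;
        public final List<Element> elements = new ArrayList<>();
        Entity(String name) { this.name = name; }
        public Entity add(Element e) { elements.add(e); return this; }
    }

    // New constituents are added by inheritance, as the text describes.
    public static class ClassEntity extends Entity {
        public ClassEntity(String name) { super(name); }
    }

    public static class InterfaceEntity extends Entity {
        public InterfaceEntity(String name) { super(name); }
    }

    /** A model of a system: a set of entities, each containing a set of elements. */
    public static class ProgramModel {
        public final List<Entity> entities = new ArrayList<>();
        public ProgramModel add(Entity e) { entities.add(e); return this; }
    }

    /** Builds a model of the running example: C1 holds a field "C2 c2" and a method operation1(). */
    public static ProgramModel runningExample() {
        Entity c1 = new ClassEntity("C1")
            .add(new Field("C2", "c2"))
            .add(new Method("operation1"));
        Entity c2 = new ClassEntity("C2");
        return new ProgramModel().add(c1).add(c2);
    }

    public static void main(String[] args) {
        for (Entity e : runningExample().entities) {
            System.out.println(e.name);
            for (Element el : e.elements) {
                System.out.println("  " + el.describe());
            }
        }
    }
}
```

At this level the model holds only what appears directly in the source code; the aggregation between C1 and C2 is not yet represented.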
DeMIMA reuses the definitions in [6] of the use, creation, association, aggregation, and composition relationships to formalize these relationships with four language-independent properties: exclusivity, message receiver type, lifetime, and multiplicity. DeMIMA distinguishes use, association, aggregation, and composition relationships because such relationships exist in most notations to model systems, for example, in UML, and because design motifs are defined using these relationships. Thus, it is able to identify these relationships in $M_S$ and represent these in $M_I$.
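
To make the properties more concrete, the following Java sketch records three of them (exclusivity, message receiver type, and lifetime; multiplicity is left out) for a relationship from a class A to a class B and derives a relationship kind. The classification rule is a deliberate simplification based on the informal definitions in Section 4.2.1 and is an assumption of this sketch; it is not the formal definition used by DeMIMA, and all type and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch: the rule in classify() simplifies the informal definitions;
// it is NOT the paper's formalization of the relationships.
public class RelationshipSketch {
    /** The nine message receiver types of the set "any". */
    public enum Receiver {
        FIELD, ARRAY_FIELD, COLLECTION_FIELD,
        PARAMETER, ARRAY_PARAMETER, COLLECTION_PARAMETER,
        LOCAL_VARIABLE, LOCAL_ARRAY, LOCAL_COLLECTION;

        boolean isField() {
            return this == FIELD || this == ARRAY_FIELD || this == COLLECTION_FIELD;
        }
    }

    public enum Kind { ASSOCIATION, AGGREGATION, COMPOSITION }

    /** Hypothetical record of the properties observed for a relationship from A to B. */
    public static final class Properties {
        final boolean exclusive;         // exclusivity: instances of B are exclusive to their A
        final Set<Receiver> receivers;   // receiver types through which A sends messages to B
        final boolean relatedLifetimes;  // lifetime: instances of A and B have related lifetimes

        public Properties(boolean exclusive, Set<Receiver> receivers, boolean relatedLifetimes) {
            this.exclusive = exclusive;
            this.receivers = receivers;
            this.relatedLifetimes = relatedLifetimes;
        }
    }

    /** Simplified rule (assumption): field access suggests aggregation; adding exclusivity and related lifetimes suggests composition. */
    public static Kind classify(Properties p) {
        boolean viaField = p.receivers.stream().anyMatch(Receiver::isField);
        if (viaField && p.exclusive && p.relatedLifetimes) {
            return Kind.COMPOSITION;
        }
        if (viaField) {
            return Kind.AGGREGATION;
        }
        return Kind.ASSOCIATION;
    }

    public static void main(String[] args) {
        // Running example: C1 reaches C2 through the field "C2 c2".
        Properties c1ToC2 = new Properties(false, EnumSet.of(Receiver.FIELD), false);
        System.out.println(classify(c1ToC2));
    }
}
```

For the running example, where C1 reaches C2 through the field c2, this simplified rule yields AGGREGATION, in line with the relationship the example is said to contain.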
The language to describe design motifs is the same as that used to describe models of systems. To produce a model $M_D$, DeMIMA uses explanation-based constraint programming and constraint relaxation to identify microarchitectures, complete or approximate, similar to the modeled design motifs, without explicitly enumerating all possible variants.
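
The effect of constraint relaxation can be illustrated with a small Java sketch. It is a brute-force stand-in and not explanation-based constraint programming as used by DeMIMA: it enumerates role assignments, reports complete matches and, if there are none, retries with one constraint dropped at a time, naming the dropped constraint as a rudimentary explanation. The system (Figure, Group, Circle), the Composite-like motif, the string encoding of facts, and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Brute-force illustration of matching with relaxation; NOT DeMIMA's solver.
public class RelaxationSketch {
    /** A constraint between two roles of a motif, e.g. inheritance(leaf, component). */
    public static final class Constraint {
        final String kind, from, to;
        public Constraint(String kind, String from, String to) {
            this.kind = kind; this.from = from; this.to = to;
        }
        @Override public String toString() { return kind + "(" + from + ", " + to + ")"; }
    }

    /** Collects every assignment of distinct classes to roles that satisfies all constraints. */
    static void assign(List<String> roles, int i, List<String> classes, Map<String, String> current,
                       Set<String> facts, List<Constraint> constraints, List<Map<String, String>> out) {
        if (i == roles.size()) {
            for (Constraint c : constraints) {
                String fact = c.kind + ":" + current.get(c.from) + "->" + current.get(c.to);
                if (!facts.contains(fact)) return; // constraint violated by this assignment
            }
            out.add(new LinkedHashMap<>(current));
            return;
        }
        for (String cls : classes) {
            if (current.containsValue(cls)) continue; // a class plays at most one role
            current.put(roles.get(i), cls);
            assign(roles, i + 1, classes, current, facts, constraints, out);
            current.remove(roles.get(i));
        }
    }

    /** Reports complete matches; if there are none, retries with one constraint dropped at a time. */
    public static List<String> identify(List<String> roles, List<String> classes,
                                        Set<String> facts, List<Constraint> constraints) {
        List<String> report = new ArrayList<>();
        List<Map<String, String>> found = new ArrayList<>();
        assign(roles, 0, classes, new LinkedHashMap<String, String>(), facts, constraints, found);
        for (Map<String, String> m : found) report.add("complete " + m);
        if (!report.isEmpty()) return report;
        for (Constraint dropped : constraints) {
            List<Constraint> relaxed = new ArrayList<>(constraints);
            relaxed.remove(dropped);
            found.clear();
            assign(roles, 0, classes, new LinkedHashMap<String, String>(), facts, relaxed, found);
            for (Map<String, String> m : found) {
                report.add("approximate " + m + " without " + dropped);
            }
        }
        return report;
    }

    /** Hypothetical system: Group and Circle inherit from Figure; Group only associates with Figure. */
    public static List<String> demo() {
        List<String> roles = Arrays.asList("component", "composite", "leaf");
        List<String> classes = Arrays.asList("Figure", "Group", "Circle");
        Set<String> facts = new HashSet<>(Arrays.asList(
            "inheritance:Group->Figure", "inheritance:Circle->Figure", "association:Group->Figure"));
        List<Constraint> motif = Arrays.asList(
            new Constraint("inheritance", "composite", "component"),
            new Constraint("inheritance", "leaf", "component"),
            new Constraint("aggregation", "composite", "component"));
        return identify(roles, classes, facts, motif);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this hypothetical system no complete match exists because Group is linked to Figure by an association rather than an aggregation. Dropping the aggregation constraint lets either subclass play the composite role, which suggests why approximate forms need an accompanying explanation if maintainers are to filter false positives.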
Fig. 1. Source code of the running example.

Fig. 2. Models of the motif and of the source code for the running example. (a) The bottom part shows the UML-like diagram of a simple motif; this part, together with the upper part, represents $M_{DM}$. (b) UML-like diagram of $M_S$. (c) UML-like diagram of $M_I$ (some instantiation links are omitted). (d) UML-like diagram of $M_D$ (some instantiation links are omitted).

DeMIMA ensures the traceability of design motifs between implementation and design because it uses the same language to describe $M_S$, $M_I$, and $M_D$ (by construction, each layer is a refinement of the previous layer) and because it explicitly records, in the more abstract constituents, the set of lower-level constituents that led to their existence. Thus, a constituent of $M_S$ (respectively, of $M_I$
and $M_D$ ($\supset M_A$)) can be traced back to the source code constructs in $S$ (respectively, to the constituents in $M_S$ and $M_I$) from which it originates.

3 RELATED WORK
We classify the related work according to the recovered models because obtaining and abstracting the data needed to identify design motifs is problematic. We conclude with a summary of essential characteristics of any identification approach for design motifs.

3.1 Related Work on $M_S$
Building a model of the source code is the first step of any static analysis. The objective of this step is to obtain a model of the source code that can be manipulated programmatically. This step can be performed using a readily available parser technology such as JAVACC or COLUMBUS [14].

3.2 Related Work on $M_I$
Several authors proposed approaches to extract binary class relationships, which is an important concern when building models of source code. Indeed, these relationships are not explicit constructs of mainstream object-oriented programming languages, such as C++, Java, or Smalltalk, and they lack precise definitions.

Jahnke et al. [15] and Niere et al. [16] introduced generic fuzzy reasoning nets (GFRN) to recover association relationships among entities in the context of the Fujaba project. They proposed a set of clichés from source code. Source code clichés used together with GFRN allow identifying association relationships while managing variations of implementation. Although their work is promising, the use of GFRN is complex and they consider association relationships only, not aggregation and composition relationships. More recently, Niere et al. [17] introduced an approach based on fuzzy beliefs able to recover association and aggregation relationships in large software systems while handling impreciseness.

Jackson and Waingold [18] developed WOMBLE, a tool for the lightweight extraction of object models from Java bytecodes. They described an object model as a graph wherein nodes are entities and links are binary class relationships. Relationships considered in WOMBLE are inheritance, association, and aggregation. WOMBLE includes heuristics to infer the target entities of association and aggregation relationships. This work is a source of inspiration even though it did not consider composition relationships.

In general, previous work was limited by the lack of commonly agreed-upon definitions for binary class relationships. Moreover, to the best of our knowledge, no definitions of the association, aggregation, and composition relationships existed that describe how these relationships must be implemented in source code. For example, [19], [20], [21], [22] proposed definitions of these relationships, but there were no hints on their concrete implementation. Thus, the first step toward design motif identification is to define the association, aggregation, and composition relationships and to obtain models of systems that integrate these relationships. A complete survey of the subject is available in [6].

3.3 Related Work on $M_D$ (Including $M_{DM}$ and $M_A$)
Several authors proposed approaches to identify microarchitectures similar to design motifs. In general, these approaches rely on a design motif library; thus they are similar to the program understanding and architectural recovery approaches based on cliché matching and plan recognition. The main problem of these approaches, as identified by Wills’ precursor work [23] and put forward recently by Niere et al. [24], is that a design motif may appear in several different forms due to variants. Wills classifies the main sources of variants as syntactic variation, implementation variation, delocalization, organization variation, redundancy, unrecognizable code, and function sharing. Syntactic variation is mostly with regard to the syntactic level clichés. Cliché recognizers traditionally embody the knowledge of all of the different forms that a certain cliché can assume. This is not the case in our approach, where the use of explanation-based constraint programming accounts for syntactic variants. Implementation variation is related to the fact that a given concept may be implemented in different ways: An aggregation may be implemented with a list or a set or any other user-defined type. We define such relationships using language-independent properties to avoid this problem. Another example concerns the depth of the inheritance tree between a superclass and a derived class participating in a motif (see, for example, the Composite design motif). Again, the use of explanation-based constraint programming deals with such variants. The other problems highlighted by Wills (delocalization, redundancy, unrecognizable code, and function sharing) do not concern our approach.

Rich and Waters [4] proposed the use of constraint programming to recognize plans in Cobol source code. Cobol systems are modeled by their abstract syntax trees. A plan is modeled as nodes of the abstract syntax tree and constraints among nodes (control and data-flow, function calls, ...). The identification of a plan in source code is converted to a constraint satisfaction problem in which nodes of the plan are variables, constraints among nodes are constraints among variables, and the source code abstract syntax tree is the domain of the variables. This work is the first account of the use of constraint programming for plan identification. However, it does not apply to design motif identification because plans are low level and it does not identify approximate forms of the plans. Nevertheless, we draw from this work two important characteristics of design motif identification: the need for explanations and for approximations [4, pp. 83 and 181].

Other approaches to design motif identification used cliché recognition algorithms such as unification, see the precursor work by Krämer and Prechelt [25]. An example is the SOUL environment [5], a logic programming environment based on Smalltalk that directly manipulates Smalltalk constructs through predicates. The SOUL environment allows direct representation of the abstract syntax tree of the Smalltalk source code managed by the underlying environment as logic facts. Using these facts, it is possible to build a library of predicates and to identify entities whose structures and organizations correspond to design motifs. However, the use of logic programming requires the definitions of predicates for all possible variants, i.e., all expected variations of implementation. The definition of all variants of implementation is cumbersome. Also, the use of logic programming does not explain the presence or absence of microarchitectures similar to design motifs.
Other authors introduced the use of queries to identify entities whose structure and organization are similar to design motifs [26], [27]. In particular, Keller et al. [27] introduced the SPOOL environment for reverse engineering, which allows manual, semiautomated, or automated identification of abstract design components using queries on source code models. A query is manually associated with an abstract design component and applied to a source code model. The main limitation of this work is the need to develop and associate queries with abstract design components manually and with each possible variant of their implementation.

Generic fuzzy reasoning nets have also been applied to the identification of design motifs [24], [28]. A design motif is described as a generic fuzzy reasoning net representing rules to identify microarchitectures similar to its implementation in source code. However, this approach has not been pursued or implemented despite its promises. Moreover, it is difficult to express design motifs as generic fuzzy reasoning nets and to modify them.

Graphs and graph-transformation techniques also have been used to describe and identify design motifs in system source code [29], [30]. A design motif is described as a graph whose nodes represent entities and whose edges represent relationships among entities. The identification of microarchitectures corresponds to a graph isomorphism: the identification of a subgraph similar to a given graph in a graph, which is a difficult problem [31]. Pettersson and Löwe [32] proposed transforming graphs of systems into planar graphs to improve performance with interesting results. An approach based on similarity scoring has also been proposed [33], which provides an efficient means to compute the similarity between the graph of a design motif and the graph of a system to identify classes potentially playing a role in the design motif. Although efficient, these approaches are not interactive, do not explain their results, and only allow a limited set of approximations.

Finally, several authors proposed dedicated syntactic analyses to identify design motifs in source code, for example, [34], [35], [36], [37]. These analyses are efficient in time, recall, and precision but are specialized to particular design motifs. We propose a more general solution that uses standard algorithms, as offered by constraint programming. Some authors, such as Heuzeroth et al. [38], combined static and dynamic analyses to improve the precision of the identification but faced the problem of the choice of the methods to instrument and of the scenarios to execute.

3.4 Summary of the Characteristics of DeMIMA
From our study of the related work, DeMIMA must possess the following characteristics:

• Models of source code must differentiate among use, association, aggregation, and composition relationships so that design motif models are as close as possible to their usual descriptions in [2].
• A given model of a design motif must serve to identify both complete and approximate forms of microarchitectures similar to the design motif without explicitly enumerating all variants.
• The algorithms must be semiautomatic or automatic and must explain the identified microarchitectures so that maintainers can direct their search to easily distinguish possible false positives.

Contributions of DeMIMA are the following: For the first time, as suggested in previous work, an approach brings a solution to the identification of microarchitectures similar to design motifs using commonly agreed-upon definitions of the unidirectional binary class relationships, unique representations of design motifs, and semiautomated and/or automated algorithms explaining identified microarchitectures. Thus, it complies with the characteristics of the identification of microarchitectures similar to design motifs. In particular, explanation-based constraint programming explains identified microarchitectures for maintainers to direct their search and discriminate among possible false positives easily. Explanation and constraint relaxation lead to interactive or automatic algorithms while naturally tackling the problem of variants identified by Wills [23].

4 MULTILAYERED APPROACH
DeMIMA relies on a multilayered approach, detailed in the following sections.

4.1 First Layer: Source Code Model $M_S$
The first layer consists of an infrastructure, e.g., parsers, to obtain models $M_S$ of the source code of systems. $M_S$ is expressed using the language defined by the metamodel shown in Fig. 3 (Part 1 exclusively) and inspired by UML. It includes all of the constituents found directly in any Java object-oriented system: class, interface, member class and interface, method, field, inheritance and implementation relationships, and rules controlling their interactions. The constituents describe the structure of systems and a subset of their behavior. The main constituents in the metamodel and their relationships are the following:

• Class Entity, to describe entities of a system. An entity might be a Class or an Interface.
• Class Element, to describe elements of entities. An element might be a Method or a Field.

A model of a system is an instance of class ProgramModel. It contains a set of entities, each of which contains a set of elements.

We have implemented the first layer to cope with any number of parsers for various programming languages (e.g., C++ and Java) and produce an instance of ProgramModel representative of the parsed source code:

$$S \xrightarrow{1} M_S. \qquad (2)$$

Example. Fig. 2b shows a UML-like diagram of the model $M_S$ of the source code illustrated in Fig. 1, as well as the instantiation links between the objects in $M_S$ and their classes reported in Part 1 of Fig. 3.

4.2 Second Layer: Idiom-Level Model $M_I$
The second layer describes systems at a higher level of abstraction than their source code by making explicit certain programming idioms. Idioms reveal particular characteristics of classes or their relationships. For example, a class could be stereotyped as a UML Data Type according to certain idioms used in its implementation [39]. Thus, in general, idioms can implement other characteristics of classes than binary class relationships. Nevertheless, in the rest of this paper, we only study binary class relationships as they are relevant to design motif identification; the
terms idioms and binary class relationships are therefore interchangeable.

Fig. 3. Metamodel to describe the source code of systems.

This layer provides models $M_I$ of systems in which binary class relationships are reified as first-class entities. We focus on the use, association, aggregation, and composition unidirectional binary class relationships as commonly advocated in UML-like notations because these relationships are used to describe design patterns [2].

Parts 1 and 2 (exclusively) in Fig. 3 present the language to describe idiom-level models.

4.2.1 Informal Definitions
An extensive survey of the literature related to the relationships in different domains such as database, software engineering, or reverse engineering can be found in [6]. Table 1 summarizes the definitions of the relationships used in DeMIMA from the existing links among instances. Association, aggregation, and composition are relationships among instances of classes. Relationships involving classes (not instances) are modeled as use relationships.

Let A and B be two classes. Association and aggregation relationships allow multiple instances of A and B to take part in the relationship. The composition relationship allows multiple instances of B to be in a relationship with one instance of A at a time. In an aggregation relationship, instances of A access instances of B through a field as a particular type of message receiver. In a composition relationship, instances of B are exclusive to their corresponding instances of A and instances of A and B have related lifetimes.

4.2.2 Definitions of the Properties
The definitions of the binary class relationships use four language-independent properties. We present here only information needed to explain the subsequent formal definitions; more details and examples of each property are available elsewhere [6].

An instance of class B involved at a given time in a relationship with an instance of class A may also participate in another relationship at the same time. We name $\mathbb{B}$ the set $\{true, false\}$. We define the exclusivity property $EX$ as

$$EX : \text{Class} \times \text{Class} \to \mathbb{B}.$$

Instances of class A involved in a relationship send messages to instances of class B. We name $any$ the set of all possible message receivers:

$$any = \{\text{field}, \text{array field}, \text{collection field}, \text{parameter}, \text{array parameter}, \text{collection parameter}, \text{local variable}, \text{local array}, \text{local collection}\}.$$

We distinguish three types of message receivers: fields, parameters, and local variables. Also, we distinguish “simple” message receivers from arrays and collections because they imply different sets of programming idioms for their declarations and uses and thus different identification strategies. The set $any$ of receivers is language independent and its elements correspond to concepts available in object-oriented programming languages, such as C++, Java, and Smalltalk. We define the receiver type property $RT$² as

$$RT : \text{Class} \times \text{Class} \to any.$$

2. The RT property was formerly named “invocation site” IS in [6] but is renamed to avoid confusion with the location of a method invocation.

The lifetime property $LT$ constrains the lifetime of all instances of class B with respect to the lifetime of all instances of class A. It relates to the difference between the
times of destruction LTd of two instances of classes A and B [21]. The time is in any convenient unit such as seconds or CPU ticks:³

$$LT_d : Instance \to \mathbb{N}.$$

3. $\mathbb{N}$ represents the set of all natural numbers.

In programming languages with garbage collection, LTd matches the moment where an instance is ready to be collected for garbage. We infer from LTd a relation between the lifetimes of all instances of two classes A and B. We name k the set $\{-, +\}$:

$$LT : Class \times Class \to k.$$

TABLE 1
Definitions and Applicability of the Unidirectional Relationships in Our Model

The multiplicity property MU specifies the number of instances of class B allowed in a relationship. We express this property as⁴

$$MU : Class \times Class \to \mathbb{N} \cup \{+\infty\}.$$

4. We need $+\infty$ to denote multiplicities with no limit in the numbers of instances in the relationships.

The four properties are orthogonal, but the exclusivity and multiplicity properties are closely related. For example, in the Country-Language relationship, we have the following:

• The multiplicity property states the number of instances of class Language that each instance of class Country possesses:
  $$MU(Country, Language) = [1, +\infty].$$
  (For example, Canada possesses two official languages, English and French, and several spoken languages, Inuktitut, Punjabi, Portuguese, and so on.)
• The exclusivity property states that an instance of class Language is shared among instances of class Country and of other classes:
  $$EX(Country, Language) = false.$$
  (French is spoken in Canada, in France, ...)

Example. The values of the four properties are reported and commented on in Table 2 for the source code of the running example in Fig. 1.

4.2.3 Formalizations of the Relationships
Using EX, LT, MU, and RT, formalizations of the relationships are expressed as three conjunctions, respectively, AS, association, AG, aggregation, and CO, composition. The formalizations of the relationships are important because they are the basis of the identification algorithms needed to abstract MS into MI.

An association between classes A and B characterizes the ability of an instance of A to send a message to an instance of B. Nothing prevents other relationships from linking classes A and B. We define AS(A, B) as

$$AS(A, B) = (RT(A, B) = \mathit{any}) \land (RT(B, A) = \emptyset).$$

An aggregation exists between classes A and B when the definition of A, the whole, contains instances of B, its part. The whole must define a field (“simple,” array, or collection) of the type of its part. Instances of the whole send messages to instances of its part. We formalize AG(A, B) as

$$\begin{aligned}
AG(A, B) = {} & (RT(A, B) \subseteq \{\text{field}, \text{array field}, \text{collection field}\}) \land {} \\
& (RT(B, A) = \emptyset) \land {} \\
& (MU(A, B) = [1, +\infty]) \land (MU(B, A) = [0, +\infty]).
\end{aligned}$$

A composition is an aggregation with a constraint between the lifetimes of the whole and its part and a constraint on the ownership of the part by the whole. Instances of the whole own the instances of its part. Instances of the part might be instantiated before the whole is instantiated, but they must not belong to any other whole. They are exclusive to the instance of the whole. The definition of the composition relationship allows only an association between part and whole to ensure the lifetime and ownership properties between whole and part. We define CO(A, B) as
$$\begin{aligned}
CO(A, B) = {} & (EX(A, B) = true) \land (EX(B, A) = false) \land {} \\
& (RT(A, B) \subseteq \{\text{field}, \text{array field}, \text{collection field}\}) \land {} \\
& (RT(B, A) = \emptyset) \land {} \\
& (LT(A, B) = +) \land (LT(B, A) = -) \land {} \\
& (MU(A, B) = [1, +\infty]) \land (MU(B, A) = [1, 1]).
\end{aligned}$$

TABLE 2
Values of the Four Properties Instantiated for the Running Example

Example. According to the values of the properties detailed in Table 2 for the running example and to the formalizations of the relationships, AS(C1, C2) = false, AG(C1, C2) = true, and CO(C1, C2) = false. No relationships are identified between C2 and C1.

4.2.4 Discussions
The formalizations of the relationships consist of two fundamental parts: a static part corresponding to the MU and RT properties and a dynamic part corresponding to the EX and LT properties. Association and aggregation are inherently static, so their static parts are important for their detection. A composition is an aggregation with additional constraints on the behavior of composed instances; thus, its dynamic parts are important for its distinction from an aggregation and its detection.

Minimality of the properties and common usage of the relationships supported by our formalizations are explained in [6].

4.2.5 Creation of the Model
With the formalizations of the relationships, we define algorithms to identify in models MS association, aggregation, and composition relationships to produce models MI. These algorithms depend only on the properties which isolate the formalizations of the relationships from any coding conventions, similar to the concept of subpatterns in [24].

Identification of association relationships requires collecting the value of the RT property. Identification of aggregation relationships requires inferring the values of the RT and MU properties. Identification of composition relationships requires collecting the value of the RT and MU properties and of the EX and LT properties. DeMIMA computes the RT and MU properties using static analyses and can infer values of the EX and LT properties using dynamic analyses.

Any algorithm recovering aggregation relationships needs to deal with the difficulty that arises when message receivers are untyped collections [18] (collection field, collection parameter, local collection) because they are typed with the class hierarchy root Object. Algorithms have been proposed to deal with this difficulty, for example, [18]. Drawing inspiration from these algorithms, DeMIMA implements the detection of aggregation relationships with static analyses and heuristics expressing common programming idioms, i.e., a collection is generally accessed through specific accessors to infer the type of stored instances. It assumes that these kinds of collections are homogeneous, i.e., containing instances with a common superclass different from Object. It is possible to determine their types by using well-known Java programming idioms such as pairs of add()-remove() accessors. DeMIMA also recognizes user-defined collections in addition to collections from the standard Java class libraries such as Map, List, and Set and their implementations. Recently, systems to convert programs to use generics have been proposed that could potentially solve the difficulty with untyped collections [40].

Detection of the values of the MU property also uses message receivers. For example, we assign value [0, 1] to the MU property if the message receiver is field, parameter, or local variable and value [0, +∞] if the receiver is array field, array parameter, local array, collection field, collection parameter, or local collection.

The dynamic part (the EX and LT properties) of the composition relationship is difficult to detect due to the well-known limitations of dynamic analyses. We use a trace-analysis technique presented in [7] to compute, for each aggregation relationship, values of the exclusivity and lifetime properties and, if the values match, to convert it into a composition relationship. The results depend on the scenario executed; we assume the existence of unit tests and execute all available tests to infer values for the EX and LT properties. A low coverage by the unit tests would lead to a number of false negatives, i.e., candidate composition relationships missed by our algorithms. Missed composition relationships impact DeMIMA by decreasing the number of complete occurrences of any design motifs including such relationships in their representations. However, thanks to the use of explanation-based constraint programming, DeMIMA would identify and report approximate occurrences corresponding to these motifs in which
composition relationships would be replaced by aggregation relationships and, thus, its recall would not be impacted.

4.2.6 MI Construction in Summary
We formalized the definitions of the use, association, aggregation, and composition relationships and developed algorithms based on dynamic and static analyses to build a model MI of a system from its source code model MS, thus creating the traceability link:

$$M_S \overset{2}{\rightleftarrows} M_I.$$

Example. Fig. 2c shows how the UML-like model MS is enriched into a model MI by adding an aggregation relationship between C1 and C2, instance of the Aggregation relationship class.

4.3 Third Layer: Design-Level Model, MD (MA)
In the third layer, we first describe a model MDM of a design motif with the same language used for MI. Then, DeMIMA looks for microarchitectures MA similar to the design motif DM in a model MI of a system. To identify microarchitectures similar to MDM, it transforms MDM into a constraint system. It then solves the constraint satisfaction problem using explanation-based constraint programming [8]. The solutions of the constraint satisfaction problem represent microarchitectures similar to MDM in MI.

4.3.1 Modeling of Design Motifs
Parts 1, 2, and 3 in Fig. 3 show the language used to describe design motifs as first-class entities that can be manipulated programmatically. A design motif is represented by an instance of the class DesignMotifModel and is composed of Participants, each having different Elements.

Example. Fig. 2a shows the UML-like diagram of the model MDM of the motif that we want to identify as well as instantiation links with some of the classes in Fig. 3.

4.3.2 Transformation of Design Motifs
With DeMIMA, the identification of microarchitectures similar to a design motif translates into a constraint satisfaction problem, which we list as follows:

• Variables correspond to the participants of the design motif model, MDM.
• Domains of the variables correspond to the entities of MI in which to identify microarchitectures.
• Constraints among variables correspond to the relationships among the participants of MDM.

The transformation of a design motif into a constraint system requires dedicated constraints that represent relationships among participants. For example, constraint Strict Inheritance, in the case of Java-like single inheritance, creates a partial order on the set of entities and is satisfied for any couple (v1, v2) if the domain D1 of v1 represents a set of entities inheriting from the entities in the domain D2 of v2.

We proceed in a similar fashion for all relationships and define the following constraints:

• Inheritance constraint. The domains of two variables may contain the same entities, in contrast to strict inheritance.
• Strict transitive inheritance constraint. The domains of two variables contain entities that belong to the same branch of the inheritance tree.
• Transitive inheritance constraint. The domains of two variables contain entities that belong to the same branch of the inheritance tree or that are identical.
• Use constraint. The entities in the domain of variable v1 use the entities in the domain of variable v2.
• Ignorance constraint. This constraint explicitly states that two entities must not have any relationship.
• Association constraint. Association relationships link the entities in the domain of v1 with the entities in the domain of v2.
• Aggregation constraint. Aggregation relationships link the entities in the domain of v1 with the entities in the domain of v2.
• Composition constraint. Composition relationships link the entities in the domain of v1 with the entities in the domain of v2.
• Creation constraint. Entities in the domain of v1 instantiate (at least once) entities in the domain of v2.

We add standard (in)equality constraints to these constraints which ensure that different entities play different roles. We associate a weight with each constraint, an integer value $p \in \{1, 2, 3, \ldots, 100\}$, which indicates the relative importance of the constraints with one another or an order among constraints.

Example. The model MDM of the motif of the running example transforms into a constraint system with two variables vZ1 and vZ2 corresponding to the classes Z1 and Z2 and the composition constraint composition(vZ1, vZ2, 100).

4.3.3 Resolution of the Constraint System
DeMIMA uses explanation-based constraint programming [8], [41] as a technique to solve constraint satisfaction problems translated from the identification of microarchitectures similar to design motifs. Explanation-based constraint programming justifies solutions, and lack thereof, of a constraint satisfaction problem by remembering constraints that can or cannot be satisfied. Explanation-based constraint programming is an extension of constraint programming in which the solver justifies its behavior at each step of the resolution process.

We implemented an explanation-based constraint resolution system dedicated to design motif identification reusing the JPALM [42] explanation-based constraint library. This extension includes a generic algorithm for the resolution of constraint satisfaction problems with explanations and a backtrack algorithm to manage contradiction.

Example. In the running example, no solution of the constraint system is found and, thus, no microarchitecture is identified and reported.

4.3.4 Relaxation of the Constraint System
Constraint relaxation consists of replacing the constraints that led to a contradiction with semantically weaker constraints.

As shown in Table 1 and from the formalizations of the binary class relationships, an order exists among the use, association, aggregation, and composition relationships. The properties of the use relationship are less constraining than those of the association relationship, which in turn are
less constraining than those of the aggregation relationship. Finally, the properties of the aggregation relationship are less constraining than those of the composition relationship. Inheritance-related constraints are also ordered from the most constraining to the least: strict inheritance, inheritance, strict transitive inheritance, and transitive inheritance.

We take advantage of these orders; for example, if a composition relationship between two entities prevents microarchitectures from being found, then this constraint can be replaced by an aggregation relationship between the same two entities. The microarchitectures found are semantically similar to the design motif model to the extent of the semantic similarity between the relationships. Problem relaxation is a special case of constraint relaxation in which no semantically weaker constraint is added to the constraint system.

DeMIMA enables experts to relax constraints and problems interactively as a guide in the identification of microarchitectures similar to a design motif. Relaxation is important because entities or relationships among entities in a model may differ from the expected entities and their relationships as defined in a design motif model. First, the solver searches for microarchitectures identical to a design motif model and provides maintainers with explanations of contradiction. A maintainer chooses one or more constraints which she believes are not essential to the design motif model and removes them from the constraint system dynamically, replacing them with semantically weaker constraints; the solver then searches for approximate microarchitectures. This process goes on until the maintainer decides that too many constraints have been relaxed and the microarchitectures are becoming too distant from the design motif model. Weights associated with each constraint are used to score a microarchitecture to help maintainers in choosing which constraints to relax. The score of a microarchitecture is

$$score = \left(\sum_{p \in \{p_1, \ldots, p_n\}} p\right) - \left(\sum_{p \in \{p_j, \ldots, p_k\}} p/100\right),$$

where $\{p_1, \ldots, p_n\}$ is the set of weights of all constraints and $\{p_j, \ldots, p_k\}$ is the set of weights of the relaxed constraints. If all constraints from the design motif model are satisfied, then score = 100 else score < 100.

The solver may be automated to compute all combinations of constraint relaxations. The set of all possible microarchitectures (complete and approximate) is identical whether relaxation is performed manually or automatically. This set only depends on the design motif and system models. The difference between automated and manual constraint relaxation is that maintainers may choose to relax constraints in a different order than that suggested by the design motif model and thus may direct the search more quickly toward useful microarchitectures.

Example. The composition constraint would be relaxed into an aggregation constraint aggregation(vZ1, vZ2, 100) according to Table 1. A solution to this constraint system exists with vZ1 = C1 and vZ2 = C2.

4.3.5 MD Construction in Summary
DeMIMA solves constraint satisfaction problems representing the identification of microarchitectures similar to MDM in MI. Parts 1, 2, and 4 in Fig. 3 show the language used to describe models MD and the set of models {MA} of microarchitectures similar to design motifs:

• The MicroArchitecture class describes microarchitectures similar to design motifs models. A microarchitecture model aggregates a set of entities which play a role in the microarchitecture. It also records the score of the solution and the set of relaxed constraints.
• An instance of class ProgramModel may contain instances of class MicroArchitecture.

Thus, DeMIMA can build models MA of microarchitectures identified as similar to MDM models in MI and ensure the traceability between their constituents:

$$M_I \overset{3}{\rightleftarrows} M_D\,(M_A).$$

Example. The model MI is enriched by the microarchitecture corresponding to the found approximate solution into a model MD shown in Fig. 2d.

5 TOOLING
We implement DeMIMA on top of the PTIDEJ framework. The main programming language for the tools is Java. We use Prolog for the computation of the EX and LT properties and JPALM to implement the constraint solver to benefit from existing libraries. We present here only the components of the PTIDEJ framework relevant to DeMIMA:

1. PADL provides the language needed to describe models MS, MI, and MD of systems. Its implementation is general enough to cope with different programming languages, such as C++ and Java.
2. The PADL CLASSFILE CREATOR parser analyzes the Java class files associated with a system to produce a model MS of the system.
3. RELATIONSHIP STATIC ANALYSER computes values of the RT and MU properties and infers use, association, and aggregation relationships among entities of MS to refine MS into MI.
4. CAFFEINE performs dynamic analyses of a system to compute values for the EX and LT properties. Results are integrated within MI to refine aggregation relationships into composition relationships if required.
5. PTIDEJ UI allows the visualization and refinement of MS, MI, and MD. It displays the models as UML-like class diagrams with a Sugiyama-based layout algorithm. It is also responsible for converting a chosen design motif MDM into a constraint system and MI into a domain for its variables.
6. Finally, the constraint solver PTIDEJ SOLVER is applied on the generated constraint satisfaction problem to solve the problem either interactively or automatically. The constraint solver produces microarchitectures MA similar to the design motif to create MD.

6 EXPERIMENTATION
We apply DeMIMA to identify microarchitectures similar to several design motifs in both public domain and industrial systems. We analyze public domain systems because their
source code can be easily obtained and results can be compared with those of other researchers. We also analyze industrial systems for which we have both code and design. We used industrial systems because, to the best of our knowledge, no public domain system has both code and design available.

6.1 Subjects of the Experiment
We choose five well-known open source systems for our experiments, summarized in Table 3. JHOTDRAW [43] is a two-dimensional graphics framework for structured drawing editors. It includes several examples of editors, in particular a simple one to draw and color rectangles, circles, and texts. JREFACTORY is a tool that can perform several different refactorings on Java source files. It has been integrated in various IDEs, including Sun’s NetBeans. JUNIT is a unit-test framework developed to ease the implementation and running of unit tests for Java systems. MAPPERXML, a presentation framework for Web applications, is based on the Model-View-Controller architectural pattern. Finally, QUICKUML is an object-oriented design tool that supports the design of a core set of UML models. The sizes of the open source systems range from about 2,000 lines of code (LOC⁵) to about 36,000 LOC with a number of classes from about 200 to 500.

5. LOC is measured as the number of nonblank, noncomment lines including preprocessor directives.

TABLE 3
Public Domain Systems Features
From left to right: the name of the system, the total LOC, numbers of classes, methods, uses, associations, aggregations, and inheritances.

We also analyze 33 components from four complete industrial systems in the area of telecommunications developed by Sodalia, a medium-size company (about 250 programmers) in Trento, Italy. Initially, it was a joint venture between Bell Atlantic and Telecom Italia; nowadays, Sodalia is an IT Telecom company belonging to Telecom Italia Group, the sector leader in Italy. Although the software engineering environment, tools, middleware, and general corporate culture can be considered uniform across projects, it is difficult to control all factors, especially the human factors. The 33 components were thus selected as representatives of the corporate system domain, the corporate skill, and the teams. Our analyses are performed at the component level, rather than the system level, because that is the level at which the design is generally documented and developers work. All components were documented with OMT class diagrams and developed by teams using C++ and the CORBA platform for distributed computing.

The design class diagram and the final code were available for each component. The class diagrams were produced during the stage of detailed design. Classes almost always have a constructor (with void argument) and a destructor. Nested inner classes were not represented. Many, but not all, methods have their parameters specified in full detail. A few classes are completely unspecified (no attributes or methods). Thus, design class diagrams represent a mixture of high level and detailed design, perhaps closer to high-level design.

Component sizes range from a few hundred to about 50,000 lines of code, for a total of about 350 KLOC. The mean system size measured in LOC is 9,983 (standard deviation 11,578 LOC); design documents contain a fairly spread number of classes with a maximum of 113 and a minimum of 1 (mean value 17, standard deviation 20). Design documents have quite different levels of details; for example, the mean number of specified methods is 98; however, the method standard deviation is 95, i.e., there are designs specifying no methods. Table 4 presents detailed data on the 33 components.

6.2 Objects of the Experiments
In the following experiments, we use a set of well-known design patterns [2], identical to those used by Tsantalis et al., which includes Adapter, Command, Composite, Decorator, Factory Method, Observer, Prototype, Singleton, State, Strategy, Template Method, and Visitor. Contrary to Tsantalis et al., we distinguish between Composite and Decorator; however, we merge State and Strategy because their structures are identical. This choice is consistent with what previous authors did in the absence of semantic or behavioral data.

We only use the “canonical” representations of the design motifs because DeMIMA takes care of relaxations to find similar microarchitectures. In an interactive environment, a maintainer would direct the search by choosing the order of the relaxations. In the following experiments, we mimic the decision of the maintainer by limiting the number of relaxations to one per type of constraint (binary class relationships or inheritance relationships) and by imposing the next constraint depending on the design motif. For example, we permit an aggregation to be relaxed into an association but not into a use relationship. In the case of inheritance relationships, there are two possibilities: relax a strict inheritance constraint into an inheritance constraint or into a strict transitive inheritance constraint. Out of the 13 design motifs studied, it is our opinion that relaxing strict inheritance into inheritance makes sense only for the Abstract Factory and Observer motifs, while relaxing into a strict transitive inheritance is more suitable for the others. In the following, we report the number of microarchitectures identified as similar to each design motif according to the relaxations mentioned above. Microarchitectures are validated manually using the approach described in Section 6.3.

6.3 Performance, Accuracy, and Threat to Validity
For the purpose of design motif identification, with the present level of efficiency of Java environments (exploiting JIT compile technology), DeMIMA time execution and memory requirements are not an issue and, even for the largest systems in our subjects, the identification process requires resources that are compatible with the program comprehension process introduced in Section 2.
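The relaxation policy of Section 6.2 (at most one relaxation step per type of constraint, with the target of a strict-inheritance relaxation chosen per motif) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not DeMIMA's implementation, which is written in Java on top of PTIDEJ: the function names `relax_relationship`, `allowed_kinds`, and `relax_strict_inheritance`, as well as the string labels, are hypothetical, and the ordering of the relationships is the one given in Section 4.3.4.

```python
from typing import List, Optional

# Binary class relationships, most constraining first (order from Section 4.3.4).
RELATIONSHIP_ORDER = ("composition", "aggregation", "association", "use")

# Motifs for which Section 6.2 prefers relaxing strict inheritance into inheritance.
MOTIFS_RELAXING_TO_INHERITANCE = frozenset({"Abstract Factory", "Observer"})


def relax_relationship(kind: str) -> Optional[str]:
    """Return the next weaker relationship kind, or None if `kind` is the weakest."""
    index = RELATIONSHIP_ORDER.index(kind)
    if index + 1 < len(RELATIONSHIP_ORDER):
        return RELATIONSHIP_ORDER[index + 1]
    return None


def allowed_kinds(kind: str, max_relaxations: int = 1) -> List[str]:
    """List `kind` followed by the weaker kinds reachable within the relaxation budget."""
    allowed = [kind]
    current: Optional[str] = kind
    for _ in range(max_relaxations):
        current = relax_relationship(current)
        if current is None:
            break
        allowed.append(current)
    return allowed


def relax_strict_inheritance(motif: str) -> str:
    """Pick the relaxation target of a strict inheritance constraint for a motif."""
    if motif in MOTIFS_RELAXING_TO_INHERITANCE:
        return "inheritance"
    return "strict transitive inheritance"
```

With the default budget of one step, `allowed_kinds("aggregation")` contains aggregation and association but never use, which matches the example given in Section 6.2.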
TABLE 4
Industrial Component Features
From left to right: the NAME of the component (component identifier), the total number of LOC, numbers of classes, methods, associations, aggregations, and inheritances.

All computations are performed on an AMD Athlon 64-bit processor running Microsoft Windows XP. We allocate a maximum of 800 megabytes of memory to the Java virtual machine. Computations take an average of 50 minutes to identify all of the microarchitectures similar to one given design motif in one given system.

We assess DeMIMA as an information retrieval system for which the most commonly used measures of accuracy are recall and precision [44]. Recall is the ratio of relevant documents retrieved for a given query over the number of relevant documents for that query in all of the given documents. Precision is the ratio of the number of relevant documents retrieved over the number of retrieved documents.

Although the number of relevant documents in all of the given documents (i.e., design motifs present in a given system) is not known a priori, we only need to assess the number of relevant documents retrieved for a given query (i.e., number of identified microarchitectures really implementing a given design motif) because we identify both complete and approximate microarchitectures. However, precision and recall depend on the accuracy of the static and dynamic analyses producing MS and MI. Through relaxations, we ensure that we do not miss design motifs due to misclassifications of binary class relationships. However, more microarchitectures are identified: The approach ensures 100 percent recall for the five systems at the cost of a lower precision. The desired trade-off between precision and recall mostly depends on the maintainers’ objectives: For the program comprehension task, we believe that perfect recall might be preferable because the maintainers do not want to miss any actual microarchitectures.

Three programs in our subjects were also studied in previous work [33]. Although, in theory, given a common problem and a benchmark, comparison should be feasible, it was not possible with this previous work due to the imprecision of the published data. To our surprise, we discovered errors in the results.⁶ For example, in JHOTDRAW v5.1, the authors did not identify the Observer design motif, where classes Figure and FigureChangeListener in package CH.ifa.draw.framework play the roles of Subject and Observer, respectively. Still, in JHOTDRAW v5.1, they report CommandButton and Command in package CH.ifa.draw.util as Context and State/Strategy, respectively, in a State or Strategy design motif, while the CommandButton class merely implements a Command-enabled button, encapsulating a given and unique Command, as confirmed by its documentation. Thus, we rely on other available results manually validated by independent software engineers of two different teams at two different research institutions [45], [46] and assembled for convenience in the publicly available P-MART database [47].

6. Tsantalis et al. [33] kindly provide their results at java.uom.gr/~nikos/pattern-detection.html, last accessed on the 21 Feb. 2007.

For the industrial components, the validation of the microarchitectures identified in the source code was performed manually, starting from the results of the identification process and assessing which of the microarchitectures implemented a design motif. A team of five independent software engineers performed the validation. Each time a doubt on a microarchitecture arose, they considered the design-pattern book [2] as the reference in deciding by consensus whether or not that microarchitecture implemented the design motif.

The average precisions by system vary mostly because of the design choices made in each system. JHOTDRAW is famous for the use of design patterns made by its authors. Thus, it is not surprising to have a higher precision than for the other systems because its design contains either microarchitectures that are very similar to design motifs or microarchitectures that are very dissimilar to design motifs. In the other systems, authors may have accidentally implemented some design motifs, thus producing designs
Authorized licensed use limited to: ULAKBIM UASL - Izmir Ekonomi Univ. Downloaded on March 20,2024 at 21:03:58 UTC from IEEE Xplore. Restrictions apply.
 ENEUC
GUEH  AND ANTONIOL: DEMIMA: A MULTILAYERED APPROACH FOR DESIGN PATTERN IDENTIFICATION 679

Fig. 4. Comparison of JHOTDRAW documented and recovered MD model. The list and box show one selected MA similar to Composite.

where several microarchitectures are similar to design 6.4 A Step-by-Step Identification of Composite in
motifs yet do not implement their intents and motivations. JHOTDRAW
As described above and shown in Tables 3 and 4, our We perform a step-by-step identification of the Composite
subjects are comprised of systems from different domains, design motif in JHOTDRAW to illustrate the use of
complexities, and sizes. Thus, results reported below DeMIMA.
support the feasibility of DeMIMA and its ability to identify The top-left part of Fig. 4 shows a subset of the system
design motifs based on structural properties captured by design as presented in its documentation. We apply
AS, AG, and CO relationships and the set of defined DeMIMA to build a model MI of JHOTDRAW from its
constraints. Results are encouraging; future work will source code. Fig. 4 compares the recovered design-level
include studying generalization to other object-oriented model of the system and its documented design. The
programming languages, domains, and design motifs. recovered model presents essentially the same data as the
Internal validity is defined as the ability to detect a documented architecture. Some relationships among classes
cause-effect relationship between independent and depen- and interfaces differ because the authors of the documenta-
dent variables. DeMIMA obviously detects design motifs tion summarized the main classes and interfaces of the
and thus highlights microarchitectures to help program framework and reported against these entities some
comprehension and documentation of reverse-engineered relationships existing only among their subclasses. For
design choices; however, the extent to which these micro- example, the instantiation relationship between interfaces
architectures correspond to the intention or motivation of Figure and Handle only exists between class Standard-
the developers has not been assessed and will be studied in DrawingView (which implements Figure) and class
future work. NullHandle (which implements Handle). Thus, with
Authorized licensed use limited to: ULAKBIM UASL - Izmir Ekonomi Univ. Downloaded on March 20,2024 at 21:03:58 UTC from IEEE Xplore. Restrictions apply.
680 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 34, NO. 5, SEPTEMBER/OCTOBER 2008

TABLE 5
Results of Design Motif Identification in Public-Domain Systems

DeMIMA, we obtain a model MI of a system source code S Thus, with DeMIMA, we obtain models MA similar to a
and ensure the traceability between MI and S: design motif model MDM in a model MD . DeMIMA also
ensures the traceability between MA , MD , and S:
S Ð MS Ð MI :  
S Ð MS Ð MI Ð MD  fMA g :
The Composite design motif [2, p. 163] defines three
participants, Component, Composite, and Leaf, and three Models MA of microarchitectures similar to the
relationships among them, an inheritance between Compo- Composite design motif help maintainers in understanding
nent and Composite and between Component and Leaf and the design of the JHOTDRAW system by explaining the roles
a composition between Composite and Component. It of the highlighted classes, which solve the problem of
composing “objects into tree structures to represent part-
translates into the following constraint system: three
whole hierarchies” and “let clients treat individual objects
variables, component, composite, and leaf, and three and compositions of objects uniformly,” as defined by the
constraints: Composite design pattern. Maintainers are guided by the
identification in their comprehension of the system. Thus,
. Two inheritance constraints between variables leaf
DeMIMA may ease Task 3 of comprehending the system, as
and component, composite and component:
inheritance(component, composite, 100) presented in Section 2.
and inheritance(component, leaf, 100). 6.5 Open Source Systems Case Studies
. A composition constraint between variables compo-
Table 5 gives the number of microarchitectures identified for
site and component: composition(compo-
each system in the public domain for each design motif.
site, component, 100). Columns labeled with I report detected motifs, with
DeMIMA solves the constraint satisfaction problem T microarchitectures manually classified as true motifs and
defined by the constraint system from the Composite design with P the corresponding precision. It can be observed that
motif using as domain the JHOTDRAW idiom-level model. the most frequently found design motifs are the Abstract
During the process, the composition constraint is relaxed Factory and Factory Method because they use characteristics
because only aggregation relationships are present in the at the core of object-oriented programming. The last row of
model MI of JHOTDRAW; the inheritance constraints are also Table 5 gives the precision of the design motif identification.
relaxed because an intermediate class, AbstractFigure, Precision is computed over all motifs by summing the
numbers of each column and then computing T =I, assuming
exists in the framework. Then, the identified microarchitec-
a precision of 100 percent when I ¼ T ¼ 0.
tures are integrated in a design-level model. The bottom part In some cases, DeMIMA does not identify any micro-
of Fig. 4 shows the design-level model MD of JHOTDRAW architecture similar to some design motifs. The reason is
and, together with the top-right list, highlights a microarch- twofold: First, we only allow one approximation for each
itecture similar to the Composite design motif. type of relationship; thus, it is possible that we do not
Authorized licensed use limited to: ULAKBIM UASL - Izmir Ekonomi Univ. Downloaded on March 20,2024 at 21:03:58 UTC from IEEE Xplore. Restrictions apply.
 ENEUC
GUEH  AND ANTONIOL: DEMIMA: A MULTILAYERED APPROACH FOR DESIGN PATTERN IDENTIFICATION 681

TABLE 6 TABLE 7
Results of Design Motif Identification Results of Design Motif Identification
in the Source Code of Industrial Components in the Design of Industrial Components

[46] that we developed for StP/OMT. Tables 6 and 7 report


the identified motifs in the source code and in the design,
respectively. The tables do not include components where
no microarchitecture was identified. A zero value means
that no microarchitecture was identified for the correspond-
ing design motif, while “-” means that the motif could not
be searched for lack of available data.
Several observations can be made based on Tables 6 and
7. First, design motifs were not retrieved in several
components, either in the design or in code: Design patterns
seem seldom used.
Second, obtained results confirm results reported in
previous work [46]. Due to a company takeover, lack of
identify a highly approximated form of the motif. However, detailed documentation, and programmer turnover, only
this approximated form would very unlikely implement the design patterns verified in previous work are verified, with
motif and, thus, we do not affect the recall. Second, the a resulting precision of 100 percent. A full evaluation
systems are known for not containing such a design motif. pertaining to the entire set of identified microarchitectures
For example, there is no Visitor in JHOTDRAW v5.1; two is unfortunately not feasible for the above reasons.
were implemented in later versions. Furthermore, for confidentiality reasons, we cannot dis-
The average precisions by design motif vary because of tribute design or code and thus cannot report true positives
the number of constraints and approximations. The more and precisions computed by independent experts. There-
constraints and the fewer allowed relaxations, the higher fore, we only report the number of identified microarchi-
the precision because DeMIMA does not report micro- tectures, not the precision and recall.
architecture too far from the design motif of interest. For Third, a comparison of the microarchitectures identified
example, the Proxy design motif requires a method to in the design and those identified in source code shows that
return an instance of the declaring class, which is a strong there is no intersection between these two sets: It would
constraint with respect to the five systems under study. In seem that different design motifs have been used in the
contrast, the Factory Method design pattern has low design and in the implementation of the components. This
precision because many microarchitectures have structure fact can be partially explained by three reasons: First, when
similar to the structure of this design motif. The identifica- working with design, we do not have dynamic data so we
tion algorithms would require dynamic and semantic data cannot find composition relationships. Second, source code
to automatically distinguish true from false positives. often includes a collection of classes reused from libraries or
COTS that are not modeled in the design. Finally, our
6.6 Industrial Systems Case Studies design documents are inconsistent with the source code:
With respect to the industrial systems, both design and code After code modifications, they were not properly updated
are analyzed. Design information has been recovered from to reflect the changes; hence the gap between design and
the corporate database using the CASE2AOL TRANSLATOR code is relevant.
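The aggregate precision reported in the last row of Table 5 (Section 6.5) is obtained by summing I and T over all motifs before dividing, with the convention that I = T = 0 counts as 100 percent. A minimal sketch of that arithmetic follows; the function names and the counts in the usage note are hypothetical and are not taken from Table 5.

```python
def precision(identified, true_positives):
    """Return T / I as a percentage, treating I = T = 0 as 100 percent."""
    if true_positives < 0 or true_positives > identified:
        raise ValueError("expected 0 <= true_positives <= identified")
    if identified == 0:
        # Convention from Section 6.5: nothing reported, nothing wrong.
        return 100.0
    return 100.0 * true_positives / identified


def aggregate_precision(rows):
    """Sum I and T over (identified, true_positives) rows, then compute T / I."""
    total_identified = sum(i for i, _ in rows)
    total_true = sum(t for _, t in rows)
    return precision(total_identified, total_true)
```

For example, `aggregate_precision([(10, 4), (0, 0), (6, 4)])` sums to I = 16 and T = 8 and returns 50.0, whereas a motif with no identified microarchitecture yields `precision(0, 0) == 100.0` on its own. Summing before dividing weights each motif by the number of microarchitectures reported for it, which differs from averaging the per-motif percentages.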

6.7 Discussion

Explanation-based constraint programming can assign entities to all the roles in a design motif, including, for example, the Client role or the Leaf role. Also, the identified microarchitectures contain more information than previous approaches such as [33] and [46]. In contrast with previous work, DeMIMA distinguishes microarchitectures similar to the Adapter and the Command design motifs because the constraint system locates entities playing the roles of Client and Invoker, which differentiate the two structural motifs.

The Singleton design motif must hold a single piece of information, its own unique instance. Nevertheless, in JREFACTORY, DeMIMA identified a microarchitecture similar to the Singleton but mapping unique instances of the Singleton with given objects. This variant of the Singleton is akin to the Identity Map described by Fowler in [48]. This accounts for the difference in the reported numbers between our work and the work by Tsantalis et al.

7 CONCLUSIONS

Microarchitectures similar to design motifs may help maintainers understand systems and ease their tasks. We introduced DeMIMA, a multilayered approach for design motif identification that defines

• simple class diagram constituents to build a model $M_S$ of a system source code $S$,
• idiom-level constituents, in particular use, association, aggregation, and composition relationships to build a model $M_I$ from $M_S$, and
• microarchitectures similar to design motifs to enhance $M_S$ into $M_D$ with models $M_A$ of the microarchitectures.

In the second layer, DeMIMA depends on a set of definitions for unidirectional binary class relationships that we proposed and formalized. The formalizations define the relationships in terms of four language-independent properties that are derivable from static and dynamic analyses of systems: exclusivity, type of message receiver, lifetime, and multiplicity. DeMIMA keeps track of data and links to identify and ensure the traceability of these relationships.

In the third layer, DeMIMA uses explanation-based constraint programming to identify microarchitectures similar to design motifs. This technique makes it possible to identify microarchitectures similar to a model of a design motif without having to describe all possible variants explicitly.

We illustrated DeMIMA with the identification of microarchitectures similar to the Composite design pattern in the JHOTDRAW framework. We showed that the identified microarchitectures indeed highlight entities implementing the motif as documented by the authors of the system. We also applied DeMIMA on both open source and industrial systems and discussed its precision and recall.

In future work, we plan to improve our analyses of source code and integrate other sources of data such as sequence diagrams to enhance precision and identify behavioral and creational design motifs. We will also study object lifetime dependencies. We also plan to study the relation between identified microarchitectures and the concrete intent and motivation of software engineers. Finally, we would like to further assess the use of approximations in an automatic environment.

ACKNOWLEDGMENTS

The authors thank Hervé Albin-Amiot for his work on the precursor of the PADL metamodel and Narendra Jussien for his help with explanation-based constraint programming. Giuliano Antoniol was partially supported by NSERC, Canada, research chair in software change and evolution. Yann-Gaël Guéhéneuc was partially supported by Object Technology International, Inc., an IBM Eclipse Fellowship Grant, and an NSERC Discovery Grant.

REFERENCES

[1] Merriam-Webster, Merriam-Webster Online Dictionary, www.merriam-webster.com/, Mar. 2003.
[2] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, first ed. Addison-Wesley, 1994.
[3] K. Beck and R.E. Johnson, “Patterns Generate Architectures,” Proc. Eighth European Conf. for Object-Oriented Programming, M. Tokoro and R. Pareschi, eds., pp. 139-149, http://citeseer.nj.nec.com/27318.html, July 1994.
[4] C. Rich and R.C. Waters, The Programmer’s Apprentice, first ed. ACM Press Frontier Series and Addison-Wesley, Jan. 1990.
[5] R. Wuyts, “Declarative Reasoning About the Structure of Object-Oriented Systems,” Proc. 26th Conf. Technology of Object-Oriented Languages and Systems, J. Gil, ed., pp. 112-124, http://www.iam.unibe.ch/~wuyts/publications.html, Aug. 1998.
[6] Y.-G. Guéhéneuc and H. Albin-Amiot, “Recovering Binary Class Relationships: Putting Icing on the UML Cake,” Proc. 19th Conf. Object-Oriented Programming, Systems, Languages, and Applications, D.C. Schmidt, ed., pp. 301-314, http://www.iro.umontreal.ca/ptidej/Publications/Documents/OOPSLA04.doc.pdf, Oct. 2004.
[7] Y.-G. Guéhéneuc, R. Douence, and N. Jussien, “No Java without Caffeine: A Tool for Dynamic Analysis of Java Programs,” Proc. 17th Conf. Automated Software Eng., W. Emmerich and D. Wile, eds., pp. 117-126, http://www.iro.umontreal.ca/~ptidej/Publications/Documents/ASE02.doc.pdf, Sept. 2002.
[8] Y.-G. Guéhéneuc and N. Jussien, “Using Explanations for Design-Patterns Identification,” Proc. First IJCAI Workshop Modeling and Solving Problems with Constraints, C. Bessière, ed., pp. 57-64, http://www.iro.umontreal.ca/ptidej/Publications/Documents/IJCAI01MSPC.doc.pdf, Aug. 2001.
[9] J. Bansiya, “Automating Design-Pattern Identification,” Dr. Dobb’s J., http://www.ddj.com/articles/1998/9806/9806a/9806a.htm?topic=patterns, June 1998.
[10] T. Richner and S. Ducasse, “Recovering High-Level Views of Object-Oriented Applications from Static and Dynamic Information,” Proc. Seventh Int’l Conf. Software Maintenance, H. Yang and L. White, eds., pp. 13-22, http://www.computer.org/proceedings/icsm/0016/00160013abs.htm, Aug. 1999.
[11] D. Jackson and M.C. Rinard, “Software Analysis: A Roadmap,” Proc. 22nd Int’l Conf. Software Eng., Future of Software Eng. Track, M. Jazayeri and A. Wolf, eds., pp. 133-145, http://sdg.lcs.mit.edu/%20dnj/talks/roadmap/, June 2000.
[12] P. Tonella and A. Potrich, “Reverse Engineering of the UML Class Diagram from C++ Code in Presence of Weakly Typed Containers,” Proc. Int’l Conf. Software Maintenance, G. Canfora and A.A.A.-V. Maryhauser, eds., pp. 376-385, http://www.computer.org/proceedings/icsm/1189/11890376abs.htm, Nov. 2001.
[13] D. Thomas, “Reflective Software Engineering: From MOPS to AOSD,” J. Object Technology, vol. 1, no. 4, pp. 17-26, http://www.jot.fm/jot/issues/issue_2002_09/column1/index.html, Sept. 2002.
[14] T. Gyimóthy, R. Ferenc, and I. Siket, “Empirical Validation of Object Oriented Metrics on Open Source Software for Fault Prediction,” IEEE Trans. Software Eng., vol. 31, no. 10, pp. 897-910, http://csdl2.computer.org/dl/trans/ts/2005/10/e0897.pdf, Oct. 2005.
[15] J.H. Jahnke, W. Schäfer, and A. Zündorf, “Generic Fuzzy Reasoning Nets as a Basis for Reverse Engineering Relational Database Applications,” Proc. Sixth European Software Eng. Conf., M. Jazayeri, ed., pp. 193-210, http://www.uni-paderborn.de/cs/varlet/docs.html, Sept. 1997.

[16] J. Niere, J.P. Wadsack, and A. Zündorf, “Recovering UML Diagrams from Java Code Using Patterns,” Proc. Second Workshop Soft Computing Applied to Software Eng., J.H. Jahnke and C. Ryan, eds., pp. 89-97, http://trese.cs.utwente.nl/scase/scase-2/Proceedings.pdf, Feb. 2001.
[17] J. Niere, J.P. Wadsack, and L. Wendehals, “Handling Large Search Space in Pattern-Based Reverse Engineering,” Proc. 11th Int’l Workshop Program Comprehension, K. Wong and R. Koschke, eds., pp. 274-280, http://portal.acm.org/citation.cfm?id=857020, May 2003.
[18] D. Jackson and A. Waingold, “Lightweight Extraction of Object Models from Bytecode,” Proc. 21st Int’l Conf. Software Eng., D. Garlan and J. Kramer, eds., pp. 194-202, http://sdg.lcs.mit.edu/~dnj/, May 1999.
[19] Object Management Group, UML v1.5 Specification, http://www.omg.org/cgi-bin/doc?formal/03-03-01, Mar. 2003.
[20] J. Noble and J. Grundy, “Explicit Relationships in Object-Oriented Development,” Proc. 18th Conf. Technology of Object-Oriented Languages and Systems, B. Meyer, ed., pp. 211-226, http://citeseer.nj.nec.com/noble95explicit.html, Nov. 1995.
[21] F. Civello, “Roles for Composite Objects in Object-Oriented Analysis and Design,” Proc. Eighth Conf. Object-Oriented Programming, Systems, Languages, and Applications, A. Paepcke, ed., pp. 376-393, http://www.it.bton.ac.uk/staff/frc/papers/aboops93.html, Sept. 1993.
[22] S. Ducasse, M. Blay-Fornarino, and A.-M. Pinna-Dery, “A Reflective Model for First Class Dependencies,” Proc. 10th Conf. Object-Oriented Programming, Systems, Languages, and Applications, F. Manola, ed., pp. 265-280, http://www.iam.unibe.ch/~ducasse/WebPages/Publications.html, Oct. 1995.
[23] L. Wills, “Automated Program Recognition by Graph Parsing,” PhD dissertation, Massachusetts Inst. of Technology, 1992.
[24] J. Niere, W. Schäfer, J.P. Wadsack, L. Wendehals, and J. Welsh, “Towards Pattern-Based Design Recovery,” Proc. 24th Int’l Conf. Software Eng., M. Young and J. Magee, eds., pp. 338-348, http://portal.acm.org/citation.cfm?id=581382, May 2002.
[25] C. Krämer and L. Prechelt, “Design Recovery by Automated Search for Structural Design Patterns in Object-Oriented Software,” Proc. Third Working Conf. Reverse Eng., L.M. Wills and I. Baxter, eds., pp. 208-215, http://www.computer.org/proceedings/wcre/7674/76740208abs.htm, Nov. 1996.
[26] B. Kullbach and A. Winter, “Querying as an Enabling Technology in Software Reengineering,” Proc. Third Conf. Software Maintenance and Reengineering, P. Nesi and C. Verhoef, eds., pp. 42-50, http://www.computer.org/proceedings/csmr/0090/00900042abs.htm, Mar. 1999.
[27] R.K. Keller, R. Schauer, S. Robitaille, and P. Pagé, “Pattern-Based Reverse-Engineering of Design Components,” Proc. 21st Int’l Conf. Software Eng., D. Garlan and J. Kramer, eds., pp. 226-235, http://www.iro.umontreal.ca/~schauer/Private/Publications/icse1999/icse1999.html, May 1999.
[28] J.H. Jahnke and A. Zündorf, “Rewriting Poor Design Patterns by Good Design Patterns,” Proc. First ESEC/FSE Workshop Object-Oriented Reengineering, S. Demeyer and H.C. Gall, eds., http://www.iam.unibe.ch/~famoos/ESEC97/, Distributed Systems Group, Technical Univ. of Vienna, UV-1841-97-10, Sept. 1997.
[29] G. Antoniol, R. Fiutem, and L. Cristoforetti, “Design Pattern Recovery in Object-Oriented Software,” Proc. Sixth Int’l Workshop Program Comprehension, S. Tilley and G. Visaggio, eds., pp. 153-160, http://citeseer.nj.nec.com/antoniol98design.html, June 1998.
[30] J. Seemann and J.W. von Gudenberg, “Pattern-Based Design Recovery of Java Software,” Proc. Fifth Int’l Symp. Foundations of Software Eng., B. Scherlis, ed., pp. 10-16, http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/s/Seemann:Jochen.html, Nov. 1998.
[31] D. Eppstein, “Subgraph Isomorphism in Planar Graphs and Related Problems,” Proc. Sixth Ann. Symp. Discrete Algorithms, K. Clarkson, ed., pp. 632-640, www.ics.uci.edu/~eppstein/pubs/Epp-TR-94-25.pdf, Jan. 1995.
[32] N. Pettersson and W. Löwe, “Efficient and Accurate Software Pattern Detection,” Proc. 13th Asia Pacific Software Eng. Conf., P. Jalote, ed., pp. 317-326, http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=4137387&arnumber=4137433&count=65&index=43, Dec. 2006.
[33] N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S. Halkidis, “Design Pattern Detection Using Similarity Scoring,” IEEE Trans. Software Eng., vol. 32, no. 11, Nov. 2006.
[34] K. Brown, “Design Reverse-Engineering and Automated Design Pattern Detection in Smalltalk,” Technical Report TR-96-07, Dept. of Computer Science, Univ. of Illinois at Urbana-Champaign, http://citeseer.nj.nec.com/context/734211/0, July 1996.
[35] G. Hedin, “Language Support for Design Patterns Using Attribute Extension,” Proc. First ECOOP Workshop Language Support for Design Patterns and Frameworks, J. Bosch and S. Mitchell, eds., Springer, pp. 137-140, http://www.cs.lth.se/Research/ProgEnv/LSDF.html, June 1997.
[36] H. Albin-Amiot and Y.-G. Guéhéneuc, “Meta-Modeling Design Patterns: Application to Pattern Detection and Code Synthesis,” Proc. First ECOOP Workshop Automating Object-Oriented Software Development Methods, P. van den Broek, P. Hruby, M. Saeki, G. Sunyé, and B. Tekinerdogan, eds., http://www.iro.umontreal.ca/~ptidej/Publications/Documents/ECOOP01AOOSDM.doc.pdf, Centre for Telematics and Information Technology, Univ. of Twente, TR-CTIT-01-35, Oct. 2001.
[37] I. Philippow, D. Streitferdt, M. Riebisch, and S. Naumann, “An Approach for Reverse Engineering of Design Patterns,” Software and System Modeling, vol. 4, no. 1, pp. 55-70, http://www.springerlink.com/content/0dn4pmqh5uhnbk69/, Feb. 2005.
[38] D. Heuzeroth, T. Holl, and W. Löwe, “Combining Static and Dynamic Analyses to Detect Interaction Patterns,” Proc. Sixth World Conf. Integrated Design and Process Technology, H. Ehrig, B.J. Krämer, and A. Ertas, eds., http://www.info.uni-karlsruhe.de/publications.php/bib=281, June 2002.
[39] Y.-G. Guéhéneuc, “A Systematic Study of UML Class Diagram Constituents for Their Abstract and Precise Recovery,” Proc. 11th Asia-Pacific Software Eng. Conf., D.-H. Bae and W.C. Chu, eds., pp. 265-274, http://www.iro.umontreal.ca/~ptidej/Publications/Documents/APSEC04.doc.pdf, Nov.-Dec. 2004.
[40] A. Donovan, A. Kiezun, M.S. Tschantz, and M.D. Ernst, “Converting Java Programs to Use Generic Libraries,” Proc. 19th Conf. Object-Oriented Programming Systems, Languages, and Applications, D. Schmidt, ed., pp. 15-34, http://portal.acm.org/citation.cfm?id=1035292.1028979, Oct. 2004.
[41] N. Jussien and V. Barichard, “The PaLM System: Explanation-Based Constraint Programming,” Proc. Techniques for Implementing Constraint Programming Systems, N. Beldiceanu, W. Harvey, M. Henz, F. Laburthe, E. Monfroy, T. Müller, L. Perron, and C. Schulte, eds., pp. 118-133, Sept. 2000, School of Computing, Nat’l Univ. of Singapore, TRA9/00.
[42] N. Jussien, “e-Constraints: Explanation-Based Constraint Programming,” Proc. First CP Workshop User-Interaction in Constraint Satisfaction, B. O’Sullivan and E. Freuder, eds., http://www.emn.fr/jussien/publications/jussien-WCP01.pdf, Dec. 2001.
[43] E. Gamma and T. Eggenschwiler, “JHotDraw,” http://members.pingnet.ch/gamma/JHD-5.1.zip, 1998.
[44] W.B. Frakes and R. Baeza-Yates, Information Retrieval: Data Structures and Algorithms. Prentice Hall, 1992.
[45] J. Bieman, G. Straw, H. Wang, P.W. Munger, and R.T. Alexander, “Design Patterns and Change Proneness: An Examination of Five Evolving Systems,” Proc. Ninth Int’l Software Metrics Symp., M. Berry and W. Harrison, eds., pp. 40-49, http://csdl.computer.org/comp/proceedings/metrics/2003/1987/00/19870040abs.htm, Sept. 2003.
[46] G. Antoniol, G. Casazza, M. di Penta, and R. Fiutem, “Object-Oriented Design Patterns Recovery,” J. Systems and Software, vol. 59, pp. 181-196, http://web.soccerlab.polymtl.ca/~antoniol/publications/index.html, Nov. 2001.
[47] Y.-G. Guéhéneuc, H. Sahraoui, and F. Zaidi, “Fingerprinting Design Patterns,” Proc. 11th Working Conf. Reverse Eng., E. Stroulia and A. de Lucia, eds., pp. 172-181, http://www.iro.umontreal.ca/~ptidej/Publications/Documents/WCRE04.doc.pdf, Nov. 2004.
[48] M. Fowler, Patterns of Enterprise Application Architecture, first ed. Addison-Wesley Professional, http://www.amazon.com/Patterns-Enterprise-Application-Architecture-Martin/dp/0321127420, Nov. 2002.

Yann-Gaël Guéhéneuc received the engineering diploma from the Ecole des Mines of Nantes, France, in 1998 and the PhD degree in software engineering from the University of Nantes, France (under Professor Pierre Cointe’s supervision) in 2003. His PhD thesis was funded by Object Technology International, Inc. (now IBM OTI Labs.) in 1999 and 2000. He is an assistant professor in the Department of Computing Science and Operations Research at the University of Montreal, where he leads the Ptidej team on evaluating and enhancing the quality of object-oriented programs by promoting the use of patterns at the language, design, or architectural levels. His research interests are program understanding and program quality during development and maintenance, in particular through the use and the identification of recurring patterns. He is also interested in empirical software engineering; he uses eye trackers to understand and to develop theories about program comprehension. He has published many papers in international conference proceedings and journals. He is a member of the IEEE.

Giuliano Antoniol received the degree in electronic engineering from the Università di Padova in 1982 and the PhD degree in electrical engineering from the Ecole Polytechnique de Montréal, Canada, in 2004. He has worked in companies, research institutions, and universities. He is currently an associate professor at the Ecole Polytechnique de Montréal, where he works on software evolution, software traceability, software quality, and maintenance. He has published more than 100 papers in journals and international conference proceedings. He has served as a member of the program committees of international conferences and workshops such as the International Conference on Software Maintenance, the International Conference on Program Comprehension, and the International Symposium on Software Metrics. He is currently a member of the editorial board of the Journal Software Testing Verification and Reliability, the Journal Information and Software Technology, the Journal of Empirical Software Engineering, and the Journal of Software Quality. In 2005, he was awarded the Canada Research Chair Tier I in software change and evolution. He is a member of the IEEE.
