Reverse Engineering of Object Oriented Code

Paolo Tonella
ITC-irst
Centro per la Ricerca Scientifica e Tecnologica
38050 Povo (Trento), Italy
[email protected]

ABSTRACT

During software evolution, programmers devote most of their effort to understanding the structure and behavior of the system. For Object-Oriented code, this can be particularly hard when multiple, scattered objects contribute to the same function. Design views offer invaluable help, but they are often not aligned with the code, when they are not missing altogether.

This tutorial describes some of the most advanced techniques that can be employed to reverse engineer several design views from the source code. The recovered diagrams, represented in UML (Unified Modeling Language), include class, object, interaction (collaboration and sequence), state and package diagrams. A unifying static code analysis framework used by most of the algorithms involved is presented at the beginning of the tutorial. A single running example is referred to throughout the presentation. Trade-offs (e.g., static vs. dynamic analysis), limitations and expected benefits are also discussed.

Categories and Subject Descriptors:
D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement - Restructuring, reverse engineering, and reengineering.

General Terms: Design.
Keywords: Diagram recovery, object oriented programming, static code analysis.

Copyright is held by the author/owner.
ICSE'05, May 15-21, 2005, St. Louis, Missouri, USA.
ACM 1-58113-963-2/05/0005.

1. INTRODUCTION

Software evolution accounts for the vast majority of a program's life cycle. This phase of the software process is aimed at adding functions, correcting defects, adapting the code to a new environment or improving the internal structure. The main activities involved in such tasks, and conducted in response to a change request, are: (1) Program understanding, (2) Change location, (3) Impact analysis, (4) Change implementation, (5) Regression testing.

Reverse engineering aims at recovering design views from the source code, to offer programmers a faithful, high level representation of the program that is guaranteed to be consistent with the actual implementation. Reverse engineering can support activities 1-3 by providing summary information on the program's main entities and relationships.

This tutorial deals with the main problems that are encountered when Object-Oriented code is reverse engineered. The most important UML (Unified Modeling Language) diagrams are taken into account to offer a wide overview of the techniques that can be employed to recover them from the code. The extracted views include the class diagram, the object diagram, the collaboration and sequence diagrams, the state diagrams, and the package diagram. In fact, no single diagram is able to summarize all information about an Object-Oriented system, and multiple perspectives have to be adopted to cope with the different aspects involved.

When using reverse engineering techniques, it is important to be aware of some trade-offs that may affect the accuracy and safety of the recovered information. Examples considered in the tutorial are the choice between static and dynamic analysis, and the level of object sensitivity. As regards the presentation of the recovered information, usability issues must also be considered. Some of the available visualization options are described in the tutorial.

2. CODE ANALYSIS FRAMEWORK

Most of the static code analyses employed by the reverse engineering techniques presented in the tutorial are based on the same unifying framework. This framework consists of a graph representation of a program, called the Object Flow Graph (OFG), which traces the flow of information about objects from their creation by allocation statements, through their assignment to variables, up to their storage in class fields or their usage in method invocations.

The concrete Java syntax of the program under analysis is turned into an abstract language that represents the data flows independently of the specific control flow structure. Statements affecting the control flow are dropped, while statements modifying the data flows are maintained and translated. Conversion from the abstract language to the OFG representation is straightforward. Nodes in the OFG represent program locations, while edges represent data flows. In the construction of the OFG, program locations can be associated either with the classes (object insensitivity) or with the objects (object sensitivity) they belong to. The tutorial discusses such a trade-off.
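
As an illustration of the OFG described in Section 2, the following minimal sketch builds a graph whose nodes are program locations and whose edges are data flows, then propagates allocation sites forward until a fixpoint is reached. It is a simplification written for this summary, not the tutorial's actual algorithm: the class ObjectFlowGraph, its method names, the location strings and the allocation-site label Book1 are all hypothetical, and locations are named by class, which corresponds to the object-insensitive variant.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified Object Flow Graph (object-insensitive location names). */
class ObjectFlowGraph {
    // successors.get(a) holds every location that receives data from location a.
    private final Map<String, Set<String>> successors = new HashMap<>();
    // objects.get(n) holds the allocation sites whose objects may reach location n.
    private final Map<String, Set<String>> objects = new HashMap<>();

    /** Models "target = new C()": the allocation site generates an object at target. */
    void addAllocation(String target, String allocationSite) {
        objects.computeIfAbsent(target, k -> new LinkedHashSet<>()).add(allocationSite);
        successors.computeIfAbsent(target, k -> new LinkedHashSet<>());
    }

    /** Models "target = source" (assumed to cover field stores too): edge source -> target. */
    void addAssignment(String target, String source) {
        successors.computeIfAbsent(source, k -> new LinkedHashSet<>()).add(target);
        successors.computeIfAbsent(target, k -> new LinkedHashSet<>());
    }

    /** Forward flow propagation with a worklist, repeated until no set changes. */
    void propagate() {
        Deque<String> worklist = new ArrayDeque<>(successors.keySet());
        while (!worklist.isEmpty()) {
            String node = worklist.poll();
            Set<String> out = objects.getOrDefault(node, Collections.emptySet());
            for (String succ : successors.get(node)) {
                Set<String> in = objects.computeIfAbsent(succ, k -> new LinkedHashSet<>());
                if (in.addAll(out)) {
                    worklist.add(succ); // the set grew, so its successors must be revisited
                }
            }
        }
    }

    /** Allocation sites that may reach the given location after propagation. */
    Set<String> objectsAt(String location) {
        return objects.getOrDefault(location, Collections.emptySet());
    }
}

class OfgDemo {
    public static void main(String[] args) {
        ObjectFlowGraph ofg = new ObjectFlowGraph();
        // Abstract statements for:  Book b = new Book();  Book c = b;  lib.item = c;
        ofg.addAllocation("Main.main.b", "Book1");
        ofg.addAssignment("Main.main.c", "Main.main.b");
        ofg.addAssignment("Library.item", "Main.main.c");
        ofg.propagate();
        System.out.println(ofg.objectsAt("Library.item")); // prints [Book1]
    }
}
```

Control flow statements play no role here, in line with the abstract language described above: only allocations and assignments contribute nodes and edges. An object-sensitive variant would, under the same assumptions, qualify field locations by allocation site rather than by class name.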

3. CLASS DIAGRAM

The most important and most widely used structural view of a software system is the class diagram. It shows the most relevant features (attributes and methods) of the core classes and their mutual relationships.

The main problem with the recovery of the class diagram from the code lies in the way relationships are inferred. A basic algorithm can be defined, taking into account the declared types. Thus, an association/aggregation is inferred when the declared type of an attribute is another class, while a dependency (a weaker relation) is inferred when the declared type is that of a local variable or a method parameter. Inheritance is inferred directly from the syntax.

However, the basic algorithm for the recovery of the inter-class relationships suffers from two main limitations: (1) the actual type may be different from the declared type; (2) in the presence of weakly typed containers, no type is declared at all for the contained objects. Flow propagation in the OFG can be used to tackle both problems.

4. OBJECT DIAGRAM

The object diagram shows the set of objects created by a given program and the relationships holding among them.

Flow propagation in the OFG can be exploited to reverse engineer information about the objects allocated in a program and the inter-object relationships mediated by the object attributes. The allocation points in the code are used to approximate the set of objects created by a program, while the results accumulated in the OFG nodes after propagation are used to determine the inter-object relationships.

An alternative technique that can be used to produce the object diagram is based on the execution of the program on a set of test cases. Each test case is associated with an object diagram depicting the objects and the relationships that are instantiated when the test case is run. The diagram can be obtained by post-processing the program traces generated during each execution.

The tutorial discusses the pros and cons of the two approaches. The static technique is safe with respect to the objects and relationships it represents, but it cannot provide precise information on the actual multiplicity of the allocated objects nor on the actual layout of the relationships associated with the allocated objects.

5. INTERACTION DIAGRAMS

Interaction diagrams augment the object diagrams with information about the messages that are exchanged among the objects over time. Construction of the interaction diagrams requires the ability to resolve method calls into their target objects. This can be achieved statically, by means of the flow information determined at the OFG nodes, or dynamically, by tracing the actual method calls. In the presence of incomplete systems, the output of the static method remains safe (i.e., valid for any possible execution) only if external data flows are properly modeled.

Given an Object-Oriented system under analysis, it makes no sense to produce one overall interaction diagram that describes all possible computations, since this is likely to exceed the cognitive abilities of human beings for any non-trivial program. A solution is to apply focusing, by restricting the message exchanges being considered to those triggered by a computation of interest.

6. STATE DIAGRAMS

State diagrams show the possible states an object can be in and the transitions from state to state, as triggered by the method invocations received by the object.

Reverse engineering of the state diagrams from the code is a difficult task that cannot be fully automated. However, it is possible to partially automate it by means of abstract interpretation. Class attributes are associated with symbolic, abstract values. Similarly, the effect that statements may have on such abstract values is represented in the form of their abstract semantics, by providing an abstract interpretation table for them. Recovery of the state diagrams is then achieved by running an abstract interpretation of the constructors, to determine the initial states, and of the methods, to determine the possible state transitions.

7. PACKAGE DIAGRAM

Packages are a general grouping mechanism that can be used to decompose a given system into components (and sub-components) that are relatively independent of each other. Key to a good decomposition of a system into packages is the definition of highly cohesive and loosely coupled modularization units. Automated recovery of a package structure for a given program might be interesting in at least three cases: (1) when a flat sequence of classes is to be organized into packages (no pre-existing package structure); (2) when the existing package structure is known to be inadequate; (3) when the existing package structure is being assessed against alternative ones.

Reverse engineering of the package diagram is based on the discovery of similarities among entities (classes), to be grouped together, and on the minimization of the relationships that cross the boundaries of the packages (low coupling). This tutorial describes how two widely used methods, clustering and concept analysis, can be adopted to determine heuristic solutions to this problem.

8. CONCLUSIONS

In the last part of the tutorial, some software evolution scenarios that could benefit from the reverse engineered diagrams are presented, with reference to the running example. The architecture of a tool that implements the described reverse engineering techniques is considered, and the presenter's experience in the reverse engineering of some large C++ systems developed at CERN (Conseil Européen pour la Recherche Nucléaire) is briefly reported.

Finally, some perspectives on the future role of reverse engineering in software engineering are given. Modern programming languages make the source code increasingly expressive, for example by supporting annotations and reflection. This opens up the possibility of recovering extremely meaningful and informative views from the source code. Agile development processes (such as XP, Extreme Programming), which are centered around the source code ("the source code is the design" is one of XP's guiding principles), can be integrated with reverse engineering in a very natural way.

9. REFERENCES

[1] P. Tonella and A. Potrich. Reverse Engineering of Object Oriented Code. Springer-Verlag, Berlin, Heidelberg, New York, 2005.
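
As an illustration of the basic declared-type algorithm of Section 3, the following minimal sketch infers inheritance, association and dependency relationships for a small set of classes. It rests on several assumptions that are not part of the tutorial: it uses Java reflection on compiled classes instead of parsing the source, so local variables are not visible and only method parameters yield dependencies; the classes Document, Book, User and Library, the class BasicClassDiagram and the arrow notation in the output strings are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.LinkedHashSet;
import java.util.Set;

// Tiny hypothetical system under analysis.
class Document {}
class Book extends Document {}
class User {}
class Library {
    Book featured;               // attribute whose declared type is a class of the system
    void lend(User borrower) {}  // method parameter whose declared type is a class of the system
}

class BasicClassDiagram {
    /** Applies the declared-type rules to class c, keeping only targets that belong to 'system'. */
    static Set<String> relationships(Class<?> c, Set<Class<?>> system) {
        Set<String> result = new LinkedHashSet<>();
        String name = c.getSimpleName();

        // Inheritance: read from the class declaration.
        Class<?> parent = c.getSuperclass();
        if (parent != null && system.contains(parent)) {
            result.add(name + " --|> " + parent.getSimpleName());
        }
        // Association/aggregation: declared type of an attribute is another class.
        for (Field f : c.getDeclaredFields()) {
            if (system.contains(f.getType())) {
                result.add(name + " ---> " + f.getType().getSimpleName());
            }
        }
        // Dependency: declared type of a method parameter is another class.
        for (Method m : c.getDeclaredMethods()) {
            for (Class<?> p : m.getParameterTypes()) {
                if (system.contains(p)) {
                    result.add(name + " ..> " + p.getSimpleName());
                }
            }
        }
        return result;
    }
}

class ClassDiagramDemo {
    public static void main(String[] args) {
        Set<Class<?>> system = Set.of(Document.class, Book.class, User.class, Library.class);
        System.out.println(BasicClassDiagram.relationships(Book.class, system));
        // prints [Book --|> Document]
        System.out.println(BasicClassDiagram.relationships(Library.class, system));
        // prints [Library ---> Book, Library ..> User]
    }
}
```

The sketch shares both limitations named in Section 3: a field declared as Document that actually holds a Book is reported against Document, and the element type of a weakly typed container is not recovered at all.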
