Reverse Engineering of Object Oriented Code
Paolo Tonella
ITC-irst
Centro per la Ricerca Scientifica e Tecnologica
38050 Povo (Trento), Italy
[email protected]
3. CLASS DIAGRAM
The most important and most widely used structural view of a software system is the class diagram. It shows the most relevant features (attributes and methods) of the core classes and their mutual relationships.

The main problem in recovering the class diagram from the code lies in the way relationships are inferred. A basic algorithm can be defined that takes the declared types into account. An association/aggregation is inferred when the declared type of an attribute is another class, while a dependency (a weaker relation) is inferred when the declared type is that of a local variable or a method parameter. Inheritance is inferred directly from the syntax.

However, the basic algorithm for the recovery of the inter-class relationships suffers from two main limitations: (1) the actual type may be different from the declared type; (2) in the presence of weakly typed containers, no type is declared at all for the contained objects. Flow propagation in the OFG can be used to tackle both problems.

4. OBJECT DIAGRAM
The object diagram shows the set of objects created by a given program and the relationships holding among them.

Flow propagation in the OFG can be exploited to reverse engineer information about the objects allocated in a program and the inter-object relationships mediated by the object attributes. The allocation points in the code are used to approximate the set of objects created by a program, while the results accumulated in the OFG nodes after propagation are used to determine the inter-object relationships.

An alternative technique for producing the object diagram is based on the execution of the program on a set of test cases. Each test case is associated with an object diagram depicting the objects and the relationships that are instantiated when the test case is run. The diagram can be obtained by post-processing the program traces generated during each execution.

The tutorial discusses the pros and cons of the two approaches. The static technique is safe with respect to the objects and relationships it represents, but it cannot provide precise information on the actual multiplicity of the allocated objects, nor on the actual layout of the relationships associated with the allocated objects.

5. INTERACTION DIAGRAMS
Interaction diagrams augment the object diagrams with information about the messages that are exchanged among the objects over time. Construction of the interaction diagrams requires the ability to resolve the method calls to their target objects. This can be achieved statically, by means of the flow information determined at the OFG nodes, or dynamically, by tracing the actual method calls. In the presence of incomplete systems, the output of the static method remains safe (i.e., valid for any possible execution) only if external data flows are properly modeled.

Given an Object-Oriented system under analysis, it makes no sense to produce one overall interaction diagram that describes all possible computations, since this is likely to exceed the cognitive abilities of human beings for any nontrivial program. A solution is to apply focusing, by restricting the message exchanges being considered to those triggered by a computation of interest.

6. STATE DIAGRAMS
State diagrams show the possible states an object can be in and the transitions from state to state, as triggered by the method invocations received by the object.

Reverse engineering of the state diagrams from the code is a difficult task that cannot be fully automated. However, it is possible to partially automate it by means of abstract interpretation. Class attributes are associated with symbolic, abstract values. Similarly, the effect that statements may have on such abstract values is represented in the form of their abstract semantics, by providing an abstract interpretation table for them. Recovery of the state diagrams is then achieved by running an abstract interpretation of the constructors, to determine the initial states, and of the methods, to determine the possible state transitions.

7. PACKAGE DIAGRAM
Packages are a general grouping mechanism that can be used to decompose a given system into components (and sub-components) that are relatively independent of each other. Key to a good decomposition of a system into packages is the definition of highly cohesive and loosely coupled modularization units. Automated recovery of a package structure for a given program might be interesting in at least three cases: (1) when a flat sequence of classes is to be organized into packages (no pre-existing package structure); (2) when the existing package structure is known to be inadequate; (3) when the existing package structure is being assessed against alternative ones.

Reverse engineering of the package diagram is based on the discovery of similarities among the entities (classes) to be grouped together, and on the minimization of the relationships that cross the boundaries of the packages (low coupling). This tutorial describes how two widely used methods, clustering and concept analysis, can be adopted to determine heuristic solutions to this problem.

8. CONCLUSIONS
In the last part of the tutorial, some software evolution scenarios that could benefit from the reverse engineered diagrams are presented, with reference to the running example. The architecture of a tool that implements the described reverse engineering techniques is considered, and the presenter's experience in the reverse engineering of some large C++ systems developed at CERN (Conseil Européen pour la Recherche Nucléaire) is briefly reported.

Finally, some perspectives on the future role of reverse engineering in software engineering are given. Modern programming languages make the source code increasingly expressive, for example by supporting annotations and reflection. This opens up the possibility of recovering extremely meaningful and informative views from the source code. Agile development processes (such as XP, Extreme Programming), which are centered around the source code ("the source code is the design" is one of XP's guiding principles), can be integrated with reverse engineering in a very natural way.

9. REFERENCES
[1] P. Tonella and A. Potrich. Reverse Engineering of Object Oriented Code. Springer-Verlag, Berlin, Heidelberg, New York, 2005.
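
As an illustration of the basic class diagram recovery algorithm of Section 3, the following minimal Python sketch infers inter-class relationships from declared types only. It is not the tool discussed in the tutorial: the ClassInfo record, the infer_relationships function and the example classes (Document, Book, Loan, Library) are hypothetical names introduced here, and the input is assumed to have already been extracted from the source code by some parser. Reporting only the stronger relation when both an association and a dependency apply is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ClassInfo:
    """Hypothetical, simplified summary of one parsed class (not a real parser API)."""
    name: str
    superclass: Optional[str] = None
    # Declared types of the class attributes.
    attribute_types: List[str] = field(default_factory=list)
    # Declared types of local variables and method parameters.
    local_and_param_types: List[str] = field(default_factory=list)


def infer_relationships(classes: List[ClassInfo]) -> Set[Tuple[str, str, str]]:
    """Return (source, kind, target) triples inferred from declared types only."""
    known = {c.name for c in classes}
    relations: Set[Tuple[str, str, str]] = set()
    for c in classes:
        # Inheritance is read directly from the syntax.
        if c.superclass in known:
            relations.add((c.name, "inheritance", c.superclass))
        # An attribute whose declared type is another class: association.
        for t in c.attribute_types:
            if t in known:
                relations.add((c.name, "association", t))
        # A local variable or parameter of class type: dependency.
        # Assumption of this sketch: skip it if an association already exists.
        for t in c.local_and_param_types:
            if t in known and (c.name, "association", t) not in relations:
                relations.add((c.name, "dependency", t))
    return relations


# Hypothetical input: Library stores its loans in an untyped container ("list").
classes = [
    ClassInfo("Document"),
    ClassInfo("Book", superclass="Document"),
    ClassInfo("Loan", attribute_types=["Document", "int"]),
    ClassInfo("Library", attribute_types=["list"],
              local_and_param_types=["Loan", "Document"]),
]

for relation in sorted(infer_relationships(classes)):
    print(relation)
```

For this input the sketch reports that Book inherits from Document, that Loan is associated with Document, and that Library depends on Loan and Document. No association from Library to Loan is found, because the container attribute carries no declared element type. This is limitation (2) of the basic algorithm, which the tutorial addresses by flow propagation in the OFG.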