
Chava-reverse engineering and tracking of Java applets

The paper presents Chava, a reverse engineering and tracking system for Java applets that analyzes and tracks changes in applet code, extracting information into a relational database. Chava supports advanced software analysis techniques such as reachability analysis and program differencing, allowing developers to efficiently examine the structure and interactions within Java applications. The system can process both Java source files and compiled class files, making it versatile for analyzing remote applets without available source code.


Chava: Reverse Engineering and Tracking of Java Applets

Conference Paper · November 1999


DOI: 10.1109/WCRE.1999.806970 · Source: IEEE Xplore



Chava: Reverse Engineering and Tracking of Java Applets

Jeffrey Korn, Princeton University, Dept. of Computer Science, Princeton, NJ 08544
Yih-Farn Chen, AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932
Eleftherios Koutsofios, AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932

Abstract

Java applets have been used increasingly on web sites to perform client-side processing and provide dynamic content. While many web site analysis tools are available, their focus has been on static HTML content and most ignore applet code completely. This paper presents Chava, a system that analyzes and tracks changes in Java applets. The tool extracts information from applet code about classes, methods, fields and their relationships into a relational database. Supplementary checksum information in the database is used to detect changes in two versions of a Java applet. Given our Java data model, a suite of programs that query, visualize, and analyze the structural information were generated automatically from CIAO, a retargetable reverse engineering system. Chava is able to process either Java source files or compiled class files, making it possible to analyze remote applets whose source code is unavailable. The information can be combined with HTML analysis tools to track both the static and dynamic content of many web sites. This paper presents our data model for Java and describes the implementation of Chava. Advanced reverse engineering tasks such as reachability analysis, clustering, and program differencing can be built on top of Chava to support design recovery and selective regression testing. In particular, we show how Chava is used to compare several Java Development Kit (JDK) versions to help spot changes that might impact Java developers. Performance numbers indicate that the tool scales well.

1. Introduction

The World Wide Web first started with web servers only presenting static HTML content. Later, Common Gateway Interface (CGI) scripts were introduced to run on web servers to dynamically compose content before presenting them to the clients. Recently, Java applets have been used increasingly on web sites to provide rich user interfaces and perform client-side processing to generate dynamic content. While many web site analysis tools [14, 8] are available to analyze the structure of static HTML content, most of them completely ignore the applet code, which by its nature requires software analysis techniques.

Traditional software repositories [29, 30, 7, 13, 3] apply reverse engineering [12] techniques on the source code to build a central information source for maintaining code in a software system. Repositories are useful to developers as they make it possible to efficiently examine the structure and interaction between components of a system without having to delve through potentially hundreds of thousands of lines of source code. Advanced tools have also been built to perform reachability analysis [7], clustering analysis [20], selective regression testing [10] and even extraction of light-weight object models [28, 19].

This paper presents Chava, a reverse engineering and tracking system for Java [1]. The system presented has several noteworthy features:

Data Model for both Byte Code and Source Code: Like Womble [19] and some recent Java tools, Chava can work on binary class files directly. However, unlike other tools with a single-task focus, Chava aims to have a complete data model (as defined in Acacia [7]) at the selected abstraction level – class member declaration – to support a wide range of analysis and tracking tasks. It gets additional information (such as line numbers) from source code when it is available.

Analysis using only class files is possible primarily due to properties of the Java language. Java does not have a preprocessor, which means that we do not have to deal with constructs such as macros, include files, and templates, whose information would not be available in an object file. Also, Java is an architecture neutral language, so its byte code is the same on all machines. This makes it possible to scan through object code in a machine-independent manner to discover relationships in a program.
Program Difference Database: Chava supports differencing of Java program databases. Similar to the work on change detection in Java from University of Waterloo [25], and in the earlier work of ciadiff [5] for C and Cdiff [16] for C++, Chava allows tools to examine what changes have been made in two different versions of a system. However, the approach is quite different: Chava can take two previously-built databases and create a difference database with minimal effort.

Integration with HTML Analysis: Chava can analyze web pages along with the embedded Java applets by combining its database with HTML analysis results created by WebCiao [8], which also uses an entity-relationship model.

To give a quick idea of the capabilities of Chava, Figure 1 shows a sample diagram generated by our tool from the difference database created for JDK1.0 and JDK1.1. The query was

    Show all the methods that referred to any deleted, protected field member in any Java class.

The diagram shows immediately that only one protected field, PushbackInputStream.pushBack, was deleted (shown as a white oval) in JDK1.1, and five methods were affected by this change, all in class PushbackInputStream. It also shows that all these references have now been removed (represented by dotted edges) in JDK1.1. By doing a reverse reachability analysis for three layers, we see that the method DataInputStream.readLine, which refers to two of those methods that used to access the deleted field, is affected by this change as well and should be retested. Note that solid edges indicate relationships that remain in the new version (JDK1.1). Finding correlations between a new software feature and changed program entities and relationships is frequently useful in helping locate problems should they arise after the introduction of the new feature.

2. A Data Model for Java

Our Java Data model is based on Chen's entity-relationship model [4]. Each Java program is viewed as a set of entities, which may refer to each other. Entities exist for each language construct, such as classes, methods, and fields. Relationships between entities encompass notions such as inheritance and method invocation. This section describes in more detail the composition of the entities and relationships.

A property that our model must satisfy is that of completeness as described in Acacia [7]. In order for our model to be complete, it must be the case that if the compilation of an entity A depends on entity B, a relationship between A and B is in the model. We satisfy this condition with one notable exception. In Java, classes can be loaded and methods can be invoked dynamically at runtime using the reflection API [26]. Programs that do this may not satisfy completeness. Completeness allows us to perform analyses such as dead code detection and reachability.

In selecting an appropriate model, a level of granularity must be chosen. Not enough granularity will prevent a user from being able to make non-trivial queries. However, too much granularity leads to a database that is too large to handle queries efficiently. Our model handles class member declarations. We create entities for all constructs up to this level of granularity in a program, but do not include information down at the level of statements and expressions. That means detailed control flow analysis or pattern matching on program constructs [23] is not available with this level of abstraction.

We will illustrate the model with an example of a simple Java program. Figure 2 contains the source code for a set of classes that implements circles and rectangles. The base class Shape is extended to implement Circle and Rectangle.

2.1. Entity types

Our model handles the following Java entity types:

class: Contains declarations and definitions of a collection of methods and fields.

interface: Interfaces are similar to classes, but do not contain definitions. Classes implement the declarations of zero or more interfaces.

package: A set of classes.

file: Source code that contains one or more classes.

method: A function that is part of a class.

field: A variable or constant that is part of a class.

string: Strings that are referenced by methods or fields.

For example, in Figure 2, we have the following entities:

Classes: Shape, Circle, Rectangle
Interfaces: Cloneable
Packages: graph
Files: Shape.java
Methods: Shape.printArea, Circle.Circle (constructor), Circle.area, Circle.circumference, Rectangle.Rectangle, Rectangle.area, Rectangle.circumference
Fields: Circle.r, Circle.PI, Rectangle.w, Rectangle.h
Strings: "Area:"
Figure 1. A Sample Reverse Reachability Diagram from the Difference Database of JDK1.0 and JDK1.1

    // file name is Shape.java
    package graph;

    abstract class Shape implements Cloneable
    {
        public abstract double area();
        public abstract double circumference();
        public void printArea() {
            System.out.println("Area:"+area());
        }
    }

    class Circle extends Shape
    {
        protected double r;
        protected static double PI = 3.14159265;
        public Circle(double r) { this.r = r; }
        public double area()
            { return PI * r * r; }
        public double circumference()
            { return 2 * PI * r; }
    }

    class Rectangle extends Shape
    {
        protected double w, h;
        public Rectangle(double w, double h)
            { this.w = w; this.h = h; }
        public double area() { return w * h; }
        public double circumference()
            { return 2 * (w + h); }
    }

Figure 2. A Simple Java Program

2.2. Attributes

All entities contain the following attributes: id, name, kind, file, begin line, end line, and chksum. The chksum attribute is a 64 bit integer that can be used to compare two versions of an entity. This attribute is used when comparing two databases that represent different versions of the program.

The model also includes other attributes that only apply to certain kinds of entities. Table 1 summarizes these attributes. They are as follows:

parent: The parent of a method, a field, or a string is its defining class. The parent of a class is the package that contains the class. Packages and files do not have parents.

scope: The scope attribute contains one of private, protected or public. It is used for class entities, which can be public or private, and class members (fields and methods) which can be public, private or protected.

flags: A set of zero or more modifiers for a given object. For methods, modifiers include abstract, final, native, static, and synchronized. For fields, modifiers include final, static, transient, and volatile. For classes and interfaces, modifiers include abstract and final. Flags are represented in the database as a bit vector.

params: For method types, this attribute contains a list of parameters for the method. This attribute is not used by other entity types. It is primarily used to distinguish among overloaded methods under the same name.

dtype: For method types, this attribute contains the return type of the method. For field types, the attribute contains the type name of the field. For classes, this attribute contains the name of the package it is contained in. The full name of a class is constructed by appending the name and dtype attributes of the entity.

The dtype attribute is used in these two differing contexts to save space in the database. However, dtype is a package name in one context and a type name in the other, so there is no overlap in the possible set of values for these contexts.

For example, let us consider two entities from our example program. The field Circle.PI has as its parent the class Circle, its scope is protected, its flag is static, params is empty and dtype is double. The constructor for class Rectangle has as its parent the class Rectangle, its scope is public, flags is empty, params is (double w, double h), and dtype is void.

In cases where we are working without source files, some entities will be missing values for the begin and end line numbers. Java class files do not include line numbers for definitions of classes, fields and strings. However, bytecode for methods includes line number annotations, which we use to find the begin and end line numbers for this entity type. Entities with missing line numbers do not affect the ability to do most analysis tasks on the code.

2.3. Relationship Types

Our Java data model contains the following relationships:

subclass: Figure 4 shows the subclass relationships that exist in our example program. The class Shape is a subclass of Object, and both Circle and Rectangle are subclasses of Shape. Arrows go from class to superclass in order to show the direction of dependency and maintain the completeness property.

containment: If a field or method A is a member of class B, then a containment relationship exists between B and A.

implements: The implements relationship exists between a class entity A and an interface entity B if class A implements B. In the example, Shape implements the interface Cloneable.

field read: A field read relationship exists between a method entity A and a field entity B if method A reads field B.

field write: A field write relationship exists between a method entity A and a field entity B if method A writes to field B.

reference: Reference relationships exist between two method entities A and B when A invokes B.

Figure 4. Class inheritance diagram

Figure 5. Members of class Circle

3. Java Program Analysis

Once we generate a database for a particular Java application, we can then use a number of supporting tools to extract information about the program. This section starts with examples of basic database queries and program visualization, followed by reachability analysis and program differencing.

3.1. Visualization Queries with CIAO

Using a retargetable reverse engineering system called CIAO [6], a user interface can be generated to graph relationships in a Java program. CIAO takes a specification for a language and generates a set of supporting tools for querying and visualizing databases for that language. Instantiations of CIAO exist for a variety of languages including C [9], C++ [7], HTML [8], and ksh.

Going back to our example in the previous section, we can use the Java instance of CIAO to show its class inheritance graph as seen in Figure 4. Another query can be made to the database returning all relationships that exist in the application, as shown in Figure 3. In the graphs, each entity type is drawn with a different shape, color, etc. as defined in the CIAO specification.

If we want to see all of the members of the class Circle, we can perform a similar query, returning all relationships that are of the type containment. Figure 5 shows the resulting graph.
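
The entity and relationship kinds of Sections 2.1 and 2.3, and the containment query behind Figure 5, can be sketched in a few lines of Java. This is an illustrative in-memory model only; the paper stores the data in a relational database queried through CIAO-generated tools, and the names ModelSketch, Entity, Relationship, related and circleExample are hypothetical. Only a subset of the Circle entities is included to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

class ModelSketch {
    enum EntityKind { CLASS, INTERFACE, PACKAGE, FILE, METHOD, FIELD, STRING }
    enum RelKind { SUBCLASS, CONTAINMENT, IMPLEMENTS, FIELD_READ, FIELD_WRITE, REFERENCE }

    static final class Entity {
        final String name;
        final EntityKind kind;
        Entity(String name, EntityKind kind) { this.name = name; this.kind = kind; }
    }

    // A directed relationship: "from" depends on (or contains) "to".
    static final class Relationship {
        final Entity from;
        final RelKind kind;
        final Entity to;
        Relationship(Entity from, RelKind kind, Entity to) {
            this.from = from; this.kind = kind; this.to = to;
        }
    }

    // Names of all entities related to fromName by the given relationship kind.
    static List<String> related(List<Relationship> rels, String fromName, RelKind kind) {
        List<String> out = new ArrayList<>();
        for (Relationship r : rels) {
            if (r.kind == kind && r.from.name.equals(fromName)) out.add(r.to.name);
        }
        return out;
    }

    // A hand-built subset of the relationships for class Circle in Figure 2.
    static List<Relationship> circleExample() {
        Entity circle = new Entity("Circle", EntityKind.CLASS);
        Entity shape = new Entity("Shape", EntityKind.CLASS);
        Entity area = new Entity("Circle.area()", EntityKind.METHOD);
        Entity r = new Entity("Circle.r", EntityKind.FIELD);
        Entity pi = new Entity("Circle.PI", EntityKind.FIELD);
        List<Relationship> rels = new ArrayList<>();
        rels.add(new Relationship(circle, RelKind.SUBCLASS, shape));
        rels.add(new Relationship(circle, RelKind.CONTAINMENT, area));
        rels.add(new Relationship(circle, RelKind.CONTAINMENT, r));
        rels.add(new Relationship(circle, RelKind.CONTAINMENT, pi));
        rels.add(new Relationship(area, RelKind.FIELD_READ, r));   // area() reads r
        rels.add(new Relationship(area, RelKind.FIELD_READ, pi));  // area() reads PI
        return rels;
    }
}
```

Calling related(circleExample(), "Circle", RelKind.CONTAINMENT) returns the members of Circle, which corresponds to the query that produces Figure 5.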

Type       Parent   Scope                       Flags                                            Params  Dtype
Class      package  public, private             abstract, final                                  N/A     package name
Interface  package  public, private             abstract, final                                  N/A     package name
Package    N/A      N/A                         N/A                                              N/A     N/A
File       N/A      N/A                         N/A                                              N/A     N/A
Method     class    public, private, protected  abstract, final, native, static, synchronized    params  return type
Field      class    public, private, protected  final, static, transient, volatile               N/A     type
String     class    N/A                         N/A                                              N/A     N/A

Table 1. Other Attributes
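
Section 2.2 states that flags are stored as a bit vector but does not give the bit assignments. As a hedged illustration, the sketch below borrows the modifier constants from java.lang.reflect.Modifier, which mirror the JVM class-file access flags; whether Chava uses the same values is an assumption. The class name FlagsSketch and its methods are hypothetical.

```java
import java.lang.reflect.Modifier;

class FlagsSketch {
    // Flags for the field Circle.PI from Figure 2: it is static and nothing else.
    // Its scope (protected) is a separate attribute in the paper's model.
    static int flagsOfCirclePI() {
        return Modifier.STATIC;
    }

    // Tests a single modifier bit in a flags bit vector.
    static boolean has(int flags, int bit) {
        return (flags & bit) != 0;
    }

    public static void main(String[] args) {
        int flags = flagsOfCirclePI();
        System.out.println("static: " + has(flags, Modifier.STATIC));
        System.out.println("final:  " + has(flags, Modifier.FINAL));
    }
}
```

Packing modifiers into one integer keeps each entity row small and lets a query test a modifier with a single bitwise AND.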

Figure 3. Relationships in Example Program

3.2. Reachability Analysis

There are two kinds of reachability analysis, forward reachability and reverse reachability. With forward reachability, we start with an entity and compute all entities that are in the transitive closure of relationships from that entity. This kind of analysis is useful if we have a large set of classes and methods available to us but only need to use a subset for a given program.

Reverse reachability detects entities which depend on a given entity, either directly or indirectly. This type of analysis is useful if we want to make a change to an entity such as a method or field and we need to see what other parts of the program will be affected by the change. Figure 6 shows reverse reachability analysis on the method java.lang.Number.intValue. The graph shows that this method is called in the class Number as well as from a method called subParse in the class SimpleDateFormat. Thus, we have learned that any change to intValue will require the methods subParse, byteValue, and shortValue to be either modified or retested appropriately.

Reachability analysis along with a difference database and dynamic traces were also used in TestTube [10] to determine what test cases need to be rerun during selective regression testing.

Figure 6. Reverse Reachability Example

Figure 8. A Cluster Diagram for a Proxy Server Written in Java

3.3. Differencing

Our supporting tools allow two different databases to be compared against each other. Using a set of attributes, all entities in one version of the database are compared against another version. From this, we get a list of added and deleted entities. For entities that occur in both versions, we use the chksum field to compare the entities. If the checksum matches, the entities are considered the same, otherwise they are considered changed.
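
The checksum comparison above and the reverse reachability of Section 3.2 compose naturally: a difference identifies deleted or changed entities, and reverse reachability finds what depends on them. The following is a minimal sketch under stated assumptions, not Chava's implementation. Entities are keyed by a single name string (the paper says only that "a set of attributes" is used), checksum values are made up, and the names DiffReachSketch, Diff, diff and reverseReachable are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class DiffReachSketch {
    static final class Diff {
        final SortedSet<String> added = new TreeSet<>();
        final SortedSet<String> deleted = new TreeSet<>();
        final SortedSet<String> changed = new TreeSet<>();
    }

    // Each map goes from an entity key to its 64-bit checksum.
    static Diff diff(Map<String, Long> oldDb, Map<String, Long> newDb) {
        Diff d = new Diff();
        for (Map.Entry<String, Long> e : oldDb.entrySet()) {
            Long newSum = newDb.get(e.getKey());
            if (newSum == null) {
                d.deleted.add(e.getKey());          // only in the old version
            } else if (!newSum.equals(e.getValue())) {
                d.changed.add(e.getKey());          // in both, checksum differs
            }
        }
        for (String key : newDb.keySet()) {
            if (!oldDb.containsKey(key)) d.added.add(key); // only in the new version
        }
        return d;
    }

    // edges holds pairs {from, to}, meaning "from depends on to".
    // Returns every entity that depends on target, directly or indirectly.
    static SortedSet<String> reverseReachable(String target, List<String[]> edges) {
        Map<String, List<String>> dependents = new HashMap<>();
        for (String[] e : edges) {
            dependents.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        SortedSet<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(target);
        while (!work.isEmpty()) {
            String cur = work.pop();
            for (String dep : dependents.getOrDefault(cur, Collections.<String>emptyList())) {
                if (seen.add(dep)) work.push(dep); // visit each dependent once
            }
        }
        return seen;
    }
}
```

For instance, with edges subParse, byteValue and shortValue each depending on intValue (as in Figure 6), reverseReachable("intValue", edges) yields those three methods as the set to re-examine after a change to intValue.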


To demonstrate one of the uses of differencing, we have constructed databases for the Java 1.0 standard library and the Java 1.1 standard library. Using differencing, we can easily see what changes were made between these two versions of the libraries. For example, Figure 1 showed differencing combined with reverse reachability.

Figure 7 shows differencing combined with forward reachability. The original program is our example from Figure 2. We compare this to a new version that has removed the method printArea from the class Shape. We view all relationships in the new program showing difference information. The dotted arrows show relationships that have been deleted. The white box shows the member function that has been deleted.

4. Applications

This section presents some examples of how Chava can be applied to either software systems or analysis of websites.

4.1. Clustering

We have been working with several other researchers to study the use of clustering techniques [21, 18, 27] on software repositories to discover high level software structures from existing code. We applied Bunch [20], the clustering tool developed for this purpose, to the Java software repository we created for a proxy server called iPROXY [24], and obtained the cluster diagram shown in Figure 8. The proxy server consists of 3,600 lines of Java code and we were able to produce the cluster diagram, including the process of creating the database and performing cluster analysis, within a few seconds. The resulting clusters are consistent with the developer's view on the high-level structure of the code: the right cluster corresponds to a Java CGI component, the middle cluster represents a web server component, and the left cluster represents the agent component of the proxy server. Such a road map is potentially useful for future software maintainers who did not develop the original source code to quickly identify the logical components in the proxy server.

4.2. Security

Chava analysis can be useful in detecting potential security flaws in an applet or application. If we download a set of binary Java classes off the web, we can use the results of queries to see which potentially dangerous methods and classes are being used in the program. For example, if we run a query asking Chava to find all methods that invoke a method whose parent package is java.net, then we can see if the application is making use of the network API. Also, by analyzing changes in the internal structure of a Java applet, we can find out if the dynamic content of a website has been changed in a malicious and not obvious way.

4.3. Object Model Extraction

Similar to Womble [19], we can use the extracted information stored in a Chava database to produce object models of software systems. Object models present a graphical

abstraction of the interactions between components in an object-oriented program. By examining the definitions of methods and fields in a class, as well as references to objects in methods of a class, we can find one-to-one, one-to-many, and many-to-one relationships between classes. Graphical tools such as CIAO can be extended to draw the model. The architectural view allows observers to more easily see relationships between program components, which is useful for the design, implementation and debugging of software systems.

Figure 7. Difference Example

4.4. Software Metrics

Using information such as the number of methods in each class and the number of method calls or field references in each method, we can devise software metrics that can be used to suggest how complex a system is. For example, Chava can be used to compute the following Object Oriented Design (OOD) metrics proposed by Chidamber and Kemerer [11]: WMC (weighted methods per class), DIT (depth of inheritance tree), NOC (number of children), and CBO (coupling between object classes).

4.5. Website Analysis

Our tool supports generation of databases for remote applets. When the URL for a web page is specified to the database generation program, it will find any applet tags and process the class files for each applet in the HTML file.

The ability to reverse engineer a remote applet has important uses. A user may wish to reverse engineer an applet on a remote site to get an overview of what it does or analyze potential security problems (see Section 4.2).

Since the author of an applet typically writes the application in one environment and uploads the applet to a server where it is run, there is a possibility that the applet will behave differently in each environment. For example, a different set of classes could be loaded on the server due to its configuration. Our tool eliminates the process of moving the applet and its supporting components to a different machine before doing the analysis.

The CIAO environment works with a variety of languages, one of which is HTML. WebCiao [8] is a system for visualizing and tracking the structures of web sites using an HTML repository. WebCiao generates a database from a set of web pages, and uses a suite of tools that are generated with a CIAO specification.

We have implemented a tool which combines repositories of HTML and Java applets. The combined repository gives web developers an environment for viewing all aspects of web content together, both static and dynamic, including document structure and applet interaction.

Figure 9 shows an example query for the web page http://www.att.com. The query shows the reachable set of entities from this URL. The web page contains an applet called AIG_Flipper, so a relationship exists between the web page and the applet. Thus, this query draws the set of

URLs referred to by the web page and the methods referred
to by the applet together.
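
The first step of the website analysis in Section 4.5, finding applet tags in a page, might be sketched as follows. The paper does not say how Chava or WebCiao parse HTML, so the regular-expression approach and the names AppletTagSketch and appletClasses are assumptions for illustration; a production tool would more likely use a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppletTagSketch {
    // Matches an opening <applet ...> tag, case-insensitively.
    private static final Pattern APPLET =
        Pattern.compile("<applet\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches the code attribute (quoted or unquoted) but not codebase.
    private static final Pattern CODE =
        Pattern.compile("\\bcode\\s*=\\s*[\"']?([^\"'\\s>]+)", Pattern.CASE_INSENSITIVE);

    // Returns the value of the code attribute of every applet tag in the page.
    static List<String> appletClasses(String html) {
        List<String> out = new ArrayList<>();
        Matcher tag = APPLET.matcher(html);
        while (tag.find()) {
            Matcher code = CODE.matcher(tag.group());
            if (code.find()) out.add(code.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String page = "<html><body>"
            + "<applet code=\"AIG_Flipper.class\" width=\"400\" height=\"60\"></applet>"
            + "</body></html>";
        System.out.println(appletClasses(page));
    }
}
```

Each returned class file name would then be fetched relative to the page (or its codebase attribute, which this sketch ignores) and handed to the database generator.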

5. Implementation

Our implementation consists of two parts. First, there is a program called chava that generates a repository for an application. Second, there is a specification file that provides information about the semantics of the database used by external tools.

Chava is implemented as a C program and is around 3,000 lines of source code. A Java-based implementation was considered, but was discontinued after initial performance results were poor. The C-based implementation runs an order of magnitude faster than our Java version.

Chava works analogously to a compiler. It first generates incremental databases on either a per-file or per-class basis (depending on whether we are working with source or class files), similar to how a compiler generates object files. Then all of the per-class databases are linked together to generate the complete database, similar to the linking phase of a compiler. The linking phase reads in the per-class databases and resolves entities that reference external entities with their definitions.
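
The compile-then-link structure described above can be illustrated with a small sketch. It is written in Java for consistency with the rest of the paper's examples, whereas Chava itself is a C program, and the representation (sets of defined names plus name pairs for references) and the names LinkSketch, ClassDb, Linked and link are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LinkSketch {
    // A per-class database: what the class defines, and what it refers to by name.
    static final class ClassDb {
        final Set<String> defined = new HashSet<>();
        final List<String[]> refs = new ArrayList<>(); // pairs {from, toName}
    }

    static final class Linked {
        final List<String> resolved = new ArrayList<>();
        final List<String> unresolved = new ArrayList<>();
    }

    // Links per-class databases: a reference is resolved when some database
    // in the set defines the name it points to.
    static Linked link(List<ClassDb> dbs) {
        Set<String> allDefined = new HashSet<>();
        for (ClassDb db : dbs) allDefined.addAll(db.defined);
        Linked out = new Linked();
        for (ClassDb db : dbs) {
            for (String[] ref : db.refs) {
                String edge = ref[0] + " -> " + ref[1];
                if (allDefined.contains(ref[1])) out.resolved.add(edge);
                else out.unresolved.add(edge);
            }
        }
        return out;
    }
}
```

In this sketch a reference from Circle to Shape resolves once the Shape database is part of the link set, while a reference into java.lang stays unresolved until the class library's database is linked in as well, which matches the way Section 5.3 describes linking applications against the library database.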


5.1. .class Files

Chava is able to generate a database for a class given only access to its class file (generated by the compiler). The class file contains enough information about the entities in a class that we can fill in most of the attributes for our model. Class files contain a table of methods, fields, classes and strings. Chava goes through each of these tables and converts the table entries into database entities. To determine the relationships of field access and method invocation, Chava goes through the byte code for each method and picks out instructions that call methods or access fields.
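
The instructions in question are a small, fixed set of JVM opcodes. The sketch below maps them to the relationship types of Section 2.3, using the opcode values defined in the Java Virtual Machine Specification. It covers only the classification step; a real scanner must also decode instruction lengths and look up the operand in the constant pool to learn which field or method is named. The names OpcodeSketch, Rel and classify are hypothetical, and this is not Chava's code, which is written in C.

```java
class OpcodeSketch {
    enum Rel { FIELD_READ, FIELD_WRITE, REFERENCE, NONE }

    // Classifies a single JVM opcode by the relationship it implies.
    static Rel classify(int opcode) {
        switch (opcode) {
            case 0xb2: // getstatic
            case 0xb4: // getfield
                return Rel.FIELD_READ;
            case 0xb3: // putstatic
            case 0xb5: // putfield
                return Rel.FIELD_WRITE;
            case 0xb6: // invokevirtual
            case 0xb7: // invokespecial
            case 0xb8: // invokestatic
            case 0xb9: // invokeinterface
                return Rel.REFERENCE;
            default:
                return Rel.NONE; // all other instructions create no relationship
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(0xb4)); // getfield
        System.out.println(classify(0xb6)); // invokevirtual
    }
}
```

Because bytecode is the same on every platform, as the introduction notes, this one table suffices for class files produced on any machine.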

Figure 9. HTML and Java Integration

5.2. Source Files

Class files do not contain every piece of information that our model provides, but they contain enough to generate a useful database for our tools. Class files are missing some information about line numbers that Chava is able to get by parsing source files. If a source file exists for a processed class, Chava parses it to fill in the fields missing from the database.

If we only have a source file for a class without the compiled class file, Chava invokes the Java compiler to produce the class file and then works with both the source and class files. Thus, Chava is capable of working with only source, only class files, or both.

5.3. Archives

Java allows a set of classes to be combined into a single
file as an archive. To generate a database for an archive, the
archive is first expanded into its source and/or class files.
Then, Chava is run on each file and the set of databases is
linked to form a single database from the set of classes. The
resulting database can subsequently be linked with other
applications.

For example, we have produced a database for the Java
default class library. The library is quite large and
consists of around 2 megabytes of class files. The generated
database contains 15,248 entities and 35,012 relationships.
This database can be linked in with Java applications if the
user wants to do analysis involving the Java class library.
For example, reachability analysis on a Java program would
reveal how much of the Java library it actually uses.

5.4. Java Instantiation of CIAO

The query and visualization subsystem for Java is built
by constructing a new instance of the CIAO system. CIAO
takes a specification file for a language and generates the
supporting tools for querying and visualization
automatically. The specification consists of the following parts:

Schema. It enumerates the attributes of entities and
relationships. Types for attribute fields include integers,
strings, and pointers to other entities.

Database View. This section defines how different
entity and relationship entries are viewed in text format.
For example, a class entity contains the package name
as one attribute and the class name as another. When
displaying a class entity, we can specify that the name
be displayed by combining these fields.

HTML View. This section defines how to format each
of the attributes in the database as HTML, making it
possible to display query results in a browser window.

Source View. This section defines the fields that are
needed to find the source file and line numbers for a
given entity in the database.

Graph View. This section defines how to graphically
display entities and relationships. Entities are
displayed as nodes, and the specification can define what
colors, shapes, fonts, etc. should be used for different
kinds of entities. Relationships are represented as
edges, and can also be represented with various styles
and colors.

GUI Front End. The final section defines the
appearance of the graphical user interface.

The specification file is less than 300 lines long. The
complete suite of query, visualization, and generic
reachability analysis tools are generated from this specification
file.

6. Performance and Experience

This section looks at the performance of Chava as a
function of the number of entities and relationships emitted. It
is our goal that Chava scale well with large applications so
that it can be useful for real-world software projects.

6.1. Speed and Size

To measure the speed of Chava, we have taken a set
of applications ranging in size and generated databases for
them. To put the numbers in context, we compare the time
that Chava takes to the speed of compilation. We also
compare the running time of Chava to javap, which is a Java
program in the Java SDK that dumps the contents of a class
file. javap can be viewed as a lower bound on performance
had we used Java to write Chava.

The three programs we performed experiments on were
as follows:

1. java.*: This is a collection of classes that make up the
standard library for Java. All classes are in packages
that begin with java. The set of classes is stored as
an archive in zip format, and source is not available.
Thus, our tool does not extract some properties from
the archive, such as line numbers.

2. swing: Swing is another Java archive distributed from
Sun. It contains a set of classes that implement a set
of user interface components for use with Java. This is
a good example of a large application in which source
code is available.

3. WebDelta: WebDelta is an AT&T project that
implements a Java-based version of the WebCiao interface.
This is an example of a medium sized software project.
Source is available.

The results are shown in Tables 2, 3 and 4. Running
times are from a SPARC station running Solaris 2.5. We
see in Table 3 that Chava is significantly faster than
compilation with javac. In fact, Chava is also faster than javap, the
Java program that dumps the contents of a class file, despite
Chava’s increased functionality. This is most likely the case
because Chava is implemented as a C program instead of a
Java program. Nonetheless, these numbers indicate that the
performance of Chava is an order of magnitude better than
compilation.

Table 4 shows the number of entities and relations
generated by each of our test applications. The table also includes

Program     Source size      Class size (bytes)   Number of classes
java.*      300,000 lines    1,648,508            696
swing       211,640 lines    1,340,364            503
WebDelta     23,951 lines      369,469             92

Table 2. Project information

Program     javac     javap      Compile Database   Link Database
java.*      N/A       1m5.4s     48.6s              22.07s
swing       5m16s     38.29s     30.72s             15.61s
WebDelta    1m8s      8.8s       4.9s               2.52s

Table 3. Performance of Chava
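
As a quick check of the comparison with compilation, the timings in Table 3 can be converted to seconds and divided. The sketch below is not part of Chava; the helper `to_seconds`, the dictionary keys, and the choice to compare javac both against the Compile Database column alone and against Compile plus Link are assumptions made for illustration. Only the two rows with a reported javac time are used.

```python
import re


def to_seconds(text):
    """Convert a time such as '5m16s' or '48.6s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Rows of Table 3 for which a javac time is reported.
TABLE3 = {
    "swing": {"javac": "5m16s", "compile_db": "30.72s", "link_db": "15.61s"},
    "WebDelta": {"javac": "1m8s", "compile_db": "4.9s", "link_db": "2.52s"},
}


def speedups(row):
    """Return (javac / compile, javac / (compile + link)) for one row."""
    javac = to_seconds(row["javac"])
    compile_db = to_seconds(row["compile_db"])
    link_db = to_seconds(row["link_db"])
    return javac / compile_db, javac / (compile_db + link_db)


if __name__ == "__main__":
    for name, row in TABLE3.items():
        alone, with_link = speedups(row)
        print(f"{name}: {alone:.1f}x compile only, {with_link:.1f}x compile + link")
```

Under these assumptions, javac takes roughly 10 to 14 times as long as database generation alone, and roughly 7 to 9 times as long once linking is included.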


Program     Database size (bytes)   Number of entities   Number of relationships
java.*      1,720,326               15,248               35,012
swing       1,245,338                9,939               30,371
WebDelta      482,594                3,807               14,422

Table 4. Space Requirements

the size of the database file, the content of which is database
entries in ASCII format. We see that the generated database
size is in the order of the size of the class files, which is
quite manageable. Size could be significantly reduced if
compression were used on the database. The number of
entities and relationships is small enough that queries can be
performed efficiently.

7. Summary and Future Work

With the emerging popularity of Java, a growing number
of applications are being written which can benefit from
tools that assist with software engineering tasks. Our tool
makes it possible to work with large applications and
perform complex analysis tasks such as reachability analysis
and clustering analysis. It also allows visualization of the
components of an application to facilitate the understanding
of their interaction. Working with object code instead
of source has the disadvantage that we cannot extract
information that the compiler has removed (such as optimizations
or source comments), but is advantageous in that it
allows analysis to be applied to a wider variety of applications,
such as applets and legacy systems in which source is
unavailable.

We have several tasks planned for future development of
our Java repository:

Handle errors and exceptions. In Java, methods may
throw exceptions and errors. Each method has a set
of exceptions that it is allowed to throw, and others
must be caught. Our current implementation does not
put information in the database about exceptions and
errors. Future versions may create separate entities for
them and create relationships between methods and the
exceptions/errors they throw.

Handle compiler optimizations. Chava works with
class files emitted from Java compilers. However,
some compilers optimize code. When source files are
not available, Chava has difficulty working with
optimized code because entities from the original code
may not exist in the compiled class file. If source
exists, Chava can always recompile the code without
optimizations, but it should be possible to deal with some
optimizations at the bytecode level.

Integration with debuggers. Our Java database
generation tool is the first instantiation of CIAO to work with
object files. Much of the information that we extract
from applications would be useful to a debugger, as our
suite of tools help users to better understand programs.
We plan to integrate the CIAO tools into a prototype
debugger called deet [17]. Some examples of applications
that might be useful are: (1) being able to look at
the current location of a program inside its call graph
and (2) using queries to find a set of line numbers to
use as breakpoints (e.g., set a breakpoint at every line
that reads field X).

3D Visualization. Feijs and De Jong [15] have applied
3D visualization techniques to software systems with
encouraging results. 3D provides more ways to
represent program relationships, making use of position and

color to describe attributes. We plan to apply some of
these techniques to Chava databases.

Web-Based Reverse Engineering Service. Instead of
installing CIAO and Chava on each user’s machine,
we are currently creating a web service for developers
who would like to share the understanding of a
particular program amongst each other. Users will be
able to generate graph views using an applet [2], run
database queries through JDBC, and view source code
as HTML or XML [22]. Such a service will allow
researchers to freely analyze and experiment with public
source code.

8. Availability

Chava will soon be available for experimental use.
Please visit http://www.research.att.com/~ciao.

References

[1] K. Arnold and J. Gosling. The Java Programming
Language. Addison Wesley, 1996.
[2] N. S. Barghouti, J. Mocenigo, and W. Lee. Grappa:
A Graph Package in Java. In Fifth International
Symposium on Graph Drawing, pages 336–343.
Springer-Verlag, Sept. 1997.
[3] E. Buss, R. D. Mori, W. Gentleman, J. Henshaw,
J. Johnson, K. Kontogiannis, E. Merlo, H. Müller,
J. Mylopoulos, S. Paul, A. Prakash, M. Stanley,
S. Tilley, J. Troster, and K. Wong. Investigating
Reverse Engineering Technologies for the CAS Program
Understanding Project. IBM Systems Journal,
33(3):477–500, 1994.
[4] P. P. Chen. The Entity-Relationship Model – Toward a
Unified View of Data. ACM Transactions on Database
Systems, 1(1):9–36, Mar. 1976.
[5] Y.-F. Chen. Reverse engineering. In B. Krishnamurthy,
editor, Practical Reusable UNIX Software,
chapter 6, pages 177–208. John Wiley & Sons, New
York, 1995.
[6] Y.-F. Chen, G. S. Fowler, E. Koutsofios, and R. S. Wallach.
Ciao: A Graphical Navigator for Software and
Document Repositories. In International Conference
on Software Maintenance, pages 66–75, 1995.
[7] Y.-F. Chen, E. Gansner, and E. Koutsofios. A C++
Data Model Supporting Reachability Analysis and
Dead Code Detection. In Sixth European Software
Engineering Conference and Fifth ACM SIGSOFT
Symposium on the Foundations of Software Engineering,
Sept. 1997.
[8] Y.-F. Chen and E. Koutsofios. WebCiao: A Website
Visualization and Tracking System. In WebNet97,
1997.
[9] Y.-F. Chen, M. Nishimoto, and C. V. Ramamoorthy.
The C Information Abstraction System. IEEE
Transactions on Software Engineering, 16(3):325–334,
Mar. 1990.
[10] Y.-F. Chen, D. Rosenblum, and K.-P. Vo. TestTube:
A System for Selective Regression Testing. In The
16th International Conference on Software Engineering,
pages 211–220, 1994.
[11] S. R. Chidamber and C. F. Kemerer. A Metrics Suite
for Object Oriented Design. IEEE Transactions on
Software Engineering, 20(6):476–493, 1994.
[12] E. H. Chikofsky and J. H. Cross II. Reverse Engineering
and Design Recovery: A Taxonomy. IEEE Software,
7(1), Jan. 1990.
[13] P. Devanbu. GENOA - A language and front-End
independent source code analyzer generator. In
Proceedings of the Fourteenth International Conference
on Software Engineering, pages 307–317, 1992.
[14] F. Douglis, T. Ball, Y.-F. Chen, and E. Koutsofios.
The AT&T Internet Difference Engine: Tracking and
Viewing Changes on the Web. World Wide Web,
1(1):27–44, 1998.
[15] L. Feijs and R. D. Jong. 3D visualization of software
architectures. Communications of the ACM,
41(12):73–78, Dec. 1998.
[16] J. Grass. Cdiff: A Syntax Directed Differencer for
C++ Programs. In Proceedings of the Usenix C++
Conference, Aug. 1992.
[17] D. R. Hanson and J. L. Korn. A Simple and Extensible
Graphical Debugger. In Winter 1997 USENIX
Conference, pages 173–184, Jan. 1997.
[18] D. Hutchens and R. Basili. System Structure Analysis:
Clustering with Data Bindings. IEEE Transactions on
Software Engineering, 11:749–757, Aug. 1985.
[19] D. Jackson and A. Waingold. Lightweight Extraction
of Object Models from Bytecode. In Proc. 21st Intl.
Conf. Software Engineering, May 1999.

[20] S. Mancoridis, B. S. Mitchell, C. Rorres, Y. Chen, and
E. Gansner. Using Automatic Clustering to Produce
High-Level System Organizations of Source Code. In
Sixth International Workshop on Program Compre-
hension, June 1998.
[21] H. Müller, M. Orgun, S. Tilley, and J. Uhl. A Reverse
Engineering Approach to Subsystem Structure Identi-
fication. Journal of Software Maintenance: Research
and Practice, 5:181–204, 1993.
[22] The SGML/XML Web Page.
http://www.oasis-open.org/cover/xml.html, 1999.
[23] S. Paul and A. Prakash. A Framework for Source Code
Search Using Program Patterns. IEEE Transactions on
Software Engineering, 20(3):463–475, June 1994.
[24] H. Rao, Y.-F. Chen, M.-F. Chen, and J. Cheng. iproxy:
An agent-based middleware. In Poster Proceedings of
the Eighth World Wide Web Conference, May 1999.
[25] D. Rayside, S. Kerr, and K. Kontogiannis. Change
and Adaptive Maintenance Detection in Java Software
Systems. In Proc. Fifth Working Conference on Re-
verse Engineering, Oct. 1998.
[26] The Java Reflection API.
http://www.javasoft.com/products, 1998.
[27] R. Schwanke. An Intelligent Tool For Re-Engineering
Software Modularity. In Proc. 13th Intl. Conf. Soft-
ware Engineering, May 1991.
[28] J. Seemann and J. W. von Gudenberg. Pattern-based
design recovery of java software. In Proc. Foundation
of Software Engineering, Nov. 1998.
[29] D. Sharon and R. Bell. Tools that Bind: Creating In-
tegrated Environments. IEEE Software, 12(2):76–85,
Mar. 1995.
[30] I. Thomas. PCTE Interfaces: Supporting Tools in
Software-Engineering Environments. IEEE Software,
6(6):15–23, Nov. 1989.
