Chava-reverse engineering and tracking of Java applets
All content following this page was uploaded by Yih-Farn Robin Chen on 19 May 2014.
2. A Data Model for Java

field: A variable or constant that is part of a class
Figure 1. A Sample Reverse Reachability Diagram from the Difference Database of JDK1.0 and JDK1.1
2.2. Attributes
Type       Parent   Scope                     Flags                                      Params  Dtype
Class      package  public,private            abstract,final                             N/A     package name
Interface  package  public,private            abstract,final                             N/A     package name
Package    N/A      N/A                       N/A                                        N/A     N/A
File       N/A      N/A                       N/A                                        N/A     N/A
Method     class    public,private,protected  abstract,final,native,static,synchronized  params  return type
Field      class    public,private,protected  final,static,transient,volatile            N/A     type
String     class    N/A                       N/A                                        N/A     N/A
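
The attribute table can be read as a small schema. As a rough illustration only, the sketch below encodes the Flags column as a per-kind whitelist and renders a method label in the style of the node names that appear in the figures. All class, enum, and field names here are hypothetical; they are not Chava's actual schema or database format.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; this is not Chava's schema or storage format.
final class EntityModel {
    enum Kind { CLASS, INTERFACE, PACKAGE, FILE, METHOD, FIELD, STRING }
    enum Flag { ABSTRACT, FINAL, NATIVE, STATIC, SYNCHRONIZED, TRANSIENT, VOLATILE }

    // Flags column of the table: the modifiers each entity kind may carry.
    static Set<Flag> allowedFlags(Kind kind) {
        switch (kind) {
            case CLASS:
            case INTERFACE:
                return EnumSet.of(Flag.ABSTRACT, Flag.FINAL);
            case METHOD:
                return EnumSet.of(Flag.ABSTRACT, Flag.FINAL, Flag.NATIVE,
                                  Flag.STATIC, Flag.SYNCHRONIZED);
            case FIELD:
                return EnumSet.of(Flag.FINAL, Flag.STATIC, Flag.TRANSIENT, Flag.VOLATILE);
            default:
                return EnumSet.noneOf(Flag.class); // package, file, string: N/A
        }
    }

    static final class Entity {
        final Kind kind;
        final String name;
        final String parent;       // package or class name; null where the table says N/A
        final String scope;        // "public", "private", "protected"; null where N/A
        final Set<Flag> flags;
        final List<String> params; // parameter types; empty for non-methods
        final String dtype;        // return type or field type; null where N/A

        Entity(Kind kind, String name, String parent, String scope,
               Set<Flag> flags, List<String> params, String dtype) {
            // Reject modifiers that the table does not list for this kind.
            if (!allowedFlags(kind).containsAll(flags)) {
                throw new IllegalArgumentException(flags + " not valid for " + kind);
            }
            this.kind = kind;
            this.name = name;
            this.parent = parent;
            this.scope = scope;
            this.flags = flags;
            this.params = params;
            this.dtype = dtype;
        }

        // Methods render as "dtype parent.name(params)"; other kinds as "parent.name".
        String label() {
            String qualified = (parent == null) ? name : parent + "." + name;
            if (kind == Kind.METHOD) {
                return dtype + " " + qualified + "(" + String.join(",", params) + ")";
            }
            return qualified;
        }
    }

    public static void main(String[] args) {
        Entity area = new Entity(Kind.METHOD, "area", "graph.Circle", "public",
                EnumSet.noneOf(Flag.class), List.of(), "double");
        System.out.println(area.label());
    }
}
```

Keeping the permitted flags in one function means a malformed entry, such as an abstract field, is rejected when the entity is constructed rather than surfacing later in a query.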
Figure 6. Reverse Reachability Example

Figure 8. A Cluster Diagram for a Proxy Server Written in Java
abstraction of the interactions between components in an object-oriented program. By examining the definitions of methods and fields in a class, as well as references to objects in methods of a class, we can find one-to-one, one-to-many, and many-to-one relationships between classes. Graphical tools such as CIAO can be extended to draw the model. The architectural view allows observers to more easily see relationships between program components, which is useful for the design, implementation and debugging of software systems.

4.4. Software Metrics

Using information such as the number of methods in each class and the number of method calls or field references in each method, we can devise software metrics that can be used to suggest how complex a system is. For example, Chava can be used to compute the following Object Oriented Design (OOD) metrics proposed by Chidamber and Kemerer [11]: WMC (weighted methods per class), DIT (depth of inheritance tree), NOC (number of children), and CBO (coupling between object classes).

4.5. Website Analysis

Our tool supports generation of databases for remote applets. When the URL for a web page is specified to the database generation program, it will find any applet tags and process the class files for each applet in the HTML file.

The ability to reverse engineer a remote applet has important uses. A user may wish to reverse engineer an applet on a remote site to get an overview of what it does or analyze potential security problems (see Section 4.2).

Since the author of an applet typically writes the application in one environment and uploads the applet to a server where it is run, there is a possibility that the applet will behave differently in each environment. For example, a different set of classes could be loaded on the server due to its configuration. Our tool eliminates the process of moving the applet and its supporting components to a different machine before doing the analysis.

The CIAO environment works with a variety of languages, one of which is HTML. WebCiao [8] is a system for visualizing and tracking the structures of web sites using an HTML repository. WebCiao generates a database from a set of web pages, and uses a suite of tools that are generated with a CIAO specification.

We have implemented a tool which combines repositories of HTML and Java applets. The combined repository gives web developers an environment for viewing all aspects of web content together, both static and dynamic, including document structure and applet interaction.

Figure 9 shows an example query for the web page http://www.att.com. The query shows the reachable set of entities from this URL. The web page contains an applet called AIG Flipper, so a relationship exists between the web page and the applet. Thus, this query draws the set of URLs referred to by the web page and the methods referred to by the applet together.
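
The applet discovery step described in this section (find the applet tags in a page, then locate each applet's class file) might look roughly like the following. This is a minimal sketch under stated assumptions, not Chava's implementation: the class `AppletScanner` and its methods are hypothetical, tags are matched with a regular expression rather than a full HTML parser, and only the `code` and `codebase` attributes are considered (`archive` and `object` are ignored).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the paper does not show Chava's own scanner.
final class AppletScanner {
    // Matches an opening <applet ...> tag and captures its attribute text.
    private static final Pattern APPLET_TAG =
            Pattern.compile("<applet\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Returns the value of the named attribute (quoted or bare), or null if absent.
    static String attribute(String attrs, String name) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(name)
                        + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(attrs);
        if (!m.find()) {
            return null;
        }
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    // Resolves each applet's class file against the page URL and optional codebase.
    static List<String> classFileUrls(String pageUrl, String html) {
        URI page = URI.create(pageUrl);
        if (page.getPath() == null || page.getPath().isEmpty()) {
            page = page.resolve("/"); // avoid resolving against an empty path
        }
        List<String> result = new ArrayList<>();
        Matcher m = APPLET_TAG.matcher(html);
        while (m.find()) {
            String attrs = m.group(1);
            String code = attribute(attrs, "code");
            if (code == null) {
                continue; // tags without a code attribute are skipped in this sketch
            }
            if (!code.endsWith(".class")) {
                code += ".class";
            }
            URI base = page;
            String codebase = attribute(attrs, "codebase");
            if (codebase != null) {
                base = page.resolve(codebase.endsWith("/") ? codebase : codebase + "/");
            }
            result.add(base.resolve(code).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<APPLET CODE=\"Flip.class\" CODEBASE=\"applets\" WIDTH=100></APPLET>"
                + "<applet code=Clock></applet>";
        for (String url : classFileUrls("http://example.com/dir/page.html", html)) {
            System.out.println(url);
        }
    }
}
```

Each returned URL identifies a class file that a database generator could then fetch and process; the class files that the applet references in turn would need the same treatment.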
5. Implementation
5.3. Archives

Java allows a set of classes to be combined into a single file as an archive. To generate a database for an archive, the archive is first expanded into its source and/or class files. Then, Chava is run on each file and the set of databases is linked to form a single database from the set of classes. The resulting database can subsequently be linked with other applications.

For example, we have produced a database for the Java default class library. The library is quite large and consists of around 2 megabytes of class files. The generated database contains 15,248 entities and 35,012 relationships. This database can be linked in with Java applications if the user wants to do analysis involving the Java class library. For example, reachability analysis on a Java program would reveal how much of the Java library it actually uses.

5.4. Java Instantiation of CIAO

The query and visualization subsystem for Java is built by constructing a new instance of the CIAO system. CIAO takes a specification file for a language and generates the supporting tools for querying and visualization automatically. The specification consists of the following parts:

Schema. It enumerates the attributes of entities and relationships. Types for attribute fields include integers, strings, and pointers to other entities.

Database View. This section defines how different entity and relationship entries are viewed in text format. For example, a class entity contains the package name as one attribute and the class name as another. When displaying a class entity, we can specify that the name be displayed by combining these fields.

HTML View. This section defines how to format each of the attributes in the database as HTML, making it possible to display query results in a browser window.

Source View. This section defines the fields that are needed to find the source file and line numbers for a given entity in the database.

Graph View. This section defines how to graphically display entities and relationships. Entities are displayed as nodes, and the specification can define what colors, shapes, fonts, etc. should be used for different kinds of entities. Relationships are represented as edges, and can also be represented with various styles and colors.

GUI Front End. The final section defines the appearance of the graphical user interface.

The specification file is less than 300 lines long. The complete suite of query, visualization, and generic reachability analysis tools is generated from this specification file.

6. Performance and Experience

This section looks at the performance of Chava as a function of the number of entities and relationships emitted. It is our goal that Chava scale well with large applications so that it can be useful for real-world software projects.

6.1. Speed and Size

To measure the speed of Chava, we have taken a set of applications ranging in size and generated databases for them. To put the numbers in context, we compare the time that Chava takes to the speed of compilation. We also compare the running time of Chava to javap, which is a Java program in the Java SDK that dumps the contents of a class file. javap can be viewed as a lower bound on performance had we used Java to write Chava.

The three programs we performed experiments on were as follows:

1. java.*: This is a collection of classes that make up the standard library for Java. All classes are in packages that begin with java. The set of classes is stored as an archive in zip format, and source is not available. Thus, our tool does not extract some properties from the archive, such as line numbers.

2. swing: Swing is another Java archive distributed from Sun. It contains a set of classes that implement a set of user interface components for use with Java. This is a good example of a large application in which source code is available.

3. WebDelta: WebDelta is an AT&T project that implements a Java-based version of the WebCiao interface. This is an example of a medium-sized software project. Source is available.

The results are shown in Tables 2, 3 and 4. Running times are from a SPARC station running Solaris 2.5. We see in Table 3 that Chava is significantly faster than compilation with javac. In fact, Chava is also faster than javap, despite Chava’s increased functionality. This is most likely the case because Chava is implemented as a C program instead of a Java program. Nonetheless, these numbers indicate that the performance of Chava is an order of magnitude better than compilation.

Table 4 shows the number of entities and relations generated by each of our test applications. The table also includes
the size of the database file, the content of which is database entries in ASCII format. We see that the generated database size is on the order of the size of the class files, which is quite manageable. Size could be significantly reduced if compression were used on the database. The number of entities and relationships is small enough that queries can be performed efficiently.

Program    Source size     Class size   Number of classes
java.*     300,000 lines   1,648,508    696
swing      211,640 lines   1,340,364    503
WebDelta   23,951 lines    369,469      92

7. Summary and Future Work

With the emerging popularity of Java, a growing number of applications are being written which can benefit from tools that assist with software engineering tasks. Our tool makes it possible to work with large applications and perform complex analysis tasks such as reachability analysis and clustering analysis. It also allows visualization of the components of an application to facilitate the understanding of their interaction. Working with object code instead of source has the disadvantage that we cannot extract information that the compiler has removed (such as optimizations or source comments), but is advantageous in that it allows analysis to be applied to a wider variety of applications, such as applets and legacy systems in which source is unavailable.

We have several tasks planned for future development of our Java repository:

Handle errors and exceptions. In Java, methods may throw exceptions and errors. Each method has a set of exceptions that it is allowed to throw, and others must be caught. Our current implementation does not put information in the database about exceptions and errors. Future versions may create separate entities for them and create relationships between methods and the exceptions/errors they throw.

Handle compiler optimizations. Chava works with class files emitted from Java compilers. However, some compilers optimize code. When source files are not available, Chava has difficulty working with optimized code because entities from the original code may not exist in the compiled class file. If source exists, Chava can always recompile the code without optimizations, but it should be possible to deal with some optimizations at the bytecode level.

Integration with debuggers. Our Java database generation tool is the first instantiation of CIAO to work with object files. Much of the information that we extract from applications would be useful to a debugger, as our suite of tools helps users to better understand programs. We plan to integrate the CIAO tools into a prototype debugger called deet [17]. Some examples of applications that might be useful are: (1) being able to look at the current location of a program inside its call graph and (2) using queries to find a set of line numbers to use as breakpoints (e.g., set a breakpoint at every line that reads field X).

3D Visualization. Feijs and De Jong [15] have applied 3D visualization techniques to software systems with encouraging results. 3D provides more ways to represent program relationships, making use of position and
color to describe attributes. We plan to apply some of these techniques to Chava databases.

Web-Based Reverse Engineering Service. Instead of installing CIAO and Chava on each user’s machine, we are currently creating a web service for developers who would like to share their understanding of a particular program with one another. Users will be able to generate graph views using an applet [2], run database queries through JDBC, and view source code as HTML or XML [22]. Such a service will allow researchers to freely analyze and experiment with public source code.

References

[2] N. S. Barghouti, J. Mocenigo, and W. Lee. Grappa: A Graph Package in Java. In Fifth International Symposium on Graph Drawing, pages 336–343. Springer-Verlag, Sept. 1997.

[3] E. Buss, R. D. Mori, W. Gentleman, J. Henshaw, J. Johnson, K. Kontogiannis, E. Merlo, H. Müller, J. Mylopoulos, S. Paul, A. Prakash, M. Stanley, S. Tilley, J. Troster, and K. Wong. Investigating Reverse Engineering Technologies for the CAS Program Understanding Project. IBM Systems Journal, 33(3):477–500, 1994.

[4] P. P. Chen. The Entity-Relationship Model – Toward a Unified View of Data. ACM Transactions on Database Systems, 1(1):9–36, Mar. 1976.

[5] Y.-F. Chen. Reverse engineering. In B. Krishnamurthy, editor, Practical Reusable UNIX Software, chapter 6, pages 177–208. John Wiley & Sons, New York, 1995.

[6] Y.-F. Chen, G. S. Fowler, E. Koutsofios, and R. S. Wallach. Ciao: A Graphical Navigator for Software and Document Repositories. In International Conference on Software Maintenance, pages 66–75, 1995.

[7] Y.-F. Chen, E. Gansner, and E. Koutsofios. A C++ Data Model Supporting Reachability Analysis and Dead Code Detection. In Sixth European Software Engineering Conference and Fifth ACM SIGSOFT Symposium on the Foundations of Software Engineering, Sept. 1997.

[8] Y.-F. Chen and E. Koutsofios. WebCiao: A Website Visualization and Tracking System. In WebNet97, 1997.

[9] Y.-F. Chen, M. Nishimoto, and C. V. Ramamoorthy. The C Information Abstraction System. IEEE Transactions on Software Engineering, 16(3):325–334, Mar. 1990.

[13] P. Devanbu. GENOA - A language and front-end independent source code analyzer generator. In Proceedings of the Fourteenth International Conference on Software Engineering, pages 307–317, 1992.

[14] F. Douglis, T. Ball, Y.-F. Chen, and E. Koutsofios. The AT&T Internet Difference Engine: Tracking and Viewing Changes on the Web. World Wide Web, 1(1):27–44, 1998.

[15] L. Feijs and R. D. Jong. 3D visualization of software architectures. Communications of the ACM, 41(12):73–78, Dec. 1998.

[16] J. Grass. Cdiff: A Syntax Directed Differencer for C++ Programs. In Proceedings of the Usenix C++ Conference, Aug. 1992.

[17] D. R. Hanson and J. L. Korn. A Simple and Extensible Graphical Debugger. In Winter 1997 USENIX Conference, pages 173–184, Jan. 1997.

[18] D. Hutchens and R. Basili. System Structure Analysis: Clustering with Data Bindings. IEEE Transactions on Software Engineering, 11:749–757, Aug. 1995.

[19] D. Jackson and A. Waingold. Lightweight Extraction of Object Models from Bytecode. In Proc. 21st Intl. Conf. Software Engineering, May 1999.
[20] S. Mancoridis, B. S. Mitchell, C. Rorres, Y. Chen, and
E. Gansner. Using Automatic Clustering to Produce
High-Level System Organizations of Source Code. In
Sixth International Workshop on Program Compre-
hension, June 1998.
[21] H. Müller, M. Orgun, S. Tilley, and J. Uhl. A Reverse
Engineering Approach to Subsystem Structure Identi-
fication. Journal of Software Maintenance: Research
and Practice, 5:181–204, 1993.
[22] The SGML/XML Web Page.
http://www.oasis-open.org/cover/xml.html, 1999.
[23] S. Paul and A. Prakash. A Framework for Source Code
Search Using Program Patterns. IEEE Transactions on
Software Engineering, 20(3):463–475, June 1994.
[24] H. Rao, Y.-F. Chen, M.-F. Chen, and J. Cheng. iproxy:
An agent-based middleware. In Poster Proceedings of
the Eighth World Wide Web Conference, May 1999.
[25] D. Rayside, S. Kerr, and K. Kontogiannis. Change
and Adaptive Maintenance Detection in Java Software
Systems. In Proc. Fifth Working Conference on Re-
verse Engineering, Oct. 1998.
[26] The Java Reflection API.
http://www.javasoft.com/products, 1998.
[27] R. Schwanke. An Intelligent Tool For Re-Engineering
Software Modularity. In Proc. 13th Intl. Conf. Soft-
ware Engineering, May 1991.
[28] J. Seemann and J. W. von Gudenberg. Pattern-based
design recovery of java software. In Proc. Foundation
of Software Engineering, Nov. 1998.
[29] D. Sharon and R. Bell. Tools that Bind: Creating In-
tegrated Environments. IEEE Software, 12(2):76–85,
Mar. 1995.
[30] I. Thomas. PCTE Interfaces: Supporting Tools in
Software-Engineering Environments. IEEE Software,
6(6):15–23, Nov. 1989.