XML Framework For Language Neutral Representation
Conference Paper in Proceedings of the Euromicro Conference on Software Maintenance and Reengineering, CSMR · April 2005
DOI: 10.1109/CSMR.2005.10 · Source: IEEE Xplore
2 authors, including:
Kostas Kontogiannis
National Technical University of Athens
All content following this page was uploaded by Kostas Kontogiannis on 01 October 2014.
[Figure: framework diagram; labels: Source Code, FactML, Higher Level Representations, External Tools]
4.3 PDGML and CGML

Similarly, PDGs and call graphs can be modeled in UML, and the corresponding XML DTDs can be generated from them.

5. The Representation Framework

In Figure 7 we present the multi-layered framework for the language-neutral representation of program artifacts. We also demonstrate the use of the framework for building generic program analysis tools. The framework follows a pipe-and-filter architectural style. The pipe components are the different layers of abstraction of the program source, and the filter components are the representation transformers and the analysis tools.

5.1 Abstraction Layers

There are three distinct layers in the framework, corresponding to three different levels of abstraction of the source code. Layer 0 is the original source text of the program to be analyzed, as is.

Layer 1 is the first level of abstraction of the source code, in terms of the AST of the program. We adopt the AST representations proposed by Zou and Mamas for this layer. Since these representations also include generic representations for the procedural and object-oriented language families, they provide language-neutral representations of the AST. This layer consists of two sub-layers. Layer 1.1 contains the AST representations in programming-language-specific markup languages, i.e. JavaML, CppML, CML, PascalML and FortranML. Layer 1.2 contains the AST representations derived from the generic model of each language family, i.e. ProcML and OOML.

Layer 2 is the next level of abstraction, in terms of the different intra-procedural and inter-procedural graphs. This layer also consists of two sub-layers. Layer 2.1 represents the basic facts of a program in the FactML format. Layer 2.2 contains the representations of the intra-procedural and inter-procedural dependence and flow graphs of the program, expressed as CFGML, PDGML, SDGML and CGML.

5.2 Transformers

A set of transformer tools is required to convert the representations from one level of abstraction to the next higher level. Some of them are source code transformers, i.e. parsers that read the source text and emit the corresponding AST in the language-specific XML format. There has to be one such transformer for each of the languages to be analyzed.

The rest of the transformers are XML-to-XML transformers. These can be built using XSLT stylesheets [25], XPath/XQuery [26] or DOM [28] manipulation. There will be one transformer for each of the following conversions:

JavaML, CppML to OOML
CML, PascalML, FortranML to ProcML
OOML, ProcML to FactML
FactML to CFGML, PDGML, CGML
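The chain of XML-to-XML transformers in the pipe-and-filter style can be sketched as a composition of tree-to-tree functions. The sketch below is a minimal illustration under stated assumptions: it uses Python's `xml.etree.ElementTree` in place of the XSLT, XQuery or DOM tooling named above, and all tag names (`java-class`, `java-method`, `facts`, `fact`, and so on) plus the mapping tables and function names are hypothetical, since the actual JavaML, OOML and FactML schemas are not given in this section.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable

# A filter maps one XML tree to another; the intermediate trees act as the pipes.
Filter = Callable[[ET.Element], ET.Element]

# Hypothetical mapping from language-specific (JavaML-like) tags to generic OO tags.
JAVAML_TO_OOML = {"java-class": "class", "java-method": "method", "java-field": "attribute"}


def javaml_to_ooml(root: ET.Element) -> ET.Element:
    """Rename language-specific tags to generic ones, keeping attributes and children."""
    new = ET.Element(JAVAML_TO_OOML.get(root.tag, root.tag), dict(root.attrib))
    new.text = root.text
    for child in root:
        new.append(javaml_to_ooml(child))
    return new


def ooml_to_factml(root: ET.Element) -> ET.Element:
    """Flatten the generic AST into 'contains' facts (hypothetical FactML shape)."""
    facts = ET.Element("facts")
    for parent in root.iter():  # depth-first, document order, root included
        for child in parent:
            ET.SubElement(facts, "fact", kind="contains",
                          src=parent.get("name", parent.tag),
                          dst=child.get("name", child.tag))
    return facts


def pipeline(*filters: Filter) -> Filter:
    """Compose filters left to right into a single transformer."""
    return lambda tree: reduce(lambda acc, f: f(acc), filters, tree)


if __name__ == "__main__":
    src = ET.fromstring(
        '<java-class name="Stack"><java-field name="items"/>'
        '<java-method name="push"/></java-class>'
    )
    to_facts = pipeline(javaml_to_ooml, ooml_to_factml)
    print(ET.tostring(to_facts(src), encoding="unicode"))
```

Because each filter only consumes and produces XML trees, a further stage (for example a FactML-to-CFGML converter) could be appended to `pipeline(...)` without changing the earlier stages, which is the property the pipe-and-filter style is meant to provide.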
5.3 Analysis Tools

Various program analysis tools can be written on top of the proposed framework. Since these tools work on language-neutral representations of the program, a single tool can perform a particular type of analysis on a source program written in any programming language. For example, a generic data flow analysis tool can be written to work on the CFGML, or a single slicing tool can be written to use the PDGML to perform program slicing on source code of any language.

All the representations in the proposed framework are XML and hence can easily be transformed to other formats using XSLT or XQuery in order to export data to an external tool. If the external tool also uses an XML representation for its data, then it is straightforward to import the data using the same techniques. However, if the external tool does not use XML representations, additional mapping tools are needed to map the external formats to the internal XML representations.

6. A Prototype Implementation

We have developed a prototype toolset based on the proposed framework. Our prototype works on the JavaML-OOML representation of Mamas and uses the Ret4J [29] toolkit to generate JavaML-OOML instances of Java programs. Ret4J was slightly modified to include a lineNumber attribute in the generated XML.

6.2 Operational Statistics

In this section we evaluate the proposed framework in terms of the size of the generated representations and the time the prototype toolset requires to generate them. Five input files of different sizes were used to measure these size and time parameters. The files were chosen from a variety of sources, ranging from student course projects to a standard utility library. The prototype was developed in the Java programming language (JDK 1.3), and all experiments were run on a Sun UltraSPARC III 440 MHz workstation with 512 MB of RAM running the Solaris 8 operating system.

Table 2 presents the size of the generated FactML files and the time the fact extractor tool required to generate them. The FactML is approximately 5 times the size of the source code. Table 3 summarizes the relationship between the size of a method, the size of its corresponding PDGML, and the time taken to produce it. Although the size of the PDGML generally tends to increase with the size of the method, this is not always the case: when the program has few def-use chains, the graph has few edges, which results in a smaller PDGML. Finally, Table 4 shows the results of slicing based on the final uses of a given variable. The size of the slice relative to the size of the method shows the same property as the size of the PDG. The time required to slice a PDG is quite reasonable and depends on the size of the source.
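This section does not spell out the slicing algorithm behind Table 4. A common formulation, assumed here, computes a backward slice as reachability over the reversed data and control dependence edges of the PDG, starting from the slicing criterion (here, the final use of a variable). The sketch reads a toy PDG from a hypothetical PDGML-like encoding; the `pdg`, `node` and `edge` tags, the `from`/`to`/`kind` attributes, and the `load_pdg`/`backward_slice` names are illustrative assumptions, not the framework's actual schema or API, and the method entry node is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical PDGML-like encoding of a toy method (NOT the real PDGML schema):
#   1: s = 0   2: i = 0   3: while i < n   4: s = s + i   5: i = i + 1
#   6: print(s)   7: print(i)
# An edge from="a" to="b" means node b depends on node a.
PDG_XML = """
<pdg method="sum">
  <node id="1" code="s = 0"/>
  <node id="2" code="i = 0"/>
  <node id="3" code="while i &lt; n"/>
  <node id="4" code="s = s + i"/>
  <node id="5" code="i = i + 1"/>
  <node id="6" code="print(s)"/>
  <node id="7" code="print(i)"/>
  <edge from="1" to="4" kind="data"/>
  <edge from="4" to="4" kind="data"/>
  <edge from="1" to="6" kind="data"/>
  <edge from="4" to="6" kind="data"/>
  <edge from="2" to="3" kind="data"/>
  <edge from="5" to="3" kind="data"/>
  <edge from="2" to="4" kind="data"/>
  <edge from="5" to="4" kind="data"/>
  <edge from="2" to="5" kind="data"/>
  <edge from="5" to="5" kind="data"/>
  <edge from="2" to="7" kind="data"/>
  <edge from="5" to="7" kind="data"/>
  <edge from="3" to="4" kind="control"/>
  <edge from="3" to="5" kind="control"/>
</pdg>
"""


def load_pdg(xml_text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return node code by id and, for each node, the nodes it depends on."""
    root = ET.fromstring(xml_text.strip())
    code = {n.get("id"): n.get("code") for n in root.iter("node")}
    preds: Dict[str, List[str]] = {nid: [] for nid in code}
    for e in root.iter("edge"):
        preds[e.get("to")].append(e.get("from"))
    return code, preds


def backward_slice(preds: Dict[str, List[str]], criterion: str) -> Set[str]:
    """Collect every node the criterion transitively depends on (BFS over reversed edges)."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for p in preds[node]:
            if p not in seen:
                seen.add(p)
                work.append(p)
    return seen


if __name__ == "__main__":
    code, preds = load_pdg(PDG_XML)
    for nid in sorted(backward_slice(preds, "7"), key=int):
        print(nid, code[nid])
```

Slicing on the final use of `i` (node 7) keeps only the statements that compute `i` and the loop predicate, while the statements involving `s` drop out. This also illustrates why slice size tracks the number of dependence edges, in line with the PDG size observations above.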