Code Generation Using XML Based Document Transformation: Soumen Sarkar Craig Cleaveland
Soumen Sarkar
Craig Cleaveland
Published on
With recent interest in XML standards and availability of tools supporting these
standards, it has become possible to generate multiple types of documents by applying
XML document transformation technology. Using XML document transformation
technology, it has become easier to develop custom code/document generators in
application development projects.
An object based server side infrastructure was used in the ‘sample’ project referred to in
this whitepaper. Object and relational model code generation coupled with object
services provided by the EJB framework created a very powerful paradigm of server side
infrastructure development. The project had a tremendous lead by being able to build
further on this sophisticated server side infrastructure rather than spending time on
building the infrastructure itself. The project was totally focused on building application
logic and delivering functionality. Out of approximately 2300 Java files in the project,
1900 files were generated using XML document transformation technology.
Introduction
This paper is mostly about how code generation aids speedier application software
development. However, this paper will also highlight the fact that source code generation
processing is a particular application of the broad technology of XML based document
transformation.
While this paper will demonstrate that an XML based generation approach is easy
enough for it to be considered in many projects, it is not a general approach. There are
some limitations. The conclusion addresses this and puts the applicability of the current
XML transformation approach into proper perspective.
[Figure residue: INPUT DOCS; DOCUMENT GENERATION PROCESSING PHASES; TRANSFORM SELECTION; OUTPUT LANGUAGE SYNTAX TREE; WRITER; OUTPUT DOCS]
The above specification is intuitive enough to convey the fact that DayOfWeek is an
enumerated type with seven distinct values. This also shows that the process of defining a
domain specific language in XML consists of defining the markup elements and element
attributes. Where domain language elements clash with preexisting XML
elements, the XML namespace facility should be used to distinguish the domain language
elements. Together, the XML namespace, elements, and element attributes define the XML
based domain language.
Once the input language document is defined in XML, XSLT scripts can be written to
process documents conforming to the input language and generate output documents in
various forms. As previously explained, a document could be a source code file or it
could be an electronic document in various forms. XSLT is an XML based language
standardized by W3C. Processors supporting the XSLT language standard are used to
translate XML documents into other XML or text documents. There are a number of free
XSLT processors available; this study used the SAXON XSLT processor developed by
Michael Kay.
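As a minimal sketch of what applying an XSLT script to an input document involves, the following Java program drives an XSLT processor through the standard JAXP API (javax.xml.transform) that ships with the JDK. The project described here used SAXON; the class name XsltRunner, the Items input document, and the small stylesheet are illustrative assumptions, not the project's code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: applies an XSLT 1.0 script to an XML document, both given as strings. */
final class XsltRunner {

    // Illustrative stylesheet: emits one "item <name>;" entry per Item element.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//Item'>item <xsl:value-of select='@name'/>;</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xslt) {
        try {
            // JAXP picks up whichever XSLT processor is installed; the JDK bundles one.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Items><Item name='alpha'/><Item name='beta'/></Items>";
        System.out.println(transform(xml, SAMPLE_XSLT)); // prints "item alpha;item beta;"
    }
}
```

The output method is set to text here because the result is not XML; a real generator would read the input and the script from files rather than from string constants.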
XSLT is used for document transformation; for example, it can be used in a multiple line
publishing scenario including HTML for web clients, Wireless Markup Language
(WML) for wireless clients, and Portable Document Format (PDF) for print
documentation. The concern of one document content yet multiple presentation formats is
neatly addressed by using XSLT transformation in the web based information
architecture. However, in our opinion, XSLT has received most of its attention in web
document publishing scenarios, and comparatively little for its utility in custom
code/document generation in application software development projects. This paper
emphasizes the fact that XSLT offers an easy approach to code generation.
Parser: XML document parsing is implemented in the XSLT processor, whereas with a
custom parser, one needs to implement the parser or understand how to use a program
generator such as lex/yacc to generate the parsing framework.
Tree: The XSLT processor constructs the tree and provides access to the tree as per the
XPATH specification. The user does not have to bother constructing the tree at all. With
the custom parser, the user needs to populate the tree as the parsing progresses.
Tree processing: The XSLT processor provides access to the tree through the XPATH
expression and provides many programmatic constructs and functions to perform tree
processing. In other words, XSLT users write XSLT scripts to perform operations on the
tree. With other approaches, by contrast, code generator writers must implement this
part programmatically. Code generation programming in XSLT is done at a high level,
namely, at the level of the tree abstraction. Note that even though the programming is at
the level of the tree abstraction, the programmer never needs to worry about the
implementation details of the tree data structure.
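To make the idea of tree-level access concrete, here is a small sketch that evaluates XPATH expressions with the JDK's javax.xml.xpath API. The parser builds the tree and the expression selects nodes from it; no tree-handling code is written by hand. The class name TreeAccessSketch is hypothetical, and the sample input is a cut-down version of the enumeration documents used in the example later in this paper.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TreeAccessSketch {

    static final String SAMPLE =
        "<EnumDef name='ActionEnum'><choices>"
      + "<choice name='start' value='1'/>"
      + "<choice name='stop' value='2'/>"
      + "</choices></EnumDef>";

    /** Evaluates an XPATH expression and returns the text content of each selected node. */
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The XML is parsed into a tree internally; we only describe what to select.
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, "/EnumDef/choices/choice/@name")); // prints [start, stop]
        System.out.println(select(SAMPLE, "//choice[@value='2']/@name"));    // prints [stop]
    }
}
```

An XSLT script uses the same kind of expressions in its select and match attributes, so the script author works with paths and predicates rather than with node objects.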
To write a code generator, the software developer only writes XSLT scripts using XSLT
and XPATH facilities. Another important benefit of XSLT based code generation is the
flexibility it allows when the input document grammar changes. More
specifically, additional elements and attributes can be introduced in the input language
specification without affecting existing code generation scripts. Contrast this with a custom
parser driven approach, where such changes propagate widely through the code
generation system. These complexities of developing a custom code generator are
not present when XSLT is used.
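The flexibility claim can be checked with a small experiment. In the sketch below, the same script is applied to an input document and to an extended version of it that carries extra attributes and an extra element; the output is identical. The class name GrammarChangeSketch and the added owner, since, and doc names are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class GrammarChangeSketch {

    // The script names exactly the nodes it needs: choice elements and two of their attributes.
    static final String SCRIPT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//choice'>"
      + "<xsl:value-of select='@name'/>=<xsl:value-of select='@value'/>;"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static final String ORIGINAL =
        "<EnumDef name='ActionEnum'><choices>"
      + "<choice name='start' value='1'/><choice name='stop' value='2'/>"
      + "</choices></EnumDef>";

    // Same content plus a new attribute on EnumDef, a new element, and a new attribute on choice.
    static final String EXTENDED =
        "<EnumDef name='ActionEnum' owner='ops'><doc>Device actions</doc><choices>"
      + "<choice name='start' value='1' since='2.0'/><choice name='stop' value='2'/>"
      + "</choices></EnumDef>";

    static String generate(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(SCRIPT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(generate(ORIGINAL));                            // prints "start=1;stop=2;"
        System.out.println(generate(ORIGINAL).equals(generate(EXTENDED))); // prints "true"
    }
}
```

This holds because the script selects nodes by name. A script that selects with wildcards, or that relies on the built-in template rules to copy text, could pick up the new content, so the benefit depends to some extent on how the scripts are written.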
Figure 2 shows the process of document generation using XSLT. Typically, XSLT
processors produce one output file. An external utility could break the single output file
into multiple files. For example, in the case of source code, the output file can delimit the
beginning and the end of each file with markers not likely to occur as part of the source
code. Furthermore, the location of each file could also be indicated as part of the
generation. The external utility then processes the single output file to produce multiple
files in multiple locations. Similarly, input to the XSLT processor should preferably be
one XML file including other XML files. There are techniques to achieve XML include
and W3C is working on a standardized XML include mechanism.
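The external utility itself is not shown in this paper. A minimal sketch of one is given below, assuming the marker format that appears in the generated output later in this paper (//@@@BEGIN_FILE, //@@@LOCATION, //@@@END_FILE). The class name OutputSplitter and the mapping of a dotted location such as common.gencode.enums to a directory path are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class OutputSplitter {
    private static final String BEGIN = "//@@@BEGIN_FILE ";
    private static final String LOCATION = "//@@@LOCATION ";
    private static final String END = "//@@@END_FILE ";

    static final String SAMPLE =
        "//@@@BEGIN_FILE ActionEnum.java\n"
      + "//@@@LOCATION common.gencode.enums\n"
      + "public class ActionEnum {}\n"
      + "//@@@END_FILE ActionEnum.java\n"
      + "//@@@BEGIN_FILE AccessEnum.java\n"
      + "//@@@LOCATION common.gencode.enums\n"
      + "public class AccessEnum {}\n"
      + "//@@@END_FILE AccessEnum.java\n";

    /** Splits the combined output into files under root and returns the files written. */
    static List<Path> split(String combined, Path root) {
        List<Path> written = new ArrayList<>();
        String fileName = null;               // non-null while inside a BEGIN/END pair
        String location = "";
        StringBuilder body = new StringBuilder();
        for (String raw : combined.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith(BEGIN)) {
                fileName = line.substring(BEGIN.length()).trim();
                location = "";
                body.setLength(0);
            } else if (fileName != null && line.startsWith(LOCATION)) {
                location = line.substring(LOCATION.length()).trim();
            } else if (fileName != null && line.startsWith(END)) {
                // Assumption: a dotted location maps to nested directories.
                Path dir = root.resolve(location.replace('.', '/'));
                written.add(write(dir.resolve(fileName), body.toString()));
                fileName = null;
            } else if (fileName != null) {
                body.append(raw).append('\n'); // lines outside any marker pair are ignored
            }
        }
        return written;
    }

    private static Path write(Path file, String content) {
        try {
            Files.createDirectories(file.getParent());
            return Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("gencode");
        split(SAMPLE, root).forEach(System.out::println);
    }
}
```

The marker lines are dropped from the written files. A production utility would also want to validate file names and report unterminated BEGIN markers, which this sketch does not do.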
[Figure 2: Document generation using XSLT. Input XML files and XSLT scripts are fed to the XSLT processor, which produces the output files.]
• Ability 2: The ability to visualize the tree that will result from the input XML
documents containing code generation specifications.
• Ability 3: The ability to use XSLT and XPATH facilities to browse through
the XML tree and generate output based on the tree content.
It is beyond the scope of this paper to explain XSLT and XPATH. There are a number of
well-written books and internet resources available on these subjects. Installing and
using an XSLT processor is straightforward. This section demonstrates the
concepts covered so far in two ways, namely:
• Through a very simple code generation example. The example will illustrate
the need for the three required abilities as defined above.
Let us demonstrate the Java enumeration utility class code generation. We first need to
decide how we would like to specify the code generation (ability 1). We decided that a
particular enumeration specification will look like the following:
The specification has the following statements, not expressible formally with XML
syntax:
With this specification in mind, we start defining XML documents containing Java
enumeration code generation specifications. We have two documents shown below. As
the names hint, AuthoredEnums.xml was authored by project software developers,
whereas GeneratedEnums.xml was generated by a utility program from some data
source.
AuthoredEnums.xml
<Enumerations>
  <EnumDef name = 'AccessEnum'>
    <choices>
      <choice name = 'read_only' value = '1'/>
      <choice name = 'read_write' value = '2'/>
    </choices>
  </EnumDef>
  <EnumDef name = 'SeverityEnum'>
    <choices>
      <choice name = 'critical' value = '1'/>
      <choice name = 'major' value = '2'/>
      <choice name = 'minor' value = '3'/>
      <choice name = 'warning' value = '4'/>
    </choices>
  </EnumDef>
</Enumerations>

GeneratedEnums.xml
<Enumerations>
  <EnumDef name = "ActionEnum">
    <choices>
      <choice name = "start" value = "1"/>
      <choice name = "stop" value = "2"/>
      <choice name = "test" value = "3"/>
    </choices>
  </EnumDef>
  <EnumDef name = "AccessControlEnum">
    <choices>
      <choice name = "permit" value = "1"/>
      <choice name = "deny" value = "2"/>
    </choices>
  </EnumDef>
</Enumerations>
One technique of XML inclusion (using an external parsed entity reference) is used to
present only one XML input file at XSLT processing time. The XML file in effect has all
the enumeration definitions. The XML file is shown below:
AllEnums.xml
<?xml version = "1.0"?>
<!DOCTYPE AllEnums[
<!ENTITY include_authored_enums SYSTEM "AuthoredEnums.xml">
<!ENTITY include_generated_enums SYSTEM "GeneratedEnums.xml">
]>
<AllEnums>
&include_authored_enums;
&include_generated_enums;
</AllEnums>
Figure 4: Technique for XML include
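The effect of the entity-based include can be observed with any XML parser. The sketch below parses an AllEnums document with the JDK's DOM parser and lists the EnumDef names found in the combined tree. To keep the example self-contained, the two included files are served from an in-memory map through an EntityResolver instead of being read from disk, and their content is abbreviated; the class name EntityIncludeSketch and the virtual base location are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class EntityIncludeSketch {

    static final String ALL_ENUMS =
        "<?xml version='1.0'?>\n"
      + "<!DOCTYPE AllEnums [\n"
      + "  <!ENTITY include_authored_enums SYSTEM 'AuthoredEnums.xml'>\n"
      + "  <!ENTITY include_generated_enums SYSTEM 'GeneratedEnums.xml'>\n"
      + "]>\n"
      + "<AllEnums>&include_authored_enums;&include_generated_enums;</AllEnums>";

    /** Abbreviated stand-ins for the two included files (choices omitted). */
    static Map<String, String> sampleFiles() {
        Map<String, String> files = new HashMap<>();
        files.put("AuthoredEnums.xml",
            "<Enumerations><EnumDef name='AccessEnum'/><EnumDef name='SeverityEnum'/></Enumerations>");
        files.put("GeneratedEnums.xml",
            "<Enumerations><EnumDef name='ActionEnum'/><EnumDef name='AccessControlEnum'/></Enumerations>");
        return files;
    }

    /** Parses ALL_ENUMS, resolving each external entity from the given name-to-content map. */
    static List<String> enumNames(Map<String, String> files) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                String name = systemId.substring(systemId.lastIndexOf('/') + 1);
                String content = files.get(name);
                // Returning null lets the parser fall back to opening the system identifier itself.
                return content == null ? null : new InputSource(new StringReader(content));
            });
            InputSource main = new InputSource(new StringReader(ALL_ENUMS));
            main.setSystemId("file:///virtual/AllEnums.xml"); // base for the relative entity names
            Document doc = builder.parse(main);
            NodeList defs = doc.getElementsByTagName("EnumDef");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < defs.getLength(); i++) {
                names.add(((Element) defs.item(i)).getAttribute("name"));
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse AllEnums.xml", e);
        }
    }

    public static void main(String[] args) {
        // prints [AccessEnum, SeverityEnum, ActionEnum, AccessControlEnum]
        System.out.println(enumNames(sampleFiles()));
    }
}
```

With real files on disk, no resolver is needed: parsing AllEnums.xml by file name lets the parser resolve the relative SYSTEM identifiers against the location of that file.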
The second step is to visualize the internal tree structure that will be created inside the
XSLT processor. The tree structure is depicted in two parts. The first part shows the
overall tree structure and the second part shows the tree structure rooted at the first
<EnumDef> element.
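[Figure: overall tree structure. The root <AllEnums> has two <Enumerations> children. One subtree is marked NOT SHOWN; the other contains <EnumDef name='ActionEnum'> and <EnumDef name='AccessControlEnum'>.]
[Figure: tree structure rooted at <EnumDef name='ActionEnum'>, whose child is <choices>.]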
For the sake of accuracy, it should be emphasized here that the actual tree structure is
much more detailed than what is depicted here. For example, attributes are nodes; text
In Figure 7, the XSLT script is shown first, followed by the output it produces. The
output is shortened for brevity.

Enum_codegen.xslt (version 1)
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output omit-xml-declaration='yes'/>

<xsl:template match = '/'>
<xsl:for-each select='//EnumDef'>
//@@@BEGIN_FILE <xsl:value-of select='@name'/>.java
//@@@LOCATION common.gencode.enums
//*************************************
//*********** Generated code. ************
//*************************************
public class <xsl:value-of select='@name'/>
{
    public static String getEnumValueAsString(int enumValue)
    {
    }
}
//@@@END_FILE <xsl:value-of select='@name'/>.java
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

gen_enum.snp
//@@@BEGIN_FILE ActionEnum.java
//@@@LOCATION common.gencode.enums
//*************************************
//*********** Generated code. ************
//*************************************
public class ActionEnum
{
    public static String getEnumValueAsString(int enumValue)
    {
    }
}
//@@@END_FILE ActionEnum.java
…….
Repetition of above pattern for SeverityEnum.java, AccessEnum.java, AccessControlEnum.java
…….
1. The first four lines are declarations, which are not very relevant from a code
generation point of view.
2. The line <xsl:template match = '/'> instructs the XSLT processor to find the
root node in the tree and apply the rules contained in the template body. The body
works as follows:
b. For each execution of the xsl:for-each loop, all non-xsl text is copied
to the output. That means the first line within the xsl:for-each loop
copies //@@@BEGIN_FILE to the output. The XSLT processor then
processes the instruction <xsl:value-of select='@name'/>.java
which outputs the value of the name attribute of the current <EnumDef>
element followed by the .java extension. In a similar manner, you can analyze
what happens for the other lines within the xsl:for-each loop.
This step illustrates why it is important to visualize the input tree. At every
instruction, we are using the XSLT or XPATH facility to navigate the input tree
and select content from it to mix with our Java bits and pieces to produce the
output Java files. Without a clear idea of the input tree, it would not be possible to
use XSLT effectively to generate the desired code.
Hopefully, this explains the code generation development process with XML/XSLT. We
are now in a position to show the XSLT script fragment, which completes
getEnumValueAsString(). Please refer to Figure 6.
XSLT code fragment
public static String getEnumValueAsString(int enumValue)
{
    switch(enumValue)
    {<xsl:for-each select='choices/choice'>
        case <xsl:value-of select='@value'/>:
            return "<xsl:value-of select='@name'/>";
    </xsl:for-each>
        default:
            return null;
    }
}

Corresponding generated code fragment
public static String getEnumValueAsString(int enumValue)
{
    switch(enumValue)
    {
        case 1:
            return "start";
        case 2:
            return "stop";
        case 3:
            return "test";

        default:
            return null;
    }
}
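For reference, the sketch below assembles the generated fragment into a complete class and calls it. It is put together by hand from the fragments above and is not actual generator output; the ActionEnumDemo class is hypothetical, and ActionEnum is declared without the public modifier only so that both classes compile from a single file.

```java
/** Hypothetical caller of the generated utility class. */
final class ActionEnumDemo {
    public static void main(String[] args) {
        System.out.println(ActionEnum.getEnumValueAsString(2)); // prints "stop"
        System.out.println(ActionEnum.getEnumValueAsString(9)); // prints "null" (unknown value)
    }
}

// Assembled from the generated fragment; the generator emits this as a public
// class in ActionEnum.java under the common.gencode.enums location.
class ActionEnum {
    public static String getEnumValueAsString(int enumValue) {
        switch (enumValue) {
            case 1:
                return "start";
            case 2:
                return "stop";
            case 3:
                return "test";
            default:
                return null;
        }
    }
}
```

Callers need to handle the null returned for values that are not part of the enumeration.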
Once the SNMP MIB was available in XML form, XSLT processing was applied to
generate object-oriented Java APIs at a very high level (i.e., far removed from the
drudgery of programming according to low level SNMP APIs). The high level SNMP
APIs were flawlessly used by all project software developers. This approach could be
compared with CORBA. Before CORBA, distributed application programming used to
be done by expert TCP/IP programmers. With the advent of CORBA, TCP/IP based code
was generated from the contract language. Thus, distributed application programming
nowadays no longer requires TCP/IP experts. With SNMP code generation in our project,
device control code no longer required SNMP experts.
There is another benefit of generating high-level APIs for network access. Two flavors of
implementation were generated. One flavor provides network access by way of SNMP.
Another flavor simulates network access by storing/retrieving data from local files.
Application code using the high level API remains unaffected when one implementation
is switched with another. Code generation for file based SNMP simulation allowed the
project to proceed without waiting for the actual device to be ready. This is a tremendous
advantage since network management development can proceed in parallel and can be
tested with a very large simulated network.
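The idea of two interchangeable implementations behind one high-level API can be sketched as follows. All names here (DeviceAccess, FileSimulatedAccess, DeviceAccessDemo, and the use of sysName as a sample variable) are hypothetical and are not the project's generated API. Only the file-based simulation flavor is shown, since an SNMP-backed flavor would require an SNMP library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical high-level API; application code depends only on this interface. */
interface DeviceAccess {
    String get(String name);
    void set(String name, String value);
}

/** Simulated flavor: keeps device variables in a local properties file. */
final class FileSimulatedAccess implements DeviceAccess {
    private final Path file;

    FileSimulatedAccess(Path file) {
        this.file = file;
    }

    @Override
    public String get(String name) {
        return load().getProperty(name);
    }

    @Override
    public void set(String name, String value) {
        Properties state = load();
        state.setProperty(name, value);
        try (OutputStream out = Files.newOutputStream(file)) {
            state.store(out, "simulated device state");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Properties load() {
        Properties state = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                state.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return state;
    }
}

final class DeviceAccessDemo {
    /** Application logic written against the interface; unaware of which flavor it is given. */
    static String rename(DeviceAccess device, String newName) {
        device.set("sysName", newName);
        return device.get("sysName");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("device", ".properties");
        System.out.println(rename(new FileSimulatedAccess(file), "router-1")); // prints "router-1"
    }
}
```

An SNMP-backed class implementing the same interface could be passed to rename() without any change to the application code, which is the property described above.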
An object based server side infrastructure was used in this project. Application servers
complying with the Enterprise JavaBeans (EJB) standard provide concurrency, transaction,
security, persistence, and naming services for the objects. Server side system
development consists of implementing the information model using Enterprise
JavaBeans and providing interfaces for remote access to the information that satisfy the
graphical user interface use cases. The project specified the information model along with
a number of relationships in XML. A fictitious object showing object and relationship
model capability is shown below. The model specifies the following:
• The Employee object has attributes like salary, job-title, join-date. It will
inherit other attributes from the Person object.
• The Employee object cannot exist without a containing Division object.
• If one has access to the Employee object, one can get access to all its
subordinates, which are objects of class Employee.
• If one has access to the Employee object, one can get access to the supervisor,
which is an object of class Employee.
<managed-object class-name = "Employee">
  <base-object class-name = "Person"/>
  <containing-object class-name = "Division"/>
  <one-to-many-relation role-name = "my-subordinates" class-name = "Employee"/>
  <one-to-one-relation role-name = "my-boss" class-name = "Employee"/>
  <attribute name = "salary" type = "float"/>
  <attribute name = "job-title" type = "string"/>
  <attribute name = "join-date" type = "java.util.Date"/>
</managed-object>
• The input language is XML. XML syntax may be abhorrent to some. XML
should be considered an underlying data representation language. Tools such
as editors, both textual and visual, provide high level browsing and editing
capabilities. These tools should be used while working with XML. If XML is
generated from some other source (such as an ASN.1 MIB), then this concern does
not arise.
• XSLT is a purely functional language, which means there are no side effects
allowed. For example, to implement a for-loop of fixed count one needs to
implement a tail recursive template with a suitable termination condition.
There are many more ‘habit adjustments’ application programmers need to
make before becoming comfortable with XSLT programming. Moreover,
XSLT has a ‘logic programming flavor’, a paradigm with which some
application developers may not be familiar. XSLT syntax is quite verbose,
which is bothersome to say the least. XML escape mechanisms need to be
used for generating symbols like ‘<’ (less than), ‘>’ (greater than) and others
which are heavily used in source code. Some XSLT workbenches are
available to mitigate this inconvenience.
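Both points in the second limitation can be seen in a small example. The sketch below runs a stylesheet whose named template calls itself to implement a loop of fixed count, and which must write the less-than symbol as &lt; both in the loop test and in the generated text. The class name RecursiveLoopSketch, the template name repeat, and the generated text are hypothetical; the text output method is used so that the escaped symbol appears as a plain character in the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RecursiveLoopSketch {

    // A named template that calls itself with i + 1 until i exceeds n.
    static final String LOOP_SCRIPT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:call-template name='repeat'>"
      + "<xsl:with-param name='i' select='1'/>"
      + "<xsl:with-param name='n' select='3'/>"
      + "</xsl:call-template>"
      + "</xsl:template>"
      + "<xsl:template name='repeat'>"
      + "<xsl:param name='i'/>"
      + "<xsl:param name='n'/>"
      + "<xsl:if test='$i &lt;= $n'>"                      // termination condition, escaped
      + "<xsl:value-of select='$i'/><xsl:text> &lt; 4;</xsl:text>"
      + "<xsl:call-template name='repeat'>"                // the recursive call is the last step
      + "<xsl:with-param name='i' select='$i + 1'/>"
      + "<xsl:with-param name='n' select='$n'/>"
      + "</xsl:call-template>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(LOOP_SCRIPT)));
            StringWriter out = new StringWriter();
            // The input document is irrelevant here; the loop is driven by the parameters.
            transformer.transform(new StreamSource(new StringReader("<dummy/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "1 < 4;2 < 4;3 < 4;"
    }
}
```

There is no loop counter that gets updated; each call receives a new value of i as a parameter, which is the habit adjustment referred to above.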
Conclusion
This paper showed how XML based document transformation technology can benefit
application software development projects. Here are some software metrics to show you
how our project benefited:
• Out of approximately 2300 Java files in the project, 1900 files were generated.
• SNMP coding was extremely easy. For example, it took only one line of Java
code to display the RFC1213 ipRouteTable, as shown below:
System.out.println(new IpRouteTableUtils().getIpRouteTable(deviceInfo));
References
1. Program Generators with XML and Java by J. Craig Cleaveland.
2. XSLT Programmer’s Reference, 2nd Edition by Michael Kay.
3. SAXON XSLT Processor, http://saxon.sourceforge.net/
4. GSLgen, http://www.imatix.com/html/gslgen/index.htm
5. Fxt, generator of XML document transformers,
http://www.informatik.uni-trier.de/~aberlea/Fxt/
6. Schematron: An XML Structure Validation Language,
http://www.ascc.net/xml/resource/schematron/schematron.html
7. XML in 10 points, http://www.w3.org/XML/1999/XML-in-10-points
8. Extensible Markup Language (1.0), http://www.w3.org/TR/REC-xml
9. XSL Transformations (XSLT) Version 1.0, http://www.w3.org/TR/xslt
10. XML Path Language (XPATH) Version 1.0, http://www.w3.org/TR/xpath