Andevcon2011 Stephen Williams: Javaglue.00 Agenda
JavaGlue.00 Agenda
(updated just now by StephenWilliams)
a. Introduction
1. Improvements JavaGlue.02 Improvements
2. A JavaGlue.03 JNI Primer: Java with C and C++
a. Calling C methods
b. Passing Data
i. Scalars
ii. Strings
iii. Byte arrays
c. JNI References
i. Local References
ii. Global References
d. C to Java
i. Allocating scalar arrays
ii. Allocating strings
iii. Calling Java methods
3. JavaGlue.04 Alternatives
a. Hand-coded JNI
i. JavaGlue.04.1 JNI Diagrams
b. SWIG
c. JNA
4. JavaGlue.05 Use
a. Capabilities
b. Limitations
c. A Simple JavaGlue Example: JavaGlue.05.1 Example 1
d. JavaGlue.05.2 JavaGlue Diagrams
e. JavaGlue.07 Memory Management
f. JavaGlue.08 Utility Methods
g. JavaGlue.06 JavaGlue Build System
5. How does JavaGlue work? JavaGlue.10 Internals
6. JavaGlue.11 Advanced JNI
7. JavaGlue.12 CMake
a. Main Characteristics
b. Simple Examples
1 of 36 3/9/11 4:20 PM
AnDevCon2011 Stephen Williams - JavaGlue file:///Users/sdw/Documents/OptimaLogic/tw/sdwnmptw.html#tag:J...
JavaGlue AnDevCon2011
JavaGlue.01 About
(updated 12 hours ago by StephenWilliams)
About
Authors: Stephen Williams, with help from Kevin Campbell
and excerpts from the Ogre4j project page (link 1 below).
Introduction
JavaGlue is a fork of XBiG. Why:
Specifically, XBiG is designed to generate Java code and JNI bindings that allow almost any native (i.e.
C or C++) library to be used from Java. XBiG was initially used to create Java OGRE
(http://www.ogre3d.org/) bindings as the Ogre4j project (http://ogre4j.sourceforge.net/).
JavaGlue changes are minor compared to the work that obviously went into creating NoodleGlue and
XBiG. It is assumed that JavaGlue and XBiG will merge eventually.
Licensing
The code generation tool is GPL. The linkable libraries from XBiG are LGPL. JavaGlue additions are
Apache 2.0 where this doesn't conflict with XBiG licensing. Generated code, as is generally the case
with code generators and compilers, is owned by the owner of the input.
Download
Coming to Google Code shortly. For now, download it here: http://sdw.st/conf/AnDevCon2011/javaglue-1.0.zip
The Name
Our hope is that the name JavaGlue will be as discoverable, descriptive and generic as the tool itself
should be. It is likely that XBiG and JavaGlue will merge. May the most useful name win.
Related Links
1. http://sourceforge.net/apps/mediawiki/ogre4j/index.php?title=White_Paper
2. http://ogre4j.sourceforge.net/
3. http://www.itk.org/ITK/resources/CableSwig.html
AnDevCon2011 JavaGlue
JavaGlue.02 Improvements
(updated 15 hours ago by StephenWilliams)
Improvements of XBiG
We have enhanced XBiG significantly to meet our needs, adding the following functionality:
Support for passing null pointers as arguments to and from C/C++ functions
Efficient and easy byte array movement between Java & C++
Better handling of input include file hierarchies
Improved build system
A number of bugs fixed
Finding and working around details for using with Android
Design Goals
Minimize or eliminate the need for the native code to be "JavaGlue-aware". Ideally, an existing
native library could be wrapped naturally in Java without requiring any changes to the native code.
In practice this may not completely be the case, but the required changes are fairly small and
non-invasive.
Support "callback interfaces", where native code calls back to methods on objects implemented in
Java
Ensure that applications which use code generated by JavaGlue are not bound by licensing
restrictions
JNI Diagrams
1. Calling C methods
2. Passing Data
a. Scalars
b. Strings
c. Byte arrays
3. JNI References
a. Local References
b. Global References
4. C to Java
a. Allocating scalar arrays
b. Allocating strings
c. Calling Java methods
JavaGlue.04 Alternatives
(updated 15 hours ago by StephenWilliams)
JavaGlue Alternatives
JavaGlue alternatives involve writing and maintaining a lot of metadata, fragile and verbose hand
coding, or libraries with significant runtime inefficiencies. No other freely available tool
uses only C++ header files as input and generates all of the Java and C/C++ glue code needed to
immediately write Java code that can use C++ objects almost universally.
Hand-coded JNI
Hand-coded JNI is time-consuming, verbose, error-prone, hard to maintain, and not at all typesafe. It
also directly supports only Java->C and C->Java calls. Handling C++ code requires you to create functions
with unmangled C linkage that take pointers as integer parameters, cast them properly, and
then make the desired C++ method calls. All scalar types need to be mapped in each direction, with
JNI methods called to convert strings, etc. Additionally, the linkage in both directions is resolved at
runtime, so typos and out-of-date interfaces are not detected until a method call is attempted. How many
people have full code coverage built into their projects?
For a few C methods, or very limited linkage to C++, this is doable. There are several steps, but it isn't
too difficult. However, none of the code involved does anything useful in itself, and debugging can be
time-consuming.
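To illustrate why typos only surface at runtime, here is a minimal sketch of the short-name rule from the JNI specification: the JVM derives the C symbol it looks up from the Java class and method names. The class name JniNameSketch and the method name do_it are hypothetical, and overloaded methods (which also append a mangled argument signature) are not covered.

```java
// Sketch only: derives the short JNI symbol name for a native method.
// JniNameSketch and the example names are hypothetical, not JavaGlue code.
final class JniNameSketch {
    /** Escapes one name the way the JNI short-name rule describes. */
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            boolean asciiAlnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (asciiAlnum) {
                sb.append(c);
            } else if (c == '.' || c == '/') {
                sb.append('_');            // package separator
            } else if (c == '_') {
                sb.append("_1");           // literal underscore
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else {
                sb.append(String.format("_0%04x", (int) c)); // other characters
            }
        }
        return sb.toString();
    }

    /** Symbol the JVM would look for, ignoring overload signatures. */
    static String nativeSymbol(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }
}
```

If the C function is spelled even slightly differently from the derived symbol, nothing fails at compile or link time; the mismatch appears only when the native method is first called.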
http://download.oracle.com/javase/1.5.0/docs/guide/jni/
http://download.oracle.com/javase/1.5.0/docs/guide/jni/spec/jniTOC.html
http://java.sun.com/docs/books/jni/
http://java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html
http://www.swig.org/
http://jna.java.net/
One big point for every project that implements wrapper(s) for a library in different
programming languages is the effort to maintain the wrapper code. The target library has
its own release cycles and most major releases introduce API breaking changes. Most
of the projects deal with this issue by using code generators which create the necessary
bindings automatically. We evaluated the application of several code generators such as
SWIG (Simplified Wrapper and Interface Generator) and NoodleGlue but none of the
tested tools met our requirements. SWIG needs a great deal of effort beforehand because
every interface that should be wrapped needs an interface description file. Both tools lack
full support of C/C++ templates which are used quite often in OGRE. For this and other
reasons we decided in autumn 2005 to implement our own generator based on the same
technologies as NoodleGlue. Since autumn 2006 the JNI code generator project has been forked
from ogre4j under the name XBiG (XSLT Bindings Generator) and has had its own project
space on Sourceforge.net.
NoodleGlue is the wrapper generation tool of "noodle heaven" and uses doxygen to
extract the API information from the library's source code. This approach had the
advantage that parsing and analyzing is done by a tool that is widely-used and tested with
different input languages. So the first step to our generator was already available for free.
Besides the usual outputs like HTML documentation, Doxygen provides an XML output
of the analysed source code. This output is specialized for the Doxygen task of generating
documentation, contains a lot of information that isn't necessary to generate wrapper code,
and is structured as a flat hierarchy (e.g. namespaces are not nested as child XML elements).
For these reasons, and to have the possibility to replace Doxygen with another
tool, we decided to implement a meta layer that is represented in XML too.
To convert the Doxygen output to our meta layer we're using XSLT (Extensible
Stylesheet Language Transformations) which is designed to describe conversions or
transformations of XML code with XML code. One big advantage of XSLT is that it is an
interpreted language and therefore OS (Operating System) independent. The generation
of the meta layer and the layer itself should be independent from any OS or platform to
make it possible to generate bindings for "every" language on every platform. To have a
consistent tool chain the generation of the wrapper code is done with XSLT too. This
reduces the usage of different tools and technologies to one major aspect: XML/XSLT.
As mentioned before, Doxygen could be replaced with another tool that is capable of
parsing source code and generating an XML representation of the parsed input.
JavaGlue.05 Use
(updated 14 hours ago by StephenWilliams)
Capabilities
JavaGlue provides access to just about everything in C/C++ that you would reasonably want access to
from Java. Setters and getters are created for access to public data. Public constructors, destructors,
methods, and types are all available. Globals, class statics, and object members are available in a fairly
clean way. Both factory methods and Java-side 'new' of objects are supported, along with pass and return
by value. Enums, template types (possibly requiring a typedef), std::string, Vector<byte>, and unsigned
char*[] are all supported. Pointers, including null pointers, and passing by
reference are handled in a straightforward and very C++-like way. Namespaces and class hierarchies
are mapped directly to the Java package namespace. Even C++ multiple inheritance is mapped to
Java in a usable way. Type mapping can be tuned as needed. C++ items in headers can be ignored at a
couple of levels of granularity through a config file.
The net result is that, without creating any metadata or writing any glue code, you can point the build
system at a hierarchical directory of C and C++ headers, build, and write very C++-like Java code that
directly uses C++ code. And because the C++ code has been mirrored into generated Java code, Eclipse will
1. Global methods & members - These show up as static methods in a class called GlobalUtility
which is created as necessary at every package level.
2. Variables - Public variables get getters and setters automatically created.
3. Classes - Public classes are fully wrapped and proxied into Java classes, usually with a Java class
with the same name and an interface with 'I' prepended.
4. Methods - All public methods are available. Those returning objects by value have a slightly
modified form in Java: They return void and have an extra first parameter which must be an already
constructed object.
5. Constructors & Destructors - These are proxied normally into Java.
6. Template instances - These are supported; however, parameterized templates often must be
typedef'd to be usable. Methods with an un-typedef'd complex template type as a parameter or return
value will simply be ignored and won't exist in Java.
7. Typedefs - C++ treats typedefs as equivalent to the original type, while JavaGlue wraps them into
Java classes and interfaces of the same name.
8. Enums - Completely usable, including created mapping methods. Java use of C++ enum values
looks different than C++ use of enum, but the semantics are mapped well.
Limitations
1. Templates have to have a concrete instance
2. Parameterized templates often need to be typedef'd. Since types in any form are equivalent in C++,
this is easy.
3. Template or other code instantiation must obviously be triggered in C++. When writing code in
Java, it is too late. In many cases JavaGlue will generate code that will make it happen. Making use
of something in Java is often just a matter of adding a typedef.
4. JavaGlue will sometimes create code to access members or base classes that are not public, causing
compilation errors in the generated C++ code. This can be avoided by adding ignore statements to
ignore_list.xml or hiding code from the JavaGlue analysis.
Temporary Limitations
1. No '_' in enum type names. (Name mangling requires that '_' -> '_1'. This happens elsewhere and
needs to be fixed for enum types.)
2. The generated C++ code may have trouble finding include files in some cases. Paths weren't
preserved in the Doxygen output. JavaGlue is improved here as it tries to regenerate the original
paths, but the method isn't foolproof.
3. Can't change the name of shared libraries or generated paths. They are xbig and org.xbig currently.
This will change to org.javaglue, and become modifiable, soon.
4. The original XBiG library code must end up in a separate shared library from generated application
code so that the LGPL relink requirement can be met easily. This will be fixed shortly. Until then,
please consider the current code suitable for development only.
5. C++ wstring handling code is currently missing due to now-obsolete Android STL issues. Wstring
JavaGlue.05.1 Example 1
(updated 14 hours ago by StephenWilliams)
test.h:
BasicTests.java:
    /**
     *
     */
    package test;
    import org.xbig.base.*;
    import org.xbig.std.*;

    import org.xbig.*;

    // import org.junit.Assert;
    // import org.junit.Test;
    // import junit.framework.TestCase;

    /**
     * @author swilliams
     *
     */
    public class BasicTests {
        public BasicTests() {
        }
        public static void main(String[] args) {
            BasicTests tst = new BasicTests();
            tst.test();
        }
        //@Test
        public void test() {
            setUp();
            Itest t = new test();
            org.xbig.base.InstancePointer ipn = new org.xbig.base.InstancePointer(0);
            org.xbig.base.BytePointer bpn = new org.xbig.base.BytePointer(ipn);
            message(" BytePointer bp = t.doWhatever(bpn);");
            BytePointer bp = t.doWhatever(bpn);
            message(" if (bp.longValue() == 0L) message(\"Got a Null!\");");
            if (bp.object.pointer == 0L) message("Got a Null!");
            message(" t.setString(\"Hi\");");
            t.setString("Hi");
            message(" message(t.getString());");
            message(t.getString());
            message(" t.setString(bpn);");
            t.setStringP(t.mkStringP("wow!"));
            message(" bp = t.getCString();");
            bp = t.getCString();
            message(" if (bp.object.pointer() == 0L) message(\"Got a Null!\");");
            if (bp.object.pointer == 0L) message("Got a Null!");
            message("doWhatever(null) isNull:");
            t.doWhatever(bpn);
            message("isNull:"+t.isNull());
            t.doWhatever(t.dupcString("Hi again!"));
            message("isNull:"+t.isNull());

            message("isTestNull:"+t.isTestNull(t));
            message("isTestNull:"+t.isTestNull(null));

            Itest tn = t.getTest();
            message("getTest:"+(tn==null)+"(test)tn.object.pointer:"+((test)tn).object.pointer);
            message("tn == null: "+(tn == null));
            tn = t.getTestNull();
            message("tn == null: "+(tn == null));
            if (tn != null)
                message("isTestNull:"+(tn==null)+"(test)tn.object.pointer:"+
                    ((test)tn).object.pointer);

            message("new ByteVector");
            ByteVector bv = new ByteVector();
            message("mkByteVector()");
            IByteVector ibv = test.mkByteVector();
            ibv.reserve(20);
            ibv.push_back((byte)'h');
To accomplish this, and to make the use of JavaGlue efficient, we chose to use CMake as a
cross-platform build system. We also wrote generic Java code that used much of the C++ layer so that this
could all be debugged on the host before running on Dalvik/Android. After solving many problems, and
recently porting to NDK5, we have used this extensively. We do not use the NDK build system, only the
cross-compilation tools, libraries, and headers. Getting this right required a thorough analysis of
arguments for compiling and linking shared libraries for Android on multiple architectures, MacOSX,
and Linux. Windows building is partially solved (we build the C++ code using CMake-generated Visual
Studio projects), but we don't currently build the Java/JavaGlue portion of the system there as there is
no interest.
This was accomplished by running the CMake build system generation step a second time
when necessary. A driver Makefile is used to run a first pass. If JavaGlue code generation is required, a
flag file is removed, causing the rest of the first build to short-circuit. The Makefile reacts to the missing
flag file by running a second CMake generate pass, which picks up the file changes through standard
CMake globbing, and then runs make again with the same parameters. Generally, we maintain the
ability to run a local host and Android build without a clean. A 'make Clean' wipe is needed after certain
types of changes.
JavaGlue.12 CMake describes CMake in a little more depth. The example build system uses
cpp-project-template as a base build environment, with CMake used in Makefile mode and our driver.
Scripts that we use for installing needed apt packages in Ubuntu or Macports packages on MacOSX are
included. Note that we install all development Macports packages with +universal so that we can
produce both 32-bit and 64-bit libraries.
The main JavaGlue / XBiG system is in tools/xbig. Any Java binding related code, Java or C++, goes in
bindings/java, as does the main JavaGlue CMakeLists.txt script. CMake has very good support for
out-of-source builds, so we always build in build/. Be careful not to run 'cmake' outside of build/ as the
cache is sticky and stubborn. There is a script to clean up mistakes. Currently, we use subdirectories of
build for host vs. Android, etc., but the current example project does not. This will likely change in the
next release.
Memory Management
A JavaGlue generated Java proxy class contains:
A "JavaGlue object" consists of an instance of the C++ class and the corresponding Java proxy class
instance that holds a pointer. The JavaGlue object can be created in three ways:
When a JavaGlue object is created by allocation in Java, a flag is set in the Java object so that the object
can be deleted (by calling "delete()"). If not deleted explicitly (which is recommended) then the finalizer
on the Java class will delete the object. Note that finalizers may not run predictably or be guaranteed to
run in a given JVM.
If an object is returned from a method by pointer, JavaGlue records the fact that C++ code "owns" the
allocation of that object and will refuse to call the destructor on that object by throwing an exception.
This is similar to C++ allocation / deallocation rules in a number of environments. There is a Delete
utility class that contains "factory destructors" for some cases that don't automatically end up with
accessible destructors, such as byte arrays or vectors.
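The ownership rule described above can be sketched in plain Java. This is a simulation under stated assumptions, not the code JavaGlue generates: FakeNativeHeap stands in for the C++ heap so the sketch runs without JNI, ProxySketch and its method names are hypothetical, and the exception type is a guess. The finalizer fallback is left out because, as noted, it cannot be relied on.

```java
import java.util.HashSet;
import java.util.Set;

/** Pure-Java stand-in for the native heap (hypothetical, for illustration only). */
final class FakeNativeHeap {
    private static long next = 1;
    private static final Set<Long> live = new HashSet<>();

    static long allocate() { long p = next++; live.add(p); return p; }
    static void free(long p) { live.remove(p); }
    static boolean isLive(long p) { return live.contains(p); }
}

/** Hypothetical proxy holding a "pointer" plus a flag recording who allocated it. */
final class ProxySketch {
    private long pointer;
    private final boolean createdByJava;

    private ProxySketch(long pointer, boolean createdByJava) {
        this.pointer = pointer;
        this.createdByJava = createdByJava;
    }

    /** Java-side allocation: the flag allows a later delete(). */
    static ProxySketch newFromJava() {
        return new ProxySketch(FakeNativeHeap.allocate(), true);
    }

    /** Object returned by pointer from C++: C++ keeps ownership. */
    static ProxySketch wrapReturnedPointer(long pointer) {
        return new ProxySketch(pointer, false);
    }

    long pointer() { return pointer; }

    /** Frees the native object, or refuses if C++ owns it. */
    void delete() {
        if (!createdByJava) {
            throw new IllegalStateException("C++ owns this object; Java may not delete it");
        }
        if (pointer != 0) {
            FakeNativeHeap.free(pointer);
            pointer = 0; // guard against double delete
        }
    }
}
```

Calling delete() explicitly on Java-allocated objects, rather than waiting for a finalizer, keeps native memory use predictable.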
In C++, objects are passed to methods in one of three ways, and as return values in one of two ways:
An object returned by value is a copy of the object that the method returned, which typically no longer
exists. In this case, there is a potential quandary. Because Java passes everything except primitive values as
references, there would be no obvious difference between an object returned by value and a pointer to an
object. To avoid confusion and better match the actual semantics involved, the XBiG authors
implemented the return-by-value case as a return of void with an extra first parameter that is an "out"
variable. This requires the caller to first construct an object matching the return type needed, then pass
it as the first parameter. A reference to it is passed to the generated C method, which invokes the
copy constructor on the Java-allocated object. This creates the requirement that the object have a
constructor usable from Java (not the case for a naked parameterized template type), and that the Java
application code delete the object later.
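The calling convention for return by value might look like the following from the Java side. This is a pure-Java simulation: Vec3Sketch and SceneNodeSketch are hypothetical names, and copyFrom stands in for the C++ copy that the generated glue code would perform.

```java
/** Hypothetical value type standing in for a wrapped C++ class. */
final class Vec3Sketch {
    double x, y, z;

    Vec3Sketch() { }

    Vec3Sketch(double x, double y, double z) {
        this.x = x; this.y = y; this.z = z;
    }

    /** Stands in for the C++ copy performed by the generated code. */
    void copyFrom(Vec3Sketch other) {
        this.x = other.x; this.y = other.y; this.z = other.z;
    }
}

/** Hypothetical wrapped class with a method that returns by value in C++. */
final class SceneNodeSketch {
    private final Vec3Sketch position = new Vec3Sketch(1.0, 2.0, 3.0);

    // C++ declaration (assumed):  Vec3 getPosition() const;
    // Java form: returns void, and the result is copied into the first parameter.
    void getPosition(Vec3Sketch returnValue) {
        returnValue.copyFrom(position);
    }
}
```

The caller constructs the result object first and passes it in, for example `Vec3Sketch out = new Vec3Sketch(); node.getPosition(out);`. With real generated code the caller would also be responsible for deleting `out` later.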
JavaGlue creates a normal Java class that acts as an "interface" class ("I" followed by the class name)
and, usually, a Java class which is a subclass of the interface class. The interface class holds references
to static class methods. There is also a global interface class for public functions that are not class
members. In cases where there is no constructor available to Java, only the "interface" ("I" class) is
generated. While Java cannot create an instance of these classes, a reference (holding a pointer) can be
returned from a C++ method and later passed as a parameter.
XBiG already included good string conversion methods. JavaGlue adds byte array / byte vector
copy/allocate methods for a number of useful cases, plus memset.
This provides some frequently needed template instantiations, plus an accessible way to delete allocated
objects.
The xbig_* methods are used in C/C++->Java code. It turns out to be difficult to successfully look up
Java methods from C without using these methods. The default JVM reference doesn't have a complete
class loader so all lookups fail.
jni_base.h:
basedelete.h:
JavaGlue.10 Internals
(updated 12 hours ago by StephenWilliams)
JavaGlue (and XBiG) uses Doxygen to parse C++ header files, producing an XML description of all
types, methods, members, and variables. This is processed in a series of stages by an Ant-driven XSL
engine. After producing an intermediate mapping, a Java generation and a C++ generation pass are
made. At this point, the original C/C++ code and the generated C++ code can be compiled into a shared
library. The generated Java code can be added to an Eclipse project, or just compiled into a JAR file
which can be referenced by an Eclipse project. Once the Java code compiles, the run settings must run
the application from a directory and environment where the shared library will be found. For an Android
project, this means having the shared library under the correct libs/ARCH/ directory and the JAR file (if
that route is taken) in the lib/ directory.
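When the shared library is not found at run time, a small check like the following can help on a desktop JVM. It is a sketch using only standard calls (System.mapLibraryName and the java.library.path property); the class name LibraryPathSketch is hypothetical, and "xbig" is used only because that is the current library name mentioned earlier.

```java
import java.io.File;

/** Hypothetical helper: reports where a native library would be found, if anywhere. */
final class LibraryPathSketch {
    /** Returns the first matching file on java.library.path, or null if none exists. */
    static File find(String libName) {
        // Platform-specific file name, e.g. a "lib" prefix and ".so" suffix on Linux.
        String fileName = System.mapLibraryName(libName);
        String searchPath = System.getProperty("java.library.path", "");
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }
}
```

A call such as `LibraryPathSketch.find("xbig")` returning null before `System.loadLibrary("xbig")` is attempted suggests the run settings need a different working directory or library path.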
test4.java:
class_org_xbig_test4.cpp:
JavaGlue.12 CMake
(updated 12 hours ago by StephenWilliams)
What is CMake?
From the manual:
CMake is a cross-platform build system generator. Projects specify their build process
with platform-independent CMake listfiles included in each directory of a source tree
with the name CMakeLists.txt. Users build a project by using CMake to generate a build
system for a native tool on their platform.
CMake can be very confusing at first, especially if you only read the official manual and only have very
simple projects as reference. Here are a few words that may greatly ease the learning curve:
CMake is a meta-make system. This means that CMake doesn't build anything except build scripts for
actual build systems. CMake can generate Makefiles, Xcode projects, Visual Studio projects, and
Eclipse projects, at least. The generated files may look a bit different from what you expect. Mostly this
is good because some nice automation and other capabilities are provided. CMakeLists.txt scripts
reference other CMakeLists.txt scripts in subdirectories, and they can include other files in a
project (typically named *.cmake). CMake relies on an installed directory of modules and other scripts
that know how to find various libraries, subsystems, executables, etc. These are invoked by requesting
access to (i.e. variables to be set by) standard modules, like Java.
Typical operations in CMake scripts involve finding system capabilities, setting variables, globbing for
source code or data files (into variables), defining source directories & files, defining libraries, and
defining executables. Dependencies can be explicitly created while many are inferred automatically.
Custom commands can be defined. Most are used at build time, but there is some limited ability to do
operations at meta-make time. Definition of most operations is at a very high, logical level. Only when
defining a new platform or doing something beyond compile, create library, link executables do you
need to work with native tool definitions, arguments, or anything platform specific.
Many CMake variables have values at meta-make time based on where they are referenced in a
CMakeLists.txt file. The key examples of this are variables for the current source and current "binary"
(i.e. build target, shadow build) directories. Generally, a particular CMakeLists.txt can only set variables
for those scripts below it, so frequently things needed by one subdirectory from another are set at a
higher level. In essence, CMake at meta-make time is a functional language with a lot of built-in
functionality, controlled by conditionalized scripting to solve all platform-specific issues.
The resulting makefiles have absolute paths everywhere, targets that flow from the top directory down to
where they need to be built, and have a default output that is very clean in the absence of errors. A 'make
VERBOSE=1' gives full detail of steps being taken. For Make, you can still do parallel makes with -j4
(J=4 to the driver build/Makefile in the JavaGlue example project). DEBUG is the default; use
RELEASE=1 for release builds.
http://www.cmake.org/cmake/help/cmake-2-8-docs.html
https://code.google.com/p/cpp-project-template/
AnDevCon2011 Ssx
Ssx
(updated 1 minute ago by StephenWilliams)
License
Written by Stephen Williams, principal at OptimaLogic. Development was split with a client. Apache 2.0
Download
On Google Code soon. For now, download it from: http://sdw.st/conf/AnDevCon2011/ssx-1.0.zip
Concise Coding
Intro to Ssx
Ssx provides a fast DOM and SAX parsing library that is concise to use and concisely written. It is a
non-validating, "reasonably conforming" XML parser, implemented in a single Java file of about 1000 lines
of code that was written and optimized in about a week. Ssx is meant for parsing typical application and
business data. It is not intended as a solution to every XML need. There are a number of permanent
restrictions (DTDs) and a couple of temporary restrictions on the range of XML handled. The embedded
SAX parser, which implements the org.xml.sax.XMLReader interface, is 240 lines of code (with the core
parse loop written in dense "paragraph mode"). This parser also directly supports an efficient
implementation of the toXml() method by remembering the text parsed. In a number of cases, this can
make re-serializing XML very fast.
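Because the embedded parser implements org.xml.sax.XMLReader, code written against that standard interface should not care which parser is behind it. The sketch below shows such code using the JDK's own parser as a stand-in; it does not use Ssx, and the class and method names (SaxReaderSketch, elementNames, jdkReader) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example of code that depends only on the XMLReader interface. */
final class SaxReaderSketch {
    /** Collects element local names in document order using any XMLReader. */
    static List<String> elementNames(XMLReader reader, String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                names.add(localName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    /** Stand-in reader from the JDK; namespace awareness makes localName reliable. */
    static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }
}
```

Any other XMLReader implementation could be passed to elementNames in place of the JDK reader.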
Ssx is standard Java that also works well with Dalvik. The only Android-specific code is what is needed
to find SAX when the built-in SAX parser is selected:
    try {
        parser = XMLReaderFactory.createXMLReader();
    } catch (Exception e) {
        // Try known "default" for Android:
        System.setProperty("org.xml.sax.driver","org.xmlpull.v1.sax2.Driver");
        parser = XMLReaderFactory.createXMLReader();
    }
Ssx is small and simple enough to be extended as needed for a particular use. This could include
additional XPath capabilities, specialized indexing, custom validation, or integrating signing and
encryption. Internal DTD entity definitions or external entities could easily be supported.
Why
XML is, at the base, a reasonably simple data format. A number of details, like namespaces, make
parsing XML somewhat interesting. Still, existing APIs are generally far more complex than they need
to be for most applications. An application generally wants to hand data to a parser, be told if there is an
error, and be able to find and retrieve data elements. In many cases, and especially for an Android
application, there is a lot to be said for having just the code needed to solve the job, leaving the kitchen
sink in the kitchen.
Object Mapping
A number of methods center around creating classes for every business object, then writing code to map
external representations to those objects. This includes object relational and XML mapping.
Traditionally, developers wrote copious glue code at each layer and step. Some modern systems try to
alleviate this by using language-enabled annotations or metadata files so that this mapping can be done
interpretively. This can be helpful, but often the detailed steps and care needed to get this to work rivals
manual glue code.
One must ask: Is this the only way to accomplish the business logic needed? Is it the most efficient?
Easiest to understand and modify? Is this the best use of the developer's time? Consider how many lines
of code need to be written at each layer for each data element introduced. Traditionally, it is several at
least, multiplied by many layers and both directions. In the ideal, and often achievable, case, far less
than one line of code per element is needed at each layer.
The ideal case can be described this way: An application architecture is established where a message
travels from point A to point B, perhaps passing through proxies, intermediate steps that may observe or
also modify or add data, perhaps storing it in message queues or in a database. Each function, library,
and application along the way may be developed separately and updated at different intervals. If the
application at point A adds a new data element, what has to change for it to get to B and perhaps back to
A? In the ideal case, only A needs to change. When B is changed, it can react to that element.
Intermediate applications should not care that something has changed because they read what they are
interested in, insert or replace data that they care about, and pass the message along.
Typical applications are not this resilient and some XML frameworks do not easily enable the best case.
Something that is *always* an issue is how versioning and extensions are handled. How much code has
to change? What needs to be recompiled? When? Ideally, something like adding an additional field to a
message should be able to propagate through a system without requiring lockstep upgrades and without
conflicting with existing data or databases. XML, properly used, is one way to accomplish this.
(RDF-like graph-based semantic data is a better way, but that is another story.)
Collection Objects
A collection object is an instance of a class that manages sets of data in a structured way. A classic
example is a Map<> that provides a dictionary structure. A DOM-style XML representation is a type of
collection object, although the traditional DOM API is very cumbersome.
One way to avoid a lot of manual glue code or metadata is to use collection objects of some kind to
represent messages and business objects. Interestingly, these can be made arbitrarily hierarchical, just
like an object hierarchy. They can also be wrapped with very lightweight classes so that while the
collection class may provide clean find/set/get, application specific methods can be added. The result
can be used in a typical object oriented fashion while writing very little code that is not business logic.
While Ssx doesn't have this level of API yet, the author has designed and implemented this type of
solution very successfully in the past.
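A minimal sketch of the wrapping idea, assuming a nested Map<String, Object> as the collection object. The names OrderDemo, Node, Order, and the field names are hypothetical; this is not the Ssx API, which the text notes does not have this level yet.

```java
import java.util.HashMap;
import java.util.Map;

final class OrderDemo {
    // Generic collection wrapper: all state lives in the map.
    static class Node {
        protected final Map<String, Object> fields;
        Node(Map<String, Object> fields) { this.fields = fields; }

        Object get(String key) { return fields.get(key); }

        Object get(String key, Object def) {
            Object v = fields.get(key);
            return v == null ? def : v;
        }

        // Hierarchy comes from nesting maps; a child is wrapped only when asked for.
        @SuppressWarnings("unchecked")
        Node child(String key) {
            Object v = fields.get(key);
            return v instanceof Map ? new Node((Map<String, Object>) v) : null;
        }
    }

    // Very lightweight application class: only business logic is added.
    static final class Order extends Node {
        Order(Map<String, Object> fields) { super(fields); }

        double total() {
            double net = ((Number) get("net", 0.0)).doubleValue();
            double taxRate = ((Number) get("taxRate", 0.0)).doubleValue();
            return net * (1.0 + taxRate);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> customer = new HashMap<>();
        customer.put("name", "Ada");
        Map<String, Object> data = new HashMap<>();
        data.put("net", 100.0);
        data.put("taxRate", 0.25);
        data.put("customer", customer);

        Order order = new Order(data);
        System.out.println(order.child("customer").get("name") + " owes " + order.total());
    }
}
```

Adding a new field to the message requires no change to Node or Order unless the business logic needs to use it.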
Accept anything, complaining only if it is malformed or you can't find required items.
When passing on received data, pass on extra information even if it is not
understood.
Carefully produce data exactly to specification.
Prefer logical structure to physical: XML can be used to represent graphs and trees. The former are
more flexible.
Use namespaces, and semantic tagging if possible, to uniquely identify the types of elements,
attributes, and relationships.
Writing XML
XML is usually easy to write: Simply concatenate strings, perhaps using a template that can be updated
easily. This is also usually the fastest method. It is a big help if parsed XML or generated data structures
can be easily converted into an XML document or a fragment that can be included.
In some cases, it can be helpful to have an API that allows building the XML output, perhaps in a
non-linear way. Ssx does not yet support this, but will soon.
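The string-template approach can be sketched as below. The class name XmlWriter, the template, and the element names are hypothetical. The escape helper is an addition not mentioned in the text: plain concatenation produces malformed XML as soon as a value contains a character such as & or <, so some escaping step is assumed.

```java
final class XmlWriter {
    // Escape the five characters that are special in XML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // The template is the only place that knows the document layout, so it is easy to update.
    static final String TEMPLATE = "<order id=\"%s\"><customer>%s</customer></order>";

    static String order(String id, String customer) {
        return String.format(TEMPLATE, escape(id), escape(customer));
    }

    public static void main(String[] args) {
        System.out.println(order("7", "Smith & Sons"));
    }
}
```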
How it works
Key insights used in Ssx are:
A single set of Maps could efficiently represent the structure of an arbitrary XML tree.
Structure is provided as map entries from the current node to the next in more than one sense: Next
sibling, next sibling with same name, first child, parent. The "next" operation, which is very
commonly used, is very fast.
No iterator types or objects are needed: Each object is its own iterator!
Each element can be represented with a very lightweight object, with relationships held completely
in the maps.
The text values can be referenced as ranges of the original parsed data. (There are some nuances
here since XML is unicode and the actual source may have been bytes.)
A toXml() method can be supported at every element in a very efficient way. XML can be provided
as a fragment or as fully formed XML with all name spaces defined properly, allowing the XML
subtree to be recreated exactly in a later parse with no application fixup.
A minimal form of XPath allows a DOM-like API to support most operations efficiently in a single
line of code.
Avoid creating objects of any kind. Memory allocation and garbage collection, plus the related
copying, should always be minimized. Character conversion is expensive too.
Unicode character conversion (byte[]->char, char->byte[]) is too expensive. Inline code may be
used in the future.
Avoid function calls when possible.
Using Enums is very expensive! Don't do it in tight loops. A local "int" is very fast.
For small sets, especially with a String key, HashMap is far more expensive than TreeMap. Use
TreeMap.
Even TreeMap is too expensive. Much of the CPU in Ssx is spent in TreeMap.
Direct array access is very cheap.
Reusing objects is a key technique.
When expandable objects are needed, simple arrays with amortized bounds checking / reallocation are
preferred. Once a high-water mark is hit, remember it. Use non-linear expansion in size (doubling
for instance) when data varies widely.
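Several of the points above (structure held as links from one node to the next, no separate iterator objects, direct array access, and doubling on growth) can be illustrated together. This is a sketch only and not the Ssx implementation: it swaps the Maps described above for parallel int arrays, finds the next same-named sibling by scanning rather than by a stored link, and all names (LinkTree, add, and so on) are hypothetical.

```java
import java.util.Arrays;

final class LinkTree {
    // Parallel arrays indexed by node id; -1 means "no such link".
    private String[] names = new String[4];
    private int[] firstChild = new int[4];
    private int[] lastChild = new int[4];
    private int[] nextSibling = new int[4];
    private int count = 0;

    // Adds a node under parentId (-1 for the root) and returns the new node's id.
    int add(int parentId, String name) {
        if (count == names.length) grow();
        int id = count++;
        names[id] = name;
        firstChild[id] = lastChild[id] = nextSibling[id] = -1;
        if (parentId >= 0) {
            if (firstChild[parentId] < 0) firstChild[parentId] = id;
            else nextSibling[lastChild[parentId]] = id;
            lastChild[parentId] = id;
        }
        return id;
    }

    // Non-linear expansion: double the capacity so reallocation cost is amortized.
    private void grow() {
        int n = names.length * 2;
        names = Arrays.copyOf(names, n);
        firstChild = Arrays.copyOf(firstChild, n);
        lastChild = Arrays.copyOf(lastChild, n);
        nextSibling = Arrays.copyOf(nextSibling, n);
    }

    String name(int id) { return names[id]; }
    int firstChild(int id) { return firstChild[id]; }
    int next(int id) { return nextSibling[id]; }

    // Next sibling with the same name, skipping any other elements.
    int nextSameName(int id) {
        for (int n = nextSibling[id]; n >= 0; n = nextSibling[n]) {
            if (names[n].equals(names[id])) return n;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinkTree t = new LinkTree();
        int root = t.add(-1, "list");
        t.add(root, "item");
        t.add(root, "note");
        t.add(root, "item");
        // The node id is the cursor: no iterator object is allocated.
        for (int n = t.firstChild(root); n >= 0; n = t.next(n)) {
            System.out.println(t.name(n));
        }
    }
}
```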
Ssx API
(updated 16 hours ago by StephenWilliams)
All retrieval methods return null when the request cannot be found, except for the versions which are
given a default value to return.
    public class Ssx { // Reusable object holding parse tree. Just call parse() to reuse.
        // Determine which SAX parser is used: internal, local, or both, and whether to time parsing.
        public static void setParseType(boolean sparsep, boolean defaultParserp, boolean timeAllp);
        // Parse XML, given as a byte array.
        public Xml parse(byte[] xmlBytes, int off, int len, String nsSet) throws IOException, ParseException;
        // Parse an XML string. 'what' is information for logging. 'timed' determines whether the parse tim...
        public Xml parse(String what, String xml, boolean timed) throws IOException, ParseException;
        // The element node object. Returned from a parse and most operations.
        public class Xml implements Comparable {
            // Allows objects to be compared.
            public boolean equals(Object o);
            // Returns XML equivalent of the current node. Contained in an '<xml>' node with all active name...
            public String toXml() throws UnsupportedEncodingException;
            // Returns the current node as an XML fragment.
            public String toXmlFragment() throws UnsupportedEncodingException;
            // Returns the next sibling of this element.
            public Xml next();
            // Returns the next sibling of this element that has the same name, skipping any other elements.
            public Xml nextSameName();
            // Returns the node matching the path qname.
            public Xml getNode(String qname);
            // Returns the node matching the namespace+localname.
            public Xml getNode(String ns, String localName);
            // Returns the namespace of the current node.
            String namespace();
            // Return the name of the current node.
            String name();
            // toString(), getText(), and get() all return the text for the current element.
            public String toString();
            public String getText();
            public String get();
            // Returns the text value of the given path qname.
            public String get(String qname);
            // Returns the text value of the given path qname, or the passed default value if the path is not...
            public String get(String qname, String def);
            // Returns the text value of the given namespace+path, or default.
            public String get(String ns, String path, String def);
            // Returns the value of the given node as an int.
            public int getInt();
            public int getInt(int defaultInt);
            public int getInt(String path, int defaultInt);
            // Returns the value of the given node as a double.
            public double getDouble();
            public double getDouble(double defaultDouble);
            public double getDouble(String path, double defaultDouble);
            public double getDouble(String path);
        }
        // Turn on debugging or verbose tracing.
        public static void setDebug(boolean deb, boolean verb) { debug = deb; verbose = verb; }

        ////// Utility methods that are often missing or not quite usable.
        // Pull a stream into a string efficiently.
        public static String slurp(InputStream in) throws IOException;
        // Pull a stream into a byte array efficiently.
        public static byte[] slurpBytes(InputStream is) throws IOException;
        // These will change soon to take a pass list as proper url encoding varies depending on situation.
        // Urlencode a string
        public static String urlEncode(String s);
        // Does this character need encoding?
        public static boolean needsEncode(char c);
        // Urldecode a string
        public static String urlDecode(String s);
        // Coming soon: b64 codec
    }
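The retrieval convention stated before the listing (null when nothing is found, unless a default is supplied) can be illustrated with a small stand-in. This sketch does not use Ssx: the class Lookup and its flat path-to-text map are hypothetical, and falling back to the default when the text is not a valid integer is an assumption, since the listing does not say how malformed numbers are handled.

```java
import java.util.HashMap;
import java.util.Map;

final class Lookup {
    private final Map<String, String> values = new HashMap<>();

    Lookup put(String path, String text) {
        values.put(path, text);
        return this;
    }

    // No default supplied: a missing path yields null.
    String get(String path) {
        return values.get(path);
    }

    // Default supplied: a missing path yields the default instead of null.
    String get(String path, String def) {
        String v = values.get(path);
        return v == null ? def : v;
    }

    // Assumption: unparsable text is treated like a missing value.
    int getInt(String path, int defaultInt) {
        String v = values.get(path);
        if (v == null) return defaultInt;
        try {
            return Integer.parseInt(v.trim());
        } catch (NumberFormatException e) {
            return defaultInt;
        }
    }

    public static void main(String[] args) {
        Lookup cfg = new Lookup().put("server/port", "8080");
        System.out.println(cfg.get("server/host"));              // missing, no default
        System.out.println(cfg.get("server/host", "localhost")); // missing, default used
        System.out.println(cfg.getInt("server/port", 80));       // present and numeric
    }
}
```

With defaults available on every accessor, most reads collapse to a single line with no null checks at the call site.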
Ssx Part 2
(updated 16 hours ago by StephenWilliams)
Coming Soon
1. Namespaces are handled well for many cases. What is currently not supported is namespace
definitions that change during a single parse. This includes a default namespace that is redefined or
only defined for a subtree. This can be improved to handle any non-pathological use of
namespaces. Namespace parsing in attribute values may also be handled.
2. It is possible that the lexical events, DTD entity declarations, and other features may be important
enough to be implemented. Some of these could activate a flag to enable extra features when
present to keep parsing as fast as possible in other cases.
3. More incremental parsing will be supported, especially to support the streaming event DOMlet
model.
4. Additional convenience methods, such as date parsing.
OpenEXI
OpenEXI is an open source project that combines several open source implementations of the W3C
Efficient XML Interchange binary XML standard. The author participated in the EXI working group and
the XBC working group before it. We plan to merge and refactor the existing code base, then provide an
Ssx API for OpenEXI so that either XML or EXI can be produced or parsed by applications. We have
also begun the process of getting OpenEXI into the Apache Incubator.
http://openexi.sourceforge.net/
What is EXI?
EXI provides a very compact encoding of the XML infoset (i.e., the informational equivalent of an XML
file) with some options. These options allow encoding of a standalone XML file or an XML file with
expected structure and data types specified with an XML Schema. The resulting intermediate encoding
can then optionally include data compression (ZLIB), applied in a particular way. With a schema,
encoding can be much more compact because the schema represents redundancy in the data and certain
data can be encoded as compact binary values.
The point of all of this is to highly optimize both the processing overhead of parsing and serialization
and the size of the resulting data. Both of these greatly reduce the overall data transfer, processing,
memory usage, and latency of data.
A common question is why XML + compression isn't just as good. The two main points are that this
makes processing speed even worse for XML and the result is still not as compact in many cases as EXI.
EXI greatly reduces the overhead of XML, particularly when many tags and attribute names are used.
For some XML, there is little of that so only the possibility of restricted character sets or other binary
encoding would make a difference.
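For reference, the "XML + compression" baseline from that question can be measured with the JDK alone. The sketch below is not EXI and makes no claim about EXI's sizes; it only shows the extra compression pass that sits on top of ordinary text parsing and serialization. The class XmlDeflate and the sample document are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class XmlDeflate {
    // Returns the number of bytes produced by DEFLATE for the given input.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // output is discarded; only its size is counted
        }
        deflater.end();
        return total;
    }

    // Builds a tag-heavy document where element and attribute names dominate the bytes.
    static byte[] sample(int rows) {
        StringBuilder sb = new StringBuilder("<readings>");
        for (int i = 0; i < rows; i++) {
            sb.append("<reading sensor=\"s").append(i % 4).append("\"><value>")
              .append(i).append("</value></reading>");
        }
        sb.append("</readings>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] xml = sample(200);
        System.out.println("raw bytes:      " + xml.length);
        System.out.println("deflated bytes: " + deflatedSize(xml));
    }
}
```

The size drops because the repeated tag names are redundant, but the receiver still has to inflate and then parse the full text, which is the processing cost the first point refers to.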
Another key point, and one of the key differences between EXI and most prior optimized binary
formats, is that EXI can encode any XML that it is given, whether or not it matches the optional schema.
When to use
EXI is great for large amounts of complex data or for transfer of data that could be more efficient in
binary, such as floats or many date/times. It is also efficient for small messages that could reduce down to
a few bytes in some cases.
GenXDM
GenXDM enables applications to write code that uses and manipulates XML trees
without being tied to a particular XML tree representation like DOM, DOM4J, AXIOM,
or any other. It also prods developers towards an immutable view of XML trees, which
will make it easier and faster to work with XML across multiple cores and multiple
processors.
http://www.genxdm.org/
GenXDM is a great concept. The GenXDM developers are interested in Ssx and OpenEXI.
http://lists.xml.org/archives/xml-dev/200401/msg00492.html
Some other small-ish libraries. None seem nearly as concise and easy to use.
XMLtp
sparta-xml
NanoXML
jdom
tinyxml
piccolo
kXML
About Us
OptimaLogic
(updated just now by StephenWilliams)
OptimaLogic, Inc. is a lean R&D consulting organization located in Silicon Valley that provides highly
technical mobile, desktop, and server related consulting in a variety of areas. These include:
Recent clients include technical startups, multi-national corporations, government agencies, and
academia. With access to a wide range of resources, OptimaLogic can quickly find the optimal solution
for your most challenging projects. We have a particular interest in early stage startups and high-profile,
important projects.
Concise Coding
(updated 16 hours ago by StephenWilliams)
Most code is far from optimal. It is too verbose, sometimes having hundreds of classes, and uses many
lines of code to do things that should be done in a single line. Interfaces are complicated, and combining
libraries and techniques often creates a combinatorial explosion of total complexity. The chronic
application of "keep it simple", "don't put too much code in one place", "we don't have time to rewrite
that code" results in code that is far, far more complex than it should be.
In addition to problems in project management and architecture / high-level design, detailed design and
coding frequently suffer from over-application of methods and design rules, which leads to inefficiency,
pain, suffering, and project failure. Management challenges are answered somewhat by Agile, Scrum,
etc.
There are many architectural methods and principles that are very helpful. These are also managed with
iterative & Agile methods and best emerging practice. However, in very successful projects, applying
just the right techniques in a sparing way to get lowest total complexity requires competent, experienced
architect coders with the right goals. These are guidelines for choosing those goals.
1. Apply architectural principles and design methodologies, recommended design rules, and favor use
of well-known APIs and architectures
2. Always counterbalance with consideration of overall complexity for application developers,
maintenance, and reuse
3. Favor creating tools and libraries to concentrate complexity to keep application development as
simple and concise as possible
4. Develop and use design rules for when *not* to create new classes, files, packages, etc.
a. Avoid "class diarrhea": Most developers seem to have many reasons to create new classes and
practically no reasons not to.
i. Tools are making this worse. Peephole, tool-tip driven development can lead to a system
that is impossible to grok as a whole. Development can grind to a halt as it becomes more
and more difficult to make changes.
b. Don't pollute the namespace, class "space", file "space", etc.
c. Strive to reduce total "surface area" (the total cognitive load) at each level.
d. Architect for flexibility, but recognize when the flexibility is not needed. (Do you really need
to create an indirection for a constant like "http://"? Is it going to change? Does it need
translation? What are you doing???)
5. Be object-oriented at the macro level too
a. Keep everything together when possible and lowest complexity.
b. Expect code reuse: Can a class be copied easily to another project / package, or do all classes
form a complex web? Having to change more than a class or two for incremental additions is a
good sign that something is wrong.
c. Don't ever hide application flow, configuration, and dependencies.
d. Avoid creating interfaces and classes just to pass, return, and store tuples.
i. In some cases, Map<> or String[]/Object[] can be appropriate. (Similar to C++ pair<> or
Qt/C++ or Objective-C properties or even Lisp lists.)
ii. Use generic callback interfaces for generic solutions.
iii. If a custom composite return type or callback interface is desired, declare it in an inner
class right next to where it is used, unless it is a very standard and common element in a
system.
6. Don't avoid refactoring or even a total rewrite
a. Recognize that developers are experts at the problem *after* they have created an initial
design and implemented it. Frequently, what seemed appropriate before solving all of the
details is, later, clearly not the best. Expect this. Redesign and rewrite at stopping points or
when development is slowing.
b. Working code can usually be rewritten much faster than the original: You are not wasting all
prior effort.
c. Rewrite in parallel to existing code when possible, which allows toggling or running both
paths and comparing the results.
7. Use or create new conventions and methods that reduce complexity
a. Little things like coding conventions
i. Favor reducing vertical white space: Seeing more code at once is useful. (I prefer K&R
for that reason.)
ii. Use special rules when necessary: "paragraph mode" for intense, dense coding: See
Ssx.SParse.
b. Big things like architectural patterns
i. Signals / Slots, message based, queues, logging/debugging, ...
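Guidelines 5.d.i and 5.d.iii can be sketched in a few lines. The names MinMax, range, Stats, and stats are hypothetical. The first method returns a tuple through an existing generic type; the second declares a small composite right next to its only use instead of adding a top-level class or file.

```java
import java.util.AbstractMap;
import java.util.Map;

final class MinMax {
    // 5.d.i: reuse a generic pair (key = min, value = max) rather than defining a new class.
    // Expects a non-empty array.
    static Map.Entry<Integer, Integer> range(int[] values) {
        int min = values[0];
        int max = values[0];
        for (int v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new AbstractMap.SimpleImmutableEntry<>(min, max);
    }

    // 5.d.iii: when named fields are wanted, declare the composite as a nested class
    // beside the method that returns it.
    static final class Stats {
        final int min;
        final int max;
        final double mean;
        Stats(int min, int max, double mean) { this.min = min; this.max = max; this.mean = mean; }
    }

    static Stats stats(int[] values) {
        Map.Entry<Integer, Integer> r = range(values);
        long sum = 0;
        for (int v : values) sum += v;
        return new Stats(r.getKey(), r.getValue(), (double) sum / values.length);
    }

    public static void main(String[] args) {
        Stats s = stats(new int[] {2, 4, 6});
        System.out.println(s.min + " " + s.max + " " + s.mean);
    }
}
```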