
Scientific programming with Java classes supported with a scripting interpreter


STERGIOS PAPADIMITRIOU
Department of Information Management,
Technological Educational Institute of Kavala,
65404 Kavala, Greece,
email: [email protected]
December 4, 2006
Abstract
The jLab environment provides a Matlab/Scilab-like scripting language that is executed by an interpreter implemented in the Java language. This language supports all the basic programming constructs and an extensive set of built-in mathematical routines that cover all the basic numerical analysis tasks. Moreover, the toolboxes of jLab can be easily implemented in Java and the corresponding classes can be dynamically integrated into the system. The efficiency of the compiled Java code can be directly utilized for any computationally intensive operations. Since jLab is coded in pure Java, the build-from-source process is much cleaner, faster, more platform independent and less error prone than in similar C/C++/Fortran based open source environments (e.g. Scilab, Octave).
Neuro-fuzzy algorithms can require enormous computational resources and at the same time an expressive programming environment. We demonstrate the potential of jLab by describing the implementation of a Support Vector Machine (SVM) toolkit and by comparing its performance with a C/C++ and a Matlab version and across different computing platforms (i.e. Linux, Sun/Solaris, Windows XP).
keywords: Programming Environments, Java, Scientific Software, Scripting Interpreter, Reflection
1 Introduction
Recently, with the growing speed and capability of computers, the popularity of integrated scientific programming environments has risen significantly. These environments generally demand far more time and space resources than the traditional compiled programming languages (i.e. C++ and Fortran). However, they greatly facilitate the task of quickly creating reliable scientific software, even by scientists with little programming expertise.
Two categories of general scientific software can be identified: computer algebra systems that perform extensive symbolic mathematical evaluations (e.g. Maple [23], Mathematica [22]) and matrix computation systems that are oriented toward numerical computations and are well suited for engineering applications (e.g. Matlab [25], which dominates the commercial market, and the open source clones Scilab [1] and Octave [24]). An excellent recent comparative review of three well-established commercial products can be found in [16, 17].
These systems are usually implemented in C/C++/Fortran and are available either in platform-specific binary formats or in equally platform-specific build-from-source configurations (e.g. the open source Scilab and Octave systems). In contrast, the Java programming language, in which the presented jLab environment is implemented, allows platform independence. We have tested jLab on Linux, Solaris and Windows XP and it runs in the same way on all these different environments, without any change to the code.
The Java language offers a superb framework for the construction of flexible scientific software, with concepts such as:
- the reflection framework: this allows the interpreter to flexibly interrogate the dynamically loaded extension toolboxes that contain Java classes implementing specialized functionality (e.g. ODE solvers, neural network models) [26].
- the parsing flexibility: the Java programming language makes it possible to detect flexibly the type of the next scanned token. The instanceof operator allows the token type to be checked dynamically and the corresponding actions to be taken (e.g. with statements like: if (nextToken instanceof VariableToken) ..); a short sketch follows this list.
- the well-designed, portable and powerful graphical environment: this allows the implementation of high quality scientific graphics that are platform independent.
- the object orientation, which allows a modular and robust design that exploits the reusability of code whenever possible.
- the robust exception handling: in a complex, flexible programming environment many errors can occur. jLab catches many exceptions and in most cases recovers gracefully, whenever possible without even disturbing the flow of the user's computation.
- the reliable, simple and uniform installation on any platform (e.g. Unix/Linux, Windows) that supports a recent Java Runtime Environment (JRE).
- the user-friendly graphical configuration of the system's environment variables and the exploitation of the powerful abilities of Java's AWT/Swing for displaying both the program output and the program state.
- the support of concurrent and parallel computation with the multithreaded nature of the language and the extended support of distributed computation technologies [21].
Contrary to some other Fortran and C based open source numerical computing environments such as Scilab and Octave, the compilation of jLab's source is extremely fast, simple and platform independent. It compiles in only a few seconds, while the Scilab or Octave sources take several minutes. Moreover, in the latter environments many machine-specific details can perplex the build-from-source process.
The paper proceeds as follows: Section 2 presents the architecture of the components that constitute the jLab system. Section 3 deals with the important subject of function definition, and elaborates on the two different ways to define functions: as Java class files and as jLab j-scripts. Section 4 outlines the main points of the modules that perform the dynamic loading and execution of either Java class files or jLab-coded j-script modules. Some important issues related to the parsing of jLab programs are discussed in Section 5. Section 6 presents the application of the jLab environment to the development of code for the implementation of Support Vector Machines. Finally, Section 7 concludes the paper and presents some basic directions for future work.
2 The architecture of the system
The system at the top level consists of the following main components (Figure 1):
a. The java Execution engine (jExec). This is the part that dynamically translates the jLab programming language and executes the user's commands. It is actually a flexible interpreter coded in Java that consists of the following modules:
   - The Lexical Analyzer. It tokenizes the input in order to permit the parsing phase to operate on a token stream instead of the plain text.
   - The Parser. The parser first checks the syntax of all of jLab's programming constructs. Then the parser executes each expression by building an expression tree and evaluating the nodes of the tree by a top-down recursive traversal (see Figure 1).
b. The Java toolboxes. These toolboxes consist of Java class libraries that need to adhere to only a small set of conventions in order to be directly utilized from jLab. We demonstrate the construction of a Java class library in Section 6. The Java programmer who implements these toolboxes also has access to the wide set of numerical libraries and application-specialized toolboxes (e.g. fuzzy systems, neural networks). The popularity of the Java language makes it easy to utilize excellent libraries for specific domains, e.g. the JOONE library for neural networks [19], the WEKA data mining system [18] and the fuzzy expert system of Bigus [20].
c. The jLab toolboxes. These use the jLab interpreted language to implement program logic with text code files called J-Files. We chose to follow the syntax of the Scilab language [1]. The similarity of the J-File syntax with Scilab facilitates the task of incorporating the repository of Scilab's numerical software. However, jLab currently supports a subset of the Scilab syntax and thus in many cases it is not possible to execute Scilab files without modification. We decided to base the syntax on Scilab for the following reasons: a. Scilab is also an open source effort and there can be a productive exchange of ideas between the developers/researchers of both systems. b. jLab can significantly accelerate existing Scilab code by replacing the compute-intensive parts with Java classes. Although the same can be accomplished within Scilab by linking external code, in jLab it is much easier and more modular. jLab is thus a programming environment that integrates the dynamic loading and execution of Java classes with the execution of J-Files (both J-Script files and J-Function files).
Also, we should note that the user interface resembles Matlab-style user interaction via a command prompt on which the user can type and edit commands. Java's Swing framework [2] is utilized extensively to provide elegant dialog boxes, trees for the graphical display of hierarchically organized information, etc. We proceed by describing the jLab architecture that permits implementations of algorithms with both Java and scripting components.
3 Function Handling
This section elaborates on the important subject of function handling. The jLab environment allows the integration of both functions implemented as methods of Java classes and J-Script based functions implemented as J-Files. Of the former, the basic functions are implemented as a built-in class library, while specialized Java class libraries can extend the potential of the system in particular application domains. The basic functions are handled internally by the system. The general function architecture is demonstrated by Figures 2 and 3. We proceed by scrutinizing the main components.
3.1 J-Files, J-Functions and extension J-Classes
In jLab a specific Java class, i.e. the FunctionManager class, is used to implement the functionality of function handling and to represent any functions used in an expression. The details of the FunctionManager class are described in a following subsection. A function can be implemented either as a compiled Java class file or as a jLab J-File. We will refer to the former functions as compiled Java functions (abbreviated extension J-Classes). The J-Files are interpreted and resemble the syntax of Scilab's .sce files. They either implement functions, in which case they are referred to as J-Functions, or they are simply batches of jLab code, the J-Scripts.
The J-Files can be easily programmed since the jLab language is untyped and their syntax is kept simple, Scilab-like and to a large extent Scilab compatible. They can be directly executed in the jLab environment by placing them in directories accessible through jLab's jLabScriptPath environment variable, which plays a role for J-File loading similar to that of the Java virtual machine's classpath for class loading. The J-Scripts serve as batch files for jLab's commands.
The J-Functions can return multiple return values with the syntax [rv1, rv2, ...] = some-J-Function(arg1, arg2, ...), where the rv_i denote the return values and the arg_i are the arguments of the function.
An example of a J-Function that returns multiple values is:
% computes many values
function [xAdd,xSub,zMult]=computeMulti(a,b)
xAdd = a+100*b;
xSub = a-100*b;
zMult = xAdd*xSub;
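
Assuming this J-Function is stored in a J-File reachable through the jLabScriptPath, a call that collects all three return values could look like the following (a hypothetical session; the resulting values are shown as comments):

[a, s, m] = computeMulti(2, 3);
% a = 302 (2+100*3), s = -298 (2-100*3), m = -89996 (302*(-298))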
The main disadvantage of J-Files is their speed of execution: they are usually slower than the equivalent Matlab or Scilab functions. However, this drawback can be bypassed when the programmer implements the equivalent functionality with a Java class file, i.e. a J-Class, that can also be dynamically executed by the system. In this case the code is very fast, since it is compiled Java code, and can compete even with corresponding C++ or Fortran library functions. Although some Java libraries perform even better than native code libraries, we should in general expect a slowdown by a constant factor of about 2 to 3, due to the virtual machine overhead.
We refer to the dynamically connected J-Classes that implement various toolboxes with Java classes as extension J-Classes. The extension J-Classes offer the potential to easily extend the functionality of the system in several application domains with Java code. The interfacing with J-Functions is encapsulated by the ExternalFunction class. Each compiled extension J-Class operates on a list of objects of the Operand abstract class type. As we will see, this design allows for maximum flexibility in parameter passing.
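
As an illustration, a minimal extension J-Class might look as follows. This is only a sketch: it reuses the evaluate() signature and the Token/OperandToken/NumberToken classes that appear in the svmTrain example of Section 6.2, while the addTen function itself is hypothetical:

// Hypothetical extension J-Class: exposes an addTen(x) function to jLab scripts.
public class addTen extends ExternalFunction {
    public OperandToken evaluate(Token[] operands) {
        // operands[0] is expected to hold a numeric matrix
        double[][] x = (double[][]) ((NumberToken) operands[0]).getValues();
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[i].length; j++)
                x[i][j] += 10.0;           // element-wise addition
        return new NumberToken(x);         // wrap the result for the interpreter
    }
}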
3.2 The Internal Functions
In addition to the aforementioned extension J-Classes there are several other important classes that also represent Java class code, although this type of code is integrated with the system. These are represented by the InternalFunction class, which is the base class for all the internal function types. Some subclasses of the InternalFunction class that further specialize the corresponding properties and behavior of the function are (Figure 3):
- ComplexFunction: class representing a jLab complex function. jLab has extensive provisions for complex arithmetic.
- MatrixFunction: class implementing the mathematical functions for matrices.
- StandardFunction: class implementing the standard mathematical functions (e.g. abs(), exp(), log(), ln()).
- TrigonometricFunction: class implementing trigonometric functions (e.g. sin(), cos(), tan()).
It is important to emphasize the basic distinction between Internal and External functions: Internal functions are hardwired into the system, while the External ones can be dynamically extended by the user. We should note at this point that the External classes are loaded by a special class loader (i.e. the ExternalFunctionClassLoader).
3.3 The Function Manager
The function manager class is an essential component with respect to dynamic class execution. It is implemented by means of a Java class and uses a method evaluate() to evaluate each function. The evaluation code first checks if the function name is overloaded by a variable name. If a variable overloads the function, then a variable is created and the parameters of the function are treated as the limits of the variable. The variable with these limits is in turn evaluated and the corresponding result is returned as the result of the attempt to evaluate the function.
Otherwise, i.e. when the function name is not overloaded by a variable name, it calls the function manager (implemented with the class FunctionManager) in order to find the function. The FunctionManager dynamically tracks the extension J-Classes. The ability of the Java language to load and execute classes dynamically allows jLab to easily incorporate into its kernel any number of Java classes without any recompilation of the system. All that is required is to place the compiled class files in directories visible from the jLabClassPath variable.
The evaluation of an extension function is very fast since it is compiled Java code. However, a user with little or no Java experience cannot be expected to implement extension classes. Such users can use jLab's scripting language and implement J-Files (J-Functions, J-Scripts). A function is referred to as a UserFunction if it is implemented as a J-File.
The evaluation task for each function, whether an ExternalFunction (i.e. Java code) or a UserFunction (i.e. a J-File), starts by first evaluating the operands of the function. Then the corresponding J-Script or Java function is evaluated after first calling clone(), so that the original function stays untouched.
Although the evaluation code depends on the function type, each evaluate() function adheres to the same signature in order to permit flexible evaluation of expression trees comprised of functions of various types (e.g. both Internal and External functions). The evaluation is performed according to some priority rules explained below.
In order to evaluate an InternalFunction the system first checks whether the function by itself is an expression. In the affirmative case all the children of the expression are evaluated recursively. Having evaluated all the children, the root node, which represents the InternalFunction object, obtains its value. This value corresponds to its return value, which is returned. When the InternalFunction is not an expression it represents a numeric value, which is returned as the function's return value.
The FunctionManager maintains the set of functions for the aforementioned categories of Internal functions (e.g. trigonometric, standard, matrix), and manages the dynamically expanded set of User functions (both extension Java classes and J-Files). The Java class files that implement external extension J-Classes are loaded by a specific class loader, the JClassLoader. Another type of loader, the J-File loader, loads the J-Files (i.e. the UserFunctions). The FunctionManager starts by constructing a number of internal functions. A function is resolved by first checking whether it is an Internal Function (i.e. a built-in piece of Java code). If this search fails, the extension J-Classes become the target. Finally, the J-Files are scrutinized. This order of function evaluation is illustrated by Figure 4. Also, Figure 5 illustrates the stages of expression parsing. We should stress the point that even the J-Files are processed into Java UserFunction classes and are then handled uniformly.
The configuration of jLab is simple: as already mentioned, two environment variables are used to set the search paths for J-Files (i.e. executable scripts) and Java classes (i.e. executable bytecodes) respectively. The first one is the already mentioned jLabScriptPath variable and the other is the jLabClassPath variable. Both are settable and adjustable from within the graphical interface. These parameters set up the environment for the code loaders that are elaborated in the following section.
4 The Code Loaders
The custom code loaders are essential to the flexibility and extensibility of the system. Contrary to similar systems, such as Scilab [1] and Matlab [25], jLab can be easily extended with specialized Java toolboxes that run as fast as the Java runtime (i.e. the Java Virtual Machine implementation) permits. In order to achieve this, jLab owns two types of code loaders implemented with different classes. The first one is the Java class loader (abbreviated jClassLoader), which resembles the functionality of flexible Java class loaders [2], while the second, the J-File loader, accomplishes the elaborate handling of J-Files (either J-Functions or J-Scripts).
The class loaders keep all the loaded classes in a global hashtable (implemented with the standard JDK Hashtable class). The hashtable allows fast lookup of any loaded class. Thus, although the time to locate a new class is linear in the number of extension classes, subsequent calls to the same class cost only O(1) time. The J-File namespace is handled similarly.
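
This caching scheme can be sketched as follows; the class below is a simplified illustration of the idea, not jLab's actual loader code:

import java.util.Hashtable;

// Simplified sketch of the class caching used by the jLab code loaders.
class CachingLoaderSketch {
    private final Hashtable<String, Class<?>> loaded = new Hashtable<>();

    Class<?> load(String name) throws ClassNotFoundException {
        Class<?> c = loaded.get(name);   // O(1) lookup for already-loaded classes
        if (c == null) {
            c = Class.forName(name);     // stands in for the directory search
            loaded.put(name, c);         // cache for subsequent calls
        }
        return c;
    }
}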
The jClassLoader maintains a root directory for the available jLab extension Java class files (i.e. the extension J-Classes). The String baseClassDir holds the path of this root in the local file system and is a configurable parameter (e.g. for Unix/Linux filesystems it can be /javaApps/jLab) that can also be supplied as a command line argument at jLab's execution. The jClassLoader can locate and execute any Java class file located under this root. With this design we obtain a modular tree-based organization of jLab's classes, extensibility and exploitation of the superb file-handling facilities of current operating systems.
The baseClassDir parameter is very significant and is expected as a command line argument. It is in essence the root directory where the classes of the jLab system are installed in the local filesystem. In the baseClassDir there can exist two other important but optional configuration files: jLab.unix.properties and jLab.win.properties. Whenever these files exist, jLab initializes the jLabScriptPath parameter automatically. Depending on the operating platform (Unix/Linux or Windows) the corresponding file is used. These property files are utilized by the JFileLoader class, which has the task of locating and retrieving the code implemented in jLab's interpreted language.
The jClassLoader first attempts to locate a class in the previously mentioned hashtable. If the class is not in this hashtable, a search process follows. It uses a simple and effective algorithm to locate the dynamically loaded Java class files: it expects them in the subdirectory ./jExec/Functions of the jLab directory tree, e.g. for the previous example it will be /home/user5/javaApps/jLab/jExec/Functions. Whenever the search under the basedir ./jExec/Functions fails, the system tries to locate the class in all the directories associated with jLab's jLabUserClasses environment variable. This order of class searching allows users to extend the existing class names with their own classes or J-Files and to keep their classes separate from those supplied with the jLab system.
The jFileLoader is a class that can load and execute J-Files (both the J-Scripts and the J-Functions) of the jLab language. We recall that the J-Function files implement jLab functions, while the J-Scripts simply organize a batch of commands, i.e. they are just a sequence of commands typed in a text file. The jFileLoader in turn calls the FunctionParser to parse the text of the J-File and to return a UserFunction class to jExec, ready for computation.
The ReflectionFunctionLoader is a class that calls a function from an external class using reflection. The reflection system allows Java programmers to inspect and handle fields of objects that were not known at compile time [2]. Java's reflection mechanism allows new classes to be added to the jLab system at run time. With this mechanism the system can dynamically inquire about the capabilities of the classes that were added. The Java runtime system maintains runtime type identification on all objects, which keeps track of the class to which each object belongs. This information is used by the virtual machine to select the proper methods for execution.
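
In outline, such a reflective call works as in the following sketch; this is a generic illustration of the mechanism, not the actual ReflectionFunctionLoader code:

import java.lang.reflect.Method;

class ReflectionCallSketch {
    // Load a class by name and invoke a no-argument method on a fresh instance.
    static Object call(String className, String methodName) throws Exception {
        Class<?> cls = Class.forName(className);            // dynamic class loading
        Object instance = cls.getDeclaredConstructor().newInstance();
        Method m = cls.getMethod(methodName);               // runtime interrogation
        return m.invoke(instance);                          // reflective invocation
    }
}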
Since it is quite easy to incorporate Java code into the jLab environment through the extension J-Class framework, the scripting code is usually suited only to the implementation of the high-level application logic, while the number-crunching numerical routines should be coded in Java.
5 Parser Design
This section elaborates on the important issue of parsing. The first subsection deals with the issue of function parsing, i.e. how jLab deals with the various types of functions. The next subsection analyzes expression parsing, which includes the handling of the programming constructs of the language (e.g. if-then, for-loop, while-loop).
5.1 Function Parsing
As we have already emphasized, jLab is an environment that can be efficiently utilized with mixed-mode programming: the high-level structure of the program should be coded as a J-Script and the number-crunching routines in Java. The Java based extension code is implemented as extension J-Classes with the ExternalFunction class; these classes are important since they are the basic means for the efficient extension of jLab's functionality. Every Java programmer can easily extend jLab by following a few simple rules for the interfacing of the new functions. The interface for passing parameters to an external function (class ExternalFunction) is quite flexible, allowing the implementation of arbitrary functions.
Each user-specified external function extends the ExternalFunction class. It returns a generic structure of type OperandToken and accepts parameters in an array of Token classes. Numeric parameters can be easily passed with a NumberToken structure. The Java runtime object type checking operator instanceof is valuable for discovering the types of parameters at runtime. Also, the StringToken is the class that represents strings. Upon evaluation it returns the token itself. It is very suitable for passing alphanumeric information to jLab routines.
The FunctionParser class parses user functions. We recall that user functions are implemented as J-Files. The latter contain either functions (i.e. the J-Functions) or simple script files (i.e. the J-Scripts). The UserFunction class is the class that handles the user-edited J-File functions. This class implements a method that takes the jLab code of the function as a string and returns the created UserFunction. The J-File code of the function is represented with an OperandToken class. A standard Java ArrayList maintains the values of the input parameters of the function. Similarly, the names of the return values are kept in a return variables ArrayList.
A flag indicates whether the UserFunction class represents a J-Script or a J-Function. For J-Functions, the number of parameters that the function defines within its text body should match the number of parameters in the calling sequence.
J-Scripts can be evaluated directly from their text code. However, jLab has to work harder in order to execute J-Functions. For J-Functions, a local context for their local variables is first created. In the next processing step the formal parameters of the function are initialized with the values of the actual parameters. After the parameter passing has been performed, the execution of the function code can be accomplished. The function code must be cloned so that the original code remains untouched. The function evaluation code assigns the corresponding values to the return variables. When multiple return variables exist they are collected within a matrix and this matrix is returned.
5.2 Expression Parsing
The Interpreter (jExec) starts by separating the expression into tokens and then constructs an expression tree. These actions are performed with the aid of the parser. This expression tree is subsequently evaluated. The flexible exception handling capabilities of Java are utilized in order to store information about a possible error in expression evaluation in a special variable.
The Expression class implements a tree where each node has a variable number of children. Each node keeps information about the operator that it implements. Also, each expression keeps track of the index of the child being executed. The operator held within the node is used to evaluate the expression accordingly. If this operator is an assignment, then we evaluate the right side and assign the evaluation outcome to the left-side variable.
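
The recursive evaluation of such a tree can be illustrated with the following self-contained toy; the real Expression class of jLab is of course considerably richer:

// Toy expression tree: each node holds an operator and children and is
// evaluated by a top-down recursive traversal.
class ExprSketch {
    static class Node {
        char op; double value; Node left, right;      // leaf node when op == 0
        Node(double v) { value = v; }
        Node(char op, Node l, Node r) { this.op = op; left = l; right = r; }
        double eval() {
            if (op == 0) return value;                // leaves return their value
            double a = left.eval(), b = right.eval(); // evaluate children first
            return (op == '+') ? a + b : a * b;
        }
    }
    public static void main(String[] args) {
        Node tree = new Node('+', new Node(2), new Node('*', new Node(3), new Node(4)));
        System.out.println(tree.eval());              // prints 14.0
    }
}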
The tokenizer, as is well known, implements one of the first phases of compiler processing [3]. Although tools such as lex (or flex) and yacc (or bison) are valuable, we implemented the lexical analyzer and the parser manually in order to obtain maximum speed and flexibility. Moreover, these tools fit better with code generators than with the kind of code interpretation that jLab performs. Finally, they are best suited to C code generation.
The class that represents a number used in an expression is the NumberToken class. This class holds a 2D array of complex numbers in a 3D array of real values, since each complex number is represented by a 2x1 array holding the corresponding real and imaginary values. A wide variety of operations is supported on NumberTokens. These operations add, subtract, multiply, raise to a power, scalar multiply, scalar divide, and perform trigonometric functions (e.g. sin, cos, tan, etc.), exponentiations and logarithms.
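
The storage layout can be pictured with the following small example, which assumes the double[rows][cols][2] representation described above:

// Sketch: a 2x2 complex matrix stored as double[rows][cols][2], where
// index 0 holds the real part and index 1 the imaginary part.
class ComplexStorageSketch {
    public static void main(String[] args) {
        double[][][] values = new double[2][2][2];
        values[0][1][0] = 3.0;    // real part of element (0,1)
        values[0][1][1] = -4.0;   // imaginary part of element (0,1)
        System.out.println("(0,1) = " + values[0][1][0] + " + " + values[0][1][1] + "i");
    }
}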
Tokens are also used to represent complex programming language constructs of jLab, such as while-do, if-then-else and the for-loop. For example, the syntax of the for-loop construct is:
for (forInitialization ; forRelation; forUpdate)
forCode
Let us consider some concerns involving the implementation of the for-loop. The ForOperatorToken consists of four other tokens: the forInitialization represents the initialization of the construct, and similarly the forRelation, the forUpdate and the forCode represent the condition test, the updating of the contents of the variables across successive iterations, and the code block that the for construct executes repetitively.
Subsequently, the evaluation code of the ForOperatorToken first evaluates the forInitialization token in order for the initializations to take effect, and then implements the logic of the for-loop by repeatedly evaluating the forCode as long as the forRelation holds, also applying the increment/decrement (i.e. evaluating the forUpdate token).
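
This evaluation order can be mimicked with the following self-contained sketch (the simplified token interfaces below are illustrative, not jLab's actual classes):

// Toy sketch of the ForOperatorToken evaluation order described above.
class ForLoopSketch {
    interface Tok  { void eval(int[] env); }       // simplified "token"
    interface Cond { boolean eval(int[] env); }

    public static void main(String[] args) {
        int[] env = new int[1];                    // env[0] plays the loop variable
        Tok forInitialization = e -> e[0] = 0;
        Cond forRelation      = e -> e[0] < 3;
        Tok forUpdate         = e -> e[0] = e[0] + 1;
        Tok forCode           = e -> System.out.println("iteration " + e[0]);

        forInitialization.eval(env);               // 1. run the initializations
        while (forRelation.eval(env)) {            // 2. test the condition
            forCode.eval(env);                     // 3. execute the loop body
            forUpdate.eval(env);                   // 4. apply the update
        }
    }
}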
Another important token type is the FunctionToken, which is used to represent any functions used in an expression. The FunctionToken class implements all the functionality required for executing the function. Specifically, it first checks if the function is overloaded by a variable name. If so, the system creates a variable and sets the parameters of the function as the limits of the variable. Next, it evaluates the variable with the limits and returns the results. If the function name is not overloaded by a variable, the system calls the FunctionManager in order to find the function. If the FunctionManager detects that the function is a UserFunction, it proceeds to evaluate it, by first evaluating its operands and then the function code.
The evaluation of operators resembles the evaluation of functions. Each operator is evaluated by an evaluate() function that takes as parameter an array of Tokens and returns an OperandToken.
Since jLab is untyped, an effective mechanism for dynamically handling the current set of variables and the objects to which they refer is required. jLab utilizes Java's built-in Hashtable data structure in order to perform fast lookups. The dynamic class inspection facilities of Java make it easy to test the type of the data associated with a variable (with the instanceof operator).
The system implements local variables by using the concept of nesting. A J-File that does not have its own parameters is executed in the global context. The contexts are implemented with the well-known push() and pop() stack operators [3].
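
A minimal sketch of such nested contexts, assuming a simple stack of name-to-value tables (a toy illustration, not jLab's actual implementation):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy illustration of nested local contexts managed with push/pop.
class ContextSketch {
    private final Deque<Map<String, Object>> contexts = new ArrayDeque<>();

    void push() { contexts.push(new HashMap<>()); }          // enter a function
    void pop()  { contexts.pop(); }                          // leave a function
    void set(String name, Object value) { contexts.peek().put(name, value); }

    Object get(String name) {
        for (Map<String, Object> ctx : contexts)             // innermost context first
            if (ctx.containsKey(name)) return ctx.get(name);
        return null;                                         // undefined variable
    }
}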
Having presented some concepts related to the jLab implementation, we
proceed by presenting an application to the development of Support Vector
Machine Learning software.
6 Application for Support Vector Learning
This section demonstrates the potential of jLab for the implementation of complex computational tasks by using its flexibility to directly incorporate available Java numerical software. In particular, within the field of Computational Intelligence, we will deal with Support Vector Machines and will explore the machinery of the LibSVM Java library [27]. Initially, we briefly present the SVM model principles. Subsequently, we present the jLab class interface and the jLab code. Finally, we elaborate on the issue of computational performance and perform some comparative tests.
6.1 SVM Principles
Support Vector Machines (SVMs) are a relatively new machine learning model based on the Statistical Learning Theory of Vapnik [15]. Numerical algorithms for the efficient solution of the quadratic programming problem involved in SVM training have been developed recently [7, 12, 11, 9]. Although the sophisticated numerical algorithms have made practical application to large data sets feasible, the involved computation is still heavy for scripting languages such as Matlab/Scilab, and therefore the compiled languages (e.g. C++ and Java) are still necessary for acceptable performance.
We illustrate how easy it is to interface powerful SVM software with jLab and to utilize it in application domains. First we outline the basic SVM theory.
Linear separability of data & Linear SVMs
Suppose we are given a set of examples $(x_1, y_1), \ldots, (x_l, y_l)$, where $x_i \in \mathbb{R}^N$ and $y_i \in \{\pm 1\}$ are the input patterns and their class labels, respectively. Initially, we assume that the two classes of the classification problem are linearly separable. In this case, we can find an optimal weight vector $w_0$ such that $\|w_0\|^2$ is minimum (in order to maximize the margin $2/\|w_0\|$ of separation [8, 14]) and $y_i(w_0 \cdot x_i + b_0) \geq 1$, $i = 1, \ldots, l$.
The support vectors are those training examples $x_i$ that satisfy the equality, i.e. $y_i(w_0 \cdot x_i + b_0) = 1$. They define two hyperplanes: one hyperplane goes through the support vectors of one class and the other through the support vectors of the other class. The distance between the two hyperplanes is maximized when the norm $\|w_0\|^2$ of the weight vector is minimum.
This minimization can proceed by maximizing the following function with respect to the variables $\alpha_i$ (Lagrange multipliers) [15, 4, 10]:

$$W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad (1)$$
subject to the constraints $0 \leq \alpha_i$ and $\sum_{i=1}^{l} \alpha_i y_i = 0$. If $\alpha_i > 0$ then $x_i$ corresponds to a support vector. The classification of an unknown vector $x$ is obtained by computing

$$F(x) = \mathrm{sgn}\{w_0 \cdot x + b_0\}, \quad \text{where} \quad w_0 = \sum_{i=1}^{l} \alpha_i y_i x_i \quad (2)$$

and the sum accounts for only the $N_s \leq l$ nonzero support vectors (i.e. training set vectors $x_i$ whose $\alpha_i$ are nonzero). Clearly, after the training, the classification can be accomplished efficiently by taking the dot product of the optimum weight vector $w_0$ with the input vector $x$.
Non-linear Separability of data & Non-Linear SVMs
The case where the data is not linearly separable is handled by introducing slack variables $(\xi_1, \xi_2, \ldots, \xi_l)$ with $\xi_i \geq 0$ such that $y_i(w \cdot x_i + b_0) \geq 1 - \xi_i$, $i = 1, \ldots, l$. The introduction of the variables $\xi_i$ allows misclassified points, which have their corresponding $\xi_i > 1$. Thus, $\sum_{i=1}^{l} \xi_i$ is an upper bound on the number of training errors. The corresponding generalization of the concept of the optimal separating hyperplane is obtained by the solution of the following optimization problem:

$$\text{minimize} \quad \frac{1}{2} w \cdot w + C \sum_{i=1}^{l} \xi_i \quad (3)$$

subject to

$$y_i(w \cdot x_i + b_0) \geq 1 - \xi_i \quad \text{and} \quad \xi_i \geq 0, \quad i = 1, \ldots, l \quad (4)$$
The control of the learning capacity is achieved by the minimization of the first part of Equation 3, while the purpose of the second term is to penalize misclassification errors. The parameter C is a kind of regularization parameter that controls the tradeoff between learning capacity and training set errors. Clearly, a large C corresponds to assigning a higher penalty to errors.
Finally, the case of nonlinear SVMs should be considered. The input data in this case are mapped into a high-dimensional feature space through some nonlinear mapping chosen a priori [9, 8, 14]. The optimal separating hyperplane is then constructed in this space. Further details on the mathematical method can be found in the references; an excellent reference is [10].
6.2 jLab SVM class interface
Due to space limitations we will not present the whole SVM class interface; instead we will limit ourselves to the SVM training routine. This routine demonstrates the general method of interfacing Java classes in jLab as extension J-Classes. Each extension J-Class is available as a jLab function and its functionality can be directly utilized from within jLab's scripting machinery.
For this particular example, the svmTrain class provides to jLab the function

double [][] svmTrain(double [][] trainData, int [] trainLabels, String svmModelFile, String svmType);
This jLab function triggers the functionality of the evaluate() method of the Java class with the same name.
In the first stage of processing, the Interpreter locates the class file svmTrain on jLab's class path (unless the class was already cached, either because it was already used or through the class preload mechanism). Then the interpreter utilizes the parameters of the jLab method svmTrain in order to prepare the call to evaluate().
Subsequently, the interpreter exploits Java's reflection mechanism [2] in order to call the evaluation method. The corresponding Java interface to the LibSVM package [27] is very simple and is performed by the following chunk of code:
/** returns the trained SVM saved at the corresponding file
 * @param operands[0] = train attributes
 * @param operands[1] = training labels */
public OperandToken evaluate(Token[] operands)
{
    int kernelType = svm_parameter.RBF;   // default kernel type
    String sKernelType = "";
    int nargin = getNArgIn(operands);     // number of arguments

    if (nargin < 3)
        throwjExecException("svmTrain: number of arguments != 3");
    if (!(operands[0] instanceof NumberToken))
        throwjExecException("svmTrain: first argument must be a matrix train[][]");
    if (!(operands[1] instanceof NumberToken))
        throwjExecException("svmTrain: second argument must be a matrix targetValue[]");
    if (!(operands[2] instanceof StringToken))
        throwjExecException("svmTrain: third argument must be a String svmTrainSavedModel for saving the trained SVM");

    if (nargin == 4) {   // fourth argument: the SVM kernel type
        sKernelType = ((StringToken) operands[3]).getValue();
        if (sKernelType.equalsIgnoreCase("rbf"))
            kernelType = svm_parameter.RBF;
        else if (sKernelType.equalsIgnoreCase("poly"))
            kernelType = svm_parameter.POLY;
    }

    double[][] train = (double[][]) ((NumberToken) operands[0]).getValues();
    double[][] targetValues = (double[][]) ((NumberToken) operands[1]).getValues();
    String svmModelFile = ((StringToken) operands[2]).getValue();

    svm_model model;
    svm_parameter param = new svm_parameter();
    // default values
    param.svm_type = svm_parameter.C_SVC;
    param.kernel_type = kernelType;
    param.degree = 3;
    param.gamma = 1;   // 1/k
    param.coef0 = 0;
    param.nu = 0.5;
    param.cache_size = 40;
    param.C = 1;
    param.eps = 1e-3;
    param.p = 0.1;
    param.shrinking = 1;
    param.nr_weight = 0;
    param.weight_label = new int[0];
    param.weight = new double[0];

    int NInstances = train.length;
    int Dim = train[0].length;

    // fill the LibSVM problem description from the jLab matrices
    svm_problem prob = new svm_problem();
    prob.l = NInstances;
    prob.y = new double[prob.l];
    prob.x = new svm_node[prob.l][Dim];
    for (int i = 0; i < prob.l; i++)
        for (int j = 0; j < Dim; j++)
        {
            prob.x[i][j] = new svm_node();
            prob.x[i][j].index = j + 1;
            prob.x[i][j].value = train[i][j];
            prob.y[i] = targetValues[0][i];
        }

    model = svm.svm_train(prob, param);

    // collect the support vectors of the trained model
    int nSVs = model.l;
    double[][] values = new double[nSVs][Dim];
    for (int n = 0; n < nSVs; n++)
        for (int m = 0; m < Dim; m++)
            values[n][m] = model.SV[n][m].value;

    try {
        svm.svm_save_model(svmModelFile, model);
    }
    catch (IOException e) {}

    OperandToken result = new NumberToken(values);
    return result;
}
}
6.3 jLab Scripting SVM code
The part of the jLab code that performs the SVM training and evaluation
is presented below:
svmModelFile = myDataDir + "sonar.svm";
% the file that will contain the trained SVM model
tic;
SVecs = svmTrain(trainData, trainLabels, svmModelFile, "rbf");
timeTrain = toc();

predictions = svmPredict(testData, svmModelFile);

% evaluate the prediction performance
successCnt = 0; failCnt = 0;
for (k=1; k<=LenTrain; k=k+1)
  if (predictions(k) == testLabels(k)) successCnt = successCnt+1;
  else
    failCnt = failCnt+1;
  end;

str = "Classification Performance: successCnt = "+
      successCnt+", failCnt = "+failCnt+" ("+
      (successCnt*100.0)/(successCnt+failCnt)+" %)";
disp(str);

str = "SVM Training time = "+timeTrain;
disp(str);
6.4 jLab performance
The execution speed of an algorithm implemented in jLab depends heavily on the proportion of the processing performed in Java relative to that implemented as a J-Script. Clearly, the number-crunching code should be coded in Java and only the control logic should be coded as a J-Script, in order to obtain rapid and flexible experimentation. We have performed experiments with an SVM-Matlab toolbox downloaded from http://asi.insa-rouen.fr/~arakotom/toolbox/index.html, which implements in pure Matlab various current kernel and SVM algorithms, described also in [28, 29]. jLab, based on the LibSVM Java implementation [27], is on average about ten times faster than the pure Matlab version. However, the LS-SVM Matlab toolbox of [31] incorporates .MEX code compiled in C++ and is of comparable speed to our Java based LibSVM implementation. We should note that the pure C++ implementation of the LibSVM algorithms is only 2 to 3 times faster than the Java version. This fact surprised us initially; it can be explained by the significant advances in the design and implementation of the Java virtual machine environment. We have tested both the Java and C++ LibSVM implementations on a Pentium 4 PC at 2.6 GHz clock speed, using both Fedora Core 5 Linux (based on the 2.6.15 Linux kernel) and the Sun Solaris 10 operating system running on the same PC. On both platforms we used a recent version of the Java Runtime Environment (i.e. JRE 1.5.0_07) supplied by Sun Microsystems, and the GNU C++ compiler. We have also tested jLab on the Windows XP platform, and the important point we derived is that the execution speed is similar to that of the Linux and Solaris based experiments. The only significant factor that affects the execution speed is the JRE version: we have observed notable improvements in execution speed by using later, improved versions of the Sun Microsystems JRE. In particular, the average training times for the data of the classic UCI Sonar dataset, on a Pentium 4 1.8 GHz PC capable of multibooting all three tested operating systems, were: a. Windows XP: 0.36 sec for the main training accomplished by the Java class file, and 39 sec for the J-Script preparation of the data for training; b. Linux (Fedora Core 5, with 2.6.13 kernel): 0.41 sec for the Java class and 31 sec for the J-Script preparation respectively; c. Sun/Solaris 10: 0.35 sec for the Java class and 32 sec for the J-Script. All the evaluated platforms used the Sun Microsystems Java Virtual Machine and JDK, version 1.5.
                                          Windows XP   Linux   Solaris 10
JDK 1.5, SVM Training (Java)                 0.36       0.41      0.35
JDK 1.5, jLab Script Data Preprocessing        39         31        32
JDK 1.6, SVM Training (Java)                  0.3       0.31      0.29
JDK 1.6, jLab Script Data Preprocessing        34         27        26

Table 1: Some averaged results from the performance of jLab across the various platforms (times in secs).

Also, the GNU-supplied JRE (gcj, gjava) succeeds in compiling most of the jLab system (although there are problems in compiling the whole integrated system), but the resulting Java code does not run as efficiently as with Sun's Java Virtual Machine. The memory requirements and the overhead cost of the script interpreter are very small when no class preloading is performed. In this case only the accessed classes are loaded in memory. However, the class preload operation loads all the extension classes beforehand in memory and therefore consumes space proportional to the number of extension classes. Table 1 presents some comparative averaged performance results. The averaging was performed across 30 trials. The results clearly illustrate the advantage of the compiled Java code over the jLab scripting code, which was used only for data preprocessing.
The Java code of jLab is open source and can be downloaded from: http://infoman.teikav.edu.gr/stpapad/
7 Conclusions
The paper has presented a powerful scripting language that is executed by an interpreter implemented in the Java language. This language supports all the basic programming constructs and an extensive set of built-in mathematical routines that cover all the basic numerical analysis tasks. Toolboxes can be easily implemented in Java and the corresponding classes can be dynamically integrated into the system.
jLab is based on a mixed-mode programming paradigm:
- compiled Java code for the computationally demanding operations, and
- scripting code for fast implementation of the program's structure.
This design permits both speed efficiency and flexibility to be obtained, while at the same time allowing the utilization of the vast amount of scientific software that is implemented in the Java language. The implementation of jLab in pure Java allows a much cleaner, faster, platform independent and less error prone build-from-source process than similar C/C++/Fortran based open source environments (e.g. Scilab, Octave). Specifically, the clean-and-build-all process takes only about 5 to 8 secs in the Netbeans 5.5 IDE. The required build time is similar on the Eclipse development platform. We can contrast this with the several (about 15-20) minutes required to run the configure script and the making process on a Linux Fedora Core 5 installation on a 3.2 GHz Pentium 4.
We have demonstrated the potential of jLab with the implementation of a Support Vector Machine (SVM) toolkit. Also, we have compared its performance with a C/C++ and a Matlab version and across different computing platforms (i.e. Linux, Sun/Solaris, Windows XP). Neuro-fuzzy algorithms can require enormous computational resources and at the same time an expressive programming environment.
Future work will proceed with the porting of the JOONE library for neural networks [19] and the WEKA data mining system, which can easily provide an extensive set of routines for data preprocessing and visualization [18]. Furthermore, we are working on improving the parser in order to allow more flexible constructs, and we are improving the efficiency of the parsing phase, in order to be able to compete with C/C++ parser implementations (e.g. Scilab, Octave).
References
[1] Stephen L. Campbell, Jean-Philippe Chancelier, Ramine Nikoukhah, Modeling and Simulation in Scilab/Scicos, Springer, 2006

[2] Cay Horstmann, Gary Cornell, Core Java 2, Vol. I - Fundamentals, Vol. II - Advanced Techniques, Sun Microsystems Press, 7th edition, 2005

[3] Alfred V. Aho and Jeffrey D. Ullman, Principles of Compiler Design, Addison-Wesley, 1977

[4] Simon Haykin, Neural Networks, MacMillan College Publishing Company, Second Edition, 1999

[5] Jung-Hsien Chiang, Pei-Yi Hao, Support Vector Learning Mechanism for Fuzzy Rule-Based Modeling: A New Approach, IEEE Transactions on Fuzzy Systems, Vol. 12, No. 1, February 2004, pp. 1-12

[6] D. Chakraborty and N. R. Pal, A Neuro-Fuzzy Scheme for Simultaneous Feature Selection and Fuzzy Rule-Based Classification, IEEE Transactions on Neural Networks, Vol. 15, No. 1, January 2004, pp. 110-123

[7] Chang, C.-C., Lin, C.-J., LIBSVM: A library for support vector machines, 2001. Available on-line: http://www.csie.ntu.edu.tw/~cjlin/libsvm
[8] B. Scholkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Muller, G. Ratsch and A. Smola, Input Space Versus Feature Space in Kernel-Based Methods, IEEE Trans. on Neural Networks, Vol. 10, No. 5, 1999

[9] B. Scholkopf, A. J. Smola, R. C. Williamson, P. L. Bartlett, New support vector algorithms, Neural Computation, pp. 1207-1245, 2000

[10] Bernhard Scholkopf, Alexander J. Smola, Learning with Kernels: Support Vector Machines, Regularization and Beyond, MIT Press, 2002

[11] T. Joachims, Making Large-Scale SVM Learning Practical, in Advances in Kernel Methods - Support Vector Learning, Bernhard Scholkopf, Christopher J. C. Burges, and Alexander J. Smola (eds), MIT Press, Cambridge, USA, 1998

[12] E. Osuna, R. Freund, F. Girosi, An improved training algorithm for support vector machines, Neural Networks for Signal Processing VII, Proceedings of the 1997 IEEE Workshop, pp. 276-285, Amelia Island, FL

[13] D. Mattera, S. Haykin, Support vector machines for dynamic reconstruction of a chaotic system, in Advances in Kernel Methods - Support Vector Learning, B. Scholkopf, C. J. C. Burges, A. J. Smola, Eds., Cambridge, MA: MIT Press, 1999, pp. 211-242

[14] C. Cortes, V. Vapnik, Support vector networks, Machine Learning, Vol. 20, pp. 1-25, 1995

[15] V. N. Vapnik, Statistical Learning Theory, New York, Wiley, 1998; V. N. Vapnik, An Overview of Statistical Learning Theory, IEEE Trans. on Neural Networks, Vol. 10, No. 5, 1999, pp. 988-999
[16] Norman Chonacky, David Winch, 3Ms for Instruction: Reviews of Maple, Mathematica and Matlab, Computing in Science and Engineering, May/June 2005, Part I, pp. 7-13

[17] Norman Chonacky, David Winch, 3Ms for Instruction: Reviews of Maple, Mathematica and Matlab, Computing in Science and Engineering, July/August 2005, Part II, pp. 14-23

[18] Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Morgan Kaufmann Series, 2005

[19] Jeff T. Heaton, Introduction to Neural Networks with Java, Heaton Research, 2005

[20] Joseph Bigus, Jennifer Bigus, Constructing Intelligent Agents Using Java: Professional Developer's Guide, 2nd Edition, Wiley, 2001

[21] Maassen J., Van Nieuwpoort R., Veldema R., Bal H., Kielmann T., Jacobs C., Hofman R., Efficient Java RMI for Parallel Programming, ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 23, No. 6, ACM 2001, pp. 747-775

[22] Michael Trott, The Mathematica Guidebook: Programming, Springer, 2004

[23] Erwin Kreyszig, Maple Computer Guide for Advanced Engineering Mathematics (8th Ed.), Wiley, 2000

[24] John W. Eaton, GNU Octave Manual, Network Theory Ltd, 2002

[25] Desmond J. Higham, Nicholas J. Higham, Matlab Guide, Second Edition, SIAM Computational Mathematics, 2005
[26] Vugranam C. Sreedhar, Michael Burke, and Jong-Deok Choi, A framework for interprocedural optimization in the presence of dynamic class loading, in ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 196-207, 2000

[27] R.-E. Fan, P.-H. Chen, and C.-J. Lin, Working set selection using the second order information for training SVM, Journal of Machine Learning Research, 6, 1889-1918, 2005

[28] A. Rakotomamonjy and S. Canu, Learning, frame, reproducing kernel and regularization, Technical Report TR2002-01, Perception, Systemes et Information, INSA de Rouen, 2002

[29] Canu S., Mary X., and Rakotomamonjy A., Functional learning through kernels, volume 190, chapter 5, pages 89-110, IOS Press, Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, 2003

[30] Web page of the SVM-Matlab toolbox: http://asi.insa-rouen.fr/~arakotom/toolbox/index.html

[31] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002
