Performance Evaluation of Java For Numerical Computing: Roldan Pozo
Roldan Pozo
Leader, Mathematical Software Group
National Institute of Standards and Technology
Background: Where we are coming from...
Performance
interpreters too slow
poor optimizing compilers
virtual machine
Why not Java?
[Figure: Mflops (y-axis 0 to 60) for the (i,j,k) loop ordering; x-axis 10 to 250]
[Figure: Mflops (y-axis 0 to 250) for the optimized (i,j,k) loop ordering; x-axis 40 to 1000]
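The (i,j,k) label on the charts above presumably refers to the loop ordering of a dense matrix multiply. A minimal sketch of what such a kernel and its Mflops figure might look like follows. The class name, the square `double[][]` layout, and the (i,k,j) variant shown as one possible "optimized" form are all assumptions for illustration; the slides do not say which optimization was actually measured.

```java
// Hypothetical illustration; names and the (i,k,j) variant are not from the slides.
final class MatMulOrderings {

    // Straightforward (i,j,k) ordering: the inner loop walks down a column of b,
    // so every iteration touches a different row array.
    static double[][] multiplyIJK(double[][] a, double[][] b) {
        int n = a.length, p = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int k = 0; k < p; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    // (i,k,j) ordering: the inner loop runs along one row of b and one row of c,
    // and the row references are hoisted out of the inner loop.
    static double[][] multiplyIKJ(double[][] a, double[][] b) {
        int n = a.length, p = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            double[] ci = c[i];
            for (int k = 0; k < p; k++) {
                double aik = a[i][k];
                double[] bk = b[k];
                for (int j = 0; j < m; j++) {
                    ci[j] += aik * bk[j];
                }
            }
        }
        return c;
    }

    // An n-by-n multiply performs 2*n^3 floating-point operations.
    static double mflops(int n, double seconds) {
        return 2.0 * n * n * n / (seconds * 1.0e6);
    }

    public static void main(String[] args) {
        int n = 200;
        double[][] a = new double[n][n];
        double[][] b = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = i + j;
                b[i][j] = i - j;
            }
        }
        long start = System.nanoTime();
        multiplyIJK(a, b);
        double seconds = (System.nanoTime() - start) / 1.0e9;
        System.out.printf("(i,j,k), n=%d: %.1f Mflops%n", n, mflops(n, seconds));
    }
}
```

A single timed call like the one in `main` includes JIT warm-up, so real measurements would repeat the kernel several times before recording a rate.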
unstructured pattern
coordinate storage (CSR/CSC)
array bounds check cannot be optimized away
Sparse matrix/vector multiplication (Mflops)
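To make the bounds-check point concrete, here is a minimal sketch of a sparse matrix/vector product using CSR, one of the two formats named above. The class name and array names (`val`, `col`, `rowPtr`) are illustrative assumptions, not taken from the benchmark source. The access `x[col[i]]` uses an index that is only known at run time, which is why the JVM generally cannot prove it in range and drop the check.

```java
// Hypothetical illustration of CSR sparse matrix/vector multiplication.
final class CsrMatVec {

    // Computes y = A*x, where A is stored in compressed sparse row form:
    //   val[i]    nonzero values, row by row
    //   col[i]    column index of val[i]
    //   rowPtr[r] start of row r in val/col; rowPtr[rows] = number of nonzeros
    static double[] multiply(double[] val, int[] col, int[] rowPtr, double[] x) {
        int rows = rowPtr.length - 1;
        double[] y = new double[rows];
        for (int r = 0; r < rows; r++) {
            double sum = 0.0;
            for (int i = rowPtr[r]; i < rowPtr[r + 1]; i++) {
                // Indirect index: col[i] is data, not a loop counter,
                // so the bounds check on x[...] stays in the inner loop.
                sum += val[i] * x[col[i]];
            }
            y[r] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        // A = [10  0  0]
        //     [ 0 20 30]
        //     [40  0 50]
        double[] val = {10, 20, 30, 40, 50};
        int[] col = {0, 1, 2, 0, 2};
        int[] rowPtr = {0, 1, 3, 5};
        double[] x = {1, 2, 3};
        System.out.println(java.util.Arrays.toString(multiply(val, col, rowPtr, x)));
        // prints [10.0, 130.0, 190.0]
    }
}
```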
JVMs have improved over time
[Figure: y-axis 0 to 35; x-axis JVM versions 1.1.6, 1.1.8, 1.2.1, 1.3]
[Figure: C vs. Java on the FFT, SOR, MC, Sparse, and LU kernels; y-axis 0 to 90]
* Sun JDK 1.3 (HotSpot), javac -O; Sun cc -O; SunOS 5.7
SciMark (large): Java vs. C
(Sun UltraSPARC 60)
[Figure: C vs. Java on the FFT, SOR, MC, Sparse, and LU kernels; y-axis 0 to 70]
* Sun JDK 1.3 (HotSpot), javac -O; Sun cc -O; SunOS 5.7
SciMark: Java vs. C
(Intel PIII 500MHz, Win98)
[Figure: C vs. Java on the FFT, SOR, MC, Sparse, and LU kernels; y-axis 0 to 120]
* Sun JDK 1.2, javac -O; Microsoft VC++ 5.0, cl -O; Win98
SciMark (large): Java vs. C
(Intel PIII 500MHz, Win98)
[Figure: C vs. Java on the FFT, SOR, MC, Sparse, and LU kernels; y-axis 0 to 60]
* Sun JDK 1.2, javac -O; Microsoft VC++ 5.0, cl -O; Win98
SciMark: Java vs. C
(Intel PIII 500MHz, Linux)
[Figure: C vs. Java on the FFT, SOR, MC, Sparse, and LU kernels; y-axis 0 to 160]
* RH Linux 6.2, gcc (v. 2.91.66) -O6, IBM JDK 1.3, javac -O
SciMark results
500 MHz PIII (Mflops)
[Figure: Mflops; y-axis 10 to 70]
* 500MHz PIII, Microsoft C/C++ 5.0 (cl -O2x -G6), Sun JDK 1.2, Microsoft JDK 1.1.4, IBM JRE 1.1.8
C vs. Java
Why C is faster than Java
direct mapping to hardware
more opportunities for aggressive optimization
no garbage collection
Why Java is faster than C (?)
different compilers/optimizations
performance is more a factor of economics than technology
PC compilers aren't tuned for numerics
Current JVMs are quite good...
[Figure: Mflops; y-axis 50 to 200]
* IBM RS/6000 67MHz POWER2 (266 Mflops peak), AIX Fortran, HPJC
Yet another approach...
HotSpot
Sun Microsystems
Progressive profiler/compiler
trades off aggressive compilation/optimization at code bottlenecks
quicker start-up time than JITs
tailors optimization to application
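One way to observe the profile-then-compile behaviour described above is to time the same kernel over several rounds. The sketch below is a hypothetical micro-benchmark (class name, kernel, and sizes are all assumptions). On a profiling JVM the later rounds often run faster than the first, once the hot loop has been compiled, but the actual numbers depend on the JVM and machine, so no expected output is given.

```java
// Hypothetical micro-benchmark; class and method names are illustrative only.
final class WarmupTiming {

    // Small floating-point kernel that becomes a hot spot when called repeatedly.
    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Times `rounds` batches of `callsPerRound` calls; returns nanoseconds per batch.
    static long[] timeRounds(int n, int rounds, int callsPerRound) {
        double[] x = new double[n];
        double[] y = new double[n];
        java.util.Arrays.fill(x, 1.0);
        java.util.Arrays.fill(y, 2.0);
        long[] nanos = new long[rounds];
        double sink = 0.0; // accumulate results so the calls cannot be discarded
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerRound; c++) {
                sink += dot(x, y);
            }
            nanos[r] = System.nanoTime() - start;
        }
        if (sink < 0.0) {
            throw new IllegalStateException("unreachable for these inputs");
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10_000, 10, 1_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.printf("round %d: %.2f ms%n", r, nanos[r] / 1.0e6);
        }
    }
}
```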
Concurrency
Java threads
runs on multiprocessors in NT, Solaris, AIX
provides mechanisms for locks, synchronization
can be implemented in native threads for performance
no native support for parallel loops, etc.
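Because there is no built-in parallel loop, the iteration space has to be split by hand. The following is a minimal sketch under stated assumptions: the class name, the block partitioning, and the `synchronized` accumulator are illustrative choices, and the lambda syntax assumes a JDK much newer than the ones benchmarked in these slides.

```java
// Hypothetical illustration of a hand-partitioned parallel loop with Java threads.
final class ParallelSum {

    private double total = 0.0;

    // The lock on this object serializes updates from the worker threads.
    private synchronized void add(double partial) {
        total += partial;
    }

    static double sum(double[] data, int numThreads) {
        ParallelSum acc = new ParallelSum();
        Thread[] workers = new Thread[numThreads];
        int chunk = (data.length + numThreads - 1) / numThreads; // ceiling division
        for (int t = 0; t < numThreads; t++) {
            final int lo = Math.min(data.length, t * chunk);
            final int hi = Math.min(data.length, lo + chunk);
            workers[t] = new Thread(() -> {
                double s = 0.0; // thread-local partial sum, no locking needed
                for (int i = lo; i < hi; i++) {
                    s += data[i];
                }
                acc.add(s); // one synchronized update per thread
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // wait for every block to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return acc.total; // join() makes the workers' updates visible here
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data, 4)); // prints 500500.0
    }
}
```

Each thread takes the lock once, after its local loop, so contention stays low; locking inside the inner loop would serialize the work.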
Concurrency
Remote Method Invocation (RMI)
extension of RPC
higher-level than sockets/network programming
works well for functional parallelism
works poorly for data parallelism
serialization is expensive
no parallel/distribution tools
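RMI marshals arguments and return values with Java object serialization, so the cost of shipping array data can be estimated without any network at all. The sketch below makes no RMI calls; it only round-trips a `double[]` through `ObjectOutputStream` and `ObjectInputStream`. The class and method names are hypothetical, and timings will vary by JVM, so none are quoted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical illustration of serialization overhead for numeric data.
final class SerializationCost {

    // Writes the object to an in-memory byte stream, as a remote call would
    // before sending its arguments.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object on the "receiving" side.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        double[] data = new double[100_000];
        long start = System.nanoTime();
        byte[] bytes = serialize(data);
        double[] copy = (double[]) deserialize(bytes);
        double millis = (System.nanoTime() - start) / 1.0e6;
        System.out.printf("%d doubles -> %d bytes, round trip %.2f ms%n",
                copy.length, bytes.length, millis);
    }
}
```

In a data-parallel code this copy happens on every exchange of array data, which is one reason the slide rates RMI poorly for data parallelism.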
Numerical Software
(Libraries)
Scientific Java Libraries
Matrix library (JAMA)
NIST/Mathworks
LU, QR, SVD, eigenvalue solvers
Java Numerical Toolkit (JNT)
special functions
BLAS subset
Visual Numerics
LINPACK
Complex
IBM
Array class package
Univ. of Maryland
Linear Algebra library
JLAPACK
port of LAPACK
Java Numerics Group
industry-wide consortium to establish tools, APIs, and libraries
IBM, Intel, Compaq/Digital, Sun, MathWorks, VNI, NAG
NIST, Inria
Berkeley, UCSB, Austin, MIT, Indiana