A Hardware Implementation of The Java Virtual Machine
Marc Tremblay and Michael O'Connor
Sun Microelectronics
Slide 1
Good performance
Time-to-market
Low power consumption
Slide 4
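[Figure: Java platform software stack. Labels: HotJava and Applets on top of the Virtual Machine and the Host Porting Interface; below that, adaptors over Browser / OS / Hardware Architecture, over OS / Hardware Architecture, and over JavaOS / Hardware Architecture; picoJava shown alongside.]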
Slide 5
picoJava
Directly executes bytecodes
Excellent performance
Eliminates the need for an interpreter or a JIT compiler
Small memory footprint
Simple core
Legacy blocks and circuits are not present
Slide 6
Slide 7
Slide 8
Slide 9
Instruction Length
[Figure: stacked bar chart, 0% to 100%, of instruction length (1 byte, 2 bytes, 3 bytes, others) for Hot Java, Pento., Dhrys., Ray, Compr., Tomcat, and Javac.]
Slide 10
Slide 11
Slide 12
[Figure: layout of a variable-length instruction, drawn in rows of four bytes (columns labeled byte 2, byte 3, byte 4), top to bottom:]
0..3 byte padding
default offset
number of pairs that follow (N)
match 1
jump offset 1
match 2
jump offset 2
...
match N
jump offset N
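The layout above matches the lookupswitch instruction of the JVM specification: the opcode is followed by 0 to 3 padding bytes so that the operands start on a 4-byte boundary, and every field is a signed 32-bit big-endian value. A minimal sketch of decoding it is shown here, assuming the method's code starts at offset 0 of the byte string; the function name decode_lookupswitch is hypothetical and only for illustration.

```python
import struct

LOOKUPSWITCH = 0xAB  # opcode value from the JVM specification


def decode_lookupswitch(code, pc):
    """Decode the lookupswitch at code[pc].

    Returns (default_offset, [(match, jump_offset), ...], instruction_length).
    Hypothetical helper; assumes code[0] is the first byte of the method.
    """
    assert code[pc] == LOOKUPSWITCH
    # Skip the opcode, then 0..3 padding bytes up to a 4-byte boundary.
    operands = (pc + 4) & ~3
    default, npairs = struct.unpack_from(">ii", code, operands)
    pairs = []
    for i in range(npairs):
        match, offset = struct.unpack_from(">ii", code, operands + 8 + 8 * i)
        pairs.append((match, offset))
    length = operands + 8 + 8 * npairs - pc
    return default, pairs, length


# Example: opcode at pc 0, 3 padding bytes, default offset 20, N = 2 pairs.
example = (bytes([LOOKUPSWITCH, 0, 0, 0])
           + struct.pack(">ii", 20, 2)
           + struct.pack(">iiii", 1, 28, 5, 36))
default, pairs, length = decode_lookupswitch(example, 0)
```

The instruction length depends both on N and on where the opcode falls relative to the 4-byte boundary, which is why such instructions cannot be sized from the opcode alone.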
Slide 13
Interpreter Loop
loop: 1: fetch bytecodes
      2: indirect jump to emulation code
Emulation Code
1: get operands
2: perform operation
3: increment PC
4: go to loop
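The loop and emulation steps above can be sketched in software as follows. This is an illustrative model, not the code of any particular interpreter: the function name interpret and the handler table are assumptions, only three opcodes are handled (iconst_1 = 0x04, iadd = 0x60, return = 0xb1, values taken from the JVM specification), and 32-bit overflow is ignored.

```python
def interpret(code, stack):
    """Toy interpreter loop: fetch, dispatch, emulate, advance."""

    def iconst_1(pc):
        stack.append(1)                # perform operation (push constant 1)
        return pc + 1                  # increment PC

    def iadd(pc):
        value2 = stack.pop()           # 1: get operands
        value1 = stack.pop()
        stack.append(value1 + value2)  # 2: perform operation
        return pc + 1                  # 3: increment PC

    handlers = {0x04: iconst_1, 0x60: iadd}  # opcode -> emulation code
    RETURN = 0xB1
    pc = 0
    while code[pc] != RETURN:
        opcode = code[pc]              # loop 1: fetch bytecode
        pc = handlers[opcode](pc)      # loop 2: indirect jump to emulation code
    return stack                       # (falling back into the while is "go to loop")


result = interpret(bytes([0x04, 0x04, 0x60, 0xB1]), [])
```

Every bytecode pays for the fetch and the table lookup in addition to the work of the operation itself, which is the overhead that executing bytecodes directly in hardware removes.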
Slide 14
1: load tos
2: load tos-1
3: add
4: store tos-1
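The four steps above read as the work needed to emulate a single stack add when the operand stack lives in memory. A small sketch of that sequence follows, assuming the stack is a list that grows upward and tos is the index of its top entry; the name emulated_iadd is hypothetical.

```python
def emulated_iadd(memory, tos):
    """Emulate one stack add in four steps; return the new top-of-stack index."""
    a = memory[tos]            # 1: load tos
    b = memory[tos - 1]        # 2: load tos-1
    result = a + b             # 3: add
    memory[tos - 1] = result   # 4: store tos-1
    return tos - 1             # the stack shrinks by one entry


memory = [0, 3, 4]
new_tos = emulated_iadd(memory, 2)
```

Three of the four steps touch memory, so an implementation that keeps the top of the stack in registers has much less to do per add.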
Slide 15
Slide 16
[Figure: dynamic instruction mix by category (calls/ret, compute, st object, ld object, stack ops) for Hot Java, Pento, Dhryst., Ray, Compr, and Javac.]
Slide 17
calls/ret: 7%
stack ops: 43%
ld object: 17%
Slide 18
compute: 36%
st object: 6%
ld object: 21%
Pipeline Design
RISC pipeline attributes
Stages based on fundamental paths (e.g., cache access, ALU path, register access)
No operations on cache/memory data
Hardwire all simple operations
Slide 20
Implementation of Critical Instructions
getfield_quick offset
Fetch field from object
Executes as a load [object + offset] on picoJava
Stack before: ..., objectref
Stack after: ..., value
iadd
Fully pipelined
Executes in a single cycle
Stack before: ..., value1, value2
Stack after: ..., result
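The stack effects of the two instructions can be modeled in a few lines. This sketch makes several assumptions that are not in the slides: the heap is a flat list of words, an objectref is an index into it, and the function names simply reuse the bytecode mnemonics. The iadd model wraps to 32 bits, as the JVM specification requires for int addition.

```python
def getfield_quick(stack, heap, offset):
    """..., objectref -> ..., value (one load from [object + offset])."""
    objectref = stack.pop()
    stack.append(heap[objectref + offset])


def iadd(stack):
    """..., value1, value2 -> ..., result (32-bit wrapping add)."""
    value2 = stack.pop()
    value1 = stack.pop()
    result = (value1 + value2) & 0xFFFFFFFF
    if result >= 0x80000000:   # reinterpret as a signed 32-bit int
        result -= 1 << 32
    stack.append(result)


heap = [0] * 8
heap[5] = 99                   # field at offset 2 of the object at index 3
stack = [3]                    # objectref on top of the stack
getfield_quick(stack, heap, 2)
stack.append(1)
iadd(stack)
```

After these two calls the stack holds a single entry, the field value plus one.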
Slide 21
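Interpreter: 95%
Run Time: 5%
Speeding up the Interpreter by 30X results in:
Interpreter: 95 -> 3.2
Run Time: 5 -> 5
Total: 100 -> 8.2
Representative Applications
Lots of Objects
Threaded Code
Interpreter: 60 - 80%
Run Time (Synchronization, Garbage Collection, Object Creation): 40 - 20%
Speeding up the Interpreter by 30X results in:
Interpreter: 60 -> 2
Run Time: 40 -> 40
Total: 100 -> 42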
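The two results above follow from scaling only the interpreter's share of execution time while the run-time share stays fixed (Amdahl's law). A short sketch of the arithmetic, with the hypothetical helper name remaining_time:

```python
def remaining_time(interpreter_pct, runtime_pct, interpreter_speedup):
    """Execution time left (as a percentage of the original) after speeding
    up only the interpreter portion; the run-time portion is unchanged."""
    return interpreter_pct / interpreter_speedup + runtime_pct


simple_case = remaining_time(95, 5, 30)           # about 8.2
representative_case = remaining_time(60, 40, 30)  # 42.0

# Overall speedup is the original 100% divided by what remains.
simple_speedup = 100 / simple_case
representative_speedup = 100 / representative_case
```

With a 40% run-time share, a 30X faster interpreter yields only about a 2.4X overall speedup, compared with roughly 12X when the run time is 5%. This is the argument for accelerating the run time as well as bytecode execution.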
[Figure: bar chart; y-axis: Percentage of Calls, 0 to 12.]
picoJava:
A System Performance Approach
Accelerates object-oriented programs
Simple pipeline with enhancements for features specific to bytecodes
Support for method invocation
Accelerates runtime
(gc.c, monitor.c, threadruntime.c, etc.)
Support for threads
Support for garbage collection
Slide 25
System Programming
Instructions added to support system programming
Slide 26
picoJava - Summary
Best system price/performance for running Java-powered applications in embedded markets
Embedded market very sensitive to system cost and power consumption
Interpreter and/or JIT compiler eliminated
Excellent system performance
Efficient implementation through use of the same methodology, process and circuit techniques developed for RISC processors
Slide 27