LLVA: A Low-Level Virtual Instruction Set Architecture

1) The document proposes a new virtual instruction set architecture (LLVA) that aims to provide more flexibility than traditional hardware ISAs. 2) LLVA is designed to capture rich program information for compilers while still being implementable in hardware. It uses a typed instruction set, infinite virtual registers, and explicit control and dataflow information. 3) The goal of LLVA is to loosen the restrictions of hardware ISAs being used as both the software interface and hardware interface, in order to allow more flexible processor and compiler optimizations. It aims to provide benefits not seen in other virtual architectures that used existing ISAs.

Uploaded by

ناصرھرە

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

94 views12 pages

LLVA: A Low-Level Virtual Instruction Set Architecture

Uploaded by

ناصرھرە

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

LLVA: A Low-level Virtual Instruction Set Architecture

Vikram Adve Chris Lattner Michael Brukman Anand Shukla Brian Gaeke
Computer Science Department
University of Illinois at Urbana-Champaign
{vadve,lattner,brukman,ashukla,gaeke}@cs.uiuc.edu

Abstract Along with design alternatives for the microarchitecture

and system architecture, we believe it is also important to
A virtual instruction set architecture (V-ISA) imple- rethink the instruction set architecture — software’s sole
mented via a processor-specific software translation interface to the processor. Traditional processor instruction
layer can provide great flexibility to processor design- sets are a strait-jacket for both hardware and software. They
ers. Recent examples such as Crusoe and DAISY, how- provide little useful information about program behavior
ever, have used existing hardware instruction sets as to the execution engine of a processor, they make it diffi-
virtual ISAs, which complicates translation and optimiza- cult for hardware designers to develop innovative software-
tion. In fact, there has been little research on specific de- controlled mechanisms or modify instruction sets, and they
signs for a virtual ISA for processors. This paper proposes a greatly constrain compiler optimizations to those that can
novel virtual ISA (LLVA) and a translation strategy for im- be expressed by an instruction set that must also serve as
plementing it on arbitrary hardware. The instruction set an external program representation. The fundamental prob-
is typed, uses an infinite virtual register set in Static Sin- lem is that the same instruction set (the hardware ISA) is
gle Assignment form, and provides explicit control-flow used for two very different purposes: as the persistent rep-
and dataflow information, and yet uses low-level opera- resentation of software, and as the interface by which prim-
tions closely matched to traditional hardware. It includes itive hardware operations are specified and sequenced.
novel mechanisms to allow more flexible optimization of na- 1.1. Virtual Instruction Set Computers
tive code, including a flexible exception model and minor
constraints on self-modifying code. We propose a trans- As a step towards loosening these restrictions, several
lation strategy that enables offline translation and trans- research and commercial groups have advocated a class
parent offline caching of native code and profile infor- of architectures we term Virtual Instruction Set Comput-
mation, while remaining completely OS-independent. It ers (VISC) [9, 14, 23, 32]. Such an architecture defines a
also supports optimizations directly on the representa- virtual instruction set (called the V-ISA in Smith et al.’s ter-
tion at install-time, runtime, and offline between exe- minology [32]) that is used by all user and operating sys-
cutions. We show experimentally that the virtual ISA is tem software, as illustrated in Fig. 1. An implementation
compact, it is closely matched to ordinary hardware in- of the architecture includes both (a) a hardware processor
struction sets, and permits very fast code generation, yet with its own instruction set (the implementation ISA or I-
has enough high-level information to permit sophisti- ISA), and (b) an implementation-specific software transla-
cated program analyses and optimizations. tion layer that translates virtual object code to the I-ISA.
Because the translation layer and the hardware processor
are designed together, Smith et al. refer to this implemen-
tation strategy as a codesigned virtual machine [32]. Fisher
1. Introduction has described a closely related vision for building families
In recent years, traditional superscalar processors and of processors customized for specific application areas that
compilers have had to resort to increasingly complex de- maintain compatibility and performance via software trans-
signs for small improvements in performance. This has lation [15].
spurred a wide range of research efforts exploring novel At the most basic level, a VISC architecture decouples
microarchitecture and system design strategies in search of the program representation (V-ISA) from the actual hard-
cost-effective long-term solutions [5, 21, 29, 31, 33]. While ware interface (I-ISA), allowing the former to focus on cap-
the outcome is far from clear, what seems clear is that an ex- turing program behavior while the latter focuses on soft-
tensive rethinking of processor design has begun. ware control of hardware mechanisms. This brings two fun-
1.2. Our Contribution: Design for A Virtual In-
Application Software struction Set
Operating System
Although virtual architectures have been discussed for a
Kernel Virtual long time and real implementations exist (viz., IBM S/38
ISA (V−ISA)
and AS/400, DAISY, and Transmeta’s Crusoe), there has
Processor−specific
translator been little research exploring design options for the V-ISA.
Implementation Both DAISY and Crusoe used traditional hardware ISAs as
Hardware Processor ISA (I−ISA) their V-ISA. The IBM machines do use a specially devel-
oped V-ISA, but, as we explain in Section 6, it is also ex-
tremely complex, OS-dependent, requires complex OS ser-
Figure 1. System organization for a virtual ar-
vices for translation, and is designed more for a particular
chitecture. This is similar to Fig. 1(c) in [32]
application domain than for general-purpose software.
and Fig. 5 in [24].
We believe a careful design for the V-ISA driven by the
needs of compiler technology, yet “universal” enough to
support arbitrary user and OS software, is crucial to achieve
damental benefits to the hardware processor design and its the full benefits of the virtual architecture strategy. A com-
software translation layer: mon question is whether Java bytecode (as suggested by
Smith et al. [32]) or Microsoft’s Common Language In-
frastructure (CLI) could be used as a V-ISA for processors.
1. The virtual instruction set can include rich pro-
Since a processor V-ISA must support all external user soft-
gram information not suitable for a direct hard-
ware and arbitrary operating systems, we believe the an-
ware implementation, and can be independent of most
swer is “no”. These representations are designed for a cer-
implementation-specific design choices.
tain class of languages, and are not sufficiently language-
independent for a processor interface. They include com-
2. The I-ISA and its translator provide a truly co-
plex runtime software requirements, e.g., garbage collec-
operative hardware/software design: the transla-
tion and extensive runtime libraries, which are difficult to
tor can provide information to hardware through
implement without operating system support. Finally, they
implementation-specific mechanisms and instruc-
are generally not well-suited for low-level code such as op-
tion encodings, while the hardware can expose novel
erating system trap handlers, debuggers, and performance
microarchitectural mechanisms to allow coopera-
monitoring tools.
tive hardware/software control and also to assist the
This paper proposes a design for a Virtual Instruction Set
translator.
Architecture, and an accompanying compilation strategy for
arbitrary hardware. More specifically, this work makes three
These two fundamental benefits could be exploited in po- contributions:
tentially unlimited ways by processor designers. Prior
work [14, 32, 11] has discussed many potential hardware • It proposes a V-ISA design that is rich enough to sup-
design options enabled by this approach, which are im- port sophisticated compiler analyses and transforma-
practical with conventional architectures. Furthermore, tions, yet low-level enough to be closely matched to
I-ISA instruction encodings and software-controlled mech- native hardware instruction sets and to support all ex-
anisms can both be changed relatively easily with each pro- ternal code, including OS and kernel code.
cessor design, something that is quite difficult to do for
current processors. Finally, external compilers can fo- • It carefully defines the behavior of exceptions and self-
cus on machine-independent optimizations while the modifying code to minimize the difficulties faced by
translator serves as a common back-end compiler cus- previous translators for DAISY [14] and Crusoe [11].
tomized for the processor implementation. • It describes a translation strategy that allows offline
The cost of this increased flexibility is the possible over- translation and offline caching of native code and pro-
head of software translation (if it must be done “online”). file information, using an OS-independent interface to
Nevertheless, recent advances in dynamic compilation, pro- access external system resources. The translation strat-
gram optimization, and hardware speed can mitigate the egy also leverages aggressive optimization at install-
performance penalty, and could make this idea more vi- time, runtime, and offline (“idle-time”), tailored to the
able today than it has been in the past. Furthermore, hard- particular hardware and to profiling information from
ware mechanisms can be used to assist these tasks in many actual users (these capabilities follow directly from the
ways [14, 11, 19, 35, 27, 30]. V-ISA [26]).
The virtual instruction set we propose uses simple RISC- Supporting arbitrary operating systems and system designs
like operations, but is fully typed using a simple language- raises significant potential challenges for supporting offline
independent type system, and includes explicit control flow translation and caching, as described in Section 4.1.
and dataflow information in the form of a Static Single As- Because the primary consumer of a V-ISA is the software
signment (SSA) representation. Equally important is what translation layer, the design of a V-ISA must be driven by
the V-ISA does not include: a fixed register set, stack frame an understanding of compiler technology. Most non-trivial
layout, low-level addressing modes, limits on immediate optimization and code generation tasks rely on information
constants, delay slots, speculation, predication, or explicit about global control-flow, dataflow, and data dependence
interlocks. All of these are better suited to the I-ISA than the properties of a program. Such properties can be extremely
V-ISA. Nevertheless, the V-ISA is low-level enough to per- difficult to extract from native machine code.
mit extensive machine-independent optimization in source- The challenge is to design a V-ISA that provides such
level and link-time compilers (unlike Java bytecode, for ex- high-level information about program behavior, yet is ap-
ample), reducing the amount of optimization required dur- propriate as an architecture interface for all external soft-
ing translation from V-ISA to I-ISA. ware, including applications, libraries, and operating sys-
The benefits of a V-ISA design can only be determined tems. We propose a set of design goals for such a V-ISA:
after developing new processor design options and the soft-
ware/hardware that exploit its potential. Instead, our goal in 1. Simple, low-level operations that can be implemented
this work is to evaluate the design in terms of its suitabil- without a runtime system: To serve as a processor-
ity as a V-ISA. We have implemented the key components level instruction set for arbitrary software and enable
of the compilation strategy for SPARC V9 and Intel IA-32 implementation without operating system support, the
hardware processors, including an aggressive link-time in- V-ISA must use simple, low-level operations that can
terprocedural optimization framework (which operates on each be mapped directly to a small number of hard-
the V-ISA directly), native code generators that can be run ware operations.
in either offline or JIT mode, and a software trace cache to 2. No execution-oriented features that obscure program
support trace-based runtime optimizations for the SPARC behavior: The V-ISA should exclude ISA features that
V9. With these components, we address two questions: make program analysis difficult and which can instead
• Qualitatively, is the instruction set rich enough to en- be managed by the translator, such as limited num-
able both machine-independent and dependent opti- bers and types of registers, a specific stack frame lay-
mizations during code generation? out, low-level calling conventions, limited immediate
fields, or low-level addressing modes.
• Experimentally, is the instruction set low-level enough
to map closely to a native hardware instruction set, and 3. Portability across some family of processor designs:
to enable fast translation to native code? It is impractical to design a “universal” V-ISA for
all conceivable hardware processor designs. Instead,
2. Design Goals for A Virtual ISA a good V-ISA design must enable some broad class
Figure 1 shows an overview of a system based on a of processor implementations and maintain compati-
VISC processor. A VISC architecture defines an external bility at the level of virtual object code for all proces-
instruction set (V-ISA) and a binary interface specification sors in that class (key challenges include endianness
(V-ABI). An implementation of the architecture includes a and pointer size).
hardware processor plus a translator, collectively referred 4. High-level information to support sophisticated pro-
to as the “processor.” We use “external software” to mean gram analysis and transformations: Such high-level
all software except the translator. The translator is essen- information is important not only for optimizations
tially a compiler which translates “virtual object code” in but also for good machine code generation, e.g., effec-
the V-ISA to native object code in the I-ISA. tive instruction scheduling and register allocation. Fur-
In our proposed system architecture, all external soft- thermore, improved program analysis can enable more
ware may only use the V-ISA. This rigid constraint on ex- powerful cooperative software/hardware mechanisms
ternal software is important for two reasons: as described above.
1. To ensure that the hardware processor can evolve (i.e., 5. Language independence: Despite including high-level
both the I-ISA and its implementation details visible to information (especially type information), it is essen-
the translator can be changed), without requiring any tial that the V-ISA should be completely language-
external software to be recompiled. independent, i.e., the types should be low-level and
2. To ensure that arbitrary operating systems that con- general enough to implement high-level language op-
form to the V-ABI can run on the processor. erations correctly and reasonably naturally.
6. Operating system support: The V-ISA must fully sup- LLVAcode (unlike JVM bytecode, where operations like ar-
port arbitrary operating systems that implement the ray bounds checks, virtual function call resolution, and in-
virtual Application Binary Interface (V-ABI) associ- lining are difficult to eliminate statically).
ated with the V-ISA (discussed in Section 3.1). There- LLVA provides low-level operations that can be used to
fore, it must provide all the necessary low-level mech- implement high-level language features, but in a machine-
anisms such as traps, memory management, and low- independent manner. For example, array and structure in-
level device I/O. dexing operations are lowered to typed pointer arithmetic
with the getelementptr instruction (explained below).
Note that high-level virtual machines such as JVM and Source-level array bounds checks are turned into explicit
CLI fail to meet goals #1 (they use complex, high-level op- comparisons. Virtual function dispatch in C++ becomes a
erations with large runtime libraries), #5 (e.g., JVM and CLI pair of loads to retrieve the function pointer followed by
are tailored for object-oriented languages with a particular a call (optimizations can eliminate some of these in the
inheritence model), and #6 (their complex runtime systems static compiler, translator, or both) [26].
require significant OS support in practice). In contrast, tra-
ditional machine ISAs fail to meet goals #2 and #4, and sat-
isfy #3 only to a limited extent (e.g., existing programs can- Type Name
not exploit new hardware instructions or larger architected arithmetic add, sub, mul, div, rem
register files). bitwise and, or, xor, shl, shr
comparison seteq, setne, setlt, setgt, setle, setge
3. LLVA: A V-ISA for high performance control-flow ret, br, mbr, invoke, unwind
memory load, store, getelementptr, alloca
3.1. Components of the V-ISA other cast, call, phi
The LLVA Virtual Instruction Set is a source-language- Table 1. Entire LLVA Instruction Set
neutral, low-level, orthogonal, three-address instruction set.
Figure 2 shows an example C function and the correspond-
ing LLVA code. The basic components of the instruction set
Global Data-flow (SSA) & Control Flow Information A
are as follows.
key feature of LLVA that enables efficient dynamic transla-
Register Set and Memory Model LLVA uses an infinite, tion is the use of SSA form as the primary representation
typed, register file where all registers are in Static Single for scalar register values. SSA form is widely used for com-
Assignment (SSA) form [10] (described below). Registers piler optimizations because it allows for efficient “sparse”
can only hold scalar values, viz., boolean, integer, float- algorithms for global dataflow problems and provides ex-
ing point, and pointer. This type information and the SSA plicit def-use chains.
representation together provide the information needed for To represent SSA information directly in the code, LLVA
simple or aggressive register allocation algorithms. To sup- uses an explicit phi instruction to merge values at control-
port an infinite register set, we use a self-extending instruc- flow join points. (for example, the %Ret.1 value in Fig-
tion encoding, but define a fixed-size 32-bit format to hold ure 2(b)). The translator elimiantes the φ-nodes by introduc-
small instructions for compactness and translator efficiency. ing copy operations into predecessor basic blocks. These
Memory is partitioned into stack, heap, and global memory, copies are usually eliminated during register allocation.
and all memory is explicitly allocated. LLVA is a load/s- Exposing an explicit Control flow Graph (CFG) is an-
tore architecture: only load and store instructions ac- other crucial feature of LLVA1 . Each function in LLVA is a
cess data values in memory. list of basic blocks, and each basic block is a list of instruc-
tions ending in a single control flow instruction that explic-
LLVA Instructions LLVA has a small, orthogonal instruc- itly specifies its successor basic blocks. A control flow in-
tion set consisting of only the 28 instructions listed in Ta- struction can be a branch, multi-way branch, function re-
ble 1. The orthogonality makes optimal pattern-matching turn, invoke or unwind. invoke and unwind are used
instruction selectors easier to use. Because almost all in- to implement source-language exceptions via stack unwind-
structions are simple, three-address instructions with regis- ing, in a manner that is explicit, portable, and can be trans-
ter operands (add, mul, seteq, etc), the translation pro- lated into efficient native code [26].
cess is primarily concerned with combining multiple LLVA LLVA Type System The LLVA instruction set is fully-
instructions into more complex I-ISA instructions wherever typed, using a low-level, source-language-independent type
possible. Furthermore, the simple low-level operations al-
low arbitrary machine-independent optimizations to be per- 1 In contrast, extracting a Control Flow Graph from normal machine
formed ahead of time by static compilers when generating code can be quite difficult in practice, due to indirect branches.
%s t r u c t . QuadTree = t y p e { double , [ 4 x %QT∗ ] }
%QT = t y p e % s t r u c t . QuadTree
t y p e d e f s t r u c t QuadTree {
double Data ; v o i d % S u m 3 r d C h i l d r e n (%QT∗ %T , d o u b l e ∗ %Result ) {
s t r u c t QuadTree ∗ C h i l d r e n [ 4 ] ; e n t r y : %V = a l l o c a d o u b l e ; ; %V i s t y p e ’ d o u b l e ∗ ’
} QT ; %tmp . 0 = s e t e q %QT∗ %T , n u l l ; ; t y p e ’ bool ’
br b o o l %tmp . 0 , l a b e l % e n d i f , label % else
v o i d S u m 3 r d C h i l d r e n (QT ∗ T ,
double ∗ R e s u l t ) { else : ; ; tmp . 1 = & T [ 0 ] . C h i l d r e n [ 3 ] ’ C h i l d r e n ’ = Field #1
double Ret ; %tmp . 1 = g e t e l e m e n t p t r %QT∗ %T , l o n g 0 , ubyte 1 , long 3
i f ( T = = 0 ) { Ret = 0 ; %C h i l d 3 = l o a d %QT∗∗ %tmp . 1
} else { c a l l v o i d % S u m 3 r d C h i l d r e n (%QT∗ % C h i l d 3 , d o u b l e ∗ %V)
QT ∗ C h i l d 3 = %tmp . 2 = l o a d d o u b l e ∗ %V
T [ 0 ] . Children [ 3 ] ; %tmp . 3 = g e t e l e m e n t p t r %QT∗ %T , l o n g 0 , ubyte 0
d o u b l e V; %tmp . 4 = l o a d d o u b l e ∗ %tmp . 3
S u m 3 r d C h i l d r e n ( C h i l d 3 , &V ) ; %R e t . 0 = add d o u b l e %tmp . 2 , % tmp . 4
Ret = V + T [ 0 ] . Data ; br l a b e l % e n d i f
}
∗ R e s u l t = Ret ; endif : %R e t . 1 = p h i d o u b l e [ % R e t . 0 , % e l s e ] , [ 0 . 0 , % e n t r y ]
} s t o r e double % Ret . 1 , double∗ % R e s u l t
(a) Example function r e t v o i d ; ; R e t u r n w i t h no v a l u e
}
(b) Corresponding LLVA code

Figure 2. C and LLVA code for a function

system. The type system is very simple, consisting of prim- pointer would be 20 bytes and 32 bytes respectively.
itive types with predefined sizes (ubyte, uint, float, dou-
ble, etc...) and 4 derived types (pointer, array, structure, and 3.2. Representation Portability
function). We chose this small set of derived types for two As noted in Section 2, a key design goal for a V-ISA is
reasons. First, we believe that most high-level language data to maintain object code portability across a family of pro-
types are eventually represented using some combination of cessor implementations. LLVA is broadly aimed to support
these low-level types, e.g., a C++ class with base classes and general-purpose uniprocessors (Section 3.6 discusses some
virtual functions is usually represented as a nested structure possible extensions). Therefore, it is designed to abstract
type with data fields and a pointer to an constant array of away implementation details in such processors, including
function pointers. Second, standard language-independent the number and types of registers, pointer size, endianness,
optimizations use only some subset of these types (if any), stack frame layout, and machine-level calling conventions.
including optimizations that require array dependence anal- The stack frame layout is abstracted by using an explicit
ysis, pointer analysis (even field-sensitive algorithms [16]), alloca instruction to allocate stack space and return a
and call graph construction. (typed) pointer to it, making all stack operations explicit.
All instructions in the V-ISA have strict type rules, and As an example, the V variable in Figure 2(a) is allocated
most are overloaded by type (e.g. ‘add int %X, %Y’ on the stack (instead of in a virtual register) because its ad-
vs. ‘add float %A, %B’). There are no mixed-type op- dress is taken for passing to Sum3rdChildren. In prac-
erations and hence, no implicit type coercion. An explicit tice, the translator preallocates all fixed-size alloca objects
cast instruction is the sole mechanism to convert a regis- in the function’s stack frame at compile time.
ter value from one type to another (e.g. integer to floating The call instruction provides a simple abstract calling
point or integer to pointer). convention, through the use of virtual register or constant
The most important purpose of the type system, however, operands. The actual parameter passing and stack adjust-
is to enable typed memory access. LLVA achieves this via ment operations are hidden by this abstract, but low-level,
type-safe pointer arithmetic using the getelementptr instruction.
instruction. This enables pointer arithmetic to be expressed Pointer size and endianness of a hardware imple-
directly in LLVA without exposing implementation details, mentation are difficult to completely to abstract away.
such as pointer size or endianness. To do this, offsets are Type-safe programs can be compiled to LLVA ob-
specified in terms of abstract type properties (field number ject code will be automatically portable, without expos-
for a structure and element index for an array). ing such I-ISA details. Non-type-safe code, however, (e.g.,
In the example, the %tmp.1 getelementptr instruc- machine-dependent code in C that is conditionally com-
tion calculates the address of T[0].Children[3], by piled for different platforms) requires exposing such details
using the symbolic indexes 0, 1, and 3. The “1” index is of the actual I-ISA configuration. For this reason, LLVA in-
a result of numbering the fields in the structure. On sys- cludes flags for properties that the source-language com-
tems with 32-bit and 64-bit pointers, the offset from the %T piler can expose to the source program (currently, these are
pointer size and endianness). This information is also en- 3.4. Self-modifying and Self-extending Code
coded in the object file so that, using this information, the
We use the term Self-Modifying Code (SMC) for a
translator for a different hardware I-ISA can correctly exe-
program that explicitly modifies its own pre-existing in-
cute the object code (although this emulation would incur
structions. We use the term Self-Extending Code (SEC)
a substantial performance penalty on I-ISAs without hard-
to refer to programs in which new code is added at run-
ware support).
time, but that do not modify any pre-existing code. SEC
3.3. Exception Semantics encompasses several behaviors such as class loading in
Java [17], function synthesis in higher-order languages,
Previous experience with virtual processor architectures, and program-controlled dynamic code generation. SEC is
particularly DAISY and Transmeta, show that there are generally much less problematic for virtual architectures
three especially difficult features to emulate in traditional than SMC. Furthermore, most commonly cited examples of
hardware interfaces: load/store dependences, precise excep- “self-modifying code” (e.g., dynamic code generation for
tions, and self-modifying code. The LLVA V-ISA already very high performance kernels or dynamic code loading in
simplifies detecting load/store dependences in one key way: operating systems and virtual machines) are really exam-
the type, control-flow, and SSA information enable sophisti- ples of SEC rather than SMC. Nevertheless, SMC can be
cated alias analysis algorithms in the translator, as discussed useful for implementing runtime code modifications in cer-
in 5.1. For the other two issues also, we have the opportu- tain kinds of tools such as runtime instrumentation tools or
nity to minimize their impact through good V-ISA design. dynamic optimization systems.
Precise exceptions are important for implementing many LLVA allows arbitrary SEC, and allows a constrained
programming languages correctly (without overly complex form of SMC that exploits the execution model for the
or inefficient code), but maintaining precise exceptions V-ISA. In particular, a program may modify its own (vir-
greatly restricts the ability of compiler optimizations to re- tual) instructions via a set of intrinsic functions, but such a
order code. Static compilers often have knowledge about change only affects future invocations of that function, not
operations that cannot cause exceptions (e.g., a load of a any currently active invocations. This ensures that SMC can
valid global in C), or operations whose exceptions can be be implemented efficiently and easily by the translator, sim-
ignored for a particular language (e.g., integer overflow in ply by marking the function’s generated code invalid, forc-
many languages). ing it to be regenerated the next time the function is invoked.
We use two simple V-ISA rules to retain precise excep-
tions but expose non-excepting operations to the translator: 3.5. Support for Operating Systems
• Each LLVA instruction defines a set of possible excep- LLVA uses two key mechanisms to support operating
tions that can be caused by executing that instruction. systems and user-space applications: intrinsic functions and
Any exception delivered to the program is precise, in a privileged bit. LLVA uses a small set of intrinsic functions
terms of the visible state of an LLVA program. to support operations like manipulating page tables and
• Each LLVA instruction has a boolean attribute named other kernel operations. These intrinsics are implemented
ExceptionsEnabled. Exceptions generated by an by the translator for a particular target. Intrinsics can be de-
instruction are ignored if ExceptionsEnabled is fined to be valid only if the privileged bit is set to true, oth-
false for that instruction; otherwise all exception con- erwise causing a kernel trap. A trap handler is an ordinary
ditions are delivered to the program. Exceptions- LLVA function with two arguments: the trap number and a
Enabled is true by default for load, store and pointer of type void* to pass in additional information to
div instructions. It is false by default for all other op- the handler. Trap handlers can refer to the register state of
erations, notably all arithmetic operations. an LLVM program using a standard, program-independent
register numbering scheme for virtual registers. Other in-
Note also that the ExceptionsEnabledattribute is a trinsic functions can be used to traverse the program stack
static attribute and is provided in addition to other mecha- and scan stack frames in an I-ISA-independent manner, and
nisms provided by the V-ABI to disable exceptions dynam- to register the entry points for trap handlers.
ically at runtime (e.g. for use in trap handlers).
A second attribute for instructions we are considering
3.6. Possible Extensions to the V-ISA
would allow exceptions caused by the instruction to be Thre are two important kinds of functionality that could
delivered without being precise. Static compilers for lan- be added to the V-ISA. First, the architecture certainly re-
guages like C and C++ could flag many untrapped exception quires definition of synchronization operations and a mem-
conditions (e.g., memory faults) in this manner, allowing ory model to support parallel programs (these primitives are
the translator to reorder such operations more freely (even difficult to make universal, and thus may have to be de-
if the hardware only supported precise exceptions). fined with a family of implementations in mind). Second,
profile information to offline storage. It can exploit all the
Application Operating System optimization mechanisms enabled by the V-ISA, described
Software below. Such a processor should obtain all the benefits of a
Kernel Storage
V−ISA
I−ISA code
VISC design without any need for online translation.
Profile info More commonly, however, a processor is designed with
Execution Manager
Optional
translator no assumptions about the OS or available storage. The
components
Code
Profiling
Static/Dynamic lack of such knowledge places constraints on the transla-
Generation Optimization
tor, as can be seen in DAISY’s and Crusoe’s translation
I−ISA schemes [11, 14]. Not only is the entire translator program
Hardware Processor
located in ROM, but the translated code and any associ-
Figure 3. The LLVA execution manager and ated profile information live only in memory and are never
interface to offline storage. cached in persistent storage between executions of a pro-
gram. Consequently, programs are always translated online
after being launched, if the translation does not exist in an
packed operations (also referred to as subword parallelism) in-memory cache.
are valuable to media and signal-processing codes. These We propose a translation strategy for such a situation that
operations must be encoded in the V-ISA because it is diffi- can enable offline translation and caching, if an OS ported
cult for the translator to automatically synthesize them from to LLVA chooses to exploit it. We have developed a transpar-
ordinary sequential code. Finally, we are developing V- ent execution environment called LLEE that embodies this
ISA extensions that provide machine-independent abstrac- strategy, though it is currently implemented at user-level on
tions for chip parallelism. These extensions could be valu- a standard POSIX system, as described below. It is depicted
able as explicit on-chip parallelism becomes more prevalent in Figure 3.
(e.g., [33, 21, 31]), raising potentially serious challenges for The LLEE translation strategy can be summarized as
preserving portability while achieving the highest possible “offline translation when possible, online translation when-
performance across different generations of processors. ever necessary.” A subset (perhaps all) of the translator suf-
ficient for translation and some set of optimizations would
4. Translation Strategy live in ROM or flash memory on the processor chip. It is in-
The goals of our translation strategy are to (a) minimize voked only by LLEE. The V-ABI defines a standard, OS-
the need for online translation, and (b) to exploit the novel independent interface with a set of routines that enables
optimization capabilities enabled by a rich, persistent code LLEE to read, write, and validate data in offline storage.
representation. This paper does not aim to develop new opti- This interface is the sole “gateway” that LLEE could use
mization techniques. We are developing such techniques in to call into the OS. An OS ported to LLVA can choose to
ongoing research, as part of a complete framework for life- implement these routines for higher performance, but they
long code optimization on ordinary processors [26]. Here, are strictly optional and the system will operate correctly in
we focus on the VISC translation strategy and on the impli- their absence.
cations of the optimization capabilities for VISC designs. Briefly, the basic gateway includes routines to create,
We begin by describing the “on-chip” runtime execution delete, and query the size of an offline cache, read or write
engine (LLEE) that manages the translation process. We fo- a vector of N bytes tagged by a unique string name from/to
cus in particular on strategies by which it interacts with the a cache, and check a timestamp on an LLVA program or on
surrounding software system to get access to offline stor- a cached vector. Because these routines are implemented by
age and enable offline translation. We then describe how the OS, and so cannot be linked into the translator, we also
the translation strategy exploits the optimization capabili- define one special LLVA intrinsic routine (recall that an in-
ties enabled by a rich persistent code representation. trinsic is a function implemented by the translator) that the
OS can use at startup to register the address of the gateway
4.1. LLEE: OS-Independent Translation System routine with the translator. This gateway routine can then
We distinguish two scenarios with different primary con- be called directly by the translator to query the addresses of
straints on the translation system. The first is when a pro- other gateway routines, also at startup. This provides a sim-
cessor is designed or optimized for a particular OS (e.g., ple but indefinitely extensible linkage mechanism between
PowerPCs customized for AS/400 systems running IBM’s translator and OS.
OS/400 [9]). For a VISC processor in such a scenario, the LLEE orchestrates the translation process as follows.
translator can live in offline storage as part of the OS, it can When the OS loads and transfers control to an LLVA exe-
be invoked to perform offline translation, and it can use OS- cutable in memory, LLEE is invoked by the processor hard-
specific interfaces directly to read and write translations and ware. If the OS gateway has been implemented, LLEE uses
it to look for a cached translation of the code, checks its retains rich enough information to support extensive opti-
timestamp if it exists, and reads it into memory if the trans- mizations, as demonstrated in 5.1.
lation is not out of date. If successful, LLEE performs re- Install-time optimization is just an application of the
location as necessary on the native code and then transfers translator’s optimization and code generation capabilities to
control to it directly. If any condition fails, LLEE invokes generate carefully tuned code for a particular system con-
the JIT compiler on the entry function. Any new translated figuration. This is a direct benefit of retaining a rich code
code generated by the JIT compiler can be written back to representation until software is installed, while still retain-
the offline cache if the gateway is available. During idle ing the ability to do offline code generation.
times, the OS can notify LLEE to perform offline transla- Unlike other trace-driven runtime optimizers for native
tion of an LLVA program by initiating “execution” as above, binary code, such as Dynamo [4], we have both the rich
but flagging it for translation and not actual execution. V-ISA and a cooperating code generator. Our V-ISA pro-
Our implementation of LLEE is faithful to this descrip- vides us with ability to perform static instrumentation to
tion except: (a) LLEE is a user-level shared library that is assist runtime path profiling, and to use the CFG at run-
loaded when starting a shell. This library overrides ex- time to perform path profiling within frequently executed
ecve() with a new version that recognizes LLVA exe- loop regions while avoiding interpretation. It also lets us de-
cutables and either invokes the JIT on them or executes the velop an aggressive optimization strategy that operates on
cached native translations from the disk, using a user-level traces of LLVA code corresponding to the hot traces of na-
version of our gateway. (b) Both the JIT and offline compil- tive code. We have implemented the tracing strategy and
ers are ordinary programs running on Solaris and Linux, and software trace cache, including the ability to gather cross-
the offline compiler reads and writes disk files directly. (c) procedure traces, [26], and we are now developing runtime
LLVA executables can invoke native libraries not yet com- optimizations that exploit these traces.
piled to LLVA, e.g., the X11 library. The rich information in LLVA also enables “idle-time”
profile-guided optimization (PGO) using the translator’s op-
4.2. Optimization Strategy timization and code generation capabilities. The important
The techniques above make it possible to perform offline advantage is that this step can use profile information gath-
translation for LLVA executables, even with a completely ered from executions on an end-user’s system. This has
OS-independent processor design. There are also important three distinct advantages over static PGO: (a) the profile in-
new optimization opportunities created by the rich V-ISA formation is more likely to reflect end-user behavior than
code representation, that a VISC architecture can exploit, hypothetical profile information generated by developers
but most of which are difficult to for programs compiled di- using predicted input sets; (b) developers often do not use
rectly to native code. These include: profile-guided optimization or do so only in limited ways,
whereas “idle-time” optimization can be completely trans-
1. Compile-time and link-time machine-independent op- parent to users, if combined with low-overhead profiling
timization (outside the translator). techniques; and (c) idle-time optimization can combine pro-
2. Install-time, I-ISA-specific optimization (before trans- file information with detailed information about the user’s
lation). specific system configuration.
3. Runtime, trace-driven machine-specific optimization. 5. Initial Evaluation
4. “Idle-time” (between executions) profile-guided, We believe the performance implications of a Virtual
machine-specific optimization using profile informa- ISA design cannot be evaluated meaningfully without (at
tion reflecting actual end-user behavior. least) a processor design with hardware mechanisms that
support translation and optimization [11]), and (preferably)
As noted earlier, the LLVA representation allow substan- basic cooperative hardware/software mechanisms that ex-
tial optimization to be performed before translation, mini- ploit the design. Since the key contribution of this paper is
mizing optimization that must be performed online. Of this, the design of LLVA, we focus on evaluating the features
optimization at link-time is particularly important because of this design. In particular, we consider the 2 questions
it is the first time that most or all modules of an application listed in the Introduction: does the representation enable
are simultaneously available, without requiring changes to high-level analysis and optimizations, and is the represen-
application Makefiles and without sacrificing the key bene- tation low-level enough to closely match with hardware and
fits of separate compilation. In fact, many commercial com- to be translated efficiently?
pilers today perform interprocedural optimization at link-
time, by exporting their proprietary compiler internal repre- 5.1. Supporting High Level Optimizations
sentation during static compilation [3, 20]. Such compiler- The LLVA code representation presented in this paper is
specific solutions are unnecessary with LLVA because it also used as the internal representation of a sophisticated
compiler framework we call Low Level Virtual Machine to support powerful (language-independent) compiler tasks
(LLVM) [26]. LLVM includes front-ends for C and C++ traditionally performed only in source-level compilers.
based on GCC, code generators for both Intel IA-32 and 5.2. Low-level Nature of the Instruction Set
SPARC V9 (each can be run either offline or as a JIT com-
piling functions on demand), a sophisticated link-time op- Table 2 presents metrics to evaluate the low-level nature
timization system, and a software trace cache. Compared of the LLVA V-ISA. The benchmarks we use include the
with the instruction set in Section 3, the differences in the PtrDist benchmarks [2] and the SPEC CINT2000 bench-
compiler IR are: (a) the compiler extracts type information marks (we omit three SPEC codes because their LLVAob-
for memory allocation operations and converts them into ject code versions fail to link currently). The first two
typed malloc and free instructions (the back-ends trans- columns in the table list the benchmark names and the num-
late these back into the library calls), and (b) the Excep- ber of lines of C source code for each.
tionEnabled bit is hardcoded based on instruction op- Columns 3 and 4 in the table show the fully linked code
code. The compiler system uses equivalent internal and ex- sizes for a statically compiled native executable and for the
ternal representations, avoiding the need for complex trans- LLVA object program. The native code is generated from
lations at each stage of the compilation process. the LLVA object program using our static back end for
SPARC V9. These numbers are comparable because they
The compiler uses the virtual instruction set for a va-
reflect the same LLVA optimizations were applied in both
riety of analyses and optimizations including many classi-
cases. The numbers show that the virtual object code is sig-
cal dataflow and control-flow optimizations, as well as more
nificantly smaller than the native code, roughly 1.3x to 2x
aggressive link-time interprocedural analyses and transfor-
for the larger programs in the table (the smaller programs
mations. The classical optimizations directly exploit the
have even larger ratios)2 . Overall, despite containing ex-
control-flow graph, SSA representation, and several choices
tra type and control flow information and using SSA form,
of pointer analysis. They are usually performed on a per
the virtual code is still quite compact for two reasons. First,
module-basis, before linking the different LLVA object code
most instructions usually fit in a single 32-bit word. Second,
modules, but can be performed at any stage of a program’s
the virtual code does not include verbose machine-specific
lifetime where LLVA code is available.
code for argument passing, register saves and restores, load-
We also perform several novel interprocedural tech- ing large immediate constants, etc.
niques using the LLVA representation, all of which operate The next five columns show the number of LLVA instruc-
at link-time. Data Structure Analysis is an efficient, context- tions, the total number of machine instructions generated by
sensitive pointer analysis, which computes both an accu- the X86 back-end, and the ratio of the latter to the former
rate call graph and points-to information. Most importantly, (also for Sparc). This back-end performs virtually no op-
it is able to identify information about logical data struc- timization and very simple register allocation resulting in
tures (e.g., an entire list, hashtable, or graph), including significant spill code. Nevertheless, each LLVA instruction
disjoint instances of such structures, their lifetimes, their translates into very few I-ISA instructions on average; about
internal static structure, and external references to them. 2-3 for X86 and 3-4 for SPARC V9. Furthermore, all LLVA
Automatic Pool Allocation is a powerful interprocedural instructions are translated directly to native machine code
transformation that uses Data Structure Analysis to parti- – no emulation routines are used at all. These results indi-
tion the heap into separate pools for each data structure in- cate that the LLVA instruction set uses low-level operations
stance [25]. Finally, we have shown that the LLVA repre- that match closely with native hardware instructions.
sentation is rich enough to perform complete, static analy- Finally, the last three columns in the table show the to-
sis of memory safety for a large class of type-safe C pro- tal code generation time taken by the X86 JIT compiler to
grams [24, 13]. This work uses both the techniques above, compile the entire program (regardless of which functions
plus an interprocedural array bounds check removal algo- are actually executed), the total running time of each pro-
rithm [24] and some custom interprocedural dataflow and gram when compiled natively for X86 using gcc -O3, and
control flow analyses [13]. the ratio of the two. As the table shows, the JIT compila-
The interprocedural techniques listed above are tradi- tion times are negligible, except for large codes with short
tionally considered very difficult even on source-level im- running time. Furthermore, this behavior should extend to
perative languages, and are impractical for machine code. In much larger programs as well because the JIT translates
fact, all of these techniques fundamentally require type in- functions on demand, so that unused code is not translated
formation for pointers, arrays, structures and functions in (we show the compilation time for the entire program, since
LLVA plus the Control Flow Graph. The SSA represen- that makes the data easier to understand). Overall, this data
tation significantly improves both the precision and speed
of the analyses and transformations. Overall, these exam- 2 The GCC compiler generates more compact SPARC V8 code, which
ples amply demonstrate that the virtual ISA is rich enough is roughly equal in size to the bytecode [26].
Program #LOC Native LLVM code #LLVM #X86 Ratio #SPARC Ratio Translate Run Ratio
size (KB) size (KB) Inst. Inst. Inst. Time (s) time (s)
ptrdist-anagram 647 21.7 10.7 776 1817 2.34 2550 3.29 0.0078 1.317 0.006
ptrdist-ks 782 24.9 12.1 1059 2732 2.58 4446 4.20 0.0039 1.694 0.002
ptrdist-ft 1803 20.9 10.1 799 1990 2.49 2818 3.53 0.0117 2.797 0.004
ptrdist-yacr2 3982 58.3 36.5 4279 10881 2.54 12252 2.86 0.0429 2.686 0.016
ptrdist-bc 7297 112.0 74.4 7276 19286 2.65 25697 3.53 0.1308 1.307 0.100
179.art 1283 37.8 17.9 2027 5385 2.66 7031 3.47 0.0253 114.723 0.000
183.equake 1513 44.4 23.9 2863 6409 3.14 8275 2.89 0.0273 18.005 0.002
181.mcf 2412 32.0 17.3 2039 4707 2.31 4601 2.26 0.0175 24.516 0.001
256.bzip2 4647 73.5 55.7 5103 11984 2.35 14157 2.77 0.0371 20.896 0.002
164.gzip 8616 94.0 68.6 7594 17500 2.30 20880 2.75 0.0527 19.332 0.003
197.parser 11391 223.0 175.3 17138 41671 2.43 57274 3.34 0.1601 4.718 0.034
188.ammp 13483 265.1 163.2 21961 53529 2.44 67679 3.08 0.1074 58.758 0.002
175.vpr 17729 331.0 184.4 18041 58982 3.27 74696 4.14 0.1425 7.924 0.018
300.twolf 20459 487.7 330.0 45017 104613 2.32 119691 2.66 0.0156 9.680 0.002
186.crafty 20650 555.5 336.4 34080 104093 3.05 110630 3.25 0.4531 15.408 0.029
255.vortex 67223 976.3 719.3 72039 195648 2.72 224488 3.12 0.7773 6.753 0.115
254.gap 71363 1088.1 854.4 111482 246102 2.21 272483 2.44 0.4824 3.729 0.129

Table 2. Metrics demonstrating code size and low-level nature of the V-ISA

shows that it is possible to do a very fast, non-optimizing The IBM AS/400, building on early ideas in the S/38, de-
translation of LLVA code to machine code at very low cost. fined a Machine Interface (MI) that was very high-level, ab-
Any support to translate code offline and/or to cache trans- stract and hardware-independent (e.g., it had no registers or
lated code offline should further reduce the impact of this storage locations). It was the sole interface for all applica-
translation cost. tion software and for much of OS/400. Their design, how-
Overall, both the instruction count ratio and the JIT com- ever, differed from ours in fundamental ways, and hence
pilation times show that the LLVA V-ISA is very closely does not meet the goals we laid out in Section 2. Their MI
matched to hardware instruction sets in terms of the com- was targeted at a particular operating system (the OS/400),
plexity of the operations, while the previous subsection it was designed to be implemented using complex operating
showed that it includes enough high-level information for system and database services and not just a translator, and
sophisticated compiler optimizations. This combination of was designed to best support a particular workload class,
high-level information with low-level operations is the cru- viz., commercial database-driven workloads. It also had a
cial feature that (we believe) makes the LLVA instruction far more complex instruction set than ours (or any CISC
set a good design for a Virtual Instruction Set Architecture. processors), including string manipulation operations, and
“object” manipulation operations for 15 classes of objects
6. Related Work (e.g., programs and files). In contrast, our V-ISA is philo-
sophically closer to modern processor instruction sets in be-
Virtual machines of different kinds have been widely ing a minimal, orthogonal, load/store architecture; it is OS-
used in many software systems, including operating sys- independent and requires no software other than a transla-
tems (OS), language implementations, and OS and hard- tor; and it is designed to support modern static and dynamic
ware emulators. These uses do not define a Virtual ISA optimization techniques for general-purpose software.
at the hardware level, and therefore do not directly benefit
processor design (though they may influence it). The chal- DAISY [14] developed a dynamic translation scheme for
lenges of using two important examples – Java Virtual Ma- emulating multiple existing hardware instruction sets (Pow-
chine and Microsoft CLI – as a processor-level virtual ISA erPC, Intel IA-32, and S/390) on a VLIW processor. They
were discussed in the Introduction. developed a novel translation scheme with global VLIW
scheduling fast enough for online use, and hardware exten-
We know of four previous examples of VISC archi-
sions to assist the translation. Their translator operated on
tectures, as defined in Section 1: the IBM System/38
a page granularity. Both the DAISY and Transmeta transla-
and AS/400 family [9], the DAISY project at IBM Re-
tors are stored entirely in ROM on-chip. Because they fo-
search [14], Smith et al.’s proposal for Codesigned Virtual
cus on existing V-ISAs with existing OS/hardware interface
Machines in the Strata project [32], and Transmeta’s Cru-
specifications, they cannot assume any OS support and thus
soe family of processors [23, 11]. All of these distinguish
cannot cache any translated code or profile information in
the virtual and physical ISAs as a fundamental proces-
off-processor storage, or perform any offline translation.
sor design technique. To our knowledge, however, none
except the IBM S/38 and AS/400 have designed a virtual in- Transmeta’s Crusoe uses a dynamic translation scheme
struction set for use in such architectures. to emulate Intel IA-32 instructions on a VLIW hardware
processor [23]. The hardware includes important support- optimization strategy but without the benefits of a rich V-
ing mechanisms such as shadowed registers and a gated ISA. Many JIT compilers for Java, Self, and other languages
store buffer for speculation and rollback recovery on ex- combine fast initial compilation with adaptive reoptimiza-
ceptions, and alias detection hardware in the load/store tion of “hot” methods (e.g., see [1, 6, 18, 34]). Finally,
pipeline. Their translator, called Code Morphing Software many hardware techniques have been proposed for improv-
(CMS), exploits these hardware mechanisms to reorder in- ing the effectiveness of dynamic optimization [27, 30, 35].
structions aggressively in the presence of the challenging When combined with a rich V-ISA that supports more ef-
features identified in Section 3.3, namely, precise excep- fective program analyses and transformations, these soft-
tions, memory dependences, and self-modifying code (as ware and hardware techniques can further enhance the ben-
well as memory-mapped I/O) [11]. They use a trace-driven efits of VISC architectures.
reoptimization scheme to optimize frequently executed dy-
namic sequences of code. Crusoe does do not perform any 7. Conclusions and Future Work
offline translation or offline caching, as noted above. Trends in modern processors indicate that CPU cycles
Smith et al. in the Strata project have recently but and raw transistors are becoming increasingly cheap, while
perhaps most clearly articulated the potential benefits of control complexity, wire delays, power, reliability, and test-
VISC processor designs, particularly the benefits of co- ing cost are becoming increasingly difficult to manage. Both
designing the translator and a hardware processor with an trends favor virtual processor architectures: the extra CPU
implementation-dependent ISA [32]. They describe a num- cycles can be spent on software translation, the extra tran-
ber of examples illustrating the flexibility hardware design- sistors can be spent on mechanisms to assist that translation,
ers could derive from this strategy. They have also devel- and a cooperative hardware/software design supported by a
oped several hardware mechanisms that could be valuable rich virtual program representation could be used in numer-
for implementing such architectures, including relational ous ways to reduce hardware complexity and potentially in-
profiling [19], a microarchitecture with a hierarchical reg- crease overall performance.
ister file for instruction-level distributed processing [22], This paper presented LLVA, a design for a language-
and hardware support for working set analysis [12]. They independent, target-independent virtual ISA. The instruc-
do not propose a specific choice of V-ISA, but suggest that tion set is low-level enough to map directly and closely to
one choice would be to use Java VM as the V-ISA (an op- hardware operations but includes high-level type, control-
tion we discussed in the Introduction). flow and dataflow information needed to support sophis-
Previous authors have developed Typed Assembly Lan- ticated analysis and optimization. It includes novel mech-
guages [28, 7] with goals that generally differ significantly anisms to overcome the difficulties faced by previous vir-
from ours. Their goals are to enable compilation from tual architectures such as DAISY and Transmeta’s Crusoe,
strongly typed high-level languages to typed assembly lan- including a flexible exception model, minor constraints on
guage, enabling sound (type preserving) program transfor- self-modifying code to dovetail with the compilation strat-
mations, and to support program safety checking. Their type egy, and an OS-independent interface to access offline stor-
systems are higher-level than ours, because they attempt age and enable offline translation.
to propagate significant type information from source pro- Evaluating the benefits of LLVA requires a long-term
grams. In comparison, our V-ISA uses a much simpler, low- research program. We have three main goals in the near
level type system aimed at capturing the common low-level future: (a) Develop and evaluate cooperative (i.e., code-
representations and operations used to implement compu- signed) software/hardware design choices that reduce hard-
tations from high-level languages. It is also designed to to ware complexity and assist the translator to achieve high
support arbitrary non-type-safe code efficiently, including overall performance. (b) Extend the V-ISA with machine-
operating system and kernel code. independent abstractions of fine- and medium-grain paral-
Binary translation has been widely used to provide bi- lelism, suitable for mapping to explicitly parallel processor
nary compatibility for legacy code. For example, the FX!32 designs, as mentioned in Section 3.6. (c) Port an existing
tool uses a combination of online interpretation and offline operating system (in incremental steps) to work on top of
profile-guided translation to execute Intel IA-32 code on Al- the LLVA architecture, and explore the OS design implica-
pha processors [8]. Unlike such systems, a VISC architec- tions of such an implementation.
ture makes binary translation an essential part of the design
strategy, using it for all codes, not just legacy codes. Acknowledgements
There is a wide range of work on software and hard- We thank Jim Smith, Sarita Adve, John Criswell and
ware techniques for transparent dynamic optimization of the anonymous referees for their detailed feedback on this
programs. Transmeta’s CMS [11] and Dynamo [4] iden- paper. This work has been supported by an NSF CA-
tify and optimize hot traces at runtime, similar to our re- REER award, EIA-0093426, the NSF Operating Systems
and Compilers program under grant number CCR-9988482, [19] T. H. Heil and J. E. Smith. Relational profiling: enabling
and the SIA’s MARCO Focus Center program. thread-level parallelism in virtual machines. In MICRO,
pages 281–290, Monterey, CA, Dec 2000.
References [20] IBM Corp. XL FORTRAN: Eight Ways to Boost Perfor-
[1] A.-R. Adl-Tabatabai, et al. Fast and effective code genera- mance. White Paper, 2000.
tion in a just-in-time Java compiler. In PLDI, May 1998. [21] Intel Corp. Special Issue on Intel HyperThreading Technol-
[2] T. Austin, et al. The pointer-intensive benchmark ogy in Pentium 4 Processors. Intel Technology Journal, Q1,
suite. Available at www.cs.wisc.edu/˜austin/ptr- 2002.
dist.html, Sept 1995. [22] H.-S. Kim and J. E. Smith. An instruction set and microar-
[3] A. Ayers, S. de Jong, J. Peyton, and R. Schooler. Scal- chitecture for instruction level distributed processing. In
able cross-module optimization. ACM SIGPLAN Notices, ISCA, Alaska, May 2002.
33(5):301–312, 1998. [23] A. Klaiber. The Technology Behind Crusoe Processors,
[4] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: A trans- 2000.
parent dynamic optimization system. In PLDI, pages 1–12, [24] S. Kowshik, D. Dhurjati, and V. Adve. Ensuring code safety
June 2000. without runtime checks for real-time control systems. In
[5] D. Burger and J. R. Goodman. Billion-transistor architec- CASES, Grenoble, France, Oct 2002.
tures. Computer, 30(9):46–49, Sept 1997. [25] C. Lattner and V. Adve. Automatic Pool Allocation for Dis-
[6] M. G. Burke, J.-D. Choi, S. Fink, D. Grove, M. Hind, joint Data Structures. In Proc. ACM SIGPLAN Workshop on
V. Sarkar, M. J. Serrano, V. C. Sreedhar, H. Srinivasan, and Memory System Performance, Berlin, Germany, Jun 2002.
J. Whaley. The Jalapeño Dynamic Optimizing Compiler for [26] C. Lattner and V. Adve. LLVM: A Compilation Framework
Java. In Java Grande, pages 129–141, 1999. for Lifelong Program Analysis and Transformation. Tech.
[7] J. Chen, D. Wu, A. W. Appel, and H. Fang. A provably sound Report UIUCDCS-R-2003-2380, Computer Science Dept.,
TAL for back-end optimization. In PLDI, San Diego, CA, Univ. of Illinois at Urbana-Champaign, Sept 2003.
Jun 2003. [27] M. C. Merten, A. R. Trick, E. M. Nystrom, R. D. Barnes,
[8] A. Chernoff, et al. FX!32: A profile-directed binary transla- and W. m. W. Hwu. A hardware mechanism for dynamic ex-
tor. IEEE Micro, 18(2):56–64, 1998. traction and relayout of program hot spots. In ISCA, pages
[9] B. E. Clark and M. J. Corrigan. Application system/400 per- 59–70, Jun 2000.
formance characteristics. IBM Systems Journal, 28(3):407– [28] G. Morrisett, D. Walker, K. Crary, and N. Glew. From Sys-
423, 1989. tem F to typed assembly language. TOPLAS, 21(3):528–569,
[10] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and May 1999.
F. K. Zadeck. Efficiently computing static single assignment [29] P. Oberoi and G. S. Sohi. Parallelism in the front-end. In
form and the control dependence graph. TOPLAS, pages ISCA, June 2003.
13(4):451–490, October 1991. [30] S. J. Patel and S. S. Lumetta. rePLay: A Hardware Frame-
[11] J. C. Dehnert, et al. The Transmeta Code Morphing Soft- work for Dynamic Optimization. IEEE Transactions on
ware: Using speculation, recovery and adaptive retransla- Computers, Jun 2001.
tion to address real-life challenges. In Proc. 1st IEEE/ACM [31] K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, , and
Symp. Code Generation and Optimization, San Francisco, J. Huh. Exploiting ILP, TLP, and DLP with the Polymor-
CA, Mar 2003. phous TRIPS Architecture. In ISCA, June 2003.
[12] A. S. Dhodapkar and J. E. Smith. Managing multi- [32] J. E. Smith, T. Heil, S. Sastry, and T. Bezenek. Achieving
configuration hardware via dynamic working set analysis. In high performance via co-designed virtual machines. In Inter-
ISCA, Alaska, May 2002. national Workshop on Innovative Architecture (IWIA), 1999.
[13] D. Dhurjati, S. Kowshik, V. Adve, and C. Lattner. Mem- [33] J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sin-
ory safety without runtime checks or garbage collection. In haroy. The POWER4 system microarchitecture. IBM Jour-
LCTES, San Diego, CA, Jun 2003. nal of Research and Development, 46(1):5–26, 2002.
[14] K. Ebcioglu and E. R. Altman. DAISY: Dynamic compi- [34] D. Ungar and R. B. Smith. Self: The power of simplicity. In
lation for 100% architectural compatibility. In ISCA, pages OOPSLA, 1987.
26–37, 1997. [35] C. Zilles and G. Sohi. A programmable coprocessor for pro-
[15] J. Fisher. Walk-time techniques: Catalyst for architectural filing. In HPCA, Jan 2001.
change. Computer, 30(9):46–42, Sept 1997.
[16] R. Ghiya, D. Lavery, and D. Sehr. On the importance of
points-to analysis and other memory disambiguation meth-
ods for C programs. In PLDI. ACM Press, 2001.
[17] J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java
Language Specification, 2nd Ed. Addison-Wesley, Reading,
MA, 2000.
[18] D. Griswold. The Java HotSpot Virtual Machine Architec-
ture, 1998.

Preparing For A Technical IOS Job Interview
No ratings yet
Preparing For A Technical IOS Job Interview
120 pages
Survey of Virtual Machine Research PDF
No ratings yet
Survey of Virtual Machine Research PDF
12 pages
Coa Concept
No ratings yet
Coa Concept
18 pages
An Overview of Virtual Machine Architectures: Ross Rosemark
No ratings yet
An Overview of Virtual Machine Architectures: Ross Rosemark
11 pages
Virtualization Io
No ratings yet
Virtualization Io
16 pages
Jurnal Organisasi Dan Aksitektur Komputer Terbaru
No ratings yet
Jurnal Organisasi Dan Aksitektur Komputer Terbaru
6 pages
Virtual Machines and Dynamic Translation: Implementing Isas in Software
No ratings yet
Virtual Machines and Dynamic Translation: Implementing Isas in Software
30 pages
Instruction Set Architecture
No ratings yet
Instruction Set Architecture
134 pages
Essays on Infrastructure-as-code
From Everand
Essays on Infrastructure-as-code
Ravi Rajamani
No ratings yet
Implementation Virtualization
No ratings yet
Implementation Virtualization
5 pages
Lec03 ISA Intro
No ratings yet
Lec03 ISA Intro
20 pages
Modern Computer Architecture: Lecture1 Fundamentals of Quantitative Design and Analysis (I)
No ratings yet
Modern Computer Architecture: Lecture1 Fundamentals of Quantitative Design and Analysis (I)
41 pages
Interpretation & Binary Translation
No ratings yet
Interpretation & Binary Translation
8 pages
4 Instruction Set Architecture
No ratings yet
4 Instruction Set Architecture
12 pages
L2 Computer Architecture (1) - 075755
No ratings yet
L2 Computer Architecture (1) - 075755
12 pages
Instruction Set Architecture (ISA)
No ratings yet
Instruction Set Architecture (ISA)
6 pages
Ca 12
No ratings yet
Ca 12
64 pages
Rust for Embedded Systems
From Everand
Rust for Embedded Systems
James Oakton
No ratings yet
Lec3 - Single-Cycle and Multi-Cycle Microarchitectures
No ratings yet
Lec3 - Single-Cycle and Multi-Cycle Microarchitectures
121 pages
Mastering the Art of x86 Assembly Programming: Unlocking the Secrets of Expert-Level Skills
From Everand
Mastering the Art of x86 Assembly Programming: Unlocking the Secrets of Expert-Level Skills
Steve Jones
No ratings yet
Lecture 3
No ratings yet
Lecture 3
30 pages
Unit-I Part-II Virtualization
No ratings yet
Unit-I Part-II Virtualization
70 pages
L02 SimpleImps
No ratings yet
L02 SimpleImps
41 pages
Taxonomy
No ratings yet
Taxonomy
30 pages
Lab2 Assembly Lab I
No ratings yet
Lab2 Assembly Lab I
44 pages
2 Isa
No ratings yet
2 Isa
32 pages
Section - A - Unit-2 STORED PROGRAM CONCEPT
No ratings yet
Section - A - Unit-2 STORED PROGRAM CONCEPT
100 pages
Instruction Set Architecture
No ratings yet
Instruction Set Architecture
45 pages
CC Unit 2
No ratings yet
CC Unit 2
21 pages
RISC-V Architecture and Implementation Guide: Definitive Reference for Developers and Engineers
From Everand
RISC-V Architecture and Implementation Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Design and Implementation with i.MX Processors: Definitive Reference for Developers and Engineers
From Everand
Design and Implementation with i.MX Processors: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Wk05 - CPU Architecture (Part 1)
No ratings yet
Wk05 - CPU Architecture (Part 1)
72 pages
Vmware Architecture
No ratings yet
Vmware Architecture
152 pages
Virt
No ratings yet
Virt
26 pages
WASI-NN for Machine Learning Interfaces: The Complete Guide for Developers and Engineers
From Everand
WASI-NN for Machine Learning Interfaces: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Compiler and Virtual Machine of A M
No ratings yet
Compiler and Virtual Machine of A M
9 pages
Unit II Virtualization Basics
No ratings yet
Unit II Virtualization Basics
12 pages
Virtual Machine
No ratings yet
Virtual Machine
35 pages
Pop!_OS System Administration Guide: Definitive Reference for Developers and Engineers
From Everand
Pop!_OS System Administration Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
RISC, CISC, and ISA Variations: Prof. Hakim Weatherspoon CS 3410, Spring 2015
No ratings yet
RISC, CISC, and ISA Variations: Prof. Hakim Weatherspoon CS 3410, Spring 2015
55 pages
Introduction To QEMU
No ratings yet
Introduction To QEMU
73 pages
l1 Intro PDF
No ratings yet
l1 Intro PDF
36 pages
Comprehensive Guide to Mbed Development: Definitive Reference for Developers and Engineers
From Everand
Comprehensive Guide to Mbed Development: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Mastering Terraform A Comprehensive Guide to Infrastructure As Code
From Everand
Mastering Terraform A Comprehensive Guide to Infrastructure As Code
Mario Marinov
No ratings yet
Comparch 2015 S 03
No ratings yet
Comparch 2015 S 03
44 pages
Week 3 - IsA Appendix A Sp24
No ratings yet
Week 3 - IsA Appendix A Sp24
48 pages
Week 2 Day 1 - Embedded Systems
No ratings yet
Week 2 Day 1 - Embedded Systems
57 pages
02.EECE 345 Computer Architecture ISA Design
No ratings yet
02.EECE 345 Computer Architecture ISA Design
109 pages
Chapter 1
No ratings yet
Chapter 1
31 pages
Jetson Platform Development Guide: Definitive Reference for Developers and Engineers
From Everand
Jetson Platform Development Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Embedded Processors: Instruction Set Architecture (ISA)
No ratings yet
Embedded Processors: Instruction Set Architecture (ISA)
90 pages
Aca Notes
No ratings yet
Aca Notes
23 pages
Lec2 - ISA Vs Microarchitecture
No ratings yet
Lec2 - ISA Vs Microarchitecture
38 pages
Study Guide Automating and Programming Cisco Data Center Solutions 300-635 DCAUTO Exam
From Everand
Study Guide Automating and Programming Cisco Data Center Solutions 300-635 DCAUTO Exam
Anand Vemula
No ratings yet
Eecs 2011 62
No ratings yet
Eecs 2011 62
34 pages
Lec02 ISA Intro
No ratings yet
Lec02 ISA Intro
38 pages
Mudge Mpsoc
No ratings yet
Mudge Mpsoc
47 pages
CAO Presentation
No ratings yet
CAO Presentation
12 pages
Computer Architecture A Quantitative Approach 2nd Edition 1gcu6vr0gn
No ratings yet
Computer Architecture A Quantitative Approach 2nd Edition 1gcu6vr0gn
7 pages
Microprocessor Book
No ratings yet
Microprocessor Book
296 pages
ACA Chapter 1
100% (1)
ACA Chapter 1
106 pages
Provisional Selection List For The Post of Assistant Professor Geology in HED 18 - 10 - 2023
No ratings yet
Provisional Selection List For The Post of Assistant Professor Geology in HED 18 - 10 - 2023
6 pages
Response To Reviewers
No ratings yet
Response To Reviewers
8 pages
270 858 2 PB
No ratings yet
270 858 2 PB
4 pages
Provisional Selection List For The Post of Assistant Professor English in HED 03 - 11 - 2023
No ratings yet
Provisional Selection List For The Post of Assistant Professor English in HED 03 - 11 - 2023
16 pages
Cutoff Total Cutoff Total JRF & Assistant Professor Assistant Professor Subcd Subject Category
No ratings yet
Cutoff Total Cutoff Total JRF & Assistant Professor Assistant Professor Subcd Subject Category
46 pages
Design of High-Speed 16 To 4 Priority Encoder Using GDI: Ii. Cmos
No ratings yet
Design of High-Speed 16 To 4 Priority Encoder Using GDI: Ii. Cmos
5 pages
Provisional Selection List For The Post of Assistant Professor Functional English in HED 19 - 10 - 2023
No ratings yet
Provisional Selection List For The Post of Assistant Professor Functional English in HED 19 - 10 - 2023
6 pages
Formats For Submission of Documents at The Time of Interview For The Post of Assistant Professor in Various Medical Colleges in HME 04 - 07 - 2023
No ratings yet
Formats For Submission of Documents at The Time of Interview For The Post of Assistant Professor in Various Medical Colleges in HME 04 - 07 - 2023
6 pages
Provisional Selection List For The Post of Assistant Professor Environmental Science in HED 04 - 10 - 2023
No ratings yet
Provisional Selection List For The Post of Assistant Professor Environmental Science in HED 04 - 10 - 2023
10 pages
Filling Up of The Post of Assistant Professor in Various Govt Degree Colleges Extension of Date Thereof 20 - 07 - 2023
No ratings yet
Filling Up of The Post of Assistant Professor in Various Govt Degree Colleges Extension of Date Thereof 20 - 07 - 2023
1 page
Embedding in Medical Images: An Efficient Scheme For Authentication and Tamper Localization
No ratings yet
Embedding in Medical Images: An Efficient Scheme For Authentication and Tamper Localization
30 pages
Academic Arrangement 2021-2022 (Kashmir Division)
No ratings yet
Academic Arrangement 2021-2022 (Kashmir Division)
2 pages
Smart City Security: Digitalcommons@Kennesaw State University
No ratings yet
Smart City Security: Digitalcommons@Kennesaw State University
10 pages
Embedded Systems - Challenges and Work Directions
No ratings yet
Embedded Systems - Challenges and Work Directions
2 pages
Hurrah 2017
No ratings yet
Hurrah 2017
5 pages
Adhoc Networks Me
No ratings yet
Adhoc Networks Me
20 pages
Imam-ul-Ambiya (صلى الله عليه وآله وسلم) - Urdu
No ratings yet
Imam-ul-Ambiya (صلى الله عليه وآله وسلم) - Urdu
4 pages
Challenges For Modelling and Analysis in Embedded Systems and Systems-Of-Systems Design
No ratings yet
Challenges For Modelling and Analysis in Embedded Systems and Systems-Of-Systems Design
7 pages
Teaching Computer Science With Cybersecurity Education Built-In
No ratings yet
Teaching Computer Science With Cybersecurity Education Built-In
8 pages
Teaching Statement1
No ratings yet
Teaching Statement1
2 pages
Teaching Statement: Dr. Gonzalo Mateos
No ratings yet
Teaching Statement: Dr. Gonzalo Mateos
2 pages
Dua For Students For Concentration
No ratings yet
Dua For Students For Concentration
6 pages
Teaching Statement MASSIMIL
No ratings yet
Teaching Statement MASSIMIL
3 pages
Electronics PDF
No ratings yet
Electronics PDF
2 pages
Block Based Medical Image Watermarking Technique For Tamper Detection and Recovery
No ratings yet
Block Based Medical Image Watermarking Technique For Tamper Detection and Recovery
10 pages
1.delegates in C
No ratings yet
1.delegates in C
19 pages
Dot-Net-UNIT 1
No ratings yet
Dot-Net-UNIT 1
117 pages
Collections and Generics PDF
No ratings yet
Collections and Generics PDF
63 pages
Java Programming
No ratings yet
Java Programming
85 pages
Generics PDF
67% (3)
Generics PDF
16 pages
Bjarnason Scalaz
No ratings yet
Bjarnason Scalaz
47 pages
Chapter 1 CGC
No ratings yet
Chapter 1 CGC
38 pages
(BookRAR - Net) - Programming IOS 14 Using Swift UI
No ratings yet
(BookRAR - Net) - Programming IOS 14 Using Swift UI
141 pages
Adjava PDF
No ratings yet
Adjava PDF
422 pages
A Software Architecture For Zero-Copy RPC in Java: Chi-Chao Chang and Thorsten Von Eicken
No ratings yet
A Software Architecture For Zero-Copy RPC in Java: Chi-Chao Chang and Thorsten Von Eicken
16 pages
L26. Generic
No ratings yet
L26. Generic
37 pages
CPP Best Practices
100% (1)
CPP Best Practices
45 pages
Minimizing Impact On JAVA Virtual Machine Via JAVA Code Optimization PDF
No ratings yet
Minimizing Impact On JAVA Virtual Machine Via JAVA Code Optimization PDF
6 pages
Wrox Professional Net Framework 2 01
No ratings yet
Wrox Professional Net Framework 2 01
583 pages
C# Course Pack
100% (1)
C# Course Pack
81 pages
Presenti C#
No ratings yet
Presenti C#
137 pages
PPL Unit-2-Mcqs-Sheet
No ratings yet
PPL Unit-2-Mcqs-Sheet
20 pages
C# 9.0 in A Nutshell: The Definitive Reference 1st Edition Joseph Albahari Instant Download
No ratings yet
C# 9.0 in A Nutshell: The Definitive Reference 1st Edition Joseph Albahari Instant Download
55 pages
C# Basic Note
100% (1)
C# Basic Note
85 pages
Runtime Storage Management (AutoRecovered)
No ratings yet
Runtime Storage Management (AutoRecovered)
17 pages
Type Script
No ratings yet
Type Script
37 pages
Assignment 3
No ratings yet
Assignment 3
17 pages
Language Based Security-1
No ratings yet
Language Based Security-1
44 pages
C Sharp Unit-I Notes
No ratings yet
C Sharp Unit-I Notes
20 pages
Core Java Career Essentials
No ratings yet
Core Java Career Essentials
107 pages
1.0 Abstract: in Today's Information Technology World, Security For Systems Is Becoming
No ratings yet
1.0 Abstract: in Today's Information Technology World, Security For Systems Is Becoming
27 pages
Enums: // Int Enum Pattern - Has Severe Problems!
No ratings yet
Enums: // Int Enum Pattern - Has Severe Problems!
7 pages
C# MCQ
No ratings yet
C# MCQ
160 pages
JSR 275
No ratings yet
JSR 275
235 pages

LLVA: A Low-Level Virtual Instruction Set Architecture

Uploaded by

LLVA: A Low-Level Virtual Instruction Set Architecture

Uploaded by

LLVA: A Low-level Virtual Instruction Set Architecture

Abstract Along with design alternatives for the microarchitecture

Figure 2. C and LLVA code for a function

You might also like