From Quads To Graphs: An Intermediate Representation's Journey

Cliff Click

CRPC-TR93366-S
October 1993
We follow the slow mutation of an intermediate representation from a well-known quad-based form to a graph-based form. The final form is similar to an operator-level Program Dependence Graph or Gated Single Assignment form. The final form contains all the information required to execute the program. More important, the graph form explicitly contains use-def information. Analyses can use this information directly without having to calculate it. Transformations modify the use-def information directly, without requiring separate steps to modify the intermediate representation and the use-def information.

Keywords: Intermediate representations, optimizing compilers, constant propagation
1 Introduction
Intermediate representations do not exist in a vacuum. They are the stepping stone from what the
programmer wrote to what the machine understands. Intermediate representations must bridge a large
semantic gap (for example, from Fortran 90 vector operations to a 3-address add in some machine code).
During the translation from a high-level language to machine code, an optimizing compiler repeatedly
analyzes and transforms the intermediate representation. As compiler users we want these analyses and
transformations to be fast and correct. As compiler writers we want optimizations to be simple to write,
easy to understand, and easy to change. Our goal is a representation that is simple and lightweight while
allowing easy expression of fast optimizations.
This article chronicles the slow mutation of an intermediate representation from a well-known quad-based form to a graph-based form. The final form is similar to (although not exactly equal to) an operator-level Program Dependence Graph or Gated Single Assignment form [1, 2, 3, 4]. The final form contains all the information required to execute the program. More important, the graph form explicitly contains use-def information. Analyses can use this information directly without having to calculate it. Transformations modify the use-def information directly without requiring separate steps to modify the intermediate representation and the use-def information. The graph form is a simple single-tiered structure instead of a two-tiered Control-Flow Graph containing basic blocks (tier 1) of instructions (tier 2). This single tier is reflected in our algorithms.
One of our philosophies for making a compiler run fast is to do as much of the work as possible as early
in the compile as possible. This leads us to attempt strong peephole optimizations in a one-pass front end.
Our final representation allows peephole optimizations that, under certain circumstances, are as strong as
pessimistic conditional constant propagation.
This paper is not about building a complete optimizer, parser, or a compiler front-end. We assume
that the reader can write a compiler front-end. This paper is about the design decisions that led us from
a more traditional intermediate representation to the current graph-based representation. This transition
is a continuous and on-going process.
Throughout this exposition, we use C++ to describe the data structures and code; a working knowledge
of C++ will aid in understanding this paper. The later sections contain bits of C++ code sliced directly
out of our compiler (with some reformatting to save space).
This work has been supported by ARPA through ONR grant N00014-91-J-1989.
class Inst {
  ...                           // Enumeration of all possible opcodes
        Normal                              SSA
    int x := 1;                     int x0 := 1;
    do {                            do { x1 := φ( x0, x3 );
      if( x != 1 )                    if( x1 != 1 )
        x := 2;                         x2 := 2;
                                      x3 := φ( x1, x2 );
    } while( pred() );              } while( pred() );
    return( x );                    return( x3 );

Figure 4 Sample code in SSA form
correlation between variables and expressions. Therefore, the variable name can be used as a direct map to the expression that defines it. In our implementation, we want this mapping to be fast.

In the instruction's concrete representation there are fields that hold the source variables' names (as machine integers). To speed up the variable-to-definition mapping, we replace the variable name with a pointer to the variable's defining instruction (actually a pointer to the instruction's concrete representation as a data structure). Performing the mapping from variable name to defining instruction now requires a single pointer lookup. In this manner, use-def information is explicitly encoded. Figure 5 describes the new instruction format.
We now have an abstract mapping from variables and expressions to instructions and back. Instructions no longer need to encode the variable name being defined (they use the abstract mapping instead), so the dst field can be removed. However, the front end still needs to map from variable names to instructions while parsing. The variable names are mapped to a dense set of integer indices. We use a simple array to map the integer indices to instructions. We update the mapping after any peephole optimizations.6
6 The astute reader may notice that if we always insert instructions into the basic block's linked list and the peephole optimizer
returns a previous instruction, we will attempt to put the same instruction on the linked list twice, corrupting the list. We
correct this in Section 4.
Figure 5 The new instruction format
Figure 6 shows the new parser interface. Since we are using use-def information instead of a context
window, we no longer need a prev variable to hold the context window.
parse()
{
  int dst, src0, src1;   // The instruction variables
  Inst **map;            // A map from variable to instruction
  ...;                   // parse and compute quad information
}
  // Check for peephole optimizations specific to an opcode
  switch( quad->op ) {
  case Copy:                            // Copy opcode?
    ...
    break;
  case Add:                             // Add opcode?
    ...
        quad = src1;
      }
    } else if( src1->op == Constant ) { // Right input is a constant?
      if( (int)(src1->src0) == 0 ) {    // Adding a zero on the right?
        delete quad;                    // Same as a Copy of the left input
        quad = src0;
      }
    }
    break;
  case Subtract:                        // Subtract opcode?
    if( src0 == src1 ) {                // Subtracting equivalents?
      delete quad;                      // Then replace with a constant zero
class Inst {
  ...                                   // Enumeration of all possible opcodes
};
class AddInst : public Inst {
  ...
  AddInst( Inst *src0, Inst *src1, Block *c )
    : src0(src0), src1(src1), Inst( Add, c ) {}
};
Inst *AddInst::vpeephole()
{
  if( src0->op == Constant ) {          // Left input is a constant?
    ...                                 // Same code as before
  } else if( src1->op == Constant ) {   // Right input is a constant?
  Arena( Arena *next ) : next(next) {}  // New Arena, plug in at head of linked list
};

class Inst {
  static Arena *arena;                  // Arena to store instructions in
  static char *hwm, *max, *old;         // High water mark, limit in Arena
  void operator delete( void *ptr )     // Check for deleting recently allocated space
    { if( ptr == old ) hwm = old; }
  ...                                   // Other member functions
};
Figure 14 The implementation of dependence edges
Every instruction takes a control input from the basic block that determines when the instruction is executed. If the input is an edge in our abstract graph, then the basic block must be a node in the abstract graph. So we define a Region instruction [3] to replace a basic block. A Region instruction takes control from each predecessor block as inputs and produces a merged control as an output.
class AddInst : public Inst {
  Inst *const control;                  // Controlling instruction
  Inst *const src0, *const src1;        // Source (input) variables
  AddInst( Inst *src0, Inst *src1, Inst *block )
    : src0(src0), src1(src1), control(block), Inst( Add ) {}
  ...;                                  // Other member functions
};

class RegionInst : public Inst {
  const int max;                        // Number of control inputs
  Inst **const srcs;                    // Array of control inputs
  RegionInst( int max )
    : max(max), srcs(new Inst*[max]), Inst( Region ) {}
  ...;                                  // Other member functions
};

Figure 15 Explicit control dependence
Since Region instructions merge control, they do not need a separate control input to determine when they execute. So the control input field is moved from the Inst definition to the class-specific area for each inherited class. Figure 15 shows the change in the base class and some inherited classes.
If the basic block ends in a conditional instruction, that instruction is replaced with an If instruction.
(Figure: a basic block ending in "cc := predicate; branch eq B1" becomes an If instruction taking the predicate and control as inputs and producing True and False control outputs.)
Start: i0 = initial data

loop:  i1 = φ( i0, i2 )
       i2 = i1 + 1
       cc = test( i2 )
       branch ne loop

       ...i2...

(Figure: the corresponding graph builds the loop from Start, Region, Select, and Compose instructions, tests cc with an If instruction, and carries a loop-back control edge plus loop-exit control and data edges.)
  if( Debug ) { ... }     // Debug code
  ...;                    // Same code as before
  return this;            // Return instruction replacing this one
}
control paths can reach a particular merge point.12 For merge points created with structured code (i.e.,
an if/then/else construct), all the control paths reaching the merge point are known. After parsing all the
paths to the merge point, Region and Compose instructions can be optimized. For merge points created
with labels, the front-end cannot optimize the merge point until the label goes out of scope. In general,
this includes the entire procedure body.
6.3 Value Numbering and Control
If we encode the control input into our value numbering's hash and key-compare functions, we no longer match two identical instructions in two different basic blocks. This means we no longer need to flush our hash table between basic blocks. However, we are still doing only local value numbering. Ignoring the control input (and doing some form of global value numbering) is covered in Section 9.
6.4 A Uniform Representation
At this point we are using the same basic Inst class to represent the entire program. Control and data flow are represented uniformly as edges between nodes in a graph. From this point on, we refine the graph, but we do not make any major changes to it.
Having made the paradigm shift from quads to graphs, what have we gained? The answer lies in the
next section where we look at the generic code for peephole optimizations. This code applies uniformly
to all instruction types. Adding a new instruction type (or opcode) does not require any changes. This
peephole code is powerful; while the front end is parsing and generating instructions, the peepholer is
value-numbering, constant folding, and eliminating unreachable code.
7 Types and Pessimistic Optimizations
Our previous vpeephole functions combined both constant folding and identity-function optimizations. As we see in Section 10, conditional constant propagation cannot use identity-function optimizations and requires only some type of constant-finding code. So we break these functions up into Compute for finding constants, and Identity for finding identity functions. Any constants found by Compute are saved in the instruction as a type.

A type is a set of values. We are interested in the set of values, or type, that an instruction might take on during execution. We use a set of types arranged in the lattice with ⊤ (top), constants, and ⊥ (bottom).13 Figure 19 shows the lattice and the class structure to hold lattice elements. For the control-producing instructions we use ⊤ and ⊥ to represent the absence (unreachable) and presence (reachable) of control.
12 During the parsing stage, we cannot yet be in SSA form. We do not have φ-functions to mark places to insert Compose instructions.
13 The lattice used by our compiler is much more complex, allowing us to find ranges and floating-point constants in addition to integer constants.
(Figure 19: the lattice, with ⊤ above the constants and ⊥ below them; the class holding lattice elements stores the constant in an "int con;" field.)
Code for finding identity functions and computing a new Type is given in Figure 20. If Identity finds that instruction x is an identity function of some other instruction y, then x is deleted and y is returned as x's replacement. Deleting x and returning y only works if nothing references x (else we have a dangling pointer to x). So the Identity code can only be used when we have recently created x, during parsing. Since one of our goals is parse-time optimizations, this is not a problem.
7.1 Putting it Together: Pessimistic Optimizations
Our next peephole optimizer works as follows:
- Compute a new Type for the instruction.
- If the instruction's type is a constant, replace the instruction with a Constant instruction. We delete this before creating the new instruction, so our fast operator new and delete can reuse the storage space. This also means we need to save the relevant constant before we delete this.
- Value-number the instruction, trying to find a previously existing instruction. If one exists, delete this instruction and use the old one. We don't need to run Identity on the old instruction, since Identity already ran on the old instruction before it was placed in the hash table.
- Next, try to see if this instruction is an identity function on some previous instruction.
- If we did not find a replacement instruction, we must be computing a new value. Place this new instruction in the value-numbering hash table.
- Finally, we return the optimized instruction.
}
Inst *v_n = tmp->hash_lookup();        // Hash on object-specific key
if( v_n ) { delete tmp; return v_n; }  // Return any hit
tmp = tmp->Identity();                 // Find any identity function
if( tmp == this ) hash_install();      // New instruction for hash table
class Inst {
  ...                                  // Default Type computation for Projections
};

        break;
      }
    }
  }
  return tmp;
}
while Loads do not produce a new Store) completely serializes I/O. At program exit, the I/O state is required. However, the Store is not required. Non-memory-mapped I/O requires a subroutine call.
8.4 Subroutines
Subroutines are treated like simple instructions that take in many values and return many values. Subroutines take in and return at least the control token, the Store, and the I/O state. They also take in any input parameters, and often return a result (they already return three other results). The peephole (and other) optimization calls can use any interprocedural information present.
8.5 Projection Instructions
With the definition of Projection instructions a major gap in our model is filled. We now have concrete code on how the peepholer finds and removes unreachable code. We also see how to handle memory, subroutines, and instructions that define many values.

So far, every data instruction includes a control input that essentially defines what basic block the instruction belongs to. But in many cases we do not care what block an instruction is placed in, as long as it gets executed after its data dependencies are satisfied and before any uses. Indeed, on a superscalar or VLIW machine, we may want to move many instructions across basic block boundaries to fill idle slots on multiple functional units. In the next section we look at a simple solution to this problem: removing
Figure 25 Treatment of memory (Store)
class Inst {
  static Inst **DefUse;          // Array of def-use edges
  static int EdgeCnt;            // Number of use-def edges
  static int visit_goal;         // Goal color for visits
  ...
};

  return DefUse;
}
    }
  }
  du_tmp += def_use_cnt;                     // Bump so next Inst gets a different section
  use->def_use_edge[--use->tmp] = this;      // We use what the value defines
  }
}

  for( int i = 0; i < x->def_use_cnt; i++ )  // Yes!  For all users...
    worklist_push( x->def_use_edge[i] );     // ...put user on worklist
    }
  }
}
11 Summary
To produce a faster optimizer, we decided to do some of the work in the front end. We reasoned that cheap
peephole optimizations done while parsing would reduce the size of our intermediate representation and
the expense of later optimization phases. To do useful peephole optimizations we need use-def information
and the static single assignment property.
We made our front end build in SSA form. Because we cannot analyze the entire program while we parse it, we had to insert redundant φ-functions. We observed that variable names were a one-to-one map to program expressions. So we replaced the variable names with pointers to their concrete representations. At this point the variable name defined by any expression becomes redundant, so we removed the name (the dst field) from our Insts. We also observed an implicit flow of control within basic blocks, which we made explicit (and thus subject to optimizations). We discovered, while trying to write peephole versions of unreachable code elimination, that our model was not compositional. We fixed this by bringing in control dependences to the φ-functions and breaking up φ-functions into Select and Compose instructions.
We took advantage of C++'s inheritance mechanisms and restructured our Insts into separate classes
for each kind of instruction. We also plugged in specialized allocate and delete functions.
At this point we noticed that our basic blocks' structures held no more information than a typical instruction (i.e., a variety of dependence edges). So we replaced the basic block structures with Region instructions. The same peephole mechanism we had been using all along now allowed us to do unreachable code elimination in addition to the regular constant folding and value numbering optimizations.
17 Actually we put all zero-input instructions on the worklist. There are a variety of ways to deal with constants, none of
which are mentioned here.