Essentials of Compilation
An Incremental Approach
April 2, 2019
This book is dedicated to the programming
language wonks at Indiana University.
Contents
1 Preliminaries
  1.1 Abstract Syntax Trees and S-expressions
  1.2 Grammars
  1.3 Pattern Matching
  1.4 Recursion
  1.5 Interpreters
  1.6 Example Compiler: a Partial Evaluator
3 Register Allocation
  3.1 Registers and Calling Conventions
  3.2 Liveness Analysis
  3.3 Building the Interference Graph
  3.4 Graph Coloring via Sudoku
  3.5 Print x86 and Conventions for Registers
6 Functions
  6.1 The R4 Language
  6.2 Functions in x86
    6.2.1 Calling Conventions
    6.2.2 Efficient Tail Calls
  6.3 Shrink R4
  6.4 Reveal Functions
  6.5 Limit Functions
  6.6 Remove Complex Operators and Operands
  6.7 Explicate Control and the C3 language
12 Appendix
  12.1 Interpreters
  12.2 Utility Functions
    12.2.1 Testing
  12.3 x86 Instruction Set Quick-Reference
List of Figures
5.1 Example program that creates tuples and reads from them.
5.2 The syntax of R3, extending R2 (Figure 4.1) with tuples.
5.3 Interpreter for the R3 language.
5.4 Type checker for the R3 language.
5.5 A copying collector in action.
5.6 Depiction of the Cheney algorithm copying the live tuples.
5.7 Maintaining a root stack to facilitate garbage collection.
5.8 Representation for tuples in the heap.
5.9 The compiler’s interface to the garbage collector.
5.10 Output of the expose-allocation pass, minus all of the has-type forms.
5.11 The C2 language, extending C1 (Figure 4.5) with vectors.
5.12 Output of uncover-locals for the running example.
5.13 The x862 language (extends x861 of Figure 4.4).
5.14 Output of the select-instructions pass.
5.15 Output of the print-x86 pass.
5.16 Diagram of the passes for R3, a language with tuples.
input language and add or modify passes to handle the new feature [Ghuloum, 2006]. In this way, the students see how the language features motivate aspects of the compiler design.
After graduating from Indiana University in 2005, Jeremy went on to teach at the University of Colorado. He adapted the nanopass and incremental approaches to compiling a subset of the Python language [Siek and Chang, 2012]. Python and Scheme are quite different on the surface but there is a large overlap in the compiler techniques required for the two languages. Thus, Jeremy was able to teach much of the same content from the Indiana compiler course. He very much enjoyed teaching the course organized in this way, and even better, many of the students learned a lot and got excited about compilers.
Jeremy returned to teach at Indiana University in 2013. In his absence
the compiler course had switched from the front-to-back organization to
a back-to-front organization. Seeing how well the incremental approach
worked at Colorado, he started porting and adapting the structure of the
Colorado course back into the land of Scheme. In the meantime Indiana had
moved on from Scheme to Racket, so the course is now about compiling a
subset of Racket (and Typed Racket) to the x86 assembly language. The
compiler is implemented in Racket 7.1 [Flatt and PLT, 2014].
This is the textbook for the incremental version of the compiler course at Indiana University (Spring 2016 to present), and it is the first open textbook for an Indiana compiler course. With this book we hope to make the Indiana compiler course available to people who have not had the chance to study in Bloomington in person. Many of the compiler design decisions in this book
are drawn from the assignment descriptions of Dybvig and Keep [2010]. We
have captured what we think are the most important topics from Dybvig and
Keep [2010] but we have omitted topics that we think are less interesting
conceptually and we have made simplifications to reduce complexity. In
this way, this book leans more towards pedagogy than towards the absolute
efficiency of the generated code. Also, the book differs in places where we
saw the opportunity to make the topics more fun, such as in relating register
allocation to Sudoku (Chapter 3).
Prerequisites
The material in this book is challenging but rewarding. It is meant to
prepare students for a lifelong career in programming languages. We do
not recommend this book for students who want to dabble in programming
languages.
The book uses the Racket language both for the implementation of the
compiler and for the language that is compiled, so a student should be
proficient with Racket (or Scheme) prior to reading this book. There are
many other excellent resources for learning Scheme and Racket [Dybvig,
1987, Abelson and Sussman, 1996, Friedman and Felleisen, 1996, Felleisen
et al., 2001, 2013, Flatt et al., 2014]. It is helpful but not necessary for the
student to have prior exposure to x86 (or x86-64) assembly language [Intel,
2015], as one might obtain from a computer systems course [Bryant and
O’Hallaron, 2005, 2010]. This book introduces the parts of x86-64 assembly
language that are needed.
Acknowledgments
Many people have contributed to the ideas, techniques, organization, and
teaching of the materials in this book. We especially thank the following
people.
• Kent Dybvig
• Daniel P. Friedman
• Ronald Garcia
• Abdulaziz Ghuloum
• Jay McCarthy
• Dipanwita Sarkar
• Andrew Keep
• Oscar Waddell
• Michael Wollowski
Jeremy G. Siek
http://homes.soic.indiana.edu/jsiek
1 Preliminaries
In this chapter, we review the basic tools that are needed for implementing a compiler. We use abstract syntax trees (ASTs), which refer to data structures in the compiler’s memory, rather than programs as they are stored on disk, in concrete syntax. ASTs can be represented in many different ways, depending on the programming language used to write the compiler. Because this book uses Racket (http://racket-lang.org), a descendant of Lisp, we use S-expressions to represent programs (Section 1.1). We use grammars to define programming languages (Section 1.2) and pattern matching to inspect individual nodes in an AST (Section 1.3). We use recursion to construct and deconstruct entire ASTs (Section 1.4). This chapter provides a brief introduction to these ideas.
1.1 Abstract Syntax Trees and S-expressions

The primary data structure that is commonly used for representing programs is the abstract syntax tree (AST). When considering some part of a program, a compiler needs to ask what kind of part it is and what sub-parts it has. For example, the program on the left, represented by an S-expression, corresponds to the AST on the right.
We shall use the standard terminology for trees: each circle above is called
a node. The arrows connect a node to its children (which are also nodes).
The top-most node is the root. Every node except for the root has a parent
(the node it is the child of). If a node has no children, it is a leaf node.
Otherwise it is an internal node.
Recall that a symbolic expression (S-expression) is either
1. an atom, or
2. a pair of two S-expressions, written (e1 . e2), where e1 and e2 are each an S-expression.
An atom can be a symbol, such as ‘hello, a number, the null value ’(), etc. We can create an S-expression in Racket simply by writing a backquote (called a quasiquote in Racket), followed by the textual representation of the S-expression. It is quite common to use S-expressions to represent a list, such as a, b, c in the following way:
‘(a . (b . (c . ())))
Each element of the list is in the first slot of a pair, and the second slot
is either the rest of the list or the null value, to mark the end of the list.
Such lists are so common that Racket provides special notation for them
that removes the need for the periods and so many parentheses:
‘(a b c)
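To make the pair encoding concrete, here is a hedged sketch in Python (not from the book; the helper names from_list and to_list are mine) that models an S-expression list as nested pairs ending in the null value, with the empty tuple standing in for ’():

```python
# Hedged sketch: S-expression lists as nested pairs. ('a', ('b', ('c', ())))
# plays the role of ‘(a . (b . (c . ()))), and () plays the role of ’().

def from_list(items):
    """Build the nested-pair representation of a proper list."""
    sexp = ()
    for item in reversed(items):
        sexp = (item, sexp)
    return sexp

def to_list(sexp):
    """Flatten nested pairs back into a Python list."""
    items = []
    while sexp != ():
        head, sexp = sexp
        items.append(head)
    return items
```

For example, from_list(['a', 'b', 'c']) produces ('a', ('b', ('c', ()))), mirroring how ‘(a b c) abbreviates the dotted-pair form.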
For another example, an S-expression to represent the AST (1.1) is created by the following Racket expression:

‘(+ (read) (- 8))

The root AST node has operation ‘+ and its two children are ‘(read) and ‘(- 8), just as in the diagram (1.1).
To build larger S-expressions one often needs to splice together several smaller S-expressions. Racket provides the comma operator to splice an S-expression into a larger one. For example, instead of creating the S-expression for AST (1.1) all at once, we could have first created an S-expression for AST (1.5) and then spliced that into the addition S-expression.
(define ast1.4 ‘(- 8))
(define ast1.1 ‘(+ (read) ,ast1.4))
In general, the Racket expression that follows the comma (splice) can be
any expression that computes an S-expression.
When deciding how to compile program (1.1), we need to know that
the operation associated with the root node is addition and that it has two
children: read and a negation. The AST data structure directly supports
these queries, as we shall see in Section 1.3, and hence is a good choice for
use in compilers. In this book, we will often write down the S-expression
representation of a program even when we really have in mind the AST
because the S-expression is more concise. We recommend that, in your
mind, you always think of programs as abstract syntax trees.
1.2 Grammars
A programming language can be thought of as a set of programs. The
set is typically infinite (one can always create larger and larger programs),
so one cannot simply describe a language by listing all of the programs
in the language. Instead we write down a set of rules, a grammar, for
building programs. We shall write our rules in a variant of Backus-Naur
Form (BNF) [Backus et al., 1960, Knuth, 1964]. As an example, we describe
a small language, named R0, of integers and arithmetic operations. The first rule says that any integer is an expression, exp, in the language:

exp ::= int    (1.2)
Each rule has a left-hand-side and a right-hand-side. The way to read a rule
is that if you have all the program parts on the right-hand-side, then you
can create an AST node and categorize it according to the left-hand-side.
A name such as exp that is defined by the grammar rules is a non-terminal.
The name int is also a non-terminal; however, we do not define int because the reader already knows what an integer is. Further, we make the
8 1. PRELIMINARIES
simplifying design decision that all of the languages in this book only handle machine-representable integers. On most modern machines this corresponds to integers represented with 64 bits, i.e., in the range −2^63 to 2^63 − 1. However, we restrict this range further to match the Racket fixnum datatype, which allows 63-bit integers on a 64-bit machine.
The second grammar rule is the read operation that receives an input integer from the user of the program.

exp ::= (read)    (1.3)
The third rule says that, given an exp node, you can build another exp
node by negating it.
exp ::= (- exp) (1.4)
Symbols such as - in typewriter font are terminal symbols and must literally
appear in the program for the rule to be applicable.
We can apply the rules to build ASTs in the R0 language. For example,
by rule (1.2), 8 is an exp, then by rule (1.4), the following AST is an exp.
(- 8)    (1.5)
The fourth rule says that, given two exp nodes, you can build another exp node by adding them:

exp ::= (+ exp exp)    (1.6)

Now we can see that the AST (1.1) is an exp in R0. We know that (read) is an exp by rule (1.3) and we have shown that (- 8) is an exp, so we can apply rule (1.6) to show that (+ (read) (- 8)) is an exp in the R0 language.
If you have an AST for which the above rules do not apply, then the
AST is not in R0 . For example, the AST (- (read) (+ 8)) is not in R0
because there are no rules for + with only one argument, nor for - with two
arguments. Whenever we define a language with a grammar, we implicitly
mean for the language to be the smallest set of programs that are justified
by the rules. That is, the language only includes those programs that the
rules allow.
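A grammar like this can be turned directly into a recognizer, one check per rule. The following is a hedged Python sketch (not from the book; S-expressions are modeled as nested tuples and integers, and the name is_exp is mine):

```python
# Hedged sketch: a recognizer for R0 expressions, one branch per
# grammar rule. S-expressions are tuples whose first element names
# the operator.

def is_exp(ast):
    """Return True if ast is an exp according to the R0 grammar."""
    if isinstance(ast, int):                  # exp ::= int
        return True
    if ast == ('read',):                      # exp ::= (read)
        return True
    if isinstance(ast, tuple) and len(ast) == 2 and ast[0] == '-':
        return is_exp(ast[1])                 # exp ::= (- exp)
    if isinstance(ast, tuple) and len(ast) == 3 and ast[0] == '+':
        return is_exp(ast[1]) and is_exp(ast[2])  # exp ::= (+ exp exp)
    return False
```

Note that the sketch rejects (- (read) (+ 8)) for exactly the reason given above: no branch accepts + with one argument or - with two.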
The last grammar rule for R0 states that there is a program node to mark the top of the whole program:

R0 ::= (program exp)

1.3 Pattern Matching

Racket provides the match form for inspecting the parts of an S-expression.
(match ast1.1
  [‘(,op ,child1 ,child2)
   (print op) (newline)
   (print child1) (newline)
   (print child2)])

which prints the following output:

’+
’(read)
’(- 8)
The match form takes AST (1.1) and binds its parts to the three variables
op, child1, and child2. In general, a match clause consists of a pattern and a body. The pattern is a quoted S-expression that may contain pattern variables (each one preceded by a comma). The pattern is not the same thing
as a quasiquote expression used to construct ASTs, however, the similarity
is intentional: constructing and deconstructing ASTs uses similar syntax.
While the pattern uses a restricted syntax, the body of the match clause
may contain any Racket code whatsoever.
A match form may contain several clauses, as in the following function leaf? that recognizes when an R0 node is a leaf:

(define (leaf? arith)
  (match arith
    [(? fixnum?) #t]
    [‘(read) #t]
    [‘(- ,c1) #f]
    [‘(+ ,c1 ,c2) #f]))

The match proceeds through the clauses in order, checking whether the pattern can match the input S-expression. The body of the first clause that matches is executed.
The output of leaf? for several S-expressions is shown below. In the leaf? match, we see another form of pattern: (? fixnum?) applies the predicate fixnum? to the input S-expression to see if it is a machine-representable integer.
(leaf? ‘(read)) #t
(leaf? ‘(- 8)) #f
(leaf? ‘(+ (read) (- 8))) #f
1.4 Recursion
Sometimes such a trick will save a few lines of code, especially when it comes to the program wrapper. Yet this style is generally not recommended because it can get you into trouble. For instance, the above function is subtly wrong: (R0? ‘(program (program 3))) will return true, when it should return false.
1. This principle of structuring code according to the data definition is advocated in the book How to Design Programs: http://www.ccs.neu.edu/home/matthias/HtDP2e/.
(define (interp-exp e)
(match e
[(? fixnum?) e]
[‘(read)
(let ([r (read)])
(cond [(fixnum? r) r]
[else (error ’interp-R0 "input not an integer" r)]))]
[‘(- ,e1) (fx- 0 (interp-exp e1))]
[‘(+ ,e1 ,e2) (fx+ (interp-exp e1) (interp-exp e2))]
))
(define (interp-R0 p)
(match p
[‘(program ,e) (interp-exp e)]))
1.5 Interpreters
The meaning, or semantics, of a program is typically defined in the specification of the language. For example, the Scheme language is defined in
the report by Sperber et al. [2009]. The Racket language is defined in its
reference manual [Flatt and PLT, 2014]. In this book we use an interpreter
to define the meaning of each language that we consider, following Reynolds's advice in this regard [Reynolds, 1972]. Here we warm up by writing an interpreter for the R0 language, which serves as a second example of structural
recursion. The interp-R0 function is defined in Figure 1.2. The body of the
function is a match on the input program p and then a call to the interp-exp
helper function, which in turn has one match clause per grammar rule for
R0 expressions.
Let us consider the result of interpreting a few R0 programs. The following program simply adds two integers.
(+ 10 32)
The result is 42, as you might have expected. Here we have written the program in concrete syntax, whereas the parsed abstract syntax would be slightly different: (program (+ 10 32)).
The next example demonstrates that expressions may be nested within
each other, in this case nesting several additions and negations.
(+ 10 (- (+ 12 20)))
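Such results can be checked by hand with a transliteration of interp-exp. A hedged Python sketch (mine, not the book's code; (read) is stubbed out because these examples take no input):

```python
# Hedged Python transliteration of the book's interp-exp, for checking
# example results. (read) raises here since these programs take no input.

def interp_exp(e):
    if isinstance(e, int):
        return e
    op = e[0]
    if op == 'read':
        raise NotImplementedError('no input in these examples')
    if op == '-':
        return 0 - interp_exp(e[1])
    if op == '+':
        return interp_exp(e[1]) + interp_exp(e[2])
    raise ValueError(e)
```

Evaluating the nested example gives 10 + (−(12 + 20)) = −22.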
(1.7) [Diagram: the compile function translates program P1 into P2; interpreting P1 with interp-L1 on input i and interpreting P2 with interp-L2 on input i must produce the same output o.]
In the next section we see our first example of a compiler, which is another
example of structural recursion.
(define (pe-add r1 r2)
  (cond [(and (fixnum? r1) (fixnum? r2)) (fx+ r1 r2)]
        [else ‘(+ ,r1 ,r2)]))

(define (pe-neg r)
  (cond [(fixnum? r) (fx- 0 r)]
        [else ‘(- ,r)]))
(define (pe-arith e)
(match e
[(? fixnum?) e]
[‘(read) ‘(read)]
[‘(- ,e1)
(pe-neg (pe-arith e1))]
[‘(+ ,e1 ,e2)
(pe-add (pe-arith e1) (pe-arith e2))]))
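The pe-add helper called above plays the same role for addition that pe-neg plays for negation: compute when both arguments are integers, otherwise residualize. A hedged Python sketch of the whole partial evaluator (mine, not the book's code):

```python
# Hedged sketch of the partial evaluator: fold constant subexpressions,
# leave anything involving (read) as residual syntax.

def pe_neg(r):
    return -r if isinstance(r, int) else ('-', r)

def pe_add(r1, r2):
    if isinstance(r1, int) and isinstance(r2, int):
        return r1 + r2
    return ('+', r1, r2)

def pe_arith(e):
    if isinstance(e, int):
        return e
    if e == ('read',):
        return e
    if e[0] == '-':
        return pe_neg(pe_arith(e[1]))
    if e[0] == '+':
        return pe_add(pe_arith(e[1]), pe_arith(e[2]))
    raise ValueError(e)
```

For instance, partially evaluating (+ (read) (- (+ 5 3))) folds the constant part to −8 while keeping the (read) residual.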
integer n.
[Diagram: the compile function translates R1 program P1 into x86 program P2; interp-R1 on P1 and interp-x86 on P2 must produce the same result.]
reg ::= rsp | rbp | rax | rbx | rcx | rdx | rsi | rdi |
r8 | r9 | r10 | r11 | r12 | r13 | r14 | r15
arg ::= $int | %reg | int(%reg)
instr ::= addq arg, arg | subq arg, arg | negq arg | movq arg, arg |
callq label | pushq arg | popq arg | retq | label: instr
prog ::= .globl main
main: instr +
.globl main
main:
movq $10, %rax
addq $32, %rax
retq
The move instruction movq s, d reads from s and stores the result in d. The callq label instruction executes the procedure specified by the label.
Figure 2.4 depicts an x86 program that is equivalent to (+ 10 32). The
.globl directive says that the main procedure is externally visible, which is necessary so that the operating system can call it. The label main: indicates
the beginning of the main procedure which is where the operating system
starts executing this program. The instruction movq $10, %rax puts 10 into
register rax. The following instruction addq $32, %rax adds 32 to the 10 in
rax and puts the result, 42, back into rax.
The last instruction, retq, finishes the main function by returning the
integer in rax to the operating system. The operating system interprets this
integer as the program’s exit code. By convention, an exit code of 0 indicates
the program was successful, and all other exit codes indicate various errors.
Nevertheless, we return the result of the program as the exit code.
Unfortunately, x86 varies in a couple of ways depending on the operating system for which it is assembled. The code examples shown here are correct on Linux and most Unix-like platforms, but when assembled on Mac OS X, labels like main must be prefixed with an underscore, as in _main.
We exhibit the use of memory for storing intermediate results in the next
example. Figure 2.5 lists an x86 program that is equivalent to (+ 52 (- 10)).
start:
movq $10, -8(%rbp)
negq -8(%rbp)
movq -8(%rbp), %rax
addq $52, %rax
jmp conclusion
.globl main
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
jmp start
conclusion:
addq $16, %rsp
popq %rbp
retq
This program uses a region of memory called the procedure call stack (or
stack for short). The stack consists of a separate frame for each procedure
call. The memory layout for an individual frame is shown in Figure 2.6. The
register rsp is called the stack pointer and points to the item at the top of
the stack. The stack grows downward in memory, so we increase the size of
the stack by subtracting from the stack pointer. The frame size is required
to be a multiple of 16 bytes. In the context of a procedure call, the return
address is the next instruction on the caller side that comes after the call
instruction. During a function call, the return address is pushed onto the
stack. The register rbp is the base pointer which serves two purposes: 1)
it saves the location of the stack pointer for the calling procedure and 2) it
is used to access variables associated with the current procedure. The base
pointer of the calling procedure is pushed onto the stack after the return
address. We number the variables from 1 to n. Variable 1 is stored at
address −8(%rbp), variable 2 at −16(%rbp), etc.
Getting back to the program in Figure 2.5, the first three instructions
are the typical prelude for a procedure. The instruction pushq %rbp saves
the base pointer for the procedure that called the current one onto the stack
and subtracts 8 from the stack pointer. The second instruction movq %rsp,
%rbp changes the base pointer to the top of the stack. The instruction subq
Position     Contents
8(%rbp)      return address
0(%rbp)      old rbp
-8(%rbp)     variable 1
-16(%rbp)    variable 2
...          ...
0(%rsp)      variable n

Figure 2.6: Memory layout of a frame.
$16, %rsp moves the stack pointer down to make enough room for storing
variables. This program just needs one variable (8 bytes) but because the
frame size is required to be a multiple of 16 bytes, it rounds to 16 bytes.
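The offset and rounding rules can be written down directly. A hedged sketch (the function names are mine, not the book's):

```python
# Hedged sketch: stack-frame bookkeeping. Variable i (1-indexed) lives
# at offset -8*i from rbp, and the frame size is 8 bytes per variable
# rounded up to a multiple of 16.

def variable_home(i):
    """Stack location of variable i, in x86 address syntax."""
    return f"-{8 * i}(%rbp)"

def frame_size(num_vars):
    """Bytes to subtract from rsp: round 8*num_vars up to a multiple of 16."""
    return ((8 * num_vars + 15) // 16) * 16
```

So one variable needs a 16-byte frame (8 rounded up), two variables still fit in 16 bytes, and three variables require 32.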
The instructions under the label start carry out the work of computing (+ 52 (- 10)). The first instruction movq $10, -8(%rbp) stores 10 in variable 1. The instruction negq -8(%rbp) changes variable 1 to −10. Then movq -8(%rbp), %rax places the contents of variable 1 in the register rax, and addq $52, %rax adds 52 to it, at which point rax contains 42.
The three instructions under the label conclusion are the typical finale
of a procedure. The first two are necessary to get the state of the machine
back to where it was at the beginning of the procedure. The addq $16,
%rsp instruction moves the stack pointer back to point at the old base
pointer. The amount added here needs to match the amount that was
subtracted in the prelude of the procedure. Then popq %rbp returns the old
base pointer to rbp and adds 8 to the stack pointer. The last instruction,
retq, jumps back to the procedure that called this one and adds 8 to the
stack pointer, which returns the stack pointer to where it was prior to the
procedure call.
The compiler will need a convenient representation for manipulating x86 programs, so we define an abstract syntax for x86 in Figure 2.7. We refer to this language as x860, with a subscript 0, because later we introduce extended versions of this assembly language. The main difference compared to the concrete syntax of x86 (Figure 2.3) is that it does not allow labeled instructions to appear anywhere; instead, instructions are organized into groups called blocks, and a label is associated with every block, which is why the program form includes an association list mapping labels to blocks. The reason for this organization becomes apparent in Chapter 4.
register ::= rsp | rbp | rax | rbx | rcx | rdx | rsi | rdi |
r8 | r9 | r10 | r11 | r12 | r13 | r14 | r15
arg ::= (int int) | (reg register) | (deref register int)
instr ::= (addq arg arg) | (subq arg arg) | (movq arg arg) | (retq)
| (negq arg) | (callq label) | (pushq arg) | (popq arg)
block ::= (block info instr+)
x860 ::= (program info ((label . block)+))
(a) x86 arithmetic instructions typically have two arguments and update
the second argument in place. In contrast, R1 arithmetic operations
take two arguments and produce a new value. An x86 instruction may
have at most one memory-accessing argument. Furthermore, some
instructions place special restrictions on their arguments.
(d) An R1 program can have any number of variables whereas x86 has 16 registers and the procedure call stack.
(e) Variables in R1 can overshadow other variables with the same name.
The registers and memory locations of x86 all have unique names or
addresses.
Each of these steps is called a pass of the compiler, because each step traverses (passes over) the AST of the program. We begin by giving a sketch of how we might implement each pass, and give them names. We shall then
figure out an ordering of the passes and the input/output language for each
pass. The very first pass has R1 as its input language and the last pass has
x86 as its output language. In between we can choose whichever language
is most convenient for expressing the output of each pass, whether that
be R1 , x86, or new intermediate languages of our own design. Finally, to
implement the compiler, we shall write one function, typically a structurally recursive function, per pass.
Pass uniquify This pass deals with the shadowing of variables by renaming
every variable to a unique name, so that shadowing no longer occurs.
The next question is: in what order should we apply these passes? This
question can be a challenging one to answer because it is difficult to know
ahead of time which orders will be better (easier to implement, produce more
efficient code, etc.) so often some trial-and-error is involved. Nevertheless,
we can try to plan ahead and make educated choices regarding the orderings.
Let us consider the ordering of uniquify and remove-complex-opera*.
The assignment of subexpressions to temporary variables involves introducing new variables and moving subexpressions, which might change the shadowing of variables and inadvertently change the behavior of the program.
But if we apply uniquify first, this will not be an issue. Of course, this
[Figure: overview of the passes. R1 →uniquify→ R1 →remove-complex-opera*→ R1 →explicate-control→ C0 →uncover-locals→ C0 →select-instructions→ x86*0 →assign-homes→ x86*0 →patch-instructions→ x860 →print-x86→ x86†0]
the program. At the start of the program, these variables are uninitialized
(they contain garbage) and each variable becomes initialized on its first
assignment.
Exercise 2. Complete the uniquify pass by filling in the blanks, that is,
implement the clauses for variables and for the let construct.
by the ending .rkt. Use the interp-tests function (Appendix 12.2) from
utilities.rkt to test your uniquify pass on the example programs.
(program ()
(let ([tmp.1 42])
(let ([a tmp.1])
(let ([tmp.2 a])
(let ([b tmp.2])
b)))))
The read operation does not have a direct counterpart in x86 assembly,
so we have instead implemented this functionality in the C language, with
the function read_int in the file runtime.c. In general, we refer to all of
the functionality in this file as the runtime system, or simply the runtime
for short. When compiling your generated x86 assembly code, you will need
to compile runtime.c to runtime.o (an “object file”, using gcc option -c)
and link it into the final executable. For our purposes of code generation,
all you need to do is translate an assignment of read to some variable lhs
(for left-hand side) into a call to the read_int function followed by a move
from rax to the left-hand side. The move from rax is needed because the
return value from read_int goes into rax, as is the case in general.
(assign lhs (read))  ⇒  (callq read_int)
                        (movq (reg rax) (var lhs))
There are two cases for the tail non-terminal: return and seq. Regarding (return e), we recommend treating it as an assignment to the rax register followed by a jump to the conclusion of the program (so the conclusion needs to be labeled). For (seq s t), we simply process the statement s
and tail t recursively and append the resulting instructions.
Exercise 5. Implement the select-instructions pass and test it on all of the example programs that you created for the previous passes and create three new example programs that are designed to exercise all of the interesting code in this pass. Use the interp-tests function (Appendix 12.2) from utilities.rkt to test your passes on the example programs.
Exercise 8. Implement the print-x86 pass and test it on all of the example programs that you created for the previous passes. Use the compiler-tests function (Appendix 12.2) from utilities.rkt to test your complete compiler on the example programs.
3 Register Allocation
call that they did before the call. The callee can freely use any of the caller-saved registers. However, if the callee wants to use a callee-saved register, the callee must arrange to put the original value back in the register prior to returning to the caller, which is usually accomplished by saving and restoring the value from the stack.
1 (block () {}
2 (movq (int 1) (var v)) {v}
3 (movq (int 46) (var w)) {v, w}
4 (movq (var v) (var x)) {w, x}
5 (addq (int 7) (var x)) {w, x}
6 (movq (var x) (var y)) {w, x, y}
7 (addq (int 4) (var y)) {w, x, y}
8 (movq (var x) (var z)) {w, y, z}
9 (addq (var w) (var z)) {y, z}
10 (movq (var y) (var t.1)) {z, t.1}
11 (negq (var t.1)) {z, t.1}
12 (movq (var z) (reg rax)) {t.1}
13 (addq (var t.1) (reg rax)) {}
14 (jmp conclusion)) {}
is less than ideal for two reasons. First, it can be rather expensive because
it takes O(n2 ) time to look at every pair in a set of n live variables. Second,
there is a special case in which two variables that are live at the same time
do not actually interfere with each other: when they both contain the same
value because we have assigned one to the other.
A better way to compute the interference graph is to focus on the writes.
That is, for each instruction, create an edge between the variable being
written to and all the other live variables. (One should not create self
edges.) For a callq instruction, think of all caller-saved registers as being
written to, so an edge must be added between every live variable and every
caller-saved register. For movq, we deal with the above-mentioned special
case by not adding an edge between a live variable v and destination d if v
matches the source of the move. So we have the following three rules.
1. If instruction Ik is an arithmetic instruction such as (addq s d), then add the edge (d, v) for every v ∈ Lafter (k) unless v = d.

2. If instruction Ik is of the form (callq label), then add an edge (r, v) for every caller-saved register r and every variable v ∈ Lafter (k).

3. If instruction Ik is a move: (movq s d), then add the edge (d, v) for every v ∈ Lafter (k) unless v = d or v = s.
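These rules translate almost directly into code. A hedged Python sketch (the representation is mine: each entry pairs an instruction tuple such as ('movq', src, dst) with its live-after set, and variables and registers are strings):

```python
# Hedged sketch of interference-graph construction from the three rules.
# Edges are undirected, so each is stored as a frozenset of its endpoints.

CALLER_SAVED = ['rax', 'rcx', 'rdx', 'rsi', 'rdi', 'r8', 'r9', 'r10', 'r11']

def interference(instrs):
    edges = set()
    for instr, live_after in instrs:
        if instr[0] == 'movq':               # rule 3: skip the move's source
            _, s, d = instr
            for v in live_after:
                if v != d and v != s:
                    edges.add(frozenset((d, v)))
        elif instr[0] == 'callq':            # rule 2: caller-saved registers
            for r in CALLER_SAVED:
                for v in live_after:
                    if v != r:
                        edges.add(frozenset((r, v)))
        else:                                # rule 1: writes its destination
            d = instr[-1]
            for v in live_after:
                if v != d:
                    edges.add(frozenset((d, v)))
    return edges
```

Applied to lines 4 and 5 of the running example (movq then addq into x, with live-after {w, x}), the sketch produces the single edge between x and w, matching the list below.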
Working from the top to the bottom of Figure 3.2, we obtain the following interference edges for the instruction at each line number.
Line 2: no interference,
Line 3: w interferes with v,
Line 4: x interferes with w,
Line 5: x interferes with w,
Line 6: y interferes with w,
Line 7: y interferes with w and x,
Line 8: z interferes with w and y,
Line 9: z interferes with y,
Line 10: t.1 interferes with z,
Line 11: t.1 interferes with z,
Line 12: no interference,
Line 13: no interference,
Line 14: no interference.
[Figure 3.3: the interference graph of the running example, with vertices v, w, x, y, z, and t.1.]
1 2 3
3 1 2
2 3 1
Figure 3.4: A Sudoku game board and the corresponding colored graph.
If you can color the remaining vertices in the graph with the nine colors, then
you have also solved the corresponding game of Sudoku. Figure 3.4 shows
an initial Sudoku game board and the corresponding graph with colored
vertices. We map the Sudoku number 1 to blue, 2 to yellow, and 3 to
red. We only show edges for a sampling of the vertices (those that are
colored) because showing edges for all of the vertices would make the graph
unreadable.
Given that Sudoku is an instance of graph coloring, one can use Sudoku strategies to come up with an algorithm for allocating registers. For example, one of the basic techniques for Sudoku is called Pencil Marks. The idea is that you use a process of elimination to determine what numbers no longer make sense for a square, and write down those numbers in the square (writing very small). For example, if the number 1 is assigned to a square, then by process of elimination, you can write the pencil mark 1 in all the squares in the same row, column, and region. Many Sudoku computer games provide automatic support for Pencil Marks. The Pencil Marks technique corresponds to the notion of color saturation due to Brélaz [1979].
The saturation of a vertex, in Sudoku terms, is the set of colors that are no
longer available. In graph terminology, we have the following definition:
Algorithm: DSATUR
Input: a graph G
Output: an assignment color[v] for each vertex v ∈ G
W ← vertices(G)
while W ≠ ∅ do
pick a vertex u from W with the highest saturation,
breaking ties randomly
find the lowest color c that is not in {color[v] : v ∈ adjacent(u)}
color[u] ← c
W ← W − {u}
write down that number! But what if there are no squares with only one
possibility left? One brute-force approach is to just make a guess. If that
guess ultimately leads to a solution, great. If not, backtrack to the guess and
make a different guess. One good thing about Pencil Marks is that it reduces
the degree of branching in the search tree. Nevertheless, backtracking can
be horribly time consuming. One way to reduce the amount of backtrack-
ing is to use the most-constrained-first heuristic. That is, when making a
guess, always choose a square with the fewest possibilities left (the vertex
with the highest saturation). The idea is that choosing highly constrained
squares earlier rather than later is better because later there may not be
any possibilities.
In some sense, register allocation is easier than Sudoku because we can
always cheat and add more numbers by mapping variables to the stack. We
say that a variable is spilled when we decide to map it to a stack location. We
would like to minimize the time needed to color the graph, and backtracking
is expensive. Thus, it makes sense to keep the most-constrained-first heuris-
tic but drop the backtracking in favor of greedy search (guess and just keep
going). Figure 3.5 gives the pseudo-code for this simple greedy algorithm
for register allocation based on saturation and the most-constrained-first
heuristic, which is roughly equivalent to the DSATUR algorithm of Brélaz
[1979] (also known as saturation degree ordering [Gebremedhin, 1999, Al-
Omari and Sabri, 2006]). Just as in Sudoku, the algorithm represents colors
with integers, with the first k colors corresponding to the k registers in a
given machine and the rest of the integers corresponding to stack locations.
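The book's passes are written in Racket; as a language-neutral illustration, here is a sketch of this greedy saturation-based coloring in Python, run on the interference edges computed for the running example (the v–w edge comes from the earlier lines of the walkthrough). The dictionary representation is an assumption for illustration, not the book's data structure.

```python
def color_graph(adjacent, num_registers):
    """Greedy DSATUR-style coloring: repeatedly color a most saturated
    vertex with the lowest color not used by its neighbors. Colors that
    are >= num_registers correspond to stack locations (spills)."""
    color = {}
    saturation = {v: set() for v in adjacent}
    worklist = set(adjacent)
    while worklist:
        # most-constrained-first: the vertex with the largest saturation
        u = max(worklist, key=lambda v: len(saturation[v]))
        c = 0
        while c in saturation[u]:        # lowest color not taken by a neighbor
            c += 1
        color[u] = c
        for v in adjacent[u]:
            saturation[v].add(c)
        worklist.remove(u)
    return color

# Interference graph of the running example.
adjacent = {
    'v':   {'w'},
    'w':   {'v', 'x', 'y', 'z'},
    'x':   {'w', 'y'},
    'y':   {'w', 'x', 'z'},
    'z':   {'w', 'y', 't.1'},
    't.1': {'z'},
}
coloring = color_graph(adjacent, num_registers=3)
```

Because ties are broken arbitrarily, the exact colors may differ from the walkthrough, but the result is always a proper coloring of the interference graph.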
With this algorithm in hand, let us return to the running example and
consider how to color the interference graph in Figure 3.3. We shall not use
register rax for register allocation because we use it to patch instructions,
so we remove that vertex from the graph. Initially, all of the vertices are
not yet colored and they are unsaturated, so we annotate each of them with
a dash for their color and an empty set for the saturation.
v : −, {}    w : −, {}    x : −, {}
y : −, {}    z : −, {}    t.1 : −, {}

We then repeatedly pick a most saturated vertex, assign it the lowest color
that is not in its saturation set, and add that color to the saturation sets
of its neighbors, until every vertex has been colored.
Diagram of the passes for R1 with register allocation: uniquify,
remove-complex-opera*, explicate-control, uncover-locals,
select-instructions, uncover-live, build-interference, allocate-registers,
patch-instructions, and print-x86.
Test your updated compiler by creating new example programs that
exercise all aspects of the register allocation algorithm, such as forcing
variables to be spilled to the stack.
Recall that the print-x86 pass generates the prelude and conclusion instructions
for the main function. The prelude saves the values in rbp and rsp and the
conclusion restores those values to rbp and rsp. The reason for this is
that our main function must adhere to the x86 calling conventions that we
described in Section 3.1. In addition, the main function needs to restore
(in the conclusion) any callee-saved registers that get used during register
allocation. The simplest approach is to save and restore all of the callee-
saved registers. The more efficient approach is to keep track of which callee-
saved registers were used and only save and restore them. Either way, make
sure to take this use of stack space into account when you are calculating
the size of the frame. Also, don’t forget that the size of the frame needs to
be a multiple of 16 bytes.
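The frame-size arithmetic can be sketched as follows (Python, with made-up names; this ignores the pushes of rbp and the return address, which the prelude handles separately): space for spilled variables plus space for saved callee-saved registers, rounded up to a multiple of 16 bytes.

```python
def frame_size(num_spilled, num_callee_saved):
    # 8 bytes per spilled variable and per saved callee-saved register,
    # rounded up so the total frame stays 16-byte aligned
    bytes_needed = 8 * num_spilled + 8 * num_callee_saved
    return (bytes_needed + 15) // 16 * 16
```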
Using the same assignment that was produced by the register allocator described
in the last section, we get the following program.
(block ()
  (movq (int 1) (var v))
  (movq (int 46) (var w))
  (movq (var v) (var x))
  (addq (int 7) (var x))
  (movq (var x) (var y))
  (addq (int 4) (var y))
  (movq (var x) (var z))
  (addq (var w) (var z))
  (movq (var y) (var t.1))
  (negq (var t.1))
  (movq (var z) (reg rax))
  (addq (var t.1) (reg rax))
  (jmp conclusion))
⇒
(block ()
  (movq (int 1) (reg rbx))
  (movq (int 46) (reg rdx))
  (movq (reg rbx) (reg rcx))
  (addq (int 7) (reg rcx))
  (movq (reg rcx) (reg rbx))
  (addq (int 4) (reg rbx))
  (movq (reg rcx) (reg rcx))
  (addq (reg rdx) (reg rcx))
  (movq (reg rbx) (reg rbx))
  (negq (reg rbx))
  (movq (reg rcx) (reg rax))
  (addq (reg rbx) (reg rax))
  (jmp conclusion))
While this allocation is quite good, we could do better. For example,
the variables v and x ended up in different registers, but if they had been
placed in the same register, then the move from v to x could be removed.
We say that two variables p and q are move related if they participate
together in a movq instruction, that is, movq p, q or movq q, p. When the
register allocator chooses a color for a variable, it should prefer a color that
has already been used for a move-related variable (assuming that they do
not interfere). Of course, this preference should not override the preference
for registers over stack locations, but should only be used as a tie breaker
when choosing between registers or when choosing between stack locations.
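To make the tie-breaking rule concrete, here is a Python sketch (the names are ours) of choosing a color with move biasing: take the lowest available color, but switch to a color of a move-related variable when that color is available and of the same kind, register versus stack location.

```python
def choose_color(saturation, move_related_colors, num_registers):
    """saturation: colors already taken by interfering neighbors.
    move_related_colors: colors already assigned to move-related variables."""
    c = 0
    while c in saturation:               # lowest available color
        c += 1
    for b in sorted(move_related_colors - saturation):
        # only bias within the same storage kind: register vs. stack
        if (b < num_registers) == (c < num_registers):
            return b
    return c
```

Note that the bias never overrides the preference for registers over stack locations: a move-related stack color is ignored when a register is still available.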
We recommend that you represent the move relationships in a graph,
similar to how we represented interference. The following is the move graph
for our running example.
Its edges connect the move-related pairs v–x, x–y, x–z, and y–t.1, one edge
per movq between variables in the program.
Now we replay the graph coloring, this time using the move graph to break
ties in favor of colors already assigned to move-related variables. With this
bias, v and x receive the same register, so the move from v to x can be
removed, resulting in the following program.
(block ()
(movq (int 1) (reg rcx))
(movq (int 46) (reg rbx))
(addq (int 7) (reg rcx))
(movq (reg rcx) (reg rdx))
(addq (int 4) (reg rdx))
(addq (reg rbx) (reg rcx))
(movq (reg rdx) (reg rbx))
(negq (reg rbx))
(movq (reg rcx) (reg rax))
(addq (reg rbx) (reg rax))
(jmp conclusion))
The R0 and R1 languages only had a single kind of value, the integers. In
this chapter we add a second kind of value, the Booleans, to create the
R2 language. The Boolean values true and false are written #t and #f
respectively in Racket. We also introduce several operations that involve
Booleans (and, not, eq?, <, etc.) and the conditional if expression. With
the addition of if expressions, programs can have non-trivial control flow
which has an impact on several parts of the compiler. Also, because we now
have two kinds of values, we need to worry about programs that apply an
operation to the wrong kind of value, such as (not 1).
There are two language design options for such situations. One option
is to signal an error and the other is to provide a wider interpretation of
the operation. The Racket language uses a mixture of these two options,
depending on the operation and the kind of value. For example, the result
of (not 1) in Racket is #f because Racket treats non-zero integers like #t.
On the other hand, (car 1) results in a run-time error in Racket stating
that car expects a pair.
The Typed Racket language makes similar design choices as Racket,
except much of the error detection happens at compile time instead of run
time. Like Racket, Typed Racket accepts and runs (not 1), producing #f.
But in the case of (car 1), Typed Racket reports a compile-time error
because Typed Racket expects the type of the argument to be of the form
(Listof T) or (Pairof T1 T2).
For the R2 language we choose to be more like Typed Racket in that
we shall perform type checking during compilation. In Chapter 8 we study
the alternative choice, that is, how to compile a dynamically typed language
like Racket. The R2 language is a subset of Typed Racket but by no means
includes all of Typed Racket.
Figure 4.1: The syntax of R2 , extending R1 (Figure 2.1) with Booleans and
conditionals.
function and the similar parts into the one match clause shown in Fig-
ure 4.2. We do not use interp-op for the and operation because of the
short-circuiting behavior in the order of evaluation of its arguments.
(define primitives (set '+ '- 'eq? '< '<= '> '>= 'not 'read))
the value returned by the interpreter, that is, if the type checker returns
Integer, then the interpreter should return an integer. Likewise, if the
type checker returns Boolean, then the interpreter should return #t or #f.
Note that if your type checker does not signal an error for a program, then
interpreting that program should not encounter an error. If it does, there is
something wrong with your type checker.
(- e1 e2 ) ⇒ (+ e1 (- e2 ))
By performing these translations near the front-end of the compiler, the later
passes of the compiler will not need to deal with these constructs, making
those passes shorter. On the other hand, sometimes these translations make
it more difficult to generate the most efficient code with respect to the
number of instructions. However, these differences typically do not affect the
number of accesses to memory, which is the primary factor that determines
execution time on modern computer architectures.
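Such translations can be sketched as follows (Python over nested lists standing in for S-expressions; the concrete rewrites for the comparisons are one reasonable choice, not necessarily the ones you must use):

```python
def shrink(e):
    """Bottom-up rewriting of -, and, or, <=, >, >= into core forms."""
    if not isinstance(e, list) or not e:
        return e
    e = [shrink(x) for x in e]             # shrink the sub-expressions first
    op = e[0]
    if op == '-' and len(e) == 3:          # (- e1 e2) => (+ e1 (- e2))
        return ['+', e[1], ['-', e[2]]]
    if op == 'and':                        # keep short-circuiting via if
        return ['if', e[1], e[2], '#f']
    if op == 'or':
        return ['if', e[1], '#t', e[2]]
    if op == '<=':                         # (<= a b) == (not (< b a))
        return ['not', ['<', e[2], e[1]]]
    if op == '>':                          # (> a b) == (< b a)
        return ['<', e[2], e[1]]
    if op == '>=':                         # (>= a b) == (not (< a b))
        return ['not', ['<', e[1], e[2]]]
    return e
```

Note that the rewrites that swap operands also swap their evaluation order; a production pass should bind the first operand with a let to preserve left-to-right evaluation.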
Exercise 14. Implement the pass shrink that removes subtraction, and,
or, <=, >, and >= from the language by translating them to other constructs
in R2 . Create tests to make sure that the behavior of all of these constructs
stays the same after translation.
xorq instruction can be used to encode not. The xorq instruction takes
two arguments, performs a pairwise exclusive-or operation on each bit of its
arguments, and writes the results into its second argument. Recall the truth
table for exclusive-or:
        0   1
    0   0   1
    1   1   0
For example, 0011 XOR 0101 = 0110. Notice that in the row of the table for
the bit 1, the result is the opposite of the second bit. Thus, the not operation
can be implemented by xorq with 1 as the first argument: 0001 XOR 0000 =
0001 and 0001 XOR 0001 = 0000.
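The effect of xorq with 1 on a Boolean encoded in the low bit can be checked directly (Python; encoding Booleans as the integers 0 and 1 mirrors the discussion above):

```python
def encode_not(b):
    # xorq $1, dst flips the low bit, turning 1 into 0 and 0 into 1
    return b ^ 1

# the truth table above, applied to whole bit patterns:
assert 0b0011 ^ 0b0101 == 0b0110
```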
Next we consider the x86 instructions that are relevant for compiling the
comparison operations. The cmpq instruction compares its two arguments
to determine whether one argument is less than, equal, or greater than the
other argument. The cmpq instruction is unusual regarding the order of its
arguments and where the result is placed. The argument order is backwards:
if you want to test whether x < y, then write cmpq y, x. The result of cmpq
is placed in the special EFLAGS register. This register cannot be accessed
directly but it can be queried by a number of instructions, including the set
instruction. The set instruction puts a 1 or 0 into its destination depending
on whether the comparison came out according to the condition code cc (e
for equal, l for less, le for less-or-equal, g for greater, ge for greater-or-
equal). The set instruction has an annoying quirk in that its destination
argument must be a single-byte register, such as al, which is part of the rax
register. Thankfully, the movzbq instruction can then be used to move from
a single byte register to a normal 64-bit register.
For compiling the if expression, the x86 instructions for jumping are
relevant. The jmp instruction updates the program counter to point to
the instruction after the indicated label. The jmp-if instruction updates
the program counter to point to the instruction after the indicated label
depending on whether the result in the EFLAGS register matches the con-
dition code cc, otherwise the jmp-if instruction falls through to the next
instruction. Because the jmp-if instruction relies on the EFLAGS register,
it is quite common for the jmp-if to be immediately preceded by a cmpq
instruction, to set the EFLAGS register. Our abstract syntax for jmp-if
differs from the concrete syntax for x86 to separate the instruction name
from the condition code. For example, (jmp-if le foo) corresponds to
jle foo.
(program ()
  (if (if (eq? (read) 1)
          (eq? (read) 0)
          (eq? (read) 2))
      (+ 10 32)
      (+ 700 77)))

⇓

(program ()
  (if (if (let ([tmp52 (read)])
            (eq? tmp52 1))
          (let ([tmp53 (read)])
            (eq? tmp53 0))
          (let ([tmp54 (read)])
            (eq? tmp54 2)))
      (+ 10 32)
      (+ 700 77)))

⇒

(program ()
  ((block62 .
     (seq (assign tmp54 (read))
          (if (eq? tmp54 2)
              (goto block59)
              (goto block60))))
   (block61 .
     (seq (assign tmp53 (read))
          (if (eq? tmp53 0)
              (goto block57)
              (goto block58))))
   (block60 . (goto block56))
   (block59 . (goto block55))
   (block58 . (goto block56))
   (block57 . (goto block55))
   (block56 . (return (+ 700 77)))
   (block55 . (return (+ 10 32)))
   (start .
     (seq (assign tmp52 (read))
          (if (eq? tmp52 1)
              (goto block61)
              (goto block62))))))
The up-side of this output is that there are no unnecessary uses of eq?
and every use of eq? is part of a conditional
jump. The down-side of this output is that it includes trivial blocks, such
as block57 through block60, that only jump to another block. We discuss
a solution to this problem in Section 4.11.
Recall that in Section 2.6 we implement the explicate-control pass for
R1 using two mutually recursive functions, explicate-control-tail and
explicate-control-assign. The former function translated expressions in
tail position whereas the latter function translated expressions on the right-
hand side of a let. With the addition of if expressions in R2 we have a
new kind of context to deal with: the predicate position of the if. So we
shall need another function, explicate-control-pred, that takes an R2
expression and two pieces of C1 code (two tail’s) for the then-branch and
else-branch. The output of explicate-control-pred is a C1 tail. However,
these three functions also need to construct the control-flow graph, which we
recommend they do via updates to a global variable. Next we consider the
specific additions to the tail and assign functions, and some of the cases for
the pred function.
The explicate-control-tail function needs an additional case for if.
The branches of the if inherit the current context, so they are in tail posi-
tion. Let B1 be the result of explicate-control-tail on the thn branch
and B2 be the result of applying explicate-control-tail to the else branch.
Then the if translates to the block B3 which is the result of applying
explicate-control-pred to the predicate cnd and the blocks B1 and B2 .
(if cnd thn els) ⇒ B3
Next we consider the case for if in the explicate-control-assign
function. So the context of the if is an assignment to some variable x and
then the control continues to some block B1 . The code that we generate for
both the thn and els branches shall both need to continue to B1 , so we add
B1 to the control flow graph with a fresh label `1 . Again, the branches of the
if inherit the current context, so they are in assignment positions. Let B2
be the result of applying explicate-control-assign to the thn branch,
variable x, and the block (goto `1 ). Let B3 be the result of applying
explicate-control-assign to the else branch, variable x, and the block
(goto `1 ). The if translates to the block B4 which is the result of applying
explicate-control-pred to the predicate cnd and the blocks B2 and B3 .
(if cnd thn els) ⇒ B4
The function explicate-control-pred will need a case for every ex-
pression that can have type Boolean. We detail a few cases here and leave
the rest for the reader. The input to this function is an expression and two
blocks, B1 and B2 , for the branches of the enclosing if. One of the base
cases of this function is when the expression is a less-than comparison. We
translate it to a conditional goto. We need labels for the two branches B1
and B2 , so we add them to the control flow graph and obtain some labels
`1 and `2 . The translation of the less-than comparison is as follows.
(< e1 e2 ) ⇒ (if (< e1 e2 ) (goto `1 ) (goto `2 ))
The case for if in explicate-control-pred is particularly illuminating,
as it deals with the challenges that we discussed above regarding the example
of the nested if expressions. Again, we add the two input branches B1 and
B2 to the control flow graph and obtain the labels `1 and `2 . The branches
thn and els of the current if inherit the predicate context of the enclosing
if. So we apply explicate-control-pred to thn with the
two blocks (goto `1 ) and (goto `2 ), to obtain B3 . Similarly for the els
branch, to obtain B4 . Finally, we apply explicate-control-pred to the
predicate cnd and the blocks B3 and B4 to obtain the result B5 .
(if cnd thn els) ⇒ B5
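The recursion structure can be sketched in Python over list-encoded expressions, with the control-flow graph built by side effect through a global, as recommended above. The representation and the helper names are ours, not the book's.

```python
import itertools

fresh_labels = itertools.count()
control_flow_graph = {}            # label -> tail, updated by side effect

def add_block(tail):
    """Add a tail to the control-flow graph under a fresh label."""
    label = 'block{}'.format(next(fresh_labels))
    control_flow_graph[label] = tail
    return ['goto', label]

def explicate_pred(cnd, b1, b2):
    """Translate cnd in predicate position; b1 and b2 are the tails for
    the then-branch and else-branch of the enclosing if."""
    if isinstance(cnd, list) and cnd[0] == '<':
        # base case: a comparison becomes a conditional goto
        return ['if', cnd, add_block(b1), add_block(b2)]
    if isinstance(cnd, list) and cnd[0] == 'if':
        # nested if: both branches stay in predicate position
        _, inner, thn, els = cnd
        goto1, goto2 = add_block(b1), add_block(b2)
        b3 = explicate_pred(thn, goto1, goto2)
        b4 = explicate_pred(els, goto1, goto2)
        return explicate_pred(inner, b3, b4)
    if cnd == '#t':
        return b1
    if cnd == '#f':
        return b2
    raise NotImplementedError(cnd)

tail = explicate_pred(['<', 'x', 'y'], ['return', 1], ['return', 0])
```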
Exercise 15. Implement the pass explicate-control by adding the cases for
if to the functions for tail and assignment contexts, and implement the
function for predicate contexts. Create test cases that exercise all of the
new cases in the code for this pass.
(assign lhs (not arg)) ⇒ ((movq arg′ lhs′) (xorq (int 1) lhs′))
Next consider the cases for eq? and less-than comparison. Translating
these operations to x86 is slightly involved due to the unusual nature of the
cmpq instruction discussed above. We recommend translating an assignment
from eq? into the following sequence of three instructions.
(assign lhs (eq? arg1 arg2))
⇒
((cmpq arg′2 arg′1)
 (set e (byte-reg al))
 (movzbq (byte-reg al) lhs′))
Regarding the tail non-terminal, we have two new cases, for goto and
conditional goto. Both are straightforward to handle. A goto becomes a
jump instruction.
(goto `) ⇒ ((jmp `))
A conditional goto becomes a compare instruction followed by a conditional
jump (for “then”) and the fall-through is to a regular jump (for “else”).
(if (eq? arg1 arg2) (goto `1) (goto `2))
⇒
((cmpq arg′2 arg′1) (jmp-if e `1) (jmp `2))
now produces many basic blocks arranged in a control-flow graph. The first
question we need to consider is in what order should we process the basic
blocks? Recall that to perform liveness analysis, we need to know the live-
after set. If a basic block has no successor blocks, then it has an empty
live-after set and we can immediately apply liveness analysis to it. If a basic
block has some successors, then we need to complete liveness analysis on
those blocks first. Furthermore, we know that the control flow graph does
not contain any cycles (it is a DAG, that is, a directed acyclic graph). What
all this amounts to is that we need to process the basic blocks in reverse
topological order. We recommend using the tsort and transpose functions
of the Racket graph package to obtain this ordering.
The next question is how to compute the live-after set of a block given
the live-before sets of all its successor blocks. During compilation we do
not know which way the branch will go, so we do not know which of the
successor’s live-before set to use. The solution comes from the observation
that there is no harm in identifying more variables as live than absolutely
necessary. Thus, we can take the union of the live-before sets from all the
successors to be the live-after set for the block. Once we have computed the
live-after set, we can proceed to perform liveness analysis on the block just
as we did in Section 3.2.
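This block-level ordering can be sketched in Python using the standard library's topological sorter; the toy instruction representation and the reads/writes callbacks are placeholders for your own helper functions.

```python
from graphlib import TopologicalSorter

def analyze_blocks(cfg, successors, reads, writes):
    """cfg maps labels to instruction lists. Passing the successor sets to
    TopologicalSorter as 'predecessors' yields an order in which every
    block's successors are analyzed before the block itself."""
    live_before = {}
    for label in TopologicalSorter(successors).static_order():
        succs = successors.get(label, set())
        # live-after of a block: union of its successors' live-before sets
        live = set().union(*(live_before[s] for s in succs)) if succs else set()
        for instr in reversed(cfg[label]):
            live = (live - writes(instr)) | reads(instr)
        live_before[label] = live
    return live_before

# toy instructions: (name, read-set, write-set)
cfg = {
    'block1': [('use', {'a'}, set())],
    'block2': [('use', {'b'}, set())],
    'start':  [('branch', {'c'}, set())],
}
successors = {'start': {'block1', 'block2'}, 'block1': set(), 'block2': set()}
lb = analyze_blocks(cfg, successors,
                    reads=lambda i: i[1], writes=lambda i: i[2])
```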
The helper functions for computing the variables in an instruction’s ar-
gument and for computing the variables read-from (R) or written-to (W )
by an instruction need to be updated to handle the new kinds of arguments
and instructions in x861 .
(program ()
  (if (eq? (read) 1) 42 0))

⇓

(program ()
  ((block32 . (return 0))
   (block31 . (return 42))
   (start .
     (seq (assign tmp30 (read))
          (if (eq? tmp30 1)
              (goto block31)
              (goto block32))))))

⇓

(program ((locals . (tmp30)))
  ((block32 .
     (block ()
       (movq (int 0) (reg rax))
       (jmp conclusion)))
   (block31 .
     (block ()
       (movq (int 42) (reg rax))
       (jmp conclusion)))
   (start .
     (block ()
       (callq read_int)
       (movq (reg rax) (var tmp30))
       (cmpq (int 1) (var tmp30))
       (jmp-if e block31)
       (jmp block32)))))

⇒

_block31:
    movq $42, %rax
    jmp _conclusion
_block32:
    movq $0, %rax
    jmp _conclusion
_start:
    callq _read_int
    movq %rax, %rcx
    cmpq $1, %rcx
    je _block31
    jmp _block32

    .globl _main
_main:
    pushq %rbp
    movq %rsp, %rbp
    pushq %r12
    pushq %rbx
    pushq %r13
    pushq %r14
    subq $0, %rsp
    jmp _start
_conclusion:
    addq $0, %rsp
    popq %r14
    popq %r13
    popq %rbx
    popq %r12
    popq %rbp
    retq
Diagram of the passes for R2, a language with conditionals:
explicate-control, uncover-locals, select-instructions, uncover-live,
build-interference, allocate-registers, patch-instructions, and print-x86.
This may sound contradictory, but Racket's Void type corresponds to what is
more commonly called the Unit type. This type is inhabited by a single value
that is usually written unit or () [Pierce, 2002].
Figure 5.1: Example program that creates tuples and reads from them.
1. preserve all tuples that are reachable from the root set via a path of
pointers, that is, the live tuples, and

2. reclaim the memory of everything else, that is, the garbage.
A copying collector accomplishes this by copying all of the live objects from
the FromSpace into the ToSpace and then performs a sleight of hand, treating
the ToSpace as the new FromSpace and the old FromSpace as the new
ToSpace. In the example of Figure 5.5, there are three pointers in the root
set, one in a register and two on the stack. All of the live objects have
been copied to the ToSpace (the right-hand side of Figure 5.5) in a way that
preserves the pointer relationships. For example, the pointer in the register
still points to a 2-tuple whose first element is a 3-tuple and second element is
a 2-tuple. There are four tuples that are not reachable from the root set and
therefore do not get copied into the ToSpace. (The situation in Figure 5.5,
with a cycle, cannot be created by a well-typed program in R3 . However,
creating cycles will be possible once we get to R6 . We design the garbage
collector to deal with cycles to begin with, so we will not need to revisit this
issue.)
There are many alternatives to copying collectors (and their older sib-
lings, the generational collectors) when it comes to garbage collection, such
as mark-and-sweep and reference counting. The strengths of copying col-
lectors are that allocation is fast (just a test and pointer increment), there
is no fragmentation, cyclic garbage is collected, and the time complexity of
collection only depends on the amount of live data, and not on the amount
of garbage [Wilson, 1992]. The main disadvantage of two-space copying col-
lectors is that they use a lot of space, though that problem is ameliorated
in generational collectors. Racket and Scheme programs tend to allocate
many small objects and generate a lot of garbage, so copying and genera-
tional collectors are a good fit. Of course, garbage collection is an active
research topic, especially concurrent garbage collection [Tene et al., 2011].
Researchers are continuously developing new techniques and revisiting old
trade-offs [Blackburn et al., 2004, Jones et al., 2011, Shahriyar et al., 2013,
Cutler and Morris, 2015, Shidal et al., 2015].
Figure 5.5: A copying collector in action.
Figure 5.6: Depiction of the Cheney algorithm copying the live tuples.
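The two-pointer trick of the Cheney algorithm can be captured in a few lines of Python. Here tuples are modeled as Python lists, the ToSpace as a growing list whose length plays the role of the free pointer, and the forwarding pointers as a side table; this representation is ours, for illustration only.

```python
def cheney_collect(roots, is_pointer):
    tospace = []      # len(tospace) acts as the free pointer
    forward = {}      # id(old tuple) -> its new location in tospace

    def copy(obj):
        # copy the tuple to the free pointer unless already forwarded
        if id(obj) not in forward:
            forward[id(obj)] = len(tospace)
            tospace.append(list(obj))    # fields are fixed up by the scan
        return forward[id(obj)]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(tospace):           # until scan catches up with free
        tup = tospace[scan]
        for i, field in enumerate(tup):
            if is_pointer(field):
                tup[i] = copy(field)     # forward the pointed-to tuple
        scan += 1
    return new_roots, tospace

a = [5, 6]
b = [a, 3]
c = [b, a]
garbage = [99]
new_roots, tospace = cheney_collect([c],
                                    is_pointer=lambda x: isinstance(x, list))
```

Because of the forwarding table, sharing is preserved and cycles are handled for free: a tuple is copied at most once, and later references to it resolve to the same ToSpace location.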
(program ()
(vector-ref
(vector-ref
(let ((vecinit48
(let ((vecinit44 42))
(let ((collectret46
(if (<
(+ (global-value free_ptr) 16)
(global-value fromspace_end))
(void)
(collect 16))))
(let ((alloc43 (allocate 1 (Vector Integer))))
(let ((initret45 (vector-set! alloc43 0 vecinit44)))
alloc43))))))
(let ((collectret50
(if (< (+ (global-value free_ptr) 16)
(global-value fromspace_end))
(void)
(collect 16))))
(let ((alloc47 (allocate 1 (Vector (Vector Integer)))))
(let ((initret49 (vector-set! alloc47 0 vecinit48)))
alloc47))))
0)
0))
(program
((locals . ((tmp54 . Integer) (tmp51 . Integer) (tmp53 . Integer)
(alloc43 . (Vector Integer)) (tmp55 . Integer)
(initret45 . Void) (alloc47 . (Vector (Vector Integer)))
(collectret46 . Void) (vecinit48 . (Vector Integer))
(tmp52 . Integer) (tmp57 . (Vector Integer))
(vecinit44 . Integer) (tmp56 . Integer) (initret49 . Void)
(collectret50 . Void))))
((block63 . (seq (collect 16) (goto block61)))
(block62 . (seq (assign collectret46 (void)) (goto block61)))
(block61 . (seq (assign alloc43 (allocate 1 (Vector Integer)))
(seq (assign initret45 (vector-set! alloc43 0 vecinit44))
(seq (assign vecinit48 alloc43)
(seq (assign tmp54 (global-value free_ptr))
(seq (assign tmp55 (+ tmp54 16))
(seq (assign tmp56 (global-value fromspace_end))
(if (< tmp55 tmp56) (goto block59) (goto block60)))))))))
(block60 . (seq (collect 16) (goto block58)))
(block59 . (seq (assign collectret50 (void)) (goto block58)))
(block58 . (seq (assign alloc47 (allocate 1 (Vector (Vector Integer))))
(seq (assign initret49 (vector-set! alloc47 0 vecinit48))
(seq (assign tmp57 (vector-ref alloc47 0))
(return (vector-ref tmp57 0))))))
(start . (seq (assign vecinit44 42)
(seq (assign tmp51 (global-value free_ptr))
(seq (assign tmp52 (+ tmp51 16))
(seq (assign tmp53 (global-value fromspace_end))
(if (< tmp52 tmp53) (goto block62) (goto block63)))))))))
The vec′ and arg′ are obtained by recursively processing vec and arg. The
move of vec′ to register r11 ensures that offsets are only performed with
register operands. This requires removing r11 from consideration by the
register allocator.
We compile the allocate form to operations on the free_ptr, as shown
below. The address in the free_ptr is the next free address in the FromSpace,
so we move it into the lhs and then move it forward by enough space for the
tuple being allocated, which is 8(len + 1) bytes because each element is 8
bytes (64 bits) and we use 8 bytes for the tag. Last but not least, we initialize
the tag. Refer to Figure 5.8 to see how the tag is organized. We recommend
using the Racket operations bitwise-ior and arithmetic-shift to com-
pute the tag. The type annotation in the vector form is used to determine
the pointer mask region of the tag.
(assign lhs (allocate len (Vector type ...)))
⇒
((movq (global-value free_ptr) lhs′)
 (addq (int 8(len + 1)) (global-value free_ptr))
 (movq lhs′ (reg r11))
 (movq (int tag) (deref r11 0)))
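Using the tag layout of Figure 5.8 (bit 0 distinguishes a tuple from a forwarding pointer, bits 1–6 hold the length, and the bits from 7 up hold the pointer mask), the tag computation can be sketched in Python; the first two assertions match the tags 3 and 131 that appear in the select-instructions output for the running example. The encoding of types as nested Python tuples is ours.

```python
def vector_tag(field_types):
    """Tag for a tuple: bit 0 = 1 (not a forwarding pointer),
    bits 1-6 = number of fields, bits 7 and up = pointer mask
    (one bit per field that is itself a vector)."""
    mask = 0
    for i, t in enumerate(field_types):
        if isinstance(t, tuple) and t[0] == 'Vector':
            mask |= 1 << i
    return (mask << 7) | (len(field_types) << 1) | 1

assert vector_tag([('Integer',)]) == 3                     # (Vector Integer)
assert vector_tag([('Vector', ('Integer',))]) == 131       # (Vector (Vector Integer))
```

In Racket, the same arithmetic is expressed with bitwise-ior and arithmetic-shift, as recommended above.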
We use another dedicated register, r15, to store the pointer to the top of the
root stack. So r15 is not available for use by the register allocator.
(collect bytes)
=⇒
(movq (reg r15) (reg rdi))
(movq (int bytes) (reg rsi))
(callq collect)
The syntax of the x862 language is defined in Figure 5.13. It differs from
x861 just in the addition of the form for global variables. Figure 5.14 shows
the output of the select-instructions pass on the running example.
(program
((locals . ((tmp54 . Integer) (tmp51 . Integer) (tmp53 . Integer)
(alloc43 . (Vector Integer)) (tmp55 . Integer)
(initret45 . Void) (alloc47 . (Vector (Vector Integer)))
(collectret46 . Void) (vecinit48 . (Vector Integer))
(tmp52 . Integer) (tmp57 . (Vector Integer)) (vecinit44 . Integer)
(tmp56 . Integer) (initret49 . Void) (collectret50 . Void))))
((block63 . (block ()
(movq (reg r15) (reg rdi))
(movq (int 16) (reg rsi))
(callq collect)
(jmp block61)))
(block62 . (block () (movq (int 0) (var collectret46)) (jmp block61)))
(block61 . (block ()
(movq (global-value free_ptr) (var alloc43))
(addq (int 16) (global-value free_ptr))
(movq (var alloc43) (reg r11))
(movq (int 3) (deref r11 0))
(movq (var alloc43) (reg r11))
(movq (var vecinit44) (deref r11 8))
(movq (int 0) (var initret45))
(movq (var alloc43) (var vecinit48))
(movq (global-value free_ptr) (var tmp54))
(movq (var tmp54) (var tmp55))
(addq (int 16) (var tmp55))
(movq (global-value fromspace_end) (var tmp56))
(cmpq (var tmp56) (var tmp55))
(jmp-if l block59)
(jmp block60)))
(block60 . (block ()
(movq (reg r15) (reg rdi))
(movq (int 16) (reg rsi))
(callq collect)
(jmp block58)))
(block59 . (block ()
(movq (int 0) (var collectret50))
(jmp block58)))
(block58 . (block ()
(movq (global-value free_ptr) (var alloc47))
(addq (int 16) (global-value free_ptr))
(movq (var alloc47) (reg r11))
(movq (int 131) (deref r11 0))
(movq (var alloc47) (reg r11))
(movq (var vecinit48) (deref r11 8))
(movq (int 0) (var initret49))
(movq (var alloc47) (reg r11))
(movq (deref r11 8) (var tmp57))
(movq (var tmp57) (reg r11))
(movq (deref r11 8) (reg rax))
(jmp conclusion)))
(start . (block ()
(movq (int 42) (var vecinit44))
(movq (global-value free_ptr) (var tmp51))
(movq (var tmp51) (var tmp52))
(addq (int 16) (var tmp52))
(movq (global-value fromspace_end) (var tmp53))
(cmpq (var tmp53) (var tmp52))
(jmp-if l block62)
(jmp block63))))))
_block58:
    movq _free_ptr(%rip), %rcx
    addq $16, _free_ptr(%rip)
    movq %rcx, %r11
    movq $131, 0(%r11)
    movq %rcx, %r11
    movq -8(%r15), %rax
    movq %rax, 8(%r11)
    movq $0, %rdx
    movq %rcx, %r11
    movq 8(%r11), %rcx
    movq %rcx, %r11
    movq 8(%r11), %rax
    jmp _conclusion
_block59:
    movq $0, %rcx
    jmp _block58
_block62:
    movq $0, %rcx
    jmp _block61
_block60:
    movq %r15, %rdi
    movq $16, %rsi
    callq _collect
    jmp _block58
_block63:
    movq %r15, %rdi
    movq $16, %rsi
    callq _collect
    jmp _block61
_start:
    movq $42, %rbx
    movq _free_ptr(%rip), %rdx
    addq $16, %rdx
    movq _fromspace_end(%rip), %rcx
    cmpq %rcx, %rdx
    jl _block62
    jmp _block63
_block61:
    movq _free_ptr(%rip), %rcx
    addq $16, _free_ptr(%rip)
    movq %rcx, %r11
    movq $3, 0(%r11)
    movq %rcx, %r11
    movq %rbx, 8(%r11)
    movq $0, %rdx
    movq %rcx, -8(%r15)
    movq _free_ptr(%rip), %rcx
    addq $16, %rcx
    movq _fromspace_end(%rip), %rdx
    cmpq %rdx, %rcx
    jl _block59
    jmp _block60

    .globl _main
_main:
    pushq %rbp
    movq %rsp, %rbp
    pushq %r12
    pushq %rbx
    pushq %r13
    pushq %r14
    subq $0, %rsp
    movq $16384, %rdi
    movq $16, %rsi
    callq _initialize
    movq _rootstack_begin(%rip), %r15
    movq $0, (%r15)
    addq $8, %r15
    jmp _start
_conclusion:
    subq $8, %r15
    addq $0, %rsp
    popq %r14
    popq %r13
    popq %rbx
    popq %r12
    popq %rbp
    retq
Diagram of the passes for R3, a language with tuples: explicate-control,
uncover-locals, select-instructions, and patch-instructions.
Functions
type ::= Integer | Boolean | (Vector type + ) | Void | (type ∗ -> type)
cmp ::= eq? | < | <= | > | >=
exp ::= int | (read) | (- exp) | (+ exp exp) | (- exp exp)
| var | (let ([var exp]) exp)
| #t | #f | (and exp exp) | (or exp exp) | (not exp)
| (cmp exp exp) | (if exp exp exp)
| (vector exp+ ) | (vector-ref exp int)
| (vector-set! exp int exp) | (void)
| (exp exp∗ )
def ::= (define (var [var:type]∗ ):type exp)
R4 ::= (program info def ∗ exp)
(program ()
(define (map-vec [f : (Integer -> Integer)]
[v : (Vector Integer Integer)])
: (Vector Integer Integer)
(vector (f (vector-ref v 0)) (f (vector-ref v 1))))
(define (add1 [x : Integer]) : Integer
(+ x 1))
(vector-ref (map-vec add1 (vector 0 41)) 1)
)
In Section 2.2 we saw the use of the callq instruction for jumping to a
function whose location is given by a label. Here we instead will be jumping
to a function whose location is given by an address, that is, we need to
make an indirect function call. The x86 syntax is to give the register name
prefixed with an asterisk.
callq *%rbx
(define (interp-def d)
  (match d
    [`(define (,f [,xs : ,ps] ...) : ,rt ,body)
     (mcons f `(lambda ,xs ,body ()))]
    ))

(define (interp-R4 p)
  (match p
    [`(program ,ds ... ,body)
     (let ([top-level (for/list ([d ds]) (interp-def d))])
       (for/list ([b top-level])
         (set-mcdr! b (match (mcdr b)
                        [`(lambda ,xs ,body ())
                         `(lambda ,xs ,body ,top-level)])))
       ((interp-exp top-level) body))]
    ))
a frame. The caller sets the stack pointer, register rsp, to the last data item
in its frame. The callee must not change anything in the caller’s frame, that
is, anything that is at or above the stack pointer. The callee is free to use
locations that are below the stack pointer.
jmp *%rax
6.3 Shrink R4
The shrink pass performs a couple of minor modifications to the grammar to
ease the later passes. This pass adds an empty info field to each function
definition:
(define (f [x1 : type1] ...) : typer exp)
⇒ (define (f [x1 : type1] ...) : typer () exp)

and introduces an explicit main function.

(program info ds ... exp) ⇒ (program info ds′ mainDef)

where mainDef is

(define (main) : Integer () exp′)
type ::= Integer | Boolean | (Vector type + ) | Void | (type ∗ -> type)
exp ::= int | (read) | (- exp) | (+ exp exp)
| var | (let ([var exp]) exp)
| #t | #f | (not exp) | (cmp exp exp) | (if exp exp exp)
| (vector exp+ ) | (vector-ref exp int)
| (vector-set! exp int exp) | (void) | (app exp exp∗ )
| (fun-ref label)
def ::= (define (label [var:type]∗ ):type exp)
F1 ::= (program info def ∗ )
(f x1 … xn) ⇒ (f x1 … x5 (vector x6 … xn))
In the body of the function, all occurrences of the ith argument in which
i > 5 must be replaced with a vector-ref.
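The packing step can be sketched concretely. Below is a Python sketch (the book's limit-functions pass is written in Racket; the names limit_args, ARG_LIMIT, and the parameter vec are inventions for illustration) that keeps the first five parameters and maps each later one to a vector-ref:

```python
ARG_LIMIT = 6  # x86-64 passes at most six arguments in registers

def limit_args(params):
    """Keep the first five parameters and pack the rest into a vector.

    Returns the new parameter list and a map telling how each original
    parameter should be referenced in the rewritten body.
    """
    if len(params) <= ARG_LIMIT:
        return params, {p: p for p in params}
    new_params = params[:5] + ["vec"]
    refs = {p: p for p in params[:5]}
    # The i-th packed argument becomes (vector-ref vec i).
    for i, p in enumerate(params[5:]):
        refs[p] = f"(vector-ref vec {i})"
    return new_params, refs
```

For example, a seven-parameter function keeps x1 through x5, gains a sixth parameter vec, and x7 is rewritten to (vector-ref vec 1).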
registers, and inside the function we should generate a movq instruction for
each parameter, to move the argument value from the appropriate register
to a new local variable with the same name as the old parameter.
Next, consider the compilation of function calls, which have the following
form upon input to select-instructions.
(assign lhs (call fun args . . .))
In the mirror image of handling the parameters of function definitions, the
arguments args need to be moved to the argument passing registers. Once
the instructions for parameter passing have been generated, the function call
itself can be performed with an indirect function call, for which I recommend
creating the new instruction indirect-callq. Of course, the return value
from the function is stored in rax, so it needs to be moved into the lhs.
(indirect-callq fun)
(movq (reg rax) lhs)
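The instruction generation just described can be sketched as follows (a Python illustration with a made-up tuple encoding of instructions; which registers receive the arguments, and in what order, is fixed by your calling convention — the System V order is shown here, while the examples in this chapter happen to use a different ordering):

```python
# Argument-passing registers in the System V AMD64 convention.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def lower_call(lhs, fun, args):
    """Generate pseudo-x86 for (assign lhs (call fun args ...))."""
    # Mirror of parameter handling: move each argument into its register.
    instrs = [("movq", arg, ("reg", reg))
              for arg, reg in zip(args, ARG_REGISTERS)]
    instrs.append(("indirect-callq", fun))        # call through the function value
    instrs.append(("movq", ("reg", "rax"), lhs))  # fetch the return value
    return instrs
```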
Regarding tail calls, the parameter passing is the same as for non-tail calls:
generate instructions to move the arguments into the argument-passing
registers. After that we need to pop the frame from the procedure call stack.
However, we do not yet know how big the frame is; that gets determined
during register allocation. So instead of generating those instructions here,
we invent a new instruction that means “pop the frame and then do an
indirect jump”, which we name tail-jmp.
Recall that in Section 2.6 we recommended using the label start for the
initial block of a program, and in Section 2.8 we recommended labelling the
conclusion of the program with conclusion, so that (return arg) can be
compiled to an assignment to rax followed by a jump to conclusion. With
the addition of function definitions, we will have a starting block and con-
clusion for each function, but their labels need to be unique. We recommend
prepending the function’s name to start and conclusion, respectively, to
obtain unique labels. (Alternatively, one could gensym labels for the start
and conclusion and store them in the info field of the function definition.)
Figure 6.9 gives an overview of the passes needed for the compilation of
R4 .
6.14 An Example Translation
(program ()
  (define (add [x : Integer]
               [y : Integer])
    : Integer (+ x y))
  (add 40 2))

⇓

(program ()
  (define (add86 [x87 : Integer]
                 [y88 : Integer]) : Integer ()
    ((add86start . (return (+ x87 y88)))))
  (define (main) : Integer ()
    ((mainstart .
      (seq (assign tmp89 (fun-ref add86))
           (tailcall tmp89 40 2))))))

⇒

(program ()
  (define (add86)
    ((locals (x87 . Integer) (y88 . Integer))
     (num-params . 2))
    ((add86start .
      (block ()
        (movq (reg rcx) (var x87))
        (movq (reg rdx) (var y88))
        (movq (var x87) (reg rax))
        (addq (var y88) (reg rax))
        (jmp add86conclusion)))))
  (define (main)
    ((locals . ((tmp89 . (Integer Integer -> Integer))))
     (num-params . 0))
    ((mainstart .
      (block ()
        (leaq (fun-ref add86) (var tmp89))
        (movq (int 40) (reg rcx))
        (movq (int 2) (reg rdx))
        (tail-jmp (var tmp89)))))))
⇓
_mainstart:
	leaq _add90(%rip), %rsi
	movq $40, %rcx
	movq $2, %rdx
	movq %rsi, %rax
	addq $0, %rsp
	popq %r14
	popq %r13
	popq %rbx
	popq %r12
	subq $0, %r15
	popq %rbp
	jmp *%rax

	.globl _main
	.align 16
_main:
	pushq %rbp
	movq %rsp, %rbp
	pushq %r12
	pushq %rbx
	pushq %r13
	pushq %r14
	subq $0, %rsp
	movq $16384, %rdi
	movq $16, %rsi
	callq _initialize
	movq _rootstack_begin(%rip), %r15
	jmp _mainstart

_mainconclusion:
	addq $0, %rsp
	popq %r14
	popq %r13
	popq %rbx
	popq %r12
	subq $0, %r15
	popq %rbp
	retq

_add90start:
	movq %rcx, %rsi
	movq %rdx, %rcx
	movq %rsi, %rax
	addq %rcx, %rax
	jmp _add90conclusion

	.globl _add90
	.align 16
_add90:
	pushq %rbp
	movq %rsp, %rbp
	pushq %r12
	pushq %rbx
	pushq %r13
	pushq %r14
	subq $0, %rsp
	jmp _add90start

_add90conclusion:
	addq $0, %rsp
	popq %r14
	popq %r13
	popq %rbx
	popq %r12
	subq $0, %r15
	popq %rbp
	retq
[Figure 6.9: the compiler passes for R4 — typecheck and uniquify (R4 → R4),
reveal-functions (R4 → F1), limit-functions, remove-complex., and
expose-alloc. (F1 → F1), explicate-control (F1 → C3), uncover-locals
(C3 → C3), select-instr. (C3 → x86∗3), and patch-instr. (x86∗3 → x86∗3).]
7
Lexically Scoped Functions
type ::= Integer | Boolean | (Vector type + ) | Void | (type ∗ -> type)
exp ::= int | (read) | (- exp) | (+ exp exp) | (- exp exp)
| var | (let ([var exp]) exp)
| #t | #f | (and exp exp) | (or exp exp) | (not exp)
| (eq? exp exp) | (if exp exp exp)
| (vector exp+ ) | (vector-ref exp int)
| (vector-set! exp int exp) | (void)
| (exp exp∗ )
| (lambda: ([var:type]∗ ):type exp)
def ::= (define (var [var:type]∗ ):type exp)
R5 ::= (program def ∗ exp)
[Figure: two closures, g and h, each represented as a tuple holding the
values of the free variables x and y together with a pointer to the shared
code: g has x = 5, y = 4 and h has x = 3, y = 4.]
Figure 7.3: Example closure representation for the lambdas in Figure 7.1.
7.2 Interpreting R5
Figure 7.4 shows the definitional interpreter for R5 . The clause for lambda
saves the current environment inside the returned lambda. Then the clause
for app uses the environment from the lambda, the lam-env, when inter-
preting the body of the lambda. The lam-env environment is extended with
the mapping of parameters to argument values.
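In outline, the two clauses behave like this sketch (Python rather than the book's Racket, and the tuple encoding of expressions is an invention for illustration):

```python
def interp(exp, env):
    """A tiny interpreter showing closure capture, in the style of R5.

    Expressions are tuples: ("int", n), ("var", x), ("add", e1, e2),
    ("lambda", params, body), ("app", fun, args).
    """
    kind = exp[0]
    if kind == "int":
        return exp[1]
    if kind == "var":
        return env[exp[1]]
    if kind == "add":
        return interp(exp[1], env) + interp(exp[2], env)
    if kind == "lambda":
        # Save the current environment inside the returned closure.
        return ("closure", exp[1], exp[2], env)
    if kind == "app":
        _, params, body, lam_env = interp(exp[1], env)
        args = [interp(a, env) for a in exp[2]]
        # Extend the lambda's saved environment, not the caller's.
        new_env = dict(lam_env)
        new_env.update(zip(params, args))
        return interp(body, new_env)
    raise ValueError(f"unknown expression: {exp!r}")
```

Applying a lambda built where x was 40 to the argument 2 yields 42, using the captured x rather than anything in the caller's environment.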
Figure 7.7 provides an overview of all the passes needed for the compi-
lation of R5 .
[Figure 7.7: the compiler passes for R5 — the same pipeline as for R4 in
Figure 6.9, with the addition of the convert-to-clos. pass among the F1
passes.]
8
Dynamic Typing
similarly for the integer 0. However, (not #f) should produce #t whereas
(not 0) should produce #f. Furthermore, the behavior of not, in general,
cannot be determined at compile time, but depends on the runtime type of
its input, as in the example above that depends on the result of (read).
The way around this problem is to include information about a value’s
runtime type in the value itself, so that this information can be inspected
by operators such as not. In particular, we shall steal the 3 right-most bits
from our 64-bit values to encode the runtime type. We shall use 001 to
identify integers, 100 for Booleans, 010 for vectors, 011 for procedures, and
101 for the void value. We shall refer to these 3 bits as the tag and we define
the following auxiliary function.
(We shall say more about the new Vectorof type shortly.) This stealing
of 3 bits comes at some price: our integers are reduced to ranging from
−2^60 to 2^60. The stealing does not adversely affect vectors and procedures
because those values are addresses, and our addresses are 8-byte aligned, so
the rightmost 3 bits are unused; they are always 000. Thus, we do not lose
information by overwriting the rightmost 3 bits with the tag and we can
simply zero-out the tag to recover the original address.
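The tag arithmetic can be made concrete with a small sketch (Python for illustration; the function names are mine — the book performs these operations in generated x86 code and the runtime):

```python
# Type tags from the text: 001 integers, 100 Booleans, 010 vectors,
# 011 procedures, 101 the void value; 000 is reserved for plain
# (untagged) pointers to tuples.
INT_TAG, VEC_TAG, PROC_TAG, BOOL_TAG, VOID_TAG = 0b001, 0b010, 0b011, 0b100, 0b101

def tag_int(n):
    return (n << 3) | INT_TAG   # steal the low 3 bits for the tag

def untag_int(v):
    return v >> 3               # arithmetic shift right recovers n

def tag_vector(addr):
    assert addr & 0b111 == 0    # 8-byte aligned, so the low bits are 000
    return addr | VEC_TAG       # overwrite the unused low bits

def untag_vector(v):
    return v & ~0b111           # zero out the tag to recover the address
```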
In some sense, these tagged values are a new kind of value. Indeed, we
can extend our typed language with tagged values by adding a new type to
classify them, called Any, and with operations for creating and using tagged
values, yielding the R6 language that we define in Section 8.1. The R6
language provides the fundamental support for polymorphism and runtime
types that we need to support dynamic typing.
There is an interesting interaction between tagged values and garbage
collection. A variable of type Any might refer to a vector and therefore
it might be a root that needs to be inspected and copied during garbage
collection. Thus, we need to treat variables of type Any in a similar way to
variables of type Vector for purposes of register allocation, which we discuss
in Section 8.4. One concern is that, if a variable of type Any is spilled, it
must be spilled to the root stack. But this means that the garbage collector
needs to be able to differentiate between (1) plain old pointers to tuples,
(2) a tagged value that points to a tuple, and (3) a tagged value that is
not a tuple. We enable this differentiation by choosing not to use the tag
000. Instead, that bit pattern is reserved for identifying plain old pointers to
tuples. On the other hand, if one of the first three bits is set, then we have
a tagged value, and inspecting the tag can differentiate between vectors
(010) and the other kinds of values.
We shall implement our untyped language R7 by compiling it to R6
(Section 8.5), but first we describe how to extend our compiler to handle
the new features of R6 (Sections 8.2, 8.3, and 8.4).
8.2 Shrinking R6
In the shrink pass we recommend compiling project into an explicit if
expression that uses three new operations: tag-of-any, value-of-any, and
exit. The tag-of-any operation retrieves the type tag from a tagged value
of type Any. The value-of-any retrieves the underlying value from a tagged
value. Finally, the exit operation ends the execution of the program by in-
voking the operating system’s exit function. So the translation for project
is as follows. (We have omitted the has-type AST nodes to make this
output more readable.)
(project e type) ⇒ (let ([tmp e′])
                     (if (eq? (tag-of-any tmp) tag)
                         (value-of-any tmp)
                         (exit)))
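The rewrite can be sketched over list-encoded S-expressions (a Python illustration; fresh stands in for a gensym-style helper, and tag is the numeric tag for the target type — all names here are inventions):

```python
counter = 0

def fresh(prefix):
    """A hypothetical fresh-name generator (stands in for gensym)."""
    global counter
    counter += 1
    return f"{prefix}.{counter}"

def shrink_project(e, tag):
    """Rewrite (project e type) into an explicit tag check."""
    tmp = fresh("tmp")
    return ["let", [[tmp, e]],
            ["if", ["eq?", ["tag-of-any", tmp], tag],
                   ["value-of-any", tmp],
                   ["exit"]]]
```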
Tag of Any Recall that the tag-of-any operation extracts the type tag
from a value of type Any. The type tag is the bottom three bits, so we obtain
the tag by taking the bitwise-and of the value with 111 (7 in decimal).
(assign lhs (tag-of-any e)) ⇒ (movq e′ lhs′)
                              (andq (int 7) lhs′)
Value of Any Like inject, the instructions for value-of-any are differ-
ent depending on whether the type T is a pointer (vector or procedure) or
not (Integer or Boolean). The following shows the instruction selection for
Integer and Boolean. We produce an untagged value by shifting it to the
right by 3 bits.
(assign lhs (project e T)) ⇒ (movq e′ lhs′)
                             (sarq (int 3) lhs′)
In the case for vectors and procedures, there is no need to shift. Instead we
just need to zero out the rightmost 3 bits. We accomplish this by creating
the bit pattern …0111 (7 in decimal) and applying bitwise-not to obtain
…1000, which we movq into the destination lhs. We then generate andq
with the tagged value to get the desired result.
(assign lhs (project e T)) ⇒ (movq (int …1000) lhs′)
                             (andq e′ lhs′)
8.5 Compiling R7 to R6
Figure 8.6 shows the compilation of many of the R7 forms into R6. An
important invariant of this pass is that given a subexpression e of R7, the
pass will produce an expression e′ of R6 that has type Any. For example,
#t ⇒ (inject #t Boolean)

(+ e1 e2) ⇒ (inject
              (+ (project e1′ Integer)
                 (project e2′ Integer))
              Integer)
the first row in Figure 8.6 shows the compilation of the Boolean #t, which
must be injected to produce an expression of type Any. The second row of
Figure 8.6, the compilation of addition, is representative of compilation for
many operations: the arguments have type Any and must be projected to
Integer before the addition can be performed.
The compilation of lambda (third row of Figure 8.6) shows what hap-
pens when we need to produce type annotations: we simply use Any. The
compilation of if and eq? demonstrates how this pass has to account for
some differences in behavior between R7 and R6. The R7 language is more
permissive than R6 regarding what kind of values can be used in various
places. For example, the condition of an if does not have to be a Boolean.
For eq?, the arguments need not be of the same type (but in that case, the
result will be #f).
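The first two rows of Figure 8.6 might be rendered as follows (a Python sketch over list-encoded S-expressions; compile_r7 is a hypothetical name, and the real pass covers many more forms):

```python
def compile_r7(e):
    """Compile an R7 expression into an R6 expression of type Any."""
    if e is True or e is False:
        return ["inject", e, "Boolean"]
    if isinstance(e, int):
        return ["inject", e, "Integer"]
    if isinstance(e, list) and e[0] == "+":
        e1, e2 = compile_r7(e[1]), compile_r7(e[2])
        # The arguments have type Any, so project them to Integer
        # before adding, then inject the Integer result back to Any.
        return ["inject",
                ["+", ["project", e1, "Integer"],
                      ["project", e2, "Integer"]],
                "Integer"]
    raise NotImplementedError(e)
```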
9
Gradual Typing
This chapter will be based on the ideas of Siek and Taha [2006].
10
Parametric Polymorphism
This chapter may be based on ideas from Cardelli [1984], Leroy [1992], Shao
[1997], or Harper and Morrisett [1995].
11
High-level Optimization
This chapter will present a procedure inlining pass based on the algorithm
of Waddell and Dybvig [1997].
12
Appendix
12.1 Interpreters
We provide several interpreters in the interp.rkt file. The interp-scheme
function takes an AST in one of the Racket-like languages considered in this
book (R1 , R2 , . . .) and interprets the program, returning the result value.
The interp-C function interprets an AST for a program in one of the C-like
languages (C0 , C1 , . . .), and the interp-x86 function interprets an AST for
an x86 program.
12.2 Utility Functions
The lookup function takes a key and an association list (a list of key-
value pairs), and returns the first value that is associated with the given
key, if there is one. If not, an error is triggered. The association list may
contain both immutable pairs (built with cons) and mutable pairs (built
with mcons).
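In effect, lookup behaves like this sketch (Python for illustration; the real Racket version also accepts mutable pairs built with mcons):

```python
def lookup(key, alist):
    """Return the value of the first pair in alist whose key matches."""
    for k, v in alist:
        if k == key:
            return v
    raise KeyError(key)  # no association found: trigger an error
```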
The map2 function ...
12.2.1 Testing
The interp-tests function takes a compiler name (a string), a description
of the passes, an interpreter for the source language, a test family name
(a string), and a list of test numbers, and runs the compiler passes and
the interpreters to check whether the passes are correct. The description of
the passes is a list with one entry per pass. An entry is a list with three
things: a string giving the name of the pass, the function that implements
the pass (a translator from AST to AST), and a function that implements
the interpreter (a function from AST to result value) for the language of
the output of the pass. The interpreters from Appendix 12.1 make a good
choice. The interp-tests function assumes that the subdirectory tests
has a bunch of Scheme programs whose names all start with the family
name, followed by an underscore and then the test number, ending in .scm.
Also, for each Scheme program there is a file with the same number except
that it ends with .in that provides the input for the Scheme program.
(define (interp-tests name passes test-family test-nums) ...
−4(%ebp)). Most x86 instructions allow at most one memory reference
per instruction. Other operands must be immediates or registers.
Instruction Operation
addq A, B    A + B → B
negq A       −A → A
subq A, B    B − A → B
callq L      Pushes the return address and jumps to label L
callq *A     Calls the function at the address A.
retq         Pops the return address and jumps to it
popq A       ∗rsp → A; rsp + 8 → rsp
pushq A      rsp − 8 → rsp; A → ∗rsp
leaq A, B    A → B (B must be a register)
cmpq A, B    compare A and B and set the flag register
je L         Jump to label L if the flag register matches the
jl L         condition code of the instruction, otherwise go to the
jle L        next instruction. The condition codes are e for
jg L         "equal", l for "less", le for "less or equal", g for
jge L        "greater", and ge for "greater or equal".
jmp L Jump to label L
movq A, B    A → B
movzbq A, B  A → B, where A is a single-byte register (e.g., al or
             cl), B is an 8-byte register, and the extra bytes of B
             are set to zero.
notq A       ~A → A (bitwise complement)
orq A, B     A | B → B (bitwise-or)
andq A, B    A & B → B (bitwise-and)
salq A, B    B << A → B (arithmetic shift left, where A is a constant)
sarq A, B    B >> A → B (arithmetic shift right, where A is a constant)
sete A       If the flag matches the condition code, then 1 → A, else
setl A       0 → A. Refer to je above for the description of the
setle A      condition codes. A must be a single-byte register (e.g.,
setg A       al or cl).
setge A
Table 12.1: Quick-reference for the x86 instructions used in this book.
Bibliography
Hussein Al-Omari and Khair Eddin Sabri. New graph coloring algorithms.
Journal of Mathematics and Statistics, 2(4), 2006.
Andrew W. Appel. Runtime tags aren't necessary. LISP and Symbolic Computation,
2(2):153–162, 1989. ISSN 0892-4635. doi: 10.1007/BF01811537.
URL http://dx.doi.org/10.1007/BF01811537.
Cody Cutler and Robert Morris. Reducing pause times with clustered col-
lection. In Proceedings of the 2015 International Symposium on Mem-
ory Management, ISMM ’15, pages 131–142, New York, NY, USA, 2015.
ACM. ISBN 978-1-4503-3589-8. doi: 10.1145/2754169.2754184. URL
http://doi.acm.org/10.1145/2754169.2754184.
Olivier Danvy. Three steps for the CPS transformation. Technical Report
CIS-92-02, Kansas State University, December 1991.
David Detlefs, Christine Flood, Steve Heller, and Tony Printezis. Garbage-
first garbage collection. In Proceedings of the 4th International Symposium
on Memory Management, ISMM ’04, pages 37–48, New York, NY, USA,
2004. ACM. ISBN 1-58113-945-4. doi: 10.1145/1029873.1029879. URL
http://doi.acm.org/10.1145/1029873.1029879.
E. W. Dijkstra. Why numbering should start at zero. Technical Report
EWD831, University of Texas at Austin, 1982.
Amer Diwan, Eliot Moss, and Richard Hudson. Compiler support for
garbage collection in a statically typed language. In Proceedings of
the ACM SIGPLAN 1992 Conference on Programming Language Design
and Implementation, PLDI ’92, pages 273–282, New York, NY, USA,
1992. ACM. ISBN 0-89791-475-9. doi: 10.1145/143095.143140. URL
http://doi.acm.org/10.1145/143095.143140.
R. Kent Dybvig. The SCHEME Programming Language. Prentice-Hall, Inc.,
Upper Saddle River, NJ, USA, 1987. ISBN 0-13-791864-X.
R. Kent Dybvig. The development of Chez Scheme. In Proceedings of the
Eleventh ACM SIGPLAN International Conference on Functional Pro-
gramming, ICFP ’06, pages 1–12, New York, NY, USA, 2006. ACM. ISBN
1-59593-309-3. doi: 10.1145/1159803.1159805. URL http://doi.acm.
org/10.1145/1159803.1159805.
R. Kent Dybvig and Andrew Keep. P523 compiler assignments. Technical
report, Indiana University, 2010.
Matthias Felleisen and Daniel P. Friedman. Control operators, the SECD-
machine and the lambda-calculus. pages 193–217, 1986.
Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, and Shriram Kr-
ishnamurthi. How to Design Programs: An Introduction to Programming
and Computing. MIT Press, Cambridge, MA, USA, 2001. ISBN 0-262-
06218-6.
Matthias Felleisen, Conrad Barski, M.D., David Van Horn, and Eight Students
of Northeastern University. Realm of Racket: Learn to Program,
One Game at a Time! No Starch Press, San Francisco, CA, USA, 2013.
ISBN 1593274912, 9781593274917.
Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The
essence of compiling with continuations. In Conference on Programming
Language Design and Implementation, PLDI, pages 502–514, June 1993.
Matthew Flatt and PLT. The Racket reference 6.0. Technical report, PLT
Inc., 2014. http://docs.racket-lang.org/reference/index.html.
Matthew Flatt, Robert Bruce Findler, and PLT. The Racket Guide. Technical
Report 6.0, PLT Inc., 2014.
Daniel P. Friedman and Matthias Felleisen. The Little Schemer (4th Ed.).
MIT Press, Cambridge, MA, USA, 1996. ISBN 0-262-56099-2.
Daniel P. Friedman and David S. Wise. Cons should not evaluate its argu-
ments. Technical Report TR44, Indiana University, 1976.
Richard Jones and Rafael Lins. Garbage Collection: Algorithms for Auto-
matic Dynamic Memory Management. John Wiley & Sons, Inc., New
York, NY, USA, 1996. ISBN 0-471-94148-4.
Richard Jones, Antony Hosking, and Eliot Moss. The Garbage Collection
Handbook: The Art of Automatic Memory Management. Chapman &
Hall/CRC, 1st edition, 2011. ISBN 1420082795, 9781420082791.
Andrew W. Keep. A Nanopass Framework for Commercial Compiler De-
velopment. PhD thesis, Indiana University, December 2012.
R. Kelsey, W. Clinger, and J. Rees (eds.). Revised5 report on the algorithmic
language scheme. Higher-Order and Symbolic Computation, 11(1), August
1998.
Brian W. Kernighan and Dennis M. Ritchie. The C programming language.
Prentice Hall Press, Upper Saddle River, NJ, USA, 1988. ISBN 0-13-
110362-8.
Donald E. Knuth. Backus normal form vs. backus naur form. Commun.
ACM, 7(12):735–736, December 1964. ISSN 0001-0782. doi: 10.1145/
355588.365140. URL http://doi.acm.org/10.1145/355588.365140.
Eugene Kohlbecker, Daniel P. Friedman, Matthias Felleisen, and Bruce
Duba. Hygienic macro expansion. In LFP ’86: Proceedings of the 1986
ACM conference on LISP and functional programming, pages 151–161,
New York, NY, USA, 1986. ACM. ISBN 0-89791-200-4.
Xavier Leroy. Unboxed objects and polymorphic typing. In POPL ’92: Pro-
ceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles
of programming languages, pages 177–188, New York, NY, USA, 1992.
ACM Press. ISBN 0-89791-453-8.
Henry Lieberman and Carl Hewitt. A real-time garbage collector based on
the lifetimes of objects. Commun. ACM, 26(6):419–429, June 1983. ISSN
0001-0782. doi: 10.1145/358141.358147. URL http://doi.acm.org/10.
1145/358141.358147.
Michael Matz, Jan Hubicka, Andreas Jaeger, and Mark Mitchell. System V
Application Binary Interface, AMD64 Architecture Processor Supplement,
October 2013.
John McCarthy. Recursive functions of symbolic expressions and their computation
by machine, Part I. Commun. ACM, 3(4):184–195, 1960. ISSN
0001-0782.
E.F. Moore. The shortest path through a maze. In Proceedings of an Inter-
national Symposium on the Theory of Switching, April 1959.
Jonathan Shidal, Ari J. Spilo, Paul T. Scheid, Ron K. Cytron, and Kr-
ishna M. Kavi. Recycling trash in cache. In Proceedings of the 2015
International Symposium on Memory Management, ISMM ’15, pages
118–130, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3589-
8. doi: 10.1145/2754169.2754183. URL http://doi.acm.org/10.1145/
2754169.2754183.
Jeremy G. Siek and Walid Taha. Gradual typing for functional languages. In
Scheme and Functional Programming Workshop, pages 81–92, September
2006.
Gerald Jay Sussman and Guy L. Steele Jr. Scheme: an interpreter for
extended lambda calculus. Technical Report AI Memo No. 349, MIT,
December 1975.
Gil Tene, Balaji Iyengar, and Michael Wolf. C4: the continuously concurrent
compacting collector. In Proceedings of the international symposium on
Memory management, ISMM ’11, pages 79–88, New York, NY, USA, 2011.
ACM. doi: http://doi.acm.org/10.1145/1993478.1993491.
Oscar Waddell and R. Kent Dybvig. Fast and effective procedure inlining. In
Proceedings of the 4th International Symposium on Static Analysis, SAS
’97, pages 35–52, London, UK, 1997. Springer-Verlag.