C 2002/03 T.Grust Compiler Construction: 1. Introduction 19
( 1. Introduction, p. 3)
– Example:
If we need to adapt our compiler to translate a source language other than Tiger,
we only need to rewrite the early phases (Lex → Translate, see below).
All phases following Translate remain untouched (after Translate, all specifics of the
source language have been “abstracted away”).
1.1.1 A Brief Overview of the Tiger Compiler Phases
• Let us trace the compilation of the Tiger program below and see how the different
phases transform the initial source program (later on, we pick certain source
program fragments only to keep the exposition short).
rem.tig
    /* compute the remainder when dividing x by y */
    let function rem (x : int, y : int) : int =
          let var d := x / y
          in
             x - d * y
          end

        var r := 0
    in
       r := rem (10, 3)
    end
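As a sanity check of what rem.tig computes, here is a C transliteration of the program (our own sketch, not output of the Tiger compiler). It assumes that Tiger's / behaves like C's integer division for the non-negative operands used here, so the program leaves r = 10 - 3 * 3 = 1. The division-by-zero guard is an addition of this sketch.

```c
#include <assert.h>   /* util.h includes <assert.h> as well */

/* C counterpart of the Tiger function rem in rem.tig (illustration only). */
static int rem (int x, int y)
{
    int d;

    assert (y != 0);     /* guard added in this sketch */
    d = x / y;           /* let var d := x / y */
    return x - d * y;    /* in x - d * y end   */
}
```

The main program of rem.tig then corresponds to int r = 0; r = rem (10, 3); in C.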
1.2 Intermediate Representations (Tree Languages)
• We have seen that the compiler phases pass different intermediate representations
(IRs) from one phase to the next.
• How do we precisely describe the valid IR (tree) forms?
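Use a grammar whose rules have the form L → R (T): lefthand side L, righthand side R,
and T, the name of the IR node type that the rule describes.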
A conforming tree:
    OpExp
      NumExp
        num
      Plus
      NumExp
        num
• Example: Grammar that describes the valid IR trees for a simple straight-line
programming language (no loops, no gotos):
• A valid program written in the straight-line programming language,
a := 5 + 3; b := (print (a, a - 1), 10 * a); print (b), shown as an IR tree
(provided that 1, 3, 5, 10 and a, b are acceptable for num and id, respectively):
    CompoundStm
      AssignStm
        a
        OpExp
          NumExp
            5
          Plus
          NumExp
            3
      CompoundStm
        AssignStm
          b
          EseqExp
            PrintStm
              PairExpList
                IdExp
                  a
                LastExpList
                  OpExp
                    IdExp
                      a
                    Minus
                    NumExp
                      1
            OpExp
              NumExp
                10
              Times
              IdExp
                a
        PrintStm
          LastExpList
            IdExp
              b
N.B.
– This IR tree shows all node attributes (not just the IR subtrees of a node).
• How can we represent these IR trees in C code (i.e., inside our compiler)?
• In each of these structs, embed
1 a kind field to indicate which node type this node actually has (e.g., for Exp
(struct A_exp_), kind could be IdExp, NumExp, OpExp, or EseqExp),
2 all attributes and subtrees (pointers) for this specific node type.
Rule: If a node type is described by a single attribute value (e.g., NumExp),
embed this value directly in the struct; if we need to represent several attribute
values/subtrees, embed a nested struct that groups this information.
• Example:
C code
struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    string id;                        /* A_idExp */
    int num;                          /* A_numExp */
    struct { A_exp left;
             A_binop oper;
             A_exp right; } op;       /* A_opExp */
    struct { A_stm stm;
             A_exp exp; } eseq;       /* A_eseqExp */
};
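• The kind field determines which attribute/subtree information is valid for any given
node (see footnote 4). All other fields are unused and must not be accessed!
Since only one of these fields is valid at any time, we let them share storage in a union u.
We get: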
C code
struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        string id;                    /* A_idExp */
        int num;                      /* A_numExp */
        struct { A_exp left;
                 A_binop oper;
                 A_exp right; } op;   /* A_opExp */
        struct { A_stm stm;
                 A_exp exp; } eseq;   /* A_eseqExp */
    } u;
};
4 For example, accessing the op attributes (right, oper, left) while kind == A_idExp will result in havoc!
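A minimal sketch of the discipline the footnote asks for: every read of u is guarded by a switch on kind, so each branch touches only the union member that is valid for that node. To compile on its own, the block repeats the A_exp_ declaration and leaves A_stm and A_binop as incomplete (opaque) pointer types; the function A_countExp is hypothetical and simply counts the A_exp nodes of a tree.

```c
#include <assert.h>   /* util.h includes <assert.h> as well */

typedef char *string;
typedef struct A_stm_ *A_stm;       /* incomplete: never dereferenced here */
typedef struct A_exp_ *A_exp;
typedef struct A_binop_ *A_binop;   /* incomplete: never dereferenced here */

struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        string id;                                            /* A_idExp   */
        int num;                                              /* A_numExp  */
        struct { A_exp left; A_binop oper; A_exp right; } op; /* A_opExp   */
        struct { A_stm stm; A_exp exp; } eseq;                /* A_eseqExp */
    } u;
};

/* Count the A_exp nodes of an expression tree.  Each case reads only the
   union member selected by e->kind; statements inside EseqExp are skipped. */
static int A_countExp (A_exp e)
{
    switch (e->kind) {
    case A_idExp:
    case A_numExp:
        return 1;                                /* leaves */
    case A_opExp:
        return 1 + A_countExp (e->u.op.left)     /* only u.op is valid */
                 + A_countExp (e->u.op.right);
    case A_eseqExp:
        return 1 + A_countExp (e->u.eseq.exp);   /* only u.eseq is valid */
    }
    assert (!"A_countExp: corrupt kind field");
    return 0;
}
```

The switch mirrors the enum one-to-one, so adding a node type to kind means revisiting every such switch; gcc's -Wswitch (part of -Wall) warns about enum constants a switch does not handle.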
slp.h
– Example: create an A_opExp node with subtrees A_exp e1 and e2 and A_binop op:
C code
A_exp n;

n = malloc (sizeof (*n));
if (!n) { /* ... handle memory allocation failure ... */ }

n->kind = A_opExp;
n->u.op.left  = e1;
n->u.op.oper  = op;
n->u.op.right = e2;
• Such node creation routines will be needed over and over again throughout the compiler.
⇒ Provide node constructors to allocate and initialize IR tree nodes.
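Following this advice, the allocation snippet above can be wrapped into a constructor. The sketch below is a hypothetical A_OpExp written in the style of the constructors that follow; to compile on its own it repeats the A_exp_ declaration, treats A_binop as an opaque pointer type, and uses a minimal stand-in for checked_malloc() (the real one is provided by util.o).

```c
#include <assert.h>   /* util.h includes <assert.h> as well */
#include <stdlib.h>

typedef char *string;
typedef struct A_stm_ *A_stm;
typedef struct A_exp_ *A_exp;
typedef struct A_binop_ *A_binop;   /* opaque here: only the pointer is stored */

struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        string id;
        int num;
        struct { A_exp left; A_binop oper; A_exp right; } op;
        struct { A_stm stm; A_exp exp; } eseq;
    } u;
};

/* Stand-in for checked_malloc(): never hands NULL back to a constructor. */
static void *checked_malloc (int len)
{
    void *p;

    assert (len >= 0);                /* negative sizes are a caller bug */
    p = malloc ((size_t) len);
    if (!p)
        abort ();
    return p;
}

/* Constructor for OpExp nodes: allocation and initialization in one place. */
static A_exp A_OpExp (A_exp left, A_binop oper, A_exp right)
{
    A_exp e = checked_malloc (sizeof (*e));

    e->kind = A_opExp;
    e->u.op.left  = left;
    e->u.op.oper  = oper;
    e->u.op.right = right;

    return e;
}
```

Call sites shrink to A_exp n = A_OpExp (e1, op, e2); and the kind/union bookkeeping can no longer be forgotten or mismatched.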
• Example: node constructors A_CompoundStm and A_IdExp for the node types CompoundStm and IdExp.
C code
A_stm A_CompoundStm (A_stm stm1, A_stm stm2)
{
    A_stm s = checked_malloc (sizeof (*s));

    s->kind = A_compoundStm;
    s->u.compound.stm1 = stm1;
    s->u.compound.stm2 = stm2;

    return s;
}
C code
A_exp A_IdExp (string id)   /* typedef char *string */
{
    A_exp e = checked_malloc (sizeof (*e));

    e->kind = A_idExp;
    e->u.id = id;

    return e;
}
• To actually construct larger IR trees, we can now simply plug the constructors together
and build trees bottom-up:
    CompoundStm
      AssignStm
        a
        NumExp
          42
      PrintStm
        LastExpList
          IdExp
            a
C code
A_stm p = A_CompoundStm (A_AssignStm ("a", A_NumExp (42)),
                         A_PrintStm (A_LastExpList (A_IdExp ("a"))));
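To try the bottom-up construction end to end, here is a self-contained sketch. A_CompoundStm, A_IdExp, the compound member, and the A_exp_ layout come from the text; the remaining union members (assign, print, pair, last), the A_expList_ layout, and the other constructors are assumed here by analogy, and checked_malloc() is a minimal stand-in.

```c
#include <assert.h>   /* util.h includes <assert.h> as well */
#include <stdlib.h>

typedef char *string;
typedef struct A_stm_ *A_stm;
typedef struct A_exp_ *A_exp;
typedef struct A_expList_ *A_expList;
typedef struct A_binop_ *A_binop;   /* opaque: not needed in this example */

struct A_stm_ {
    enum { A_compoundStm, A_assignStm, A_printStm } kind;
    union {
        struct { A_stm stm1, stm2; } compound;
        struct { string id; A_exp exp; } assign;   /* assumed layout */
        struct { A_expList exps; } print;          /* assumed layout */
    } u;
};

struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        string id;
        int num;
        struct { A_exp left; A_binop oper; A_exp right; } op;
        struct { A_stm stm; A_exp exp; } eseq;
    } u;
};

struct A_expList_ {                                 /* assumed layout */
    enum { A_pairExpList, A_lastExpList } kind;
    union {
        struct { A_exp head; A_expList tail; } pair;
        A_exp last;
    } u;
};

/* Stand-in for util.o's checked_malloc(). */
static void *checked_malloc (int len)
{
    void *p = malloc ((size_t) len);
    if (!p) abort ();
    return p;
}

static A_stm A_CompoundStm (A_stm stm1, A_stm stm2)
{
    A_stm s = checked_malloc (sizeof (*s));
    s->kind = A_compoundStm;
    s->u.compound.stm1 = stm1;
    s->u.compound.stm2 = stm2;
    return s;
}

static A_stm A_AssignStm (string id, A_exp exp)
{
    A_stm s = checked_malloc (sizeof (*s));
    s->kind = A_assignStm;
    s->u.assign.id = id;
    s->u.assign.exp = exp;
    return s;
}

static A_stm A_PrintStm (A_expList exps)
{
    A_stm s = checked_malloc (sizeof (*s));
    s->kind = A_printStm;
    s->u.print.exps = exps;
    return s;
}

static A_exp A_IdExp (string id)
{
    A_exp e = checked_malloc (sizeof (*e));
    e->kind = A_idExp;
    e->u.id = id;
    return e;
}

static A_exp A_NumExp (int num)
{
    A_exp e = checked_malloc (sizeof (*e));
    e->kind = A_numExp;
    e->u.num = num;
    return e;
}

static A_expList A_LastExpList (A_exp last)
{
    A_expList l = checked_malloc (sizeof (*l));
    l->kind = A_lastExpList;
    l->u.last = last;
    return l;
}
```

With these definitions, the call A_CompoundStm (A_AssignStm ("a", A_NumExp (42)), A_PrintStm (A_LastExpList (A_IdExp ("a")))) builds exactly the tree shown above: C evaluates the argument calls first, so leaves are allocated before their parents.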
1.2.1 Summary of IR Tree Representation Rules
3 The struct X_E_ itself is never used directly anywhere else; instead, declare X_E (pointer to
struct):
typedef struct X_E_ *X_E;
4 Each struct X_E_ contains a kind enum with one enumeration constant for
each grammar rule with lefthand side E, and a union u to carry the specific
attributes/subtrees:
5 In union u, collect the information represented on the righthand side for each grammar
rule for E. If several attributes/subtrees need to be represented, embed a struct
carrying this information (e.g., compound in A_stm_).
6 If a single value describes the righthand side of a grammar rule for E, embed this value
directly (e.g., num in A_exp_).
7 Each IR node type X_E will have a constructor that initializes all struct fields;
malloc() is never called outside these constructors.
8 Each C file (compiler phase or module) will have a prefix X_ unique to that file.
9 Naming/capitalization:
• Variations of these general IR representation rules:
– Bad idea, because the C compiler loses the ability to check that we do not build
“nonsense” IR trees (everything is a generic A_node and may occur anywhere).
Example:
C code (buggy)
A_node n = A_OpExp (A_IdExp ("a"),
                    A_AssignStm ("b", A_NumExp (42)),
                    A_PrintStm (A_LastExpList (A_NumExp (0))));
N.B.
– In a real compiler, we would write code to build complex IR trees, and bugs in that
code might be far less obvious to us.
2 Consider the A_binop constructors:
C code
A_binop A_Plus (void)   /* (void): a real prototype, see guideline 3 */
{
    A_binop op = checked_malloc (sizeof (*op));
    op->kind = A_plus;
    return op;
}
– The A_binop nodes encapsulate only a single enum value (kind). This is uniform,
but unnecessarily complex, and it wastes space.
– A_binop nodes only occur inside A_exp (of kind A_opExp) nodes.
⇒ Encode the operator inside A_opExp directly (using an enum).
C code (modified A_exp node)
typedef enum { A_plus, A_minus, A_times, A_div } A_binop;

struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        string id;                    /* A_idExp */
        int num;                      /* A_numExp */
        struct { A_exp left;
                 A_binop oper;        /* plain enum value, no separate node */
                 A_exp right; } op;   /* A_opExp */
        struct { A_stm stm;
                 A_exp exp; } eseq;   /* A_eseqExp */
    } u;
};
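With A_binop as a plain enum, code that inspects an OpExp can switch on u.op.oper directly, without following a pointer to a separate operator node. A minimal sketch: the evaluator A_evalConst is hypothetical (not part of the compiler) and handles only NumExp and OpExp nodes, since IdExp would need an environment and EseqExp a statement interpreter.

```c
#include <assert.h>   /* util.h includes <assert.h> as well */

typedef char *string;
typedef struct A_stm_ *A_stm;        /* incomplete: never dereferenced here */
typedef struct A_exp_ *A_exp;
typedef enum { A_plus, A_minus, A_times, A_div } A_binop;

struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        string id;
        int num;
        struct { A_exp left; A_binop oper; A_exp right; } op;
        struct { A_stm stm; A_exp exp; } eseq;
    } u;
};

/* Evaluate an expression built from NumExp and OpExp nodes only. */
static int A_evalConst (A_exp e)
{
    int l, r;

    switch (e->kind) {
    case A_numExp:
        return e->u.num;
    case A_opExp:
        l = A_evalConst (e->u.op.left);
        r = A_evalConst (e->u.op.right);
        switch (e->u.op.oper) {          /* operator stored inline */
        case A_plus:  return l + r;
        case A_minus: return l - r;
        case A_times: return l * r;
        case A_div:   assert (r != 0); return l / r;
        }
        break;
    default:
        break;                           /* IdExp, EseqExp: not constant */
    }
    assert (!"A_evalConst: not a constant expression");
    return 0;
}
```

Compared with the A_Plus() constructor, an operator now costs no allocation at all, and a constructor call such as A_OpExp (e1, A_plus, e2) simply passes the enum constant.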
1.3 C Coding Guidelines for the Tiger Compiler Project
We strongly suggest that you follow the guidelines below when you write the C source code
for the compiler.
1 Each phase of the compiler belongs in its own .c source file (which #includes an
associated .h header file containing exported function prototypes and type
declarations).
[Separate compilation, handling, reusability]
2 Each phase shall have an identifier prefix X_ unique to this phase. All global names
(struct/union fields are not global) shall start with the prefix.
[Organize the otherwise flat C namespace (avoid clashes), clarify origin of name]
3 All functions shall have prototypes, and the C compiler shall be told to warn about
functions defined without a previous prototype (gcc: -Wmissing-prototypes).
[In C89, calling an undeclared function implicitly declares it to return int; arguments
of calls without a prototype are not type-checked and only undergo the default argument
promotions (e.g., char and short are promoted to int, float to double)]
4 Each phase includes util.h and the compiler is linked against util.o.
util.h
#include <assert.h>

typedef char *string;
typedef char bool;

#define TRUE 1
#define FALSE 0

void *checked_malloc(int);
string String(char *);
– assert: halt program if asserted expression yields 0.
Example:
C code
A_exp e;
e = malloc (sizeof (*e));
assert (e);
5 Values of type string are heap-allocated strings. The constructor String("foo")
allocates four bytes (three characters plus the terminating '\0') and copies the argument string.
Convention: a function that receives a string argument may assume that the
string contents never change ⇒ it is safe to store the character pointer itself;
there is no need to copy the string.
6 Never call malloc() directly; always use checked_malloc().
[We may later re-implement checked_malloc() to, e.g., use a GC library.]
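One possible util.c implementing the two util.h prototypes is sketched below. It is an assumption, not the course's actual file: checked_malloc() terminates the program instead of returning NULL, and String() allocates strlen(s) + 1 bytes (four bytes for "foo") and copies the characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef char *string;                 /* as in util.h */

void *checked_malloc (int len);       /* prototypes as in util.h */
string String (char *s);

/* Allocate len bytes or terminate the program; never returns NULL. */
void *checked_malloc (int len)
{
    void *p;

    assert (len >= 0);
    p = malloc ((size_t) len);
    if (!p) {
        fprintf (stderr, "checked_malloc: out of memory\n");
        exit (EXIT_FAILURE);
    }
    return p;
}

/* Heap copy of s: strlen (s) characters plus the terminating '\0'. */
string String (char *s)
{
    string p = checked_malloc ((int) strlen (s) + 1);

    strcpy (p, s);
    return p;
}
```

Because checked_malloc() never returns NULL, callers such as the IR constructors need no failure check of their own, and replacing it with a GC-based allocator later touches only this one function.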