Unit 4
▪Syntax tree
▪Postfix Notation
▪Three-Address Code
Syntax tree
•A syntax tree is a condensed form of a parse tree.
•The operator and keyword nodes of the parse tree are moved up
to their parents, and a chain of single productions is replaced by a
single link in the syntax tree.
•The internal nodes are operators and the leaf nodes are
operands.
•To form a syntax tree, put parentheses in the expression; this way
it is easy to recognize which operand should be evaluated first.
Syntax tree
•Example –
x = (a + b * c) / (a – b * c)
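As an illustrative sketch (not from the slides), the tree for this assignment can be built with a small Node class; the names below are our own, and a postorder walk of the tree yields the operand-before-operator order used by postfix notation:

```python
# A minimal syntax-tree node (illustrative; names are our own).
class Node:
    def __init__(self, label, children=()):
        self.label = label          # operator at internal nodes, operand at leaves
        self.children = list(children)

    def postorder(self):
        """Yield labels in postorder (operands before their operator)."""
        for c in self.children:
            yield from c.postorder()
        yield self.label

def leaf(name):
    return Node(name)

# Syntax tree for x = (a + b * c) / (a - b * c)
tree = Node('=', [
    leaf('x'),
    Node('/', [
        Node('+', [leaf('a'), Node('*', [leaf('b'), leaf('c')])]),
        Node('-', [leaf('a'), Node('*', [leaf('b'), leaf('c')])]),
    ]),
])

print(' '.join(tree.postorder()))   # x a b c * + a b c * - / =
```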
Postfix Notation
•The ordinary (infix) way of writing the sum of a and b is with the
operator in the middle: a + b.
•The postfix notation for the same expression places the operator
at the right end, as ab+.
•In general, if e1 and e2 are any postfix expressions and + is
any binary operator, the result of applying + to the values
denoted by e1 and e2 is written in postfix notation as e1 e2 +.
•No parentheses are needed in postfix notation because the
position and arity (number of arguments) of the operators permit
only one way to decode a postfix expression.
•In postfix notation the operator follows the operands.
Postfix Notation
•Example: the infix expression (a - b) * ((c + d) + (a - b)) is written in postfix as ab- cd+ ab- +*.
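Because decoding is unambiguous, a postfix expression can be evaluated in a single left-to-right pass with a stack. A minimal sketch (variable values supplied through an assumed env dictionary):

```python
def eval_postfix(tokens, env):
    """Evaluate a postfix expression left to right with a stack."""
    ops = {'+': lambda x, y: x + y, '-': lambda x, y: x - y,
           '*': lambda x, y: x * y, '/': lambda x, y: x / y}
    stack = []
    for t in tokens:
        if t in ops:                      # binary operator: pop two operands
            y, x = stack.pop(), stack.pop()
            stack.append(ops[t](x, y))
        else:                             # operand: push its value
            stack.append(env[t])
    return stack.pop()

# (a - b) * ((c + d) + (a - b)) in postfix:
print(eval_postfix('a b - c d + a b - + *'.split(),
                   {'a': 5, 'b': 3, 'c': 1, 'd': 2}))   # 10
```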
Three-Address Code
•A statement involving no more than three references (two for operands and
one for the result) is known as a three-address statement.
•A sequence of three-address statements is known as three-address code.
A three-address statement is of the form x = y op z, where x, y, z have
addresses (memory locations).
•Sometimes a statement might contain fewer than three references, but it is still
called a three-address statement.
•For example: a = b + c * d;
• The intermediate code generator will try to divide this expression into
sub-expressions and then generate the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2
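A toy generator illustrating this division into sub-expressions (a sketch with our own names; expressions are nested tuples, and fresh temporaries are named r1, r2, ...):

```python
import itertools

def gen_tac(node, code, temps):
    """Recursively emit three-address statements; returns the place of node."""
    if isinstance(node, str):            # a name: its place is itself
        return node
    op, left, right = node               # interior node: (operator, lhs, rhs)
    l = gen_tac(left, code, temps)
    r = gen_tac(right, code, temps)
    t = next(temps)                      # fresh temporary r1, r2, ...
    code.append(f'{t} = {l} {op} {r}')
    return t

code, temps = [], (f'r{i}' for i in itertools.count(1))
place = gen_tac(('+', 'b', ('*', 'c', 'd')), code, temps)
code.append(f'a = {place}')
print('\n'.join(code))
# r1 = c * d
# r2 = b + r1
# a = r2
```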
Three-Address Code
• A three-address code has at most three address locations to
calculate the expression. A three-address code can be
represented in three forms:
o Quadruples
o Triples
o Indirect Triples
Quadruples
• Each instruction in quadruples representation is divided into four
fields: operator, arg1, arg2, and result. The example a = b + c * d is
represented below in quadruples format:

     OP   arg1  arg2  result
(0)  *    c     d     t1
(1)  +    b     t1    t2
(2)  =    t2          a
Indirect Triples
•This representation is an enhancement over the triples
representation.
•It uses a list of pointers to triples instead of positions to store results.
•This enables the optimizer to freely re-position the
sub-expressions to produce optimized code.

         OP   arg1  arg2
35  (0)  *    c     d
36  (1)  +    b     (0)
37  (2)  =    a     (1)
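The three forms can be sketched as plain Python data (an illustrative layout, not a prescribed one; field names are our own):

```python
# a = b + c * d in the three forms (illustrative layout).
quads = [              # op, arg1, arg2, result
    ('*', 'c',  'd',  't1'),
    ('+', 'b',  't1', 't2'),
    ('=', 't2', None, 'a'),
]

triples = [            # op, arg1, arg2 -- results are referred to by position
    ('*', 'c', 'd'),
    ('+', 'b', (0,)),  # (0,) is a reference to the result of triple 0
    ('=', 'a', (1,)),
]

# Indirect triples: a separate list of pointers into the triples table,
# so the optimizer can reorder statements without rewriting (n) references.
stmt_list = [0, 1, 2]  # e.g., stored at locations 35, 36, 37

for i in stmt_list:
    print(i, triples[i])
```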
Type Expressions
Example: int[2][3]
array(2,array(3,integer))
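A small sketch showing how such a type expression can be built by folding the dimensions from right to left (illustrative only):

```python
from functools import reduce

def array_type(dims, base):
    """Fold dimensions right-to-left: int[2][3] -> array(2,array(3,integer))."""
    return reduce(lambda t, d: f'array({d},{t})', reversed(dims), base)

print(array_type([2, 3], 'integer'))   # array(2,array(3,integer))
```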
Translation of Expressions and Statements
• We discussed how to find the types and offsets of variables
• We therefore have the necessary preparation to discuss
translation to intermediate code
• We also discuss type checking
Three-address code for expressions
Incremental Translation
Addressing Array Elements
• Layouts for a two-dimensional array: row-major and column-major (figure omitted)
Semantic actions for array reference
Translation of Array References
Semantic Analysis
• Semantic Analysis computes additional information related to the
meaning of the program once the syntactic structure is known.
• In typed languages such as C, semantic analysis involves adding information to
the symbol table and performing type checking.
• The information to be computed is beyond the capabilities of standard
parsing techniques, therefore it is not regarded as syntax.
• As for Lexical and Syntax analysis, also for Semantic Analysis we need both
a Representation Formalism and an Implementation Mechanism.
• As representation formalism, this lecture illustrates what are called Syntax
Directed Translations.
• Semantic rules attached to the grammar productions may:
– Generate Code;
– Insert information into the Symbol Table;
– Perform Semantic Checks;
– Issue error messages;
– etc.
• There are two notations for attaching semantic rules: Syntax-Directed Definitions and Translation Schemes.
Summary
• Syntax Directed Translations
– Dependency Graphs
– S-Attributed Definitions
– L-Attributed Definitions
• Translation Schemes
Inherited Attributes
• Inherited Attributes are useful for expressing the dependence of a construct
on the context in which it appears.
• It is always possible to rewrite a syntax directed definition to use only
synthesized attributes, but it is often more natural to use both synthesized and
inherited attributes.
• Evaluation Order. Inherited attributes cannot always be evaluated by a simple
PreOrder traversal of the parse-tree:
– Unlike synthesized attributes, the order in which the inherited attributes
of the children are computed is important! Indeed:
∗ Inherited attributes of the children can depend on both left and right
siblings!
• Inherited attributes that do not depend on right children can be evaluated
by a classical PreOrder traversal.
• The annotated parse-tree for the input real id1 , id2 , id3 is:

D
├─ T.type = real      (real)
└─ L.in = real
   ├─ L.in = real
   │  ├─ L.in = real
   │  │  └─ id1
   │  └─ , id2
   └─ , id3

• L.in is then inherited top-down the tree
by the other L-nodes.
• At each L-node the procedure addtype inserts into the symbol table the type
of the identifier.
Dependency Graphs
• Implementing a Syntax Directed Definition consists primarily in finding an
order for the evaluation of attributes
– Each attribute value must be available when a computation is performed.
• Dependency Graphs are the most general technique used to evaluate syntax
directed definitions with both synthesized and inherited attributes.
• A Dependency Graph shows the interdependencies among the attributes of
the various nodes of a parse-tree.
– There is a node for each attribute;
– If attribute b depends on an attribute c there is a link from the node for c
to the node for b (b ← c).
Evaluation Order
• The evaluation order of semantic rules is derived from a Topological
Sort of the dependency graph.
• Topological Sort: any ordering m1, m2, . . . , mk such that if mi → mj
is a link in the dependency graph then mi < mj.
1. This method fails if the dependency graph has a cycle: We need a test for
non-circularity;
2. This method is time consuming due to the construction of the dependency
graph.
• Alternative Approach. Design the syntax directed definition in such a
way that attributes can be evaluated with a fixed order avoiding to build the
dependency graph (method followed by many compilers).
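A topological sort with the cycle test mentioned in point 1 can be sketched with Kahn's algorithm (an assumed encoding, not from the slides: nodes are attribute instances, and an edge c → b means attribute b depends on c):

```python
from collections import defaultdict, deque

def topo_order(nodes, edges):
    """Kahn's algorithm: order attribute nodes so each is computed after
    everything it depends on; raises on a cycle (circular definition)."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for c, b in edges:            # edge c -> b: attribute b depends on c
        succ[c].append(b)
        indeg[b] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError('dependency graph has a cycle')
    return order

# T.type must be computed before L.in, and L.in before the addtype call:
print(topo_order(['T.type', 'L.in', 'addtype(id1)'],
                 [('T.type', 'L.in'), ('L.in', 'addtype(id1)')]))
```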
S-Attributed Definitions
• In the simple case of just one attribute per grammar symbol, the parser stack has two
fields: state and val

state   val
  Z     Z.x
  Y     Y.x
  X     X.x
 ...    ...
• The current top of the stack is indicated by the pointer top.
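A hand-driven sketch of this mechanism, using an assumed desk-calculator grammar (E → E + T, T → T * F, F → digit) and simulating only the val stack during the reductions:

```python
# Sketch: synthesized attributes on a val stack, in sync with LR reductions.
val = []   # parallel to the parser's state stack

def shift(token):
    val.append(int(token))          # F -> digit: F.val = digit.lexval

def reduce_add():                   # E -> E1 + T: E.val = E1.val + T.val
    t = val.pop(); e = val.pop()
    val.append(e + t)

def reduce_mul():                   # T -> T1 * F: T.val = T1.val * F.val
    f = val.pop(); t = val.pop()
    val.append(t * f)

# Hand-driven reduction sequence for 3 * 5 + 4:
shift('3'); shift('5'); reduce_mul(); shift('4'); reduce_add()
print(val[-1])   # 19
```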
L-Attributed Definitions
• L-Attributed Definitions contain both synthesized and inherited attributes
but do not need to build a dependency graph to evaluate them.
• Definition. A syntax directed definition is L-Attributed if each inherited
attribute of Xj on the right side of A → X1 X2 · · · Xn depends only on the
attributes of the symbols X1, . . . , Xj−1 to the left of Xj in the production,
and on the inherited attributes of A.
Translation Schemes
• Translation Schemes are more implementation oriented than syntax directed
definitions since they indicate the order in which semantic rules and attributes
are to be evaluated.
• Definition. A Translation Scheme is a context-free grammar in which
semantic actions are embedded within the right sides of the productions.
Example: a translation scheme for declarations:
D → T {L.in := T.type} L
T → int {T.type := integer}
T → real {T.type := real}
L → id {addtype(id.entry, L.in)}
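This scheme can be simulated by a recursive-descent translator in which the inherited attribute L.in becomes a parameter (a sketch with our own names; the declaration list is handled by iteration):

```python
# Sketch: the declaration scheme as a recursive-descent translator in which
# the inherited attribute L.in becomes a parameter.
symtab = {}

def addtype(entry, typ):
    symtab[entry] = typ

def parse_D(tokens):
    t_type = parse_T(tokens)      # T.type, synthesized
    parse_L(tokens, t_type)       # L.in := T.type, inherited

def parse_T(tokens):
    kw = tokens.pop(0)            # 'int' or 'real'
    return {'int': 'integer', 'real': 'real'}[kw]

def parse_L(tokens, inh):
    addtype(tokens.pop(0), inh)   # L -> id { addtype(id.entry, L.in) }
    while tokens and tokens[0] == ',':
        tokens.pop(0)
        addtype(tokens.pop(0), inh)

parse_D(['real', 'id1', ',', 'id2', ',', 'id3'])
print(symtab)   # {'id1': 'real', 'id2': 'real', 'id3': 'real'}
```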
• We introduce a transformation that makes all the actions occur at the right ends
of their productions.
– For each embedded semantic action we introduce a new Marker (i.e., a
nonterminal, say M) with an empty production (M → ε);
• The attributes are then kept on the parser stack. For a production
A → M1 X1 · · · Mn Xn and for each Xj and A:
1. Xj.s is stored in the val entry in the parser stack associated with Xj;
2. Xj.i is stored in the val entry in the parser stack associated with Mj;
3. A.i is stored in the val entry in the parser stack immediately before the
position storing M1.
• Remark 1. Since there is only one production for each marker, reducing by a
marker production uniquely identifies the semantic action to execute.
• When the parser is about to reduce Mj → ε (so Xj−1 is on top of the stack),
the attributes are found at fixed offsets; after the reduction, the val entry
for Mj holds Xj.i.

top → Xj−1   Xj−1.s
      Mj−1   Xj−1.i
      ...
      X1     X1.s
      M1     X1.i
      MA     A.i   ← val[top − 2j + 2]

• A.i is in val[top − 2j + 2];
• X1.i is in val[top − 2j + 3];
• X1.s is in val[top − 2j + 4];
• X2.i is in val[top − 2j + 5];
• and so on.
Boolean Expression
• The translation of if-else-statements and while-statements is tied to
the translation of boolean expressions.
• In programming languages, boolean expressions are used to
• 1. Alter the flow of control.
• 2. Compute logical values.
Boolean Expression
1. Alter the flow of control.
▪ Boolean expressions are used as conditional expressions in statements that
alter the flow of control.
▪ The value of such boolean expressions is implicit in a position reached in a
program.
For example, in if (E) S, the expression E must be true if statement S is reached.
•2. Compute logical values.
▪ A boolean expression can represent true or false as values.
▪ Boolean expressions can be evaluated in analogy to arithmetic expressions
using three-address instructions with logical operators.
Boolean Expression
• The intended use of boolean expressions is determined by its
syntactic context.
• For example, an expression following the keyword if is used to alter
the flow of control, while an expression on the right side of an
assignment is used to denote a logical value.
• Such syntactic contexts can be specified in a number of ways:
• we may use two different nonterminals, use inherited attributes, or set a flag
during parsing.
• Alternatively we may build a syntax tree and invoke different procedures for
the two different uses of boolean expressions.
Boolean expressions
• Focus on the use of boolean expressions to alter the flow
of control
• Boolean expressions are composed of the boolean operators
• && (AND), || (OR), and ! (NOT)
• applied to elements that are boolean variables or relational expressions.
• Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic
expressions
• Grammar to generate boolean expressions:
B → B1 or B2 | B1 and B2 | not B1 | (B1) | E1 relop E2 | false | true
Boolean expressions
• Semantic definition of the programming language determines
• whether all parts of a boolean expression must be evaluated.
• If the language definition permits (or requires) portions of a boolean
expression to go unevaluated
• the compiler can optimize the evaluation of boolean expressions by computing only
enough of an expression to determine its value.
• In an expression such as B1 || B2, neither B1 nor B2 is necessarily
evaluated fully.
• If we determine that B1 is true, then we can conclude that the entire expression is
true without having to evaluate B2.
• Similarly, given B1&&B2, if B1 is false, then the entire expression is false.
Short-Circuit Code
• In short-circuit (or jumping) code, the boolean operators &&, ||, and !
translate into jumps.
• The operators themselves do not appear in the code
• Instead, the value of a boolean expression is represented by a position
in the code sequence.
Syntax-directed definition
• Create a new label B.true and attach it to the first three-address instruction generated for the
statement S1
• Within B.code are jumps based on the value of B.
• If B is true, control flows to the first instruction of S1.code, and
• If B is false, control flows to the instruction immediately following S1.code.
• By setting B.false to S.next, we ensure that control will skip the code for S1 if B evaluates to false.
Syntax directed definition for Boolean Expression
S -> if ( B ) S1 else S2
Syntax-directed definition
• begin – local variable that holds a new label attached to the first instruction for the while-statement, also the first
instruction for B.
• begin- is a variable rather than an attribute, because begin is local to the semantic rules for this production.
• S.next - marks the instruction that control must flow to if B is false; hence, B.false is set to S.next
• B.true - code for B generates a jump to this label if B is true; it is attached to the first instruction for S1
• goto begin - causes a jump back to the beginning of the code for the boolean expression.
• S1.next is set to the label begin, so jumps from within S1.code can go directly to begin
Control-Flow Translation of Boolean
Expressions
• Boolean expression B is translated into three-address instructions
• B is evaluated using conditional and unconditional jumps to one of
two labels:
• B.true - if B is true
• B.false - if B is false
Control-Flow Translation of Boolean
Expressions
The constants true and false translate into jumps to B.true and
B.false, respectively
Control-Flow Translation of Boolean Expressions - Example
Consider the statement if (B) x = 0; with S.next = L1 and the code for x = 0 labeled L2.

For B = B1 || B2:
    if B1.condition goto L2
    goto L3
L3: if B2.condition goto L2
    goto L1
L2: x = 0
L1: ...
with B1.true = B.true = L2, B1.false = L3, B2.true = B.true = L2, B2.false = B.false = L1.

For B = B1 && B2:
    if B1.condition goto L4
    goto L1
L4: if B2.condition goto L2
    goto L1
L2: x = 0
L1: ...
with B1.true = L4, B1.false = B.false = L1, B2.true = B.true = L2, B2.false = B.false = L1.
Avoiding Redundant Gotos
• Translated code of boolean expressions is not optimized
• Redundant gotos are noticed
• They can be avoided using fall-through

    if x > 200 goto L4
    goto L1
L4: ...

can be replaced by

    ifFalse x > 200 goto L1
L4: ...   (fall through)
Using Fall Through
We now adapt the semantic rules for boolean expressions to
allow control to fall through whenever possible.
S → if (E) S1
{ E.true = fall;   // not newlabel()
  E.false = S.next;
  S1.next = S.next;
  S.code = E.code || S1.code }
E → E1 && E2
{ E1.true = fall;
  E1.false = if (E.false = fall) then newlabel() else E.false;
  E2.true = E.true;
  E2.false = E.false;
  E.code = if (E.false = fall) then E1.code || E2.code || label(E1.false)
           else E1.code || E2.code }
Using Fall Through Cont’d
E → E1 relop E2
{ test = E1.addr relop E2.addr;
  s = if (E.true != fall and E.false != fall) then
          gen('if' test 'goto' E.true) || gen('goto' E.false)
      else if (E.true != fall) then gen('if' test 'goto' E.true)
      else if (E.false != fall) then gen('ifFalse' test 'goto' E.false)
      else '';
  E.code = E1.code || E2.code || s }
Using Fall Through Example
if (x < 100 || x > 200 && x != y) x = 0;
=>
    if x < 100 goto L2
    ifFalse x > 200 goto L1
    ifFalse x != y goto L1
L2: x = 0
L1:
Problems to be solved
• Let’s now try to construct the translation scheme for Boolean expression.
• Let the grammar be:
B → B1 or M B2
B → B1 and M B2
B → not B1
B → (B1)
B → id1 relop id2
B → false
B → true
M→ε
Backpatching
• If B1 is true, then B is also true, so the jumps on B1.truelist become part of B.truelist.
• If B1 is false, we must next test B2, so the target for the jumps on B1.falselist must be the beginning of the code
generated for B2.
• This target is obtained using the marker nonterminal M.
• M produces, as a synthesized attribute M.instr, the index of the next instruction, just before the code for B2 starts
being generated.
• The value M.instr will be backpatched onto B1.falselist (i.e., each instruction on the list B1.falselist will
receive M.instr as its target label) when we have seen the remainder of the production B → B1 or M B2.
Backpatching -Boolean expressions
2) B → B1 and M B2
backpatch(B1.truelist, M.instr)
B.truelist = B2.truelist
B.falselist = merge(B1.falselist, B2.falselist)
• If B1 is true, we must next test B2, so the target for the jumps on B1.truelist must be the beginning of the code
generated for B2.
• This target is obtained using the marker nonterminal M, whose synthesized attribute M.instr holds the index of the
next instruction, just before the code for B2 starts being generated.
• The value M.instr will be backpatched onto B1.truelist.
• If B1 is false, then B is also false, so the jumps on B1.falselist become part of B.falselist.
Backpatching - Boolean expressions
3) B → not B1   (swaps the true and false lists)
B.truelist = B1.falselist
B.falselist = B1.truelist

4) B → (B1)   (ignores the parentheses)
B.truelist = B1.truelist
B.falselist = B1.falselist

5) B → id1 relop id2
B.truelist = makelist(nextinstr)
B.falselist = makelist(nextinstr+1)
emit(if id1.place relop id2.place goto __)
emit(goto __)
• Generates two instructions, a conditional goto and an unconditional one.
• Both gotos have unfilled targets.
• These instructions are put on B.truelist and B.falselist, respectively.
Backpatching - Boolean expressions
6) B → true
B.truelist = makelist(nextinstr)
emit(goto __)
7) B → false
B.falselist = makelist(nextinstr)
emit(goto __)
8) M → ε
M.instr = nextinstr;
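The rules above can be exercised with a small sketch of the backpatching machinery; here we hand-drive the productions for x < 100 || x > 200 && x != y starting at instruction 100 (the names and the __ placeholder convention are our own):

```python
# Sketch of makelist / merge / backpatch / emit, hand-driving the rules.
instrs = {}
nextinstr = 100

def emit(text):
    global nextinstr
    instrs[nextinstr] = text
    nextinstr += 1

def makelist(i):   return [i]
def merge(l1, l2): return l1 + l2

def backpatch(lst, label):
    for i in lst:
        instrs[i] = instrs[i].replace('__', str(label))

def relop(cond):                      # rule 5: B -> id1 relop id2
    t, f = makelist(nextinstr), makelist(nextinstr + 1)
    emit(f'if {cond} goto __')
    emit('goto __')
    return t, f

t1, f1 = relop('x < 100')
m_or = nextinstr                      # rule 8: M.instr = nextinstr (= 102)
t2, f2 = relop('x > 200')
m_and = nextinstr                     # M.instr for the 'and' (= 104)
t3, f3 = relop('x != y')

backpatch(t2, m_and)                  # rule 2: B -> B1 and M B2
t_and, f_and = t3, merge(f2, f3)

backpatch(f1, m_or)                   # rule 1: B -> B1 or M B2
t_or, f_or = merge(t1, t_and), f_and

for i in sorted(instrs):
    print(i, instrs[i])
```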
Backpatching - Boolean expressions Example
1) B → B1 or M B2
backpatch(B1.falselist, M.instr)
B.truelist = merge(B1.truelist, B2.truelist)
B.falselist = B2.falselist

Consider the expression x < 100 || x > 200 && x != y, translated starting at instruction 100:

100: if x < 100 goto __
101: goto __
102: if x > 200 goto __
103: goto __
104: if x != y goto __
105: goto __

• For B → B1 and M B2 (B1 = x > 200, B2 = x != y), the marker nonterminal M records the value of
nextinstr, which at this time is 104: B1.truelist = {102} and M.instr = 104, so backpatch(102, 104)
fills the target of instruction 102 with 104.
• For B → B1 or M B2 (B1 = x < 100), the marker records nextinstr, which at this time is 102:
B1.falselist = {101} and M.instr = 102, so backpatch(101, 102) fills the target of instruction 101 with 102.
•S denotes a statement
• L denotes statement list
• B denotes boolean expression
Backpatching-Flow of control
• Boolean expressions generated by nonterminal B have two lists of
jumps
• B.truelist - contains the list of all the jump statements left incomplete, to be
filled with the label for the start of the code for B = true.
• B.falselist - contains the list of all the jump statements left incomplete, to be
filled with the label for the start of the code for B = false.
• Statements generated by nonterminals S and L have a list of unfilled
jumps
• Eventually filled by backpatching.
• S.nextlist - list of all conditional and unconditional jumps to the instruction
following the code for statement S in execution order.
• L.nextlist - list of all conditional and unconditional jumps to the instruction
following the code for statement L in execution order
Backpatching-Translation of Flow of control
statements
Backpatching-Flow of control
• When L’s address has been found, we can do this easily with the
information in the symbol table.
• Calling sequence
• allocate space for activation record
• evaluate arguments
• establish environment pointers
• save status and return address
• jump to the beginning of the procedure
Code Generation for procedure calls
• Generate three address code needed to evaluate arguments which are
expressions
S → call id ( Elist )
    for each item p on queue do emit('param' p)
    emit('call' id.place)
Elist → Elist , E
    append E.place to the end of queue
Elist → E
    initialize queue to contain E.place
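A minimal sketch of this scheme (argument places are assumed to be already computed; the queue is flushed as param instructions, then the call is emitted):

```python
# Sketch of the S -> call id ( Elist ) translation.
code = []

def emit(s):
    code.append(s)

def gen_call(name, arg_places):
    queue = []
    for place in arg_places:      # Elist productions: append E.place to queue
        queue.append(place)
    for p in queue:               # S -> call id ( Elist ): emit params, then call
        emit(f'param {p}')
    emit(f'call {name}')

gen_call('foo', ['t1', 'x', 't2'])
print('\n'.join(code))
# param t1
# param x
# param t2
# call foo
```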
Code Generation
Requirements
•Preserve semantic meaning of source program
•Make effective use of available resources of target machine
•Code generator itself must run efficiently
Challenges
•Problem of generating optimal target program is undecidable
•Many subproblems encountered in code generation are computationally intractable
Main Tasks of Code Generator
• Instruction selection: choosing
appropriate target-machine instructions
to implement the IR statements
• Register allocation and assignment:
deciding what values to keep in which
registers
• Instruction ordering: deciding in what
order to schedule the execution of
instructions
Issues in the Design of a Code Generator
•Input to the Code
Generator
•The Target Program
•Instruction Selection
•Register Allocation
•Evaluation Order
Input to the Code Generator
• The input to the code generator is the intermediate representation of the
source program
• Here it is assumed that all syntactic and static semantic errors have been detected,
that the necessary type checking has taken place, and that type-conversion operators
have been inserted wherever necessary.
• The code generator can therefore proceed on the assumption that its input is free of
these kinds of errors.
The Target Program
• The instruction-set architecture of the target machine has a
significant impact on the difficulty of constructing a good code
generator that produces high-quality machine code.
• The most common target-machine architectures are
• RISC (reduced instruction set computer)
• It has many registers, three-address instructions, simple addressing modes, and a
relatively simple instruction-set architecture
• CISC (complex instruction set computer),
• It has few registers, two-address instructions, a variety of addressing modes,
several
register classes, variable-length instructions, and instructions with side effects
• Stack based
• operations are done by pushing operands onto a stack and then performing the
operations on the operands at the top of the stack
The Target Program (Cont)
• To overcome the high performance penalty of interpretation, just-in-time
(JIT) Java compilers have been created.
• These JIT compilers translate bytecodes during run time to the native
hardware instruction set of the target machine
• Producing an absolute machine-language program as output has the
advantage that it can be placed in a fixed location in memory and
immediately executed
• Producing a relocatable machine-language program (often called an
object module) as output allows subprograms to be compiled separately. A
set of relocatable object modules can be linked together and loaded for
execution by a linking loader.
Instruction Selection
• The code generator must map the IR program into a code sequence
that can be executed by the target machine.
• The complexity of performing this mapping is determined by
factors such as
• The level of the IR
• The nature of the instruction-set architecture
• The desired quality of the generated code.
Instruction Selection
• If the IR is high level, the code generator may translate each IR
statement into a sequence of machine instructions using code
templates. Such statement-by-statement code generation, however,
often produces poor code that needs further optimization.
• If the IR reflects some of the low-level details of the underlying
machine, then the code generator can use this information to generate
more efficient code sequences.
• The quality of the generated code is usually determined by its speed
and
size.
• On most machines, a given IR program can be implemented by many
different code sequences, with significant cost differences between the
different implementations
Register Allocation
• A key problem in code generation is deciding what values to hold in what registers.
• Registers are the fastest computational unit on the target machine, but we usually do
not have enough of them to hold all values.
• Values not held in registers need to reside in memory.
• Instructions involving register operands are invariably shorter and faster than those
involving operands in memory, so efficient utilization of registers is particularly
important.
• The use of registers is often subdivided into two sub-problems:
• Register allocation, during which we select the set of variables that will reside in registers at
each
point in the program.
• Register assignment, during which we pick the specific register that a variable will reside in.
• Finding an optimal assignment of registers to variables is difficult, even with
single-register machines.
• Mathematically, the problem is NP-complete.
• The problem is further complicated because the hardware and/or the operating system
of the target machine may require that certain register-usage conventions be observed.
Evaluation Order
• The order in which computations are performed can affect the
efficiency of the target code.
• Some computation orders require fewer registers to hold
intermediate results than others.
• However, picking a best order in the general case is a difficult
NP-complete problem.
• Initially, we shall avoid the problem by generating code for the
three-address statements in the order in which they have been
produced by the intermediate code generator.
Prepared by R I Minu
Simple Code Generator
Content..
✔Introduction(Simple Code Generator)
✔Register and Address Descriptors
✔A Code-Generation Algorithm
✔The Function getreg
✔Generating Code for Other Types of Statements
Introduction
• Code Generator:
✔ The code generator can be considered the final phase of
compilation.
✔ It generates target code for a sequence of three-address
statements.
✔ It considers each statement in turn, remembering whether any of the
operands of the statement are currently in registers, and
taking advantage of that fact if possible.
Introduction
• We assume that computed results can be left in
registers as long as possible.
• Storing them only
a) if their register is needed for another
computation, or
b) just before a procedure call, jump, or labeled
statement
• Condition (b) implies that everything must be stored just
before the end of a basic block.
Introduction
• The reason we must do so is that, after leaving a basic block, we may be
able to go to several different blocks, or we may go to one particular block
that can be reached from several others.
• In either case, we cannot, without extra effort, assume that a datum used by
a block appears in the same register no matter how control reached that
block.
• Thus, to avoid a possible error, our simple code- generator algorithm stores
everything when moving across basic block boundaries as well as when
procedure calls are made.
Introduction
• We canproduce reasonablecode for a three- address
statement a:=b+c
• If we generate the single instruction ADD Rj,Ri
– Keep rack of locaton where curren value of he name can be found a runtme
Programs and Machines
• A program P written in language L (e.g., sort written in Java).
• A machine executing language M code (e.g., a Sun workstation executing SPARC machine code).
Executing Programs
• The program implementation language must match the
machine: sort implemented in sparc runs on a sparc machine.
Interpreters
• An interpreter executing language L, written in
language M (e.g., a Lisp interpreter running on sparc).
Interpreting Programs
• The interpreter mediates between the programming language
and the machine language: sort written in Lisp runs on a Lisp
interpreter implemented in sparc, which runs on a sparc machine.
Virtual Machines
• The interpreter creates a “virtual machine”: the program behaves
as if it were running on a Lisp machine.
Compilers
• A compiler translating from source language S to target language T,
implemented in M (e.g., a C compiler for the sparc platform:
C → sparc, implemented in sparc).
Compiling Programs
• The compiler takes as input a program in the source language and outputs
an equivalent program in the target language: sort in C is compiled to
sort in sparc.
Java Programming Environment
• javac: Java to Java bytecode (JBC) compiler
• java: Java Virtual Machine bytecode interpreter
• A program P in Java is compiled by javac to P in JBC, which is then
interpreted by java on machine M.
Where Do Compilers Come From?
• Write the compiler L → M directly in machine language M: a lot of work.
Where Do Compilers Come From?
• Write the compiler L → M in another language such as C, and compile it
with an existing C → M compiler; or
• Write the compiler L → M in L itself, which requires bootstrapping.
Bootstrapping a Compiler
• Write the compiler in its own language (#0)
• Write a simple native compiler (#1)
• Use compiler #1 to compile #0 to get native compiler
with more frills (#2)
• Repeat as desired
Bootstrapping a Compiler
• Compiler #0 (the real thing: L → M, written in L) is compiled by
#1 (a simple L → M compiler written in M) to produce
#2 (the real thing, compiled, running on M).
Bootstrapping a Compiler, Stage 2
• Compiler #0 (the real thing) is now compiled by #2 to produce
#3 (compiled with the real thing).
Porting a Compiler
• A compiler L → N written in C can be compiled by a C → N
compiler on the new machine N, yielding a native L → N compiler.
Porting a Compiler II
• Rewrite back end to target new machine
• Compile using native compiler
• Compiling the L → N compiler (written in L) with the existing native
L → M compiler yields a “cross compiler”: an L → N compiler
that runs on M.
Cross Compilers
• A cross compiler compiles to a target language
different from the language of the machine it runs on
Porting a Compiler II
• Rewrite back end to target new machine
• Compile using native compiler
• Recompile using cross compiler
• Recompiling the L → N compiler (written in L) with the cross
compiler (L → N running on M) yields a native L → N compiler
running on N.