Intermediate code
generation
Compiler construction
Compiler Passes
Analysis of input program (front-end), then synthesis of output program (back-end):

character stream → Lexical Analysis → token stream → Syntactic Analysis → abstract syntax tree → Semantic Analysis → annotated AST → Intermediate Code Generation → intermediate form → Optimization → intermediate form → Code Generation → target language
• The intermediate code generator is a phase in the
compiler that translates the source code into an
intermediate representation, which is closer to
the target machine code but still independent of
the target machine.
• This intermediate code serves as a bridge
between the high-level source code and the low-
level machine code.
• It simplifies the process of code optimization and
target code generation.
• The main goal of the intermediate code generator
is to produce an efficient and easily transformable
intermediate representation of the source code.
Intermediate code generation
Outline
• Why front end and back end in compilers?
• Different forms of intermediate code generation
• Postfix notation
• Abstract Syntax Tree
• Three Address Code
Recall
• The objective of a compiler is to analyze a source
program and produce target code.
• Front end analyzes the source program and generates
an intermediate code.
• Back end takes the intermediate code as input and
generates the target code.
Need for Intermediate code
Without an intermediate code, each source/target pair needs its own full compiler:

C Program → C Compiler for 80X86 System → Machine instructions for 80X86 systems
C Program → C Compiler for SPARC System → Machine instructions for SPARC systems
Need for Intermediate code
With an intermediate code, one front end serves multiple back ends:

C Program → C Compiler front end → Intermediate code
Intermediate code → Compiler back end for 80X86 System → Machine instructions for 80X86 systems
Intermediate code → Compiler back end for SPARC System → Machine instructions for SPARC systems
Advantages of using Intermediate Code
• Retargeting to a different machine.
• Optimization of the code at intermediate level.
Language 1, Language 2, Language 3 → Front ends → Intermediate Representation → Back ends → Target machine 1, Target machine 2, Target machine 3

With m source languages and n target machines, only m front ends and n back ends are needed, instead of m x n full compilers.
Recall
Source program → Front-end → Intermediate code (AST/DAG, three-address code, postfix) → Back-end → Target machine code

Enables machine-independent code optimization
Intermediate Representations
• We could translate the source program directly into the
target language.
• However, there are benefits to having an intermediate,
machine-independent representation.
• A clear distinction between the machine-independent and machine-dependent
parts of the compiler
• Retargeting is facilitated: implementing language processors for new
machines requires replacing only the back-end
• We can apply machine-independent code optimization techniques
Intermediate Representations
• Intermediate representations span the gap between the
source and target languages.
• High Level Representations
• closer to the source language
• easy to generate from an input program
• code optimizations may not be straightforward
• Low Level Representations
• closer to the target machine
• suitable for register allocation and instruction selection
• easier for optimizations, final code generation
Options for intermediate code
• There are several options for intermediate code.
• Specific to the language being implemented
• P-code for Pascal
• Object code for C
• Bytecode for Java
• Language independent:
• 3-address code
Intermediate forms
• Postfix notation
• Syntax tree (Graphical representation of statements)
• Abstract Syntax Tree
• Parse Tree
• Directed acyclic graph (DAG)
• Three-address code
• Quadruple
Postfix Notation:
• Any expression can be written unambiguously, without
parentheses or any need to state operator precedence.
• Ideally suited for source languages that deal primarily
with expressions, like SNOBOL.
• We can easily build interpreters for these expressions,
using a stack.
• This is the procedure followed by most assemblers.
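The stack-based interpretation described above can be sketched as follows (a minimal illustration, not from the slides; the token format and operator set are assumptions):

```python
# Minimal sketch of a stack-based postfix interpreter.
def eval_postfix(tokens):
    """Evaluate a postfix token list using a stack."""
    stack = []
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    for tok in tokens:
        if tok in ops:
            b = stack.pop()           # right operand is on top
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))  # operand: push its value
    return stack.pop()

# a*(9+d) with a=2, d=1  ->  postfix: a 9 d + *
print(eval_postfix(["2", "9", "1", "+", "*"]))  # 20.0
```

Each operator pops its two operands and pushes the result, which is why postfix needs no parentheses or precedence rules.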
Register Allocator
• A register allocator assigns temporary variables (which are
often represented in the intermediate code) to a limited
number of CPU registers in the target machine. Since most
modern CPUs have a limited number of registers, the
allocator must decide which variables should be stored in
registers and which should be stored in memory.
• The goal of register allocation is to minimize register spilling,
where variables are stored in slower memory because there
aren't enough registers available. Efficient register allocation
can have a significant impact on the performance of the
generated code.
How to generate the postfix code?
• A semantic stack is used to represent the postfix code being
generated.
• This stack is initially empty.
• Semantic actions are connected with each production (as
seen in semantic analysis).
• Only one semantic action is used to create the semantic
stack:
• push <value> : place a value (address or operator) on
the semantic stack
How to generate the postfix code?
All logic or arithmetic operations are assumed to be directly
supported by the machine:
+, *, /, -, and, or
How to generate the postfix code?
An example ‘semantic grammar’:

E -> E+T {push +}
E -> E-T {push -}
E -> T
T -> T*F {push *}
T -> T/F {push /}
T -> F
F -> i {push i}
F -> (E)

Example: a*(9+d) → a9d+*
Parentheses have no effect on the resulting postfix code.
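The {push ...} actions above can be sketched as a small translator (an illustrative sketch: the function name to_postfix, single-character tokens, and the loop-based handling of left recursion are assumptions, not from the slides):

```python
# Recursive-descent translator emitting postfix via the 'push' actions,
# with the grammar's left recursion replaced by iteration.
def to_postfix(expr):
    toks = list(expr.replace(" ", ""))  # single-character tokens only
    out = []
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def parse_E():
        nonlocal pos
        parse_T()
        while peek() in ("+", "-"):
            op = toks[pos]; pos += 1
            parse_T()
            out.append(op)            # {push +} / {push -}

    def parse_T():
        nonlocal pos
        parse_F()
        while peek() in ("*", "/"):
            op = toks[pos]; pos += 1
            parse_F()
            out.append(op)            # {push *} / {push /}

    def parse_F():
        nonlocal pos
        if peek() == "(":
            pos += 1                  # F -> (E): parentheses emit nothing
            parse_E()
            pos += 1                  # consume ')'
        else:
            out.append(toks[pos])     # F -> i: {push i}
            pos += 1

    parse_E()
    return "".join(out)

print(to_postfix("a*(9+d)"))  # a9d+*
```

Note how the operand is pushed as soon as it is parsed, while each operator is pushed only after both of its operands, exactly as the semantic actions prescribe.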
Suffix notation
a*(b+c/a)
The Result
• We can treat the stack generated by the prior process as the intermediate representation
of the input, in postfix notation form: a b c a / + *
Extending to other structures:
• It is clear that postfix notation can be used to represent
mathematical expressions.
• Can it also be extended to handle all other programming
constructs?
E.g., assignment? Control-flow structures?
• If feasible, this notation would offer a good candidate for
intermediate code representation:
• It is simple
• It is easy to interpret
• It is unambiguous
Unary Operators:
• Some operators, such as ‘-’, can be a unary (single
argument) or binary operator.
• If we just map these operators into the postfix notation, it
will be ambiguous whether they operate on one or two
arguments.
• Two solutions:
• 1. Convert to a binary op:
• Map: ‘-a’ to ‘0a-’
• 2. Create a new operator for the unary use of ‘-’:
• map: ‘-a’ to ‘a_’
How to generate the postfix code:
An example ‘semantic grammar’:

E -> E+T {push +}
E -> E-T {push -}
E -> T
T -> T*F {push *}
T -> T/F {push /}
T -> F
F -> i {push i}
F -> (E)
F -> -F {push _}

Note that the unary ‘-’ has its own distinct operator, ‘_’.
Extending to other Structures
Assignment:
• An assignment statement V=a+b can be represented as follows: Vab+=
• The ‘=’ operator thus:
• pops the top two elements from the stack,
• assumes the earlier (lower) one is a memory address,
• and stores the other value in that location.

Example: a=a*(9+d) → aa9d+*=
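One way the ‘=’ operator could be interpreted is sketched below (assumed details, not prescribed by the slides: names are pushed as addresses, and memory is a simple dictionary):

```python
# Postfix interpreter extended with assignment: a name on the stack is
# an address; '=' stores the top value into the address beneath it.
def run_postfix(tokens, memory):
    stack = []

    def value(x):                     # resolve a name to its stored value
        return memory[x] if isinstance(x, str) else x

    for tok in tokens:
        if tok in ("+", "-", "*", "/"):
            b, a = value(stack.pop()), value(stack.pop())
            stack.append({"+": a + b, "-": a - b,
                          "*": a * b, "/": a / b}[tok])
        elif tok == "=":
            val = value(stack.pop())
            addr = stack.pop()        # the earlier element: an address
            memory[addr] = val
        elif tok.isdigit():
            stack.append(int(tok))    # constant: push its value
        else:
            stack.append(tok)         # name: push its address
    return memory

# a = a*(9+d)  ->  a a 9 d + * =
mem = run_postfix(["a", "a", "9", "d", "+", "*", "="], {"a": 2, "d": 1})
print(mem["a"])  # 20
```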
Flow Control
• Many flow control statements (if-then, while, for, etc.) can
be mapped onto assembler which depends on some sort
of conditional or unconditional jump statement
CMP [A], [B]
JZ L1
• The intermediate code can make use of the same kind of
solution:
label jmp_to
• jmp_to is a unary operator which takes the prior element
on the stack as a memory address.
Similar operators: jump_if_zero and jump_if_false
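A possible interpretation of these jump operators over a linear code array is sketched below (the instruction names jmp_to and jump_if_false follow the text; the code layout and the helper run are assumptions):

```python
# Each jump pops its target address from the stack; jump_if_false also
# pops the condition beneath it.
def run(code, env):
    stack, pc = [], 0
    while pc < len(code):
        tok = code[pc]; pc += 1
        if tok == "jmp_to":
            pc = stack.pop()               # unconditional: pop target
        elif tok == "jump_if_false":
            target = stack.pop()
            if not stack.pop():            # condition is beneath the target
                pc = target
        elif tok == "<":
            b, a = stack.pop(), stack.pop()
            stack.append(a < b)
        elif isinstance(tok, str) and tok in env:
            stack.append(env[tok])         # variable: push its value
        else:
            stack.append(tok)              # literal (value or address)

    return stack

# if (f < 6) push "then" else push "else"
code = ["f", 6, "<", 8, "jump_if_false", "then", 9, "jmp_to", "else"]
print(run(code, {"f": 3}))  # ['then']
```

Running the same code with f = 7 leaves 'else' on the stack instead, since the conditional jump transfers control past the then-part.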
Generating Target Code from Postfix form
• Generating Assembler from Postfix form: stack-based
assembler
• The operators used in postfix notation are similar to
those used in most assembler languages.
• The stack data structure is also a basis for most
assemblers
• It should thus be easy to translate from postfix
representation to a given assembler language.
Postfix form

Source:
if (f < 6)
    z = p/q;
else if (f == 6)
    z = p*q;
else
    z = p-q;

Postfix form: f6< ?z p q/=: f6==?z p q*=: z p q-

Intermediate code:
1: f
2: 6
3: JGE 10
4: z
5: p
6: q
7: /
8: =
9: Jump 25
10: f
11: 6
12: JNE 20
13: z
14: p
16: q
17: *
18: =
19: Jump 25
20: z
21: p
22: q
23: -
24: =
25: (end)
Syntax Trees
• A syntax tree shows the structure of a program by
abstracting away irrelevant details from a parse tree.
• Each node represents a computation to be performed;
• The children of the node represent what that
computation is performed on.
• Syntax trees decouple parsing from subsequent
processing.
Syntax Trees: Structure
• Expressions:
• leaves: identifiers or constants;
• internal nodes are labeled with operators;
• the children of a node are its operands.
• Statements:
• a node’s label indicates what kind of statement
it is;
• the children correspond to the components of
the statement.
Abstract Syntax Tree
• A block { S1; S2; S3; } becomes a "block" node with children S1, S2, and S3.
• A loop while (expr) S1; becomes a "while" node with children expr and S1.
• A conditional if (expr) S1; else S2; becomes an "if-else" node with children expr, S1, and S2.
Abstract Syntax Tree
if (a < b) { p = q; r = s; } else { c = d; e = f; }

The AST is an "if-else" node with three children:
• the condition: a "<" node with children a and b
• the then-branch: a stmt-list with children "=" (p, q) and "=" (r, s)
• the else-branch: a stmt-list with children "=" (c, d) and "=" (e, f)
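A tree like this could be built with a generic node type, for example (Node and leaf are illustrative names, not from the slides):

```python
# Generic AST node: a label plus zero or more child nodes.
class Node:
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)

    def __repr__(self):
        if not self.children:
            return self.label
        return f"{self.label}({', '.join(map(repr, self.children))})"

def leaf(name):
    return Node(name)

# if (a < b) { p = q; r = s; } else { c = d; e = f; }
tree = Node("if-else",
            Node("<", leaf("a"), leaf("b")),
            Node("stmt-list",
                 Node("=", leaf("p"), leaf("q")),
                 Node("=", leaf("r"), leaf("s"))),
            Node("stmt-list",
                 Node("=", leaf("c"), leaf("d")),
                 Node("=", leaf("e"), leaf("f"))))

print(tree)
```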
Syntax Trees: Example
Grammar :
E -> E + T | T
T -> T * F | F
F -> ( E ) | id
Input: id + id * id

Result: a "+" node whose two children are id and a "*" node (with children id and id), since * binds tighter than +.
Constructing Syntax Trees
• General Idea: construct bottom-up using synthesized attributes.
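The synthesized-attribute idea can be sketched as follows: each reduction builds a node from the already-built nodes of the production body (a sketch with assumed helper names mk_node and mk_leaf):

```python
# Each production's action synthesizes a node from its children's nodes.
def mk_leaf(name):
    return (name,)                     # leaf: identifier or constant

def mk_node(op, left, right):
    return (op, left, right)           # interior node labeled with operator

# Reductions for id + id * id happen bottom-up, so the '*' node is
# built before the '+' node that takes it as a child:
t1 = mk_leaf("id")                                # F -> id
t2 = mk_node("*", mk_leaf("id"), mk_leaf("id"))   # T -> T * F
root = mk_node("+", t1, t2)                       # E -> E + T
print(root)  # ('+', ('id',), ('*', ('id',), ('id',)))
```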
Directed Acyclic Graph (DAG):
• More compact representation
• Gives clues regarding generation of efficient code
Generating DAG from AST
Pros: easy restructuring of code and/or expressions for intermediate code optimization
Cons: memory intensive
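One common way to obtain a DAG rather than a tree is to hash nodes by (operator, children) so identical subexpressions are shared (a sketch with an assumed representation, not from the slides):

```python
# DAG builder: a table keyed by (op, children) ensures each distinct
# subexpression is created only once.
class DagBuilder:
    def __init__(self):
        self.table = {}   # (op, left_id, right_id) -> node id
        self.nodes = []   # node id -> (op, left_id, right_id)

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.table:      # new subexpression: allocate a node
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]         # otherwise reuse the existing node

# b * -c + b * -c : both 'b * -c' subtrees collapse into one DAG node
d = DagBuilder()
b, c = d.node("b"), d.node("c")
m1 = d.node("*", b, d.node("neg", c))
m2 = d.node("*", b, d.node("neg", c))
root = d.node("+", m1, m2)
print(m1 == m2, len(d.nodes))  # True 5
```

The shared node is what lets the DAG-based code in the three-address example below compute b * -c only once.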
Three Address Code
• Low-level Intermediate Representation.
• Addresses
• Instruction operands are addresses and come in one of
three flavors:
• name : will ultimately be a key in the symbol table
• constant : holds literal value
• temporary : generated by compiler to hold intermediate result
• Instructions are of the form x = y op z, where x, y, z are
variables, constants, or "temporaries".
• At most one operator is allowed on the RHS, so there are no
"built-up" expressions.
• Instead, expressions are computed using temporaries
(compiler-generated variables).
Three-Address Code - Example
Compile: a : = b * -c + b * -c
Code for syntax tree
t1 := - c
t2 := b * t1
t3 := - c
t4 : = b * t3
t5 := t2 + t4
a := t5
Code for DAG
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
Three-Address Code
Linearized representation of the AST for a+a*(b-c)+(b-c)*d:

t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = b - c
t5 = t4 * d
t6 = t3 + t5
Three address statements
• Assignment statements: x := y op z, x := op y
• Indexed assignments: x := y[i], x[i] := y
• Pointer assignments: x := &y, x := *y, *x := y
• Copy statements: x := y
• Unconditional jumps: goto lab
• Conditional jumps: if x relop y goto lab
• Function calls: param x … call p, n
return y
Creating 3AC
• Assume a bottom-up parser
– Covers a wider range of grammars
– LALR is sufficient for most programming languages
• Creating 3AC via SDD
• Attributes examples:
– code – code generated for a nonterminal
– temp – name of variable that stores result of nonterminal
• newTemp() – helper function that returns the name of a new
variable
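The attributes above can be sketched for expressions as follows (a minimal illustration; the tuple-based AST and the names gen and new_temp are assumptions standing in for the slides' code, temp, and newTemp()):

```python
# Each node synthesizes a 'temp' holding its result and the 'code'
# that computes it; new_temp() supplies fresh temporary names.
counter = 0

def new_temp():
    global counter
    counter += 1
    return f"t{counter}"

def gen(node):
    """Return (temp, code) for a tuple AST: ('+', l, r) or a leaf name."""
    if isinstance(node, str):          # leaf: its 'temp' is the name itself
        return node, []
    op, left, right = node
    lt, lcode = gen(left)
    rt, rcode = gen(right)
    t = new_temp()                     # result of this node
    return t, lcode + rcode + [f"{t} = {lt} {op} {rt}"]

# a + a * (b - c)
temp, code = gen(("+", "a", ("*", "a", ("-", "b", "c"))))
for line in code:
    print(line)
# t1 = b - c
# t2 = a * t1
# t3 = a + t2
```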