Intermediate Code Generation in Compilers


Intermediate code generation

Compiler construction
Introduction to semantic analysis
Compiler Passes

Analysis of input program (front-end):
character stream
  -> Lexical Analysis -> token stream
  -> Syntactic Analysis -> abstract syntax tree
  -> Semantic Analysis -> annotated AST

Synthesis of output program (back-end):
annotated AST
  -> Intermediate Code Generation -> intermediate form
  -> Optimization -> intermediate form
  -> Code Generation -> target language

• The intermediate code generator is a phase in the
compiler that translates the source code into an
intermediate representation, which is closer to
the target machine code but still independent of
the target machine.
• This intermediate code serves as a bridge
between the high-level source code and the low-
level machine code.
• It simplifies the process of code optimization and
target code generation.
• The main goal of the intermediate code generator
is to produce an efficient and easily transformable
intermediate representation of the source code.

Outline

• Why front end and back end in compilers?

• Different forms of intermediate code:
• Postfix notation
• Abstract Syntax Tree
• Three Address Code

Recall

• The objective of a compiler is to analyze a source
program and produce target code.
• Front end analyzes the source program and generates
an intermediate code.
• Back end takes the intermediate code as input and
generates the target code.

Need for Intermediate code

C Program -> C Compiler for 80X86 System -> Machine instructions for 80X86 systems
C Program -> C Compiler for SPARC System -> Machine instructions for SPARC systems

Need for Intermediate code

C Program -> C Compiler Front end -> Intermediate code
Intermediate code -> Compiler Back end for 80X86 System -> Machine instructions for 80X86 systems
Intermediate code -> Compiler Back end for SPARC System -> Machine instructions for SPARC systems

Advantages of using Intermediate Code

• Retargeting to a different machine.

• Optimization of the code at intermediate level.

Language 1, Language 2, Language 3 -> Front end -> Intermediate Representation
Intermediate Representation -> Back end -> Target machine 1, Target machine 2, Target machine 3

Recall

Front-end -> Intermediate code (postfix, AST/DAG, TAC) -> Back-end -> Target machine code

Enables machine-independent code optimization

Intermediate Representations
• We could translate the source program directly into the
target language.

• However, there are benefits to having an intermediate,
machine-independent representation.
• A clear distinction between the machine-independent and machine-dependent
parts of the compiler
• Retargeting is facilitated; the implementation of language processors for new
machines will require replacing only the back-end
• We could apply machine-independent code optimization techniques

Intermediate Representations

• Intermediate representations span the gap between the
source and target languages.

• High Level Representations
• closer to the source language
• easy to generate from an input program
• code optimizations may not be straightforward

• Low Level Representations
• closer to the target machine
• suitable for register allocation and instruction selection
• easier for optimizations, final code generation

Options for intermediate code

• There are several options for intermediate code.

• Specific to the language being implemented:
• P-code for Pascal
• Object code for C
• Bytecode for Java

• Language independent:
• 3-address code

Intermediate forms

• Postfix notation
• Syntax tree (Graphical representation of statements)
• Abstract Syntax Tree
• Parse Tree
• Directed acyclic graph(DAG)
• Three-address code
• Quadruple

Postfix Notation:

• Any expression can be written unambiguously without
parentheses and without stating operator precedence.

• Ideally suited for source languages that primarily deal
with expressions, such as SNOBOL.

• We can easily build interpreters for these expressions,
using a stack.

• This is the procedure followed by most assemblers.
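
The stack-based interpretation mentioned above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `eval_postfix`, the token-list input, and the `env` dictionary of variable values are assumptions made for the example.

```python
import operator

# Binary operators supported by this sketch (an assumed, minimal set).
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_postfix(tokens, env):
    """Evaluate a list of postfix tokens; env maps variable names to values."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()   # the second operand was pushed last
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif tok in env:
            stack.append(env[tok])     # variable: push its value
        else:
            stack.append(float(tok))   # anything else is a numeric literal
    return stack.pop()

# a*(b+c) in postfix is: a b c + *
print(eval_postfix(["a", "b", "c", "+", "*"], {"a": 2, "b": 3, "c": 4}))  # prints 14
```

No parentheses or precedence rules are needed: the order of the tokens alone determines the order of evaluation.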

Register Allocator

• A register allocator assigns temporary variables (which are
often represented in the intermediate code) to a limited
number of CPU registers in the target machine. Since most
modern CPUs have a limited number of registers, the
allocator must decide which variables should be stored in
registers and which should be stored in memory.
• The goal of register allocation is to minimize register spilling,
where variables are stored in slower memory because there
aren't enough registers available. Efficient register allocation
can have a significant impact on the performance of the
generated code.

How to generate the postfix code?
• A semantic stack is used to represent the postfix code being
generated.

• This stack is initially empty.

• Semantic actions are connected with each production (as
seen in semantic analysis).

• Only one semantic action is used to create the semantic
stack:
• push <value> : place a value (address or operator) on
the semantic stack

All logic or arithmetic operations are assumed to be directly
supported by the machine:

+, *, /, -, and, or

An example ‘semantic grammar’

E -> E+T {push +}
E -> E-T {push -}
E -> T
T -> T*F {push *}
T -> T/F {push /}
T -> F
F -> i {push i}
F -> (E)

Example: a*(9+d) is translated to a9d+*

Parentheses have no effect on the resulting postfix code.
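
The semantic grammar above can be sketched as a small translator. The slides assume semantic actions attached to productions; this sketch swaps in a hand-written recursive-descent parser, with the left-recursive rules rewritten as loops, and performs each `push` at the point where the corresponding production would be reduced. The function name `to_postfix` and the single-character tokens are assumptions for illustration.

```python
def to_postfix(text):
    """Translate an infix expression to postfix using the grammar's push actions."""
    tokens = [ch for ch in text if not ch.isspace()]  # single-character tokens
    pos = 0
    stack = []  # the semantic stack; 'push' appends to it

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():            # E -> E+T {push +} | E-T {push -} | T
        term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            term()
            stack.append(op)

    def term():            # T -> T*F {push *} | T/F {push /} | F
        factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            factor()
            stack.append(op)

    def factor():          # F -> i {push i} | (E)
        tok = peek()
        if tok == "(":
            advance()
            expr()         # parentheses trigger no push of their own
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif tok is not None and tok.isalnum():
            advance()
            stack.append(tok)
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return "".join(stack)

print(to_postfix("a*(9+d)"))  # prints a9d+*
```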

Suffix notation

a*(b+c/a)

The Result
• We can treat the stack generated by the prior process as the intermediate representation
of the input, in postfix notation form:

abca/+*

Extending to other structures:

• Postfix notation can clearly be used for representing
mathematical expressions.
• Can it also be extended to handle all other programming
constructs?
E.g., assignment? Control-flow structures?
• If feasible, this notation would offer a good candidate for
intermediate code representation:
• It is simple
• It is easy to interpret
• It is unambiguous

Unary Operators:

• Some operators, such as ‘-’, can be a unary (single
argument) or binary operator.

• If we just map these operators into the postfix notation, it
will be ambiguous whether they operate on one or two
arguments.

• Two solutions:
• 1. Convert to a binary op:
• Map: ‘-a’ to ‘0a-’
• 2. Create a new operator for the unary use of ‘-’:
• Map: ‘-a’ to ‘a_’
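
Both solutions can be checked with a small stack evaluator. This is a sketch under assumptions: the name `eval_with_unary` is hypothetical, and only the two minus operators are handled to keep the example short.

```python
def eval_with_unary(tokens, env):
    """Evaluate postfix tokens where '-' is binary and '_' is unary minus."""
    stack = []
    for tok in tokens:
        if tok == "_":                 # unary minus: pops ONE operand
            stack.append(-stack.pop())
        elif tok == "-":               # binary minus: pops TWO operands
            right = stack.pop()
            left = stack.pop()
            stack.append(left - right)
        elif tok in env:
            stack.append(env[tok])
        else:
            stack.append(int(tok))     # integer literal
    return stack.pop()

env = {"a": 5}
print(eval_with_unary(["0", "a", "-"], env))  # solution 1: -a written as 0 a -
print(eval_with_unary(["a", "_"], env))       # solution 2: -a written as a _
```

Both calls print -5. Because each operator has a fixed number of operands, the evaluator never has to guess how many values to pop.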

How to generate the postfix code:
An example ‘semantic grammar’
E -> E+T {push +}
E -> E-T {push -}
E -> T
T -> T*F {push *}
T -> T/F {push /}
T -> F
F -> i {push i}
F -> (E)
F -> -F {push _}
Note that unary ‘-’ has a distinct operator.

Extending to other Structures

Assignment:
• An assignment statement:
V=a+b can be represented as follows: Vab+=

• The ‘=’ operator thus:
• Takes the previous two elements on the stack
• Assumes the earlier one is a memory address
• And places the second value in that location.

Example: a=a*(9+d) is represented as aa9d+*=
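
A minimal sketch of how a stack interpreter could treat the ‘=’ operator. The names `run_postfix` and `memory` are assumptions; variable names are pushed as addresses and only dereferenced when an operator needs their value, which is one possible way to tell an address from a value.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def run_postfix(tokens, memory):
    """Run postfix code with '='; memory maps variable names to values."""
    def value(x):
        # A name on the stack is treated as an address and read lazily.
        return memory[x] if isinstance(x, str) else x

    stack = []
    for tok in tokens:
        if tok == "=":
            val = value(stack.pop())   # second element: the value to store
            addr = stack.pop()         # earlier element: the target address
            memory[addr] = val
        elif tok in OPS:
            right = value(stack.pop())
            left = value(stack.pop())
            stack.append(OPS[tok](left, right))
        elif tok.isdigit():
            stack.append(int(tok))     # integer literal
        else:
            stack.append(tok)          # variable name, kept as an address
    return memory

mem = run_postfix(list("aa9d+*="), {"a": 2, "d": 1})
print(mem["a"])  # prints 20, i.e. a = 2*(9+1)
```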

Flow Control

• Many flow control statements (if-then, while, for, etc.) can
be mapped onto assembler code that relies on some sort
of conditional or unconditional jump statement:
CMP [A], [B]
JZ L1
• The intermediate code can make use of the same kind of
solution:
label jmp_to
• jmp_to is a unary operator which takes the prior element
on the stack as a memory address.
• Similar for: jump_if_zero and jump_if_false
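
The jump operators can be sketched with a small interpreter over an indexed postfix array. Everything here is an assumption for illustration: the function `run`, the encoding of labels as list indices, and the choice that `jump_if_false` pops the target address first and the condition second.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "<": operator.lt, "==": operator.eq}

def run(code, memory):
    """Interpret indexed postfix code containing jmp_to and jump_if_false."""
    def value(x):
        return memory[x] if isinstance(x, str) else x

    stack, pc = [], 0
    while pc < len(code):
        tok = code[pc]
        pc += 1
        if tok == "jmp_to":            # unconditional: pop the target index
            pc = stack.pop()
        elif tok == "jump_if_false":   # pop target, then the condition
            target = stack.pop()
            cond = value(stack.pop())
            if not cond:
                pc = target
        elif tok == "=":
            val = value(stack.pop())
            memory[stack.pop()] = val
        elif tok in OPS:
            right = value(stack.pop())
            left = value(stack.pop())
            stack.append(OPS[tok](left, right))
        else:
            stack.append(tok)          # name, constant, or label index
    return memory

# if (f < 6) z = p / q; else z = p - q;
CODE = [
    "f", 6, "<", 12, "jump_if_false",   # indices 0-4: test, else part is at 12
    "z", "p", "q", "/", "=",            # indices 5-9: then part
    17, "jmp_to",                       # indices 10-11: skip the else part
    "z", "p", "q", "-", "=",            # indices 12-16: else part
]                                       # index 17: end of code

print(run(CODE, {"f": 3, "p": 8, "q": 2})["z"])  # then branch: 8 / 2
print(run(CODE, {"f": 7, "p": 8, "q": 2})["z"])  # else branch: 8 - 2
```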

Generating Target Code from Postfix form

• Generating Assembler from Postfix form: stack-based
assembler
• The operators used in postfix notation are similar to
those used in most assembler languages.
• The stack data structure is also a basis for most
assemblers
• It should thus be easy to translate from postfix
representation to a given assembler language.
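
As a rough sketch of that translation, each postfix element can be mapped to one instruction of a stack machine. The mnemonics `PUSH`, `ADD`, `SUB`, `MUL`, and `DIV` are hypothetical and do not belong to any particular assembler; the function name `postfix_to_asm` is likewise an assumption.

```python
# Hypothetical stack-machine mnemonics, one per postfix operator.
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def postfix_to_asm(tokens):
    """Translate postfix tokens into a list of stack-machine instructions."""
    lines = []
    for tok in tokens:
        if tok in MNEMONIC:
            lines.append(MNEMONIC[tok])   # operator: pops two, pushes result
        else:
            lines.append(f"PUSH {tok}")   # operand: push name or constant
    return lines

# a*(9+d) in postfix form is a9d+*
for line in postfix_to_asm(list("a9d+*")):
    print(line)
```

The translation is a single left-to-right pass with no lookahead, which is why postfix maps so directly onto stack-based targets.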

if (f<6)
    z=p/q;
else if (f==6)
    z=p*q;
else
    z=p-q;

Postfix form:
f 6 < ? z p q / = : f 6 == ? z p q * = : z p q - =

Intermediate code:
 1  f            12  JNE 20
 2  6            13  z
 3  JGE 10       14  p
 4  z            16  q
 5  p            17  *
 6  q            18  =
 7  /            19  Jump 25
 8  =            20  z
 9  Jump 25      21  p
10  f            22  q
11  6            23  -
                 24  =
                 25

Syntax Trees

• A syntax tree shows the structure of a program by
abstracting away irrelevant details from a parse tree.
• Each node represents a computation to be performed;
• The children of the node represent what that
computation is performed on.
• Syntax trees decouple parsing from subsequent
processing.

Syntax Trees: Structure

• Expressions:
• leaves: identifiers or constants;
• internal nodes are labeled with operators;
• the children of a node are its operands.

• Statements:
• a node’s label indicates what kind of statement
it is;
• the children correspond to the components of
the statement.

Abstract Syntax Tree

Block:
{
S1;
S2;
S3;
}
AST: block(S1, S2, S3)

While loop:
while(expr)
s1;
AST: while(expr, S1)

If-else:
if(expr)
s1;
else
s2;
AST: if-else(expr, S1, S2)

Abstract Syntax Tree

if(a<b)
{
p=q;
r=s;
}
else
{
c=d;
e=f;
}

AST:
if-else(
  <(a, b),
  stmt-list(=(p, q), =(r, s)),
  stmt-list(=(c, d), =(e, f)))

Syntax Trees: Example

Grammar :
E -> E + T | T
T -> T * F | F
F -> ( E ) | id
Input: id + id * id

Constructing Syntax Trees
• General Idea: construct bottom-up using synthesized attributes.
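
A minimal sketch of the bottom-up idea for the grammar E -> E + T | T, T -> T * F | F, F -> ( E ) | id. Each function stands for the semantic action of one production and returns the synthesized attribute (the subtree) of its left-hand side. The class and function names are assumptions, and the reductions are invoked by hand here rather than by a real parser.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    name: str

@dataclass
class Node:
    op: str
    left: object
    right: object

def reduce_E_plus_T(e, t):    # E -> E + T : build a '+' node
    return Node("+", e, t)

def reduce_T_times_F(t, f):   # T -> T * F : build a '*' node
    return Node("*", t, f)

def reduce_F_id(name):        # F -> id : build a leaf
    return Leaf(name)

def show(t):
    """Render a tree as a fully parenthesized string."""
    if isinstance(t, Leaf):
        return t.name
    return f"({show(t.left)} {t.op} {show(t.right)})"

# Input: id + id * id, with the ids named a, b, c for readability.
# Unit productions (E -> T, T -> F, F -> (E)) just pass the attribute up.
a = reduce_F_id("a")
b = reduce_F_id("b")
c = reduce_F_id("c")
tree = reduce_E_plus_T(a, reduce_T_times_F(b, c))
print(show(tree))  # prints (a + (b * c))
```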

Directed Acyclic Graph (DAG):
• More compact representation
• Gives clues regarding generation of efficient code

Generating DAG from AST

Pros: Easy restructuring of code and/or expressions
for intermediate code optimization
Cons: Memory intensive
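
One common way to obtain a DAG instead of a tree is to look up each node before creating it, so that identical subexpressions share a single node. The sketch below assumes this lookup technique; the helper `make_dag_builder` and the operator names `id` and `uminus` are hypothetical.

```python
def make_dag_builder():
    """Return a node constructor that reuses structurally identical nodes."""
    nodes = {}  # maps (op, children...) to a node id; one entry per distinct node

    def node(op, *children):
        key = (op, *children)
        if key not in nodes:        # create a node only on first sight
            nodes[key] = len(nodes)
        return nodes[key]

    return node, nodes

node, nodes = make_dag_builder()

def b_times_neg_c():
    return node("*", node("id", "b"), node("uminus", node("id", "c")))

# b * -c + b * -c : the tree has 9 nodes, the DAG only 5
root = node("+", b_times_neg_c(), b_times_neg_c())
print(len(nodes))  # prints 5
```

Both operands of the root refer to the same node id, which signals to a later phase that `b * -c` needs to be computed only once.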

Three Address Code
• Low-level Intermediate Representation.
• Addresses
• Instruction operands are addresses and come in one of
three flavors:
• name : will ultimately be a key in the symbol table
• constant : holds literal value
• temporary : generated by compiler to hold intermediate result
• Instructions are of the form ‘x = y op z’, where x, y, z are
variables, constants, or “temporaries”.
• At most one operator allowed on RHS, so no “built-up”
expressions.
• Instead, expressions are computed using temporaries
(compiler-generated variables).

Three-Address Code - Example

Compile: a := b * -c + b * -c
Code for syntax tree
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
Code for DAG
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
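
The two listings above can be reproduced with a short generator. This is a sketch under assumptions: expressions are given as nested tuples, `gen_tac` is a hypothetical name, and the `share` flag imitates the DAG by reusing the temporary of a subexpression that was already translated.

```python
def gen_tac(expr, target, share=False):
    """Emit three-address code for an expression given as nested tuples.

    Leaves are strings; ("neg", x) is unary minus; (op, x, y) is binary.
    """
    code, seen, count = [], {}, 0

    def new_temp():
        nonlocal count
        count += 1
        return f"t{count}"

    def walk(e):
        if isinstance(e, str):
            return e                  # names need no code
        if share and e in seen:       # DAG mode: reuse the earlier result
            return seen[e]
        if e[0] == "neg":
            rhs = f"- {walk(e[1])}"
        else:
            left = walk(e[1])
            right = walk(e[2])
            rhs = f"{left} {e[0]} {right}"
        temp = new_temp()
        code.append(f"{temp} := {rhs}")
        seen[e] = temp
        return temp

    result = walk(expr)
    code.append(f"{target} := {result}")
    return code

SUB = ("*", "b", ("neg", "c"))
EXPR = ("+", SUB, SUB)                # b * -c + b * -c

print("\n".join(gen_tac(EXPR, "a")))              # six instructions
print("\n".join(gen_tac(EXPR, "a", share=True)))  # four instructions
```

In shared mode this sketch numbers the last temporary t3 rather than t5, because it allocates names consecutively; the instructions are otherwise the same as in the DAG listing.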

Three-Address Code
Linearized representation of AST

a+a*(b-c)+(b-c)*d

Three address statements

• Assignment statements: x := y op z, x := op y
• Indexed assignments: x := y[i], x[i] := y
• Pointer assignments: x := &y, x := *y, *x := y
• Copy statements: x := y
• Unconditional jumps: goto lab
• Conditional jumps: if x relop y goto lab
• Function calls: param x … call p, n
return y

Creating 3AC

• Assume bottom up parser
– Covers a wider range of grammars
– LALR sufficient to cover most programming languages

• Creating 3AC via SDD

• Attributes examples:
– code – code generated for a nonterminal
– temp – name of variable that stores result of nonterminal
• newTemp() – helper function that returns the name of a new
variable
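
The `code` and `temp` attributes can be sketched as a pair returned for every subexpression, with the parent's `code` formed by concatenating its children's code and one new instruction. This is an illustration only: the tree is written by hand as nested tuples instead of coming from an LALR parser, and `attrs` is a hypothetical name. `newTemp()` follows the helper named above.

```python
import itertools

_counter = itertools.count(1)

def newTemp():
    """Return the name of a new compiler-generated variable."""
    return f"t{next(_counter)}"

def attrs(e):
    """Return the synthesized attributes (code, temp) of expression e.

    e is either a name (string) or a tuple (op, left, right).
    """
    if isinstance(e, str):
        return [], e                      # a name: no code, temp is the name
    op, left, right = e
    lcode, ltemp = attrs(left)
    rcode, rtemp = attrs(right)
    temp = newTemp()
    # parent.code = left.code || right.code || one new instruction
    return lcode + rcode + [f"{temp} := {ltemp} {op} {rtemp}"], temp

# a + a*(b-c) + (b-c)*d
BC = ("-", "b", "c")
code, temp = attrs(("+", ("+", "a", ("*", "a", BC)), ("*", BC, "d")))
print("\n".join(code))
```

Since no sharing is attempted here, `b - c` is computed twice (as t1 and t4); the final result ends up in t6.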
