
Compiler Theory

(Intermediate Code Generation – Abstract Syntax + 3-Address Code)

Why intermediate code?

 Details of the source language are confined to the front-end (analysis phase) of a compiler, while details of the target machine are confined to the back-end (synthesis) part.
 This saves a considerable amount of effort: with m front-ends and n back-ends we obtain m*n source-to-target compilers while writing only m+n components (e.g. 3 languages and 4 targets yield 12 compilers from just 7 pieces).
Intermediate representations

 Syntax Trees
 Code is represented in the form of a tree where nodes represent constructs in the source program; the children of a node represent the meaningful components of a construct.
 Three-Address Code
 Made up of instructions of the general form x = y op z
 x, y and z are the three addresses.
 C
 Often used as an intermediate representation, e.g. by compilers for the LOTOS specification language.
Directed Acyclic Graphs (i)

 Nodes in a syntax tree represent constructs in the source program
 A DAG is used to identify common sub-expressions, e.g. a+a*(b-c)+(b-c)*d
 By doing so it gives the compiler important hints on how to generate efficient code to evaluate the expressions.
DAG for a+a*(b-c)+(b-c)*d
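
The DAG figure itself is not reproduced in this text; as a sketch, its structure can be written as a node list, with the node for b-c shared by both multiplications:

    n1 = b - c        // built once
    n2 = a * n1       // a * (b-c)
    n3 = a + n2
    n4 = n1 * d       // (b-c) * d points back to n1 instead of duplicating it
    n5 = n3 + n4      // root of the DAG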
Three Address Code (TAC)

 An alternative (lower level) form of intermediate representation.
 x + y * z becomes
 t1 = y * z
 t2 = x + t1
 For expressions TAC is very similar to syntax trees.
 For statements it would produce labels and jumps in a similar fashion to machine code.
TAC for a+a*(b-c)+(b-c)*d
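
The figure is omitted in this text; one valid TAC sequence for the expression (a sketch, mirroring the DAG's sharing of b-c through t1):

    t1 = b - c
    t2 = a * t1
    t3 = a + t2
    t4 = t1 * d
    t5 = t3 + t4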
Addresses and Instructions

 An address can be
 A name : source program names
 In an implementation these would be pointers into the symbol table.
 A constant
 A compiler-generated temporary
 To store temporary results
Addresses and Instructions (i)

 Assignment instructions
 x = y op z where op is binary
 x = op y where op is unary (-, !, type casting)
 Copy Instructions
 x=y
 Jumps
 Unconditional : goto L
 Conditional :
 if x goto L
 ifFalse x goto L
 if x relop y goto L
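
As a small illustration (not from the slides), the statement if (x < y) a = 1; else a = 0; could be rendered with these jump forms as:

        if x < y goto L1
        a = 0
        goto L2
    L1: a = 1
    L2: ...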
Addresses and Instructions (ii)

 Procedure Calls and returns


 Parameters : param x
 Procedure with n params: call p,n
 Function with n params : y = call f,n
 Indexed Array[] Access
 x = y[i]
 x[i] = y
 Address and Pointer Assignments (no need to be covered)
 x = &y
 x = *y
 *x = y
do i = i+1; while (a[i] < v);
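
The slide's TAC translation of this loop is not shown in the extracted text; a plausible rendering (assuming 8-byte array elements, so the subscript i is scaled by 8):

    L:  i = i + 1
        t1 = i * 8
        t2 = a[t1]
        if t2 < v goto L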
Quadruples and Triples

 Data structures to hold three-address code instructions.
 A Quadruple has four fields (for x = y + z):
 Op (+)
 Arg1 (y)
 Arg2 (z)
 Result (x)
An example ...

 Note that in an actual implementation a, b and c would be pointers into the symbol table.
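
The example figure is missing here; for a = b + c * d the quadruple array might look as follows (a reconstruction):

         op    arg1    arg2    result
    (0)  *     c       d       t1
    (1)  +     b       t1      t2
    (2)  =     t2              a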
Triples

 Omit the result field.
 Instead of a result field we can use pointers into the triple structure itself.
 This makes the DAG and triple representations practically identical, since in both we are pointing to a node.
 In the next example (n) indicates position n in the triple structure.
An example ...

 Note similarity ...
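
For the same a = b + c * d, the triples would be (again a reconstruction; compare with the quadruples above, where temporaries are replaced by positions):

         op    arg1    arg2
    (0)  *     c       d
    (1)  +     b       (0)
    (2)  =     a       (1)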


Code Generation

 This is the final phase of a compiler
 Takes an intermediate representation and generates the equivalent target program
 Code optimisation (if any) occurs between the intermediate and target code generation
 We require that the generated code
 is correct and
 effectively uses the resources of the target machine
 and that the code generator itself runs efficiently
Code Generation

 Whoever is designing the code generator must have a very good knowledge of the architecture of the target hardware and operating system.
 Should keep in mind
 Memory management
 Instruction selection
 Register allocation
 Evaluation order
 We shall look at generic issues
Input to the Code Generator

 Consists of
 Intermediate Code produced by the front end
 Symbol Table to determine the runtime addresses of the data objects denoted by the names in the intermediate representation
 The underlying machine memory is byte-addressable and would have a number (say n) of general-purpose registers.
Assembly Language Instructions

 Most instructions consist of an operator, followed by a target, followed by a list of source operands.
 A label may precede an instruction
 In the next few slides we shall look at a number of different instruction classes.
Load Operations

 LD dst, addr
 Loads the value in location addr into location dst.
 dst = addr
 LD r, x loads the value at address x into register r
 LD r1, r2 copies the contents of register r2 into register r1
Store operations

 ST x, r
 Stores the value in register r into location x.
 This instruction denotes the assignment x = r.
 Note the direction of transfer is the opposite of LD: register to memory rather than memory to register
Computation Operators

 OP dst, src1, src2
 OP would be ADD, SUB, etc.
 dst, src1, src2 are locations
 Applies operation OP to the values in locations src1 and src2, and places the result in location dst
 ADD r1, r2, r3 computes r1 = r2 + r3
 Unary operators do not have src2
Unconditional Jumps

 BR L
 Causes control to branch to the machine
instruction with label L
 BR stands for branch
Conditional Jumps

 Bcond r, L
 Where r is a register and L is a label
 cond stands for any of the common tests on values, e.g. LT, GT, etc.
 BLTZ r, L causes a jump to label L if the value in register r is less than zero, and allows control to pass to the next machine instruction if not.
3AC to machine code (x=y-z)

 LD R1, y // R1 = y
 LD R2, z // R2 = z
 SUB R1, R1, R2 // R1 = R1 - R2
 ST x, R1 // x = R1
3AC to machine code (if x<y goto L)

 LD R1, x // R1 = x
 LD R2, y // R2 = y
 SUB R1, R1, R2 // R1 = R1 - R2
 BLTZ R1, M // if R1 < 0, jump to M (the machine-code label corresponding to the 3AC label L)
Instruction Selection

 Instruction selection affects
 execution speed and
 code size
 A rich instruction set may provide several ways to perform any given operation.
 Typical example: an INC x instruction replacing the longer sequence for x = x + 1
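
For instance, on a machine that offers a memory-increment instruction (INC is assumed here; it is not part of the generic target machine above), x = x + 1 could shrink from three instructions to one:

    LD  R1, x        // R1 = x
    ADD R1, R1, #1   // R1 = R1 + 1 (assuming an immediate operand written #1)
    ST  x, R1        // x = R1

    INC x            // one instruction with the same effect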
Register Allocation

 Instructions involving register operands are usually shorter and much faster than those involving memory operands. For this reason effective utilisation of registers is important in generating fast code.
 The use of registers is often subdivided into two problems
 During register allocation, we select the set of variables that will reside in registers at each point in the program
 During the subsequent register assignment, we pick the specific register in which each such variable will reside
 Optimal assignment of registers is NP-complete ... hence heuristics have to be used.
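
A small sketch of the two sub-problems, using hypothetical variables i and s in a loop: allocation decides that both stay in registers across the loop; assignment then fixes i in R1 and s in R2, so the body s = s + i compiles with no memory traffic:

    L:  ADD R2, R2, R1   // s = s + i, both operands already in registers
        ...              // loop control omitted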
Standard code optimizations (i)

 Common Sub Expression


 An expression E is called a common sub-expression if
E was previously computed and the values of the
variables in E have not changed. In such cases we can
use the previously computed value of E.
 Copy Propagation
 Reorganises assignment statements so that:
 x=y
 z=x
 Becomes
 x=y
 z=y
Standard code optimizations (ii)

 Dead Code Elimination
 A variable is 'live' at a point in a program if its value can be used subsequently, otherwise it is 'dead'. Statements may compute values that are never used in a program.
 e.g. computing expression values which are never assigned to any variable
 e.g. if (debug) then print ...; sometimes (using data flow analysis) the compiler can deduce that debug is always false, in which case both the check and the print code can be removed.
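
A minimal C sketch of the debug case (names are hypothetical):

    #include <stdio.h>

    int main(void) {
        const int debug = 0;              /* analysis proves debug is always 0 */
        int state = 42;
        if (debug) {                      /* condition is always false ...     */
            printf("state: %d\n", state); /* ... so test and body are dead     */
        }
        return 0;
    }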
Standard code optimizations (iii)

 Copy propagation + dead code elimination
 x = t3
 a[t2] = t5
 a[t4] = x
 goto B2
 After copy propagation
 x = t3
 a[t2] = t5
 a[t4] = t3
 goto B2
 After dead code elimination (x is no longer used)
 a[t2] = t5
 a[t4] = t3
 goto B2
Standard code optimizations (iv)

 Loop optimisation
 Loops are an important place where optimisations may occur. Clearly we try to reduce the number of instructions inside the loop! One option is to use code motion (while maintaining semantics).
 This transformation takes out of the loop any expression that yields the same value independently of the number of times the loop executes (a loop-invariant computation) and places it before the loop.
 e.g. while (i <= (limit-1)) ...: limit - 1 can be computed once before the loop, giving i <= t where t = limit - 1 (see the C sketch below)
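
A C sketch of this case (assuming limit does not change inside the loop; process() is a hypothetical body):

    /* before code motion: limit - 1 may be re-evaluated on every iteration */
    while (i <= limit - 1) {
        process(i);
        i++;
    }

    /* after code motion: the invariant is hoisted out of the loop */
    int t = limit - 1;
    while (i <= t) {
        process(i);
        i++;
    }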
