The document discusses intermediate code generation in compilers. It explains that an intermediate representation (IR) is generated from the abstract syntax tree to make the code portable, optimizable, and able to interface with compiler components. A good IR has a small number of node types, is easy to translate to and from the AST, and keeps machine dependencies out for as long as possible. The document describes different types of IRs like high-level IR for early optimizations and medium-level IR that is more machine-like but still language-independent. It also outlines properties of the "IR machine" that the intermediate code runs on.


Intermediate Code Generation

CS 471 – Fall 2007
October 29, 2007

Where intermediate code generation fits in the compiler pipeline:

Source code
  → Lexical Analysis    (lexical errors)  → tokens
  → Syntactic Analysis  (syntax errors)   → AST
  → Semantic Analysis   (semantic errors) → AST'
  → Intermediate Code Gen → IR

Motivation

What we have so far...
• An abstract syntax tree
  – With all the program information
  – Known to be correct
    • Well-typed
    • Nothing missing
    • No ambiguities
What we need...
• Something "executable"
• Closer to the actual machine level of abstraction

Why an Intermediate Representation?

• What is the IR used for?
  – Portability
  – Optimization
  – Component interface
  – Program understanding
• Compiler
  – Front end does lexical analysis, parsing, semantic analysis, translation to IR
  – Back end does optimization of the IR, translation to machine instructions
• Try to keep machine dependences out of the IR for as long as possible

Intermediate Code

• Abstract machine code (Intermediate Representation)
• Allows machine-independent code generation and optimization:

  AST → IR → optimize → x86 / PowerPC / Alpha

What Makes a Good IR?

• Easy to translate from the AST
• Easy to translate to assembly
• Narrow interface: small number of node types (instructions)
  – Easy to optimize
  – Easy to retarget
• Compare the interface sizes: AST (>40 node types), IR (13 node types), x86 (>200 opcodes)

Intermediate Representations

• Any representation between the AST and assembly
  – 3-address code: triples, quads (low-level)
  – Expression trees (high-level)
• Tiger intermediate code for a[i]:

  MEM(BINOP(+, MEM(a), BINOP(MUL, i, CONST)))

Intermediate Representation

Components
• code representation
• symbol table
• analysis information
• string table
Issues
• Use an existing IR or design a new one?
  – Many available: RTLs, SSA, LLVM, etc.
• How close should it be to the source/target?

IR Selection

Using an existing IR
• cost savings due to reuse
• it must be expressive and appropriate for the compiler operations
Designing an IR
• decide how close to machine code it should be
• decide how expressive it should be
• decide its structure
• consider combining different kinds of IRs

The IR Machine

A machine with
• An infinite number of temporaries (think registers)
• Simple instructions
  – 3 operands
  – Branching
  – Calls with a simple calling convention
• Simple code structure
  – Array of instructions
  – Labels to define targets of branches

Temporaries

The machine has an infinite number of temporaries
• Call them t0, t1, t2, ...
• Temporaries can hold values of any type
• The type of a temporary is derived from its generation
• Temporaries go out of scope with each function

Optimizing Compilers

• Goal: get the program closer to machine code without losing the information needed to do useful optimizations
• Need multiple IR stages:

  AST → HIR → (optimize) → MIR → (optimize) → x86 / PowerPC / Alpha LIRs → (optimize)

High-Level IR (HIR)

• Used early in the process
• Usually converted to a lower form later on
• Preserves high-level language constructs
  – structured flow, variables, methods
• Allows high-level optimizations based on properties of the source language (e.g. inlining, reuse of constant variables)
• Example: the AST itself

Medium-Level IR (MIR)

• Tries to reflect the range of features in the source language in a language-independent way
• Intermediate between AST and assembly
• Unstructured jumps, registers, memory locations
• Convenient for translation to high-quality machine code
• Other MIRs:
  – quadruples: a = b OP c (the result "a" is explicit, not an arc)
  – UCODE: stack machine based (like Java bytecode)
  – advantage of a tree IR: easy to generate, easier to do reasonable instruction selection
  – advantage of quadruples: easier optimization

Low-Level IR (LIR)

• Assembly code plus extra pseudo-instructions
• Machine dependent
• Translation to assembly code is trivial
• Allows optimization of code for low-level considerations: scheduling, memory layout

IR Classification: Level

High-level:

for i := op1 to op2 step op3
  instructions
endfor

Medium-level:

    i := op1
    if step < 0 goto L2
L1: if i > op2 goto L3
    instructions
    i := i + step
    goto L1
L2: if i < op2 goto L3
    instructions
    i := i + step
    goto L2
L3:

IR Classification: Structure

Graphical
• Trees, graphs
• Not easy to rearrange
• Large structures
Linear
• Looks like pseudocode
• Easy to rearrange
Hybrid
• Combine graphical and linear IRs
• Example:
  – low-level linear IR for basic blocks, and
  – a graph to represent flow of control

Graphical IRs

Parse tree, abstract syntax tree
• High-level
• Useful for source-level information
• Retains syntactic structure
• Common uses
  – source-to-source translation
  – semantic analysis
  – syntax-directed editors

Graphical IRs Often Use Basic Blocks

Basic block = a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halt or possibility of branching except at the end.

Partitioning a sequence of statements into basic blocks:
1. Determine the leaders (first statements of basic blocks)
   – The first statement is a leader
   – The target of a conditional is a leader
   – A statement following a branch is a leader
2. For each leader, its basic block consists of the leader and all the statements up to, but not including, the next leader.

Basic Blocks: Example

Source:

unsigned int fibonacci (unsigned int n) {
  unsigned int f0, f1, f2;
  f0 = 0;
  f1 = 1;
  if (n <= 1)
    return n;
  for (int i = 2; i <= n; i++) {
    f2 = f0 + f1;
    f0 = f1;
    f1 = f2;
  }
  return f2;
}

Intermediate code:

    read(n)
    f0 := 0
    f1 := 1
    if n<=1 goto L0
    i := 2
L2: if i<=n goto L1
    return f2
L1: f2 := f0+f1
    f0 := f1
    f1 := f2
    i := i+1
    goto L2
L0: return n

Basic Blocks: Leaders and the Flow Graph

Leaders in the fibonacci intermediate code:
• read(n)
• i := 2
• L2: if i<=n goto L1
• return f2
• L1: f2 := f0+f1
• L0: return n

The resulting flow graph (each node is a basic block):

entry
  ↓
[ read(n) ; f0 := 0 ; f1 := 1 ; if n<=1 goto L0 ]  → true: [ L0: return n ] → exit
  ↓ false
[ i := 2 ]
  ↓
[ L2: if i<=n goto L1 ]  → true: [ L1: f2 := f0+f1 ; f0 := f1 ; f1 := f2 ; i := i+1 ; goto L2 ] → back to L2
  ↓ false
[ return f2 ] → exit

Graphical IRs

Tree, for a basic block*
• root: an operator
• up to two children: operands
• trees can be combined
Uses:
• algebraic simplifications
• may generate locally optimal code

Example:

L1: i := 2
    t1 := i+1
    t2 := t1>0
    if t2 goto L1

as trees: (assgn, i: 2), (add, t1: i, 1), (gt, t2: t1, 0); combined into one tree, (assgn, i) becomes the operand of (add, t1), which in turn becomes the operand of (gt, t2).

*straight-line code with no branches or branch targets.

Graphical IRs

Directed acyclic graphs (DAGs)
• Like compressed trees
  – leaves: variables and constants available on entry
  – internal nodes: operators
    • annotated with variable names?
  – distinct left/right children
• Used for basic blocks (DAGs don't show control flow)
• Can generate efficient code
  – Note: DAGs encode common subexpressions
• But difficult to transform
• Good for analysis

Generating DAGs
• Check whether an operand is already present
  – if not, create a leaf for it
• Check whether there is a parent of the operand that represents the same operation
  – if not, create one; then label the node representing the result with the name of the destination variable, and remove that label from all other nodes in the DAG.

Graphical IRs

Directed acyclic graphs (DAGs): example

m := 2 * y * z
n := 3 * y * z
p := 2 * y - z

(The common subexpression 2 * y is represented by a single shared node.)

Graphical IRs

Control flow graphs (CFGs)
• Each node corresponds to
  – a basic block, or
  – part of a basic block, or
    • may be needed to determine facts at specific points within a BB
  – a single statement
    • costs more space and time
• Each edge represents flow of control

Graphical IRs

Dependence graphs: they represent constraints on the sequencing of operations.
• Dependence = a relation between two statements that puts a constraint on their execution order
  – Control dependence
    • based on the program's control flow
  – Data dependence
    • based on the flow of data between statements
• Nodes represent statements
• Edges represent dependences
  – Labels on the edges represent the types of dependences
Built for specific optimizations, then discarded.

Example:

s1      a := b + c
s2      if a>10 goto L1
s3      d := b * e
s4      e := d + 1
s5  L1: d := e / 2

control dependence: s3 and s4 are executed only when a<=10
true or flow dependence: s2 uses a value defined in s1; this is read-after-write dependence
antidependence: s4 defines a value used in s3; this is write-after-read dependence
output dependence: s5 defines a value also defined in s3; this is write-after-write dependence
input dependence: s5 uses a value also used in s3; this read-after-read situation places no constraint on the execution order, but is used in some optimizations

IR Classification: Structure

Graphical
• Trees, graphs
• Not easy to rearrange
• Large structures
Linear
• Looks like pseudocode
• Easy to rearrange
Hybrid
• Combine graphical and linear IRs
• Example:
  – low-level linear IR for basic blocks, and
  – a graph to represent flow of control

Linear IRs

A sequence of instructions that execute in their order of appearance. Control flow is represented by conditional branches and jumps.

Common representations
• stack machine code
• three-address code

Linear IRs

Stack machine code
• Assumes the presence of an operand stack
• Useful for stack architectures, the JVM
• Operations typically pop their operands and push their results
• Advantages
  – easy code generation
  – compact form
• Disadvantages
  – difficult to rearrange
  – difficult to reuse expressions

Example: Three-Address Code

Each instruction is of the form

    x := y op z

• y and z can only be registers or constants
• Just like assembly
• A common form of intermediate code

The AST expression x + y * z is translated as

    t1 := y * z
    t2 := x + t1

• Each subexpression has a "home"

Three-Address Code

• Result, operand, operand, operator
  – x := y op z, where op is a binary operator and x, y, z can be variables, constants, or compiler-generated temporaries (intermediate results)
• Can be written in shorthand as
  – <op, arg1, arg2, result> (quadruples)
• Statements
  – assignment: x := y op z
  – copy: x := y
  – goto L
  – if x relop y goto L
  – indexed assignments: x := y[j] or x[j] := z
  – address and pointer assignments (for C):
    • x := &y
    • x := *p
    • *x := y
  – param x;
  – call p, n;
  – return y;

Bigger Example

Consider the AST for a := b * -c + b * -c. It translates to:

t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5

Summary

• IRs provide the interface between the front and back ends of the compiler
• Should be machine- and language-independent
• Should be amenable to optimization

Next Time: Tiger IR
