
Lecture 08

The document discusses the process of intermediate code generation in compilers, highlighting the role of intermediate representations (IR) in optimizing code and enabling platform independence. It covers the advantages of using IR, such as retargeting and optimization opportunities, and explains various forms of intermediate code like three-address code and quadruples. Additionally, it addresses the importance of directed acyclic graphs (DAGs) in representing expressions and optimizing code generation.

Uploaded by

nihafahima9

Intermediate Code Generation

12/8/2024 1
Summary of Front End

Lexical Analyzer (Scanner)
+
Syntax Analyzer (Parser)
+
Semantic Analyzer
(together: the Front End)

Output: Abstract Syntax Tree w/Attributes

Intermediate-code Generator

Output: Non-optimized Intermediate Code

[Figure also shows an "Error Message" output.]

12/8/2024 \course\cpeg621-10F\Topic-1a.ppt 2
Component-Based Approach to Building
Compilers

[Figure: a source program in Language-1 and a source program in Language-2 pass through a Language-1 Front End and a Language-2 Front End. Both front ends produce Non-optimized Intermediate Code, which a shared Intermediate-code Optimizer turns into Optimized Intermediate Code. A Target-1 Code Generator and a Target-2 Code Generator then produce Target-1 machine code and Target-2 machine code.]
Intermediate Representation (IR)

A kind of abstract machine language that can express the target machine operations without committing to too much machine detail.

• Why IR?
Without IR

[Figure: source languages C, Pascal, FORTRAN, and C++ on the left; target machines SPARC, HP PA, x86, and IBM PPC on the right, with no shared layer between them.]

With IR

[Figure: the same four source languages and four target machines, with a single IR placed between them.]

With IR

[Figure: Pascal, FORTRAN, and C++ on the left, with the labels "Common", "IR", "Backend", and "?" to their right.]
Advantages of Using an Intermediate
Language

1. Retargeting - Build a compiler for a new machine by attaching a new code generator to an existing front-end.
2. Optimization - Reuse intermediate code optimizers in compilers for different languages and different machines.
Note: the terms “intermediate code”, “intermediate language”, and “intermediate representation” are all used interchangeably.
Issues in Designing an IR

❖ Whether to use an existing IR
▪ if target machine architecture is similar
▪ if the new language is similar
❖ Whether the IR is appropriate for the kind of optimizations to be performed
▪ e.g. speculation and prediction
▪ some transformations may take much longer than they would on a different IR

Issues in Designing an IR

❖ Designing a new IR needs to consider
▪ Level (how machine dependent it is)
▪ Structure
▪ Expressiveness
▪ Appropriateness for general and special optimizations
▪ Appropriateness for code generation
▪ Whether multiple IRs should be used
Source code can be translated directly into machine code, yet compilers still generate intermediate code. The question is: why do we need intermediate code, and what are its advantages?
Importance of Intermediate Code
Generation

• Platform Independence: The intermediate code can be further translated into machine code for different target platforms.
• Optimization Opportunities: Optimization techniques can be applied to intermediate code, improving efficiency before generating the final machine code.
• Simplifies Code Generation: By breaking down the process into two phases (intermediate and final code generation), the complexity of translating high-level code directly into machine code is reduced.
Characteristics of Intermediate Code

• Abstraction Level: It’s an abstraction between the source and target code.
• Portable: Intermediate code is independent of specific machine architectures.
• Efficient: It allows for optimizations that are difficult to apply to high-level or machine-level code directly.
Directed Acyclic Graph (DAG)
• Definition: A Directed Acyclic Graph (DAG) is a graph
with directed edges and no cycles. In compiler
optimization, DAGs are used to represent expressions
and control dependencies.
• Purpose: DAGs provide a way to simplify the
representation of expressions and detect common
subexpressions, which can be eliminated to optimize the
code.
• Key Features:
– Nodes represent operations or operands.
– Edges represent dependencies between operations.
– It is acyclic, meaning no node can have a path back
to itself, ensuring a clear order of execution.
DAG for Expressions
• A DAG has leaves corresponding to atomic operands &
interior nodes corresponding to operators.
❑ Difference between DAG & Syntax Tree
• a node N in a DAG has more than one parent if N
represents a common subexpression
• in a syntax tree, the tree for the common
subexpression would be replicated as many times as the
subexpression appears in the original expression.
Thus, a DAG not only represents expressions more
succinctly, it gives the compiler important clues
regarding the generation of efficient code to evaluate
the expressions.
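Example: a + a * (b-c) + (b-c) * d

[Figure: DAG for this expression, with nodes labelled a + a*(b-c), (b-c) * d, a * (b-c), and (b-c). The node for (b-c) is shared by a * (b-c) and (b-c) * d.]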
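To make the saving concrete, the sketch below counts the nodes of the syntax tree for the expression above and the number of structurally distinct subexpressions, which is the number of nodes a DAG needs. The nested-tuple encoding and the helper names tree_size and dag_nodes are assumptions made for this illustration only.

```python
# Expressions as nested tuples (op, left, right); leaves are plain strings.
# This encoding is an assumption made for the example, not part of the lecture.
b_minus_c = ("-", "b", "c")
expr = ("+", ("+", "a", ("*", "a", b_minus_c)), ("*", b_minus_c, "d"))


def tree_size(e):
    """Number of nodes when every occurrence gets its own node (syntax tree)."""
    if isinstance(e, str):
        return 1
    return 1 + sum(tree_size(child) for child in e[1:])


def dag_nodes(e, seen=None):
    """Set of structurally distinct subexpressions (one DAG node each)."""
    if seen is None:
        seen = set()
    if e not in seen:
        seen.add(e)
        if not isinstance(e, str):
            for child in e[1:]:
                dag_nodes(child, seen)
    return seen


tree_count = tree_size(expr)      # 13 nodes in the syntax tree
dag_count = len(dag_nodes(expr))  # 9 nodes in the DAG
```

The four nodes saved are the second copies of a, b, c, and (b-c).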
Value Number Method for Constructing
DAGs

• Often, the nodes of a syntax tree or DAG are stored in an array of records.
• Each row of the array represents one record, and therefore one node.
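A minimal sketch of this idea follows, assuming each record is a tuple (op, left, right) and that a dictionary is used to find an existing record with the same signature. The class name ValueNumberDAG and its methods are hypothetical names chosen for the example; the index of a record in the array serves as its value number.

```python
class ValueNumberDAG:
    """Array-of-records DAG; a record's index is its value number."""

    def __init__(self):
        self.nodes = []   # the array of records, one row per node
        self.index = {}   # signature -> value number of the existing record

    def _lookup(self, signature):
        # Reuse the record if the same signature was seen before,
        # otherwise append a new row to the array.
        if signature not in self.index:
            self.index[signature] = len(self.nodes)
            self.nodes.append(signature)
        return self.index[signature]

    def leaf(self, name):
        return self._lookup(("id", name, None))

    def op(self, operator, left, right=None):
        return self._lookup((operator, left, right))


# Build a + a * (b - c) + (b - c) * d
dag = ValueNumberDAG()
a, b, c = dag.leaf("a"), dag.leaf("b"), dag.leaf("c")
first_bc = dag.op("-", b, c)
left_sum = dag.op("+", a, dag.op("*", a, first_bc))
second_bc = dag.op("-", b, c)          # same signature, same value number
root = dag.op("+", left_sum, dag.op("*", second_bc, dag.leaf("d")))
```

Because the second request for b - c returns the value number of the first, the common subexpression is detected at construction time rather than by a separate pass.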
DAG Representation
A variant of syntax tree.
Example: D = ((A+B*C) + (A*B*C))/ -C

DAG: Directed Acyclic Graph

[Figure: DAG for this assignment, with root =, leaves A, B, C, and D, and interior nodes /, +, +, *, *, and unary -.]
Postfix Notation (PN)
A mathematical notation wherein every
operator follows all of its operands.
Examples:

The PN of expression 9* (5+2) is 952+*

How about (a+b)/(c-d) ? ab+cd-/

Postfix Notation (PN) – Cont’d

Form Rules:
1. If E is a variable/constant, the PN of E is E
itself
2. If E is an expression of the form E1 op E2, the
PN of E is E1’E2’op (E1’ and E2’ are the PN of E1
and E2, respectively.)
3. If E is a parenthesized expression of form
(E1), the PN of E is the same as the PN of E1.
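The three rules translate almost directly into a recursive function. The sketch below assumes the expression has already been parsed into nested tuples (op, E1, E2), so rule 3 is handled implicitly: parentheses only shape the tree and leave no node of their own. The function name to_postfix is a hypothetical name for this example.

```python
def to_postfix(e):
    """Postfix notation of an expression tree given as (op, e1, e2) or a leaf."""
    if isinstance(e, str):
        return e                                   # rule 1: variable or constant
    op, e1, e2 = e
    return to_postfix(e1) + to_postfix(e2) + op    # rule 2: E1' E2' op


nine_times_sum = ("*", "9", ("+", "5", "2"))           # 9 * (5 + 2)
quotient = ("/", ("+", "a", "b"), ("-", "c", "d"))     # (a + b) / (c - d)

pn_first = to_postfix(nine_times_sum)   # "952+*"
pn_second = to_postfix(quotient)        # "ab+cd-/"
```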

Three Address Code
• In three-address code, there is at most
one operator on the right side of an
instruction;
• That is, no built-up arithmetic expressions
are permitted.
• source-language expression: x + y * z
t1 = y * z
t2 = x + t1
where t1 & t2 are compiler-generated temporary names.
Three-Address Statements

A popular form of intermediate code used in optimizing compilers is three-address statements.
Source statement:
x = a + b* c + d
Three address statements with temporaries t1 and t2:
t1 = b* c
t2 = a + t1
x = t2 + d

Three Address Code
The general form
x := y op z
x, y, and z are names, constants, or
compiler-generated temporaries
op stands for any operator such as +, -, …
x*5-y might be translated as
t1 := x * 5
t2 := t1 - y
Syntax-Directed Translation Into
Three-Address

Temporary
• In general, when generating three-address
statements, the compiler has to create new temporary
variables (temporaries) as needed.
• We use a function newtemp( ) that returns a new
temporary each time it is called.
• Recall Topic-2: when talking about this topic

Syntax-Directed Translation Into
Three-Address

• The syntax-directed definition for E in a production id := E has two attributes:
1. E.place - the location (variable name or offset) that holds the value corresponding to the nonterminal
2. E.code - the sequence of three-address statements representing the code for the nonterminal
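The two attributes can be sketched as the return values of a recursive translator. The code below is an illustration under stated assumptions, not the lecture's own implementation: expressions are nested tuples (op, left, right), the class name Translator and the method names expr and assign are hypothetical, and newtemp() simply numbers temporaries t1, t2, and so on.

```python
import itertools


class Translator:
    def __init__(self):
        self._counter = itertools.count(1)

    def newtemp(self):
        """Return a fresh temporary name each time it is called."""
        return f"t{next(self._counter)}"

    def expr(self, e):
        """Return (place, code) for expression e, mirroring E.place and E.code."""
        if isinstance(e, str):
            return e, []                      # a name holds its own value
        op, left, right = e
        left_place, left_code = self.expr(left)
        right_place, right_code = self.expr(right)
        place = self.newtemp()
        statement = f"{place} := {left_place} {op} {right_place}"
        return place, left_code + right_code + [statement]

    def assign(self, name, e):
        """Code for id := E: the code for E followed by a copy into id."""
        place, code = self.expr(e)
        return code + [f"{name} := {place}"]


# x = a + b * c + d
code = Translator().assign("x", ("+", ("+", "a", ("*", "b", "c")), "d"))
```

This naive scheme yields t1 := b * c, t2 := a + t1, t3 := t2 + d, x := t3. It uses one more temporary than the hand-written sequence shown earlier for the same statement, because the final result is first placed in t3 and then copied into x.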

Syntax tree vs. Three address code

Expression: (A+B*C) + (-B*A) - B

[Figure: syntax tree for the expression.]

T1 := B * C
T2 := A + T1
T3 := - B
T4 := T3 * A
T5 := T2 + T4
T6 := T5 - B

Three address code is a linearized representation of a syntax tree (or a DAG) in which explicit names (temporaries) correspond to the interior nodes of the graph.
DAG vs. Three address code

Expression: D = ((A+B*C) + (A*B*C))/ -C

[Figure: DAG for the expression, shown alongside two candidate code sequences.]

Sequence 1:
T1 := A
T2 := C
T3 := B * T2
T4 := T1 + T3
T5 := T1 * T3
T6 := T4 + T5
T7 := - T2
T8 := T6 / T7
D := T8

Sequence 2:
T1 := B * C
T2 := A + T1
T3 := A * T1
T4 := T2 + T3
T5 := - C
T6 := T4 / T5
D := T6

Question: Which IR code sequence is better?
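One way to explore the question is to generate code from the DAG mechanically. The sketch below walks the expression bottom-up and gives each distinct interior node exactly one temporary, reusing it whenever the same subexpression appears again. The nested-tuple encoding, the "neg" tag for unary minus, and the function name emit_from_dag are assumptions made for this example.

```python
def emit_from_dag(target, expr):
    """Three-address code with one temporary per distinct interior node."""
    code = []
    temp_of = {}   # subexpression -> temporary that already holds its value

    def visit(e):
        if isinstance(e, str):
            return e                  # leaves (names) need no temporary
        if e in temp_of:
            return temp_of[e]         # shared node: reuse, emit nothing
        args = [visit(child) for child in e[1:]]
        temp = f"T{len(temp_of) + 1}"
        temp_of[e] = temp
        if e[0] == "neg":
            code.append(f"{temp} := - {args[0]}")
        else:
            code.append(f"{temp} := {args[0]} {e[0]} {args[1]}")
        return temp

    code.append(f"{target} := {visit(expr)}")
    return code


b_times_c = ("*", "B", "C")
rhs = ("/", ("+", ("+", "A", b_times_c), ("*", "A", b_times_c)), ("neg", "C"))
sequence = emit_from_dag("D", rhs)
```

Under these assumptions the output has six temporaries and no copies of leaf values into temporaries, since names are used directly as operands.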


Three-address Implementation
Technique
• The description of three-address instructions
specifies the components of each type of
instruction, but it does not specify the
representation of these instructions in a data
structure.
• In a compiler, these instructions can be
implemented as objects or as records with
fields for the operator and the operands.
❑ quadruples
❑ triples
❑ indirect triples
Quadruples
• A quadruple (or “quad”) has four fields:
op, arg1, arg2 & result.
• The op field contains an internal code for
the operator.

• x = y + z is represented by placing + in op, y in arg1, z in arg2, and x in result.
Few exceptions
Example: a = b * -c + b * -c
[Figure: three-address code and the corresponding quadruples for this assignment.]
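As an illustration, the quadruples for the assignment above can be written out as records with the four fields. The sketch assumes the common textbook conventions that a unary operator (written here as "minus") leaves arg2 empty and that a copy uses "=" as its op with arg2 empty; the names Quad and to_text are hypothetical.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]   # None when the operator takes a single operand
    result: str


# a = b * -c + b * -c
quads = [
    Quad("minus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]


def to_text(q):
    """Render one quadruple as a three-address statement."""
    if q.op == "=":
        return f"{q.result} = {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} = {q.op} {q.arg1}"
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"


listing = [to_text(q) for q in quads]
```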
Triples
• A triple has only three fields: op, arg1 , & arg2
• Using triples, we refer to the result of an
operation x op y by its position, rather than
by an explicit temporary name.
• Thus, instead of the temporary t1, a triple
representation would refer to position (0).
• Parenthesized numbers represent pointers
into the triple structure itself (value numbers)
Example: a = b * -c + b * -c
[Figure: three-address code for this assignment.]
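The relationship between the two forms can be sketched by converting a quadruple list into triples: every temporary name is replaced by the position of the triple that computes it. The sketch assumes plain tuples (op, arg1, arg2, result) for quadruples, integers for positions, strings for names, and that every non-copy result is a compiler-generated temporary; quads_to_triples is a hypothetical helper name.

```python
# Quadruples for a = b * -c + b * -c, as (op, arg1, arg2, result)
quads = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def quads_to_triples(quads):
    """Drop the result field; refer to results by triple position instead."""
    position = {}   # temporary name -> index of the triple computing it
    triples = []
    for op, arg1, arg2, result in quads:
        a1 = position.get(arg1, arg1)   # int = position, str = name
        a2 = position.get(arg2, arg2)
        if op == "=":
            # A copy becomes its own triple: assign <name>, <value>
            triples.append(("=", result, a1))
        else:
            position[result] = len(triples)
            triples.append((op, a1, a2))
    return triples


triples = quads_to_triples(quads)
```

In the result, position 1 holds ("*", "b", 0), which reads as "b times the value computed at position (0)".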
Implementation of Three Address
Code
• Quadruples
• Four fields: op, arg1, arg2, result
¤ Array of struct {op, *arg1, *arg2, *result}
• x:=y op z is represented as op y, z, x
• arg1, arg2 and result are usually pointers to
symbol table entries.
• May need to use many temporary names.
• Many assembly instructions resemble quadruples, except that arg1, arg2, and result are real registers.
Implementation of Three Address Code
(Cont'd)
• Triples
• Three fields: op, arg1, and arg2. The result becomes implicit.
• arg1 and arg2 are either pointers to the symbol table or indexes/pointers into the triple structure.
Example: d = a + (b*c)
1 * b, c
2 + a, (1)
3 assign d, (2)
Problem when reordering the code?
• No explicit temporary names used.
• Needs more than one entry for ternary operations such as x:=y[i], a=b+c, x[i]=y, etc.
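The reordering problem arises because a triple is named by its position: moving a triple changes the meaning of every reference to it. Indirect triples, listed earlier among the implementation techniques, avoid this by keeping a separate list of pointers that fixes the execution order. The sketch below illustrates that idea for a = b * -c + b * -c; the encoding (integers for triple positions, strings for names) and the helper defined_before_use are assumptions made for the example.

```python
# Triples for a = b * -c + b * -c; integer arguments are triple positions.
triples = [
    ("minus", "c", None),   # (0)
    ("*", "b", 0),          # (1)
    ("minus", "c", None),   # (2)
    ("*", "b", 2),          # (3)
    ("+", 1, 3),            # (4)
    ("=", "a", 4),          # (5)
]

# Indirect triples: execution order is a separate list of positions.
original_order = [0, 1, 2, 3, 4, 5]

# An optimizer can reorder by permuting this list only. The triples,
# and therefore every (n) reference inside them, stay untouched.
reordered = [0, 2, 1, 3, 4, 5]


def defined_before_use(order, triples):
    """Check that every referenced triple is executed before it is used."""
    done = set()
    for pos in order:
        for arg in triples[pos][1:]:
            if isinstance(arg, int) and arg not in done:
                return False
        done.add(pos)
    return True


original_ok = defined_before_use(original_order, triples)
reordered_ok = defined_before_use(reordered, triples)
invalid_ok = defined_before_use([1, 0, 2, 3, 4, 5], triples)
```

With plain triples, the same reordering would require renumbering the references in every later triple.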