
Compiler Design

Chapter Six: Intermediate Code Generation


Outline
6.1. Introduction to Intermediate Code/Languages
6.2. Three-Address Codes/Statements
6.3. Implementation of Three-Address Code
 Quadruples, Triples, and Indirect Triples
6.4. Declarations in Procedures
6.5. Flow Control Statements

By: Tseganesh M.(MSc.)


6.1. Introduction to Intermediate Code/Languages
 Intermediate Code/Languages
 The terms intermediate code, intermediate language, and intermediate representation are

used interchangeably.
 Intermediate code is the interface between the front end and the back end.

 It is an abstraction that lies between the high-level source code and the machine code.

 It is platform-independent and simplifies optimization and translation to machine code.

 The front end translates a source program into an intermediate representation; the back

end then generates target code from it.

 This fourth phase of the compiler bridges the

analysis and synthesis phases of translation.
 In this phase, an intermediate representation
of the final machine language code is produced.

Intermediate Code/languages cont’d….…
 Why do we need to translate source code into intermediate code, which is then translated
into target code?
 What happens if source code is translated directly into its target machine code?
 Let us see the reasons why we need an intermediate code.
 If a compiler directly translates the source code to its target machine code without
having the option of generating intermediate code,
 then a full native compiler is required for each new machine.

 So, intermediate code eliminates the need for a new full compiler for every unique
machine by keeping the analysis portion the same for all compilers.
 The intermediate code generator takes input from its predecessor phase, in the form of an
annotated syntax tree, and converts it into a linear representation, e.g., postfix notation.
 Using intermediate code, it is easy to apply source code modifications that improve
code performance; this is done by applying code optimization techniques to the
intermediate code.
Intermediate Code/languages cont’d….…
Intermediate code representation
 Intermediate code can be represented in two ways, each with its own benefits:
1. High-level intermediate representations:
 High-level intermediate code can be represented as source code, which is very
close to the source language itself.
 To enhance the performance of the source code, we can easily apply code modifications.
 But for target machine optimization, it is less preferred.
2. Low-level intermediate representations:
 Low-level intermediate code is close to the target machine.
 It is suitable for register and memory allocation, instruction set selection, etc.
 It is well suited for machine-dependent optimizations.
 Intermediate code tends to be machine-independent code.
Intermediate Code/languages cont’d….…
 Since intermediate code tends to be machine-independent, a code generator
assumes an unlimited number of memory storage locations (registers) when generating code.
For example: a := b * - c + b * - c
 The intermediate code generator will divide this expression into sub-expressions, then
generate the corresponding code as follows:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
where the ti are temporaries used as registers in the target program.
 Intermediate code can be either language-specific (e.g., bytecode for Java) or language-
independent (three-address code).
Intermediate Code/languages cont’d….…
 There are three ways of intermediate representation:
a. Syntax tree: graphically represents the hierarchical structure of expressions
b. Postfix notation: uses a stack-based approach for a linearized representation of
a syntax tree/expression
c. Three-address code: instructions with at most three operands
a. Syntax tree: a graphical representation
 A syntax tree depicts the natural hierarchical structure of a source program.
 A DAG (Directed Acyclic Graph) gives the same information but in a more compact way
because common subexpressions are identified.
 A syntax tree and DAG for the assignment statement a := b * - c + b * - c are as follows:
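 Sketched roughly ("-" stands for uminus), the syntax tree (left) duplicates the b * - c
subtree, while in the DAG (right) both operand edges of + lead to one shared * node:

      assign                  assign
     /      \                /      \
    a        +              a        +
           /   \                   (both edges)
          *     *                     |
         / \   / \                    *
        b   -  b  -                  / \
            |     |                 b   -
            c     c                     |
                                        c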

Intermediate Code/languages cont’d….…
b. Postfix notation: a linearized representation of a syntax tree
 It is a list of the nodes of the tree in which a node appears immediately after its
children.
 The postfix notation for the above syntax tree is:
a b c uminus * b c uminus * + assign
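 To illustrate the stack-based evaluation, here is a minimal C sketch for simple arithmetic
postfix, assuming single-digit operands and binary + and * only (uminus and assign would be
handled analogously); the function name is illustrative:

#include <stdio.h>
#include <ctype.h>

/* Evaluate a postfix string: push operands, pop two per operator. */
int eval_postfix(const char *s) {
    int stack[64], top = 0;
    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            stack[top++] = *s - '0';            /* push operand      */
        } else if (*s == '+' || *s == '*') {
            int b = stack[--top];               /* pop right operand */
            int a = stack[--top];               /* pop left operand  */
            stack[top++] = (*s == '+') ? a + b : a * b;
        }                                       /* skip spaces       */
    }
    return stack[0];
}

int main(void) {
    /* "2 3 4 * +" is postfix for 2 + 3 * 4 */
    printf("%d\n", eval_postfix("2 3 4 * +"));  /* prints 14 */
    return 0;
}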
c. Three-address code
 In three-address code there is at most one operator on the right side of an
instruction.
 Example: a := a + a * (b - c) + (b - c) * d translates to
t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4
(syntax tree for the expression not reproduced)
6.2. Three-Address Statements/Code
 Three-address code is an abstract form of intermediate code which is easy to generate
and can be easily converted to machine code.
 It makes use of at most three addresses and one operator to represent an expression, and
 the value computed at each instruction is stored in a temporary variable generated by the
compiler.
 The compiler decides the order of operations given by the three-address code.
 For instance, the general form of a three-address statement is:
x := y op z where x, y and z are names, constants, or compiler-generated temporaries;
op stands for any operator, such as an arithmetic or logical operator.
 Thus a source language expression like x + y * z might be translated into the sequence
t1 := y * z
t2 := x + t1 where t1 and t2 are compiler-generated temporary names.
Three address statements/ Code cont’d….…
 Three-address code corresponding to the syntax tree and DAG given above for the expression
a := b * - c + b * - c:
(a) Code for the syntax tree:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
(b) Code for the DAG:
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
 Read in detail about:
 Types of three-address code
 Syntax-directed translation into three-address code
6.3. Implementation of Three-Address Statements
 In a compiler, three-address statements can be implemented as records with fields for the
operator and the operands.
 Such representations take the forms of quadruples, triples, and indirect triples.
i. Quadruples: represent an instruction as a 4-tuple of fields: op, arg1, arg2 and result
 The op field contains an internal code for the operator.
 The three-address statement x := y op z is represented by placing y in arg1, z in arg2
and x in result.
 The contents of fields arg1, arg2, and result are normally pointers to the symbol-table
entries for the names represented by these fields.
 Example: t1 = a + b is represented as the quadruple (ADD, a, b, t1)
ii. Triples: to avoid entering temporary names into the symbol table, a temporary value is
referred to by the position of the instruction that computes it.
 Temporary names are not used; instead, references to instructions are made.
 So an instruction is represented as a 3-tuple (without a result field): op, arg1, and arg2
Implementation of three address statements cont’d…..
 The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table
or pointers into the triple structure (for temporary values).
 Since three fields are used, this intermediate code format is known as triples.
iii. Indirect triples:
 In addition to the triples, a list of pointers to triples is kept, rather than
listing the triples themselves in order.
 For example, an array can be used to list pointers to triples in the desired order, so that
instructions can be reordered without renumbering; see the C sketch below.
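 A minimal C sketch of the three representations, assuming a small hard-coded instruction
table; the type and field names are illustrative, not fixed by the text:

#include <stdio.h>

typedef enum { OP_ADD, OP_MUL, OP_UMINUS, OP_ASSIGN } Op;

/* Quadruple: op, arg1, arg2, result (result names a temporary). */
typedef struct { Op op; const char *arg1, *arg2, *result; } Quad;

/* Triple: no result field; a temporary is referred to by the index
 * of the instruction that computes it (encoded here as "(n)"). */
typedef struct { Op op; const char *arg1, *arg2; } Triple;

int main(void) {
    /* t1 = a + b; t2 = t1 * c as quadruples */
    Quad quads[] = {
        { OP_ADD, "a",   "b", "t1" },
        { OP_MUL, "t1",  "c", "t2" },
    };
    /* The same code as triples: instruction (1) refers back to (0). */
    Triple triples[] = {
        { OP_ADD, "a",   "b" },
        { OP_MUL, "(0)", "c" },
    };
    /* Indirect triples: a separate list of pointers fixes the order,
     * so instructions can be reordered without renumbering. */
    Triple *order[] = { &triples[0], &triples[1] };
    printf("%zu quads, %zu ordered triples\n",
           sizeof quads / sizeof quads[0],
           sizeof order / sizeof order[0]);
    return 0;
}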
Read in detail about
 Translation schemes to produce three-address code for various
programming concepts, including:
6.4. Declarations in a Procedure
6.5. Flow Control Statements
 Assignment statements
 Boolean expressions
 Case statements
 Etc.


Next
Chapter Seven: Run-time Environments
 Overview of runtime environment
 Symbol table
 Hash Table
Chapter Eight: Code Generation and Optimization
 A simple code generation algorithm
 Register allocation
 DAG Representation
 Peephole Optimization
Compiler Design

Chapter Seven
Run-time Environments
Outline
7.1. Overview of runtime environment
7.2. Symbol table
7.3. Hash Table
By: Tseganesh M.(MSc.)
7.1. Overview of runtime environment
 A runtime environment in compiler design refers to the set of tools and resources necessary
to execute a program at runtime.
 It is responsible for managing the execution of a program, including memory management,
control flow, stack allocation, library functions, and access to variables.
 During execution of a program, various memory locations are allocated/deallocated as required.
 The runtime environment is responsible for managing these memory locations to avoid
conflicts or memory leaks.
 Another crucial aspect of a runtime environment is stack allocation.
 The stack is a data structure that keeps track of function calls and local variables within a program.
 The runtime environment makes sure that the correct values are pushed onto and popped from the
stack as required.
 Proper stack allocation ensures the correct execution of function calls and prevents
information loss between function invocations.
Overview of runtime environment cont’d…..
 Components of a Runtime Environment:
1. Activation Records: Stores function call information, including parameters, return
address, local variables, and temporary data.
2. Call Stack: Maintains activation records for function calls in a stack-like manner.
3. Heap: Dynamically allocated memory for objects and variables.
4. Static Area: Stores global variables and constants.
 E.g., for a function call:
int add(int x, int y) {
    return x + y;
}
int main() {
    int result = add(5, 10);
}
 The call to add creates an activation record on the call stack with parameters x=5
and y=10
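 A simplified sketch of what that activation record might hold (actual layouts are target-
and calling-convention-specific; the field names and sizes here are illustrative):

struct activation_record {
    int   params[2];        /* actual parameters: x = 5, y = 10  */
    void *return_address;   /* where execution resumes in main() */
    void *control_link;     /* pointer to the caller's record    */
    int   locals[4];        /* space for local variables         */
    int   temporaries[4];   /* compiler-generated temporaries    */
};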
7.2. Symbol Table
 The symbol table is a crucial data structure used throughout the compilation process.
 It serves as a "repository" that stores information about variables, functions, objects, and
other symbols in the source code.
 The main objective of the symbol table is to assist the compiler in performing tasks in each
phase effectively and in resolving references to symbols throughout the program.
 For instance,
 In the lexical analysis (LA) phase, the compiler identifies basic information about each symbol and stores its

relevant attributes (such as name, data type, scope, etc.) and properties in the symbol table.
 In the semantic analysis (SA) phase, the symbol table provides a mechanism to verify semantic properties (such as

variables declared before use, type compatibility, and functions properly defined and called).
 As the compilation process progresses, the symbol table is continually updated with
additional details about each symbol, like its scope, memory location, and value.
 This allows the compiler to perform tasks like type checking, ensuring consistency and
compatibility of data types used within the program.
Symbol Table cont’d…..
 The uses of a symbol table:
1. Tracking Identifiers: Keeps track of variable names, their types, and scopes.

2. Memory Management: Maps identifiers to memory locations.


3. Error Checking: Ensures variables are declared before use.
 Example of symbol table entries:
Identifier | Type | Scope  | Memory Address
x          | int  | Local  | 0x0010
result     | int  | Global | 0x0020
 Implementation: a symbol table can be implemented as a hash table, tree, or linked list for efficient lookups.
7.3. Hash Table
 Hash Table: a data structure that stores key-value pairs and enables efficient and rapid
retrieval of information using a hash function, particularly during symbol table management.
 By utilizing a hash table, a compiler can quickly search for symbol attributes, ensuring
effective compilation.
 It minimizes the time complexity of searching for symbol attributes, as it typically ensures
constant-time retrieval.
 During the compilation process, the hash function computes the hash value for an
identifier, which is then used as an index for storage and retrieval purposes.
 Hash function: converts a key (e.g., a variable name) into an index for efficient storage and retrieval.
 E.g.: For variable x, the hash function might compute:
Index = Hash("x") % Table_Size
 Collision handling (see the C sketch below):
1. Chaining: uses linked lists to store multiple keys at the same index.
2. Open addressing: finds the next available slot in the table.
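 A minimal C sketch of a chained symbol-table hash, assuming a tiny fixed-size table; the
hash function and field names are illustrative:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define TABLE_SIZE 16

/* One symbol-table entry; collisions are chained via `next`. */
struct symbol {
    char name[32];
    const char *type;
    struct symbol *next;
};

static struct symbol *table[TABLE_SIZE];

/* Illustrative hash: sum of characters modulo the table size. */
static unsigned hash(const char *key) {
    unsigned h = 0;
    while (*key) h += (unsigned char)*key++;
    return h % TABLE_SIZE;
}

static void insert(const char *name, const char *type) {
    struct symbol *s = malloc(sizeof *s);
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->type = type;
    unsigned i = hash(name);
    s->next = table[i];          /* chain onto any existing entries */
    table[i] = s;
}

static struct symbol *lookup(const char *name) {
    for (struct symbol *s = table[hash(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}

int main(void) {
    insert("x", "int");
    insert("result", "int");
    struct symbol *s = lookup("x");
    if (s) printf("%s : %s\n", s->name, s->type);   /* x : int */
    return 0;
}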
Chapter Eight
Code Generation and Optimization

Outline:
8.1. A simple code generation algorithm
8.2. Register allocation
8.3. Peephole Optimization
8.4. DAG Representation (reading assignment)
8.1. Overview of a simple code generation algorithm
 Code generation is the final step in the compilation process; it transforms source code into machine code
executable by a computer.
 In this phase, the compiler analyzes the intermediate representation of the source code and translates it into
a target language, usually assembly or machine code.
 The code generator produces the target code for three-address statements, and it uses registers to store
the operands of the three-address statements.
 Steps in code generation:
1. Intermediate code: convert the intermediate representation (e.g., TAC) to machine code.
2. Instruction selection: select appropriate assembly/machine instructions.
3. Register allocation: assign registers for operands.
4. Instruction ordering: arrange instructions for optimal execution.
 Example: the three-address statements (TAC)
T1 = a + b
T2 = T1 * c
have the following sequence of code:
 Generated assembly code:
MOV AX, a
ADD AX, b
MOV T1, AX
MOV AX, T1
MUL c
MOV T2, AX
Overview of code generation cont’d…..
 Code generation algorithm
 The algorithm takes a sequence of three-address statements as input.
 For each three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to find the location L where the result of the computation y op z should
be stored.
2. Consult the address descriptor for y to determine y', the current location of y.
 If the value of y is currently in both memory and a register, prefer the register for y'.
 If the value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L where z' is the current location of z.
 If z is in both a register and memory, prefer the register.
 Update the address descriptor of x to indicate that x is in location L.
 If x is in L, update its descriptor and remove x from all other descriptors.
4. If the current values of y and/or z have no next uses, are not live on exit from the block, and are
in registers, alter the register descriptor to indicate that, after execution of x := y op z, those
registers will no longer contain y and/or z.
Overview of code generation cont’d…..
 Example: generating code for assignment statements
 The assignment statement d := (a-b) + (a-c) + (a-c) can be translated into the following
sequence of three-address code:
t := a - b
u := a - c
v := t + u
d := v + u
 The code sequence for the example is as follows:

Statement  | Code Generated | Register descriptor | Address descriptor
           |                | Registers empty     |
t := a - b | MOV a, R0      | R0 contains t       | t in R0
           | SUB b, R0      |                     |
u := a - c | MOV a, R1      | R0 contains t       | t in R0
           | SUB c, R1      | R1 contains u       | u in R1
v := t + u | ADD R1, R0     | R0 contains v       | u in R1
           |                | R1 contains u       | v in R0
d := v + u | ADD R1, R0     | R0 contains d       | d in R0
8.2. Register allocation
 Efficient code generation plays a vital role in optimizing the performance and functionality of
the generated code.
 To achieve this, the compilation process employs various techniques such as instruction
selection, register allocation, optimization, etc.
 Register allocation
 Since the number of physical registers is limited, the compiler must allocate them wisely to
minimize the need for spilling values to slower memory storage.
 Register allocation ensures that frequently accessed variables are stored in registers, enabling
faster access and reducing memory traffic.
 To efficiently allocate registers, the compiler employs algorithms such as graph coloring and
linear scan.
 These techniques analyze the variables' liveness and usage patterns to determine the optimal
allocation strategy; a toy graph-coloring sketch follows.
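 A toy C sketch of greedy coloring over a hard-coded interference graph for the t, u, v, d
example above (the graph is assumed here, not computed from liveness; real Chaitin-style
allocators also handle spilling and coalescing). Its output matches the register
assignments in the table above:

#include <stdio.h>

enum { N = 4, K = 2 };   /* 4 variables, 2 available registers */
static const char *name[N] = { "t", "u", "v", "d" };
/* interfere[i][j] = 1 if variables i and j are live at the same time:
 * t overlaps u, and u overlaps v; d overlaps nothing. */
static const int interfere[N][N] = {
    { 0, 1, 0, 0 },
    { 1, 0, 1, 0 },
    { 0, 1, 0, 0 },
    { 0, 0, 0, 0 },
};

int main(void) {
    int color[N];
    for (int i = 0; i < N; i++) {
        int used[K] = { 0 };
        for (int j = 0; j < i; j++)            /* mark colors taken   */
            if (interfere[i][j] && color[j] >= 0)  /* by colored      */
                used[color[j]] = 1;                /* neighbors       */
        color[i] = -1;
        for (int c = 0; c < K; c++)            /* lowest free color   */
            if (!used[c]) { color[i] = c; break; }
        if (color[i] < 0)
            printf("%s: spill to memory\n", name[i]);
        else
            printf("%s -> R%d\n", name[i], color[i]);
    }
    return 0;   /* prints t -> R0, u -> R1, v -> R0, d -> R0 */
}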
8.3. Peephole Optimization
Peephole optimizations
 Optimization is an integral part of code generation that improves the performance and efficiency
of the generated code.
 There are different optimization techniques, including:
 Constant folding,
 Common subexpression elimination,
 Copy propagation,
 Dead-code elimination,
 Loop unrolling,
 Data locality improvement,
 Algorithmic optimizations, etc.
 These optimization techniques are applied during code generation to reduce execution
time, memory usage, and code size, and to improve power consumption.
 By applying these techniques, the compiler produces code that better utilizes the available
resources and achieves faster execution.
peephole Optimization cont’d…..
 Constant folding: deducing at compile time that the value of an expression is a constant,
and using the constant instead.
 For example: a = 3.14157 / 2 can be replaced by
a = 1.570785, thereby eliminating a division operation.
 Common subexpression elimination: here, we can avoid re-computing an expression if we can
reuse a previously computed value.
 For example, the code
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
can be optimized using common sub-expression elimination as
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
 The common sub-expression t4 := 4*i is eliminated, as its computation is already in t1 and the
value of i has not changed between definition and use.
 Copy propagation: means the use of one variable instead of another.
 This may not appear to be an improvement, but it gives us an opportunity to eliminate x.
 For example:
x = Pi;
A = x * r * r;
 The optimization using copy propagation can be done as follows:
A = Pi * r * r; here the variable x is eliminated.
 Dead-code elimination: a variable is live at a point in a program if its value can be used subsequently;
 otherwise, it is dead or useless at that point. Dead code consists of statements that compute
values that never get used.
 An optimization can be done by eliminating dead code; a small example follows.
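 A minimal illustration (hypothetical snippet, assuming the compiler can see that debug is
always false):

#include <stdio.h>

int main(void) {
    int debug = 0;                  /* constant: always false        */
    int x = 42;
    if (debug)                      /* condition can never be true,  */
        printf("x = %d\n", x);      /* so this call is dead code and */
                                    /* can be eliminated entirely    */
    return x;
}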


peephole Optimization cont’d…..
Types of optimizations/optimizers
 Optimizations are classified into two categories:
a. Machine-independent optimizations: improve the intermediate code without taking into consideration
any properties of the target machine.
b. Machine-dependent optimizations: improve the target code based on register allocation and
utilization of special machine-instruction sequences.
 The criteria for code improvement transformations:
 Simply stated, the best optimizations are those that produce the most benefit for the least effort.

 An optimization must preserve the meaning of programs; it must not change the output.

 It must, on average, speed up programs by a measurable amount.

 Not every optimization succeeds in improving every program; rarely, an "optimization"
may slow down a program slightly.

 It must be worth the effort: it does not make sense for a compiler writer to expend the
intellectual effort to implement a code-improving transformation unless the payoff justifies it.

 "Peephole" optimizations of this kind are simple enough and beneficial enough to be included in any
compiler.
The End!

THANK YOU!!!
