CH-6 Intermediate Code Generator
▪ Intermediate languages serve as an intermediary step between the high-level source code and the
machine code produced by the compiler.
▪ Examples of intermediate languages include:
▪ Java bytecode: Generated from Java source code and executed by the Java Virtual
Machine (JVM).
▪ Intermediate Representation (IR) in compilers: LLVM IR and GCC's GIMPLE and RTL are
examples of intermediate representations used in compiler toolchains.
▪ Common Intermediate Language (CIL): Used in the .NET framework for
languages like C# and VB.NET.
▪ Python bytecode: Generated from Python source code and executed by the Python
interpreter.
Intermediate Code Generator
▪ The intermediate code generator is the phase of compilation in which the source
program is translated into an intermediate representation (IR). This
intermediate code serves as a bridge between the high-level source code and the target
machine code or another lower-level representation.
▪ The intermediate code generator receives its input from the preceding phase, the semantic
analyzer, in the form of an annotated syntax tree.
▪ The syntax tree can then be converted into a linear representation, such as postfix
notation.
▪ The benefits of using a machine-independent intermediate form in the context of
compiler design include:
1. Retargeting Facilitation: Machine-independent intermediate code allows for easier
retargeting, enabling the creation of compilers for different machines by attaching a back
end for the new machine to an existing front end
2. Code Optimization: A machine-independent code optimizer can be applied to the
intermediate representation, enhancing the efficiency and performance of the generated
target code
3. Portability Enhancement: Because the intermediate code does not depend on any one machine,
the same front end can be reused for every target, which improves portability.
• In compiler design, intermediate code serves as a bridge between the
source code and the target machine code, enabling easier optimization and
translation. Three common forms of intermediate code are:
1. Postfix Notation.
2. Three Address Code.
3. Syntax Tree.
❖ Postfix Notation:
➢Postfix notation is a linear representation of a syntax tree where the operator follows
the operands.
➢The ordinary (infix) way of writing the sum of a and b places the operator in the middle:
a + b
➢The postfix (or reverse Polish) notation for the same expression places the operator at
the right end, as ab+.
➢In postfix notation, the operator appears after the operands, simplifying the
expression evaluation process.
➢For example, the expression "a * d - (b + c)" can be translated into postfix form as
"ad * bc + -".
➢Postfix notation needs no parentheses, because the position of each operator and the number of
operands it takes determine the order of evaluation.
➢It is commonly used in compiler design because it can be evaluated in a single left-to-right
scan with a stack.
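▪ The conversion and evaluation described above can be sketched in Python. This is only an illustrative sketch: it assumes single-token operands, the binary operators + - * / and parentheses. The names to_postfix and eval_postfix are hypothetical helpers, not part of any compiler described here.

```python
# Minimal sketch: infix -> postfix conversion and stack-based evaluation.
# Assumption: tokens are already split; only + - * / and parentheses occur.

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    """Convert a list of infix tokens to a list of postfix tokens."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence (left associativity).
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operand goes straight to the output
    while stack:
        output.append(stack.pop())
    return output

def eval_postfix(postfix, env):
    """Evaluate postfix tokens; env maps operand names to numbers."""
    ops = {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
           "*": lambda x, y: x * y, "/": lambda x, y: x / y}
    stack = []
    for tok in postfix:
        if tok in ops:
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[tok](left, right))
        else:
            stack.append(env[tok])
    return stack.pop()

postfix = to_postfix(["a", "*", "d", "-", "(", "b", "+", "c", ")"])
print(" ".join(postfix))  # a d * b c + -
print(eval_postfix(postfix, {"a": 2, "d": 5, "b": 1, "c": 3}))  # 6
```

▪ Note that the evaluator never looks at parentheses: the order of the postfix tokens alone fixes the order of evaluation.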
• Syntax tree:
➢The syntax tree represents the hierarchical structure of a source program, with
each node corresponding to an operator or operand.
➢A dag (Directed Acyclic Graph) gives the same information but in a more compact
way because common subexpressions are identified.
• Figure: a syntax tree and a dag for the assignment statement a := b * -c + b * -c.
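• The difference between the two structures can be sketched in Python. The Builder class below is a hypothetical helper, assumed only for illustration: with share=False it creates a fresh node on every request (a syntax tree), and with share=True it reuses a node whenever the same operator and children were seen before (a dag).

```python
# Minimal sketch: syntax tree versus dag for a := b * -c + b * -c.
# Assumption: a node is a tuple (op, children...) stored under an integer id.

class Builder:
    def __init__(self, share):
        self.share = share   # True -> dag, False -> plain syntax tree
        self.nodes = []      # node id -> (op, child ids or leaf name)
        self.index = {}      # (op, children...) -> node id, used for sharing

    def node(self, op, *children):
        key = (op, *children)
        if self.share and key in self.index:
            return self.index[key]   # common subexpression: reuse the node
        self.nodes.append(key)
        node_id = len(self.nodes) - 1
        self.index[key] = node_id
        return node_id

def build(b):
    """Build a := b * -c + b * -c with the given builder."""
    def term():
        return b.node("*", b.node("id", "b"),
                      b.node("uminus", b.node("id", "c")))
    return b.node("assign", b.node("id", "a"), b.node("+", term(), term()))

tree, dag = Builder(share=False), Builder(share=True)
build(tree)
build(dag)
print(len(tree.nodes))  # 11 nodes in the syntax tree
print(len(dag.nodes))   # 7 nodes in the dag
```

• The dag is smaller because the common subexpression b * -c is represented once and referenced twice.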
6.1. Three Address Code Rules
• Three address code:
➢Three-address code is a type of intermediate code used by optimizing
compilers, where a given expression is broken down into several separate
instructions that can be easily translated into assembly language
➢A statement involving no more than three references (two for operands and
one for result) is known as three address statement
➢Three-address code is a common representation of intermediate code in which each
instruction refers to at most three addresses.
➢The typical form of a three-address statement is "x = y op z", where
x, y, and z are names, constants, or compiler-generated temporaries, and op is an operator.
▪ Temporary variables such as T1, T2, and T3 are used to store intermediate
results.
▪ Advantages of Three Address Code:
➢Simplicity: Three Address Code is a simple and easy-to-understand representation
of code that can be easily parsed and manipulated.
▪ Limitations of Three Address Code:
➢Limited Benefit: Depending on the specific application, loop detection may not
provide a significant benefit in terms of code optimization or performance.
▪ Three-address code is a linearized representation of a syntax tree or a
dag in which explicit names correspond to the interior nodes of the graph.
▪ Both the syntax tree and the dag for the following statement can be represented by
three-address code sequences:
a := b * -c + b * -c
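▪ As a rough illustration, the following Python sketch walks a syntax tree for this statement and emits one three-address statement per interior node. The tuple encoding of the tree and the function name gen_tac are assumptions made for this example only.

```python
# Minimal sketch: emit three-address code from a tuple-based syntax tree.
# Assumption: a leaf is a string; an interior node is (op, child, ...).
import itertools

def gen_tac(tree):
    """Return a list of three-address statements for an assignment tree."""
    code = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):        # leaf: the name is its own place
            return node
        op, *args = node
        places = [walk(arg) for arg in args]
        temp = f"T{next(counter)}"       # one temporary per interior node
        if op == "uminus":
            code.append(f"{temp} := - {places[0]}")
        else:
            code.append(f"{temp} := {places[0]} {op} {places[1]}")
        return temp

    _, target, expr = tree               # ("assign", name, expression)
    code.append(f"{target} := {walk(expr)}")
    return code

term = ("*", "b", ("uminus", "c"))
for line in gen_tac(("assign", "a", ("+", term, term))):
    print(line)
# T1 := - c
# T2 := b * T1
# T3 := - c
# T4 := b * T3
# T5 := T2 + T4
# a := T5
```

▪ Because this sketch walks a tree, b * -c is computed twice. Code generated from the dag would compute it once and reuse the temporary.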
Types of Three-Address Statements
▪ Three-address statements are a form of intermediate representation (IR) used in
compilers to aid in the implementation of code-improving transformations
▪ There are several types of three-address statements, including:
1. Assignment statements
➢ x := y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions
➢ x := op y, where op is a unary operation. Essential unary operations include unary minus, logical
negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a
floating-point number.
3. Copy statements
➢ x := y, where the value of y is assigned to x.
4. Unconditional jump
➢ goto L, where the three-address statement with label L is the next to be executed.
5. Indexed assignments
➢ x := y[i] and x[i] := y.
▪ Three-address statements can be implemented as records in three ways:
1. Quadruples
2. Triples
3. Indirect Triples
6.2. Quadruples
▪ Quadruples
➢Quadruples are a form of 3-address code representation that consists of four fields
namely: operator, argument 1, argument 2, and result.
➢A quadruple is a record structure with four fields, which are, op, arg1, arg2 and result.
➢The op field contains an internal code for the operator.
➢The three-address statement x : = y op z is represented by placing y in arg1, z in arg2 and
x in result.
➢The contents of fields arg1, arg2 and result are normally pointers to the symbol-table entries
for the names represented by these fields. If so, temporary names must be entered into the
symbol table as they are created.
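▪ Example: A := -B * (C + D)
➢Three-address code:
T1 := -B
T2 := C + D
T3 := T1 * T2
A := T3
➢The quadruple representation is:
     op      arg1  arg2  result
(0)  uminus  B           T1
(1)  +       C     D     T2
(2)  *       T1    T2    T3
(3)  :=      T3          A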
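▪ A quadruple can be modelled as a record with the four fields op, arg1, arg2 and result. The Python sketch below is an illustration under assumptions: it stores names as plain strings rather than symbol-table pointers, uses "uminus" as the internal code for unary minus, and includes a small hypothetical execute function only to show what the quadruples mean.

```python
# Minimal sketch: quadruples for A := -B * (C + D) as four-field records.
from collections import namedtuple

Quad = namedtuple("Quad", ["op", "arg1", "arg2", "result"])

quads = [
    Quad("uminus", "B", None, "T1"),   # T1 := -B
    Quad("+", "C", "D", "T2"),         # T2 := C + D
    Quad("*", "T1", "T2", "T3"),       # T3 := T1 * T2
    Quad(":=", "T3", None, "A"),       # A := T3
]

def execute(quads, env):
    """Interpret the quadruples; env maps names to numeric values."""
    env = dict(env)
    for q in quads:
        if q.op == "uminus":
            env[q.result] = -env[q.arg1]
        elif q.op == ":=":
            env[q.result] = env[q.arg1]
        elif q.op == "+":
            env[q.result] = env[q.arg1] + env[q.arg2]
        elif q.op == "*":
            env[q.result] = env[q.arg1] * env[q.arg2]
        else:
            raise ValueError(f"unknown operator: {q.op}")
    return env

result = execute(quads, {"B": 2, "C": 3, "D": 4})
print(result["A"])  # -14
```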
Class work
• Example-2,
1. Quadruple representation for the statement a := b * -c + b * -c
2. Quadruple representation for the statement "a := b + c * d"
▪ Triples
➢In this representation, temporary variables are not used. Instead, a number in
parentheses is used to represent a pointer to a particular record of the triple structure,
while names refer to entries of the symbol table.
• Example of triples for the statement a := b + c * d:
     op   arg1  arg2
(0)  *    c     d
(1)  +    b     (0)
(2)  :=   a     (1)
▪ Indirect Triples
➢Indirect triples are a variation of triples in which statements refer to triples through
pointers.
➢This representation uses an extra array that lists pointers to the triples in the
desired order, rather than listing the triples themselves. This implementation is known as
the indirect triple representation.
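▪ The following Python sketch illustrates triples and indirect triples for the statement a := b + c * d. It is a sketch under assumptions: Ref is a hypothetical wrapper that marks an argument as a pointer to another triple, and execute is included only to show that the statement array, not the triple array, fixes the execution order.

```python
# Minimal sketch: triples plus an indirect statement list for a := b + c * d.
from collections import namedtuple

Ref = namedtuple("Ref", ["index"])   # pointer to the result of another triple

triples = [
    ("*", "c", "d"),        # (0)
    ("+", "b", Ref(0)),     # (1)
    (":=", "a", Ref(1)),    # (2)
]

# Indirect triples: a separate array lists the triples in execution order.
statements = [0, 1, 2]

def execute(triples, statements, env):
    """Run the triples in the order given by the statement list."""
    env = dict(env)
    values = {}              # triple index -> computed value

    def value(arg):
        return values[arg.index] if isinstance(arg, Ref) else env[arg]

    for i in statements:
        op, arg1, arg2 = triples[i]
        if op == ":=":
            env[arg1] = value(arg2)
        elif op == "+":
            values[i] = value(arg1) + value(arg2)
        elif op == "*":
            values[i] = value(arg1) * value(arg2)
        else:
            raise ValueError(f"unknown operator: {op}")
    return env

print(execute(triples, statements, {"b": 1, "c": 2, "d": 3})["a"])  # 7
```

▪ With indirect triples, an optimizer can reorder statements by rearranging the statement array, without renumbering the references stored inside the triples.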
6.3. Declarations
▪ A declaration involves the allocation of space in memory and the entry of the type and name in the
symbol table.
▪ A program may be coded and designed keeping the target machine structure in mind, but it
may not always be possible to accurately convert a source code to its target language.
▪ Memory allocation is done in a consecutive manner and names are allocated to memory in
the sequence they are declared in the program.
▪ We use a variable offset and set it to zero {offset = 0}, which denotes the base
address.
▪ The first name is allocated memory starting from memory
location 0 {offset = 0}; each name declared later is allocated
memory immediately after the previous one.
▪ Example: We take the example of C programming language where an integer variable is
assigned 2 bytes of memory and a float variable is assigned 4 bytes of memory.
int a;
float b;
Allocation process:
{offset = 0}
int a;
id.type = int
id.width = 2
offset = offset + id.width
{offset = 2}
float b;
id.type = float
id.width = 4
offset = offset + id.width
{offset = 6}
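▪ The allocation process above can be sketched in Python. The widths (2 bytes for int, 4 bytes for float) are the ones assumed in this example, and the function name layout is a hypothetical helper.

```python
# Minimal sketch: assign relative addresses to names in declaration order.
# Assumption: widths follow the example (int = 2 bytes, float = 4 bytes).
WIDTH = {"int": 2, "float": 4}

def layout(declarations):
    """Return (name, type, offset) records and the total space used."""
    offset = 0
    table = []
    for type_name, name in declarations:
        table.append((name, type_name, offset))   # name starts at offset
        offset += WIDTH[type_name]                # offset = offset + id.width
    return table, offset

table, total = layout([("int", "a"), ("float", "b")])
print(table)  # [('a', 'int', 0), ('b', 'float', 2)]
print(total)  # 6
```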
▪ To enter the details in a symbol table, a procedure enter can be used with the following
structure: enter(name, type, offset)
▪ By using this enter procedure, the symbol table will be populated with the necessary
information about each variable, including its name, data type, and relative address in the
data area. This information is crucial for the compiler to generate correct code and perform
various optimizations during the compilation process.
▪ Here's an example of how the enter procedure can be implemented:
▪ The enter procedure creates a new dictionary with the given name, type, and offset and
appends it to the symbol_table list.
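▪ A minimal Python sketch consistent with that description is shown below. The parameter is named type_ here only to avoid shadowing Python's built-in type, and the key names in the dictionary are assumptions.

```python
# Minimal sketch of enter(name, type, offset) as described above.
symbol_table = []

def enter(name, type_, offset):
    """Record a declared name with its type and relative address."""
    symbol_table.append({"name": name, "type": type_, "offset": offset})

enter("a", "int", 0)
enter("b", "float", 2)
print(symbol_table)
# [{'name': 'a', 'type': 'int', 'offset': 0}, {'name': 'b', 'type': 'float', 'offset': 2}]
```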
6.4. Declarations in Procedures
▪ The syntax of languages such as C, Pascal, and Fortran allows all the declarations in a
single procedure to be processed as a group. This means that declarations for variables
can be grouped together and processed at once.
▪ During this process, a global variable, say offset, can keep track of the next available
relative address.
• By processing all the declarations in a single procedure as a group, the compiler can efficiently
allocate memory and keep track of the relative addresses of variables in the procedure. This
can help to avoid errors and improve the performance of the resulting code.
6.5. Flow Control Statements
• We now consider the translation of boolean expressions into three-address code in
the context of if-then, if-then-else, and while-do statements such as those generated
by the following grammar:
S → if E then S1
| if E then S1 else S2
| while E do S1
• In each of these productions, E is the Boolean expression to be translated. In the translation,
we assume that a three-address statement can be symbolically labeled, and that the function
newlabel returns a new symbolic label each time it is called.
• E.true is the label to which control flows if E is true, and E.false is the label to which
control flows if E is false.
• The semantic rules for translating a flow-of-control statement S allow control to flow from
the translation S.code to the three-address instruction immediately following S.code.
• S.next is a label that is attached to the first three-address instruction to be executed after the
code for S.
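• The roles of E.true, E.false, S.next and newlabel can be sketched in Python for the three productions above. This is only a sketch under assumptions: statements are encoded as tuples, a boolean expression is a single comparison (x, relop, y), and translate, gen_cond and gen_stmt are hypothetical names.

```python
# Minimal sketch: three-address code for if-then, if-then-else and while-do.
import itertools

def translate(stmt):
    counter = itertools.count(1)

    def newlabel():
        return f"L{next(counter)}"

    def gen_cond(cond, true_label, false_label):
        # Jumping code for E: control goes to E.true or E.false.
        x, relop, y = cond
        return [f"if {x} {relop} {y} goto {true_label}",
                f"goto {false_label}"]

    def gen_stmt(s, s_next):
        kind = s[0]
        if kind == "assign":
            return [s[1]]
        if kind == "if":                      # S -> if E then S1
            _, cond, then = s
            e_true = newlabel()               # E.false = S.next
            return (gen_cond(cond, e_true, s_next)
                    + [f"{e_true}:"] + gen_stmt(then, s_next))
        if kind == "ifelse":                  # S -> if E then S1 else S2
            _, cond, then, other = s
            e_true, e_false = newlabel(), newlabel()
            return (gen_cond(cond, e_true, e_false)
                    + [f"{e_true}:"] + gen_stmt(then, s_next)
                    + [f"goto {s_next}"]
                    + [f"{e_false}:"] + gen_stmt(other, s_next))
        if kind == "while":                   # S -> while E do S1
            _, cond, body = s
            begin, e_true = newlabel(), newlabel()
            return ([f"{begin}:"] + gen_cond(cond, e_true, s_next)
                    + [f"{e_true}:"] + gen_stmt(body, begin)
                    + [f"goto {begin}"])
        raise ValueError(f"unknown statement kind: {kind}")

    s_next = newlabel()                       # label following the statement
    return gen_stmt(stmt, s_next) + [f"{s_next}:"]

program = ("while", ("a", "<", "b"),
           ("if", ("c", "<", "d"), ("assign", "x := y + z")))
for line in translate(program):
    print(line)
# L2:
# if a < b goto L3
# goto L1
# L3:
# if c < d goto L4
# goto L2
# L4:
# x := y + z
# goto L2
# L1:
```

• In the while loop, S1.next is the label at the start of the loop, so the inner if statement jumps back to the loop test when its condition is false.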
6.6. Back Patching
• The easiest way to implement the syntax-directed definitions for boolean expressions is to
use two passes.
• First, construct a syntax tree for the input, and then walk the tree in depth-first order,
computing the translations. The main problem with generating code for boolean expressions
and flow-of-control statements in a single pass is that during one single pass we may not
know the labels that control must go to at the time the jump statements are generated. Hence,
a series of branching statements with the targets of the jumps left unspecified is generated.
Each statement will be put on a list of goto statements whose labels will be filled in when the
proper label can be determined. We call this subsequent filling in of labels backpatching.
• The syntax-directed definitions for boolean expressions can be implemented using two passes
to ensure proper labeling and generation of code for flow-of-control statements.
• During the first pass, a syntax tree is constructed for the input. This tree represents the
structure of the boolean expression and allows for easy traversal and manipulation during the
second pass.
• In the second pass, the tree is traversed in depth-first order, and the translations for the
boolean expressions are computed. A single pass cannot do this directly, because the
targets of the jumps may not be known at the time the jump statements are generated.
• By using two passes and backpatching, the syntax-directed definitions for boolean
expressions can be implemented efficiently and accurately, ensuring that the proper
labels are assigned to each jump statement and that the flow-of-control is correct.
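• To manipulate lists of jump statements, three functions are used:
1. makelist(i) creates a new list containing only i, an index into the array of quadruples, and returns a
pointer to the list it has made.
2. merge(p1,p2) concatenates the lists pointed to by p1 and p2, and returns a pointer to the
concatenated list.
3. backpatch(p,i) inserts i as the target label for each of the statements on the list pointed to by p.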
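• A small Python sketch of backpatching for the statement "if a < b or c < d then x := 1" is given below. It is an illustration under assumptions: lists are plain Python lists instead of pointers, makelist(i) builds a one-element list as the usual companion to merge and backpatch, and emit, nextquad and relop are hypothetical helpers.

```python
# Minimal sketch: jumps are emitted with empty targets and filled in later.
quads = []   # each entry is [text, target]; target None means "not filled"

def nextquad():
    return len(quads)            # index of the next statement to be emitted

def emit(text, target=None):
    quads.append([text, target])

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(p, i):
    for index in p:
        quads[index][1] = i      # insert i as the jump target

def relop(x, op, y):
    """Emit jumping code for x op y and return (truelist, falselist)."""
    truelist = makelist(nextquad())
    emit(f"if {x} {op} {y} goto")
    falselist = makelist(nextquad())
    emit("goto")
    return truelist, falselist

# E -> E1 or E2: if E1 is false, control must reach the start of E2.
e1_true, e1_false = relop("a", "<", "b")
e2_start = nextquad()
e2_true, e2_false = relop("c", "<", "d")
backpatch(e1_false, e2_start)
e_true = merge(e1_true, e2_true)
e_false = e2_false

# if E then S1: true jumps go to S1, false jumps go past it.
backpatch(e_true, nextquad())
emit("x := 1")
backpatch(e_false, nextquad())

for i, (text, target) in enumerate(quads):
    print(i, text if target is None else f"{text} {target}")
# 0 if a < b goto 4
# 1 goto 2
# 2 if c < d goto 4
# 3 goto 5
# 4 x := 1
```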
6.7. Procedure Calls
• The procedure is such an important and frequently used programming construct that it is
imperative for a compiler to generate good code for procedure calls and returns.
• The run-time routines that handle procedure argument passing, calls and returns are part
of the run-time support package.
• Consider the following grammar for a simple procedure call statement:
1. S → call id ( Elist )
2. Elist → Elist , E
3. Elist → E
Calling Sequences:
➢The translation for a call includes a calling sequence, a sequence of actions taken on entry to and
exit from each procedure. The following actions take place in a calling sequence:
➢ When a procedure call occurs, space must be allocated for the activation record of the called procedure.
➢ The arguments of the called procedure must be evaluated and made available to the called procedure in a
known place.
➢ Environment pointers must be established to enable the called procedure to access data in enclosing blocks.
➢ The state of the calling procedure must be saved so it can resume execution after the call.
➢ Also saved in a known place is the return address, the location to which the called routine must transfer after it
is finished.
➢ Finally a jump to the beginning of the code for the called procedure must be generated.
• For example, consider the following syntax-directed translation:
1. S → call id ( Elist )
{ for each item p on queue do emit('param' p);
emit('call' id.place) }
2. Elist → Elist , E
{ append E.place to the end of queue }
3. Elist → E
{ initialize queue to contain only E.place }
• Here, the code for S is the code for Elist, which evaluates the arguments, followed by a
param p statement for each argument, followed by a call statement.
• In rule 3, queue is emptied and then gets a single pointer to the symbol table location for the
name that denotes the value of E.
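• The three semantic rules can be sketched in Python as follows. This is a sketch under assumptions: the argument expressions are taken to be already evaluated into places (names or temporaries), and translate_call is a hypothetical helper that only mirrors the queue handling of the rules.

```python
# Minimal sketch: param and call statements for S -> call id ( Elist ).
def translate_call(proc, arg_places):
    """Return the three-address statements for a call to proc."""
    queue = []
    for i, place in enumerate(arg_places):
        if i == 0:
            queue = [place]        # rule 3: queue holds only E.place
        else:
            queue.append(place)    # rule 2: append E.place to the queue
    code = [f"param {p}" for p in queue]   # rule 1: one param per argument
    code.append(f"call {proc}")            # rule 1: then the call itself
    return code

for line in translate_call("f", ["T1", "b", "T2"]):
    print(line)
# param T1
# param b
# param T2
# call f
```

• Collecting the places in a queue keeps the param statements together, immediately before the call, and in the same order as the arguments.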