CD - CH5 - Intermediate Code Generation

The document discusses intermediate code generation in compiler design, highlighting its importance for machine independence and optimization. It covers various types of intermediate code, including three-address code, and explains the process of syntax-directed translation to generate this code. Additionally, it addresses the handling of declarations and assignment statements within the context of symbol tables and memory allocation.

CSE 4310 – Compiler Design

CH5 – Intermediate Code Generation


Outline
• Introduction
• Why Intermediate Code?
• Different Types of Intermediate Code/Languages
• Three Address Codes
• Syntax Directed Translations into Three Address Code
• Symbol Table
Introduction
• In a compiler,
• the front end translates a source program into an intermediate representation, and
• the back end generates the target code from this intermediate representation.
• Using a machine-independent intermediate code (IC) has two benefits:
• retargeting the compiler to another machine is easier, and
• optimizations can be performed on the machine-independent code.

• In this chapter,
• we assume that error checking is done in a separate pass,
• even though in practice IC generation and type checking can be done at the
same time.
Why Intermediate Code?
• While generating machine code directly from source code is possible,
it entails two problems:
• With m languages and n target machines, we need to write m front ends,
m × n optimizers, and m × n code generators.
• The code optimizer, which is one of the largest and most difficult-to-write
components of a compiler, cannot be reused.
• By converting source code to an intermediate code, a machine-independent
code optimizer may be written.
• This means just m front ends, n code generators and 1 optimizer.
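As a quick check of these counts, here is a minimal sketch. The function names are illustrative and do not come from the slides.

```python
def components_direct(m: int, n: int) -> int:
    """m front ends + m*n optimizers + m*n code generators (no shared IC)."""
    return m + m * n + m * n


def components_with_ic(m: int, n: int) -> int:
    """m front ends + 1 shared optimizer + n code generators."""
    return m + 1 + n


print(components_direct(3, 4))   # 3 + 12 + 12 = 27
print(components_with_ic(3, 4))  # 3 + 1 + 4 = 8
```

With 3 languages and 4 target machines, the shared intermediate code reduces the number of components to write from 27 to 8.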
Different Types of Intermediate Code
• Intermediate code must be easy to produce and easy to translate to
machine code.
• The type of intermediate code deployed is based on the application.
• Quadruples, triples, indirect triples, abstract syntax trees are the classical
forms used for machine-independent optimizations and machine code
generation.
• Static Single Assignment form (SSA) is a recent form and enables more
effective optimizations.
• Conditional constant propagation and global value numbering are more effective on SSA.
• Program Dependence Graph (PDG) is useful in automatic parallelization,
instruction scheduling, and software pipelining.
Intermediate Languages
Syntax tree
• While parsing the input, a syntax tree can be constructed for use by the later phases of the compiler.
• A syntax tree (abstract tree) is a condensed form of parse tree useful for
representing language constructs.
• For example, for the string a + b, the parse tree in (a) below will be represented by the
syntax tree shown in (b);
• the keywords (syntactic sugar) that existed in the parse tree will not exist in the syntax tree.
Intermediate Languages
Postfix notation
• The postfix notation is practical for an intermediate representation as
the operands are found just before the operator.
• In fact, the postfix notation is a linearized representation of a syntax
tree.
• Example: 1 + 2 ∗ 3 will be represented in the postfix notation as 1 2 3 ∗+
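The link between a syntax tree and postfix notation can be sketched as a post-order traversal. The `Leaf` and `BinOp` classes below are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: str


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, BinOp]


def to_postfix(node: Node) -> list:
    """Post-order walk: left operand, right operand, then the operator."""
    if isinstance(node, Leaf):
        return [node.value]
    return to_postfix(node.left) + to_postfix(node.right) + [node.op]


# Syntax tree for 1 + 2 * 3: '*' binds tighter, so it sits below '+'.
tree = BinOp("+", Leaf("1"), BinOp("*", Leaf("2"), Leaf("3")))
print(" ".join(to_postfix(tree)))  # 1 2 3 * +
```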
Intermediate Languages
Three address code
• Three-address code is a sequence of statements of the form x := y op z, where:
• x, y and z are names, constants or compiler-generated temporaries
• op is an operator, such as an integer or floating-point arithmetic operator or a logical
operator on Boolean data.
• Notes:
• No built-up arithmetic expressions are permitted.
• Only one operator is allowed on the right side of an assignment, i.e. x + y + z is
not possible.
• Like postfix notation, three-address code is a linearized representation of
a syntax tree.
• It is called three-address code because such an instruction usually contains
three addresses (the two operands and the result).
Intermediate Languages
• Three-address code is a generic form and can be implemented as
quadruples, triples, indirect triples, tree or DAG.
Example:
• The three-address code for a + b ∗ c − d/(b ∗ c) is:

1. t1 = b ∗ c
2. t2 = a + t1
3. t3 = b ∗ c
4. t4 = d/t3
5. t5 = t2 − t4
Three Address Code
• Quadruples:
• Each instruction in the quadruple representation is divided into four fields: operator,
arg1, arg2, and result.
• Triples:
• Each instruction in the triple representation has three fields: op, arg1, and arg2.
• Triples resemble DAGs and syntax trees.
• Triples make code hard to move during optimization: results are referred to by
position, so changing the order or position of an expression invalidates those references.
• Indirect Triples:
• This representation is an enhancement over triples representation.
• It uses pointers instead of position to store results.
• This enables the optimizers to freely re-position the sub-expression to produce an
optimized code.
Three Address Code
• Implementations of Three Address Code for the above example
a + b ∗ c − d/(b ∗ c) can be:
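The three representations can be sketched as plain Python data. This is only an illustration, assuming the five-instruction code t1 = b ∗ c, t2 = a + t1, t3 = b ∗ c, t4 = d/t3, t5 = t2 − t4; the helper `to_triples` and the variable names are hypothetical.

```python
# Quadruples: (operator, arg1, arg2, result); results are named temporaries.
quadruples = [
    ("*", "b", "c", "t1"),
    ("+", "a", "t1", "t2"),
    ("*", "b", "c", "t3"),
    ("/", "d", "t3", "t4"),
    ("-", "t2", "t4", "t5"),
]


def to_triples(quads):
    """Drop the result field; refer to a result by the index of its triple."""
    position = {}  # temporary name -> index of the triple that computes it
    triples = []
    for op, arg1, arg2, result in quads:
        triples.append((op, position.get(arg1, arg1), position.get(arg2, arg2)))
        position[result] = len(triples) - 1
    return triples


triples = to_triples(quadruples)

# Indirect triples: a separate list gives the execution order by pointing
# at triples, so an optimizer can reorder this list without touching them.
instruction_order = list(range(len(triples)))

for index in instruction_order:
    print(index, triples[index])
```

In the triples, an integer argument is a reference to an earlier triple, while a string argument is a name from the source program.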
Instructions in Three Address Code
• As with an assembler statement, a three-address code statement can
have:
• symbolic labels, as well as statements for control flow.
• The following are common three-address code statements:

Statement                           Format                 Comments
1. Assignment (binary operation)    x := y op z            Arithmetic and logical operators used
2. Assignment (unary operation)     x := op y              Unary -, not, conversion operators used
3. Copy statement                   x := y
4. Unconditional jump               goto L
5. Conditional jump                 if x relop y goto L
Instructions in Three Address Code

Statement                           Format                 Comments
6. Function call:
   • Parameter specification        param x1               The parameters are specified using param
   • Calling the function           call p, n              The procedure p is called by indicating the number n of parameters
7. Indexed assignments              x := y[i]              x will be assigned the value at the address y + i
                                    x[i] := y              The value at the address x + i will be assigned y
8. Address & pointer assignments    x := &y                x is assigned the address of y
                                    x := ∗y                x is assigned the element at the address y
                                    ∗x := y                The value at the address x is assigned y
Instructions in Three Address Code
• The choice of allowable operators is an important issue in the design
of an intermediate form.
• It should be rich enough
• to implement the operations of the source language and
• yet it should not be too complicated to be translated in the target language.
Instructions in Three Address Code
• Example Intermediate Code
Syntax Directed Translation into Three Address Code

• Syntax directed translation can be used to generate the three-address code.


• Generally,
• either the three-address code is generated as an attribute of the attributed parse
tree, or
• the semantic actions have side effects that write the three-address code statements
in a file.
• When the three-address code is generated, it is often necessary to use
temporary variables and temporary names.
• To this end the following functions are given:
• newtemp() - each time this function is called, it returns a distinct name that can be
used for a temporary variable.
• newlabel() - each time this function is called, it returns a distinct name that can be
used as a label.
Syntax Directed Translation into Three Address Code

• In addition, for convenience, we use the notation gen to create a
three-address code statement from a number of strings.
• gen will produce a three-address code statement after concatenating all the
parameters.
• For example, if id1.lexeme = x, id2.lexeme = y and id3.lexeme = z:
• gen(id1.lexeme, ':=', id2.lexeme, '+', id3.lexeme) will produce the
three-address code: x := y + z

• Note: variables and attribute values are evaluated by gen before
being concatenated with the other parameters.
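A minimal sketch of these three helpers follows. The naming scheme (t1, t2, … for temporaries and L1, L2, … for labels) and the use of a space as separator in gen are assumptions; the slides only require that the names be distinct.

```python
import itertools

_temp_ids = itertools.count(1)
_label_ids = itertools.count(1)


def newtemp() -> str:
    """Return a fresh temporary name on every call (assumed scheme: t1, t2, ...)."""
    return f"t{next(_temp_ids)}"


def newlabel() -> str:
    """Return a fresh label name on every call (assumed scheme: L1, L2, ...)."""
    return f"L{next(_label_ids)}"


def gen(*parts) -> str:
    """Join the parts into one statement; Python has already evaluated
    each argument (variable or attribute) before gen is entered."""
    return " ".join(str(part) for part in parts)


print(gen("x", ":=", "y", "+", "z"))     # x := y + z
print(newtemp(), newtemp(), newlabel())  # t1 t2 L1
```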
Example 1: Generation of the three address code for an assignment statement and
an expression
Syntax Rule      Semantic Action
S → id := E      S.code := E.code || gen(id.lexeme, ':=', E.place)

E → E1 + E2      E.place := newtemp();
                 E.code := E1.code || E2.code || gen(E.place, ':=', E1.place, '+', E2.place)
E → E1 * E2      E.place := newtemp();
                 E.code := E1.code || E2.code || gen(E.place, ':=', E1.place, '∗', E2.place)
E → - E1         E.place := newtemp();
                 E.code := E1.code || gen(E.place, ':= uminus', E1.place)
E → (E1)         E.place := E1.place;
                 E.code := E1.code
E → id           E.place := id.lexeme;
                 E.code := '' /∗ empty code ∗/

• E.place is the name that will hold the value of E, and
• E.code is the sequence of three-address statements evaluating E.
Example 1: Generation of the three address code for an assignment
statement and an expression
• The three-address code produced for the input a := x + y ∗ z will be:
t1 := y ∗ z
t2 := x + t1
a := t2
• This is found by concatenating, at the semantic action of each rule, the code
that has been previously calculated to the code of this particular rule.
• For example, E ® E1 + E2 should add
• the variable where the result of E1 is and
• the variable where the result of E2 is and
• put the result in a temporary variable.
• We can determine the names of the variables that contain the results of E1 and
E2 by using synthesized attributes named place.
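The scheme of Example 1 can be sketched as a recursive function that returns the two synthesized attributes (place, code) for each node. The tuple encoding of the tree and the function names are assumptions made for this illustration; the parser that would build the tree is omitted.

```python
import itertools

_temp_ids = itertools.count(1)


def newtemp():
    return f"t{next(_temp_ids)}"


def translate_expr(node):
    """Return (place, code) for an expression tree.

    A node is an identifier string, or a tuple:
    ("+", e1, e2), ("*", e1, e2) or ("uminus", e1).
    """
    if isinstance(node, str):                  # E -> id: no code needed
        return node, []
    if node[0] == "uminus":                    # E -> - E1
        place1, code1 = translate_expr(node[1])
        place = newtemp()
        return place, code1 + [f"{place} := uminus {place1}"]
    op, left, right = node                     # E -> E1 op E2
    place1, code1 = translate_expr(left)
    place2, code2 = translate_expr(right)
    place = newtemp()
    return place, code1 + code2 + [f"{place} := {place1} {op} {place2}"]


def translate_assign(target, expr):            # S -> id := E
    place, code = translate_expr(expr)
    return code + [f"{target} := {place}"]


for statement in translate_assign("a", ("+", "x", ("*", "y", "z"))):
    print(statement)
```

Each call concatenates the code of the sub-expressions before its own statement, which is exactly the E1.code || E2.code || gen(...) pattern of the semantic actions.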
Example 2: Generation of the three address code for a while statement.

Syntax Rule             Semantic Action
S → while E do S1       S.begin := newlabel();
                        S.after := newlabel();
                        S.code := gen(S.begin, ':') || E.code || gen('if', E.place, '= 0 goto', S.after)
                                  || S1.code || gen('goto', S.begin) || gen(S.after, ':');
• The above semantic action will create three-address code of the following
form:
L1:
E.code
if E.place = 0 goto L2
S1.code
goto L1
L2:
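The layout of the generated loop can be sketched as follows. The function `translate_while` is a hypothetical helper that receives the already generated code of the condition and of the body; the statement `t1 := i < n` assumes that a relational operator yields 0 for false.

```python
import itertools

_label_ids = itertools.count(1)


def newlabel():
    return f"L{next(_label_ids)}"


def translate_while(cond_code, cond_place, body_code):
    """Assemble the code for: while E do S1."""
    begin, after = newlabel(), newlabel()
    return (
        [f"{begin}:"]
        + cond_code                                   # E.code
        + [f"if {cond_place} = 0 goto {after}"]       # leave loop when E is false
        + body_code                                   # S1.code
        + [f"goto {begin}", f"{after}:"]              # jump back, then exit label
    )


code = translate_while(["t1 := i < n"], "t1", ["i := i + 1"])
print("\n".join(code))
```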
Syntax Directed Translation into Three Address Code
• In a similar way, syntax-directed translation can generate three-address code for
the following constructs:
• Declarations
• Assignment statements
• Array references
Declarations

● The declaration is used by the compiler as a source of type
information that it will store in the symbol table.
● While processing the declaration,
– the compiler reserves memory area for the variables and
– stores the relative address of each variable in the symbol table.
● The relative address is an offset within the static data
area.
● To process declarations, we use
– a number of variables, attributes and procedures.
Declarations
● The compiler maintains a global offset variable that
indicates the first address not yet allocated.
– Initially, offset is assigned 0.
– Each time an address is allocated to a variable, the offset is
incremented by the width of the data object denoted by the
name.
– The procedure enter(name, type, address) creates a symbol table
entry for name, gives it the type type and the relative address
address.
– The synthesized attributes type and width of non-terminal T
indicate the type and the number of memory units taken
by objects of that type.
Declarations
● Example: We take the example of C programming language where an integer variable is assigned
2 bytes of memory and a float variable is assigned 4 bytes of memory.
int a;
float b;
Allocation process: {offset = 0}
int a;
id.type = int
id.width = 2
offset = offset + id.width {offset = 2}
float b;
id.type = float
id.width = 4
offset = offset + id.width {offset = 6}
Declarations
Example: Semantic actions for the declaration part. We consider that an Integer and a pointer occupy four bytes
and a real number occupies 8 bytes of memory.
Syntax Rule                  Semantic Action
P → { offset := 0 } D S
D → D ; D
D → id : T                   { enter(id.name, T.type, offset); offset := offset + T.width; }
T → integer                  { T.type := integer; T.width := 4; }
T → real                     { T.type := real; T.width := 8; }
T → array [num] of T1        { T.type := array(num.val, T1.type); T.width := num.val ∗ T1.width; }
T → ^T1                      { T.type := pointer(T1.type); T.width := 4; }
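The effect of these actions on the symbol table can be sketched as follows. Types are modelled as (type, width) pairs and the symbol table as a dictionary; both choices, and the helper names other than enter, are assumptions made for this illustration.

```python
symbol_table = {}   # name -> (type, relative address)
offset = 0          # first address not yet allocated


def enter(name, type_, address):
    symbol_table[name] = (type_, address)


# Each helper returns the pair (T.type, T.width) for one production of T.
def integer():
    return ("integer", 4)


def real():
    return ("real", 8)


def array(num, element):
    return (f"array({num}, {element[0]})", num * element[1])


def pointer(target):
    return (f"pointer({target[0]})", 4)


def declare(name, t):
    """D -> id : T"""
    global offset
    type_, width = t
    enter(name, type_, offset)
    offset += width


declare("i", integer())
declare("x", real())
declare("v", array(10, integer()))
declare("p", pointer(real()))

for name, (type_, address) in symbol_table.items():
    print(name, type_, address)
print("offset =", offset)
```

The four declarations receive the relative addresses 0, 4, 12 and 52, and offset ends at 56.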
Declarations

Note:
● In languages where nested procedures are possible, we must
have several symbol tables, one for each procedure.
Assignment Statements
● Using the symbol table, we will see how it is possible to generate
the three-address code statements corresponding to
assignments.
– Variables are represented by their symbol table entries.
● The function lookup(lexeme) checks
– whether there is an entry for this occurrence of the name in the symbol table;
– if so, a pointer to the entry is returned; otherwise nil is returned.
● The newtemp() function will
– generate temporary variables and reserve a memory area for them by
modifying the offset, and
– record the reserved addresses in the symbol table.
Assignment Statements
● Example: generation of the three-address code for the
assignment statement and simple expressions
Syntax Rule      Semantic Action
S → id := E      p := lookup(id.lexeme);
                 S.code := E.code || if p <> nil then gen(p.lexeme, ':=', E.place) else error;
E → E1 + E2      E.place := newtemp();
                 E.code := E1.code || E2.code || gen(E.place, ':=', E1.place, '+', E2.place)
E → E1 * E2      E.place := newtemp();
                 E.code := E1.code || E2.code || gen(E.place, ':=', E1.place, '∗', E2.place)
E → - E1         E.place := newtemp();
                 E.code := E1.code || gen(E.place, ':= uminus', E1.place)
E → (E1)         E.place := E1.place;
                 E.code := E1.code;
E → id           p := lookup(id.lexeme);
                 if p <> nil then E.place := p.lexeme else error;
                 E.code := '' /∗ empty code ∗/
Addressing Array Elements
● The elements of an array are stored in a block of consecutive
locations.
– If the width of each array element is w, the relative address of the
array is base and the lower bound of the index is low, then the i'th
element of the array is found at the address:
base + (i – low) * w
● For example, for an array declared as

A : array [5..10] of Integer;

stored at the address 100 with w = 4,
the address of A[7] is 100 + (7 – 5) * 4 = 108
Addressing Array Elements
● When the index is constant as above, it is possible to evaluate the
address of A[I] at compile time.
● However, most of the time the value of the index is not
determined at compile time and
– the compiler can only produce the three-address code statements that
will calculate the address at execution time.
● However, even in this case, some of the calculation can be done
at compile time.
– Indeed,
base + (i – low) ∗ w = i ∗ w + (base – low ∗ w)
– base – low * w can be calculated at compile time, which saves time
during execution.
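The split between compile-time and run-time work can be sketched as follows. The function names are illustrative; the numbers are those of the array A[5..10] stored at address 100 with elements of width 4.

```python
def element_address(base, low, w, i):
    """Direct formula: base + (i - low) * w."""
    return base + (i - low) * w


def constant_part(base, low, w):
    """base - low * w: depends only on the declaration, so it can be
    evaluated once at compile time."""
    return base - low * w


def element_address_at_runtime(c, w, i):
    """Only i * w + c is left to compute at execution time."""
    return i * w + c


c = constant_part(base=100, low=5, w=4)
print(c)                                    # 80
print(element_address(100, 5, 4, 7))        # 108
print(element_address_at_runtime(c, 4, 7))  # 108
```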
Addressing Array Elements

Note:
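● The function width(arrayname) returns the width of the elements of the
array called arrayname by looking in the symbol table.
● The function base(arrayname) returns the base address of the array
called arrayname by looking in the symbol table.
● The function low(arrayname), used in the rules that follow, likewise returns the
lower bound of the index.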
Addressing Array Elements
Syntax Rule      Semantic Rule
S → L := E       if L.offset = nil then /* L is a simple id */
                     S.code := L.code || E.code || gen(L.place, ':=', E.place);
                 else
                     S.code := L.code || E.code || gen(L.place, '[', L.offset, '] :=', E.place);
E → E1 + E2      E.place := newtemp();
                 E.code := E1.code || E2.code ||
                           gen(E.place, ':=', E1.place, '+', E2.place)
E → E1 * E2      E.place := newtemp();
                 E.code := E1.code || E2.code ||
                           gen(E.place, ':=', E1.place, '*', E2.place)
E → - E1         E.place := newtemp();
                 E.code := E1.code || gen(E.place, ':= uminus', E1.place)
E → (E1)         E.place := E1.place;
                 E.code := E1.code
Addressing Array Elements
Syntax Rule      Semantic Rule
E → L            if L.offset = nil then /* L is simple */
                 begin
                     E.place := L.place;
                     E.code := L.code;
                 end
                 else
                 begin
                     E.place := newtemp();
                     E.code := L.code || gen(E.place, ':=', L.place, '[', L.offset, ']')
                 end
L → id [E]       L.place := newtemp();
                 L.offset := newtemp();
                 L.code := E.code
                     || gen(L.place, ':=', base(id.lexeme) – width(id.lexeme) * low(id.lexeme))
                     || gen(L.offset, ':=', E.place, '*', width(id.lexeme));
L → id           p := lookup(id.lexeme);
                 if p <> nil then L.place := p.lexeme else error;
                 L.offset := nil; /* for simple identifier */
                 L.code := '' /* empty code */
Addressing Array Elements

Example 1: Three-address code generation for the input x := A[y],
where A is stored at the address 400 and its values are integers (width =
4) and low = 1.
● The semantic actions will generate the following three-address code.
r1 := 396
r2 := y * 4
r3 := r1 [r2]
x := r3
Exercise: Produce the attributed parse tree (decorated parse tree)
Addressing Array Elements
Example 2: Three-address code generation for the input
tab1 [i + k] := x + tab2 [j]
tab1 is stored at the address 100 and tab2 at the address 200; the values of both are integers
(width = 4) and low = 1 for both. The semantic actions will generate the following
three-address code.
r4 := i + k
r5 := 96
r6 := r4 * 4
r1 := 196
r2 := j * 4
r3 := r1 [r2]
r7 := x + r3
r5 [r6] := r7
Exercise: Produce the attributed parse tree (decorated parse tree).
Backpatching
● The main problem when generating code for control statements
in a single pass is that
– we may not know the labels where control must go at the time
the jump statements are generated.
● We can solve this problem by generating jump statements
whose targets are temporarily left unspecified.
● Each such statement is put on a list of goto statements
whose labels will be filled in once they are determined.
● We call this backpatching, and it is widely used in three-address
code generation.
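The idea can be sketched with a list of emitted statements in which a jump whose target is not yet known carries None. The helper names emit and backpatch, and the use of statement indices as jump targets, are assumptions made for this illustration and are not defined in the slides.

```python
code = []   # each entry is [text, target]; target is None until patched


def emit(text, target=None):
    """Append a statement and return its index in the code list."""
    code.append([text, target])
    return len(code) - 1


def backpatch(pending, target):
    """Fill in the target of every jump on the pending list."""
    for index in pending:
        code[index][1] = target


# Translating: if x < y then a := 1; followed by b := 2
true_list = [emit("if x < y goto")]   # target unknown at this point
false_list = [emit("goto")]           # target unknown at this point

backpatch(true_list, len(code))       # the 'then' part starts at the next index
emit("a := 1")
backpatch(false_list, len(code))      # the false exit is just past the 'then' part
emit("b := 2")

for index, (text, target) in enumerate(code):
    print(index, text if target is None else f"{text} {target}")
```

Both jumps are emitted in one pass with empty targets and completed as soon as the positions they must reach are known.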
Symbol Table
Introduction
• Symbol table is an important data structure created and maintained by
compilers in order to store information about the occurrence of various
entities such as
• variable names, function names, objects, classes, interfaces, etc.
• Symbol table is used by both the analysis and the synthesis parts of a
compiler.
• A symbol table may serve the following purposes depending upon the
language in hand:
• To store the names of all entities in a structured form at one place.
• To verify if a variable has been declared.
• To implement type checking, by verifying assignments and expressions in the source
code are semantically correct.
• To determine the scope of a name (scope resolution).
Introduction
• A symbol table is simply a table which can be either linear or a hash
table.
• It maintains an entry for each name in the following format:
< symbol name, type, attribute >
• For example, if a symbol table has to store information about the following
variable declaration:
static int interest;
• then it should store an entry such as:
< interest, int, static >
• The attribute clause contains the entries related to the name.
Implementation Options
• If a compiler is to handle a small amount of data, then the symbol
table can be implemented as an unordered list, which is easy to code
but only suitable for small tables.
• A symbol table can be implemented in one of the following ways:
• Linear (sorted or unsorted) list
• Binary Search Tree
• Hash table
• Among all, symbol tables are mostly implemented as hash tables,
• where the source code symbol itself is treated as a key for the hash function
and the return value is the information about the symbol.
Operations
• A symbol table, either linear or hash, should provide the following
operations.
1. Insert()
• This operation is more frequently used by analysis phase, i.e., the first half of the
compiler where tokens are identified and names are stored in the table.
• This operation is used to add information in the symbol table about unique names
occurring in the source code.
• The format or structure in which the names are stored depends upon the compiler in
hand.
• An attribute for a symbol in the source code is the information associated with that
symbol.
• This information contains the value, state, scope, and type of the symbol.
• The insert() function takes the symbol and its attributes as arguments and stores the
information in the symbol table.
• For example: int a; should be processed by the compiler as insert(a, int);
Operations
2. lookup()
• lookup() operation is used to search a name in the symbol table to determine:
• if the symbol exists in the table.
• if it is declared before it is being used.
• if the name is used in the scope.
• if the symbol is initialized.
• if the symbol is declared multiple times.
• The format of lookup() function varies according to the programming
language.
• The basic format should match the following:
lookup(symbol)
• This method returns 0 (zero) if the symbol does not exist in the symbol table.
• If the symbol exists in the symbol table, it returns its attributes stored in the table.
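A minimal sketch of insert() and lookup() over a hash table follows, using a Python dict keyed by the symbol name. The layout of the stored attributes is an assumption; the return value 0 for a missing symbol follows the convention stated above.

```python
symbol_table = {}   # a Python dict is a hash table keyed by the symbol name


def insert(symbol, type_, **attributes):
    """Store the type and any further attributes under the symbol name."""
    symbol_table[symbol] = {"type": type_, **attributes}


def lookup(symbol):
    """Return the stored attributes, or 0 if the symbol is not in the table."""
    return symbol_table.get(symbol, 0)


insert("a", "int")                            # int a;
insert("interest", "int", storage="static")   # static int interest;

print(lookup("a"))          # {'type': 'int'}
print(lookup("interest"))   # {'type': 'int', 'storage': 'static'}
print(lookup("missing"))    # 0
```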
Scope Management
• A symbol table stores names of all kinds that occur in a program along
with information about them.
• Type of the name (int, float, function, etc.), level at which it has been
declared, whether it is a declared parameter of a function or an ordinary
variable, etc.
• In the case of a function, additional information about the list of parameters
and their types, local variables and their types, result type, etc., are also
stored.
• A compiler maintains two types of symbol tables:
• A global symbol table which can be accessed by all the procedures, and
• Scope symbol tables that are created for each scope in the program.
Scope Management
• To determine the scope of a name,
symbol tables are arranged in
hierarchical structure as shown in the
next example.
Scope Management
• The above program can be represented in a hierarchical structure of symbol tables:
• The global symbol table contains names for one global variable (int value) and two procedure
names, which should be available to all the child nodes shown above.
• The names mentioned in the pro_one symbol table (and all its child tables) are not available for
pro_two symbols and its child tables.
Scope Management
• This symbol table data structure hierarchy is stored in the semantic
analyzer and whenever a name needs to be searched in a symbol
table, it is searched using the following algorithm:
• First a symbol will be searched in the current scope, i.e. current symbol table.
• If a name is found, then search is completed, else it will be searched in the
parent symbol table until,
• either the name is found or global symbol table has been searched for the name.
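This search can be sketched with one table per scope, each holding a link to its parent. The class `Scope` and the variable name one_1 are hypothetical; value, pro_one and pro_two are the names mentioned above.

```python
class Scope:
    """One symbol table per scope, linked to the table of the enclosing scope."""

    def __init__(self, parent=None):
        self.symbols = {}
        self.parent = parent

    def insert(self, name, info):
        self.symbols[name] = info

    def lookup(self, name):
        """Search the current table, then each parent up to the global table."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


global_scope = Scope()
global_scope.insert("value", "int")
global_scope.insert("pro_one", "procedure")
global_scope.insert("pro_two", "procedure")

pro_one = Scope(parent=global_scope)
pro_one.insert("one_1", "int")
inner = Scope(parent=pro_one)          # a nested block inside pro_one
pro_two = Scope(parent=global_scope)

print(inner.lookup("one_1"))    # int  (found in the enclosing pro_one table)
print(inner.lookup("value"))    # int  (found in the global table)
print(pro_two.lookup("one_1"))  # None (names of pro_one are not visible here)
```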
Scope Management
A Simple Symbol Table
• A very simple symbol table (quite restricted and not really fast) is
presented for use in the semantic analysis of functions.
• An array, func_name_table stores the function name records, assuming no
nested function definitions.
• Each function name record has fields: name, result type, parameter list
pointer, and variable list pointer.
• Parameter and variable names are stored as lists.
• Each parameter and variable name record has fields: name, type, parameter-
or-variable tag, and level of declaration (1 for parameters, and 2 or more for
variables)
Scope Management
A Simple Symbol Table
• Two variables in the same function, with the same name but different
declaration levels, are treated as different variables (in their respective
scopes).
• If a variable (at level > 2) and a parameter have the same name, then the variable
name overrides the parameter name (only within the corresponding scope).
• However, a declaration of a variable at level 2, with the same name as a parameter, is
flagged as an error.
• The above two cases must be checked carefully.
• A search in the symbol table for a given name must always consider the names with
the declaration levels l, l − 1, …, 2, in that order, where l is the current level.
Scope Management
A Simple Symbol Table
• The global variable, active_func_ptr,
• Stores a pointer to the function name entry in func_name_table of the function that is
currently being compiled.
• The global variable, level,
• Stores the current nesting level of a statement block.
• The global variable, call_name_ptr,
• Stores a pointer to the function name entry in func_name_table of the function whose
call is being currently processed.
• The function search_func(n, found, fnptr )
• Searches the function name table for the name n and returns found as T or F;
• If found, it returns a pointer to that entry in fnptr.
Scope Management
A Simple Symbol Table
• The function search_param(p, fnptr , found, pnptr )
• Searches the parameter list of the function at fnptr for the name p, and returns found as
T or F;
• If found, it returns a pointer to that entry in the parameter list, in pnptr.
• The function search_var (v , fnptr , l, found, vnptr )
• Searches the variable list of the function at fnptr for the name v at level l or lower, and
returns found as T or F;
• If found, it returns a pointer to that entry in the variable list, in vnptr.
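The records and two of the search functions can be sketched as follows. This is a simplified illustration under several assumptions: the functions return a (found, entry) pair instead of filling output parameters, Python lists stand in for the linked lists, and the records of a block are assumed to be removed when that block is closed.

```python
from dataclasses import dataclass, field


@dataclass
class NameRecord:
    name: str
    type: str
    tag: str     # "param" or "var"
    level: int   # 1 for parameters, 2 or more for variables


@dataclass
class FuncRecord:
    name: str
    result_type: str
    params: list = field(default_factory=list)
    variables: list = field(default_factory=list)


def search_param(p, func):
    """Search the parameter list of func for the name p."""
    for record in func.params:
        if record.name == p:
            return True, record
    return False, None


def search_var(v, func, level):
    """Search for v at the given level, then level - 1, ..., down to 2."""
    for current in range(level, 1, -1):
        for record in func.variables:
            if record.name == v and record.level == current:
                return True, record
    return False, None


f = FuncRecord("f", "int")
f.params.append(NameRecord("x", "int", "param", 1))
f.variables.append(NameRecord("y", "int", "var", 2))
f.variables.append(NameRecord("x", "real", "var", 3))

found, record = search_var("x", f, 3)
print(found, record.type)          # True real (the level-3 variable hides the parameter)
print(search_var("x", f, 2)[0])    # False (no variable x visible at level 2)
print(search_param("x", f)[0])     # True (so the parameter x is used at level 2)
```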
End of CH5!
• Thank You!
