
CS6660 – Compiler Design – Unit 4

Unit – IV - Syntax Directed Translation

Syntax Directed Definition


Outline
Syntax Directed Definitions
Form of Syntax Directed Definitions
Synthesized Attributes
Inherited Attributes
Dependency Graphs
Evaluation Orders

SEMANTIC ANALYSIS
Semantic Analysis computes additional information related to the meaning of the
program once the syntactic structure is known.
In typed languages such as C, semantic analysis involves adding information to the symbol
table and performing type checking.
The information to be computed is beyond the capabilities of standard
parsing techniques, therefore it is not regarded as syntax.
As with lexical and syntax analysis, semantic analysis also needs both a
Representation Formalism and an Implementation Mechanism.
As representation formalism these notes illustrate what are called Syntax Directed
Translations.
SYNTAX DIRECTED TRANSLATION
The Principle of Syntax Directed Translation states that the meaning of an input
sentence is related to its syntactic structure, i.e., to its Parse-Tree.
By Syntax Directed Translations we indicate those formalisms for specifying
translations for programming language constructs guided by context-free grammars.
o We associate Attributes with the grammar symbols representing the language
constructs.
o Values for attributes are computed by Semantic Rules associated with
grammar productions.
Evaluation of Semantic Rules may:
o Generate Code;
o Insert information into the Symbol Table;
o Perform Semantic Checks;

o Issue error messages;
o etc.

We can associate information with a language construct by attaching attributes to the
grammar symbols.
Two notations to associate semantic rules with production:
Syntax Directed Definition (SDD)
Translation Schemes
SDD is a high level specification for translations. It hides many implementation
details and frees the user from having to specify explicitly the order in which the
translation takes place.
Translation schemes indicate the order in which semantic rules are to be evaluated, so
they allow some implementation details to be shown.
Both notations are helpful for specifying semantic checking, determining
types and generating intermediate code.
With either notation, we parse the input token stream, build the parse tree, and
traverse the tree as needed to evaluate the semantic rules at the parse tree nodes.
Evaluation of the semantic rules may generate code, save information in the symbol
table, issue error messages or perform other activities.
The translation of the token stream is the result obtained by evaluating the semantic rules.
Input string → Parse Tree → Dependency Graph → Evaluation order for semantic rules

Syntax Directed Definitions


A SDD is a generalization of context free grammar in which each grammar symbol has
an associated set of attributes, partitioned into two subsets called the synthesized and
inherited attributes of that grammar symbol.
Attributes can represent anything we choose: a string, a number, a type, a
memory location, or whatever.
The value of an attribute at a parse tree node is defined by a semantic rule associated
with the production used at that node.
Synthesized Attribute: The value of synthesized attribute at a node is computed
from the values of attributes at the children of that node in the parse tree.
Inherited Attribute: The value of inherited attribute is computed from the values of
attributes at the siblings and parent of that node.
Semantic rules set up the dependencies between attributes that will be represented by a
graph.
From the dependency graph, we can derive an evaluation order for semantic rules.
Evaluation of the semantic rules define the values of the attributes at the nodes in the
parse tree for the input string.
A parse tree showing the values of attributes at each node is called an annotated parse
tree.

The process of computing the attribute values at the nodes is called annotating or
decorating the parse tree.

Form of a SDD

In a SDD, each grammar production A → α has associated with it a set of semantic
rules of the form b := f(c1,c2,…ck) where f is a function and either
b is a synthesized attribute of A and c1,c2,…ck are attributes belonging to the
grammar symbols of the production, or
b is an inherited attribute of one of the grammar symbols on the right side of the
production, and c1,c2,…ck are attributes belonging to the grammar symbols of the
production.
We say that attribute b depends on attributes c1,c2,…ck . An attribute grammar is a
SDD in which the functions in semantic rules cannot have side effects.
SDD of a simple desktop calculator

Synthesized Attributes

A SDD that uses synthesized attributes extensively is said to be an S-attributed definition.


A parse tree for an S-attributed definition can always be annotated by evaluating the
semantic rules for the attributes at each node bottom up, from leaves to the root.
Annotated parse tree 3*5+4n
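The bottom-up evaluation of synthesized attributes can be sketched in Python. The Node class and the rule dispatch below are illustrative assumptions, not part of the original figure; the semantic rules mirror the desk-calculator SDD (F.val := digit.lexval, and + and * combining the children's values).

```python
# Minimal sketch of bottom-up (synthesized) attribute evaluation.
class Node:
    def __init__(self, label, *children, lexval=None):
        self.label, self.children, self.lexval = label, children, lexval
        self.val = None  # the synthesized attribute

def annotate(node):
    """Post-order traversal: compute node.val from the children's values."""
    for c in node.children:
        annotate(c)
    if node.label == 'digit':
        node.val = node.lexval          # F -> digit : F.val := digit.lexval
    elif node.label == '+':
        node.val = node.children[0].val + node.children[1].val
    elif node.label == '*':
        node.val = node.children[0].val * node.children[1].val
    return node.val

# Hand-built parse tree for 3 * 5 + 4
tree = Node('+',
            Node('*', Node('digit', lexval=3), Node('digit', lexval=5)),
            Node('digit', lexval=4))
print(annotate(tree))  # 19
```

Because every attribute is synthesized, a single post-order (leaves-to-root) pass suffices, which is exactly why S-attributed definitions pair well with bottom-up parsing.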


Inherited Attributes

An inherited attribute is one whose value is defined in terms of attributes at the parent
and/or siblings of that node.

It is convenient for expressing the dependence of a programming language construct on the context in which it appears.


For example, we can use the inherited attributes to keep track of whether an identifier
appears on the left side or right side of the assignment in order to decide whether the address
or value of an identifier is needed.
This attribute distributes the type information to the various identifiers in the declaration.
SDD with inherited attribute L.in

Parse tree with inherited Attribute
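The effect of the inherited attribute L.in can be sketched as follows; the decl helper and the symtab dictionary are assumptions used only to show the type flowing down from the declaration into each identifier on the list.

```python
# Sketch: D -> T L sets L.in := T.type, and the type is inherited down
# the list L -> L1 , id | id, with addtype(id.entry, L.in) at each id.
symtab = {}

def addtype(name, t):
    symtab[name] = t

def decl(type_name, id_list):
    """D -> T L : every identifier on the list inherits the declared type."""
    l_in = type_name              # the inherited attribute L.in
    for ident in id_list:         # walk the L-chain
        addtype(ident, l_in)

decl('integer', ['id1', 'id2', 'id3'])
print(symtab)  # {'id1': 'integer', 'id2': 'integer', 'id3': 'integer'}
```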

Dependency Graph

If an attribute b at a node in a parse tree depends on an attribute c, then the semantic
rule for b must be evaluated after the semantic rule that defines c.
The interdependencies among the synthesized and inherited attributes can be shown by a
directed graph called dependency graph.
For example, suppose A.a := f(X.x, Y.y) is a semantic rule for the production A → XY. This rule
defines a synthesized attribute A.a that depends on the attributes X.x and Y.y.

If this production is used in the parse tree, then there will be three nodes A.a, X.x, Y.y in
the dependency graph, with an edge to A.a from X.x since A.a depends on X.x, and an edge
to A.a from Y.y since A.a depends on Y.y.

Example:

Consider the following production with semantic rule:

E → E1 + E2    E.val : = E1.val + E2.val

E.val is synthesized from E1.val and E2.val.

Evaluation orders

If m1 → m2 is an edge from m1 to m2, then m1 appears before m2 in the ordering.


Any topological sort of dependency graph gives a valid order in which the semantic
rules associated with the nodes in the parse tree can be evaluated.
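The idea of deriving an evaluation order by topologically sorting the dependency graph can be sketched as follows. The node names follow the A.a := f(X.x, Y.y) example above; the function signature is an illustrative assumption.

```python
from collections import deque

def topological_order(nodes, edges):
    """edges: (u, v) means the rule for v must run after the rule for u."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):        # some node never reached indegree 0
        raise ValueError('dependency graph has a cycle: the SDD is circular')
    return order

print(topological_order(['X.x', 'Y.y', 'A.a'],
                        [('X.x', 'A.a'), ('Y.y', 'A.a')]))
# ['X.x', 'Y.y', 'A.a']
```

The cycle check corresponds to the circularity condition discussed below: a cyclic dependency graph has no topological order, so no valid evaluation order exists.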
There are three kinds of methods for evaluating semantic rules:

Parse tree methods
Rule based methods
Oblivious methods
Parse Tree Methods: At compile time, these methods obtain an evaluation order
from the constructed dependency graph. These methods fail to obtain an evaluation
order if the dependency graph has a cycle.
Rule Based Methods: At compiler-construction time, the semantic rules associated
with productions are analyzed either by hand or by a specialized tool.
Oblivious Methods: An evaluation order is chosen without considering the semantic rules.

Rule based methods and oblivious methods need not explicitly construct the
dependency graph at compile time, so they can be more efficient in their use of compile
time and space.
A SDD is said to be circular if the dependency graph for some parse tree generated by its
grammar has a cycle.

Construction of Syntax Trees

One use of SDDs is the construction of syntax trees. A syntax tree is a condensed form of
parse tree.
Syntax trees are useful for representing programming language constructs like
expressions and statements.
They help compiler design by decoupling parsing from translation.
Each node of a syntax tree represents a construct; the children of the node represent
the meaningful components of the construct.
e.g. a syntax-tree node representing an expression E1 + E2 has label + and two children
representing the subexpressions E1 and E2.

Each node is implemented as an object with a suitable number of fields; each object has
an op field that is the label of the node, with additional fields as follows:

If the node is a leaf, an additional field holds the lexical value for the leaf.
Such a node is created by the function Leaf(op, val).
If the node is an interior node, there are as many fields as the node has children in the syntax
tree.
Such a node is created by the function Node(op, c1,c2,c3…,ck).

Steps in the construction of the syntax tree for a-4+c

If the rules are evaluated during a post order traversal of the parse tree, or with
reductions during a bottom-up parse, then the sequence of steps shown below ends
with p5 pointing to the root of the constructed syntax tree.
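A possible Python rendering of the Leaf and Node functions, with the p1..p5 step sequence for a - 4 + c. The SyntaxNode class is an assumed representation; the original keeps an entry field pointing into the symbol table, which is shown here simply as the identifier's name.

```python
class SyntaxNode:
    def __init__(self, op, *children, val=None):
        self.op = op                # label of the node
        self.children = children    # empty for leaves
        self.val = val              # lexical value, leaves only

def Leaf(op, val):
    return SyntaxNode(op, val=val)

def Node(op, *children):
    return SyntaxNode(op, *children)

# Steps in the construction of the syntax tree for a - 4 + c
p1 = Leaf('id', 'a')        # symbol-table entry shown as the name itself
p2 = Leaf('num', 4)
p3 = Node('-', p1, p2)
p4 = Leaf('id', 'c')
p5 = Node('+', p3, p4)      # p5 points to the root of the syntax tree

print(p5.op)                # +
```

Evaluated in this order (post order / bottom-up reductions), every child exists before its parent node is built, which is why the construction fits an S-attributed definition.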

Constructing Syntax Trees during Top-Down Parsing

With a grammar designed for top-down parsing, the same syntax trees are constructed,
using the same sequence of steps, even though the structure of the parse trees differs
significantly from that of syntax trees.
The L-attributed definition below performs the same translation as the S-attributed
definition shown before.


TYPE CHECKING
A compiler must check that the source program follows both syntactic and
semantic conventions of the source language.
This checking, called static checking, detects and reports programming errors.

Some examples of static checks:

1. Type checks – A compiler should report an error if an operator is applied to an
incompatible operand. Example: if an array variable and a function variable are added
together.
2. Flow-of-control checks – Statements that cause flow of control to leave a construct
must have some place to which to transfer the flow of control. Example: an
error occurs if a break statement is not enclosed within a while, for or switch
statement.
Position of type checker:
token stream → parser → syntax tree → type checker → syntax tree → intermediate code generator → intermediate representation

A type checker verifies that the type of a construct matches that expected by
its context. For example : arithmetic operator mod in Pascal requires integer operands,
so a type checker verifies that the operands of mod have type integer. 
Type information gathered by a type checker may be needed when code is generated. 
TYPE SYSTEMS
The design of a type checker for a language is based on information about the syntactic
constructs in the language, the notion of types, and the rules for assigning types to
language constructs.
For example : “ if both operands of the arithmetic operators of +,- and * are of type integer,
then the result is of type integer ”
Type Expressions
The type of a language construct will be denoted by a “type expression.” 
A type expression is either a basic type or is formed by applying an operator called a
type constructor to other type expressions. 


The sets of basic types and constructors depend on the language to be checked. 

The following are the definitions of type expressions:

1. Basic types such as boolean, char, integer, real are type expressions.

A special basic type, type_error , will signal an error during type checking; void
denoting “the absence of a value” allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression.

3. A type constructor applied to type expressions is a type


expression. Constructors include:
Arrays : If T is a type expression then array (I,T) is a type expression denoting the
type of an array with elements of type T and index set I.

Products : If T1 and T2 are type expressions, then their Cartesian product T1 X T2


is a type expression.

Records : The difference between a record and a product is that the fields of a record have
names. The record type constructor will be applied to a tuple formed from field names
and field types.
For example:
type row = record
    address: integer;
    lexeme: array[1..15] of char
end;
var table: array[1..101] of row;
declares the type name row representing the type expression record((address X integer) X
(lexeme X array(1..15,char))) and the variable table to be an array of records of this type.

Pointers : If T is a type expression, then pointer(T) is a type expression denoting the


type “pointer to an object of type T”.
For example, var p: ↑ row declares variable p to have type pointer(row).
Functions : A function in programming languages maps a domain type D to a range type
R. The type of such function is denoted by the type expression D → R

4. Type expressions may contain variables whose values are type expressions.


Tree representation for char x char → pointer (integer):

        →
       / \
      x   pointer
     / \      |
  char  char  integer
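One possible encoding of type expressions as nested tuples, mirroring the tree above. The constructor helpers and constant names are illustrative assumptions, not a fixed API.

```python
# Basic types as strings, constructed types as tagged tuples.
CHAR, INTEGER, TYPE_ERROR = 'char', 'integer', 'type_error'

def array(index, elem):  return ('array', index, elem)
def product(t1, t2):     return ('x', t1, t2)
def pointer(t):          return ('pointer', t)
def function(dom, rng):  return ('->', dom, rng)

# char x char -> pointer(integer)
t = function(product(CHAR, CHAR), pointer(INTEGER))
print(t)  # ('->', ('x', 'char', 'char'), ('pointer', 'integer'))
```

Structural equality of type expressions then falls out of tuple equality, which is convenient when a type checker has to compare the types of two subexpressions.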

Type systems

A type system is a collection of rules for assigning type expressions to the various parts of a
program.

A type checker implements a type system. It is specified in a syntax-directed manner. 

Different type systems may be used by different compilers or processors of the
same language. 
Static and Dynamic Checking of Types

Checking done by a compiler is said to be static, while checking done when the target
program runs is termed dynamic.

Any check can be done dynamically, if the target code carries the type of an
element along with the value of that element. 

Sound type system


A sound type system eliminates the need for dynamic checking for type errors because
it allows us to determine statically that these errors cannot occur when the target program runs.
That is, if a sound type system assigns a type other than type_error to a program part, then type
errors cannot occur when the target code for the program part is run.

Strongly typed language


A language is strongly typed if its compiler can guarantee that the programs it
accepts will execute without type errors.

Error Recovery

Since type checking has the potential for catching errors in program, it is desirable
for type checker to recover from errors, so it can check the rest of the input. 

Error handling has to be designed into the type system right from the start; the
type checking rules must be prepared to cope with errors. 


SPECIFICATION OF A SIMPLE TYPE CHECKER


Here, we specify a type checker for a simple language in which the type of each
identifier must be declared before the identifier is used. The type checker is a translation scheme
that synthesizes the type of each expression from the types of its subexpressions. The type
checker can handle arrays, pointers, statements and functions.
A Simple Language
Consider the following grammar:
P→D;E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑ T
E → literal | num | id | E mod E | E [ E ] | E ↑

Translation scheme:

P → D ; E
D → D ; D
D → id : T { addtype ( id.entry , T.type ) }
T → char { T.type : = char }
T → integer { T.type : = integer }
T → ↑ T1 { T.type : = pointer(T1.type) }
T → array [ num ] of T1 { T.type : = array ( 1… num.val , T1.type) }
In the above language,
→ There are two basic types : char and integer ;
→ type_error is used to signal errors;
→ the prefix operator ↑ builds a pointer type. Example , ↑ integer leads to the type
expression pointer ( integer ).
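The declaration rules above can be sketched in Python. The ad hoc parse_type helper below is an assumption (the ↑ prefix is written as ^); it exists only to show addtype building the type expressions of the translation scheme and recording them in the symbol table.

```python
symtab = {}

def addtype(name, t):
    symtab[name] = t

def parse_type(s):
    s = s.strip()
    if s.startswith('^'):                        # T -> ^ T1 (pointer)
        return ('pointer', parse_type(s[1:]))
    if s.startswith('array'):                    # T -> array [ num ] of T1
        size = int(s[s.index('[') + 1 : s.index(']')])
        elem = parse_type(s[s.index('of') + 2:])
        return ('array', (1, size), elem)
    return s                                     # T -> char | integer

def declare(decl):                               # D -> id : T
    name, t = decl.split(':', 1)
    addtype(name.strip(), parse_type(t))

declare('p : ^integer')
declare('t : array [ 10 ] of char')
print(symtab)
# {'p': ('pointer', 'integer'), 't': ('array', (1, 10), 'char')}
```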
Type checking of expressions
In the following rules, the attribute type for E gives the type expression assigned to the
expression generated by E.
1. E → literal { E.type : = char }
E → num { E.type : = integer }
Here, constants represented by the tokens literal and num have type char and integer.
2. E → id { E.type : = lookup ( id.entry ) }
lookup ( e ) is used to fetch the type saved in the symbol table entry pointed to by e.
3. E → E1 mod E2 { E.type : = if E1. type = integer and
E2. type = integer then
integer else type_error }
The expression formed by applying the mod operator to two subexpressions of type integer
has type integer; otherwise, its type is type_error.
4. E → E1 [ E2 ] { E.type : = if E2.type = integer and
E1.type = array(s,t) then t
else type_error }

In an array reference E1 [ E2 ], the index expression E2 must have type integer. The result
is the element type t obtained from the type array(s,t) of E1.
5. E → E1 ↑ { E.type : = if E1.type = pointer (t) then t
else type_error }
The postfix operator ↑ yields the object pointed to by its operand. The type of E ↑ is the type
t of the object pointed to by the pointer E.
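Rules 1–5 can be sketched as a recursive checker over tuple-shaped syntax trees. The tuple encoding of expressions and types, and the sample symbol table, are assumptions for illustration.

```python
INTEGER, CHAR, TYPE_ERROR = 'integer', 'char', 'type_error'

# Assumed symbol table: a is an integer array, i an integer, p a pointer.
symtab = {'a': ('array', (1, 10), INTEGER), 'i': INTEGER,
          'p': ('pointer', CHAR)}

def etype(e):
    kind = e[0]
    if kind == 'num':      return INTEGER                     # rule 1
    if kind == 'literal':  return CHAR                        # rule 1
    if kind == 'id':       return symtab.get(e[1], TYPE_ERROR)  # rule 2
    if kind == 'mod':                                         # rule 3
        return (INTEGER if etype(e[1]) == etype(e[2]) == INTEGER
                else TYPE_ERROR)
    if kind == 'index':                                       # rule 4
        t1, t2 = etype(e[1]), etype(e[2])
        if t2 == INTEGER and isinstance(t1, tuple) and t1[0] == 'array':
            return t1[2]                                      # element type t
        return TYPE_ERROR
    if kind == 'deref':                                       # rule 5
        t1 = etype(e[1])
        return (t1[1] if isinstance(t1, tuple) and t1[0] == 'pointer'
                else TYPE_ERROR)
    return TYPE_ERROR

print(etype(('index', ('id', 'a'), ('id', 'i'))))   # integer
print(etype(('mod', ('id', 'p'), ('num', 3))))      # type_error
```

Note how type_error propagates: once a subexpression fails, every enclosing rule also yields type_error, so one fault is reported consistently up the tree.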
Type checking of statements
Statements do not have values; hence the basic type void can be assigned to them. If an error
is detected within a statement, then type_error is assigned.
Translation scheme for checking the type of statements:
1. Assignment statement:
S → id : = E { S.type : = if id.type = E.type then void else
type_error }

2. Conditional statement:
S → if E then S1 { S.type : = if E.type = boolean then S1.type
else type_error }
3. While statement:
S → while E do S1 { S.type : = if E.type = boolean then S1.type
else type_error }
4. Sequence of statements:
S → S1 ; S2 { S.type : = if S1.type = void and S2.type = void
then void
else type_error }
Type checking of functions
The rule for checking the type of a function application is :
E → E1 ( E2) { E.type : = if E2.type = s and
E1.type = s → t then
t else type_error }
RUN-TIME ENVIRONMENT
SOURCE LANGUAGE ISSUES
Procedures:
A procedure definition is a declaration that associates an identifier with a statement. The
identifier is the procedure name, and the statement is the procedure body.

For example, the following is the definition of procedure named readarray :

procedure readarray;
var i : integer;
begin
for i : = 1 to 9 do read(a[i])
end;


When a procedure name appears within an executable statement, the procedure is said to
be called at that point.

Activation trees:
An activation tree is used to depict the way control enters and leaves activations. In an
activation tree,
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows
from activation a to b.
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs
before the lifetime of b.

Control stack:
A control stack is used to keep track of live procedure activations. The idea is
to push the node for activation onto the control stack as the activation begins and to
pop the node when the activation ends.

The contents of the control stack are related to paths to the root of the activation tree.
When node n is at the top of control stack, the stack contains the nodes along the path
from n to the root.
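The push/pop discipline of the control stack can be sketched directly; the run main → p → q, return from q, then p → r is illustrative.

```python
control_stack = []

def enter(proc):
    control_stack.append(proc)   # push the node as the activation begins

def leave():
    control_stack.pop()          # pop the node when the activation ends

enter('main')
enter('p')
enter('q')
print(control_stack)  # ['main', 'p', 'q'] -- the path from q up to the root
leave()               # q's activation ends
enter('r')
print(control_stack)  # ['main', 'p', 'r']
```

At any moment the stack contents spell out the path in the activation tree from the currently live activation back to the root, exactly as the text describes.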

The Scope of a Declaration:


A declaration is a syntactic construct that associates information with a
name.
Declarations may be explicit, such as:
var i : integer ;
Or they may be implicit. Example, any variable name starting with I is assumed
to denote an integer.
The portion of the program to which a declaration applies is called the scope of
that declaration.

Binding of names:
Even if each name is declared once in a program, the same name may denote different
data objects at run time. “Data object” corresponds to a storage location that holds values.
The term environment refers to a function that maps a name to a storage location. The
term state refers to a function that maps a storage location to the value held there.

environment state

name storage value


When an environment associates storage location s with a name x, we say that x is bound
to s. This association is referred to as a binding of x.

STORAGE ORGANIZATION
The executing target program runs in its own logical address space in which each
program value has a location.
The management and organization of this logical address space is shared between
the complier, operating system and target machine. The operating system maps the logical
address into physical addresses, which are usually spread throughout memory.

Typical subdivision of run-time memory:

CODE
STATIC DATA
STACK

FREE MEMORY

HEAP

Run-time storage comes in blocks, where a byte is the smallest unit of
addressable memory. Four bytes form a machine word. Multibyte objects are stored in
consecutive bytes and given the address of the first byte.
The storage layout for data objects is strongly influenced by the addressing constraints of
the target machine.
A character array of length 10 needs only enough bytes to hold 10 characters, but a compiler
may allocate 12 bytes to get alignment, leaving 2 bytes unused.
This unused space due to alignment considerations is referred to as padding.
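The alignment arithmetic behind this example, assuming a 4-byte alignment unit, is simply rounding the size up to the next multiple of the alignment:

```python
def aligned_size(n, align=4):
    # Round n up to the next multiple of align.
    return (n + align - 1) // align * align

print(aligned_size(10))        # 12 -- the 10-byte char array padded to 12
print(aligned_size(10) - 10)   # 2 bytes of padding
```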
The size of some program objects may be known at run time and may be placed in an area
called static.
The dynamic areas used to maximize the utilization of space at run time are stack and
heap.
Activation records:
Procedure calls and returns are usually managed by a run time stack called the control
stack.
Each live activation has an activation record on the control stack, with the root of the
activation tree at the bottom; the latest activation has its record at the top of the stack.
The contents of the activation record vary with the language being implemented.
The diagram below shows the contents of activation record.


Temporary values such as those arising from the evaluation of expressions.


Local data belonging to the procedure whose activation record this is.
A saved machine status, with information about the state of the machine just before the
call to the procedure.
An access link may be needed to locate data needed by the called procedure but found
elsewhere.
A control link pointing to the activation record of the caller.
Space for the return value of the called functions, if any. Again, not all called procedures
return a value, and if one does, we may prefer to place that value in a register
for efficiency.
The actual parameters used by the calling procedure. These are not placed in activation
record but rather in registers, when possible, for greater efficiency.


STORAGE ALLOCATION STRATEGIES


The different storage allocation strategies are :
1. Static allocation – lays out storage for all data objects at compile time
2. Stack allocation – manages the run-time storage as a stack.
3. Heap allocation – allocates and deallocates storage as needed at run time from a
data area known as heap.

STATIC ALLOCATION
In static allocation, names are bound to storage as the program is compiled, so there is
no need for a run-time support package. 
Since the bindings do not change at run time, every time a procedure is activated,
its names are bound to the same storage locations.
Therefore values of local names are retained across activations of a procedure. That is,
when control returns to a procedure the values of the locals are the same as they were
when control left the last time. 
From the type of a name, the compiler decides the amount of storage for the name and
decides where the activation records go. At compile time, we can fill in the addresses
at which the target code can find the data it operates on. 

STACK ALLOCATION

All compilers for languages that use procedures, functions or methods as units of
user-defined actions manage at least part of their run-time memory as a stack.
Each time a procedure is called , space for its local variables is pushed onto a stack,
and when the procedure terminates, that space is popped off the stack. 

Calling sequences:
Procedures called are implemented in what is called as calling sequence, which
consists of code that allocates an activation record on the stack and enters information
into its fields. 
A return sequence is similar code that restores the state of the machine so the calling
procedure can continue its execution after the call.
The code in calling sequence is often divided between the calling procedure (caller)
and the procedure it calls (callee). 
When designing calling sequences and the layout of activation records, the
following principles are helpful:
Values communicated between caller and callee are generally placed at the
beginning of the callee's activation record, so they are as close as possible to the caller's
activation record.
Fixed length items are generally placed in the middle. Such items typically
include the control link, the access link, and the machine status fields. 
Items whose size may not be known early enough are placed at the end of the
activation record. The most common example is a dynamically sized array, where
the value of one of the callee's parameters determines the length of the array.
We must locate the top-of-stack pointer judiciously. A common approach is to have it
point to the end of fixed-length fields in the activation record. Fixed- length data
can then be accessed by fixed offsets, known to the intermediate- code generator,
relative to the top-of-stack pointer.

The calling sequence and its division between caller and callee are as follows. 

The caller evaluates the actual parameters. 
The caller stores a return address and the old value of top_sp into the callee's
activation record. The caller then increments top_sp to its new position.
The callee saves the register values and other status information. 
The callee initializes its local data and begins execution. 
A suitable, corresponding return sequence is: 

The callee places the return value next to the parameters. 
Using the information in the machine-status field, the callee restores top_sp and
other registers, and then branches to the return address that the caller placed in the
status field. 
Although top_sp has been decremented, the caller knows where the return value is,
relative to the current value of top_sp; the caller therefore may use that value. 

Variable length data on stack:


The run-time memory management system must deal frequently with the allocation
of space for objects whose sizes are not known at compile time, but which are
local to a procedure and thus may be allocated on the stack.
The reason to prefer placing objects on the stack is that we avoid the expense of garbage
collecting their space.
The same scheme works for objects of any type if they are local to the procedure called
and have a size that depends on the parameters of the call.

Access to dynamically allocated arrays (figure): the activation record for p holds a
control link and pointers to A, B and C; the arrays of p (A, B and C) follow the record.
Next comes the activation record for procedure q called by p, with top_sp pointing to the
end of its fixed-length fields, followed by the arrays of q, with top marking the actual
top of stack.

Procedure p has three local arrays, whose sizes cannot be determined at compile time.
The storage for these arrays is not part of the activation record for p. 
Access to the data is through two pointers, top and top_sp. Here top marks the actual
top of stack; it points to the position at which the next activation record will begin.
The second pointer, top_sp, is used to find local, fixed-length fields of the top activation record.
The code to reposition top and top-sp can be generated at compile time, in terms of sizes
that will become known at run time. 
HEAP ALLOCATION
Stack allocation strategy cannot be used if either of the following is possible:
1. The values of local names must be retained when activation ends.
2. A called activation outlives the caller.


Heap allocation parcels out pieces of contiguous storage, as needed for activation
records or other objects. 
Pieces may be deallocated in any order, so over the time the heap will consist of
alternate areas that are free and in use. 

The figure shows, side by side, the position in the activation tree and the activation
records in the heap. In the activation tree, s has children r and q(1,9); r's activation has
ended, but its record is retained. In the heap, the record for s is followed by the retained
record for r and then the record for q(1,9), each with its control link.
The record for an activation of procedure r is retained when the activation ends. 

Therefore, the record for the new activation q(1 , 9) cannot follow that for s physically. 

If the retained activation record for r is deallocated, there will be free space in the
heap between the activation records for s and q.


PARAMETER PASSING
The communication medium among procedures is known as parameter passing.
The values of the variables from a calling procedure are transferred to the called procedure by
some mechanism. Before moving ahead, first go through some basic terminology pertaining to
the values in a program.

r-value: The value of an expression is called its r-value. The value contained in a single
variable also becomes an r-value if it appears on the right-hand side of the assignment operator.
r-values can always be assigned to some other variable.

l-value: The location of memory (address) where an expression is stored is known as the
l-value of that expression. It always appears on the left hand side of an assignment operator.
For example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;

From this example, we understand that constant values like 1, 7, 12, and variables like
day, week, month and year, all have r-values. Only variables have l-values as they also
represent the memory location assigned to them.

For example:

7 = x + y; is an l-value error, as the constant 7 does not represent any memory location.

Formal Parameters
Variables that take the information passed by the caller procedure are called
formal parameters. These variables are declared in the definition of the called function.

Actual Parameters
Variables whose values or addresses are being passed to the called procedure are called
actual parameters. These variables are specified in the function call as arguments.

Example:

fun_one()
{
    int actual_parameter = 10;
    call fun_two(actual_parameter);
}
fun_two(int formal_parameter)
{
    print formal_parameter;
}
Formal parameters hold the information of the actual parameter, depending upon
the parameter passing technique used. It may be a value or an address.

Pass by Value
In the pass by value mechanism, the calling procedure passes the r-values of the
actual parameters and the compiler puts them into the called procedure's activation record.
Formal parameters then hold the values passed by the calling procedure. If the values held by
the formal parameters are changed, it has no impact on the actual parameters.

Pass by Reference
In pass by reference mechanism, the l-value of the actual parameter is copied to
the activation record of the called procedure. This way, the called procedure now has the address
(memory location) of the actual parameter and the formal parameter refers to the same memory
location. Therefore, if the value pointed by the formal parameter is changed, the impact should
be seen on the actual parameter as they should also point to the same value.

Pass by Copy-restore
This parameter passing mechanism works similarly to pass-by-reference, except that the
changes to actual parameters are made when the called procedure ends. Upon function call, the
values of actual parameters are copied into the activation record of the called procedure. Formal
parameters, if manipulated, have no immediate effect on actual parameters (only the copies
are manipulated), but when the called procedure ends, the values of the formal parameters are
copied back to the l-values of the actual parameters.

Example:

int y;
calling_procedure()
{
y = 10;
copy_restore(y); //l-value of y is passed
printf y; //prints 99
}
copy_restore(int x)

{
x = 99; // y still has value 10 (unaffected)
y = 0; // y is now 0
}
When this function ends, the l-value of formal parameter x is copied to the actual
parameter y. Even if the value of y is changed before the procedure ends, the l-value of x is
copied to the l-value of y making it behave like call by reference.
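The copy-in/copy-out sequence of the example above can be simulated in Python (a sketch, not from the original notes; the `env` dictionary is a hypothetical stand-in for the caller's variables):

```python
env = {"y": 10}   # the caller's variables, keyed by name

def body(x):
    x = 99          # change the formal parameter (local copy)
    env["y"] = 0    # change y directly; overwritten at copy-out
    return x

def copy_restore_call(proc, name):
    formal = env[name]      # copy-in: formal gets the actual's value
    formal = proc(formal)   # run the body on the copy
    env[name] = formal      # copy-out: write the formal back to the actual

copy_restore_call(body, "y")
print(env["y"])  # 99, matching the example above
```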

Pass by Name
Languages like ALGOL provide a parameter passing mechanism that works like macro
expansion in the C preprocessor. In the pass by name mechanism, a procedure call is replaced
by the body of the called procedure, with the actual argument expressions textually substituted
for the corresponding formal parameters. The body can then work on the actual parameters
directly, much as in pass-by-reference.
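Compilers classically implement pass by name with thunks: each argument becomes a zero-argument function that re-evaluates the argument expression on every use. The sketch below (not from the original notes) models the well-known Jensen's device, where `a[i]` is re-evaluated as `i` changes:

```python
# Pass-by-name modeled with thunks: each argument is a zero-argument
# function that re-evaluates the argument expression on every use.
state = {"i": 0}
a = [2, 3, 5]

def sum_terms(set_i, get_term, n):
    # Jensen's device: get_term() re-evaluates a[i] after each change to i.
    total = 0
    for k in range(n):
        set_i(k)
        total += get_term()
    return total

result = sum_terms(lambda v: state.update(i=v),
                   lambda: a[state["i"]],
                   len(a))
print(result)  # 2 + 3 + 5 = 10
```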

SYMBOL TABLES

Symbol table is an important data structure created and maintained by compilers in order
to store information about the occurrence of various entities such as variable names, function
names, objects, classes, interfaces, etc. Symbol table is used by both the analysis and
the synthesis parts of a compiler.

A symbol table may serve the following purposes depending upon the language in hand:

To store the names of all entities in a structured form at one place.

To verify if a variable has been declared.

To implement type checking, by verifying that assignments and expressions in the
source code are semantically correct.

To determine the scope of a name (scope resolution).

A symbol table is simply a table which can be either linear or a hash table. It maintains
an entry for each name in the following format:

<symbol name, type, attribute>

For example, if a symbol table has to store information about the following variable declaration:

static int interest;

then it should store the entry such as:

<interest, int, static>

The attribute clause contains the entries related to the name.

Implementation
If a compiler is to handle a small amount of data, then the symbol table can be
implemented as an unordered list, which is easy to code but suitable only for small tables.
A symbol table can be implemented in one of the following ways:

Linear (sorted or unsorted) list

Binary Search Tree

Hash table
Among all, symbol tables are mostly implemented as hash tables, where the source code
symbol itself is treated as a key for the hash function and the return value is the information
about the symbol.

Operations
A symbol table, either linear or hash, should provide the following operations.

insert()
This operation is used more frequently by the analysis phase, i.e., the first half of the
compiler, where tokens are identified and names are stored in the table. This operation is used to
add information to the symbol table about unique names occurring in the source code.
The format or structure in which the names are stored depends upon the compiler at hand.

An attribute for a symbol in the source code is the information associated with
that symbol. This information contains the value, state, scope, and type about the
symbol. The insert() function takes the symbol and its attributes as arguments and stores the
information in the symbol table.

For example:

int a;

should be processed by the compiler as:

insert(a, int);

lookup()
lookup() operation is used to search a name in the symbol table to determine:

if the symbol exists in the table.
if it is declared before it is being used.
if the name is used in the scope.
if the symbol is initialized.
if the symbol is declared multiple times.
The format of lookup() function varies according to the programming language.
The basic format should match the following:

lookup(symbol)

This method returns 0 (zero) if the symbol does not exist in the symbol table. If
the symbol exists in the symbol table, it returns its attributes stored in the table.
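A minimal hash-based symbol table with these two operations can be sketched in Python (an illustrative sketch, not from the original notes; the class and method names are assumptions):

```python
class SymbolTable:
    """Minimal hash-based symbol table mapping a name to its attributes."""

    def __init__(self):
        self.entries = {}   # the symbol name itself is the hash key

    def insert(self, name, type_, *attrs):
        # store an entry of the form <symbol name, type, attribute>
        self.entries[name] = (type_, *attrs)

    def lookup(self, name):
        # returns 0 if the symbol does not exist, else its stored attributes
        return self.entries.get(name, 0)

st = SymbolTable()
st.insert("interest", "int", "static")   # static int interest;
print(st.lookup("interest"))  # ('int', 'static')
print(st.lookup("rate"))      # 0: not declared
```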

Scope Management
A compiler maintains two types of symbol tables: a global symbol table, which can be
accessed by all the procedures, and scope symbol tables that are created for each scope in the
program.

To determine the scope of a name, symbol tables are arranged in hierarchical structure as
shown in the example below:

...
int value=10;

void pro_one()
{
int one_1;
int one_2;

{ \
int one_3; |_ inner scope 1
int one_4; |
} /

int one_5;

{ \
int one_6; |_ inner scope 2
int one_7; |
} /
}

void pro_two()
{
int two_1;
int two_2;

{ \
int two_3; |_ inner scope 3
int two_4; |
} /

int two_5;
}
...

The above program can be represented in a hierarchical structure of symbol tables:

The global symbol table contains names for one global variable (int value) and two procedure
names, which should be available to all the child nodes shown above. The names mentioned in the
pro_one symbol table (and all its child tables) are not available for pro_two symbols and its child tables.

This symbol table data structure hierarchy is stored in the semantic analyzer and whenever a
name needs to be searched in a symbol table, it is searched using the following algorithm:

first, the symbol will be searched in the current scope, i.e., the current symbol table;

if the name is found, the search is complete; else it will be searched in the parent symbol table, until

either the name is found or the global symbol table has been searched for the name.
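The algorithm above amounts to chaining each scope's table to its parent. A sketch in Python (not from the original notes; the class and variable names are assumptions chosen to match the program above):

```python
class ScopedTable:
    def __init__(self, parent=None):
        self.entries = {}
        self.parent = parent   # enclosing scope; None for the global table

    def insert(self, name, attrs):
        self.entries[name] = attrs

    def lookup(self, name):
        # search the current scope first, then walk up toward the global table
        table = self
        while table is not None:
            if name in table.entries:
                return table.entries[name]
            table = table.parent
        return 0   # not found anywhere

glob = ScopedTable()
glob.insert("value", ("int", "global"))
pro_one = ScopedTable(parent=glob)
pro_one.insert("one_1", ("int",))
inner1 = ScopedTable(parent=pro_one)    # inner scope 1 of pro_one
print(inner1.lookup("value"))   # found by walking up to the global table
print(inner1.lookup("two_1"))   # 0: names of pro_two are not visible here
```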

DYNAMIC STORAGE ALLOCATION


It is necessary to preserve the previous values of any variables used by subroutine, including
parameters, temporaries, return addresses, register save areas, etc. It can be accomplished
with a dynamic storage allocation technique.
Storage is organized as a stack.
Activation records are pushed and popped.
Locals and parameters are contained in the activation records for the call.
This means locals are bound to fresh storage on every call.
If we have a stack growing downwards, we just need a stack_top pointer.
To allocate a new activation record, we just increase stack_top.
To deallocate an existing activation record, we just decrease stack_top.
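The push/pop discipline above can be sketched in Python (an illustrative sketch, not from the original notes): a list models the run-time stack, so its length plays the role of stack_top.

```python
# The run-time stack modeled as a list; len(stack) plays the role of stack_top.
stack = []

def push_activation(record):
    # allocate: grow the stack by one activation record on a call
    stack.append(record)
    return len(stack) - 1   # "address" (index) of the new record

def pop_activation():
    # deallocate: shrink the stack when the procedure returns
    return stack.pop()

push_activation({"proc": "main", "locals": {}})
push_activation({"proc": "fun_two", "locals": {"formal_parameter": 10}})
pop_activation()            # fun_two returns; its locals disappear
print(stack[-1]["proc"])    # main
```

Because each call pushes a fresh record, locals are bound to fresh storage on every call, which is what makes recursion work.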

Static allocation vs. dynamic allocation

In static allocation, temporary variables, including the one used to save the return address,
are assigned fixed addresses within the program.
When a subroutine may be re-entered (for example, through recursion), it is necessary to preserve
the previous values of any variables used by the subroutine, including parameters, temporaries,
return addresses, and register save areas.
This can be accomplished with a dynamic storage allocation technique.

Dynamic storage allocation technique.


o Each procedure call creates an activation record that contains storage for all the variables used
by the procedure.
o Activation records are typically allocated on a stack.
