
Unit 4 – Error Recovery

1. Basic types of Error.


Errors can be classified into two main categories:
1. Compile time errors
2. Run time errors

Compile time errors are further divided into lexical phase errors, syntactic phase errors and semantic phase errors.

Fig. 4.1 Types of Error


Lexical Error
These errors are detected during the lexical analysis phase. Typical lexical phase errors are:
1. Spelling errors, which produce incorrect tokens.
2. Exceeding the length of an identifier or numeric constant.
3. Appearance of illegal characters.
Example:
fi ( )
{
}
 In the above code, 'fi' cannot be recognized as a misspelling of the keyword if; the lexical
analyzer will treat it as an identifier and return it as a valid token.
Thus a misspelling causes an error in token formation.
Syntax error
These errors appear during the syntax analysis phase of the compiler.
Typical errors are:
1. Errors in structure.
2. Missing operators.
3. Unbalanced parentheses.
 The parser demands tokens from the lexical analyzer, and if the tokens do not satisfy
the grammatical rules of the programming language then syntactic errors are
raised.
Semantic error
These errors are detected during the semantic analysis phase.
Typical errors are:
1. Incompatible types of operands.
2. Undeclared variables.
3. Mismatch of actual arguments with formal arguments.

Dixita Kagathara, CE Department | 170701 – Compiler Design 51

2. Error recovery strategies OR Ad-hoc and systematic methods.
1. Panic mode
 This strategy is used by most parsing methods and is simple to implement.
 In this method, on discovering an error, the parser discards input symbols one at a time.
This process continues until one of a designated set of synchronizing tokens is found.
Synchronizing tokens are delimiters such as semicolon or end, which indicate the
end of a statement.
 Thus in panic mode recovery a considerable amount of input is skipped without
checking it for additional errors.
 This method guarantees not to go into an infinite loop.
 If there are few errors in the same statement, this strategy is the best choice.
2. Phrase level recovery
 In this method, on discovering an error the parser performs local correction on the remaining
input.
 It can replace a prefix of the remaining input by some string. This helps the parser to
continue its job.
 The local correction can be replacing a comma by a semicolon, deleting an extraneous
semicolon, or inserting a missing semicolon. This type of local correction is decided by the
compiler designer.
 While doing the replacement, care should be taken not to go into an infinite loop.
 This method is used in many error-repairing compilers.
3. Error production
 If we have good knowledge of common errors that might be encountered, then we can
augment the grammar for the corresponding language with error productions that
generate the erroneous constructs.
 If an error production is used during parsing, we can generate an appropriate error message
to indicate the erroneous construct that has been recognized in the input.
 This method is extremely difficult to maintain, because if we change the grammar then it
becomes necessary to change the corresponding error productions.
4. Global correction
 We often want a compiler that makes as few changes as possible in processing an incorrect
input string.
 Given an incorrect input string x and grammar G, the algorithm will find a parse tree
for a related string y, such that the number of insertions, deletions and changes of tokens
required to transform x into y is as small as possible.
 Such methods increase the time and space requirements at parsing time.
 Global correction is thus mainly a theoretical concept.



Unit 6 – Run Time Memory Management

1. Source language issues.


1. Procedure call
 A procedure definition is a declaration that associates an identifier with a statement.
 The identifier is the procedure name and the statement is the procedure body.
 For example, the following is the definition of a procedure named readarray:
procedure readarray;
var i: integer;
begin
for i := 1 to 9 do read(a[i])
end;
 When a procedure name appears within an executable statement, the procedure is said to
be called at that point.
2. Activation tree
 An activation tree is used to depict the way control enters and leaves activations. In an
activation tree,
a) Each node represents an activation of a procedure.
b) The root represents the activation of the main program.
c) The node for a is the parent of the node for b if and only if control flows from activation
a to b.
d) The node for a is to the left of the node for b if and only if the lifetime of a occurs
before the lifetime of b.
3. Control stack
 A control stack is used to keep track of live procedure activations.
 The idea is to push the node for activation onto the control stack as the activation begins
and to pop the node when the activation ends.
 The contents of the control stack are related to paths to the root of the activation tree.
 When node n is at the top of the stack, the stack contains the nodes along the path from n
to the root.
4. The scope of declaration
 A declaration is a syntactic construct that associates information with a name.
 Declaration may be explicit, such as:
var i: integer;
 Or they may be implicit. Example, any variable name starting with i is assumed to denote an
integer.
 The portion of the program to which a declaration applies is called the scope of that
declaration.
5. Bindings of names
 Even if a name is declared only once in a program, the same name may denote different
data objects at run time.
 A “data object” corresponds to a storage location that holds values.
 The term environment refers to a function that maps a name to a storage location.
 The term state refers to a function that maps a storage location to the value held there.

Name → (environment) → Storage → (state) → Value

Fig 6.1 Two stage mapping from name to value
 When an environment associates storage location s with a name x, we say that x is bound
to s.
 This association is referred to as a binding of x.
2. Storage organization.
1. Subdivision of run-time memory
 The compiler demands a block of memory from the operating system. This block of
memory is used for executing the compiled program and is called run time storage.
 The run time storage is subdivided to hold code and data, namely the generated target
code and data objects.
 The size of the generated code is fixed. Hence the target code occupies a statically
determined area of the memory. The compiler places the target code at one end of the memory.
 The amount of memory required by the data objects is known at compile time, and
hence data objects can also be placed in a statically determined area of the memory.
Code area

Static data area

Stack

Heap

Fig 6.2 Typical subdivision of run time memory into code and data areas
 The stack is used to manage active procedures. When a call occurs, execution of the
current activation is interrupted and information about the status of the machine is saved
on the stack. When control returns from the call, the suspended activation is resumed
after restoring the values of the relevant registers.
 The heap area is the area of run time storage in which other information is stored. For
example, memory for some data items is allocated under program control. Memory
required for these data items is obtained from the heap area. Memory for some activations is
also allocated from the heap area.
2. Activation Records (Most IMP)
The various fields of an activation record are as follows:
1. Temporary values: Temporary variables are needed during the evaluation of
expressions. Such variables are stored in the temporary field of the activation record.
2. Local variables: Data that is local to the executing procedure is stored in
this field of the activation record.

Return value

Actual parameter

Control link

Access link

Saved M/c status

Local variables

Temporaries

Fig 6.3 Activation Record


3. Saved machine registers: This field holds the information regarding the status of machine
just before the procedure is called. This field contains the registers and program counter.
4. Control link: This field is optional. It points to the activation record of the calling procedure.
This link is also called dynamic link.
5. Access link: This field is also optional. It refers to the non local data in other activation
record. This field is also called static link field.
6. Actual parameters: This field holds the information about the actual parameters. These
actual parameters are passed to the called procedure.
7. Return values: This field is used to store the result of a function call.
3. Compile time layout of local data
 Suppose run-time storage comes in blocks of contiguous bytes, where a byte is the smallest
unit of addressable memory.
 The amount of storage needed for a name is determined from its type.
 Storage for an aggregate, such as an array or record, must be large enough to hold all its
components.
 The field for local data is laid out as the declarations in a procedure are examined at compile
time.
 Variable-length data is kept outside this field.
 We keep a count of the memory locations that have been allocated for previous
declarations.
 From the count we determine a relative address of the storage for a local name with respect to
some position, such as the beginning of the activation record.
 The storage layout for data objects is strongly influenced by the addressing constraints of the target
machine.
3. Difference between Static v/s Dynamic memory allocation
No. | Static Memory Allocation | Dynamic Memory Allocation
1 | Memory is allocated before the execution of the program begins. | Memory is allocated during the execution of the program.
2 | No memory allocation or de-allocation actions are performed during execution. | Memory bindings are established and destroyed during the execution.
3 | Variables remain permanently allocated. | Allocated only when the program unit is active.
4 | Implemented using data segments. | Implemented using stacks and heaps.
5 | No pointer is needed to access variables. | Pointers are needed to access dynamically allocated variables.
6 | Faster execution than dynamic. | Slower execution than static.
7 | More memory space required. | Less memory space required.
Table 6.1 Difference between Static and Dynamic memory allocation

4. Storage allocation strategies.


The different storage allocation strategies are:
1. Static allocation: lays out storage for all data objects at compile time.
2. Stack allocation: manages the run-time storage as a stack.
3. Heap allocation: allocates and de-allocates storage as needed at run time from a data area
known as the heap.
Static allocation
 In static allocation, names are bound to storage as the program is compiled, so there is no
need for a run-time support package.
 Since the bindings do not change at run time, every time a procedure is activated its
names are bound to the same storage locations.
 Therefore the values of local names are retained across activations of a procedure. That is,
when control returns to a procedure, the values of the locals are the same as they were when
control left the last time.
Stack allocation
 All compilers for languages that use procedures, functions or methods as units of
user-defined actions manage at least part of their run-time memory as a stack.
 Each time a procedure is called, space for its local variables is pushed onto the stack, and
when the procedure terminates, that space is popped off the stack.
Calling Sequences: (How is the task divided between the calling & called program for stack
updating?)
 Procedure calls are implemented by what are known as calling sequences, which consist of
code that allocates an activation record on the stack and enters information into its fields.
 A return sequence is similar code that restores the state of the machine so the calling
procedure can continue its execution after the call.
 The code in a calling sequence is often divided between the calling procedure (caller) and
the procedure it calls (callee).
 When designing calling sequences and the layout of activation records, the following
principles are helpful:
1. Values communicated between caller and callee are generally placed at the
beginning of the callee’s activation record, so they are as close as possible to the
caller’s activation record.
2. Fixed-length items are generally placed in the middle. Such items typically include
the control link, the access link, and the machine status field.
3. Items whose size may not be known early enough are placed at the end of the
activation record.
4. We must locate the top-of-stack pointer judiciously. A common approach is to
have it point to the end of the fixed-length fields in the activation record. Fixed-length
data can then be accessed by fixed offsets, known to the intermediate code generator,
relative to the top-of-stack pointer.
 The calling sequence and its division between caller and callee are as follows:
1. The caller evaluates the actual parameters.
2. The caller stores a return address and the old value of top_sp into the callee’s
activation record. The caller then increments top_sp past its own local data and
temporaries and the callee’s parameter and status fields.
3. The callee saves the register values and other status information.
4. The callee initializes its local data and begins execution.

[Figure: the caller’s activation record (control link, temporaries and local data) sits below the callee’s. Filling in the callee’s parameter and returned-value field is the caller’s responsibility; the callee’s control link, temporaries and local data are the callee’s responsibility. top_sp points into the callee’s record at the end of the fixed-length fields.]

Fig. 6.4 Division of task between caller and callee
 A suitable, corresponding return sequence is:
1. The callee places the return value next to the parameters.
2. Using the information in the machine status field, the callee restores top_sp and
other registers, and then branches to the return address that the caller placed in the
status field.
3. Although top_sp has been decremented, the caller knows where the return value is,
relative to the current value of top_sp; the caller therefore may use that value.

Variable length data on stack


 The run time memory management system must deal frequently with the allocation of
objects whose sizes are not known at compile time, but which are local to a
procedure and thus may be allocated on the stack.
 The same scheme works for objects of any type if they are local to the called procedure
and have a size that depends on the parameters of the call.

[Figure: the activation record holds fixed-size pointers to A, B and C among its fixed-length fields; the variable-length arrays A, B and C themselves are allocated beyond those fields, above the control link of the next record.]

Fig 6.5 Access to dynamically allocated arrays


Dangling Reference
 Whenever storage can be de-allocated, the problem of dangling references arises. A dangling
reference occurs when there is a reference to storage that has already been de-allocated.
 It is a logical error to use a dangling reference, since the value of de-allocated storage is
undefined according to the semantics of most languages.
Heap allocation
 The stack allocation strategy cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
 Heap allocation parcels out pieces of contiguous storage, as needed for activation records or
other objects.
 Pieces may be de-allocated in any order, so over time the heap will consist of alternating
areas that are free and in use.
 The record for an activation of procedure r is retained when the activation ends.
 Therefore, the record for the new activation q(1, 9) cannot follow that for s physically.
 If the retained activation record for r is de-allocated, there will be free space in the heap
between the activation records for s and q.

[Figure: the position in the activation tree (s with children r and q(1,9)) shown alongside the activation records in the heap: the record for s, the retained record for r, and the record for q(1,9), each with its control link.]

Fig 6.6 Records for live activations need not be adjacent in a heap

5. Parameter passing methods.


 There are two types of parameters: formal parameters and actual parameters.
 Based on these parameters there are various parameter passing methods; the
common methods are:
1. Call by value:
 This is the simplest method of parameter passing.
 The actual parameters are evaluated and their r-values are passed to the called procedure.
 Operations on the formal parameters do not change the values of the actual parameters.
 Example: Languages like C and C++ use the call by value method.
2. Call by reference:
 This method is also called call by address or call by location.
 The l-value, i.e. the address of the actual parameter, is passed to the called routine’s activation
record.
Call by value:
void main()
{
int x, y;
printf("Enter the value of X & Y:");
scanf("%d%d", &x, &y);
swap(x, y);
printf("\n Values inside the main function");
printf("\n x=%d, y=%d", x, y);
getch();
}
void swap(int x, int y)
{
int temp;
temp=x;
x=y;
y=temp;
printf("\n Values inside the swap function");
printf("\n x=%d y=%d", x, y);
}

Call by reference:
void swap(int *, int *);
void main()
{
int x, y;
printf("Enter the value of X & Y:");
scanf("%d%d", &x, &y);
swap(&x, &y);
printf("\n Values inside the main function");
printf("\n x=%d y=%d", x, y);
}
void swap(int *x, int *y)
{
int temp;
temp=*x;
*x=*y;
*y=temp;
printf("\n Values inside the swap function");
printf("\n x=%d y=%d", *x, *y);
}

Table 6.2 Code for call by value and call by reference
3. Copy restore:
 This method is a hybrid between call by value and call by reference. It is also
known as copy-in copy-out or value-result.
 The calling procedure calculates the value of the actual parameter, which is then copied into
the activation record of the called procedure.
 During execution of the called procedure, the actual parameter’s value is not affected.
 If the actual parameter has an l-value, then at return the value of the formal parameter is copied
back to the actual parameter.
4. Call by name:
 This is a less popular method of parameter passing.
 The procedure is treated like a macro: the procedure body is substituted for the call in the caller, with
actual parameters substituted for formals.
 The actual parameters can be surrounded by parentheses to preserve their integrity.
 The local names of the called procedure are kept distinct from the names of the calling procedure.

6. Block Structure and Non Block Structure Storage Allocation


 Storage allocation can be done for two types of data variables:
1. Local data
2. Non local data
 Local data can be handled using the activation record, whereas non local data can be
handled using scope information.
 Block structured storage allocation can be done using static scope (lexical scope), and
non block structured storage allocation is done using dynamic scope.
1. Local Data
 Local data can be accessed with the help of the activation record.
 The offset relative to the base pointer of an activation record points to a local data variable
within the record. Hence:
Reference to any variable x in a procedure = Base pointer pointing to the start of the procedure +
Offset of variable x from the base pointer.
2. Access to non local names
 A procedure may sometimes refer to variables which are not local to it. Such variables are
called non local variables. For non local names two types of scope rules
can be defined: static and dynamic.
Static scope rule
 The static scope rule is also called lexical scope. In this type the scope is determined by
examining the program text. PASCAL, C and ADA are languages that use the static scope
rule. These languages are also called block structured languages.
Dynamic scope rule
 For non block structured languages, dynamic scope rules are used.
 The dynamic scope rule determines the scope of a declaration of a name at run time by
considering the current activation.
 LISP and SNOBOL are languages which use the dynamic scope rule.

Handling non local data: static scope (lexical scope), used by block structured languages, is implemented using access links or a display; dynamic scope, used by non block structured languages, is implemented using deep access or shallow access.

Fig 6.7 Access to non local data


7. What is symbol table? How characters of a name (identifiers) are
stored in symbol table?
 Definition: A symbol table is a data structure used by the compiler to keep track of the semantics
of a variable. That means the symbol table stores scope and binding
information about names.
 The symbol table is built in the lexical and syntax analysis phases.
Symbol table entries
 The items to be stored in the symbol table are:
1) Variable names
2) Constants
3) Procedure names
4) Function names
5) Literal constants and strings
6) Compiler generated temporaries
7) Labels in source language
 The compiler uses the following types of information from the symbol table:
1) Data type
2) Name
3) Declaring procedure
4) Offset in storage

5) If structure or record then pointer to structure table
6) For parameters, whether parameter passing is by value or reference?
7) Number and type of arguments passed to the function
8) Base address
How to store names in symbol table? (IMP)
There are two types of representation:
1. Fixed length name
 A fixed space for each name is allocated in the symbol table. In this type of storage, if the name is
too small then there is wastage of space.
 The name can be referred to by a pointer to the symbol table entry.
Name Attribute
c a l c u l a t e
s u m
a
b
Fig. 6.8 Fixed length name
2. Variable length name
 Only the amount of space required by each string is used to store the names. A name is
identified by its starting index and length.
Name Attribute
Starting index | Length
0 | 10
10 | 4
14 | 2
16 | 2

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
c a l c u l a t e $ s u m $ a $ b $
Fig. 6.9 Variable length name

8. Explain data structures for a symbol table.


1. List Data structure
 A linear list is the simplest kind of mechanism to implement the symbol table.
 In this method an array is used to store names and associated information.
 New names can be added in the order they arrive.
 A pointer 'available' is maintained at the end of all stored records. The list data structure
using an array is given below:
Name 1 Info 1
Name 2 Info 2
Name 3 Info 3

Name n Info n
Fig. 6.10 List data structure
 To retrieve information about some name we start from the beginning of the array and
search up to the 'available' pointer. If we reach the available pointer without finding the name,
we get an error "use of undeclared name".
 While inserting a new name we should ensure that it is not already there. If it is
there, another error occurs, i.e. "multiple defined name".
 The advantage of the list organization is that it takes a minimum amount of space.
2. Self organizing list
 This symbol table implementation uses a linked list. A link field is added to each record.
 We search the records in the order given by the link field.
Name 1 Info 1
Name 2 Info 2
First Name 3 Info 3
Name 4 Info 4

Fig. 6.11 Self organizing list


 A pointer "First" is maintained to point to the first record of the symbol table.
 The order of reference to these names can be Name 3, Name 1, Name 4, Name 2.
 When a name is referenced or created, it is moved to the front of the list.
 The most frequently referred names will tend to be at the front of the list. Hence the access
time for the most referred names will be the least.
3. Binary tree
 When the symbol table is organized by means of a binary tree, the node structure is as
follows:
 The left child field stores the address of the previous symbol.
 The right child field stores the address of the next symbol. The symbol field is used to store the
name of the symbol.
 The information field is used to give information about the symbol.
 The binary tree structure is basically a binary search tree in which the value of a left node is always less
than the value of its parent node. Similarly, the value of a right node is always greater
than its parent node.
4. Hash table
 Hashing is an important technique used to search the records of the symbol table. This method
is superior to the list organization.
 In the hashing scheme two tables are maintained: a hash table and a symbol table.
 The hash table consists of k entries from 0, 1 to k-1. These entries are basically pointers to
the symbol table, pointing to the names of the symbol table.
 To determine whether a 'Name' is in the symbol table, we use a hash function 'h' such that
h(name) results in an integer between 0 and k-1. We can search any name by
position = h(name).
 Using this position we can obtain the exact location of the name in the symbol table.
 The hash function should result in a uniform distribution of names in the symbol table.
 The hash function should be such that there is a minimum number of collisions. A collision is
a situation where the hash function gives the same location for storing two different names.
 Collision resolution techniques are open addressing, chaining and rehashing.
 The advantage of hashing is that quick search is possible; the disadvantages are that hashing is
complicated to implement, some extra space is required, and obtaining the scope of variables is
very difficult to implement.

9. Dynamic Storage Allocation Techniques


There are two techniques used in dynamic memory allocation:
 Explicit allocation
 Implicit allocation
1. Explicit Allocation
 Explicit allocation can be done for fixed-size and variable-sized blocks.
Explicit Allocation for Fixed Size Blocks
 This is the simplest technique of explicit allocation, in which the size of the block for which
memory is allocated is fixed.
 In this technique a free list is used. The free list is a set of free blocks; it is consulted when we
want to allocate memory. If some memory is de-allocated then the free list gets
appended.
 The blocks are linked to each other in a list structure. Memory allocation can be done
by pointing the previous node to the newly allocated block. Memory de-allocation can be
done by de-referencing the previous link.
 The pointer which points to the first block of memory is called Available.
 This memory allocation and de-allocation is done using heap memory.
[Figure: the free list before and after de-allocation; blocks 10, 20, 30, 40, 50, 60 are linked from the Available pointer, and de-allocating block '30' links it back into the list.]

Fig. 6.12 De-allocate block ‘30’

 Explicit allocation consists of taking a block off the list, and de-allocation consists of
putting the block back on the list.
 The advantage of this technique is that there is no space overhead.
Explicit Allocation of Variable Sized Blocks
 Due to frequent memory allocation and de-allocation, the heap memory becomes
fragmented. That means the heap may consist of some blocks that are free and some that
are allocated.

Free | Allocated | Free | Allocated | Free
Fig. 6.13 Heap Memory
 In Fig. 6.13 a fragmented heap memory is shown. Suppose a list of 7 blocks gets allocated and
the second, fourth and sixth blocks are de-allocated; then fragmentation occurs.
 Thus we get variable-sized blocks that are free. For allocating variable-sized
blocks, strategies such as first fit, worst fit and best fit are used.
 Sometimes all the free blocks are collected together to form a large free block. This
ultimately avoids the problem of fragmentation.
2. Implicit Allocation
 Implicit allocation is performed by the user program and run-time packages together.
 The run-time package is required to know when a storage block is no longer in use.
 The format of a storage block is as shown in Fig 6.14.
Block size

Reference count

Mark

Pointers to block

User Data

Fig. 6.14 Block format


 There are two approaches used for implicit allocation.

Reference count:
 A reference count is a special counter used during implicit memory allocation. If a block is
referred to by some other block then its reference count is incremented by one. Likewise,
if the reference count of a particular block drops to 0, then the block is no longer
referenced and hence it can be de-allocated. Reference counts are best
used when pointers between blocks never appear in cycles.
Marking techniques:
 This is an alternative approach to determine whether a block is in use or not. In this
method, the user program is suspended temporarily and the frozen pointers are used to mark
the blocks that are in use. Sometimes bitmaps are used. Then we go through the heap
memory and reclaim those blocks which are unmarked, i.e. unused.
 There is one more technique called compaction, in which all the used blocks are moved to
one end of heap memory, so that all the free blocks are available in one large free
block.



Unit 8 – Code Generation

1. Role of code generator.


 The final phase of the compilation process is code generation.
 It takes an intermediate representation of the source program as input and produces an
equivalent target program as output.
Source program → Front end → Intermediate code → Code optimizer → Intermediate code → Code generator → Target program
Fig 8.1 Position of code generator in compilation


 The target code should have the following properties:
1. Correctness
2. High quality
3. Efficient use of the resources of the target machine
4. Quick code generation

2. Issues in the design of code generation.


Issues in design of code generator are:
1. Input to the Code Generator
 Input to the code generator consists of the intermediate representation of the source program.
 There are several choices for the intermediate language, such as postfix notation, quadruples, and syntax trees or DAGs.
 The detection of semantic errors should be done before submitting the input to the code generator.
 The code generation phase requires complete, error-free intermediate code as input.
2. Target program
 The output of the code generator is the target program. The output may take a variety of forms: absolute machine language, relocatable machine language, or assembly language.
 Producing an absolute machine language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed.
 The advantage of producing a relocatable machine language program as output is that subroutines can be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader.
 Producing an assembly language program as output makes the process of code generation somewhat easier. We can generate symbolic instructions and use the macro facilities of the assembler to help generate code.
3. Memory management
 Mapping names in the source program to addresses of data objects in run-time memory is done cooperatively by the front end and the code generator.
 We assume that a name in a three-address statement refers to a symbol table entry for the name.

 From the symbol table information, a relative address can be determined for the name in a data area.
4. Instruction selection
 If we do not care about the efficiency of the target program, instruction selection is straightforward, but a naive statement-by-statement translation can produce poor code that needs special handling. For example, the sequence of statements
a := b + c
d := a + e
would be translated into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
 Here the fourth statement is redundant, so we can eliminate it.
5. Register allocation
 Instructions involving register operands are usually shorter and faster than those involving operands in memory.
 The use of registers is often subdivided into two subproblems:
 During register allocation, we select the set of variables that will reside in registers at a point in the program.
 During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
 Finding an optimal assignment of registers to variables is difficult, even with single register values.
 Mathematically, the problem is NP-complete.
6. Choice of evaluation order
 The order in which computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers to hold intermediate results than others. Picking a best order is another difficult, NP-complete problem.
7. Approaches to code generation
 The most important criterion for a code generator is that it produces correct code.
 Correctness takes on special significance because of the number of special cases that a code generator must face.
 Given the premium on correctness, designing a code generator so it can be easily
implemented, tested, and maintained is an important design goal.

3. The target machine and instruction cost.


 Familiarity with the target machine and its instruction set is a prerequisite for designing
a good code generator.
 We will assume our target computer models a three-address machine with load and store operations, computation operations, jump operations, and conditional jumps. The underlying computer is a byte-addressable machine with n general-purpose registers R0, R1, . . . , Rn-1.
 The two-address instruction is of the form: op source, destination
 It has the following opcodes:
MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)
 The addressing modes together with their assembly language forms and associated costs are as follows:
Mode               Form     Address                     Extra cost
Absolute           M        M                           1
Register           R        R                           0
Indexed            k(R)     k + contents(R)             1
Indirect register  *R       contents(R)                 0
Indirect indexed   *k(R)    contents(k + contents(R))   1
Table 8.1 Addressing modes
Instruction cost:
 The instruction cost is computed as one plus the extra costs associated with the source and destination addressing modes, given by the "extra cost" column of Table 8.1.
 Calculate the cost for the following:
MOV B, R0
ADD C, R0
MOV R0, A
Instruction costs:
MOV B, R0   cost = 1+1+0 = 2
ADD C, R0   cost = 1+1+0 = 2
MOV R0, A   cost = 1+0+1 = 2
Total cost = 6
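The cost rule can be sketched programmatically. This is a hedged illustration: the operand classifier below covers only the modes of Table 8.1 and assumes register names of the form R0, R1, and so on.

```python
# Cost computation sketch for Table 8.1: each instruction costs 1 plus
# the "extra cost" of its source and destination addressing modes.

import re

def operand_extra_cost(op):
    if re.fullmatch(r"R\d+", op):        # register: R0, R1, ...
        return 0
    if re.fullmatch(r"\*R\d+", op):      # indirect register: *R0, ...
        return 0
    return 1                             # absolute, indexed, indirect indexed

def instruction_cost(src, dst):
    return 1 + operand_extra_cost(src) + operand_extra_cost(dst)

code = [("MOV", "B", "R0"), ("ADD", "C", "R0"), ("MOV", "R0", "A")]
total = sum(instruction_cost(s, d) for _, s, d in code)
print(total)                             # 2 + 2 + 2 = 6
```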

4. Basic Blocks.
 A basic block is a sequence of consecutive statements in which flow of control enters at
the beginning and leaves at the end without halt or possibility of branching except at the
end.
 The following sequence of three-address statements forms a basic block:
t1 := a*a
t2 := a*b
t3 := 2*t2
t4 := t1+t3
t5 := b*b

t6 := t4+t5
 Some terminology used in basic blocks is given below:
 A three-address statement x := y+z is said to define x and to use y and z. A name in a basic block is said to be live at a given point if its value is used after that point in the program, perhaps in another basic block.
 The following algorithm can be used to partition a sequence of three-address statements into basic blocks.
Algorithm: Partition into basic blocks.
Input: A sequence of three-address statements.
Output: A list of basic blocks with each three-address statement in exactly one block.
Method:
1. We first determine the set of leaders; for that we use the following rules:
i) The first statement is a leader.
ii) Any statement that is the target of a conditional or unconditional goto is a leader.
iii) Any statement that immediately follows a goto or conditional goto statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program.
Example: Program to compute dot product
begin
prod := 0;
i := 1;
do
prod := prod + a[i] * b[i];
i := i+1;
while i<= 20
end
Three address code for the above program,
(1) prod := 0
(2) i := 1
(3) t1 := 4*i
(4) t2 := a [t1]
(5) t3 := 4*i
(6) t4 :=b [t3]
(7) t5 := t2*t4
(8) t6 := prod +t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=20 goto (3)
 Let us apply the algorithm to the three-address code above to determine its basic blocks.
 Statement (1) is a leader by rule (i) and statement (3) is a leader by rule (ii), since the last statement can jump to it.
 Therefore, statements (1) and (2) form a basic block.
 The remainder of the program beginning with statement (3) forms a second basic block.
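The leader algorithm above can be sketched as follows, run on the dot-product code. The string encoding of statements and the target list are illustrative assumptions.

```python
# Sketch of the leader-based partition algorithm. Statements are given
# as strings; targets[i-1] is the jump target (1-based statement number)
# of statement i, or None if it is not a jump.

def find_leaders(stmts, targets):
    leaders = {1}                                   # rule (i)
    for i, t in enumerate(targets, start=1):
        if t is not None:
            leaders.add(t)                          # rule (ii): jump target
            if i + 1 <= len(stmts):
                leaders.add(i + 1)                  # rule (iii): after a jump
    return sorted(leaders)

def basic_blocks(stmts, targets):
    leaders = find_leaders(stmts, targets)
    bounds = leaders + [len(stmts) + 1]
    return [stmts[bounds[k] - 1:bounds[k + 1] - 1] for k in range(len(leaders))]

stmts = ["prod:=0", "i:=1", "t1:=4*i", "t2:=a[t1]", "t3:=4*i", "t4:=b[t3]",
         "t5:=t2*t4", "t6:=prod+t5", "prod:=t6", "t7:=i+1", "i:=t7",
         "if i<=20 goto (3)"]
targets = [None] * 11 + [3]                         # statement (12) jumps to (3)
blocks = basic_blocks(stmts, targets)
print(len(blocks))                                  # 2 blocks: (1)-(2) and (3)-(12)
```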

5. Transformations on basic block


 A number of transformations can be applied to a basic block without changing the set of
expressions computed by the block.
 Many of these transformations are useful for improving the quality of the code.
 There are two important classes of local transformations that can be applied to a basic block. These are:
1. Structure preserving transformations.
2. Algebraic transformations.
1. Structure Preserving Transformations
The primary structure-preserving transformations on basic blocks are:
A. Common sub-expression elimination.
 Consider the basic block,
a:= b+c
b:= a-d
c:= b+c
d:= a-d
 The second and fourth statements compute the same expression, hence this
basic block may be transformed into the equivalent block
a:= b+c
b:= a-d
c:= b+c
d:= b
 Although the 1st and 3rd statements in both cases appear to have the same
expression on the right, the second statement redefines b. Therefore, the value
of b in the 3rd statement is different from the value of b in the 1st, and the 1st and
3rd statements do not compute the same expression.
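A minimal sketch of local common sub-expression elimination by value numbering, reproducing the transformation above. The tuple encoding of statements is an assumption; a result is reused only if its operands still hold the same values.

```python
# Local CSE sketch: a statement x := y op z reuses an earlier result
# only if "y op z" was computed with the *current* versions of y and z,
# and the holding name has not been redefined since.

def eliminate_cse(block):
    version = {}                     # current version number of each name
    seen = {}                        # (op, y@ver, z@ver) -> (name, version)
    out = []
    for x, y, op, z in block:
        key = (op, (y, version.get(y, 0)), (z, version.get(z, 0)))
        hit = seen.get(key)
        if hit and version.get(hit[0], 0) == hit[1]:
            out.append((x, hit[0], None, None))      # x := earlier result
            version[x] = version.get(x, 0) + 1
        else:
            out.append((x, y, op, z))
            version[x] = version.get(x, 0) + 1
            seen[key] = (x, version[x])
    return out

block = [("a", "b", "+", "c"), ("b", "a", "-", "d"),
         ("c", "b", "+", "c"), ("d", "a", "-", "d")]
print(eliminate_cse(block)[3])       # ('d', 'b', None, None): d := b
```

Note that the third statement is not rewritten: b was redefined between the first and third statements, so the two occurrences of b+c are not the same expression, exactly as argued above.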
B. Dead-code elimination.
 Suppose x is dead, that is, never subsequently used, at the point where the
statement x:=y+z appears in a basic block. Then this statement may be safely
removed without changing the value of the basic block.
C. Renaming of temporary variables.
 Suppose we have a statement t:=b+c, where t is a temporary. If we change this
statement to u:= b+c, where u is a new temporary variable, and change all uses
of this instance of t to u, then the value of the basic block is not changed.
 In fact, we can always transform a basic block into an equivalent block in which
each statement that defines a temporary defines a new temporary. We call such
a basic block a normal-form block.
D. Interchange of two independent adjacent statements.
 Suppose we have a block with the two adjacent statements,

t1:= b+c
t2:= x+y
 Then we can interchange the two statements without affecting the value of the
block if and only if neither x nor y is t1 and neither b nor c is t2. A normal-form basic block permits all statement interchanges that are possible.
2. Algebraic transformation
 Countless algebraic transformations can be used to change the set of expressions computed by the basic block into an algebraically equivalent set.
 The useful ones are those that simplify expressions or replace expensive operations by cheaper ones.
 Example: statements such as x := x+0 or x := x*1 can be eliminated.
3. Flow graph
 A graph representation of three-address statements, called a flow graph, is useful for understanding code-generation algorithms.
 Nodes in the flow graph represent computations, and the edges represent the flow of control.
 Example of a flow graph for the following three-address code:
(1) prod := 0
(2) i := 1
(3) t1 := 4*i
(4) t2 := a[t1]
(5) t3 := 4*i
(6) t4 := b[t3]
(7) t5 := t2*t4
(8) t6 := prod+t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=20 goto (3)
In the flow graph, block B1 holds statements (1)-(2) and block B2 holds statements (3)-(12); the final conditional jump is an edge from B2 back to B2.
Fig 8.2 Flow graph

6. Next-Use information
 The next-use information is a collection of all the names that are useful for subsequent statements in a block. The use of a name is defined as follows.
 Consider a statement,
x := i
j := x op y
 That means the statement j uses the value of x.
 The next-use information can be collected by making a backward scan of the code in that specific block.
Storage for Temporary Names
 The simplest approach uses a distinct name each time a temporary is needed, and each temporary gets its own storage. To optimize the process of code generation we can instead pack two temporaries into the same location if they are not live simultaneously.
 Consider the three-address code below; the left column uses six temporaries, the right column packs them into two:
t1 := a * a        t1 := a * a
t2 := a * b        t2 := a * b
t3 := 4 * t2       t2 := 4 * t2
t4 := t1+t3        t1 := t1+t2
t5 := b * b        t2 := b * b
t6 := t4+t5        t1 := t1+t2
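The backward scan for next-use information can be sketched as follows, assuming for illustration that all names are dead on exit from the block (true of temporaries local to the block).

```python
# Sketch of the backward scan: for a block of (result, arg1, arg2)
# triples, compute the set of names live just after each statement.

def next_use(block):
    live = set()                     # names live at the current point
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, y, z = block[i]
        info[i] = set(live)          # names live just after statement i
        live.discard(x)              # x is defined (killed) here
        live.update(n for n in (y, z) if n is not None)   # y, z are used
    return info

block = [("t1", "a", "a"), ("t2", "a", "b"), ("t3", "t2", None),
         ("t4", "t1", "t3")]
print(sorted(next_use(block)[0]))    # ['a', 'b', 't1']: live after statement 1
```

Two temporaries can share a location exactly when neither appears in the live set at any point where the other is live.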

7. Register and address descriptors.

 The code generator algorithm uses descriptors to keep track of register contents and addresses for names.
 An address descriptor stores the location where the current value of a name can be found at run time. The information about locations can be stored in the symbol table and is used to access the variables.
 A register descriptor is used to keep track of what is currently in each register. The register descriptor shows that initially all the registers are empty. As code generation for the block progresses the registers will hold the values of computations.

8. Register allocation and assignment.


 Efficient utilization of registers is important in generating good code.
 There are four strategies for deciding what values in a program should reside in registers and in which register each value should reside. The strategies are:
1. Global register allocation
 The following strategies are adopted while doing global register allocation:
 The global register allocation has a strategy of storing the most frequently used variables in fixed registers throughout the loop.
 Another strategy is to assign some fixed number of global registers to hold the most active values in each inner loop.
 The registers not already allocated may be used to hold values local to one block.
 In certain languages like C or Bliss the programmer can do register allocation, by using register declarations to keep certain values in registers for the duration of the procedure.
2. Usage count
 The usage count is the count of the uses of some variable x in registers across the basic blocks of a loop.

 The usage count gives an idea of how many units of cost can be saved by selecting a specific variable for global register allocation.
 The approximate formula for the usage count of x in loop L is:

    Σ (use(x,B) + 2*live(x,B))   over blocks B in L

 where use(x,B) is the number of times x is used in block B prior to any definition of x, and
 live(x,B) = 1 if x is live on exit from B; otherwise live(x,B) = 0.
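The usage-count formula can be evaluated as follows. The per-block encodings here are illustrative assumptions.

```python
# Sketch of the usage-count formula: savings for x over loop L is the
# sum over blocks B in L of use(x,B) + 2*live(x,B).

def usage_count(x, blocks):
    """blocks: list of (use_counts, live_on_exit) pairs, one per block."""
    total = 0
    for use_counts, live_on_exit in blocks:
        total += use_counts.get(x, 0)           # use(x, B)
        total += 2 if x in live_on_exit else 0  # 2 * live(x, B)
    return total

# Hypothetical loop of two blocks: x used once in B1 and live on exit
# from B1; used twice in B2, dead on exit from B2.
loop = [({"x": 1}, {"x"}), ({"x": 2}, set())]
print(usage_count("x", loop))        # 1 + 2 + 2 + 0 = 5
```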
3. Register assignment for outer loop
 Consider two loops, where L1 is an outer loop and L2 an inner loop, and the allocation of variable a is to be done to some register. The scenario is shown in Fig 8.3: loop L2 is nested inside L1, and L1 - L2 denotes the part of L1 outside L2.
Fig 8.3 Loop representation


The following criteria should be adopted for register assignment for the outer loop:
 If a is allocated in loop L2 then it should not be allocated in L1 - L2.
 If a is allocated in L1 and it is not allocated in L2, then store a on entrance to L2 and load a while leaving L2.
 If a is allocated in L2 and not in L1, then load a on entrance to L2 and store a on exit from L2.
4. Register allocation by graph coloring
The graph coloring approach works in two passes, as given below:
 In the first pass, specific machine instructions are selected and each variable is allocated a symbolic register.
 In the second pass, the register-interference graph is prepared. In this graph each node is a symbolic register and an edge connects two nodes if one is live at a point where the other is defined.
 Then a graph coloring technique is applied to this register-interference graph using k colors, where k can be taken as the number of assignable registers. In graph coloring no two adjacent nodes can have the same color. Hence each node (actually a variable) is assigned a physical register such that no two interfering symbolic registers share the same one.
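The coloring pass can be sketched with a greedy colorer. This is a simplification: real allocators choose the coloring order carefully and spill a value to memory when k colors do not suffice.

```python
# Greedy k-coloring sketch of a register-interference graph: adjacent
# nodes (interfering symbolic registers) must get different colors,
# i.e. different physical registers.

def color_graph(nodes, edges, k):
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    assignment = {}
    for n in nodes:
        taken = {assignment[m] for m in adj[n] if m in assignment}
        free = [c for c in range(k) if c not in taken]
        if not free:
            return None                  # would need a spill
        assignment[n] = free[0]          # lowest-numbered free register
    return assignment

regs = color_graph(["s1", "s2", "s3"], [("s1", "s2"), ("s2", "s3")], k=2)
print(regs)                              # {'s1': 0, 's2': 1, 's3': 0}
```

Here s1 and s3 do not interfere, so they safely share register 0.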

9. DAG representation of basic blocks.


 The directed acyclic graph (DAG) is used to apply transformations on the basic block.
 A DAG gives a picture of how the value computed by each statement in a basic block is used in the subsequent statements of the block.
 To apply the transformations on a basic block, a DAG is constructed from its three-address statements.
 A DAG has the following types of labels on nodes:
1. Leaf nodes are labeled by identifiers, variable names, or constants. Generally leaves represent r-values.
2. Interior nodes store operator values.
3. Nodes are also optionally given a sequence of identifiers as labels.
 The DAG and the flow graph are two different pictorial representations. Each node of the flow graph can be represented by a DAG, because each node of the flow graph is a basic block.
Example:
sum = 0;
for (i=0; i<=10; i++)
sum = sum + a[i];
Solution:
The three-address code for the above program is:
1. sum := 0
2. i := 0
3. t1 := 4*i
4. t2 := a[t1]
5. t3 := sum+t2
6. sum := t3
7. t4 := i+1
8. i := t4
9. if i<=10 goto (3)
Statements (1)-(2) form block B1 and statements (3)-(9) form block B2. The DAG for B2 is built over the leaves a, 4, i, 1, sum and 10, with interior nodes for *, [ ], + and <=.
Fig 8.4 DAG for block B2
Algorithm for Construction of DAG
 We assume the three-address statements are of the following types:
Case (i) x := y op z
Case (ii) x := op y
Case (iii) x := y
 With the help of the following steps the DAG can be constructed:
Step 1: If y is undefined then create node(y). Similarly, if z is undefined, create node(z).
Step 2: For case (i) create a node(op) whose left child is node(y) and right child is node(z); first check whether such a node already exists (a common sub-expression) and reuse it if so. For case (ii) determine whether there is a node labeled op with the single child node(y); if not, create one. For case (iii) let node n be node(y).
Step 3: Delete x from the list of identifiers for node(x). Append x to the list of attached identifiers for the node n found in step 2.
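The construction can be sketched for case (i) statements only. This is a simplification: step 3's deletion of stale identifier labels is omitted, and the tuple encoding is an assumption.

```python
# DAG construction sketch for x := y op z statements, with node
# sharing for common sub-expressions (step 2's existence check).

def build_dag(block):
    node_of = {}        # name -> node id currently holding its value
    nodes = {}          # (op, left, right) or ('leaf', name) -> node id
    labels = {}         # node id -> attached identifiers

    def node_for(name):                  # step 1: create leaf if undefined
        if name not in node_of:
            nid = nodes.setdefault(("leaf", name), len(nodes))
            node_of[name] = nid
            labels.setdefault(nid, [])
        return node_of[name]

    for x, y, op, z in block:
        key = (op, node_for(y), node_for(z))
        nid = nodes.setdefault(key, len(nodes))   # reuse common sub-expression
        labels.setdefault(nid, []).append(x)      # step 3: attach x to node n
        node_of[x] = nid
    return labels

labels = build_dag([("t1", "a", "*", "a"), ("t2", "a", "*", "a")])
print(labels)           # t1 and t2 both attach to one shared (*, a, a) node
```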
Applications of DAG
The DAGs are used in:
1. Determining the common sub-expressions.
2. Determining which names are used inside the block and computed outside the block.
3. Determining which statements of the block could have their computed value used outside the block.
4. Simplifying the list of quadruples by eliminating the common sub-expressions and not performing an assignment of the form x := y unless it is a must.
10. Generating code from DAGs.
 Methods for generating code from a DAG are shown in Fig. 8.5. They are:
1. Rearranging order
2. Heuristic ordering
3. Labeling algorithm
Fig. 8.5 Methods to generate code from DAG
1. Rearranging Order
 The order of the three-address code affects the cost of the object code being generated, in the sense that by changing the order in which computations are done we can obtain object code with minimum cost.
 Consider the following code,
t1:=a+b
t2:=c+d
t3:=e-t2
t4:=t1-t3
 The code can be generated by translating the three address code line by line.
MOV a, R0
ADD b, R0
MOV c, R1
ADD d, R1
MOV R0, t1
MOV e, R0
SUB R1, R0
MOV t1, R1
SUB R0, R1
MOV R1, t4
 Now if we change the sequence of the above three address code.
t2:=c+d
t3:=e-t2
t1:=a+b
t4:=t1-t3
 Then we can get improved code as
MOV c, R0
ADD d, R0
MOV e, R1
SUB R0, R1
MOV a, R0
ADD b, R0
SUB R1, R0
MOV R0, t4
2. Heuristic ordering
 The heuristic ordering algorithm is as follows:
1. Obtain all the interior nodes. Consider these interior nodes as unlisted nodes.
2. while (unlisted interior nodes remain)
3. {
4.    pick an unlisted node n, all of whose parents have been listed;
5.    list n;
6.    while (the leftmost child m of n has no unlisted parent AND is not a leaf)
7.    {
8.       list m;
9.       n = m;
10.   }
11. }
The DAG of Fig 8.6 has interior nodes 1 (*), 2 (+), 3 (-), 4 (*), 5 (+), 6 (-) and 8 (/), with leaves c (7), a (9), b (10), d (11) and e (12).
Fig 8.6 A DAG

 The DAG is first numbered from top to bottom and from left to right. Then consider the unlisted interior nodes 1 2 3 4 5 6 8.
 Initially the only node with all parents listed is 1. (Set n=1 by line 4 of the algorithm.)
 Now the leftmost child of 1 is 2, and the parent of 2 is 1, which is listed. Hence list 2. (Set n=2 by line 9 of the algorithm.)
 Now we find the leftmost child of 2, which is 6. But 6 has an unlisted parent 5. Hence we cannot select 6.
 We therefore switch to 3. The parent of 3 is 1, which is listed. Hence list 3 and set n=3.
 The leftmost child of 3 is 4. As the parent of 4 is 3, which is listed, list 4. The leftmost child of 4 is 5, which now has only listed parents, hence list 5. Similarly list 6.
 As now only 8 remains among the unlisted interior nodes, we list it.
 Hence the resulting list is 1 2 3 4 5 6 8.
 Then the order of computation is decided by reversing this list.
 We get the order of evaluation as 8 6 5 4 3 2 1.
 That also means that we have to perform the computations at these nodes in the given order:
t8 = d/e
t6 = a-b
t5 = t6+c
t4 = t5*t8
t3 = t4-e
t2 = t6+t4
t1 = t2*t3
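The listing loop can be sketched as below, run on a small hypothetical DAG (not the one of Fig 8.6); the parents/leftmost-child encoding is an assumption.

```python
# Sketch of the heuristic node-listing loop. parents[n] lists n's
# parents; left[n] is n's leftmost child; leaves are never listed.

def heuristic_order(interior, parents, left):
    listed, order = set(), []

    def parents_listed(n):
        return all(p in listed for p in parents.get(n, []))

    while len(listed) < len(interior):
        # line 4: pick an unlisted node all of whose parents are listed
        n = next(x for x in interior if x not in listed and parents_listed(x))
        listed.add(n); order.append(n)
        m = left.get(n)
        # line 6: descend while m is interior with no unlisted parent
        while m in interior and m not in listed and parents_listed(m):
            listed.add(m); order.append(m)
            m = left.get(m)
    return order[::-1]      # evaluation order = reverse of the listing

# Hypothetical DAG: 1 has children 2 and 3; both share leftmost child 4.
interior = [1, 2, 3, 4]
parents = {2: [1], 3: [1], 4: [2, 3]}
left = {1: 2, 2: 4, 3: 4}
print(heuristic_order(interior, parents, left))   # [4, 3, 2, 1]
```

Node 4 is listed last (evaluated first) because it cannot be listed until both of its parents 2 and 3 have been.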
3. Labeling algorithm
 The labeling algorithm generates optimal code for a given expression, i.e. code that requires the minimum number of registers.
 Using the labeling algorithm, labels are assigned to the tree by visiting nodes in bottom-up order, so all child nodes are labeled before their parent nodes.
 The label of a node n with label L1 on its left child and label L2 on its right child is computed as:
Label(n) = max(L1, L2)  if L1 ≠ L2
Label(n) = L1 + 1       if L1 = L2
 We start in bottom-up fashion and label each left leaf as 1 and each right leaf as 0.

Dixita Kagathara, CE Department | 170701 – Compiler Design 99


Unit 8 – Code Generation
In the labeled tree of Fig 8.6, the left leaves a, e and c get label 1 and the right leaves b and d get label 0. Then t1 = a+b gets label max(1,0) = 1, t2 = c+d gets label 1, t3 = e-t2 gets label 1+1 = 2 (both children labeled 1), and the root t4 = t1-t3 gets label max(1,2) = 2.
Fig 8.6 Labeled tree
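The labeling rule can be sketched as follows, reproducing the labels of the tree above. The tuple encoding of the tree is an illustrative assumption.

```python
# Sethi-Ullman style labeling sketch. A node is either ('leaf', is_left)
# or ('op', left_child, right_child). Left leaves get label 1, right
# leaves 0; an interior node gets max(L1, L2) if they differ, else L1+1.

def label(node):
    if node[0] == "leaf":
        return 1 if node[1] else 0       # left leaf -> 1, right leaf -> 0
    l1, l2 = label(node[1]), label(node[2])
    return max(l1, l2) if l1 != l2 else l1 + 1

leafL = ("leaf", True)                   # stands for a, e, c (left leaves)
leafR = ("leaf", False)                  # stands for b, d (right leaves)
t1 = ("op", leafL, leafR)                # a + b -> label 1
t2 = ("op", leafL, leafR)                # c + d -> label 1
t3 = ("op", leafL, t2)                   # e - t2 -> label 2
t4 = ("op", t1, t3)                      # t1 - t3 -> label 2
print(label(t4))                         # 2: two registers suffice
```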
