UNIT-IV

RUN TIME STORAGE MANAGEMENT:


To study the run-time storage management system it is sufficient to focus on the statements action, call, return and halt, because these by themselves give sufficient insight into how functions behave when calling each other and returning.

Run-time allocation and de-allocation of activations occur when functions are called and when they return.

There are mainly two kinds of run-time allocation systems: Static allocation and Stack
Allocation. While static allocation is used by the FORTRAN class of languages, stack allocation
is used by the Ada class of languages.
STATIC ALLOCATION: In this scheme, a call statement is implemented by a sequence of two instructions:

 A move instruction saves the return address


 A goto transfers control to the target code.

The instruction sequence is

MOV #here+20, callee.static-area

GOTO callee.code-area

callee.static-area and callee.code-area are constants referring to the address of the activation record and the first instruction of the called procedure, respectively.

#here+20 in the move instruction is the return address: the address of the instruction following the goto instruction.

A return from procedure callee is implemented by

GOTO *callee.static-area

For the call statement, we need to save the return address somewhere and then jump to the location of the callee. To return from a function, we have to access the return address as stored by its caller, and then jump to it. So for the call we first emit: MOV #here+20, callee.static-area. Here, #here refers to the location of the current MOV instruction, and callee.static-area is a fixed location in memory. 20 is added to #here because the code for the call takes 20 bytes: 12 bytes (three 4-byte words) for this instruction, and 8 for the next. Then GOTO callee.code-area takes us to the code of the callee, as callee.code-area is simply the address where the code of the callee starts. A return from the callee is then implemented by: GOTO *callee.static-area. Note that this works only because callee.static-area is a constant.

Example:

Assume:
- each action block takes 20 bytes of space
- the start addresses of the code for c and p are 100 and 200
- the activation records are statically allocated starting at addresses 300 and 364

100: ACTION-1
120: MOV #140, 364
132: GOTO 200
140: ACTION-2
160: HALT
...
200: ACTION-3
220: GOTO *364
...
300:          (activation record for c)
304:
...
364:          (activation record for p; word 364 holds the return address)
368:

This example corresponds to the layout above. Statically we say that the code for c starts at 100 and that for p starts at 200. At some point, c calls p. Using the strategy discussed earlier, and assuming that callee.static-area is at memory location 364, we get the code as given. Here we assume that a call to 'action' corresponds to a single machine instruction which takes 20 bytes.

STACK ALLOCATION:

- Position of the activation record is not known until run time
- Position is stored in a register at run time, and words in the record are accessed with an offset from the register
- The code for the first procedure initializes the stack by setting up SP to the start of the stack area

MOV #Stackstart, SP

code for the first procedure

HALT

In stack allocation we do not need to know the position of the activation record until run-
time. This gives us an advantage over static allocation, as we can have recursion. So this is used
in many modern programming languages like C, Ada, etc. The positions of the activations are
stored in the stack area, and the position for the most recent activation is pointed to by the stack
pointer. Words in a record are accessed with an offset from the register. The code for the first
procedure initializes the stack by setting up SP to the stack area by the following command:
MOV #Stackstart, SP. Here, #Stackstart is the location in memory where the stack starts.

A procedure call sequence increments SP, saves the return address and transfers control to the
called procedure

ADD #caller.recordsize, SP

MOV #here+16, *SP

GOTO callee.code_area
Consider the situation when a function (the caller) calls another function (the callee). The procedure call sequence increments SP by the caller's record size, saves the return address, and transfers control to the callee by jumping to its code area. In the MOV instruction here we only need to add 16, because the MOV and GOTO instructions together occupy 16 bytes: the destination *SP is a register mode, so no extra word is needed to store it. The activations keep getting pushed on the stack, so #caller.recordsize is added to SP to update it to its new value. This works because #caller.recordsize is a constant for a function, regardless of the particular activation being referred to.
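To make the sequence concrete, here is a small C sketch (purely illustrative; the record size, addresses, and memory layout are assumptions, not part of any real code generator) that simulates the calling sequence: SP is an index into a simulated memory array, and the first word of each activation record holds the return address.

#include <stdio.h>

#define STACK_START       100
#define CALLER_RECORDSIZE 4    /* assumed size, in words, of the caller's record */

static int memory[1000];       /* simulated run-time memory                  */
static int sp = STACK_START;   /* the SP register: current activation record */

/* Simulates: ADD #caller.recordsize, SP ; MOV #here+16, *SP */
static void call_sequence(int return_address) {
    sp += CALLER_RECORDSIZE;       /* push the callee's activation record */
    memory[sp] = return_address;   /* save the return address at *SP      */
}

/* Simulates the matching return: fetch *SP, then pop the record */
static int return_sequence(void) {
    int ra = memory[sp];
    sp -= CALLER_RECORDSIZE;
    return ra;                     /* the address to GOTO                 */
}

int main(void) {
    call_sequence(140);            /* pretend the instruction after GOTO is at 140 */
    printf("return to address %d\n", return_sequence());
    return 0;
}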

DATA STRUCTURES: The following data structures are used to implement symbol tables.

LIST DATA STRUCTURE: Could be an array-based or pointer-based list. This implementation is:

- Simplest to implement
- Uses a single array to store names and information
- Search for a name is linear
- Entry and lookup are independent operations
- The cost of entry and search operations is very high, and a lot of time goes into bookkeeping
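A minimal C sketch of such a list-based table follows (the name length, table size and type field are assumptions made for illustration): entry is a constant-time append, while lookup is the linear search whose cost is noted above.

#include <string.h>

#define MAX_SYMS 256

struct symbol {
    char name[32];
    int  type;                     /* e.g. an enumeration code */
};

static struct symbol table[MAX_SYMS];
static int nsyms = 0;

/* Entry: append at the end -- O(1) */
static int insert_sym(const char *name, int type) {
    if (nsyms == MAX_SYMS) return -1;
    strncpy(table[nsyms].name, name, sizeof table[nsyms].name - 1);
    table[nsyms].type = type;
    return nsyms++;
}

/* Lookup: linear search -- O(n); the newest entry wins */
static struct symbol *lookup_sym(const char *name) {
    for (int i = nsyms - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}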

Hash table: A hash table is a data structure which gives O(1) average-case performance in accessing any element. It uses the features of both array and pointer-based lists.

- The advantages are obvious: quick insertion and lookup.

REPRESENTING SCOPE INFORMATION

The entries in the symbol table are for declaration of names. When an occurrence of a name in
the source text is looked up in the symbol table, the entry for the appropriate declaration,
according to the scoping rules of the language, must be returned. A simple approach is to
maintain a separate symbol table for each scope.

Most closely nested scope rules can be implemented by adapting the data structures
discussed in the previous section. Each procedure is assigned a unique number. If the language is
block-structured, the blocks must also be assigned unique numbers. The name is represented as a
pair of a number and a name. This new name is added to the symbol table. Most scope rules can be implemented in terms of the following operations:

a) Lookup - find the most recently created entry.
b) Insert - make a new entry.
c) Delete - remove the most recently created entry.

Symbol table structure

- Assign variables to storage classes that prescribe scope, visibility, and lifetime
  - scope rules prescribe the symbol table structure
  - scope: unit of static program structure with one or more variable declarations
  - scopes may be nested
- Pascal: procedures are scoping units
- C: blocks, functions, files are scoping units
- Visibility, lifetimes, global variables
  - Common (in Fortran)
  - Automatic or stack storage
  - Static variables
Storage class: A storage class is an extra keyword at the beginning of a declaration which modifies the declaration in some way. Generally, the storage class (if any) is the first word in the declaration, preceding the type name, e.g. static, extern etc.

Scope: The scope of a variable is simply the part of the program where it may be accessed or written. It is the part of the program where the variable's name may be used. If a variable is declared within a function, it is local to that function. Variables of the same name may be declared and used within other functions without any conflicts. For instance,
int fun1()
{
    int a;
    int b;
    ....
}

int fun2()
{
    int a;
    int c;
    ....
}
Visibility: The visibility of a variable determines how much of the rest of the program
can access that variable. You can arrange that a variable is visible only within one part of
one function, or in one function, or in one source file, or anywhere in the program.
Local and global variables: A variable declared within the braces {} of a function is visible only within that function; variables declared within functions are called local variables. On the other hand, a variable declared outside of any function is a global variable, and it is potentially visible anywhere within the program.

Automatic vs. static duration: How long do variables last? By default, local variables
(those declared within a function) have automatic duration: they spring into existence
when the function is called, and they (and their values) disappear when the function
returns. Global variables, on the other hand, have static duration: they last, and the values
stored in them persist, for as long as the program does. (Of course, the values can in
general still be overwritten, so they don't necessarily persist forever.) By default, local
variables have automatic duration. To give them static duration (so that, instead of
coming and going as the function is called, they persist for as long as the function does),
you precede their declaration with the static keyword: static int i; By default, a
declaration of a global variable (especially if it specifies an initial value) is the defining
instance. To make it an external declaration, of a variable which is defined somewhere
else, you precede it with the keyword extern: extern int j; Finally, to arrange that a global
variable is visible only within its containing source file, you precede it with the static
keyword: static int k; Notice that the static keyword can do two different things: it adjusts
the duration of a local variable from automatic to static, or it adjusts the visibility of a
global variable from truly global to private-to-the-file.
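The following C fragment (variable names are arbitrary, chosen only for illustration) gathers the three uses of the keywords just described:

/* file1.c */
int g = 10;            /* defining instance of a global: static duration,
                          potentially visible program-wide               */
static int k = 30;     /* file-scope static: visible only in file1.c     */

int counter(void) {
    static int i = 0;  /* static duration: i persists across calls      */
    int n = 0;         /* automatic duration: reborn on every call      */
    n++;
    return ++i;        /* i keeps counting up; n is always 1 here       */
}

/* file2.c */
extern int g;          /* external declaration: g is defined elsewhere  */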
Symbol attributes and symbol table entries

- Symbols have associated attributes
- Typical attributes are name, type, scope, size, addressing mode etc.
- A symbol table entry collects together attributes such that they can be easily set and retrieved
Example of typical names in a symbol table:

Name     Type
name     character string
class    enumeration
size     integer
type     enumeration

LOCAL SYMBOL TABLE MANAGEMENT :

Following are prototypes of typical function declarations used for managing local symbol table.
The right hand side of the arrows is the output of the procedure and the left side has the input.

NewSymTab  : SymTab -> SymTab
DestSymTab : SymTab -> SymTab
InsertSym  : SymTab x Symbol -> boolean
LocateSym  : SymTab x Symbol -> boolean
GetSymAttr : SymTab x Symbol x Attr -> boolean
SetSymAttr : SymTab x Symbol x Attr x value -> boolean
NextSym    : SymTab x Symbol -> Symbol
MoreSyms   : SymTab x Symbol -> boolean
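Rendered as C prototypes, the interface might look as follows. This is a sketch: the concrete representations of SymTab, Symbol and Attr, and the out-parameter in GetSymAttr, are assumptions, not part of the original specification.

typedef struct sym_tab SymTab;    /* opaque table representation */
typedef struct symbol  Symbol;
typedef enum { ATTR_NAME, ATTR_TYPE, ATTR_SIZE, ATTR_CLASS } Attr;
typedef int Bool;

SymTab *NewSymTab(SymTab *enclosing);
SymTab *DestSymTab(SymTab *t);                 /* returns the enclosing table */
Bool    InsertSym(SymTab *t, Symbol *s);
Bool    LocateSym(SymTab *t, Symbol *s);
Bool    GetSymAttr(SymTab *t, Symbol *s, Attr a, void *value_out);
Bool    SetSymAttr(SymTab *t, Symbol *s, Attr a, void *value);
Symbol *NextSym(SymTab *t, Symbol *s);         /* iteration over a table      */
Bool    MoreSyms(SymTab *t, Symbol *s);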
A major consideration in designing a symbol table is that insertion and retrieval should be as fast as possible.

- One-dimensional table: search is very slow
- Balanced binary tree: quick insertion, searching and retrieval; extra work required to keep the tree balanced
- Hash tables: quick insertion, searching and retrieval; extra work to compute hash keys
- Hashing with a chain of entries is generally a good approach

A major consideration in designing a symbol table is that insertion and retrieval should be as fast as possible. One-dimensional tables and hash tables were discussed earlier; apart from these, balanced binary trees can be used too. Hashing is the most common approach.

HASHED LOCAL SYMBOL TABLE

Hash tables can clearly implement 'lookup' and 'insert' operations. For implementing the
'delete', we do not want to scan the entire hash table looking for lists containing entries to be
deleted. Each entry should have two links:

a) A hash link that chains the entry to other entries whose names hash to the same value - the
usual link in the hash table.
b) A scope link that chains all entries in the same scope - an extra link. If the scope link is left
undisturbed when an entry is deleted from the hash table, then the chain formed by the scope
links will constitute an inactive symbol table for the scope in question.
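A sketch of such an entry and of scope deletion in C (the bucket count, scope-depth bound and hash function are arbitrary choices for illustration):

struct entry {
    const char   *name;
    int           scope_depth;
    struct entry *hash_link;    /* next entry hashing to the same bucket */
    struct entry *scope_link;   /* next entry declared in the same scope */
};

#define NBUCKETS 211
static struct entry *bucket[NBUCKETS];   /* the hash table             */
static struct entry *scope_head[64];     /* head of each scope's chain */

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h;
}

/* Close a scope: unlink its entries from the hash chains but leave the
   scope chain intact, so it survives as an inactive symbol table.      */
static void close_scope(int depth) {
    for (struct entry *e = scope_head[depth]; e; e = e->scope_link) {
        struct entry **p = &bucket[hash(e->name) % NBUCKETS];
        while (*p != e) p = &(*p)->hash_link;   /* find e in its bucket */
        *p = e->hash_link;                      /* splice it out        */
    }
}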

Nesting structure of an example Pascal program

Consider the nesting structure of such a program. Variables a, b and c appear in global as well as local scopes. The local scope of a variable overrides the global scope of another variable with the same name within its own scope. The global and local symbol tables for this structure are discussed next. Here procedures i and h lie within the scope of g (they are nested within g).

GLOBAL SYMBOL TABLE STRUCTURE The global symbol table will be a collection of
symbol tables connected with pointers.

- Scope and visibility rules determine the structure of the global symbol table
- For the ALGOL class of languages, scoping rules structure the symbol table as a tree of local tables
  - Global scope as root
  - Tables for nested scopes as children of the table for the scope they are nested in
The exact structure will be determined by the scope and visibility rules of the language. Whenever a new scope is encountered, a new symbol table is created. This new table contains a pointer back to the enclosing scope's symbol table, and the enclosing one also contains a pointer to this new symbol table. Any variable used inside the new scope should either be present in its own symbol table or in the enclosing scope's symbol table, and so on all the way up to the root symbol table.
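A minimal C sketch of this chained lookup (the entry representation is an assumption; only the parent-pointer walk matters here):

#include <string.h>

struct entry   { const char *name; struct entry *next; };
struct sym_tab {
    struct sym_tab *enclosing;   /* pointer back to the enclosing scope */
    struct entry   *entries;     /* this scope's own declarations       */
};

/* Search the current scope first, then walk the enclosing-scope
   pointers up to the root (global) table.                        */
static struct entry *lookup(struct sym_tab *t, const char *name) {
    for (; t != NULL; t = t->enclosing)
        for (struct entry *e = t->entries; e; e = e->next)
            if (strcmp(e->name, name) == 0)
                return e;        /* innermost declaration wins */
    return NULL;                 /* undeclared name            */
}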

BLOCK STRUCTURED AND NON-BLOCK STRUCTURED STORAGE ALLOCATION

Storage binding and symbolic registers: translating variable names into addresses is a process that must occur before or during code generation.

- Each variable is assigned an address or addressing method
- Each variable is assigned an offset with respect to a base, which changes with every invocation
- Variables fall in four classes: global, global static, stack, local (non-stack) static

The variable names have to be translated into addresses before or during code generation. There is a base address, and every name is given an offset with respect to this base, which changes with every invocation. The variables can be divided into four categories:

a) Global variables: fixed relocatable address or offset with respect to the base as global pointer.

b) Global static variables: these have static duration (hence are also called static variables): they last, and the values stored in them persist, for as long as the program does. (Of course, the values can in general still be overwritten, so they don't necessarily persist forever.) Therefore they too have a fixed relocatable address or offset with respect to the base as global pointer.

c) Stack variables: stack and global scalars may be allocated in registers, but registers are not indexable; therefore arrays cannot be kept in registers.

- Assign symbolic registers to scalar variables
- These are used in graph coloring for global register allocation

d) Stack variables (automatic duration): by default, local variables (those declared within a function) have automatic duration: they spring into existence when the function is called, and they (and their values) disappear when the function returns. This is why they are stored on the stack and are addressed by an offset from the stack/frame pointer.

Register allocation is usually done for global variables. Since registers are not indexable, arrays cannot be kept in registers, as they are indexed data structures. Graph coloring is a simple technique for allocating registers and minimizing register spills that works well in practice. Register spills occur when a register is needed for a computation but all available registers are in use. The contents of one of the registers must be stored in memory to free it up for immediate use. We assign symbolic registers to scalar variables, and these are used in the graph coloring.
Local variables in frame:

- Assign to consecutive locations; allow enough space for each
- May put word-size objects on half-word boundaries
  - Requires two half-word loads
  - Requires shift, or, and
- Align on double-word boundaries
  - Wastes space
  - But the machine may allow small offsets

Word boundaries - the most significant byte of the object must be located at an address whose two least significant bits are zero relative to the frame pointer.

Half-word boundaries - the most significant byte of the object must be located at an address whose least significant bit is zero relative to the frame pointer.

Sort variables by the alignment they need:

- Store largest variables first
  - Automatically aligns all the variables
  - Does not require padding
- Store smallest variables first
  - Requires more space (padding)
  - For large stack frames, makes more variables accessible with small offsets

While allocating memory to the variables, sort them by the alignment they need. You may:

Store largest variables first: this automatically aligns all the variables and does not require padding, since the next variable's allocation starts right at the end of the previous one.

Store smallest variables first: this requires more space (padding) to restore the alignment that the larger variables need. The advantage is that for a large stack frame, more variables become accessible within small offsets.

How to store large local data structures? They require large space in local frames and therefore large offsets.

- If a large object is put near the boundary, other objects require a large offset either from fp (if put near the beginning) or sp (if put near the end)
- Allocate another base register to access large objects
- Allocate space in the middle or elsewhere; store pointers to these locations at a small offset from fp
- Requires extra loads

Large local data structures require large space in local frames and therefore large offsets. As noted above, if large objects are put near the boundary then the other objects require large offsets. You can either allocate another base register to access large objects, or you can allocate space in the middle or elsewhere and then store pointers to these locations at a small offset from the frame pointer, fp.

In an unsorted allocation there is wasted padding space between variables; in the sorted frame there is no wasted space.

STORAGE ALLOCATION FOR ARRAYS

Elements of an array are stored in a block of consecutive locations. For a single-dimensional array, if low is the lower bound of the index and base is the relative address of the storage allocated to the array (i.e., the relative address of A[low]), then the i-th element begins at the location base + (i - low)*w, where w is the width of each element. This expression can be reorganized as i*w + (base - low*w). The sub-expression base - low*w is calculated and stored in the symbol table at compile time when the array declaration is processed, so that the relative address of A[i] can be obtained by just adding i*w to it.

Addressing array elements:

- Arrays are stored in a block of consecutive locations
- Assume the width of each element is w
- The i-th element of array A begins at location base + (i - low) x w, where base is the relative address of A[low]
- The expression is equivalent to i x w + (base - low x w), i.e. i x w + const
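As a sketch, the split between the compile-time constant and the run-time work can be written in C as below. Here the whole computation is shown in one function purely for illustration; in a compiler, c would be computed once, when the declaration is processed.

/* A : array[low..high] of elements of width w (bytes) */
int element_address(int base, int low, int w, int i) {
    int c = base - low * w;   /* computed at compile time, kept in the symbol table */
    return i * w + c;         /* the only work left at run time                     */
}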

2-DIMENSIONAL ARRAY: For a row-major two-dimensional array, the address of A[i][j] can be calculated by the formula

base + ((i - low1)*n2 + j - low2)*w

where low1 and low2 are the lower bounds of i and j, and n2 is the number of values j can take, i.e. n2 = high2 - low2 + 1.

This can again be written as

((i * n2) + j)*w + (base - ((low1*n2) + low2)*w)

and the second term can be calculated at compile time. In the same manner, the expression for the location of an element in a column-major two-dimensional array can be obtained. This addressing can be generalized to multidimensional arrays. Storage can follow either the row-major or the column-major approach.

Example: Let A be a 10x20 array; therefore n1 = 10 and n2 = 20. Assume the width of the type stored in the array is w = 4. The three-address code to access A[y,z] is:

t1 = y * 20
t1 = t1 + z
t2 = 4 * t1
t3 = baseA - 84      { ((low1*n2) + low2)*w = (1*20 + 1)*4 = 84 }
t4 = t2 + t3
x = t4
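The same split for the row-major two-dimensional case can be sketched in C as below (illustrative only): with low1 = low2 = 1, n2 = 20 and w = 4, the constant part is base - 84, matching the three-address code above.

/* Row-major A[low1..high1, low2..high2], element width w */
int addr_2d(int base, int low1, int low2, int n2, int w, int i, int j) {
    int c = base - ((low1 * n2) + low2) * w;   /* compile-time constant */
    return ((i * n2) + j) * w + c;             /* run-time part         */
}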

The following operations are designed:

1. mktable(previous): creates a new symbol table and returns a pointer to this table. previous is a pointer to the symbol table of the parent procedure.

2. enter(table, name, type, offset): creates a new entry for name in the symbol table pointed to by table.

3. addwidth(table, width): records the cumulative width of the entries of a table in its header.

4. enterproc(table, name, newtable): creates an entry for procedure name in the symbol table pointed to by table. newtable is a pointer to the symbol table for name.

P -> M D
     { addwidth(top(tblptr), top(offset));
       pop(tblptr); pop(offset) }

M -> ε
     { t = mktable(nil);
       push(t, tblptr); push(0, offset) }

D -> D ; D

The symbol tables are created using two stacks: tblptr, which holds pointers to the symbol tables of the enclosing procedures, and offset, whose top element is the next available relative address for a local of the current procedure. Declarations in nested procedures can be processed by the syntax-directed definitions given below. Note that they are basically the same as those given above, but the embedded actions are factored out into the marker non-terminals M and N for the epsilon productions, as the following explanation describes.
D -> proc id ; N D1 ; S
     { t = top(tblptr);
       addwidth(t, top(offset));
       pop(tblptr); pop(offset);
       enterproc(top(tblptr), id.name, t) }

N -> ε
     { t = mktable(top(tblptr));
       push(t, tblptr); push(0, offset) }

D -> id : T
     { enter(top(tblptr), id.name, T.type, top(offset));
       top(offset) = top(offset) + T.width }

The action for M creates a symbol table for the outermost scope, and hence a nil pointer is passed in place of previous. When the declaration D -> proc id ; N D1 ; S is processed, the action corresponding to N causes the creation of a symbol table for the procedure; the pointer to the symbol table of the enclosing procedure is given by top(tblptr). The pointer to the new table is pushed onto the stack tblptr, and 0 is pushed as the initial offset on the offset stack. When the actions corresponding to the subtrees of N, D1 and S have been executed, the offset corresponding to the current procedure, i.e. top(offset), contains the total width of the entries in it. Hence top(offset) is added to the header of the symbol table of the current procedure. The top entries of tblptr and offset are popped so that the pointer and offset of the enclosing procedure are now on top of these stacks. The entry for id is added to the symbol table of the enclosing procedure. When the declaration D -> id : T is processed, an entry for id is created in the symbol table of the current procedure. The pointer to the symbol table of the current procedure is again obtained from top(tblptr). The offset corresponding to the current procedure, i.e. top(offset), is incremented by the width required by type T to point to the next available location.
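A compact C sketch of these actions (the table representation and the nesting bound are assumptions; the entries themselves are elided):

#include <stdlib.h>

#define MAXNEST 32

struct sym_tab { struct sym_tab *parent; int width; /* ... entries ... */ };

static struct sym_tab *tblptr[MAXNEST];   /* stack of open tables  */
static int offset[MAXNEST];               /* parallel offset stack */
static int top = -1;

/* mktable(parent) plus the two pushes */
static void open_scope(struct sym_tab *parent) {
    struct sym_tab *t = calloc(1, sizeof *t);
    t->parent = parent;
    ++top;
    tblptr[top] = t;
    offset[top] = 0;                      /* next free relative address */
}

/* addwidth(top(tblptr), top(offset)) plus the two pops */
static struct sym_tab *close_scope(void) {
    struct sym_tab *t = tblptr[top];
    t->width = offset[top];               /* total width of its locals     */
    --top;
    return t;                             /* caller passes it to enterproc */
}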

STORAGE ALLOCATION FOR RECORDS

Field names in records are processed by the following scheme:

T -> record
     { t = mktable(nil);
       push(t, tblptr); push(0, offset) }
     D end
     { T.type = record(top(tblptr));
       T.width = top(offset);
       pop(tblptr); pop(offset) }

With a marker non-terminal L in place of the embedded action, this is written as:

T -> record L D end
     { T.type = record(top(tblptr));
       T.width = top(offset);
       pop(tblptr); pop(offset) }

L -> ε
     { t = mktable(nil);
       push(t, tblptr); push(0, offset) }
The processing done for records is similar to that done for procedures. After the keyword record is seen, the marker L creates a new symbol table. A pointer to this table and the offset 0 are pushed on the respective stacks. The action for the declaration D -> id : T pushes the information about the field names into the table just created. At the end, the top of the offset stack contains the total width of the data objects within the record. This is stored in the attribute T.width. The constructor record is applied to the pointer to the symbol table to obtain T.type.
Names in the symbol table:

S -> id := E
     { p = lookup(id.name);
       if p <> nil then emit(p := E.place)
       else error }

E -> id
     { p = lookup(id.name);
       if p <> nil then E.place = p
       else error }
The operation lookup in the translation scheme above checks if there is an entry for this occurrence of the name in the symbol table. If an entry is found, a pointer to the entry is returned; otherwise nil is returned. lookup first checks whether the name appears in the current symbol table. If not, it looks for the name in the symbol table of the enclosing procedure, and so on. The pointer to the symbol table of the enclosing procedure is obtained from the header of the symbol table.

CODE OPTIMIZATION

Considerations for optimization: The code produced by straightforward compiling algorithms can often be made to run faster or take less space, or both. This improvement is achieved by program transformations that are traditionally called optimizations. Machine-independent optimizations are program transformations that improve the target code without taking into consideration any properties of the target machine. Machine-dependent optimizations are based on register allocation and utilization of special machine-instruction sequences.

Criteria for code improvement transformations

- Simply stated, the best program transformations are those that yield the most benefit for the least effort.
- First, the transformation must preserve the meaning of programs. That is, the optimization must not change the output produced by a program for a given input, or cause an error.
- Second, a transformation must, on the average, speed up programs by a measurable amount.
- Third, the transformation must be worth the effort.

Some transformations can only be applied after detailed, often time-consuming analysis of the source program, so there is little point in applying them to programs that will be run only a few times.
OBJECTIVES OF OPTIMIZATION: The main objectives of the optimization techniques are as follows:

1. Exploit the fast path in case of multiple paths for a given situation.

2. Reduce redundant instructions.

3. Produce minimum code for maximum work.

4. Trade off between the size of the code and the speed with which it gets executed.

5. Place code and data together whenever required, to avoid unnecessary searching of data/code.

During code transformation in the process of optimization, the basic requirements are as follows:

1. Retain the semantics of the source code.

2. Reduce time and/or space.

3. Reduce the overhead involved in the optimization process.

Scope of Optimization: Control-Flow Analysis

Consider all that has happened up to this point in the compiling process: lexical analysis, syntactic analysis, semantic analysis and finally intermediate-code generation. The compiler has done an enormous amount of analysis, but it still doesn't really know how the program does what it does. In control-flow analysis, the compiler figures out even more information about how the program does its work, only now it can assume that there are no syntactic or semantic errors in the code.

Control-flow analysis begins by constructing a control-flow graph, which is a graph of the different possible paths program flow could take through a function. To build the graph, we first divide the code into basic blocks. A basic block is a segment of the code that a program must enter at the beginning and exit only at the end. This means that only the first statement can be reached from outside the block (there are no branches into the middle of the block) and all statements are executed consecutively after the first one is (no branches or halts until the exit). Thus a basic block has exactly one entry point and one exit point. If a program executes the first instruction in a basic block, it must execute every instruction in the block sequentially after it.

A basic block begins in one of several ways:


• The entry point into the function
• The target of a branch (in our example, any label)
• The instruction immediately following a branch or a return

A basic block ends in any of the following ways:


• A jump statement
• A conditional or unconditional branch
• A return statement

Now we can construct the control-flow graph between the blocks. Each basic block is a node in the graph, and the possible different routes a program might take are the connections, i.e. if a block ends with a branch, there will be a path leading from that block to the branch target. The blocks that can follow a block are called its successors. There may be multiple successors or just one. Similarly, a block may have many, one, or no predecessors. As an exercise, connect up the flow graph for the basic blocks of a small program such as Fibonacci. What does an if-then-else look like in a flow graph? What about a loop? You have probably seen the gcc warning or javac error "Unreachable code at line XXX." How can the compiler tell when code is unreachable?

LOCAL OPTIMIZATIONS

Optimizations performed exclusively within a basic block are called "local


optimizations". These are typically the easiest to perform since we do not consider any control
flow information; we just work with the statements within the block. Many of the local
optimizations we will discuss have corresponding global optimizations that operate on the same
principle, but require additional analysis to perform. We'll consider some of the more common
local optimizations as examples.

FUNCTION PRESERVING TRANSFORMATIONS

 Common sub-expression elimination
 Constant folding
 Variable propagation
 Dead code elimination
 Code motion
 Strength reduction

1. Common Sub Expression Elimination:


Two operations are common if they produce the same result. In such a case, it is likely more
efficient to compute the result once and reference it the second time rather than re-evaluate it. An
expression is alive if the operands used to compute the expression have not been changed. An
expression that is no longer alive is dead.
Example :
a=b*c;
d=b*c+x-y;
We can eliminate the second evaluation of b*c from this code if none of the intervening
statements has changed its value. We can thus rewrite the code as

t1=b*c;
a=t1;
d=t1+x-y;

Let us consider the following code


a=b*c;
b=x;
d=b*c+ x-y;
in this code, we cannot eliminate the second evaluation of b*c, because the value of b is changed due to the assignment b=x before it is used in calculating d.
We can say two expressions are common if:
 They are lexically equivalent, i.e., they consist of identical operands connected to each other by identical operators.
 They evaluate to identical values, i.e., no assignment statements for any of their operands exist between the evaluations of these expressions.
 The value of any of the operands used in the expressions must not be changed, even due to a procedure call.
Example :
c=a*b;
x=a;
d=x*b;
We may note that even though the expressions a*b and x*b are common in the above code, they cannot be treated as common sub-expressions.

2. Variable Propagation:

Let us consider the above code once again

c=a*b;
x=a;
d=x*b+4;
if we replace x by a in the last statement, we can identify a*b and x*b as common sub-expressions. This technique is called variable propagation, where the use of one variable is replaced by another variable if it has been assigned the same value.
Compile-time evaluation:
The execution efficiency of the program can be improved by shifting execution-time actions to compile time, so that they are not performed repeatedly during program execution. We can evaluate an expression with constant operands at compile time and replace that expression by a single value. This is called folding. Consider the following statement:

a= 2*(22.0/7.0)*r;
Here, we can perform the computation 2*(22.0/7.0) at compile time itself.

3. Dead Code Elimination:


If the value contained in the variable at a point is not used anywhere in the program
subsequently, the variable is said to be dead at that place. If an assignment is made to a dead
variable, then that assignment is a dead assignment and it can be safely removed from the
program.
Similarly, a piece of code is said to be dead if it computes values that are never used anywhere in the program.
c=a*b;
x=a;
d=x*b+4;
Using variable propagation, the code can be written as follows:
c=a*b;
x=a;
d=a*b+4;
Using Common Sub expression elimination, the code can be written as follows:
t1= a*b;
c=t1;
x=a;
d=t1+4;
Here, x=a will be considered as dead code. Hence it is eliminated.
t1= a*b;
c=t1;
d=t1+4;

4. Code Movement:
The motivation for performing code movement in a program is to improve the execution time of
the program by reducing the evaluation frequency of expressions. This can be done by moving
the evaluation of an expression to other parts of the program. Let us consider the bellow code:
If(a<10)
{
b=x^2-y^2;
}
else
{
b=5;
a=( x^2-y^2)*10;
}

The expression x^2-y^2 is written in both branches of the condition a<10. We can optimize the code by moving its evaluation outside the blocks, as follows:
t= x^2-y^2;
If(a<10)
{
b=t;
}
else
{
b=5;
a=t*10;
}
5. Strength Reduction:
In the frequency-reduction transformation we tried to reduce the execution frequency of expressions by moving code. There is another class of transformations which performs the equivalent actions indicated in the source program by reducing the strength of operators. By strength reduction, we mean replacing a high-strength operator with a low-strength operator without affecting the program's meaning. Let us consider the example below:
i=1;
while (i<10)
{
y=i*4;
i=i+1;
}

The above can be written as follows:

i=1;
t=4;
while (i<10)
{
y=t;
t=t+4;
i=i+1;
}
Here the high-strength operator * is replaced with +.

GLOBAL OPTIMIZATIONS, DATA-FLOW ANALYSIS:


So far we were only considering making changes within one basic block. With some additional analysis, we can apply similar optimizations across basic blocks, making them global optimizations. It's worth pointing out that global in this case does not mean across the entire program. We usually optimize only one function at a time. Interprocedural analysis is an even larger task, one not even attempted by some compilers.
The additional analysis the optimizer does to perform optimizations across basic blocks is called data-flow analysis. Data-flow analysis is much more complicated than control-flow analysis, and we can only scratch the surface here.

Let's consider a global common sub-expression elimination optimization as our example. Careful analysis across blocks can determine whether an expression is alive on entry to a block. Such an expression is said to be available at that point. Once the set of available expressions is known, common sub-expressions can be eliminated on a global basis. Each block is a node in the flow graph of a program. The successor set (succ(x)) for a node x is the set of all nodes that x directly flows into. The predecessor set (pred(x)) for a node x is the set of all nodes that flow directly into x. An expression is defined at the point where it is assigned a value and killed when one of its operands is subsequently assigned a new value. An expression is available at some point p in a flow graph if every path leading to p contains a prior definition of that expression which is not subsequently killed. Let's define such useful functions of data-flow analysis:
avail[B]   = set of expressions available on entry to block B
exit[B]    = set of expressions available on exit from B
avail[B]   = ∩ exit[x], x ∈ pred[B]   (i.e. B has available the intersection of the exits of its predecessors)
killed[B]  = set of expressions killed in B
defined[B] = set of expressions defined in B
exit[B]    = avail[B] - killed[B] + defined[B]
avail[B]   = ∩ (avail[x] - killed[x] + defined[x]), x ∈ pred[B]

Here is an Algorithm for Global Common Sub-expression Elimination:


1) First, compute defined and killed sets for each basic block (this does not involve any of its
predecessors or successors).
2) Iteratively compute the avail and exit sets for each block by running the following algorithm
until you hit a stable fixed point:
a) Identify each statement s of the form a = b op c in some block B such that b op c is
available at the entry to B and neither b nor c is redefined in B prior to s.
b) Follow flow of control backward in the graph passing back to but not through each
block that defines b op c. The last computation of b op c in such a block reaches s.
c) After each computation d = b op c identified in step 2a, add statement t = d to that
block where t is a new temp.
d) Replace s by a = t.
Try an example to make things clearer:
main:
BeginFunc 28;
b=a+2;
c=4*b;
tmp1 = b < c;
ifNZ tmp1 goto L1 ;
b=1;
L1:
d=a+2;
EndFunc ;

First, divide the code above into basic blocks. Now calculate the available expressions for each
block. Then find an expression available in a block and perform step 2c above. What common
sub-expression can you share between the two blocks? What if the above code were:
main:
BeginFunc 28;
b=a+2;
c=4*b;
tmp1 = b < c ;
IfNZ tmp1 Goto L1 ;
b=1;
z = a + 2 ; <========= an additional line here
L1:
d=a+2;
EndFunc ;
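The fixed-point iteration of step 2 can be sketched in C with bit-sets (a simplification that assumes at most 64 expressions and block 0 as the single entry block):

#define MAXBLK 64

typedef unsigned long long Set;   /* bit-set of up to 64 expressions */

struct block {
    Set defined, killed;          /* computed locally in step 1    */
    int npred, pred[MAXBLK];      /* indices of predecessor blocks */
};

static void compute_avail(const struct block *b, int n, Set *avail) {
    avail[0] = 0;                                  /* nothing available at entry */
    for (int i = 1; i < n; i++) avail[i] = ~0ULL;  /* optimistic initial guess   */
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 1; i < n; i++) {
            Set in = ~0ULL;
            for (int p = 0; p < b[i].npred; p++) {
                int x = b[i].pred[p];   /* exit[x] = avail[x] - killed[x] + defined[x] */
                in &= (avail[x] & ~b[x].killed) | b[x].defined;
            }
            if (in != avail[i]) { avail[i] = in; changed = 1; }
        }
    }                                              /* stable fixed point reached */
}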
MACHINE OPTIMIZATIONS
In final code generation, there is a lot of opportunity for cleverness in generating efficient target code. In this pass, specific machine features (specialized instructions, hardware pipeline abilities, register details) are taken into account to produce code optimized for the particular architecture.
REGISTER ALLOCATION:
One machine optimization of particular importance is register allocation, which is
perhaps the single most effective optimization for all architectures. Registers are the fastest kind
of memory available, but as a resource, they can be scarce.
The problem is how to minimize traffic between the registers and what lies beyond them
in the memory hierarchy to eliminate time wasted sending data back and forth across the bus and
the different levels of caches. Your Decaf back-end uses a very naïve and inefficient means of
assigning registers, it just fills them before performing an operation and spills them right
afterwards.
A much more effective strategy would be to consider which variables are more heavily
in demand and keep those in registers and spill those that are no longer needed or won't be
needed until much later.
One common register allocation technique is called "register coloring", after the central idea of viewing register allocation as a graph-coloring problem. If we have 8 registers, then we try to color the graph with eight different colors. The graph's nodes are made of "webs" and the arcs are determined by calculating interference between the webs. A web represents a variable's definitions, places where it is assigned a value (as in x = ...), and the possible different uses of those definitions (as in y = x + 2). This problem, in fact, can be approached as another graph. The definitions and uses of a variable are nodes, and if a definition reaches a use, there is an arc between the two nodes. If two portions of a variable's definition-use graph are unconnected, then we have two separate webs for the variable. In the interference graph for the routine, each node is a web. We seek to determine which webs don't interfere with one another, so we know we can use the same register for those two variables. For example, consider the following code:
i = 10;
j = 20;
x = i + j;
y = j + k;
We say that i interferes with j because at least one pair of i's definitions and uses is separated by a definition or use of j; thus, i and j are "alive" at the same time. A variable is alive between the time it has been defined and that definition's last use, after which the variable is dead. If two variables interfere, then we cannot use the same register for each. But two variables that don't interfere can share a register, since there is no overlap in their liveness. Once we have the interference graph constructed, we r-color it so that no two adjacent nodes share the same color (r is the number of registers we have; each color represents a different register).
We may recall that graph coloring is NP-complete, so we employ a heuristic rather than an optimal algorithm. Here is a simplified version of something that might be used:
1. Find the node with the fewest neighbors. (Break ties arbitrarily.)
2. Remove it from the interference graph and push it onto a stack.
3. Repeat steps 1 and 2 until the graph is empty.
4. Now, rebuild the graph as follows:
a. Take the top node off the stack and reinsert it into the graph.
b. Choose a color for it different from the colors of its neighbors presently in the graph, rotating colors in case there is more than one choice.
c. Repeat (a) and (b) until the graph is either completely rebuilt, or there is no color available to color a node.
If we get stuck, then the graph may not be r-colorable; we could try again with a different heuristic, say reusing colors as often as possible. If there is no other choice, we have to spill a variable to memory.
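Here is a C sketch of that heuristic over an adjacency-matrix interference graph (the sizes and the representation are arbitrary illustrative choices; a real allocator would also track spill costs):

#define MAXN 128

/* Try to r-color the graph; fills color[] and returns 0 if stuck (spill). */
static int color_graph(int adj[MAXN][MAXN], int n, int r, int *color) {
    int removed[MAXN] = {0}, stack[MAXN], top = 0;

    for (int k = 0; k < n; k++) {            /* steps 1-3: simplify */
        int best = -1, best_deg = n + 1;
        for (int v = 0; v < n; v++) {
            if (removed[v]) continue;
            int deg = 0;
            for (int u = 0; u < n; u++)
                if (!removed[u] && adj[v][u]) deg++;
            if (deg < best_deg) { best_deg = deg; best = v; }
        }
        removed[best] = 1;                   /* take node with fewest neighbors */
        stack[top++] = best;
    }
    while (top > 0) {                        /* step 4: rebuild and color */
        int v = stack[--top];
        int used[MAXN] = {0};
        for (int u = 0; u < n; u++)
            if (!removed[u] && adj[v][u])    /* neighbors already reinserted */
                used[color[u]] = 1;
        int c = 0;
        while (c < r && used[c]) c++;
        if (c == r) return 0;                /* no color available: spill */
        color[v] = c;
        removed[v] = 0;                      /* reinsert into the graph   */
    }
    return 1;
}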

INSTRUCTION SCHEDULING:

Another extremely important optimization of the final code generator is instruction scheduling. Because many machines, including most RISC architectures, have some sort of pipelining capability, effectively harnessing that capability requires judicious ordering of instructions.
In MIPS, each instruction is issued in one cycle, but some take multiple cycles to complete. It takes an additional cycle before the value of a load is available, and two cycles for a branch to reach its destination, but an instruction can be placed in the "delay slot" after a branch and executed in that slack time. The first arrangement of a set of instructions below requires 7 cycles. It assumes no hardware interlock and thus explicitly stalls between the second and third slots while the load completes, and it has a dead cycle after the branch because the delay slot holds a noop. The second, more favorable rearrangement of the same instructions will execute in 5 cycles with no dead cycles.
Before scheduling (7 cycles):

lw   $t2, 4($fp)
lw   $t3, 8($fp)
noop
add  $t4, $t2, $t3
subi $t5, $t5, 1
goto L1
noop

After scheduling (5 cycles):

lw   $t2, 4($fp)
lw   $t3, 8($fp)
subi $t5, $t5, 1
goto L1
add  $t4, $t2, $t3

PEEPHOLE OPTIMIZATIONS:
Peephole optimization is a pass that operates on the target assembly and only considers a few instructions at a time (through a "peephole"), attempting simple, machine-dependent code improvements. For example, peephole optimizations might include elimination of multiplication by 1, elimination of a load of a value into a register when the previous instruction stored that value from the register to a memory location, or replacement of a sequence of instructions by a single instruction with the same effect. Because of its myopic view, a peephole optimizer does not have the potential payoff of a full-scale optimizer, but it can significantly improve code at a very local level, and it can be useful for cleaning up the final code that resulted from more complex optimizations. Much of the work done in peephole optimization can be thought of as find-replace activity, looking for certain idiomatic patterns in a single instruction or a sequence of two to three instructions that can be replaced by more efficient alternatives.
For example, MIPS has instructions that can add a small integer constant to the value in a register without loading the constant into a register first, so the first sequence below can be replaced with the second:
Before:

li  $t0, 10
lw  $t1, -8($fp)
add $t2, $t1, $t0
sw  $t1, -8($fp)

After:

lw   $t1, -8($fp)
addi $t2, $t1, 10
sw   $t1, -8($fp)
What would you replace the following sequence with?
lw $t0, -8($fp)
sw $t0, -8($fp)
What about this one?
mul $t1, $t0, 2
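A toy peephole pass in C for the store-then-load pattern described above (the instruction representation is made up for illustration; a real pass would match on decoded target instructions):

#include <string.h>

struct ins { char op[8]; char reg[8]; char addr[16]; };

/* Delete each load that reloads what the previous kept instruction
   just stored to the same address from the same register.          */
static int peephole(struct ins *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (out > 0 && strcmp(code[i].op, "lw") == 0
                    && strcmp(code[out-1].op, "sw") == 0
                    && strcmp(code[i].reg,  code[out-1].reg)  == 0
                    && strcmp(code[i].addr, code[out-1].addr) == 0)
            continue;                        /* drop the redundant load */
        code[out++] = code[i];
    }
    return out;                              /* new instruction count   */
}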

ABSTRACT SYNTAX TREE/DAG: An abstract syntax tree is nothing but the condensed form of a parse tree. It is:

- Useful for representing language constructs
- A depiction of the natural hierarchical structure of the source program

In such a tree:

- Each internal node represents an operator
- Children of the nodes represent operands
- Leaf nodes represent operands

A DAG is more compact than an abstract syntax tree because common sub-expressions are eliminated.
A syntax tree depicts the natural hierarchical structure of a source program. Its structure has already been discussed in earlier lectures. DAGs are generated as a combination of trees: operands that are being reused are linked together, and nodes may be annotated with variable names (to denote assignments). This way, DAGs are highly compact, since they eliminate local common sub-expressions. On the other hand, they are not as easy to transform further, since they are more constrained than tree forms. However, proper building of the DAG for a given sequence of instructions can compactly represent the outcome of the calculation.

As an example of a syntax tree and a DAG, consider the assignment:

a := b * -c + b * -c

You can see that the node "*" appears only once in the DAG, as does the leaf "b", but the meaning conveyed by both representations (the AST and the DAG) remains the same.
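The sharing can be sketched with a small value-numbering routine in C (a simplified stand-in for real DAG construction; leaves are encoded as nodes whose children are -1): asking twice for the same (op, left, right) triple returns the same node, which is exactly how the two b * -c subtrees collapse into one.

#define MAXNODE 256

struct node { char op; int left, right; };
static struct node dag[MAXNODE];
static int nnodes = 0;

/* Return an existing node for (op, left, right) if one exists,
   otherwise create it: common sub-expressions share one node.  */
static int get_node(char op, int left, int right) {
    for (int i = 0; i < nnodes; i++)
        if (dag[i].op == op && dag[i].left == left && dag[i].right == right)
            return i;
    dag[nnodes].op = op;
    dag[nnodes].left = left;
    dag[nnodes].right = right;
    return nnodes++;
}

/* For a := b * -c + b * -c:
   b = get_node('b', -1, -1); m = get_node('-', get_node('c', -1, -1), -1);
   t = get_node('*', b, m);  -- the second b * -c yields the same t. */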

IMPORTANT QUESTIONS:
1. What is code optimization? Explain its objectives. Also discuss function-preserving transformations with your own examples.
2. Explain the following optimization techniques:
(a) Copy propagation
(b) Dead-code elimination
(c) Code motion
(d) Reduction in strength
3. Explain the principal sources of code-improving transformations.
4. What do you mean by machine-dependent and machine-independent code optimization? Explain machine-dependent code optimization with examples.

ASSIGNMENT QUESTIONS:

1. Explain local optimization techniques with your own examples.
2. Explain in detail the procedure for eliminating global common sub-expressions.
3. What is the need for code optimization? Justify your answer.
