Run Time Storage Management :: Unit-IV
Run-time allocation and de-allocation of activations occur when functions are called and when they return.
There are mainly two kinds of run-time allocation systems: Static allocation and Stack
Allocation. While static allocation is used by the FORTRAN class of languages, stack allocation
is used by the Ada class of languages.
STATIC ALLOCATION: In static allocation, a call statement is implemented by a sequence of two instructions:
MOV #here+20, callee.static-area
GOTO callee.code-area
callee.static-area and callee.code-area are constants referring to the address of the activation record and the first address of the called procedure, respectively. #here+20 in the MOV instruction is the return address: the address of the instruction following the GOTO. A return from the callee is implemented by:
GOTO *callee.static-area
For the call statement, we need to save the return address somewhere and then jump to the location of the callee function. To return from a function, we have to access the return address as stored by the caller, and then jump to it. So for the call we first say: MOV #here+20, callee.static-area. Here, #here refers to the location of the current MOV instruction, and callee.static-area is a fixed location in memory. 20 is added to #here because the code corresponding to the call takes 20 bytes (at 4 bytes per word: 4*3 = 12 bytes for this instruction and 8 bytes for the next). Then we say GOTO callee.code-area to take us to the code of the callee, as callee.code-area is merely the address where the code of the callee starts. A return from the callee is then implemented by GOTO *callee.static-area. Note that this works only because callee.static-area is a constant.
Example:
Suppose the code for procedure c starts at address 100 and that for p starts at 200, and at some point c calls p. Using the strategy discussed above, and assuming that callee.static-area for p is at memory location 364, we get target code of the form sketched below. Here we assume that a call to 'action' corresponds to a single machine instruction which takes 20 bytes.
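A plausible reconstruction of that target code, in the same MOV/GOTO notation used above (the two actions in c and the final HALT are illustrative assumptions; the addresses follow from the 20-byte actions, the 12-byte MOV and the 8-byte GOTO):
100: ACTION-1              (code for c; the action takes 20 bytes)
120: MOV #140, 364         (save the return address 140 in p's static area)
132: GOTO 200              (call p)
140: ACTION-2
160: HALT
200: ACTION-3              (code for p)
220: GOTO *364             (return to the address saved at location 364)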
STACK ALLOCATION:
. Position of the activation record is not known until run time
. Position is stored in a register at run time, and words in the record are accessed with an offset from the register
. The code for the first procedure initializes the stack by setting up SP to the start of the stack area:
MOV #Stackstart, SP
HALT
In stack allocation we do not need to know the position of the activation record until run-
time. This gives us an advantage over static allocation, as we can have recursion. So this is used
in many modern programming languages like C, Ada, etc. The positions of the activations are
stored in the stack area, and the position of the most recent activation is pointed to by the stack pointer. Words in a record are accessed with an offset from this register. The code for the first procedure initializes the stack by setting SP to the start of the stack area with the command MOV #Stackstart, SP, where #Stackstart is the location in memory where the stack starts.
A procedure call sequence increments SP, saves the return address and transfers control to the called procedure:
ADD #caller.recordsize, SP
MOV #here+16, *SP
GOTO callee.code_area
Consider the situation when a function (the caller) calls another function (the callee): the procedure call sequence increments SP by the caller's record size, saves the return address at the location pointed to by SP, and transfers control to the callee by jumping to its code area. In the MOV instruction here, we only need to add 16 (rather than 20), as SP is a register and so no space is needed in the instruction to encode *SP. The activations keep getting pushed on the stack, so #caller.recordsize needs to be added to SP to update it to its new value. This works because #caller.recordsize is a constant for a function, regardless of the particular activation being referred to.
DATA STRUCTURES: The following data structures are used to implement symbol tables.
LIST DATA STRUCTURE: A list could be an array-based or pointer-based list. This implementation:
- is the simplest to implement
- uses a single array to store names and information
- search for a name is linear
- entry and lookup are independent operations
- the cost of entry and search operations is very high, and a lot of time goes into book-keeping
Hash table: Hash table is a data structure which gives O(1) performance in accessing any
element of it. It uses the features of both array and pointer based lists.
The entries in the symbol table are for declaration of names. When an occurrence of a name in
the source text is looked up in the symbol table, the entry for the appropriate declaration,
according to the scoping rules of the language, must be returned. A simple approach is to
maintain a separate symbol table for each scope.
Most closely nested scope rules can be implemented by adapting the data structures
discussed in the previous section. Each procedure is assigned a unique number. If the language is
block-structured, the blocks must also be assigned unique numbers. The name is represented as a
pair of a number and a name. This new name is added to the symbol table. Most scope rules can be implemented in terms of lookup, insert and delete operations on this table. For example, the names a and c declared below are visible only inside fun2:
int fun2()
{
int a;
int c;
....
}
Visibility: The visibility of a variable determines how much of the rest of the program
can access that variable. You can arrange that a variable is visible only within one part of
one function, or in one function, or in one source file, or anywhere in the program.
Local and Global variables: A variable declared within the braces {} of a function is visible only within that function; variables declared within functions are called local variables. On the other hand, a variable declared outside of any function is a global variable, and it is potentially visible anywhere within the program.
Automatic vs Static duration: How long do variables last? By default, local variables
(those declared within a function) have automatic duration: they spring into existence
when the function is called, and they (and their values) disappear when the function
returns. Global variables, on the other hand, have static duration: they last, and the values
stored in them persist, for as long as the program does. (Of course, the values can in
general still be overwritten, so they don't necessarily persist forever.) By default, local
variables have automatic duration. To give them static duration (so that, instead of
coming and going as the function is called, they persist for as long as the function does),
you precede their declaration with the static keyword: static int i; By default, a
declaration of a global variable (especially if it specifies an initial value) is the defining
instance. To make it an external declaration, of a variable which is defined somewhere
else, you precede it with the keyword extern: extern int j; Finally, to arrange that a global
variable is visible only within its containing source file, you precede it with the static
keyword: static int k; Notice that the static keyword can do two different things: it adjusts
the duration of a local variable from automatic to static, or it adjusts the visibility of a
global variable from truly global to private-to-the-file.
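A small file-level sketch of these cases (the names g, f and a are illustrative; j, i and k follow the declarations quoted above):
int g;               /* global variable: defining instance, static duration, visible anywhere */
extern int j;        /* external declaration: j is defined somewhere else                     */
static int k;        /* file-scope static: visible only within this source file               */
void f(void)
{
    int a;           /* local: automatic duration, visible only inside f                      */
    static int i;    /* local but with static duration: persists across calls to f            */
}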
Symbol attributes and symbol table entries:
- Symbols have associated attributes
- Typical attributes are name, type, scope, size, addressing mode etc.
- A symbol table entry collects together attributes such that they can be easily set and retrieved
- Example of typical names in a symbol table:
Name     Type
name     character string
class    enumeration
size     integer
type     enumeration
Following are prototypes of typical function declarations used for managing a local symbol table; the right-hand side of each arrow is the output of the procedure and the left side is its input.
. Balanced binary tree: quick insertion, searching and retrieval; extra work required to keep the
tree balanced
. Hash tables: quick insertion, searching and retrieval; extra work to compute hash keys
A major consideration in designing a symbol table is that insertion and retrieval should be as fast as possible. We discussed the one-dimensional list and hash tables above; apart from these, balanced binary trees can be used too. Hashing is the most common approach.
Hash tables can clearly implement 'lookup' and 'insert' operations. For implementing the
'delete', we do not want to scan the entire hash table looking for lists containing entries to be
deleted. Each entry should have two links (a small sketch follows this list):
a) A hash link that chains the entry to other entries whose names hash to the same value - the
usual link in the hash table.
b) A scope link that chains all entries in the same scope - an extra link. If the scope link is left
undisturbed when an entry is deleted from the hash table, then the chain formed by the scope
links will constitute an inactive symbol table for the scope in question.
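A minimal C sketch of such an entry with both links (the struct and field names are illustrative assumptions, not from the text):
struct sym_entry {
    char             *name;
    struct sym_entry *hash_link;    /* chains entries whose names hash to the same value */
    struct sym_entry *scope_link;   /* chains all entries declared in the same scope     */
};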
Consider the nesting structure of a program in which the variables a, b and c appear in global as well as local scopes. The local scope of a variable overrides the global scope of another variable with the same name within its own scope. The global and local symbol tables for such a structure are organized as described below; in the example, procedures i and h lie within the scope of g (they are nested within g).
GLOBAL SYMBOL TABLE STRUCTURE: The global symbol table will be a collection of symbol tables connected with pointers.
Storage binding and symbolic registers: Storage binding translates variable names into addresses; this process must occur before or during code generation.
a) Global Variables: fixed relocatable address or offset with respect to a base such as the global pointer.
b) Global Static Variables: Global variables have static duration (hence they are also called static variables): they last, and the values stored in them persist, for as long as the program does. (Of course, the values can in general still be overwritten, so they don't necessarily persist forever.) Therefore they also have a fixed relocatable address or offset with respect to a base such as the global pointer.
c) Stack Variables: stack (and global) variables may be allocated in registers; but registers are not indexable, therefore arrays cannot be kept in registers.
d) Stack Static Variables: By default, local variables (stack variables, i.e. those declared within a function) have automatic duration: they spring into existence when the function is called, and they (and their values) disappear when the function returns. This is why they are stored on the stack and addressed by an offset from the stack/frame pointer.
Register allocation is usually done for global variables. Since registers are not indexable,
therefore, arrays cannot be in registers as they are indexed data structures. Graph coloring is a
simple technique for allocating registers and minimizing register spills that works well in practice.
Register spills occur when a register is needed for a computation but all available registers are in
use. The contents of one of the registers must be stored in memory to free it up for immediate
use. We assign symbolic registers to scalar variables which are used in the graph coloring.
Local Variables in Frame:
. word boundaries - the most significant byte of the object must be located at an address whose two least significant bits are zero relative to the frame pointer
. half-word boundaries - the most significant byte of the object must be located at an address whose least significant bit is zero relative to the frame pointer
While allocating memory to the variables, sort the variables by the alignment they need. You may:
. Store largest variables first: this automatically aligns all the variables and does not require padding, since the next variable's memory allocation starts at the end of that of the earlier variable.
. Store smallest variables first: this requires more space (padding), since you have to accommodate the biggest possible length of any variable or data structure. The advantage is that for a large stack frame, more variables become accessible within small offsets.
How to store large local data structures? They require large space in local frames and therefore large offsets.
- If a large object is put near the boundary, other objects require a large offset either from fp (if it is put near the beginning) or from sp (if it is put near the end).
- Allocate another base register to access large objects.
- Alternatively, allocate space in the middle or elsewhere, and store pointers to these locations at a small offset from fp.
- This requires extra loads.
Large local data structures require large space in local frames and therefore large offsets. As noted above, if large objects are put near the boundary then the other objects require large offsets. You can either allocate another base register to access large objects, or you can allocate space in the middle or elsewhere and then store pointers to these locations at a small offset from the frame pointer, fp. In an unsorted allocation there is wasted space (padding) between the variables; in a sorted frame there is no wasted space.
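As an illustration (the variables and sizes are assumptions, not from the text), consider locals char c; int i; short s; with alignments of 1, 4 and 2 bytes. In declaration order, c sits at offset 0, offsets 1-3 are padding, i occupies 4-7 and s occupies 8-9, i.e. 10 bytes with 3 bytes wasted. Sorted largest first, i occupies 0-3, s occupies 4-5 and c sits at 6, i.e. 7 bytes with no padding.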
For a single-dimensional array, if low is the lower bound of the index, w is the width of each element, and base is the relative address of the storage allocated to the array (i.e., the relative address of A[low]), then the i-th element begins at location base + (i - low) * w. This expression can be reorganized as i*w + (base - low*w). The sub-expression base - low*w is calculated and stored in the symbol table at compile time when the array declaration is processed, so that the relative address of A[i] can be obtained by just adding i*w to it; the address thus has the form
i * w + const
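For instance (illustrative numbers, not from the text): with low = 1, w = 4 and base = 100, the constant is base - low*w = 96, so the address of A[i] is 4*i + 96; A[5] is at 4*5 + 96 = 116.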
2-DIMENSIONAL ARRAY: For a row-major two-dimensional array, the address of A[i][j] can be calculated by the formula
base + ((i - lowi) * n2 + j - lowj) * w
where lowi and lowj are the lower bounds of i and j, and n2 is the number of values j can take, i.e. n2 = highj - lowj + 1. This can be rewritten as
((i * n2) + j) * w + (base - ((lowi * n2) + lowj) * w)
where the second term can be calculated at compile time.
In the same manner, the expression for the location of an element in a column-major two-dimensional array can be obtained. This addressing can be generalized to multidimensional arrays. Storage can follow either the row-major or the column-major approach.
Assume the width of the type stored in the array is 4. The three-address code to access A[y, z] is:
t1 = y * 20
t1 = t1 + z
t2 = 4 * t1
t3 = baseA - 84      { ((lowi * n2) + lowj) * w = (1*20 + 1) * 4 = 84 }
t4 = t2 + t3
x = t4
The following operations are defined:
1. mktable(previous): creates a new symbol table and returns a pointer to this table. previous is a pointer to the symbol table of the parent procedure.
2. enter(table, name, type, offset): creates a new entry for name in the symbol table pointed to by table.
3. addwidth(table, width): records the cumulative width of all the entries of the table in its header.
4. enterproc(table, name, newtable): creates an entry for procedure name in the symbol table pointed to by table. newtable is a pointer to the symbol table for name.
A C sketch of these operations is given below.
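A minimal C sketch of the data structures and the four operations above (all type and field names are illustrative assumptions, not taken from the text):
#include <stdlib.h>
#include <string.h>

typedef struct table table;

typedef struct entry {
    char         *name;
    char         *type;      /* or an enumeration of types                  */
    int           offset;    /* relative address within the enclosing scope */
    table        *newtable;  /* non-NULL only for procedure names           */
    struct entry *next;
} entry;

struct table {
    table *previous;         /* symbol table of the parent procedure        */
    int    width;            /* total width of the locals, set by addwidth  */
    entry *entries;
};

/* 1. mktable(previous) */
table *mktable(table *previous) {
    table *t = calloc(1, sizeof(table));
    t->previous = previous;
    return t;
}

static entry *newentry(table *t, const char *name) {
    entry *e = calloc(1, sizeof(entry));
    e->name = strdup(name);
    e->next = t->entries;
    t->entries = e;
    return e;
}

/* 2. enter(table, name, type, offset) */
void enter(table *t, const char *name, const char *type, int offset) {
    entry *e = newentry(t, name);
    e->type = strdup(type);
    e->offset = offset;
}

/* 3. addwidth(table, width) */
void addwidth(table *t, int width) {
    t->width = width;
}

/* 4. enterproc(table, name, newtable) */
void enterproc(table *t, const char *name, table *newtable) {
    entry *e = newentry(t, name);
    e->newtable = newtable;
}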
P -> { t = mktable(nil);
       push(t, tblptr);
       push(0, offset) }
     D
     { addwidth(top(tblptr), top(offset));
       pop(tblptr);
       pop(offset) }
D -> D ; D
The symbol tables are created using two stacks: tblptr to hold pointers to symbol tables of
the enclosing procedures and offset whose top element is the next available relative address for a
local of the current procedure. Declarations in nested procedures can be processed by the syntax
directed definitions given below. Note that they are basically the same as those given above, but the epsilon productions are dealt with separately. The explanation follows after the definitions.
D -> proc id ;
     { t = mktable(top(tblptr));
       push(t, tblptr); push(0, offset) }
     D1 ; S
     { t = top(tblptr);
       addwidth(t, top(offset));
       pop(tblptr); pop(offset);
       enterproc(top(tblptr), id.name, t) }
D -> id : T
     { enter(top(tblptr), id.name, T.type, top(offset));
       top(offset) = top(offset) + T.width }
The action for M creates a symbol table for the outermost scope and hence a nil pointer is passed in place of previous. When the declaration D -> proc id ; N D1 ; S is processed, the action corresponding to N causes the creation of a symbol table for the procedure; the pointer to the symbol
table of enclosing procedure is given by top(tblptr). The pointer to the new table is pushed on to
the stack tblptr and 0 is pushed as the initial offset on the offset stack. When the actions
corresponding to the subtrees of N, D1 and S have been executed, the offset corresponding to the
current procedure i.e., top(offset) contains the total width of entries in it. Hence top(offset) is
added to the header of symbol table of the current procedure. The top entries of tblptr and offset
are popped so that the pointer and offset of the enclosing procedure are now on top of these
stacks. The entry for id is added to the symbol table of the enclosing procedure. When the declaration D -> id : T is processed, an entry for id is created in the symbol table of the current procedure. The pointer to the symbol table of the current procedure is again obtained from top(tblptr).
Offset corresponding to the current procedure i.e. top(offset) is incremented by the width
required by type T to point to the next available location.
T -> record
     { t = mktable(nil);
       push(t, tblptr); push(0, offset) }
     D end
     { T.type = record(top(tblptr));
       T.width = top(offset);
       pop(tblptr); pop(offset) }
CODE OPTIMIZATION
Considerations for optimization: The code produced by straightforward compiling algorithms can often be made to run faster or take less space, or both. This improvement is achieved by program transformations that are traditionally called optimizations. Machine-independent optimizations are program transformations that improve the target code without taking into consideration any properties of the target machine. Machine-dependent optimizations are based on register allocation and utilization of special machine-instruction sequences.
- Simply stated, the best program transformations are those that yield the most benefit for
the least effort.
- First, the transformation must preserve the meaning of programs. That is, the
optimization must not change the output produced by a program for a given input, or
cause an error.
Some transformations can only be applied after detailed, often time-consuming analysis of the
source program, so there is little point in applying them to programs that will be run only a few
times.
OBJECTIVES OF OPTIMIZATION: The main objectives of the optimization techniques are as follows:
1. Exploit the fast path in case of multiple paths for a given situation.
4. Trade off between the size of the code and the speed with which it gets executed.
5. Place code and data together whenever it is required, to avoid unnecessary searching of data/code.
During code transformation in the process of optimization, the basic requirements are the ones stated above: the transformation must preserve the meaning of the program and must be worth the effort spent on it.
Consider all that has happened up to this point in the compiling process: lexical analysis, syntactic analysis, semantic analysis and finally intermediate-code generation. The compiler has done an enormous amount of analysis, but it still doesn't really know how the program does what it does. In control-flow analysis, the compiler figures out even more information about how the program does its work, only now it can assume that there are no syntactic or semantic errors in the code.
Now we can construct the control-flow graph between the blocks. Each basic block is a
node in the graph, and the possible different routes a program might take are the connections, i.e.
if a block ends with a branch, there will be a path leading from that block to the branch target.
The blocks that can follow a block are called its successors. There may be multiple successors or
just one. Similarly, a block may have many, one, or no predecessors. Connecting up the flow graph for the Fibonacci basic blocks referred to above raises questions such as: what does an if-then-else look like in a flow graph? What about a loop? You have probably all seen the gcc warning or javac error about "Unreachable code at line XXX." How can the compiler tell when code is unreachable?
LOCAL OPTIMIZATIONS
1. Common Sub-expression Elimination: when the same expression is computed more than once in a block (for example b*c in a = b*c; d = b*c + x - y;), it can be computed once into a temporary and reused:
t1 = b * c;
a = t1;
d = t1 + x - y;
2. Variable Propagation:
c=a*b;
x=a;
d=x*b+4;
If we replace x by a in the last statement, we can identify a*b and x*b as common sub-expressions. This technique is called variable propagation, where the use of one variable is replaced by another variable if it has been assigned the same value.
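After propagating a for x and then eliminating the common sub-expression a*b, the fragment might become (the temporary t1 is illustrative):
t1 = a * b;
c = t1;
x = a;
d = t1 + 4;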
3. Compile Time Evaluation:
The execution efficiency of the program can be improved by shifting execution-time actions to compile time, so that they are not performed repeatedly during program execution. We can evaluate an expression with constant operands at compile time and replace that expression by a single value. This is called folding. Consider the following statement:
a = 2 * (22.0/7.0) * r;
Here, we can perform the computation 2*(22.0/7.0) at compile time itself, so the statement can be folded to approximately a = 6.2857 * r;
4. Code Movement:
The motivation for performing code movement in a program is to improve the execution time of the program by reducing the evaluation frequency of expressions. This can be done by moving the evaluation of an expression to other parts of the program. Let us consider the code below:
If(a<10)
{
b=x^2-y^2;
}
else
{
b=5;
a=( x^2-y^2)*10;
}
The expression x^2-y^2 is written (and evaluated) in both branches of the conditional on a<10. So we can optimize the code by moving its evaluation out of the conditional as follows:
t= x^2-y^2;
If(a<10)
{
b=t;
}
else
{
b=5;
a=t*10;
}
5. Strength Reduction:
In the frequency-reduction transformation we tried to reduce the execution frequency of expressions by moving code. There is another class of transformations which perform the actions indicated in the source program equivalently, but with operators of reduced strength. By strength reduction, we mean replacing a high-strength operator by a low-strength operator without affecting the program's meaning. Let us consider the example below, in which the multiplication i*4 is performed on every iteration:
i = 1;
while (i < 10)
{
    y = i * 4;
    i = i + 1;
}
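A sketch of the same loop after strength reduction, where the multiplication i*4 is replaced by a running addition (the temporary t is illustrative):
i = 1;
t = 4;
while (i < 10)
{
    y = t;
    t = t + 4;
    i = i + 1;
}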
Let's consider a global common sub-expression elimination optimization as our example.
Careful analysis across blocks can determine whether an expression is alive on entry to a block.
Such an expression is said to be available at that point. Once the set of available expressions is
known, common sub-expressions can be eliminated on a global basis. Each block is a node in
the flow graph of a program. The successor set (succ(x)) for a node x is the set of all nodes that x
directly flows into. The predecessor set (pred(x)) for a node x is the set of all nodes that flow
directly into x. An expression is defined at the point where it is assigned a value and killed when
one of its operands is subsequently assigned a new value. An expression is available at some
point p in a flow graph if every path leading to p contains a prior definition of that expression
which is not subsequently killed. Lets define such useful functions in DF analysis in following
lines.
avail[B] = set of expressions available on entry to block B
exit[B] = set of expressions available on exit from B
avail[B] = ∩ exit[x]: x ∈ pred[B] (i.e. B has available the intersection of the exit of its
predecessors)
killed[B] = set of the expressions killed in B
defined[B] = set of expressions defined in B
exit[B] = avail[B] - killed[B] + defined[B]
avail[B] = ∩ (avail[x] - killed[x] + defined[x]) : x ∈ pred[B]
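A small worked instance of these equations (the blocks and expressions are illustrative assumptions, not from the text): suppose pred[B] = {B1, B2}, exit[B1] = {a+2, 4*b} and exit[B2] = {a+2}. Then avail[B] = exit[B1] ∩ exit[B2] = {a+2}. If B assigns to a (killing a+2) and computes b<c, then killed[B] = {a+2}, defined[B] = {b<c}, and exit[B] = avail[B] - killed[B] + defined[B] = {b<c}.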
First, divide the code above into basic blocks. Now calculate the available expressions for each block. Then find an expression that is available in a block and use it to eliminate the redundant re-computation. What common sub-expression can you share between the two blocks? What if the above code were:
main:
BeginFunc 28;
b=a+2;
c=4*b;
tmp1 = b < c ;
IfNZ tmp1 Goto L1 ;
b=1;
z = a + 2 ; <========= an additional line here
L1:
d=a+2;
EndFunc ;
MACHINE OPTIMIZATIONS
In final code generation, there is a lot of opportunity for cleverness in generating
efficient target code. In this pass, specific machine features (specialized instructions, hardware pipeline abilities, register details) are taken into account to produce code optimized for the particular architecture.
REGISTER ALLOCATION:
One machine optimization of particular importance is register allocation, which is
perhaps the single most effective optimization for all architectures. Registers are the fastest kind
of memory available, but as a resource, they can be scarce.
The problem is how to minimize traffic between the registers and what lies beyond them
in the memory hierarchy to eliminate time wasted sending data back and forth across the bus and
the different levels of caches. Your Decaf back-end uses a very naïve and inefficient means of assigning registers: it just fills them before performing an operation and spills them right afterwards.
A much more effective strategy would be to consider which variables are more heavily
in demand and keep those in registers and spill those that are no longer needed or won't be
needed until much later.
One common register allocation technique is called "register coloring", after the central
idea to view register allocation as a graph coloring problem. If we have 8 registers, then we try to
color a graph with eight different colors. The graph's nodes are made of "webs" and the arcs are determined by calculating interference between the webs. A web represents a variable's
definitions, places where it is assigned a value (as in x = …), and the possible different uses of
those definitions (as in y = x + 2). This problem, in fact, can be approached as another graph problem: the definitions and uses of a variable are nodes, and if a definition reaches a use, there is an arc
between the two nodes. If two portions of a variable's definition-use graph are unconnected, then
we have two separate webs for a variable. In the interference graph for the routine, each node is a
web. We seek to determine which webs don't interfere with one another, so we know we can use
the same register for those two variables. For example, consider the following code:
i = 10;
j = 20;
x = i + j;
y = j + k;
We say that i interferes with j because at least one pair of i's definitions and uses is separated by a definition or use of j; thus, i and j are "alive" at the same time. A variable is alive between the time it has been defined and that definition's last use, after which the variable is dead. If two variables interfere, then we cannot use the same register for both. But two variables that don't interfere can occupy the same register, since there is no overlap in their live ranges. Once we have the interference graph constructed, we r-color it so that no two adjacent nodes share the same color (r is the number of registers we have; each color represents a different register).
We may recall that graph-coloring is NP-complete, so we employ a heuristic rather than
an optimal algorithm. Here is a simplified version of something that might be used:
1. Find the node with the fewest neighbors. (Break ties arbitrarily.)
2. Remove it from the interference graph and push it onto a stack.
3. Repeat steps 1 and 2 until the graph is empty.
4. Now, rebuild the graph as follows:
a. Take the top node off the stack and reinsert it into the graph.
b. Choose a color for it that differs from the colors of its neighbors presently in the graph, rotating colors in case there is more than one choice.
c. Repeat steps a and b until the graph is either completely rebuilt, or there is no color available to color the node.
If we get stuck, then the graph may not be r-colorable, and we could try again with a different heuristic, say reusing colors as often as possible. If there is no other choice, we have to spill a variable to memory.
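Below is a minimal C sketch of this simplify-and-select heuristic, assuming the interference graph is given as an adjacency matrix (N_WEBS, R and all other names are illustrative assumptions, not from the text):
#define N_WEBS 6
#define R      4                  /* number of registers (colors)            */

int adj[N_WEBS][N_WEBS];          /* adj[u][v] = 1 if webs u and v interfere */
int removed[N_WEBS];              /* 1 once a node has been pushed           */
int color[N_WEBS];                /* assigned register; -1 means spill       */

int degree(int u) {               /* neighbors still present in the graph    */
    int d = 0;
    for (int v = 0; v < N_WEBS; v++)
        if (!removed[v] && adj[u][v]) d++;
    return d;
}

void color_graph(void) {
    int stack[N_WEBS], top = 0;

    /* Steps 1-3: repeatedly push the node with the fewest live neighbors. */
    for (int n = 0; n < N_WEBS; n++) {
        int best = -1;
        for (int u = 0; u < N_WEBS; u++)
            if (!removed[u] && (best < 0 || degree(u) < degree(best)))
                best = u;
        removed[best] = 1;
        stack[top++] = best;
    }

    /* Step 4: pop nodes, reinsert them, and give each the first color not
       used by any neighbor already back in the graph; -1 marks a spill.   */
    for (int u = 0; u < N_WEBS; u++) color[u] = -1;
    while (top > 0) {
        int u = stack[--top];
        removed[u] = 0;
        for (int c = 0; c < R; c++) {
            int clash = 0;
            for (int v = 0; v < N_WEBS; v++)
                if (!removed[v] && adj[u][v] && color[v] == c) clash = 1;
            if (!clash) { color[u] = c; break; }
        }
    }
}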
INSTRUCTION SCHEDULING:
PEEPHOLE OPTIMIZATIONS:
Peephole optimization is a pass that operates on the target assembly and only considers a
few instructions at a time (through a "peephole") and attempts to do simple, machine dependent
code improvements. For example, peephole optimizations might include elimination of
multiplication by 1, elimination of load of a value into a register when the previous instruction
stored that value from the register to a memory location, or replacing a sequence of instructions
by a single instruction with the same effect. Because of its myopic view, a peephole optimizer
does not have the potential payoff of a full-scale optimizer, but it can significantly improve code
at a very local level and can be useful for cleaning up the final code that resulted from more
complex optimizations. Much of the work done in peephole optimization can be thought of as a find-and-replace activity, looking for certain idiomatic patterns in a single instruction or a sequence of two to three instructions that can be replaced by more efficient alternatives.
For example, MIPS has instructions that can add a small integer constant to the value in a register without loading the constant into a register first, so the first sequence below can be replaced with the second:
Before:
li $t0, 10
lw $t1, -8($fp)
add $t2, $t1, $t0
sw $t1, -8($fp)
After:
lw $t1, -8($fp)
addi $t2, $t1, 10
sw $t1, -8($fp)
What would you replace the following sequence with?
lw $t0, -8($fp)
sw $t0, -8($fp)
What about this one?
mul $t1, $t0, 2
Abstract Syntax Tree/DAG: An abstract syntax tree is a condensed form of a parse tree. A DAG is more compact than an abstract syntax tree because common sub-expressions are eliminated.
A syntax tree depicts the natural hierarchical structure of a source program. Its structure has
already been discussed in earlier lectures. DAGs are generated as a combination of trees:
operands that are being reused are linked together, and nodes may be annotated with variable
names (to denote assignments). This way, DAGs are highly compact, since they eliminate local
common sub-expressions. On the other hand, they are not so easy to optimize, since they are
more specific tree forms. However, it can be seen that proper building of DAG for a given
sequence of instructions can compactly represent the outcome of the calculation.
An example of a syntax tree and a DAG is given below for the assignment:
a := b * -c + b * -c
You can see that the node " * " comes only once in the DAG as well as the leaf " b ", but the
meaning conveyed by both the representations (AST as well as the DAG) remains the same.
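As a rough sketch of the difference (the temporary names t1 to t5 are illustrative), three-address code generated from the syntax tree computes b * -c twice, while code generated from the DAG computes it once:
From the syntax tree:
t1 = -c
t2 = b * t1
t3 = -c
t4 = b * t3
t5 = t2 + t4
a = t5
From the DAG:
t1 = -c
t2 = b * t1
t3 = t2 + t2
a = t3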
IMPORTANT QUESTIONS:
1. What is Code optimization? Explain the objectives of it. Also discuss Function preserving
transformations with your own examples?
2. Explain the following optimization techniques
(a) Copy Propagation
(b) Dead-Code Elimination
(c) Code Motion
(d) Reduction in Strength.
4. Explain the principal sources of code-improving transformations.
5. What do you mean by machine dependent and machine independent code optimization?
Explain about machine dependent code optimization with examples.
ASSIGNMENT QUESTIONS: