Three Address Code (TAC) : Addresses and Instructions
TAC can range from high- to low-level, depending on the choice of operators. In general,
it is a statement containing at most three addresses or operands. The general form is x := y op
z, where “op” is an operator, x is the result, and y and z are operands; x, y, and z may be
variables, constants, or compiler-generated “temporaries”. TAC is a linearized
representation of a binary syntax tree in which explicit names correspond to the interior
nodes of the tree. For example, for a looping statement the syntax tree represents the
components of the statement, whereas the three-address code contains labels and jump
instructions to represent the flow of control, as in machine language.
A TAC instruction has at most one operator on the RHS of an instruction; no built-up
arithmetic expressions are permitted. E.g. x + y * z can be translated as
t1 = y * z
t2 = x + t1
where t1 and t2 are compiler-generated temporary names.
Since it unravels multi-operator arithmetic expressions and nested control-
flow statements, it is useful for target code generation and optimization.
Addresses and Instructions
• TAC consists of a sequence of instructions, each instruction may have up to three
addresses, prototypically t1 = t2 op t3
• Addresses may be one of:
– A name. Each name is a symbol table index. For convenience, we write the
names as the identifier.
– A constant.
– A compiler-generated temporary. Each time a temporary address is
needed, the compiler generates another name from the stream t1, t2, t3,
etc.
• Temporary names make it easy for the optimizer to move instructions
around during code optimization
• At target-code generation time, these names will be allocated to
registers or to memory.
• TAC Instructions
– Symbolic labels will be used by instructions that alter the flow of control.
The instruction addresses of labels will be filled in later.
L: t1 = t2 op t3
– Assignment instructions: x = y op z
• Includes binary arithmetic and logical operations
– Unary assignments: x = op y
• Includes unary arithmetic op (-) and logical op (!) and type
conversion
– Copy instructions: x = y
– Unconditional jump: goto L
• L is a symbolic label of an instruction
– Conditional jumps: if x goto L
• If x is true, execute instruction L next
– Conditional jumps: ifFalse x goto L
• If x is false, execute instruction L next
– Conditional jumps: if x relop y goto L
– Procedure calls. For a procedure call p(x1, …, xn):
param x1
…
param xn
call p, n
– Function calls: y = p(x1, …, xn) uses the same param instructions followed by
y = call p, n
and the called function returns its value with return y
– Indexed copy instructions: x = y[i] and x[i] = y
• Left: sets x to the value in the location i memory units beyond y
• Right: sets the contents of the location i memory units beyond x to
y
– Address and pointer instructions:
• x = &y sets the value of x to be the location (address) of y.
• x = *y, presumably y is a pointer or temporary whose value is a
location. The value of x is set to the contents of that location.
• *x = y sets the value of the object pointed to by x to the value of y.
Example: Given the statement do i = i+1; while (a[i] < v ); , the TAC can be written as
below in two ways, using either symbolic labels or position number of instructions for
labels.
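One possible rendering (assuming 8-byte array elements, so the offset into a is
i * 8; the element width is an assumption made for the example):

Using symbolic labels:
L:   t1 = i + 1
     i = t1
     t2 = i * 8
     t3 = a[t2]
     if t3 < v goto L

Using position numbers:
100: t1 = i + 1
101: i = t1
102: t2 = i * 8
103: t3 = a[t2]
104: if t3 < v goto 100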
Three Address Code Representations
Data structures for representation of TAC can be objects or records with fields for
operator and operands. Representations include quadruples, triples and indirect triples.
Quadruples
• In the quadruple representation, there are four fields for each instruction: op,
arg1, arg2, result
– Binary ops have the obvious representation
– Unary ops don't use arg2
– Operators like param don't use either arg2 or result
– Jumps put the target label into result
• The quadruples shown below implement the three-address code for the
expression a = b * - c + b * - c
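A reconstruction of the standard figure for this example (the original figure
is not reproduced in these notes). The three-address code is:

t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5

and the corresponding quadruples are:

     op      arg1   arg2   result
0    minus   c             t1
1    *       b      t1     t2
2    minus   c             t3
3    *       b      t3     t4
4    +       t2     t4     t5
5    =       t5            a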
Triples
• A triple has only three fields for each instruction: op, arg1, arg2
• The result of an operation x op y is referred to by its position.
• Triples are equivalent to signatures of nodes in DAG or syntax trees.
• Triples and DAGs are equivalent representations only for expressions; they are
not equivalent for control flow.
• Ternary operations like x[i] = y require two entries in the triple structure;
similarly for x = y[i].
• Moving an instruction around during optimization is a problem, since the
result of an operation is referred to by its position.
Example: the triple representation of a = b * - c + b * - c is shown below.
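One standard rendering (the original figure is not reproduced in these notes);
note how arg1 and arg2 refer to the positions of earlier triples:

      op      arg1   arg2
(0)   minus   c
(1)   *       b      (0)
(2)   minus   c
(3)   *       b      (2)
(4)   +       (1)    (3)
(5)   =       a      (4)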
Indirect Triples
These consist of a listing of pointers to triples, rather than a listing of the triples
themselves. An optimizing compiler can move an instruction by reordering the
instruction list, without affecting the triples themselves.
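For the same example, a sketch of the indirect-triple form (the instruction
numbers 35-40 are illustrative):

Instruction list:
35: (0)   36: (1)   37: (2)   38: (3)   39: (4)   40: (5)

with the triples (0)-(5) stored as above. Reordering the instruction list moves
instructions without renumbering any triple.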
Type Equivalence
The two types of type equivalence are structural equivalence and name equivalence.
Structural equivalence: When type expressions are represented by graphs, two
types are structurally equivalent if and only if one of the following conditions is
true:
1. They are the same basic type
2. They are formed by applying the same constructor to structurally
equivalent types.
3. One is a type name that denotes the other
Name equivalence: If type names are treated as standing for themselves, the
first two conditions above lead to name equivalence of type expressions.
Example 1: in the Pascal fragment
type p = node;
q = node;
var x : p;
    y : q;
x and y are structurally equivalent, but not name-equivalent.
Example 2: Given the declarations
type t1 = array [1..10] of integer;
type t2 = array [1..10] of integer;
– Are they name equivalent? No, because they have different type names.
Example 3: Given the declarations
type vector = array [1..10] of real;
type weight = array [1..10] of real;
var x, y : vector;
    z : weight;
Name Equivalence: When they have the same name.
– x, y have the same type; z has a different type.
Structural Equivalence: When they have the same structure.
– x, y, z have the same type.
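The first two structural-equivalence conditions translate directly into a
recursive test. Below is a minimal Python sketch; the encoding of type
expressions as strings (basic types) and tuples (constructor plus components)
is an assumption made for illustration, and condition 3 (expanding type names)
is omitted.

# Type expressions: basic types are strings; constructed types are tuples
# whose first element names the constructor, e.g. ('array', '1..10', 'real').
def struct_equiv(s, t):
    if isinstance(s, str) or isinstance(t, str):
        return s == t                      # condition 1: same basic type
    return (s[0] == t[0]                   # condition 2: same constructor
            and len(s) == len(t)           # applied to structurally
            and all(struct_equiv(a, b)     # equivalent component types
                    for a, b in zip(s[1:], t[1:])))

# Example 3 above: vector and weight are structurally equivalent.
vector = ('array', '1..10', 'real')
weight = ('array', '1..10', 'real')
assert struct_equiv(vector, weight)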
Declarations
We study types and declarations using a simplified grammar that declares a
single name at a time.
Activation trees:
The main function is the root of the activation tree; main calls readarray and
then quicksort, and quicksort in turn calls partition and quicksort again.
The flow of control in a program corresponds to a depth-first traversal of the
activation tree, starting at the root.
Control stack and activation records:
Whenever a procedure is executed, its activation record is stored on a stack
called the control stack.
The control stack or runtime stack is used to keep track of the live procedure
activations, i.e. the procedures whose execution has not been completed.
A procedure name is pushed on to the stack when it is called (activation
begins) and it is popped when it returns (activation ends).
Information needed by a single execution of a procedure is managed using
an activation record or frame.
When a procedure is called, an activation record is pushed onto the stack,
and as soon as control returns to the caller, the activation record is popped.
The contents of activation records vary with the language being implemented.
The kinds of data that might appear in an activation record include: temporary
values, local data, saved machine status (such as the return address and
register contents), an access link (to nonlocal data), a control link (pointing
to the caller's activation record), actual parameters, and the returned value.
Calling Sequences:
Procedure calls are implemented by what are known as calling sequences, which
consist of code that allocates an activation record on the stack and enters
information into its fields.
Calling sequences and the layout of activation records may differ greatly, even among
implementations of the same language. The code in a calling sequence is often divided
between the calling procedure (the "caller") and the procedure it calls (the "callee").
In general, if a procedure is called from n different points, then the portion of the calling
sequence assigned to the caller is generated n times.
When designing calling sequences and the layout of activation records, the following
principles are helpful:
1.Values communicated between caller and callee are generally placed at the beginning
of the callee’s activation record, so they are as close as possible to the caller's activation
record.
2. Fixed-length items are generally placed in the middle, such items typically include the
control link, the access link, and the machine status fields.
3. Items whose size may not be known early enough are placed at the end of the
activation record.
4. We must locate the top-of-stack pointer judiciously.
The calling sequence and its division between caller and callee is as follows:
1. The caller evaluates the actual parameters.
2. The caller stores a return address and the old value of top-sp into the callee's
activation record. The caller then increments top-sp past its own local data and
temporaries and the callee's parameter and status fields.
3. The callee saves the register values and other status information.
4. The callee initializes its local data and begins execution.
Variable-Length Data on the Stack:
In modern languages, objects whose size cannot be determined at compile time are
allocated space in the heap.
However, it is also possible to allocate objects, arrays, or other structures of unknown
size on the stack.
The reason to prefer placing objects on the stack if possible is that we avoid the expense
of garbage collecting their space.
A common strategy for allocating variable-length arrays is to store only a
pointer to each array in the fixed-length part of the activation record; the
array itself is placed on the stack just beyond the record, and a separate
pointer marks the actual top of the stack.
Translation of Expressions
The goal is to generate three-address code for expressions. Assume there is a
function gen() that, given the pieces it needs, does the proper formatting, so
gen(x = y + z) will output the corresponding three-address code. gen() is often
called with addresses rather than lexemes like x. The constructor Temp()
produces a new address in whatever format gen() needs.
Operations within Expressions
The syntax-directed definition below builds up the three-address code for an
assignment statement S using attribute code for S and attributes addr and code for an
expression E. Attributes S.code and E.code denote the three-address code for S and E,
respectively. Attribute E.addr denotes the address that will hold the value of E.
Incremental Translation
The method in the previous section builds up long code strings as we walk the
tree. By using an SDT instead of an SDD, we can emit parts of the code as each
node is processed, as the sketch below illustrates.
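A minimal Python sketch of the incremental scheme: gen() and Temp() are the
helpers named above, while the Expr tree representation is an assumption made
for the example.

# Fresh-temporary generator: t1, t2, t3, ...
_count = 0
def Temp():
    global _count
    _count += 1
    return "t" + str(_count)

def gen(instr):
    # Emit one TAC instruction immediately instead of building long strings.
    print(instr)

class Expr:
    # Interior node: an op with two children; leaf: just an address.
    def __init__(self, op=None, left=None, right=None, addr=None):
        self.op, self.left, self.right, self.addr = op, left, right, addr

def translate(e):
    # Post-order walk: children first, then one instruction for this node.
    if e.op is None:
        return e.addr                      # leaf: a name or constant
    l = translate(e.left)
    r = translate(e.right)
    e.addr = Temp()                        # address holding e's value
    gen(e.addr + " = " + l + " " + e.op + " " + r)
    return e.addr

# x + y * z  prints:  t1 = y * z   and then   t2 = x + t1
translate(Expr("+", Expr(addr="x"),
               Expr("*", Expr(addr="y"), Expr(addr="z"))))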
TYPE CHECKING
A compiler must check that the source program follows both syntactic
and semantic conventions of the source language. This checking, called static
checking, detects and reports programming errors.
Type Systems
Type Expressions
The type of a language construct will be denoted by a “type expression.”
A type expression is either a basic type or is formed by applying an operator
called a type constructor to other type expressions. The sets of basic types and
constructors depend on the language to be checked. The following are the
definitions of type expressions:
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error , will signal an error during type
checking; void denoting “the absence of a value” allows statements to
be checked.
Constructors include:
Arrays : If T is a type expression then array (I,T) is a type expression
denoting the type of an array with elements of type T and index set I.
For example: the declaration var A : array [1..10] of integer associates
with A the type expression array(1..10, integer).
Type systems
A sound type system eliminates the need for dynamic checking, for it allows
us to determine statically that these errors cannot occur when the target
program runs. That is, if a sound type system assigns a type other than
type_error to a program part, then type errors cannot occur when the target
code for the program part is run.
Error Recovery
Since type checking has the potential for catching errors in programs, it is
desirable for the type checker to recover from errors, so it can check the rest
of the input. Error handling has to be designed into the type system right from
the start; the type-checking rules must be prepared to cope with errors.
Heap management:
The heap is used for dynamically allocated memory; it supports allocation and
deallocation operations in any order.
It is the portion of the store that holds data that live indefinitely, or until
the program terminates.
Programming languages such as C++ or Java allow the creation of objects whose
existence is not tied to the procedure that creates them, because objects or
pointers to objects may be passed from procedure to procedure.
These objects are stored on the heap.
The memory manager interfaces the application programs and the operating
system and is responsible for the allocation and deallocation of space within the
heap.
Memory Manager:
The memory manager is the subsystem that allocates and deallocates space
within a heap.
Allocation: when a program needs memory for a variable or object, the memory
manager produces a chunk of contiguous heap memory of the requested size.
If possible, the request is satisfied from free space already in the heap;
otherwise, the manager increases heap storage by acquiring consecutive bytes
of virtual memory from the operating system.
Deallocation: the memory manager returns deallocated chunks to the pool of
free space.
Allocation and deallocation are frequent operations in many programs, and as
such must be as efficient as possible.
During a memory access, the machine first looks for the data in the closest
(lowest) level of the memory hierarchy and, if it is absent there, looks in
the next higher level, and so on.
Program locality:
Most programs spend a lot of time executing a relatively small fraction of
the code while touching only a small fraction of the data; this phenomenon
is known as locality.
A program has temporal locality when the memory locations it accesses are
likely to be accessed again within a short period of time.
It has spatial locality when memory locations close to a location it has
accessed are likely to be accessed within a short period of time.
We can improve temporal and spatial locality by changing the data layout or
the order of computation, as the sketch below illustrates.
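As a simple illustration of changing the order of computation, the two loop
nests below touch the same elements of a flat row-major array (the encoding
is an assumption for the example); the first walks adjacent locations, the
second strides by n elements at every step.

n = 1024
a = [0.0] * (n * n)        # row-major: element (i, j) lives at a[i*n + j]

# Row-major traversal: consecutive iterations touch adjacent locations,
# giving good spatial locality.
for i in range(n):
    for j in range(n):
        a[i * n + j] += 1.0

# Column-major traversal of the same data: each step jumps n elements,
# so accesses that are nearby in time are far apart in memory.
for j in range(n):
    for i in range(n):
        a[i * n + j] += 1.0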
Reducing fragmentation:
When a program begins execution, the heap is one contiguous unit of free
space; as the program allocates and deallocates memory, this space is broken
up into used and free chunks. Free memory chunks are called holes.
Best-fit placement (putting a request in the smallest hole that is large
enough) improves space utilization; however, it might not be best for spatial
locality.
Chunks that are allocated at about the same time tend to be used together, so
placing them close to each other improves the program's spatial locality. A
sketch of best-fit placement appears below.
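A minimal sketch of best-fit placement over a free list of (start, size)
holes; real allocators typically bin free chunks by size rather than scanning
a list, so this is illustrative only.

def best_fit(free_list, request):
    # Find the smallest hole at least as large as the request.
    best = None
    for i, (start, size) in enumerate(free_list):
        if size >= request and (best is None or size < free_list[best][1]):
            best = i
    if best is None:
        return None                      # no hole fits: must grow the heap
    start, size = free_list[best]
    if size == request:
        free_list.pop(best)              # exact fit: the hole disappears
    else:                                # otherwise split it, leaving a
        free_list[best] = (start + request, size - request)  # smaller hole
    return start

holes = [(0, 64), (100, 16), (200, 32)]
print(best_fit(holes, 20))               # allocates at 200, the smallest fit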
Garbage collection:
We need to assume that objects have a type that can be determined by the
garbage collector at run time.
From the type information, we can tell how large the object is and which
components of the object contain references (pointers) to other objects.
A user program, which we shall refer to as the mutator, modifies the
collection of objects in the heap.
The mutator creates objects by acquiring space from the memory manager.
Objects become garbage when the mutator program cannot "reach" them.
The garbage collector finds these unreachable objects and reclaims their
space by handing them to the memory manager, which keeps track of the free
space.
Not all languages are good candidates for automatic garbage collection. For a
garbage collector to work, it must be able to tell whether any given data
element or component of a data element is, or could be used as, a pointer to a
chunk of allocated memory space.
Performance Metrics:
Many different approaches have been proposed over the years, and there is no
one clearly best garbage-collection algorithm. Metrics include:
Pause Time. Simple garbage collectors are famous for causing programs (the
mutators) to pause suddenly for an extremely long time.
Program Locality. We cannot evaluate the speed of a garbage collector solely
by its running time. The garbage collector controls the placement of data and
thus influences the data locality of the mutator program.
Some of these design goals conflict with one another, and tradeoffs must be
made carefully by considering how programs typically behave.
Reachability:
The root set is the set of data that the program can access directly, such as
static variables and variables on the run-time stack. Recursively, any object
with a reference stored in the fields or array elements of a reachable object
is itself reachable.
Reachability becomes a bit more complex when the program has been optimized
by the compiler.
Here are some things an optimizing compiler can do to enable the garbage
collector to find the correct root set:
1. The compiler can restrict the invocation of the garbage collector to
certain code points, where no "hidden" references (e.g., references held only
in registers) are live.
2. The compiler can write out information that the garbage collector can use
to recover all the references, such as specifying which registers contain
references.
1. Reference counting
With reference counting, every object keeps a count of the references to it;
the count is incremented when a new reference to the object is created,
decremented when one is removed, and the object's space is reclaimed when the
count drops to zero. However, data structures often point back to their parent
nodes, or point to each other as cross references.
Consider three objects with references among them, but no references from
anywhere else.
If none of these objects is part of the root set, then they are all garbage,
but their reference counts are each greater than 0.
Such a situation is tantamount to a memory leak.
The advantage of reference counting, on the other hand, is that garbage
collection is performed in an incremental fashion. Even though the total
overhead can be large, the operations are spread throughout the mutator's
computation.
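A minimal sketch of the bookkeeping, assuming objects hold named reference
fields (the Obj class and assign helper are illustrative, not a real
collector's API). Note that the cyclic structures described above keep
nonzero counts and are never reclaimed under this scheme.

class Obj:
    def __init__(self):
        self.ref_count = 0     # number of references held by other objects
        self.fields = {}       # field name -> referenced Obj (or None)

def assign(obj, name, target):
    # Simulate "obj.name = target" while maintaining reference counts.
    if target is not None:
        target.ref_count += 1          # target gains a reference
    old = obj.fields.get(name)
    obj.fields[name] = target
    if old is not None:
        old.ref_count -= 1             # the old referent loses a reference
        if old.ref_count == 0:
            release(old)               # reclaim immediately: the work is
                                       # spread through the computation

def release(obj):
    # Drop the references obj holds (possibly reclaiming more objects),
    # then hand obj's space back to the memory manager (elided here).
    for name in list(obj.fields):
        assign(obj, name, None)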