Three Address Code (TAC) : Addresses and Instructions


Three Address Code (TAC)

TAC can range from high- to low-level, depending on the choice of operators. In general,
a TAC statement contains at most three addresses, or operands. The general form is x := y op
z, where "op" is an operator, x is the result, and y and z are operands; x, y, and z may be
variables, constants, or compiler-generated "temporaries". TAC is a linearized
representation of a binary syntax tree in which explicit names correspond to the interior
nodes of the tree. For a looping statement, for example, the syntax tree represents the
components of the statement, whereas the three-address code contains labels and jump
instructions to represent the flow of control, as in machine language.

A TAC instruction has at most one operator on the RHS of an instruction; no built-up
arithmetic expressions are permitted. e.g. x + y * z can be translated as
t1 = y * z
t2 = x + t1
where t1 and t2 are compiler-generated temporary names.
Since it unravels multi-operator arithmetic expressions and nested control-
flow statements, it is useful for target code generation and optimization.
Addresses and Instructions
• TAC consists of a sequence of instructions, each instruction may have up to three
addresses, prototypically t1 = t2 op t3
• Addresses may be one of:
– A name. Each name is a symbol table index. For convenience, we write
names as identifiers.
– A constant.
– A compiler-generated temporary. Each time a temporary address is
needed, the compiler generates another name from the stream t1, t2, t3,
etc.
• Temporary names make it easy for code optimizations to move
instructions around
• At target-code generation time, these names will be allocated to
registers or to memory.
• TAC Instructions
– Symbolic labels will be used by instructions that alter the flow of control.
The instruction addresses of labels will be filled in later.
L: t1 = t2 op t3
– Assignment instructions: x = y op z
• Includes binary arithmetic and logical operations
– Unary assignments: x = op y
• Includes unary arithmetic op (-) and logical op (!) and type
conversion
– Copy instructions: x = y
– Unconditional jump: goto L
• L is a symbolic label of an instruction
– Conditional jumps:
• if x goto L: if x is true, execute instruction L next
• ifFalse x goto L: if x is false, execute instruction L next
– Conditional jumps: if x relop y goto L
– Procedure calls. For a procedure call p(x1, …, xn):
    param x1
    …
    param xn
    call p, n
– Function calls: y = p(x1, …, xn) is implemented as the same param sequence
followed by y = call p, n; the callee returns the value via return y
– Indexed copy instructions: x = y[i] and x[i] = y
• Left: sets x to the value in the location i memory units beyond y
• Right: sets the contents of the location i memory units beyond x to
y
– Address and pointer instructions:
• x = &y sets the value of x to be the location (address) of y.
• x = *y, presumably y is a pointer or temporary whose value is a
location. The value of x is set to the contents of that location.
• *x = y sets the value of the object pointed to by x to the value of y.
Example: Given the statement do i = i+1; while (a[i] < v);, the TAC can be written
in two ways, using either symbolic labels or the position numbers of instructions as
labels.
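A sketch of both forms (assuming, as is conventional, 8-byte array elements, so the
multiplication by 8 computes the offset of a[i]):

With a symbolic label:
    L:  i = i + 1
        t1 = i * 8
        t2 = a [ t1 ]
        if t2 < v goto L

With position numbers:
    100:  i = i + 1
    101:  t1 = i * 8
    102:  t2 = a [ t1 ]
    103:  if t2 < v goto 100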
Three Address Code Representations
Data structures for representation of TAC can be objects or records with fields for
operator and operands. Representations include quadruples, triples and indirect triples.
Quadruples
• In the quadruple representation, there are four fields for each instruction: op,
arg1, arg2, result
– Binary ops have the obvious representation
– Unary ops don't use arg2
– Operators like param don't use either arg2 or result
– Jumps put the target label into result
• The quadruples in Fig (b) implement the three-address code in (a) for the
expression a = b * - c + b * - c
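A sketch of the three-address code in (a) and the quadruples in (b), writing minus
for unary minus:

Three-address code:
    t1 = minus c
    t2 = b * t1
    t3 = minus c
    t4 = b * t3
    t5 = t2 + t4
    a = t5

Quadruples:
        op      arg1   arg2   result
    0   minus   c             t1
    1   *       b      t1     t2
    2   minus   c             t3
    3   *       b      t3     t4
    4   +       t2     t4     t5
    5   =       t5            a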

Triples
• A triple has only three fields for each instruction: op, arg1, arg2.
• The result of an operation x op y is referred to by its position.
• Triples are equivalent to signatures of nodes in a DAG or syntax tree.
• Triples and DAGs are equivalent representations only for expressions; they are
not equivalent for control flow.
• Ternary operations like x[i] = y require two entries in the triple structure;
similarly for x = y[i].
• Moving an instruction around during optimization is a problem, because the
result of an operation is referred to by its position, and positions change when
instructions are reordered.

Example: Representations of a = b * - c + b * - c
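A sketch of the triple representation for this expression; a parenthesized number
refers to the result of the triple at that position:

        op      arg1   arg2
    0   minus   c
    1   *       b      (0)
    2   minus   c
    3   *       b      (2)
    4   +       (1)    (3)
    5   =       a      (4)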
Indirect Triples
These consist of a listing of pointers to triples, rather than a listing of the triples
themselves. An optimizing compiler can move an instruction by reordering the
instruction list, without affecting the triples themselves.
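A sketch of the indirect triple representation: the instruction list holds pointers
to the triples, so the optimizer reorders only the list (the position numbers 35-40
are illustrative):

    instruction list        triples
    35: (0)             0   minus   c
    36: (1)             1   *       b     (0)
    37: (2)             2   minus   c
    38: (3)             3   *       b     (2)
    39: (4)             4   +       (1)   (3)
    40: (5)             5   =       a     (4)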

Static Single-Assignment Form


Static single-assignment form (SSA) is an intermediate representation that facilitates
certain code optimizations. Two distinctive aspects distinguish SSA from three-address
code.
• All assignments in SSA are to variables with distinct names; hence static single-
assignment.
• Φ-function: the same variable may be defined along two different control-flow
paths. For example, given

    if ( flag ) x = -1; else x = 1;
    y = x * a;

using a Φ-function this can be written as

    if ( flag ) x1 = -1; else x2 = 1;
    x3 = Φ(x1, x2);
    y = x3 * a;

The Φ-function returns the value of its argument that corresponds to the control-
flow path that was taken to reach the assignment statement containing the Φ-
function.
Types and Declarations
Types
A type typically denotes a set of values and a set of operations allowed on those values.
Applications of types include type checking and translation.
Certain operations are legal for each type. For example, it doesn't make sense to add a
function pointer and an integer in C, but it does make sense to add two integers; yet
both additions may have the same assembly-language implementation!
A language's Type System specifies which operations are valid for which types.
Type Checking is the process of verifying fully typed programs. Given an operation and
an operand of some type, it determines whether the operation is allowed. The goal of
type checking is to ensure that operations are used with the correct types. It uses logical
rules to reason about the behavior of a program and enforces intended interpretation
of values.
Type Inference is the process of filling in missing type information. Given the type of
operands, determine the meaning of the operation and the type of the operation; or,
without variable declarations, infer type from the way the variable is used.
Components of a Type System
• Built-in types
• Rules for constructing new types
• Rules for determining if two types are equivalent
• Rules for inferring the types of expressions
Type Expressions
Types have structure, which we represent using type expressions. A type expression is
either a basic type or is formed by applying type constructors to other type expressions.
Example: the array type int[2][3] has the type expression array(2, array(3, integer)),
where array is the constructor and takes two parameters, a number and a type.
Definition of Type Expressions
• A basic type is a type expression. Typical basic types for a language include
boolean, char, integer, float, and void.
• A type name is a type expression.
• A type expression can be formed by applying the array type constructor to a
number and a type expression.
• A record is a data structure with named fields. A type expression can be formed
by applying the record type constructor to the field names and their types.
• A type expression can be formed by using the type constructor → for function
types. We write s → t for "function from type s to type t".
• If s and t are type expressions, then their Cartesian product s × t is a type
expression. Products can be used to represent a list or tuple of types (e.g., for
function parameters).
• Type expressions may contain variables whose values are type expressions.
Representing Type Expressions
▪ Construct a DAG for type expression adapting the value-number method.
▪ Interior nodes represent type constructors.
▪ Leaves represent basic types, type names, and type variables.
Type graphs are a graph-structured representation of type expressions:
– Basic types are given predefined “internal values”;
– Named types can be represented via pointers into a hash table.
– A composite type expression f (T1,…,Tn) is represented as a node
identifying the constructor f and with pointers to the nodes for T1, …, Tn.
E.g.: the type graph for the type expression of int x[10][20], namely array(10, array(20, int)), is shown below.
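Sketched in text, since the figure is not reproduced:

        array
        /   \
      10    array
            /   \
          20    int

A hypothetical C node layout for such a type graph (a sketch only; the field names
are illustrative, not from the text):

    struct TypeNode {
        enum { BASIC, NAME, ARRAY, RECORD, FUNC, PROD } op; /* leaf kind or constructor */
        int size;                   /* number of elements, for ARRAY nodes */
        struct TypeNode *kids[2];   /* operand type nodes, e.g., the element type */
    };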

Type Equivalence
The two types of type equivalence are structural equivalence and name equivalence.
Structural equivalence: When type expressions are represented by graphs, two
types are structurally equivalent if and only if one of the following conditions is
true:
1. They are the same basic type
2. They are formed by applying the same constructor to structurally
equivalent types.
3. One is a type name that denotes the other
Name equivalence: If type names are treated as standing for themselves, the
first two conditions above lead to name equivalence of type expressions.
Example 1: in the Pascal fragment

    type p = node;
         q = node;
    var x : p;
        y : q;

x and y are structurally equivalent, but not name-equivalent.
Example 2: Given the declarations

    type t1 = array [1..10] of integer;
    type t2 = array [1..10] of integer;

are they name-equivalent? No, because they have different type names.
Example 3: Given the declarations

    type vector = array [1..10] of real;
         weight = array [1..10] of real;
    var x, y : vector;
        z : weight;
Name Equivalence: When they have the same name.
– x, y have the same type; z has a different type.
Structural Equivalence: When they have the same structure.
– x, y, z have the same type.

Declarations
We study types and declarations using a simplified grammar that declares a single
name at a time. We use the following grammar (the standard simplified declarations
grammar, matching the descriptions below):

    D → T id ; D | ε
    T → B C | record '{' D '}'
    B → int | float
    C → ε | [ num ] C

The non-terminal D generates a sequence of declarations. Non-terminal T generates
basic, array, or record types. The non-terminal B generates a basic type, either int
or float. The non-terminal C generates strings of zero or more integers, each
surrounded by brackets.
An array type consists of the basic type specified by B, followed by array components
specified by the non-terminal C. A record type is a sequence of declarations for the
fields of the record, surrounded by curly braces.

Stack Allocation of Space


Almost all compilers for languages that use procedures, functions, or methods as units of
user-defined actions manage at least part of their run-time memory as a stack.
Each time a procedure is called, space for its local variables is pushed onto a stack, and
when the procedure terminates, that space is popped off the stack.
Activation Trees:
Stack allocation would not be feasible if procedure calls, or activations of procedures,
did not nest in time.
We can therefore represent the activations of procedures during the running of an entire
program by a tree, called an activation tree.

Each execution of a procedure is referred to as an activation of the procedure.
The lifetime of an activation is the sequence of steps in the execution of the procedure.
A procedure is recursive if a new activation begins before an earlier activation of the
same procedure has ended.
An activation tree shows the way control enters and leaves activations.
Properties of activation trees are:
• Each node represents an activation of a procedure.
• The root shows the activation of the main function.
• The node for procedure ‘x’ is the parent of node for procedure ‘y’
if and only if the control flows from procedure x to procedure y.
• Example: Consider the following Quicksort program:

    main() {
        int n;
        readarray();
        quicksort(1, n);
    }

    quicksort(int m, int n) {
        int i = partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n);
    }
• The activation tree for this program will be:
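A text sketch of the tree (the figure is not reproduced; i stands for the value
returned by partition, and each recursive quicksort activation has a similar
subtree of its own, omitted here):

    main
    ├── readarray
    └── quicksort(1, n)
        ├── partition(1, n)
        ├── quicksort(1, i-1)
        └── quicksort(i+1, n)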


First the main function is activated as the root; main then calls readarray and
quicksort. Quicksort in turn calls partition and quicksort again.
The flow of control in a program corresponds to a depth-first traversal of the
activation tree, starting at the root.
Control Stack and Activation Records:
Whenever a procedure is executed, its activation record is stored on a stack called the control stack.

The control stack, or runtime stack, is used to keep track of the live procedure
activations, i.e., procedures whose execution has not yet been completed.
A procedure name is pushed on to the stack when it is called (activation
begins) and it is popped when it returns (activation ends).
Information needed by a single execution of a procedure is managed using
an activation record, or frame.
When a procedure is called, an activation record is pushed onto the stack,
and as soon as control returns to the caller, the activation record is popped.
The contents of activation records vary with the language being implemented.
Here is a list of the kinds of data that might appear in an activation record.

A general activation record consist of the following things:


• Local variables: hold the data that is local to the execution of the
procedure.
• Temporary values: stores the values that arise in the evaluation
of an expression.
• Machine status: holds the information about status of machine
just before the function call.
• Access link (optional): refers to non-local data held in other
activation records.
• Control link (optional): points to activation record of caller.
• Return value: used by the called procedure to return a value to the
calling procedure.
• Actual parameters: used by the calling procedure to supply parameters
to the called procedure.
• Control stack for the above quicksort example:
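A text sketch of the control stack just after the second-level call quicksort(1, i-1)
has begun (the figure is not reproduced; the stack grows downward):

    main
    quicksort(1, n)
    quicksort(1, i-1)   <- activation currently executing (top of stack)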

Calling Sequences:
Procedure calls are implemented by what are known as calling sequences, which consist
of code that allocates an activation record on the stack and enters information into its
fields.
Calling sequences and the layout of activation records may differ greatly, even among
implementations of the same language. The code in a calling sequence is often divided
between the calling procedure (the "caller") and the procedure it calls (the "callee").
In general, if a procedure is called from n different points, then the portion of the calling
sequence assigned to the caller is generated n times.
When designing calling sequences and the layout of activation records, the following
principles are helpful:
1. Values communicated between caller and callee are generally placed at the beginning
of the callee's activation record, so they are as close as possible to the caller's activation
record.
2. Fixed-length items are generally placed in the middle; such items typically include the
control link, the access link, and the machine status fields.
3. Items whose size may not be known early enough are placed at the end of the
activation record.
4. We must locate the top-of-stack pointer judiciously.

The calling sequence and its division between caller and callee is as follows:
1. The caller evaluates the actual parameters.
2. The caller stores a return address and the old value of top-sp into the callee's
activation record. The caller then increments top-sp past its own local data and
temporaries and the callee's parameter and machine-status fields.
3. The callee saves the register values and other status information.
4. The callee initializes its local data and begins execution.
Variable-Length Data on the Stack:
In modern languages, objects whose size cannot be determined at compile time are
allocated space in the heap.
However, it is also possible to allocate objects, arrays, or other structures of unknown
size on the stack.
The reason to prefer placing objects on the stack if possible is that we avoid the expense
of garbage collecting their space.
A common strategy for allocating variable-length arrays is to store a pointer to each
such array in the fixed-length part of the activation record, while the array data itself
is allocated on the stack above the record.

Translation of Expressions
The goal is to generate three-address code for expressions. Assume there is a function gen()
that, given the pieces it needs, does the proper formatting, so gen(x = y + z) will
output the corresponding three-address code. gen() is often called with addresses rather
than lexemes like x. The constructor Temp() produces a new address in whatever
format gen needs.
Operations within Expressions
The syntax-directed definition below builds up the three-address code for an
assignment statement S using attribute code for S and attributes addr and code for an
expression E. Attributes S.code and E.code denote the three-address code for S and E,
respectively. Attribute E.addr denotes the address that will hold the value of E.
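A sketch of that definition, following the standard textbook treatment (|| denotes
concatenation of code fragments, and top.get looks up a name's address in the symbol
table; these names are conventional, not fixed by the text above):

    S → id = E ;    S.code = E.code || gen(top.get(id.lexeme) '=' E.addr)
    E → E1 + E2     E.addr = new Temp()
                    E.code = E1.code || E2.code || gen(E.addr '=' E1.addr '+' E2.addr)
    E → - E1        E.addr = new Temp()
                    E.code = E1.code || gen(E.addr '=' 'minus' E1.addr)
    E → ( E1 )      E.addr = E1.addr; E.code = E1.code
    E → id          E.addr = top.get(id.lexeme); E.code = '' (the empty string)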
Incremental Translation
The method in the previous section builds up long code strings as we walk the tree. By
using an SDT instead of an SDD, we can emit pieces of the code as each node is
processed.

Addressing Array Elements


The idea is that you associate the base address with the array name. That is, the offset
stored in the identifier table is the address of the first element of the array.
For one dimensional arrays, this is especially easy: The address increment is the width
of each element times the index (assuming indexes start at 0). So the address of A[i] is
the base address of A plus i times the width of each element of A.
Two Dimensional Arrays
Let us assume row major ordering. That is, the first element stored is A[0,0], then
A[0,1], ... A[0,k-1], then A[1,0], ... . Modern languages use row major ordering.
With the alternative column major ordering, after A[0,0] comes A[1,0], A[2,0], ... .
For two-dimensional arrays, the address of A[i,j] is the sum of three terms:
1. The base address of A.
2. The distance from A to the start of row i.
3. The distance from the start of row i to element A[i,j].
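A worked example (assuming 4-byte integers): for int A[10][20] stored in row-major
order,

    addr(A[i][j]) = base + (i * 20 + j) * 4

so A[2][3] lies (2 * 20 + 3) * 4 = 172 bytes past the base address.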
Grammar for Expressions With Array References
S → id = E ; | L = E ;
E → E1 + E2 | id | L
L → id [ E ] | L1 [ E ]
This grammar generates expressions of the form a = b, a = b + c, a = b[i], a[j] = c,
a[j] = b[k], a = b[c[i]], and a[i] = b[i][j][k].
Nonterminal L has three synthesized attributes:
• L.addr - a temporary that is used while computing the offset for the array reference
by summing the terms i_j x w_j (index times width, for each dimension j)
• L.array is a pointer to the symbol table entry for the array name.
– L.array.base is used to determine the actual l-value of an array
reference after all the index expressions are analyzed.
• L.type - the type of the subarray generated by L.
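For instance, a reference such as a = b[i] (assuming 4-byte elements, so the width w
is 4) would translate along these lines:

    t1 = i * 4        t1 is L.addr, the computed offset
    t2 = b [ t1 ]     b comes from L.array.base
    a = t2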

TYPE CHECKING
A compiler must check that the source program follows both syntactic
and semantic conventions of the source language. This checking, called static
checking, detects and reports programming errors.

Some examples of static checks:

1. Type checks - A compiler should report an error if an operator is applied to
an incompatible operand. Example: an array variable and a function variable
are added together.

2. Flow-of-control checks - Statements that cause flow of control to leave a
construct must have some place to which to transfer the flow of control.
Example: a break statement must be enclosed within a construct such as a
while or switch statement; it is an error if no such enclosing statement exists.
Fig. 2.6 Position of type checker
A type checker verifies that the type of a construct matches the type expected
by its context. For example, the arithmetic operator mod in Pascal requires integer
operands, so a type checker verifies that the operands of mod have type integer.
Type information gathered by a type checker may be needed when code is
generated.

Type Systems

The design of a type checker for a language is based on information about the
syntactic constructs in the language, the notion of types, and the rules for
assigning types to language constructs.
For example: "if both operands of the arithmetic operators +, - and * are of
type integer, then the result is of type integer".

Type Expressions
The type of a language construct will be denoted by a “type expression.”
A type expression is either a basic type or is formed by applying an operator
called a type constructor to other type expressions. The sets of basic types and
constructors depend on the language to be checked. The following are the
definitions of type expressions:
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error , will signal an error during type
checking; void denoting “the absence of a value” allows statements to
be checked.

2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression.

Constructors include:
Arrays: If T is a type expression, then array(I, T) is a type expression
denoting the type of an array with elements of type T and index set I.

Products: If T1 and T2 are type expressions, then their Cartesian product
T1 X T2 is a type expression.

Records: The difference between a record and a product is that the fields
of a record have names. The record type constructor will be applied to a
tuple formed from field names and field types.

For example, the Pascal fragment

    type row = record
        address: integer;
        lexeme: array[1..15] of char
    end;
    var table: array[1..101] of row;

declares the type name row, representing the type expression
record((address X integer) X (lexeme X array(1..15, char))), and the
variable table to be an array of records of this type.

Pointers: If T is a type expression, then pointer(T) is a type expression
denoting the type "pointer to an object of type T".
For example, var p: ↑row declares variable p to have type pointer(row).
Functions: A function in a programming language maps a domain type
D to a range type R. The type of such a function is denoted by the type
expression D → R.
4. Type expressions may contain variables whose values are type
expressions.
Fig. 5.7 Tree representation for char x char → pointer(integer)
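Sketched in text, since the figure is not reproduced:

          →
         / \
        x   pointer
       / \      |
    char char integer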

Type systems

A type system is a collection of rules for assigning type expressions to
the various parts of a program. A type checker implements a type system; it is
specified in a syntax-directed manner. Different type systems may be used by
different compilers or processors of the same language.

Static and Dynamic Checking of Types

Checking done by a compiler is said to be static, while checking done
when the target program runs is termed dynamic. Any check can be done
dynamically if the target code carries the type of an element along with the
value of that element.

Sound type system

A sound type system eliminates the need for dynamic checking for type errors,
because it allows us to determine statically that these errors cannot occur
when the target program runs. That is, if a sound type system assigns a type
other than type_error to a program part, then type errors cannot occur when
the target code for the program part is run.

Strongly typed language

A language is strongly typed if its compiler can guarantee that the
programs it accepts will execute without type errors.

Error Recovery

Since type checking has the potential for catching errors in a program, it
is desirable for a type checker to recover from errors, so that it can check the
rest of the input. Error handling has to be designed into the type system right
from the start; the type-checking rules must be prepared to cope with errors.

Heap management:
The heap is used for dynamically allocated memory and supports the allocation
and deallocation operations.
It is the portion of the store that holds data that lives indefinitely, or until the
program terminates.
Programming languages such as C++ or Java allow the creation of objects whose
existence is not tied to the procedure that creates them, because objects, or
pointers to objects, may be passed from procedure to procedure.
These objects are stored on the heap.
The memory manager is the interface between the application program and the
operating system, and is responsible for the allocation and deallocation of space
within the heap.
Memory Manager:
The memory manager is the subsystem that allocates and deallocates space
within a heap.

It is responsible for tracking all free space at all times, and it has two functions,
allocation and deallocation.

Allocation: when a program needs memory for a variable or object, the memory
manager produces a chunk of contiguous heap memory of the requested size. If a
chunk of the requested size is available, the request is satisfied from the free heap
space; otherwise, the manager increases the heap storage by acquiring consecutive
bytes of virtual memory from the operating system. If no space is available at all,
the memory manager passes this information back to the application program.

Deallocation: the deallocated space is returned to the pool of free space, so that it
can be reused to satisfy future allocation requests. This deallocated memory is
never returned to the operating system.

Properties of a memory manager:

• Program efficiency - It should make good use of the memory subsystem so as to
support fast execution of programs. Programs exhibit locality, a non-random,
clustered way of accessing memory; by exploiting locality, the manager can make
better use of space, which may in turn improve the speed of execution of the program.
• Space efficiency - It should minimize the heap space needed by a program, so as
to allow larger programs to run in a fixed virtual address space. This is achieved
by reducing fragmentation.
• Low overhead - Overhead is the fraction of execution time spent performing
allocation and deallocation; since these are frequent operations in many programs,
they must be as efficient as possible.

The Memory Hierarchy of the Computer:

Modern machines are designed so that programmers can write correct programs
without concern for the memory subsystem.

The efficiency of a program depends not only on the number of instructions
executed, but also on the time taken to execute each instruction.

The time to execute an instruction is influenced by the time taken to access memory
locations, which can vary from nanoseconds to milliseconds.

The variance in memory access times is caused by a hardware limitation: we can
build small and fast storage, or large and slow storage, but not storage that is
both large and fast, i.e., it is impossible to build a memory with gigabytes of
storage and fast access times.

The memory hierarchy

As can be seen from the hierarchy, it is a series of storage elements, with smaller,
faster ones closer to the processor and larger, slower ones further from the
processor.

A processor has a small number of registers, whose contents are controlled by the
software.

During a memory access, the machine first looks for the data closest to the
processor (the lowest level); if the data is absent there, it looks in the next
higher level, and so on.

Caches are managed by the hardware so as to keep up with fast RAM access times.
Disks are relatively slow, and virtual memory is managed by the operating system,
assisted by hardware structures.

Data is transferred in contiguous blocks; to amortize the cost of access, larger
blocks are used with the slower levels of the hierarchy.
Between main memory and cache, data is transferred in blocks called cache lines,
32 to 256 bytes long.
Between virtual memory (disk) and main memory, data is transferred in blocks
called pages, 4KB to 64KB in size.

Program locality:
Most programs spend a lot of time executing a relatively small fraction of the
code while touching only a small fraction of the data; this phenomenon is called
locality.
Temporal locality means that the memory locations a program accesses are likely
to be accessed again within a short period of time.
Spatial locality means that memory locations close to a location just accessed
are likely to be accessed within a short period of time.

"Programs spend 90% of their time executing 10% of the code"

Locality allows us to take advantage of the memory hierarchy by placing the most
common instructions and data in the fast, small storage while leaving the rest in
the slow, large storage; this lowers the memory access times of a program
significantly.
Optimizing using the memory hierarchy:
Placing the most frequently used instructions in the cache works well.

We improve spatial locality by having the compiler place basic blocks (sequences
of instructions that are always executed sequentially) on the same page, or even
the same cache line.

We can improve temporal and spatial locality further by changing the data layout
or the order of computation.

Reducing fragmentation:
When a program begins execution, the heap is one contiguous unit of free space.
As the program allocates and deallocates memory, this space is broken up into
used and free chunks. We refer to the free chunks of memory as holes.

Best-Fit and First-Fit Object Placement:

We can reduce fragmentation by controlling the placement of objects in the heap.
A good strategy is to allocate the requested memory in the smallest hole that is
large enough; this is best-fit.
The best-fit algorithm spares the large holes so that they can satisfy subsequent
larger requests.
Alternatively, first-fit places an object in the first (lowest-address) hole that
fits.

Best-fit improves space utilization, but it might not be best for spatial locality:
chunks allocated at about the same time tend to be accessed together, so placing
them close to each other improves the program's spatial locality. A sketch of the
best-fit search follows.
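A minimal sketch of a best-fit search over a free list in C (the Chunk layout and
function name are hypothetical, not from the text):

    #include <stddef.h>

    /* Each free chunk records its size and a link to the next free chunk. */
    struct Chunk {
        size_t size;
        struct Chunk *next;
    };

    /* Best-fit: scan the entire free list and remember the smallest hole
       that is still large enough for the request. */
    struct Chunk *best_fit(struct Chunk *free_list, size_t request) {
        struct Chunk *best = NULL;
        for (struct Chunk *c = free_list; c != NULL; c = c->next) {
            if (c->size >= request && (best == NULL || c->size < best->size))
                best = c;
        }
        return best;   /* NULL if no hole is large enough */
    }

First-fit, by contrast, would simply return the first chunk with c->size >= request.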

Managing and coalescing free space:

When an object is deallocated manually, the memory manager must make its chunk
free so that it can be allocated again.
It may also be possible to combine (coalesce) the chunk with adjacent free chunks
to form a larger chunk.
Data structures that support coalescing of adjacent blocks include:

• Boundary tags: we keep information (a free/used bit) at both the low and
high ends of each free or allocated chunk, which tells us whether the block
is currently allocated. A count of the total number of bytes in the chunk is
kept adjacent to each bit.
• Doubly linked, embedded free list: we link the free chunks in a doubly
linked list, with the pointers for the list stored within the blocks themselves,
adjacent to the boundary tags at their ends. A sketch follows.
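A sketch of how a free chunk might be laid out in C (a hypothetical layout; in
practice the free/used bit is often packed into the low bit of the size word):

    #include <stddef.h>

    /* Layout of a free chunk: a boundary tag at each end plus embedded
       doubly linked free-list pointers. */
    struct FreeChunk {
        size_t head;             /* chunk size in bytes; low bit = free/used tag */
        struct FreeChunk *prev;  /* previous chunk on the free list */
        struct FreeChunk *next;  /* next chunk on the free list */
        /* ... remainder of the chunk (user data area when allocated) ... */
        /* a matching size_t foot tag sits at the high end of the chunk */
    };

When a chunk is freed, the tags of the chunks immediately below and above it in
memory can be inspected in constant time; if a neighbor is also free, the two
chunks are coalesced and the free list is spliced accordingly.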
Introduction to Garbage Collection

Data that cannot be referenced is generally known as garbage.

Many high-level programming languages remove the burden of manual memory
management from the programmer by offering automatic garbage collection.

Design Goals for Garbage Collectors:

Garbage collection is the reclamation of chunks of storage holding objects that
can no longer be accessed by a program.

We need to assume that objects have a type that can be determined by the garbage
collector at run time. From the type information, we can tell how large the object
is and which components of the object contain references (pointers) to other objects.

A user program, which we shall refer to as the mutator, modifies the collection
of objects in the heap. The mutator creates objects by acquiring space from the
memory manager.

Objects become garbage when the mutator program cannot "reach" them.
The garbage collector finds these unreachable objects and reclaims their space by
handing them to the memory manager, which keeps track of the free space.

Not all languages are good candidates for automatic garbage collection. For a
garbage collector to work, it must be able to tell whether any given data
element or component of a data element is, or could be used as, a pointer to a
chunk of allocated memory space.

A language in which the type of any data component can be determined is said
to be type safe.

Some type-safe languages, like Java, have types that cannot be determined at
compile time but can be determined at run time.

Unsafe languages, which unfortunately include some of the most important
languages such as C and C++, are bad candidates for automatic garbage
collection: in unsafe languages, memory addresses can be manipulated
arbitrarily.

Performance Metrics:

Garbage collection is often so expensive that, although it was invented decades
ago and absolutely prevents memory leaks, it has yet to be adopted by many
mainstream programming languages. Many different approaches have been proposed
over the years, and there is no single clearly best garbage-collection algorithm.
The relevant metrics are:

• Overall Execution Time. Garbage collection can be very slow; its performance
is determined greatly by how it leverages the memory subsystem.
• Space Usage. It is important that garbage collection avoid fragmentation and
make the best use of the available memory.
• Pause Time. Simple garbage collectors are famous for causing programs (the
mutators) to pause suddenly for an extremely long time.
• Program Locality. We cannot evaluate the speed of a garbage collector solely
by its running time: the garbage collector controls the placement of data and
thus influences the data locality of the mutator program.

Some of these design goals conflict with one another, and tradeoffs must be
made carefully by considering how programs typically behave.

Reachability:

The root set is all the data that can be accessed directly by the program,
without having to dereference any pointer. Recursively, any object whose
reference is stored in a field or array element of any reachable object is
itself reachable.

Reachability becomes a bit more complex when the program has been optimized
by the compiler. Here are some things an optimizing compiler can do to enable
the garbage collector to find the correct root set:

1. The compiler can restrict the invocation of garbage collection to certain
code points in the program, when no "hidden" references exist.

2. The compiler can write out information that the garbage collector can use
to recover all the references, such as specifying which registers contain
references.

The set of reachable objects changes as a program executes: it grows as new
objects are created and shrinks as objects become unreachable.

It is important to remember that once an object becomes unreachable, it
cannot become reachable again.

There are two basic ways to find unreachable objects:

1. Reference counting

2. Trace-based garbage collection

Reference Counting Garbage Collectors:

With a reference-counting garbage collector, every object must have a field
for the reference count.

Reference counts can be maintained as follows:

1. Object Allocation. The reference count of the new object is set to 1.

2. Parameter Passing. The reference count of each object passed into a
procedure is incremented.

3. Reference Assignments. For statement u = v, where u and v are references,
the reference count of the object referred to by v goes up by one, and the
count for the old object referred to by u goes down by one. A sketch of this
rule in C follows.
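A sketch of the assignment rule u = v in C (the Object type and reclaim function
are hypothetical; incrementing first matters, so the object is not freed
prematurely when u and v already refer to the same object):

    typedef struct Object {
        int refcount;
        /* ... the object's own fields ... */
    } Object;

    void reclaim(Object *o);  /* hand the chunk back to the memory manager */

    /* Implements the reference assignment u = v. */
    void assign(Object **u, Object *v) {
        if (v != NULL)
            v->refcount++;               /* new referent gains a reference */
        if (*u != NULL && --(*u)->refcount == 0)
            reclaim(*u);                 /* old referent may now be garbage */
        *u = v;
    }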

Reference counting has two main disadvantages: it cannot collect unreachable
cyclic data structures, and it is expensive.

Cyclic data structures arise quite easily; data structures often point back to
their parent nodes, or point to each other as cross references.

The figure shows three objects with references among them, but no references
from anywhere else.
If none of these objects is part of the root set, then they are all garbage,
yet their reference counts are each greater than 0.
Such a situation is tantamount to a memory leak.
The advantage of reference counting, on the other hand, is that garbage
collection is performed in an incremental fashion: even though the total
overhead can be large, the operations are spread throughout the mutator's
computation.

Reference counting is a particularly attractive algorithm when timing deadlines
must be met, as well as for interactive applications where long, sudden pauses
are unacceptable. Another advantage is that garbage is collected immediately,
keeping space usage low.
