Unit 1
The variables x and y are both local variables. The scope of x is limited to the main() function, while the scope of y is
limited to the block within the if statement. The values of these variables are not accessible outside of their respective
scopes.
2. Global Variable:
In compiler design, a global variable is a variable that is accessible throughout the program and has a global scope.
Global variables are typically used to store values that are needed by multiple functions or blocks within the program.
Global variables are declared outside of any function or block and can be accessed from anywhere within the program.
The variable `global_var` is a global variable. It is declared outside of any function or block and can be accessed from
anywhere within the program. The functions `function1()` and `function2()` both modify the value of `global_var`, and
the final value of `global_var` is returned by the `main()` function.
The activation tree is constructed at runtime as a program executes and function calls are made. Each node in the tree
represents an activation record for a particular function call, and the edges between nodes represent the nesting of
function calls.
The root node of the activation tree represents the initial call to the program, and each child node represents a function
call made during the execution of the program. The nodes are arranged in a tree structure because each function call
can have multiple nested function calls, which in turn can have their own nested function calls.
2. Stack Allocation: This involves allocating memory for local variables and function parameters on the program stack.
The memory is automatically released when the function returns. This strategy is simple and efficient but limited by the
size of the stack.
3. Heap Allocation: This involves allocating memory for dynamically allocated variables at runtime, in C through functions
like malloc(), and releasing it with free(). The memory is not automatically released, and the heap can become fragmented,
so this strategy is often less efficient than the others.
4. Object Pooling: This involves pre-allocating a fixed number of objects of a certain type and then reusing them as
needed during program execution. This strategy can be efficient for objects with short lifetimes.
5. Garbage Collection: This involves automatically reclaiming memory that is no longer being used by the program. This
strategy can be more flexible than other strategies but can be slower and less predictable in terms of performance.
The choice of allocation strategy depends on the requirements of the program, including factors such as the size of the
program, the types of data being used, and the performance requirements.
Parameter passing and storage allocation are two important aspects of the compilation process that are essential for the
efficient execution of the compiled code. In this answer, we will discuss these two concepts in detail.
Parameter Passing:
Parameter passing is the process by which arguments are passed to a function when it is called. There are several ways
in which parameters can be passed to a function, including:
1. Pass-by-value: In this method, the value of the argument is copied and passed to the function. Any changes made to
the parameter inside the function do not affect the value of the argument outside the function.
2. Pass-by-reference: In this method, a reference to the memory location of the argument is passed to the function. Any
changes made to the parameter inside the function affect the value of the argument outside the function.
3. Pass-by-pointer: In this method, a pointer to the memory location of the argument is passed to the function. The
function can then access the value of the argument through the pointer.
4. Pass-by-name: In this method, the parameter is not evaluated until it is used inside the function. This allows for lazy
evaluation of arguments.
The choice of parameter passing method can have a significant impact on the performance of the compiled code.
Pass-by-value is generally the simplest method, but it can result in unnecessary copying of large data structures.
Pass-by-reference and pass-by-pointer can be more efficient, but they require care to avoid issues such as unintended
modification of the caller's data and dangling pointers.
Storage Allocation:
Storage allocation is the process by which memory is allocated to variables and data structures in the compiled code.
There are several methods of storage allocation, including:
1. Static allocation: In this method, memory is allocated to variables at compile time. The memory remains allocated for
the entire lifetime of the program.
2. Stack allocation: In this method, memory is allocated to variables on a stack at runtime. The memory is automatically
deallocated when the function that allocated it returns.
3. Heap allocation: In this method, memory is allocated to variables on a heap at runtime. The memory remains
allocated until it is explicitly deallocated by the program.
4. Register allocation: In this method, variables are stored in registers rather than memory. This can result in faster
access times, but the number of available registers is limited.
The choice of storage allocation method can have a significant impact on the performance and memory usage of the
compiled code. Static allocation is generally the simplest method, but it can result in wasted memory for unused
variables. Stack allocation is efficient for small data structures, but it can lead to stack overflow errors for large data
structures. Heap allocation allows for dynamic memory management, but it can result in memory leaks if not used
carefully. Register allocation can be very efficient, but it is limited by the number of available registers.
Explain structural, linear, and hybrid types of intermediate representations with examples.
In compiler design, intermediate representation (IR) is an abstract representation of the program code that is used by the
compiler to perform various optimization and transformation operations. There are three types of intermediate
representations: structural, linear, and hybrid. Let's discuss each of them in detail:
1. Structural IR: Structural IR represents the program code as a graph or tree-like structure, with nodes representing various
constructs of the programming language (such as statements, expressions, functions, etc.) and edges representing the
relationships between them. Examples of structural IR include abstract syntax trees (ASTs), control flow graphs (CFGs), and
data flow graphs (DFGs). For example, consider the following C code:
```
int a = 10;
int b = 20;
int c = a + b;
```
The abstract syntax tree (AST) for the last statement, `c = a + b`, would be:
```
    =
   / \
  c   +
     / \
    a   b
```
2. Linear IR: Linear IR represents the program code as a sequence of instructions that can be executed in order. Examples of
linear IR include three-address code (TAC), quadruples, and bytecode. For example, the TAC for the above C code would be:
```
a = 10
b = 20
t1 = a + b
c = t1
```
3. Hybrid IR: Hybrid IR combines the features of both structural and linear IRs, representing the program code as a
combination of graph-like structures and sequences of instructions. A common example is LLVM IR, in which each function is
a control flow graph of basic blocks and each basic block holds a linear sequence of instructions. For example, consider the
following Java code:
```
public int sum(int a, int b) {
int c = a + b;
return c;
}
```
A corresponding LLVM IR for this method could look like:
```
define i32 @sum(i32 %a, i32 %b) {
entry:
%c = add i32 %a, %b
ret i32 %c
}
```
In this example, the LLVM IR represents the program code as a combination of basic blocks (graph-like structures) and
sequences of instructions within each basic block (linear IR).
In summary, structural IRs are useful for analyzing the structure of the program code, linear IRs are useful for performing
optimizations and transformations on the program code, and hybrid IRs provide a balance between the two. The choice of IR
depends on the requirements of the compiler and the programming language being compiled.
| Static Allocation | Heap Allocation |
|---|---|
| In this type of allocation, memory is allocated on the basis of the size of data objects. | In this type of allocation, a heap is maintained for the memory allocation at run time. |
| Static allocation is a simple way but not an efficient way of memory management. | Heap allocation is an efficient way of memory management. |
| In an array, static allocation is used for allocating the memory. | In the linked list, heap allocation is used for allocating the memory. |
| In this type of allocation, there is no chance of creating dynamic data structures and objects. | In this type of allocation, dynamic data structures and objects are created. |
| This type of allocation is comparatively cheaper and easy to implement. | This type of allocation is comparatively expensive and difficult to implement. |
An activation tree is a data structure used to keep track of function and procedure activations in a program. It is a
hierarchical representation of the execution stack, where each node in the tree corresponds to an activation record for a
function or procedure. The activation record contains information about the function or procedure, such as local
variables, parameters, return address, and saved registers.
The root of the activation tree corresponds to the main program, and each child of a node represents a function or
procedure called by that node. When a function or procedure is called, a new activation record is created and pushed
onto the control stack, and a new node is added to the activation tree. When the function or procedure returns, its
activation record is popped from the stack. The node itself stays in the tree, which records every activation made during
the run; at any moment, the stack holds the activation records on the path from the root to the currently executing node.
The activation tree describes how the generated code keeps track of the variables and parameters of each function or
procedure. It is also useful when reasoning about error handling, debugging, and optimization.
A symbol table manager is a component of a compiler or interpreter that manages the symbol table, which is a data
structure used to store information about the identifiers (such as variables, functions, and classes) used in a program.
The symbol table manager is responsible for creating, updating, and searching the symbol table. When a new identifier is
encountered in the program, the symbol table manager creates a new entry in the symbol table and stores information
such as the name, type, scope, and address of the identifier. When the identifier is used later in the program, the symbol
table manager retrieves the information from the symbol table and uses it for type checking, code generation, and other
tasks.
The symbol table manager may also perform other tasks such as name resolution, scope management, and error
checking. It is an important component of a compiler or interpreter, as it enables the compiler to keep track of the
identifiers used in the program and ensure that they are used correctly.
1. Call by reference:
In call by reference, the address of the actual parameter is passed to the formal parameter. This means that any changes
made to the formal parameter inside the procedure or function will also affect the actual parameter. C has no reference
parameters, so the effect is obtained by passing pointers explicitly. For example, consider the following code:
```
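#include <stdio.h>

/* Swap the two integers that x and y point to. */
void swap(int *x, int *y) {
    int temp = *x;
    *x = *y;
    *y = temp;
}

int main(void) {
    int a = 5, b = 10;
    swap(&a, &b);   /* pass the addresses of a and b */
    printf("a = %d, b = %d\n", a, b);
    return 0;
}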
```
In the above example, we are passing the addresses of variables a and b to the swap function. Inside the function, the values
of the variables are swapped using their addresses. As a result, when we print the values of a and b in the main function, we
get the output as "a = 10, b = 5".
2. Call by value:
In call by value, a copy of the actual parameter is passed to the formal parameter. This means that any changes made to the
formal parameter inside the procedure or function will not affect the actual parameter. For example, consider the following
code:
```
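#include <stdio.h>

/* x is a copy of the argument, so the caller's variable is not modified. */
void increment(int x) {
    x = x + 1;
    printf("x inside function = %d\n", x);
}

int main(void) {
    int a = 5;
    increment(a);   /* passes the value of a, not its address */
    printf("a = %d\n", a);
    return 0;
}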
```
In the above example, we are passing the value of variable a to the increment function. Inside the function, the value of x is
incremented by 1. However, since we are passing the value of a and not its address, the value of a remains unchanged outside
the function. As a result, when we print the value of a in the main function, we get the output as "a = 5".
3. Call by name:
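In call by name, the formal parameter is replaced with the actual parameter expression each time it is used inside the
function, so the expression is re-evaluated at every use. This means that any changes made to the formal parameter inside
the function will also affect the actual parameter, and side effects in the argument expression can occur more than once.
C does not support call by name for functions, but macro expansion behaves in a similar way. For example, consider the
following code: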
```
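#include <stdio.h>

/* Each use of a or b in the body is replaced by the argument text. */
#define min(a, b) ((a) < (b) ? (a) : (b))

int main(void) {
    int a = 5, b = 10;
    /* Expands to ((a++) < (b++) ? (a++) : (b++)) */
    int m = min(a++, b++);
    printf("a = %d, b = %d, min = %d\n", a, b, m);
    return 0;
}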
```
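In the above example, we are using the macro min to find the minimum of two values. The macro performs textual
substitution, which behaves like call by name: the expressions (a++) and (b++) are evaluated each time they appear in the
expanded body, ((a++) < (b++) ? (a++) : (b++)). The comparison 5 < 10 is true and increments both a and b; the selected
branch (a++) then yields 6 and increments a again. As a result, when we print the values of a, b, and the minimum in the
main function, we get the output as "a = 7, b = 11, min = 6".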
Advantages
It supports recursion.
It creates a data structure for the data item dynamically.
A symbol table is a data structure used by a compiler to store information about the symbols used in the
source code. It is used in the semantic analysis, code generation, and optimization phases of a compiler.
Various data structures are used for symbol table organization. Some of them are:
Linear List: In this method, the symbol table is represented as a linear list of entries, where each entry
corresponds to a symbol in the source code. Each entry contains the symbol name, its type, and other
attributes. This method is simple and easy to implement, but it is not efficient for large symbol tables.
Hash Table: A hash table is a data structure that provides fast access to the symbol table. In this method, each
symbol name is mapped to a hash value, which is used as an index in the table. With a good hash function, the hash
table provides constant-time access on average and is efficient for large symbol tables.
Binary Search Tree: In this method, the symbol table is organized as a binary search tree. The nodes in the tree
are sorted based on the symbol name. Each node in the tree contains the symbol name, its type, and other
attributes. The binary search tree provides efficient searching and insertion of symbols on average, but
performance degrades toward that of a linear list if the tree becomes unbalanced.
AVL Tree: An AVL tree is a self-balancing binary search tree. In this method, the symbol table is organized as an
AVL tree. The nodes in the tree are sorted based on the symbol name. Each node in the tree contains the
symbol name, its type, and other attributes. The AVL tree provides efficient searching and insertion of symbols
in the symbol table, and it ensures that the tree remains balanced.
Red-Black Tree: A red-black tree is a self-balancing binary search tree. In this method, the symbol table is
organized as a red-black tree. The nodes in the tree are sorted based on the symbol name. Each node in the
tree contains the symbol name, its type, and other attributes. The red-black tree provides efficient searching
and insertion of symbols in the symbol table, and it ensures that the tree remains balanced.
These data structures are used by the compiler to organize the symbol table efficiently. The choice of data
structure depends on the size of the symbol table, the operations to be performed, and the efficiency require