Syntax Directed Translation in Compiler Design
The parser uses a CFG (context-free grammar) to validate the input string
and produce output for the next phase of the compiler. The output can be
either a parse tree or an abstract syntax tree. To interleave
semantic analysis with the syntax analysis phase of the compiler, we
use Syntax Directed Translation.
The above diagram shows how semantic analysis could happen. The
flow of information happens bottom-up and all the children’s attributes
are computed before parents, as discussed above. Right-hand side
nodes are sometimes annotated with subscript 1 to distinguish between
children and parents.
Additional Information
Synthesized attributes are attributes that depend only on the
attribute values of child nodes.
Thus [ E -> E1+T { E.val = E1.val + T.val } ], where E1 is the child
occurrence of E, has a synthesized attribute val corresponding to node E.
If all the semantic attributes in an augmented grammar are synthesized,
one bottom-up (post-order) depth-first traversal, visiting the children
in any order, is sufficient for the semantic analysis phase.
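As a minimal sketch of that single bottom-up pass, the following Python computes a synthesized attribute val over a small parse tree for the production E -> E + T. The Node class, the eval_val function and the sample tree are hypothetical names introduced here for illustration; they do not come from any particular compiler.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node; leaves carry a value set by the lexer."""
    symbol: str                       # grammar symbol or operator, e.g. "E", "T", "+"
    children: List["Node"] = field(default_factory=list)
    val: Optional[int] = None         # synthesized attribute, filled in bottom-up


def eval_val(node: Node) -> Optional[int]:
    """Post-order traversal: every child's val is computed before the parent's."""
    if not node.children:             # leaf: val is already known
        return node.val
    child_vals = [eval_val(c) for c in node.children if c.symbol != "+"]
    node.val = sum(child_vals)        # rule E.val = E1.val + T.val (or E.val = T.val)
    return node.val


# Parse tree for 2 + 3 + 4 under E -> E + T | T
tree = Node("E", [
    Node("E", [Node("E", [Node("T", val=2)]), Node("+"), Node("T", val=3)]),
    Node("+"),
    Node("T", val=4),
])
result = eval_val(tree)
```

Because each node needs only its children's values, the order in which the children are visited does not affect the result.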
Inherited attributes are attributes that depend on the attributes of the
parent and/or siblings.
Thus [ Ep -> E+T { Ep.val = E.val + T.val, T.val = Ep.val } ], where E
and Ep are the same grammar symbol, annotated to differentiate between
parent and child, has an inherited attribute val corresponding to node
T.
compiler. It also separates the translation concerns from the parsing
concerns, allowing for more modular and extensible compiler designs.
Efficient code generation: SDT enables the generation of efficient
code by optimizing the translation process. It allows for the use of
techniques such as intermediate code generation and code
optimization.
Compiler - Intermediate Code Generation
Source code can be translated directly into target machine code,
so why translate it first into an intermediate code that is then
translated to the target code? Let us
see the reasons why we need an intermediate code.
Intermediate code can be either language specific (e.g., Byte Code
for Java) or language independent (three-address code).
Three-Address Code
For example:
a = b + c * d;
r1 = c * d;
r2 = b + r1;
a = r2
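The translation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expression is given as a nested tuple (op, left, right), and the function name gen_tac and the temporary naming scheme r1, r2, ... are chosen here to mirror the example.

```python
from itertools import count


def gen_tac(target, expr):
    """Return three-address instructions for `target = expr`.

    `expr` is either a variable name or a nested tuple (op, left, right).
    """
    code = []
    temps = count(1)                  # r1, r2, ... as in the example above

    def walk(e):
        if isinstance(e, str):        # a variable is already an address
            return e
        op, left, right = e
        l, r = walk(left), walk(right)
        tmp = f"r{next(temps)}"       # one fresh temporary per operator
        code.append(f"{tmp} = {l} {op} {r}")
        return tmp

    code.append(f"{target} = {walk(expr)}")
    return code


# a = b + c * d, with * binding tighter than +
tac = gen_tac("a", ("+", "b", ("*", "c", "d")))
```

Each emitted instruction has at most one operator and three addresses, which is where the name three-address code comes from.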
Quadruples
Op   arg1   arg2   result
*    c      d      r1
+    b      r1     r2
+    r2     r1     r3
=    r3            a
Triples
Op   arg1   arg2
*    c      d
+    b      (0)
+    (1)    (0)
=    (2)
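To show how temporaries in quadruples turn into positional references in triples, here is a minimal Python sketch that converts the quadruples listed above. The function name to_triples and the tuple layout are assumptions made for illustration. One deliberate difference: the sketch keeps the destination a in the assignment triple (destination as arg1, source as arg2), following a common textbook convention, so that no information is lost.

```python
def to_triples(quads):
    """Rewrite (op, arg1, arg2, result) quadruples as (op, arg1, arg2) triples.

    A temporary is replaced by the index of the triple that computes it.
    """
    defined_at = {}                   # temporary name -> index of defining triple
    triples = []

    def ref(arg):
        return f"({defined_at[arg]})" if arg in defined_at else arg

    for op, arg1, arg2, result in quads:
        if op == "=":
            # Copy: keep the destination as arg1 and the source as arg2.
            triples.append((op, result, ref(arg1)))
        else:
            defined_at[result] = len(triples)
            triples.append((op, ref(arg1), ref(arg2)))
    return triples


quads = [
    ("*", "c", "d", "r1"),
    ("+", "b", "r1", "r2"),
    ("+", "r2", "r1", "r3"),
    ("=", "r3", None, "a"),
]
triples = to_triples(quads)
```

Triples need no result column, but because they refer to each other by position, reordering instructions means renumbering the references.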
Indirect Triples
Declarations
The source programming language and the target machine
architecture may differ in the way names are stored, so relative
addressing is used. The first name is allocated memory starting
from memory location 0 {offset=0}, and each name declared
later is allocated memory immediately after the previous one.
Example:
int a;
float b;
Allocation process:
{offset = 0}
int a;
id.type = int
id.width = 2
float b;
id.type = float
id.width = 4
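The bookkeeping behind this allocation process can be sketched as follows. The widths (2 bytes for int, 4 for float) are taken from the example above; the WIDTH table and the allocate function are hypothetical names, and real targets may use different widths and alignment rules.

```python
# Widths in bytes taken from the example above; real targets may differ.
WIDTH = {"int": 2, "float": 4}


def allocate(declarations):
    """Assign each declared name a relative address (offset from 0)."""
    offset = 0
    table = {}
    for type_name, name in declarations:
        table[name] = {"type": type_name, "width": WIDTH[type_name], "offset": offset}
        offset += WIDTH[type_name]    # offset = offset + id.width
    return table, offset


table, next_free = allocate([("int", "a"), ("float", "b")])
```

Here a is placed at offset 0 and b at offset 2, and the next free offset after both declarations is 6.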
Optimization is a program transformation technique that tries to
improve the code by making it consume fewer resources (i.e. CPU,
memory) and run faster.
The output code must not, in any way, change the meaning of
the program.
Optimization should increase the speed of the program and, if
possible, the program should demand fewer resources.
Optimization should itself be fast and should not delay the
overall compiling process.
Machine-independent Optimization
do
{
    item = 10;
    value = value + item;
} while(value<100);
This code assigns the identifier item again on every iteration. Moving
the assignment out of the loop:
item = 10;
do
{
    value = value + item;
} while(value<100);
should not only save CPU cycles, but can be used on any
processor.
Machine-dependent Optimization
Basic Blocks
Basic blocks are important concepts from both the code generation and
the optimization point of view.
Loop Optimization
Dead code is one or more code statements that are either never
executed or, if executed, produce output that is never used.
There are some code statements whose computed values are used
only under certain circumstances, i.e., sometimes the values are
used and sometimes they are not. Such code is known as partially
dead code.
Partial Redundancy
Partially redundant expressions are computed more than once in a path,
without any change in operands. For example,
if (condition)
{
    a = y OP z;
}
else
{
    ...
}
c = y OP z;
We assume that the values of operands (y and z) are not changed
from assignment of variable a to variable c. Here, if the condition
statement is true, then y OP z is computed twice, otherwise once.
Code motion can be used to eliminate this redundancy, as shown
below:
if (condition)
{
    ...
    tmp = y OP z;
    a = tmp;
    ...
}
else
{
    ...
    tmp = y OP z;
}
c = tmp;
[t0 = a + b]
[t1 = t0 + c]
[d = t0 + t1]
Peephole Optimization
MOV x, R0
MOV R0, R1
We can delete the first instruction and rewrite the sequence as:
MOV x, R1
Unreachable code
Unreachable code is a part of the program code that is never
accessed because of programming constructs. Programmers may
have accidentally written a piece of code that can never be reached.
Example:
int add_ten(int x)
{
    return x + 10;
    printf("value of x is %d", x);
}
In this code segment, the printf statement will never be executed as
the program control returns back before it can execute,
hence printf can be removed.
There are instances in a code where the program control jumps back
and forth without performing any significant task. These jumps can
be removed. Consider the following chunk of code:
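...
MOV R1, R2
GOTO L1
...
L1 : GOTO L2
L2 : INC R1
In this code, label L1 only passes control on to L2, so the jump to L1
can go directly to L2 and L1 can be removed:
...
MOV R1, R2
GOTO L2
...
L2 : INC R1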
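A peephole pass for this jump-to-jump case can be sketched as below. The instruction format (a list of label/instruction pairs) and the name thread_jumps are assumptions made for this illustration. The sketch only retargets jumps; deleting the now unreferenced L1 line is left out, because that is safe only when nothing else can reach it.

```python
def thread_jumps(code):
    """Retarget every `GOTO Lx` whose label Lx holds only another GOTO.

    `code` is a list of (label_or_None, instruction) pairs.
    """
    # Labels whose own instruction is an unconditional jump.
    forward = {label: instr.split()[1]
               for label, instr in code
               if label and instr.startswith("GOTO ")}

    def final_target(label):
        seen = set()
        while label in forward and label not in seen:   # guard against cycles
            seen.add(label)
            label = forward[label]
        return label

    out = []
    for label, instr in code:
        if instr.startswith("GOTO "):
            instr = "GOTO " + final_target(instr.split()[1])
        out.append((label, instr))
    return out


before = [
    (None, "MOV R1, R2"),
    (None, "GOTO L1"),
    ("L1", "GOTO L2"),
    ("L2", "INC R1"),
]
after = thread_jumps(before)
```

After the pass, the second instruction jumps straight to L2, which matches the rewritten sequence above.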
Strength reduction
There are operations that consume more time and space. Their
‘strength’ can be reduced by replacing them with other operations
that consume less time and space, but produce the same result.
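A commonly cited instance, used here as an assumption since the text does not name one, is replacing integer multiplication by a power of two with a left shift. The Python below only checks that the two forms give the same result; any speed gain depends on the target machine and is not demonstrated by this sketch.

```python
def times_eight_mul(x: int) -> int:
    return x * 8                      # multiplication


def times_eight_shift(x: int) -> int:
    return x << 3                     # same result for integers, since 8 == 2**3


# The two forms agree on every integer tried, including negatives.
agree = all(times_eight_mul(x) == times_eight_shift(x) for x in range(-1000, 1001))
```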
Code Generator
Ordering of instructions: Finally, the code generator decides
the order in which the instructions will be executed. It creates
schedules for instructions to execute them.
Descriptors
The code generator has to track both the registers (for availability)
and addresses (location of values) while generating the code. For
both of them, the following two descriptors are used:
For an instruction x = y OP z, the code generator may perform the
following actions. Let us assume that L is the location (preferably
register) where the output of y OP z is to be saved:
2. Syntax Analysis: Adds information regarding attribute
type, scope, dimension, line of reference, use, etc. to
the table.
3. Semantic Analysis: Uses available information in the
table to check semantics, i.e. to verify that
expressions and assignments are semantically
correct (type checking), and updates the table accordingly.
4. Intermediate Code generation: Refers to the symbol table
to find out how much and what type of run-time storage is
allocated; the table also helps in adding temporary variable
information.
5. Code Optimization: Uses information present in the
symbol table for machine-dependent optimization.
6. Target Code generation: Generates code by using
address information of identifier present in the table.
Symbol Table entries – Each entry in the symbol table is associated
with attributes that support the compiler in different phases.
Use of Symbol Table-
Symbol tables are typically used in compilers. A compiler is basically
a program that scans the application program (for instance, your C
program) and produces machine code.
During this scan the compiler stores the identifiers of that application
program in the symbol table. These identifiers are stored in the form of
name, value, address, type.
Here the name represents the name of the identifier, value represents the
value stored in the identifier, the address represents the memory location of
that identifier and type represents the data type of the identifier.
Thus the compiler can keep track of all the identifiers with all the necessary
information.
Items stored in Symbol table:
Variable names and constants
Procedure and function names
Literal constants and strings
Compiler generated temporaries
Labels in source languages
Information used by the compiler from Symbol table:
Data type and name
Declaring procedures
Offset in storage
If structure or record then, a pointer to structure table.
For parameters, whether parameter passing by value or by
reference
Number and type of arguments passed to function
Base Address
Operations of Symbol table – The basic operations defined on a
symbol table include:
[Figure: symbol table stored as a list of entries: id1 info1, id2 info2, ..., id_n info_n]
While inserting a new name we must ensure that it is not
already present; otherwise an error occurs, i.e. “Multiple
defined names”.
Insertion is fast O(1), but lookup is slow for large tables – O(n)
on average
The advantage is that it takes a minimum amount of space.
1. Linked List –
This implementation uses a linked list. A link field is
added to each record.
Names are searched in the order given by the
link fields.
A pointer “First” is maintained to point to the first
record of the symbol table.
Insertion is fast O(1), but lookup is slow for large
tables – O(n) on average
2. Hash Table –
In the hashing scheme, two tables are maintained – a hash
table and a symbol table. Hashing is the most commonly
used method to implement symbol tables.
A hash table is an array with an index range of 0 to table
size – 1. Its entries are pointers to the
names in the symbol table.
To search for a name we use a hash function that
yields an integer between 0 and table size – 1.
Insertion and lookup can be made very fast – O(1).
The advantage is that quick search is possible and the
disadvantage is that hashing is complicated to
implement.
3. Binary Search Tree –
Another approach to implementing a symbol table is to
use a binary search tree, i.e. we add two link fields,
left and right child, to each record.
Each name is inserted as a descendant of the root node so that the
tree always satisfies the binary search tree property.
Insertion and lookup are O(log2 n) on average.
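To make the trade-offs above concrete, here is a minimal Python sketch of a hash-table symbol table in which each bucket is a linked list of entries, so it touches both the linked-list and the hashing schemes. The class names Entry and SymbolTable, the stored fields and the simple character-sum hash function are all assumptions chosen for illustration; a production compiler would use a stronger hash and handle scopes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    name: str
    type_name: str
    address: int
    link: Optional["Entry"] = None    # link field chaining entries in one bucket


class SymbolTable:
    def __init__(self, size: int = 16):
        self.buckets: List[Optional[Entry]] = [None] * size

    def _index(self, name: str) -> int:
        # Hash function yielding an integer between 0 and table size - 1.
        return sum(ord(ch) for ch in name) % len(self.buckets)

    def lookup(self, name: str) -> Optional[Entry]:
        entry = self.buckets[self._index(name)]
        while entry is not None:      # follow the link fields in order
            if entry.name == name:
                return entry
            entry = entry.link
        return None

    def insert(self, name: str, type_name: str, address: int) -> Entry:
        if self.lookup(name) is not None:
            raise KeyError(f"multiply defined name: {name}")
        i = self._index(name)
        entry = Entry(name, type_name, address, link=self.buckets[i])
        self.buckets[i] = entry       # insert at the head of the chain
        return entry


table = SymbolTable(size=4)
table.insert("a", "int", 0)
table.insert("b", "float", 2)
table.insert("e", "int", 6)           # lands in the same bucket as "a" when size is 4
```

Lookup cost depends on chain length, so the O(1) figure holds only while the table is large enough relative to the number of names.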
Advantages of Symbol Table
1. The efficiency of a program can be increased by using symbol
tables, which give quick and simple access to crucial data such
as variable and function names, data types, and memory
locations.
2. Better coding structure: Symbol tables can be used to organize
and simplify code, making it simpler to comprehend, discover,
and correct problems.
3. Faster code execution: By offering quick access to information
like memory addresses, symbol tables can be utilized to
optimize code execution by lowering the number of memory
accesses required during execution.
4. Symbol tables can be used to increase the portability of code
by offering a standardized method of storing and retrieving
data, which can make it simpler to migrate code between other
systems or programming languages.
5. Improved code reuse: By offering a standardized method of
storing and accessing information, symbol tables can be utilized
to increase the reuse of code across multiple projects.
6. Symbol tables can be used to facilitate easy access to and
examination of a program’s state during execution, enhancing
debugging by making it simpler to identify and correct
mistakes.
Disadvantages of Symbol Table
1. Increased memory consumption: Systems with low memory
resources may suffer from symbol tables’ high memory
requirements.
2. Increased processing time: The creation and processing of
symbol tables can take a long time, which can be problematic
in systems with constrained processing power.
3. Complexity: Developers who are not familiar with compiler
design may find symbol tables difficult to construct and
maintain.
4. Limited scalability: Symbol tables may not be appropriate for
large-scale projects or applications that require the
management of enormous amounts of data due to their limited
scalability.
5. Upkeep: Maintaining and updating symbol tables on a regular
basis can be time- and resource-consuming.
6. Limited functionality: It’s possible that symbol tables don’t
offer all the features a developer needs, and therefore more
tools or libraries will be needed to round out their capabilities.
Applications of Symbol Table
1. Resolution of variable and function names: Symbol tables
are used to identify the data types and memory locations of
variables and functions as well as to resolve their names.
2. Resolution of scope issues: To resolve naming conflicts and
ascertain the range of variables and functions, symbol tables
are utilized.
3. Symbol tables, which offer quick access to information such as
memory locations, are used to optimize code execution.
4. Code generation: By giving details like memory locations and
data types, symbol tables are utilized to create machine code
from source code.
5. Error checking and code debugging: By supplying details
about the status of a program during execution, symbol tables
are used to check for faults and debug code.
6. Code organization and documentation: By supplying details
about a program’s structure, symbol tables can be used to
organize code and make it simpler to understand.