AT&CD Unit 3
UNIT-III
Semantics: Syntax directed translation, S-attributed and L-attributed grammars, Intermediate
code – abstract syntax tree, translation of simple statements and control flow statements.
Context Sensitive features – Chomsky hierarchy of languages and recognizers. Type checking,
type conversions, equivalence of type expressions, overloading of functions and operations.
Intermediate code
Source code can be translated directly into target machine code, so why translate it into an
intermediate code first, which is only then translated into target code? The reasons for using
an intermediate code are as follows.
If a compiler translates the source language to its target machine language without having
the option for generating intermediate code, then for each new machine, a full native
compiler is required.
Intermediate code eliminates the need for a new full compiler for every unique machine by
keeping the analysis portion the same for all the compilers.
The second part of the compiler, synthesis, is changed according to the target machine.
It becomes easier to improve code performance by applying code optimization techniques
to the intermediate code.
Intermediate Representation
Intermediate code can be represented in a variety of ways, each with its own benefits.
High Level IR - High-level intermediate code representation is very close to the source
language itself. It can be easily generated from the source code, and code modifications
that enhance performance are easy to apply to it. But for target machine optimization, it
is less preferred.
Low Level IR - This one is close to the target machine, which makes it suitable for
register and memory allocation, instruction set selection, etc. It is good for machine-
dependent optimizations.
Intermediate code can be either language specific (e.g., Byte Code for Java) or language
independent (three-address code).
Three-Address Code
The intermediate code generator receives its input from the preceding phase, the semantic
analyzer, in the form of an annotated syntax tree. That syntax tree can then be converted into a
linear representation, e.g., postfix notation. Intermediate code tends to be machine-independent
code. Therefore, the code generator assumes an unlimited number of storage locations (registers)
when generating code.
For example:
a = b + c * d;
The intermediate code generator will try to divide this expression into sub-expressions
and then generate the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2;
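The decomposition above can be sketched in a few lines of Python. This is only an illustrative sketch: the tuple-based tree encoding and the names gen_tac and translate_assignment are assumptions made for the example, not part of any particular compiler.

```python
from itertools import count

def gen_tac(node, code, temps):
    """Append three-address instructions for node to code; return the name holding its value."""
    if isinstance(node, str):          # leaf: an identifier is already a usable name
        return node
    op, left, right = node             # interior node: (operator, left subtree, right subtree)
    left_name = gen_tac(left, code, temps)
    right_name = gen_tac(right, code, temps)
    temp = f"r{next(temps)}"           # assume an unlimited supply of temporaries
    code.append(f"{temp} = {left_name} {op} {right_name}")
    return temp

def translate_assignment(target, expr):
    """Translate 'target = expr' into a list of three-address instructions."""
    code = []
    result = gen_tac(expr, code, count(1))
    code.append(f"{target} = {result}")
    return code

# a = b + c * d, with '*' nested below '+' because it binds tighter
tac = translate_assignment("a", ("+", "b", ("*", "c", "d")))
print("\n".join(tac))
```

The sub-expression c * d is emitted first because the tree is walked bottom-up, which is why r1 is defined before r2 uses it.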
o In the parse tree, most leaf nodes are the only child of their
parent node.
o In the syntax tree, we can eliminate this extra information.
o A syntax tree is a variant of the parse tree. In the syntax tree, interior
nodes are operators and leaves are operands.
o A syntax tree is usually used to represent a program in a tree
structure.
A sentence id + id * id would have the following syntax tree:
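As a rough textual stand-in for the tree, the following Python sketch builds a syntax tree for id + id * id and prints it with indentation. The class names Leaf and Op and the small parser are hypothetical helpers written for this illustration; the parser assumes only the operators '+' and '*', with '*' binding tighter.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    name: str                 # operand, e.g. "id"

@dataclass
class Op:
    symbol: str               # operator, e.g. "+" or "*"
    left: "Node"
    right: "Node"

Node = Union[Leaf, Op]

def parse(tokens):
    """Build a syntax tree from a token list; '*' binds tighter than '+'."""
    pos = 0

    def factor():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return Leaf(tok)

    def term():
        nonlocal pos
        node = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = Op("*", node, factor())
        return node

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = Op("+", node, term())
        return node

    return expr()

def show(node, depth=0):
    """Return the tree as indented lines: operators are interior nodes, operands are leaves."""
    label = node.name if isinstance(node, Leaf) else node.symbol
    lines = ["  " * depth + label]
    if isinstance(node, Op):
        lines += show(node.left, depth + 1) + show(node.right, depth + 1)
    return lines

tree = parse("id + id * id".split())
print("\n".join(show(tree)))
```

The root is '+', its left child is the leaf id, and its right child is the '*' node with two id leaves.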
E.CODE and S.CODE are the sequences of three-address statements generated for E and S.
E.TRUE is the label to which control flows if E is true.
E.FALSE is the label to which control flows if E is false.
The code for E generates a jump to E.TRUE if E is true and a jump to S.NEXT if E is false.
∴ E.FALSE = S.NEXT in the following table.
In the following table, a new label is allocated to E.TRUE.
After S1.CODE has been executed, control jumps to the statement following S, i.e.,
to S1.NEXT.
∴ S1.NEXT = S.NEXT.
Syntax Directed Translation for "If E then S1."
If E is true, control goes to E.TRUE, i.e., S1.CODE is executed, and S.NEXT
appears directly after S1.CODE.
If E is false, then S2.CODE is executed.
Initially, both E.TRUE and E.FALSE are taken as new labels. When S1.CODE at label E.TRUE has
been executed, control jumps to S.NEXT.
Therefore, after S1, control jumps to the statement following the complete statement S.
∴ S1.NEXT = S.NEXT
Similarly, after S2.CODE, the statement following S is executed.
∴ S2.NEXT = S.NEXT
Syntax Directed Translation for "If E then S1 else S2."
Production                       Semantic Rule
S → if E then S1 else S2         E.TRUE = newlabel;
                                 E.FALSE = newlabel;
                                 S1.NEXT = S.NEXT;
                                 S2.NEXT = S.NEXT;
                                 S.CODE = E.CODE || GEN(E.TRUE '−') || S1.CODE ||
                                          GEN(goto S.NEXT) ||
                                          GEN(E.FALSE '−') || S2.CODE
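The semantic rule for S.CODE can be sketched in Python as list concatenation. This is a minimal sketch under stated assumptions: the function name gen_if_else, the label names L1, L2 and Lnext, and the form of E.CODE (a conditional jump followed by an unconditional one) are all choices made for this illustration.

```python
from itertools import count

def gen_if_else(cond, s1_code, s2_code, s_next, labels):
    """Assemble S.CODE for 'if E then S1 else S2' from its parts."""
    e_true = f"L{next(labels)}"     # E.TRUE = newlabel
    e_false = f"L{next(labels)}"    # E.FALSE = newlabel
    # Assumed form of E.CODE: jump to E.TRUE if the condition holds, else to E.FALSE
    e_code = [f"if {cond} goto {e_true}", f"goto {e_false}"]
    return (e_code
            + [f"{e_true}:"] + s1_code      # GEN(E.TRUE label) || S1.CODE
            + [f"goto {s_next}"]            # GEN(goto S.NEXT), skips the else part
            + [f"{e_false}:"] + s2_code)    # GEN(E.FALSE label) || S2.CODE

if_else_code = gen_if_else("a < b", ["x = 1"], ["x = 2"], "Lnext", count(1))
print("\n".join(if_else_code))
```

No jump is needed after S2.CODE in this sketch because S2.NEXT = S.NEXT and the code at S.NEXT is assumed to follow immediately.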
S → while E do S1
Another important control statement is while E do S1, i.e., statement S1 is executed as long as
expression E is true. Control leaves the loop as soon as expression E becomes false.
A label S.BEGIN is created which points to the first instruction for E. Label E.TRUE
is attached to the first instruction for S1. If E is true, control jumps to the label E.TRUE and
S1.CODE is executed. If E is false, control jumps to E.FALSE. After S1.CODE,
control jumps back to S.BEGIN, where E.CODE is evaluated again.
∴ S1.NEXT = S.BEGIN
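The layout described above can be sketched as follows. The function name gen_while and the label names are hypothetical, and the sketch assumes E.FALSE = S.NEXT (the usual choice, so that a false condition leaves the loop) and the same two-jump form of E.CODE as in a simple relational test.

```python
from itertools import count

def gen_while(cond, s1_code, s_next, labels):
    """Assemble S.CODE for 'while E do S1'."""
    s_begin = f"L{next(labels)}"    # S.BEGIN = newlabel, marks the first instruction of E
    e_true = f"L{next(labels)}"     # E.TRUE = newlabel, marks the first instruction of S1
    e_false = s_next                # assumption: E.FALSE = S.NEXT
    return ([f"{s_begin}:",
             f"if {cond} goto {e_true}",   # assumed form of E.CODE
             f"goto {e_false}",
             f"{e_true}:"]
            + s1_code
            + [f"goto {s_begin}"])         # S1.NEXT = S.BEGIN: re-test the condition

while_code = gen_while("i < n", ["i = i + 1"], "Lnext", count(1))
print("\n".join(while_code))
```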
Type 3: Regular Grammar: Type-3 grammars generate regular languages. These languages
are exactly all languages that can be accepted by a finite-state automaton. Type 3 is the most
restricted form of grammar.
Type 3 productions must have one of the following forms, where V is a variable
(non-terminal) and T is a terminal:
V --> VT | T (left-regular grammar)
(or)
V --> TV | T (right-regular grammar)
For example:
S --> a
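To make the restriction concrete, the following Python sketch tests whether a grammar is right-regular, i.e., whether every production has the form V --> TV or V --> T. The dictionary encoding of the grammar and the function name is_right_regular are assumptions made for this example.

```python
def is_right_regular(productions, nonterminals):
    """Return True if every production body is 'T' or 'T V' (terminal, then optional variable)."""
    for bodies in productions.values():
        for body in bodies:
            if len(body) == 1 and body[0] not in nonterminals:
                continue                       # V --> T
            if (len(body) == 2 and body[0] not in nonterminals
                    and body[1] in nonterminals):
                continue                       # V --> TV
            return False                       # any other shape is not right-regular
    return True

# S --> aS | b   is right-regular
regular = is_right_regular({"S": [("a", "S"), ("b",)]}, {"S"})
# S --> aSb | ab is not: a terminal follows the variable
not_regular = is_right_regular({"S": [("a", "S", "b"), ("a", "b")]}, {"S"})
print(regular, not_regular)
```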
Type checking
Type checking is the process of verifying and enforcing the type constraints on values.
A compiler must check that the source program follows the syntactic and semantic
conventions of the source language, and it must also check the type rules of the language.
Type checking allows the programmer to limit which types may be used in certain circumstances
and assigns types to values. The type checker determines whether these values are used
appropriately or not.
It checks the types of objects and reports a type error in the case of a violation; where
the language permits, an incorrect type is corrected by a conversion. Every language has its
own set of type rules, and every compiler for that language has to follow them while compiling
a program.
The information about data types such as INTEGER, FLOAT, CHARACTER, and all the
other data types is maintained and computed by the compiler. The type checker is the module
of the compiler whose task is type checking.
Conversion - Conversion from one type to another is said to be implicit if it is done
automatically by the compiler. Implicit type conversions are also called coercions, and
coercion is limited in many languages.
Example: An integer may be implicitly converted to a real, but a real is not implicitly
converted to an integer.
Conversion is said to be explicit if the programmer must write something to cause the conversion.
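The difference between the two kinds of conversion can be sketched as a small decision function. The type names, the function name conversion_kind, and the choice to allow exactly one coercion (integer to real) are assumptions taken from the example above, not the rules of any specific language.

```python
# Conversions the (hypothetical) compiler may insert on its own
IMPLICIT = {("integer", "real")}
# Conversions allowed only when the programmer asks for them
EXPLICIT_ONLY = {("real", "integer")}

def conversion_kind(source_type, target_type, explicit=False):
    """Classify the conversion needed to use a source_type value as target_type."""
    if source_type == target_type:
        return "none"
    if (source_type, target_type) in IMPLICIT:
        return "coercion"                  # done automatically by the compiler
    if explicit and (source_type, target_type) in EXPLICIT_ONLY:
        return "explicit"                  # the programmer wrote the conversion
    raise TypeError(f"cannot convert {source_type} to {target_type}")

print(conversion_kind("integer", "real"))                 # widening is coerced
print(conversion_kind("real", "integer", explicit=True))  # narrowing must be requested
```

Calling conversion_kind("real", "integer") without explicit=True raises TypeError in this sketch, mirroring the rule that a real is not implicitly converted to an integer.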
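Tasks of the type checker:
1. It has to ensure that indexing is applied only to an array.
2. It has to check the range of the data types used, for example:
   o INTEGER (int), when stored in 16 bits, has a range of -32,768 to +32,767.
   o FLOAT, in 32-bit single precision, has a magnitude range of about 1.2E-38 to 3.4E+38.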
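The two tasks listed above can be sketched as simple checks. The tuple encoding of array types and the function names check_index and check_int_range are hypothetical, and the integer limits are the 16-bit values quoted in the list.

```python
INT_MIN, INT_MAX = -32768, 32767   # 16-bit INTEGER range quoted above

def check_index(base_type, index_type):
    """Indexing is allowed only on an array; return the element type."""
    if not (isinstance(base_type, tuple) and base_type[0] == "array"):
        raise TypeError("indexing is only allowed on an array")
    if index_type != "integer":
        raise TypeError("array index must be an integer")
    return base_type[1]            # ("array", element_type) -> element_type

def check_int_range(value):
    """Report a constant that does not fit in the INTEGER range."""
    if not INT_MIN <= value <= INT_MAX:
        raise OverflowError(f"{value} is outside {INT_MIN}..{INT_MAX}")
    return value

element = check_index(("array", "real"), "integer")
print(element, check_int_range(32767))
```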
Types of Type Checking:
There are two kinds of type checking:
1. Static Type Checking.
2. Dynamic Type Checking.
Static Type Checking: Static type checking is type checking performed at compile
time. It checks the types of variables at compile time, which means the type of each variable
must be known at compile time. It generally examines the program text during the translation
of the program. Using the type rules of the language, a compiler can infer from the source text
that a function (fun) will be applied to an operand (a) of the right type each time the
expression fun(a) is evaluated.
The Benefits of Static Type Checking:
Runtime Error Protection.
Overloading:
An overloaded symbol is one that has different meanings depending on its context.
Overloading is of two types:
1. Operator Overloading
2. Function Overloading
Operator Overloading: In mathematics, the addition operator '+' in the arithmetic expression
"x+y" is overloaded, because '+' denotes different operations when x and y are
integers, reals, complex numbers, or matrices.
Example: In Ada, the parentheses '()' are overloaded: the expression A(i) can denote the i-th
element of an array A, a call to function A with argument i, or an
explicit conversion of expression i to type A. In most languages the arithmetic operators are
overloaded.
Function Overloading: The type checker resolves function overloading based on the number
and types of the arguments.
Example:
E --> E1(E2)
{
E.type := if E2.type = s and
E1.type = s --> t then t
else type_error
}
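The rule above reads: if E1 has a function type s --> t and the argument E2 has type s, the application has type t. A minimal Python sketch follows; the tuple encoding of function types and the names fn, apply_type and apply_overloaded are assumptions for illustration. The second function shows one simple way an overloaded name could be resolved, by keeping only the candidate whose argument type matches.

```python
def fn(arg_type, result_type):
    """Encode the function type  arg_type --> result_type  as a tuple."""
    return ("->", arg_type, result_type)

def apply_type(e1_type, e2_type):
    """Type of E1(E2): t if E1.type = s --> t and E2.type = s, else type_error."""
    is_function = (isinstance(e1_type, tuple) and len(e1_type) == 3
                   and e1_type[0] == "->")
    if is_function and e1_type[1] == e2_type:
        return e1_type[2]
    return "type_error"

def apply_overloaded(e1_types, e2_type):
    """Resolve an overloaded E1: exactly one candidate must accept E2's type."""
    results = [apply_type(t, e2_type) for t in e1_types]
    results = [r for r in results if r != "type_error"]
    return results[0] if len(results) == 1 else "type_error"

print(apply_type(fn("integer", "real"), "integer"))
print(apply_overloaded([fn("integer", "integer"), fn("real", "real")], "real"))
```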
cell = record
info: type;
next: link;
end;
The corresponding type graph has a cycle. So, to decide the structural equivalence of two
types represented by graphs, Pascal compilers put a mark on each visited node (in order not to
visit a node twice). In C, a linked list is usually defined as follows.
struct cell {
int info;
struct cell *next;
};
To avoid cyclic graphs, C compilers
o require type names to be declared before they are used, except for pointers to records.
o use structural equivalence except for records, for which they use name equivalence.
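The marking approach for cyclic type graphs can be sketched in Python as follows. This is a simplified illustration, not how any particular compiler is implemented: the class TypeNode, the helper make_cell, and the decision to ignore field names are assumptions. A pair of nodes that has already been visited is assumed equivalent, which is what stops the comparison from looping forever on the cycle.

```python
class TypeNode:
    """A node in a type graph: a constructor kind plus child type nodes."""
    def __init__(self, kind, children=None):
        self.kind = kind
        self.children = children if children is not None else []

def struct_equiv(a, b, visited=None):
    """Structural equivalence of two type graphs that may contain cycles."""
    if visited is None:
        visited = set()
    key = (id(a), id(b))
    if key in visited:            # pair already marked: do not visit it twice
        return True
    visited.add(key)
    if a.kind != b.kind or len(a.children) != len(b.children):
        return False
    return all(struct_equiv(x, y, visited)
               for x, y in zip(a.children, b.children))

def make_cell(info_kind):
    """Build the cyclic graph for a record with an info field and a pointer back to itself."""
    cell = TypeNode("record")
    cell.children = [TypeNode(info_kind), TypeNode("pointer", [cell])]
    return cell

same = struct_equiv(make_cell("integer"), make_cell("integer"))
different = struct_equiv(make_cell("integer"), make_cell("real"))
print(same, different)
```

Under name equivalence, as C uses for records, the two separately built cell types would instead be compared by their names alone, so no graph traversal would be needed.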