Unit-3 F&CD
Conceptually, with both syntax-directed definitions and translation schemes, we parse the input
token stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic rules at
the parse tree nodes. Evaluation of the semantic rules may generate code, save information in a symbol
table, issue error messages, or perform any other activities. The translation of the token stream is the
result obtained by evaluating the semantic rules.
Definition
Syntax-directed translation (SDT) augments the grammar with rules that facilitate semantic
analysis. SDT passes information bottom-up and/or top-down through the parse tree in the form of
attributes attached to the nodes.
Syntax-directed translation rules use
1) lexical values of nodes,
2) constants, and
3) attributes associated with the non-terminals in their definitions.
The general approach to Syntax-Directed Translation is to construct a parse tree or syntax tree
and compute the values of attributes at the nodes of the tree by visiting them in some order. In many
cases, translation can be done during parsing without building an explicit tree.
Example
E -> E+T | T
T -> T*F | F
F -> INTLIT
This grammar syntactically validates expressions built from additions and multiplications.
To carry out semantic analysis, we augment the grammar with SDT rules that pass information up the
parse tree and check for semantic errors, if any. In this example we focus on evaluating the given
expression, since this very basic example has no semantic assertions to check.
To evaluate the translation rules, we can use a single depth-first traversal of the parse tree.
This is possible because, for a grammar having only synthesized attributes, the SDT rules impose no
specific evaluation order other than that children’s attributes are computed before their parents’.
Otherwise, we would have to work out a suitable plan for traversing the parse tree and evaluating all
the attributes in one or more traversals. For clarity, we compute the translation rules of our example
bottom-up, from left to right.
The above diagram shows how semantic analysis could happen. The flow of information
happens bottom-up and all the children’s attributes are computed before parents, as discussed above.
Right-hand side nodes are sometimes annotated with subscript 1 to distinguish between children and
parents.
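The bottom-up evaluation described above can be sketched in Python. This is only an illustration under stated assumptions: the Node class, its field names (symbol, children, lexval, val) and the evaluate function are hypothetical names introduced here, and the parse tree is built by hand rather than by a parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Parse tree node (hypothetical layout): symbol is E, T, F, +, * or INTLIT."""
    symbol: str
    children: List["Node"] = field(default_factory=list)
    lexval: Optional[int] = None   # set by the lexer on INTLIT leaves only
    val: Optional[int] = None      # synthesized attribute, filled in by evaluate()


def evaluate(node: Node) -> Optional[int]:
    """Post-order DFS: children's attributes are computed before the parent's."""
    for child in node.children:
        evaluate(child)
    kids = node.children
    if node.symbol == "INTLIT":
        node.val = node.lexval
    elif len(kids) == 1:                              # E -> T, T -> F, F -> INTLIT
        node.val = kids[0].val
    elif len(kids) == 3 and kids[1].symbol == "+":    # E -> E1 + T
        node.val = kids[0].val + kids[2].val
    elif len(kids) == 3 and kids[1].symbol == "*":    # T -> T1 * F
        node.val = kids[0].val * kids[2].val
    return node.val


def lit(n: int) -> Node:
    """Helper: the subtree F -> INTLIT for the literal n."""
    return Node("F", [Node("INTLIT", lexval=n)])


# Hand-built parse tree for the input 2 + 3 * 4
tree = Node("E", [
    Node("E", [Node("T", [lit(2)])]),
    Node("+"),
    Node("T", [Node("T", [lit(3)]), Node("*"), lit(4)]),
])
print(evaluate(tree))  # 14
```

Because every attribute here is synthesized, one post-order pass is enough; a grammar with inherited attributes would need a different visiting order.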
In syntax-directed translation, we associate some informal notations with the grammar; these
notations are called semantic rules.
• In syntax-directed translation, every non-terminal can have zero, one, or more attributes,
depending on the type of the attribute. The values of these attributes are evaluated by the
semantic rules associated with the production rule.
• In the semantic rules, the attribute is VAL; an attribute may hold anything, such as a string, a
number, a memory location, or a complex record.
• Whenever a construct is encountered in the programming language, it is translated according to
the semantic rules defined for that particular programming language.
Example
S → E$ { print E.VAL }
Syntax-directed translation is implemented by parsing the input to produce a parse tree and then
performing the actions in left-to-right, depth-first order.
Each node in a syntax tree can be implemented as a record with multiple fields. In the node for an
operator, one field identifies the operator and the remaining fields contain pointers to the
nodes for the operands. The operator is known as the label of the node. The following
functions are used to create the nodes of the syntax tree for expressions with binary
operators. Each function returns a pointer to the newly created node.
• mknode (op, left, right) − It creates an operator node with label op and two fields
containing pointers to left and right.
• mkleaf (id, entry) − It creates an identifier node with label id and a field containing
entry, a pointer to the symbol table entry for the identifier.
• mkleaf (num, val) − It creates a number node with label num and a field containing val,
the value of the number.
For example, construct a syntax tree for the expression a − 4 + c. In this sequence, p1, p2, …, p5
are pointers to nodes, and entry a and entry c are pointers to the symbol table entries for the
identifiers 'a' and 'c' respectively.
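The sequence of calls for a − 4 + c can be sketched in Python as follows. The SyntaxNode class and the to_prefix helper are hypothetical names used only for this illustration, and plain strings stand in for symbol table entries.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SyntaxNode:
    label: str                            # operator symbol, "id", or "num"
    left: Optional["SyntaxNode"] = None   # operand pointers (interior nodes only)
    right: Optional["SyntaxNode"] = None
    value: Any = None                     # symbol table entry or number (leaves only)


def mknode(op: str, left: SyntaxNode, right: SyntaxNode) -> SyntaxNode:
    """Create an operator node with label op and two operand pointers."""
    return SyntaxNode(op, left, right)


def mkleaf(label: str, value: Any) -> SyntaxNode:
    """Create a leaf with label 'id' or 'num' and its entry or value."""
    return SyntaxNode(label, value=value)


def to_prefix(n: SyntaxNode) -> str:
    """Render the tree in prefix form so its shape can be inspected."""
    if n.left is None and n.right is None:
        return str(n.value)
    return f"({n.label} {to_prefix(n.left)} {to_prefix(n.right)})"


# Bottom-up construction for a - 4 + c; the strings "a" and "c" are
# stand-ins for pointers to symbol table entries.
p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)
print(to_prefix(p5))  # (+ (- a 4) c)
```

The root p5 is the + node; its left child is the − node built from the leaves for a and 4.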
Example1 − Draw Syntax Tree for the string a + b ∗ c − d.
The tree is generated in a bottom-up fashion. The function calls mkleaf (id, entry a) and mkleaf
(num, 4) construct the leaves for a and 4. The pointers to these nodes are stored in p1 and
p2. The call mknode (′−′, p1, p2) then makes the interior node with the leaves for a and 4 as
children. The syntax tree will be
Syntax Directed Translation of Syntax Trees
E → E(1) + E(2) {E.VAL = Node (+, E(1).VAL, E(2).VAL)}
E → E(1) ∗ E(2) {E.VAL = Node (∗, E(1).VAL, E(2).VAL)}
Node (+, E(1).VAL, E(2).VAL) will create a node labeled +.
E(1).VAL and E(2).VAL are the left and right children of this node.
Similarly, Node (∗, E(1).VAL, E(2).VAL) will create a node labeled ∗.
Function UNARY (−, E(1).VAL) will make a node labeled − (unary minus), with E(1).VAL as its only
child.
Function LEAF (id) will create a leaf node with label id.
Example2 − Construct a syntax tree for the expression a = b ∗ −c + d.
Solution
Expression 2: T1 = T0 +c
An array of records is used to hold the nodes of a syntax tree or DAG. Each row of the array corresponds
to a single record, and hence to a single node. The first field in each record is an operation code, which
indicates the node’s label. In the figure below, interior nodes contain two more fields denoting
the left and right children, while leaves have one additional field that stores the lexical value (either a
symbol-table pointer or a constant in this instance).
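The array-of-records layout can be sketched in Python as below. The Record type, its field names and the add helper are assumptions made for this illustration; row indices play the role of pointers, and the expression a − 4 + c is used only as sample input.

```python
from typing import List, NamedTuple, Optional, Union


class Record(NamedTuple):
    op: str                                # operation code: the node's label
    left: Optional[int] = None             # row index of the left child (interior nodes)
    right: Optional[int] = None            # row index of the right child (interior nodes)
    lexval: Union[str, int, None] = None   # lexical value (leaves only)


nodes: List[Record] = []   # each row of the array is one node


def add(record: Record) -> int:
    """Append a record and return its row index, which serves as the node's pointer."""
    nodes.append(record)
    return len(nodes) - 1


# Rows for a - 4 + c; the strings are stand-ins for symbol-table pointers.
a = add(Record("id", lexval="a"))
four = add(Record("num", lexval=4))
minus = add(Record("-", a, four))
c = add(Record("id", lexval="c"))
plus = add(Record("+", minus, c))

for index, row in enumerate(nodes):
    print(index, row.op, row.left, row.right, row.lexval)
```

Referring to children by row index rather than by object reference is what lets the whole tree live in one flat array.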
Parse Trees vs. Syntax Trees
Parse tree                                          | Syntax tree
Each interior node represents a grammar rule.       | Each interior node represents an operator.
Each leaf node represents a terminal.               | Each leaf node represents an operand.
Parse trees provide all the characteristic          | Syntax trees do not provide all the characteristic
information of the real syntax.                     | information of the real syntax.
Parse trees are comparatively less dense            | Syntax trees are comparatively more dense than
than syntax trees.                                  | parse trees.
1. Synthesized attributes –
A synthesized attribute is an attribute of the non-terminal on the left-hand side of a production.
Synthesized attributes represent information that is passed up the parse tree. The attribute
takes its value only from the node’s children (the symbols on the RHS of the production).
For example, if A -> BC is a production of a grammar and A’s attribute depends on B’s
attributes or C’s attributes, then it is a synthesized attribute.
2. Inherited attributes –
An attribute of a non-terminal on the right-hand side of a production is called an inherited
attribute. The attribute takes its value from its parent or from its siblings (the symbols on the
LHS or RHS of the production).
For example, if A -> BC is a production of a grammar and B’s attribute depends
on A’s attributes or C’s attributes, then it is an inherited attribute.
S-attributed and L-attributed SDT:
1. S-attributed SDT:
• If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
• S-attributed SDTs are evaluated during bottom-up parsing, as the values of the parent nodes
depend upon the values of the child nodes.
• Semantic actions are placed at the rightmost end of the RHS.
2. L-attributed SDT:
• If an SDT uses both synthesized and inherited attributes, with the restriction that an
inherited attribute can inherit values only from the parent and from left siblings, it is called
an L-attributed SDT.
• Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner.
• Semantic actions can be placed anywhere in the RHS.
For example,
A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S}
is not an L-attributed grammar: Y.S = A.S and Y.S = X.S are allowed, but Y.S = Z.S violates the
L-attributed SDT definition because the attribute inherits its value from its right sibling.
Note – If a definition is S-attributed, then it is also L-attributed, but NOT vice versa.
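The L-attributed restriction can be checked mechanically. The following Python sketch is a simplification under stated assumptions: the function name is_l_attributed and the rule encoding (target symbol, list of source symbols) are hypothetical, each symbol is assumed to occur only once in the production body, and the check does not distinguish which attribute of a symbol is used.

```python
from typing import List, Tuple

Rule = Tuple[str, List[str]]   # (symbol whose attribute is defined, symbols it reads)


def is_l_attributed(head: str, body: List[str], rules: List[Rule]) -> bool:
    """Check the rules of one production head -> body against the L-attributed restriction."""
    for target, sources in rules:
        if target == head:
            continue                 # synthesized attribute of the head: may use any child
        position = body.index(target)
        allowed = {head, *body[:position]}   # the parent and the left siblings only
        if any(source not in allowed for source in sources):
            return False
    return True


body = ["X", "Y", "Z"]
# A -> XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S}
all_rules = [("Y", ["A"]), ("Y", ["X"]), ("Y", ["Z"])]
print(is_l_attributed("A", body, all_rules))       # False: Y reads its right sibling Z
print(is_l_attributed("A", body, all_rules[:2]))   # True: parent and left sibling only
```

A production with only synthesized attributes passes trivially, which matches the note that every S-attributed definition is also L-attributed.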
• If a compiler translates the source language directly into its target machine language, with no
option for generating intermediate code, then a full native compiler is required for each new machine.
• Intermediate code eliminates the need for a new full compiler for every unique machine by keeping
the analysis portion the same for all compilers.
• The second part of the compiler, synthesis, is changed according to the target machine.
• It becomes easier to improve code performance by applying code optimization techniques to the
intermediate code.
• If we divide the compiler stages into two parts, i.e., front end and back end, then this
phase comes in between.
In the analysis-synthesis model of a compiler, the front end translates a source
program into a machine-independent intermediate code, and the back end then uses this intermediate
code to generate the target code (which can be understood by the machine).
The benefits of using machine-independent intermediate code are:
• Portability is enhanced. For example, if a compiler translates the source language directly
into its target machine language, with no intermediate code, then a full native compiler is
required for each new machine, because the compiler itself must be modified according to
the machine specifications.
• Retargeting is facilitated.
• It is easier to improve the performance of the program by optimizing the intermediate code.
Intermediate Representation
Intermediate code can be represented in a variety of ways, each with its own benefits.
• High Level IR - High-level intermediate code representation is very close to the source
language itself. It can be generated easily from the source code, and code modifications
that enhance performance are easy to apply. It is less suited to target machine optimization.
• Low Level IR - This one is close to the target machine, which makes it suitable for register
and memory allocation, instruction set selection, etc. It is good for machine-dependent
optimizations.
Intermediate code can be either language specific (e.g., Byte Code for Java) or language independent
(three-address code).
Postfix Notation
Postfix notation is also known as reverse Polish notation or suffix notation. The ordinary (infix)
way of writing the sum of a and b is with the operator in the middle: a + b. The postfix notation for
the same expression places the operator at the right end: ab +. In general, if e1 and e2 are any postfix
expressions and + is any binary operator, the result of applying + to the values denoted by e1 and e2 is
written in postfix notation as e1 e2 +. No parentheses are needed in postfix notation because the
position and arity (number of arguments) of the operators permit only one way to decode a postfix
expression. In postfix notation, the operator follows its operands.
Example 1: The postfix representation of the expression (a + b) * c is: ab + c *
Example 2: The postfix representation of the expression (a – b) * (c + d) + (a – b) is: ab – cd + * ab – +
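The two examples can be reproduced with a small infix-to-postfix converter. The notes do not name a conversion algorithm; this sketch uses the standard operator-stack (shunting-yard) method, and it assumes single-character operands, the four binary operators shown, and well-formed input with balanced parentheses. The name to_postfix is introduced here for illustration.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expr: str) -> str:
    """Convert an infix expression with single-character operands to postfix."""
    output, stack = [], []
    for ch in expr.replace(" ", ""):
        if ch.isalnum():                  # operand: goes straight to the output
            output.append(ch)
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":                   # flush operators back to the matching "("
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                   # discard the "(" itself
        else:                             # binary operator, left-associative
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
    while stack:                          # remaining operators
        output.append(stack.pop())
    return " ".join(output)


print(to_postfix("(a + b) * c"))                  # a b + c *
print(to_postfix("(a - b) * (c + d) + (a - b)"))  # a b - c d + * a b - +
```

The parentheses of the infix input never reach the output; operator position alone fixes the evaluation order.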
Syntax Tree:
A syntax tree is a condensed form of a parse tree. The operator and keyword nodes of the parse tree
are moved to their parents, and a chain of single productions is replaced by a single link. In the
syntax tree, the internal nodes are operators and the child nodes are operands. To form a syntax
tree, first put parentheses in the expression; this makes it easy to recognize which operand should
come first.
Example: x = (a + b * c) / (a – b * c)
Three-Address Code
The intermediate code generator receives its input from the preceding phase, the semantic analyzer,
in the form of an annotated syntax tree. That syntax tree can then be converted into a linear
representation, e.g., postfix notation. Intermediate code tends to be machine-independent code.
Therefore, the code generator assumes an unlimited number of storage locations (registers) when
generating code.
For example:
a = b + c * d;
The intermediate code generator will try to divide this expression into sub-expressions and then generate
the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2
Here r1 and r2 stand for registers in the target program.
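The splitting into sub-expressions can be sketched in Python. As an assumption made purely for illustration, Python's own ast module stands in for the compiler's front end; the function name three_address_code and the temporary names r1, r2, … are choices made here to mirror the example above.

```python
import ast
from itertools import count
from typing import List

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address_code(statement: str) -> List[str]:
    """Translate one assignment such as 'a = b + c * d' into three-address code."""
    code: List[str] = []
    temps = count(1)

    def gen(node: ast.expr) -> str:
        """Emit code for node and return the name that holds its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = gen(node.left)            # operands first (bottom-up)
            right = gen(node.right)
            temp = f"r{next(temps)}"         # fresh temporary for this sub-expression
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    assign = ast.parse(statement).body[0]
    if not isinstance(assign, ast.Assign) or not isinstance(assign.targets[0], ast.Name):
        raise ValueError("expected a simple assignment")
    code.append(f"{assign.targets[0].id} = {gen(assign.value)}")
    return code


for line in three_address_code("a = b + c * d"):
    print(line)
# r1 = c * d
# r2 = b + r1
# a = r2
```

Each emitted line has at most one operator on its right-hand side, which is what limits it to three addresses.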
A three-address instruction has at most three address locations to calculate the expression.
Three-address code can be represented in three forms: quadruples, triples, and indirect triples.
1. Quadruples
Each instruction in the quadruples representation is divided into four fields: operator, arg1, arg2,
and result. The above example is represented below in quadruples format:
op   arg1   arg2   result
*    c      d      r1
+    b      r1     r2
=    r2            a
2. Triples
Each instruction in the triples representation has three fields: op, arg1, and arg2. The result of each
sub-expression is denoted by the position of the triple that computes it. Triples resemble DAGs and
syntax trees; they are equivalent to a DAG when representing expressions.
     op   arg1   arg2
(0)  *    c      d
(1)  +    b      (0)
(2)  =    a      (1)
Triples suffer from code immovability during optimization: because results are positional,
changing the order or position of an expression may cause problems.
3. Indirect Triples
This representation is an enhancement over the triples representation. It uses an extra array that
lists pointers to the triples in the desired sequence. This lets the optimizer freely reposition
sub-expressions to produce optimized code.
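The relation between the three forms can be sketched in Python. Everything named here is an assumption made for illustration: the function quads_to_triples, the sample statement x = (a + b) * (c − d), and the convention that an assignment triple holds the target in arg1 and the value in arg2.

```python
from typing import Dict, List, Optional, Tuple

Quad = Tuple[str, str, Optional[str], str]   # (op, arg1, arg2, result)
Triple = Tuple[str, str, str]                # (op, arg1, arg2)


def quads_to_triples(quads: List[Quad]) -> List[Triple]:
    """Replace temporary names by the position of the triple that computes them."""
    position: Dict[str, int] = {}
    triples: List[Triple] = []

    def ref(name: str) -> str:
        return f"({position[name]})" if name in position else name

    for op, arg1, arg2, result in quads:
        if op == "=":
            triples.append(("=", result, ref(arg1)))   # assumed layout: target, value
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


# Hypothetical input: x = (a + b) * (c - d)
quads = [("+", "a", "b", "r1"), ("-", "c", "d", "r2"),
         ("*", "r1", "r2", "r3"), ("=", "r3", None, "x")]
triples = quads_to_triples(quads)

# Indirect triples: a separate list of pointers gives the execution order.
order = [0, 1, 2, 3]
# Moving the subtraction ahead of the addition changes only the pointer list;
# the triples and their positional references (0), (1), (2) stay untouched.
reordered = [1, 0, 2, 3]
for index in reordered:
    print(index, triples[index])
```

With plain triples the same move would renumber the entries and invalidate the references in the multiplication.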
Advantages of Intermediate Code Generation
• It is machine independent. It can be executed on different platforms.
• It makes code optimization easier. A machine-independent code optimizer
can be applied to the intermediate code.
• It enables efficient code generation.
• A new compiler for a given back end can be built from the existing front end.
Translation of Assignment Statements
In syntax-directed translation, an assignment statement mainly deals with expressions. The
expressions can be of type real, integer, array, or record.
1. S → id := E
2. E → E1 + E2
3. E → E1 * E2
4. E → (E1)
5. E → id
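S → id := E {p = look_up(id.name);
If p ≠ nil then
Emit (p = E.place)
Else
Error;
}
E → E1 + E2 {E.place = newtemp();
Emit (E.place = E1.place '+' E2.place)
}
E → E1 * E2 {E.place = newtemp();
Emit (E.place = E1.place '*' E2.place)
}
E → (E1) {E.place = E1.place}
E → id {p = look_up(id.name);
If p ≠ nil then
E.place = p
Else
Error;
}
Here E.place is the name that holds the value of E, and look_up returns the symbol table entry for
an identifier, or nil if there is none.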
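The semantic actions of this translation scheme can be sketched in Python. This is a hedged illustration, not the scheme's actual implementation: the Translator class, the tuple encoding of expressions, and the use of a plain set as the symbol table are assumptions, while newtemp, emit and look_up follow the names used in the scheme. For E → id the sketch sets E.place to the symbol table entry, which is the usual form of that action.

```python
from typing import Iterable, List, Optional, Tuple, Union

Expr = Union[Tuple[str, str], Tuple[str, "Expr", "Expr"]]   # ("id", name) or (op, e1, e2)


class Translator:
    def __init__(self, symbols: Iterable[str]):
        self.symbols = set(symbols)     # stand-in for the symbol table
        self.code: List[str] = []
        self.temp_count = 0

    def newtemp(self) -> str:
        """Return a fresh temporary name."""
        self.temp_count += 1
        return f"t{self.temp_count}"

    def emit(self, text: str) -> None:
        """Append one three-address statement to the output."""
        self.code.append(text)

    def look_up(self, name: str) -> Optional[str]:
        """Return the entry for name, or None (nil) if it is not declared."""
        return name if name in self.symbols else None

    def expr(self, e: Expr) -> str:
        """Apply the action for E and return E.place."""
        if e[0] == "id":                        # E -> id
            p = self.look_up(e[1])
            if p is None:
                raise NameError(f"undeclared identifier {e[1]}")
            return p
        if e[0] in ("+", "*"):                  # E -> E1 + E2 | E1 * E2
            left, right = self.expr(e[1]), self.expr(e[2])
            place = self.newtemp()
            self.emit(f"{place} = {left} {e[0]} {right}")
            return place
        raise ValueError(f"unsupported operator {e[0]}")

    def assign(self, name: str, e: Expr) -> None:
        """S -> id := E (E is translated first, as in a bottom-up parse)."""
        place = self.expr(e)
        p = self.look_up(name)
        if p is None:
            raise NameError(f"undeclared identifier {name}")
        self.emit(f"{p} = {place}")


t = Translator(["x", "a", "b", "c"])
t.assign("x", ("+", ("id", "a"), ("*", ("id", "b"), ("id", "c"))))   # x := a + b * c
print("\n".join(t.code))
# t1 = b * c
# t2 = a + t1
# x = t2
```

Parentheses need no case of their own in this encoding, because E → (E1) only passes E1.place through unchanged.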
Boolean expressions have two primary purposes: they are used to compute logical values, and they
are used as conditional expressions in statements such as if-then-else and while-do.
1. E → E OR E
2. E → E AND E
3. E → NOT E
4. E → (E)
5. E → id relop id
6. E → TRUE
7. E → FALSE
AND and OR are left-associative. NOT has the highest precedence, then AND, and lastly OR.
E → E1 OR E2 {E.place = newtemp();
Emit (E.place ':=' E1.place 'OR' E2.place)
}
E → E1 AND E2 {E.place = newtemp();
Emit (E.place ':=' E1.place 'AND' E2.place)
}
The Emit function generates the three-address code, and the newtemp() function generates the
temporary variables.
The translation of E → id1 relop id2 uses next_state, which gives the index of the next three-address
statement in the output sequence.
Here is the example which generates the three address code using the above translation scheme: