UNIT-3- PART-2 COMPILER DESIGN

Syntax Directed Definition
A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols, and rules are associated with productions. If X is a symbol and a is one of its attributes, then we write X.a to denote the value of a at a particular parse-tree node labeled X. For example, an infix-to-postfix translator might have a production and rule

PRODUCTION        SEMANTIC RULE
E → E1 + T        E.code = E1.code || T.code || '+'

This production has two nonterminals, E and T; the subscript in E1 distinguishes the occurrence of E in the production body from the occurrence of E as the head. Both E and T have a string-valued attribute code. The semantic rule specifies that the string E.code is formed by concatenating E1.code, T.code, and the character '+'.

1.1 Inherited and Synthesized Attributes
There are two kinds of attributes for nonterminals:
1. A synthesized attribute for a nonterminal A at a parse-tree node N is defined by a semantic rule associated with the production at N. Note that the production must have A as its head. A synthesized attribute at node N is defined only in terms of attribute values at the children of N and at N itself.
2. An inherited attribute for a nonterminal B at a parse-tree node N is defined by a semantic rule associated with the production at the parent of N. Note that the production must have B as a symbol in its body. An inherited attribute at node N is defined only in terms of attribute values at N's parent, N itself, and N's siblings.

Example 3.3.1: The SDD in Fig. 3.3.1 is based on our familiar grammar for arithmetic expressions with operators + and *. It evaluates expressions terminated by an end marker n. In the SDD, each of the nonterminals has a single synthesized attribute, called val. We also suppose that the terminal digit has a synthesized attribute lexval, which is an integer value returned by the lexical analyzer.

Figure 3.3.1: Syntax-directed definition of a simple desk calculator

• The rule for production 1, L → E n, sets L.val to E.val, which we shall see is the numerical value of the entire expression.
• Production 2, E → E1 + T, also has one rule, which computes the val attribute for the head E as the sum of the values at E1 and T. At any parse-tree node N labeled E, the value of val for E is the sum of the values of val at the children of node N labeled E and T.
• Production 3, E → T, has a single rule that defines the value of val for E to be the same as the value of val at the child for T.
• Production 4 is similar to the second production; its rule multiplies the values at the children instead of adding them.
• The rules for productions 5 and 6 copy values at a child, like that for the third production. Production 7 gives F.val the value of a digit, that is, the numerical value of the token digit that the lexical analyzer returned.

An SDD that involves only synthesized attributes is called S-attributed; the SDD in Fig. 3.3.1 has this property. In an S-attributed SDD, each rule computes an attribute for the nonterminal at the head of a production from attributes taken from the body of the production.
An SDD without side effects is sometimes called an attribute grammar. The rules in an attribute grammar define the value of an attribute purely in terms of the values of other attributes and constants.

1.2 Evaluating an SDD at the Nodes of a Parse Tree
A parse tree showing the value(s) of its attribute(s) is called an annotated parse tree. For SDD's with both inherited and synthesized attributes, there is no guarantee that there is even one order in which to evaluate attributes at nodes.

Example 3.3.2: Figure 3.3.2 shows an annotated parse tree for the input string 3 * 5 + 4 n, constructed using the grammar and rules of Fig. 3.3.1. The values of lexval are presumed supplied by the lexical analyzer. Each of the nodes for the nonterminals has attribute val computed in a bottom-up order, and we see the resulting values associated with each node. For instance, at the node with a child labeled *, after computing T.val = 3 and F.val = 5 at its first and third children, we apply the rule that says T.val is the product of these two values, or 15.

Figure 3.3.2: Annotated parse tree for 3 * 5 + 4 n

Example 3.3.3: The SDD in Fig. 3.3.3 computes terms like 3 * 5 and 3 * 5 * 7. The top-down parse of input 3 * 5 begins with the production T → F T'. Here, F generates the digit
3, but the operator * is generated by T'. Thus, the left operand 3 appears in a different subtree of the parse tree from *. An inherited attribute will therefore be used to pass the operand to the operator.

Figure 3.3.3: An SDD based on a grammar suitable for top-down parsing

Each of the nonterminals T and F has a synthesized attribute val; the terminal digit has a synthesized attribute lexval. The nonterminal T' has two attributes: an inherited attribute inh and a synthesized attribute syn.
The semantic rules are based on the idea that the left operand of the operator * is inherited. More precisely, the head T' of the production T' → * F T1' inherits the left operand of * in the production body. Given a term x * y * z, the root of the subtree for * y * z inherits x. Then, the root of the subtree for * z inherits the value of x * y, and so on, if there are more factors in the term. Once all the factors have been accumulated, the result is passed back up the tree using synthesized attributes.

To see how the semantic rules are used, consider the annotated parse tree for 3 * 5 in Fig. 3.3.5. The leftmost leaf in the parse tree, labeled digit, has attribute value lexval = 3, where the 3 is supplied by the lexical analyzer. Its parent is for production 4, F → digit. The only semantic rule associated with this production defines F.val = digit.lexval, which equals 3.

Figure 3.3.5: Annotated parse tree for 3 * 5

At the second child of the root, the inherited attribute T'.inh is defined by the semantic rule T'.inh = F.val associated with production 1. Thus, the left operand, 3, for the * operator is passed from left to right across the children of the root.

The production at the node for T' is T' → * F T1'. (We retain the subscript 1 in the annotated parse tree to distinguish between the two nodes for T'.) The inherited attribute T1'.inh is defined by the semantic rule T1'.inh = T'.inh * F.val associated with production 2. With T'.inh = 3 and F.val = 5, we get T1'.inh = 15. At the lower node for T1', the production is T1' → ε. The semantic rule T'.syn = T'.inh defines T1'.syn = 15. The syn attributes at the nodes for T' pass the value 15 up the tree to the node for T, where T.val = 15.

Evaluation Orders for SDD's

1. Dependency Graphs
A dependency graph depicts the flow of information among the attribute instances in a particular parse tree; an edge from one attribute instance to another means that the value of the first is needed to compute the second. Edges express constraints implied by the semantic rules.
• For each parse-tree node, say a node labeled by grammar symbol X, the dependency graph has a node for each attribute associated with X.
• Suppose that a semantic rule associated with a production p defines the value of synthesized attribute A.b in terms of the value of X.c. Then, the dependency graph has an edge from X.c to A.b. More precisely, at every node N labeled A where production p is applied, create an edge to attribute b at N from the attribute c at the child of N corresponding to this instance of the symbol X in the body of the production.
• Suppose that a semantic rule associated with a production p defines the value of inherited attribute B.c in terms of the value of X.a. Then, the dependency graph has an edge from X.a to B.c. For each node N labeled B that corresponds to an occurrence of this B in the body of production p, create an edge to attribute c at N from the attribute a at the node M that corresponds to this occurrence of X. Note that M could be either the parent or a sibling of N.

Example 3.4.1: Consider the following production and rule:

PRODUCTION        SEMANTIC RULE
E → E1 + T        E.val = E1.val + T.val

At every node N labeled E, with children corresponding to the body of this production, the synthesized attribute val at N is computed using the values of val at the two children, labeled E and T. Thus, a portion of the dependency graph for every parse tree in which this production is used looks like Fig. 3.4.1. As a convention, we shall show the parse-tree edges as dotted lines, while the edges of the dependency graph are solid.

Figure 3.4.1: E.val is synthesized from E1.val and T.val

Example 3.4.2: An example of a complete dependency graph appears in Fig. 3.4.2. The nodes of the dependency graph, represented by the numbers 1 through 9, correspond to the attributes in the annotated parse tree in Fig. 3.3.5.
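The dependency-graph idea can be sketched directly in code. The following is a minimal sketch, using Python's standard `graphlib` module (my choice, not from the text): nodes are the attribute instances 1 through 9 of Fig. 3.4.2, each mapped to the set of attributes its value depends on, and a topological sort yields a valid evaluation order.

```python
from graphlib import TopologicalSorter

# Attribute instances numbered 1-9 as in Fig. 3.4.2; the comments give the
# semantic rule that induces each dependency edge. An entry n: [p, ...]
# means the value at node n depends on the values at nodes p, ...
edges = {
    3: [1],       # F.val (left F)  = digit.lexval (node 1)
    4: [2],       # F.val (right F) = digit.lexval (node 2)
    5: [3],       # T'.inh  = F.val
    6: [4, 5],    # T1'.inh = T'.inh * F.val
    7: [6],       # T1'.syn = T1'.inh
    8: [7],       # T'.syn  = T1'.syn
    9: [8],       # T.val   = T'.syn
}

# static_order() yields the nodes so that every node appears only after
# everything it depends on, i.e., a topological sort of the graph.
order = list(TopologicalSorter(edges).static_order())
print(order)
```

Any order this prints respects every edge, which is exactly the property the text asks of an evaluation order; 1, 2, ..., 9 and 1, 3, 5, 2, 4, 6, 7, 8, 9 are both acceptable outputs.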


Figure 3.4.2: Dependency graph for the annotated parse tree of Fig. 3.3.5

• Nodes 1 and 2 represent the attribute lexval associated with the two leaves labeled digit.
• Nodes 3 and 4 represent the attribute val associated with the two nodes labeled F. The edges to node 3 from 1 and to node 4 from 2 result from the semantic rule that defines F.val in terms of digit.lexval. In fact, F.val equals digit.lexval, but the edge represents dependence, not equality.
• Nodes 5 and 6 represent the inherited attribute T'.inh associated with each of the occurrences of nonterminal T'. The edge to 5 from 3 is due to the rule T'.inh = F.val, which defines T'.inh at the right child of the root from F.val at the left child. We see edges to 6 from node 5 for T'.inh and from node 4 for F.val, because these values are multiplied to evaluate the attribute inh at node 6.
• Nodes 7 and 8 represent the synthesized attribute syn associated with the occurrences of T'. The edge to node 7 from 6 is due to the semantic rule T'.syn = T'.inh associated with production 3 in Fig. 3.3.4. The edge to node 8 from 7 is due to a semantic rule associated with production 2.
• Finally, node 9 represents the attribute T.val. The edge to 9 from 8 is due to the semantic rule T.val = T'.syn associated with production 1.

The dependency graph of Fig. 3.4.2 has no cycles. One topological sort is the order in which the nodes have already been numbered: 1, 2, ..., 9. Notice that every edge of the graph goes from a node to a higher-numbered node, so this order is surely a topological sort. There are other topological sorts as well, such as 1, 3, 5, 2, 4, 6, 7, 8, 9.

2. S-Attributed Definitions
Given an SDD, it is very hard to tell whether there exist any parse trees whose dependency graphs have cycles. In practice, translations can be implemented using classes of SDD's that guarantee an evaluation order, since they do not permit dependency graphs with cycles. Moreover, the two classes can be implemented efficiently in connection with top-down or bottom-up parsing.
The first class is defined as follows:
• An SDD is S-attributed if every attribute is synthesized.

Example 3.4.3: The SDD of Fig. 3.3.1 is an example of an S-attributed definition. Each attribute, L.val, E.val, T.val, and F.val, is synthesized.

When an SDD is S-attributed, we can evaluate its attributes in any bottom-up order of the nodes of the parse tree. It is often especially simple to evaluate the attributes by performing a postorder traversal of the parse tree and evaluating the attributes at a node N when the traversal leaves N for the last time. That is, we apply the function postorder, defined below, to the root of the parse tree:

postorder(N) {
    for ( each child C of N, from the left ) postorder(C);
    evaluate the attributes associated with node N;
}

S-attributed definitions can be implemented during bottom-up parsing, since a bottom-up parse corresponds to a postorder traversal. Specifically, postorder corresponds exactly to the order in which an LR parser reduces a production body to its head.

3. L-Attributed Definitions
The second class of SDD's is called L-attributed definitions. The idea behind this class is that, between the attributes associated with a production body, dependency-graph edges can go from left to right, but not from right to left (hence "L-attributed"). More precisely, each attribute must be either
1. Synthesized, or
2. Inherited, but with the rules limited as follows. Suppose that there is a production A → X1 X2 ... Xn, and that there is an inherited attribute Xi.a computed by a rule associated with this production. Then the rule may use only:
(a) Inherited attributes associated with the head A.
(b) Either inherited or synthesized attributes associated with the occurrences of the symbols X1, X2, ..., Xi-1 located to the left of Xi.
(c) Inherited or synthesized attributes associated with this occurrence of Xi itself, but only in such a way that there are no cycles in a dependency graph formed by the attributes of this Xi.

Example 3.4.4: The SDD in Fig. 3.3.4 is L-attributed. To see why, consider the semantic rules for inherited attributes, which are repeated here for convenience:

PRODUCTION        SEMANTIC RULE
T → F T'          T'.inh = F.val
T' → * F T1'      T1'.inh = T'.inh * F.val

The first of these rules defines the inherited attribute T'.inh using only F.val, and F appears to the left of T' in the production body, as required. The second rule defines T1'.inh using the inherited attribute T'.inh associated with the head, and F.val, where F appears to the left of T1' in the production body.
In each of these cases, the rules use information "from above or from the left," as required by the class. The remaining attributes are synthesized. Hence, the SDD is L-attributed.

Applications of Syntax-Directed Translation
The main application is the construction of syntax trees: some compilers use syntax trees as an intermediate representation, and a common form of SDD turns its input string into a tree. We consider two SDD's for constructing syntax trees for expressions. The first, an S-attributed definition, is suitable for use during bottom-up parsing. The second, L-attributed, is suitable for use during top-down parsing.
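The postorder discipline for S-attributed definitions can be made concrete with a small sketch. The parse tree below is for the input 3 * 5 + 4 n under the desk-calculator SDD of Fig. 3.3.1; the `ParseNode` class and the way rules are attached are illustrative choices of mine, not from the text.

```python
# Bottom-up (postorder) attribute evaluation for the S-attributed
# desk-calculator SDD. Each node carries a semantic rule that computes
# its synthesized attribute val from the children's attributes.

class ParseNode:
    def __init__(self, symbol, children=(), lexval=None, rule=None):
        self.symbol = symbol        # grammar symbol labeling this node
        self.children = list(children)
        self.lexval = lexval        # supplied by the lexer for 'digit'
        self.rule = rule            # function: node -> value of val
        self.val = None             # the synthesized attribute

def postorder(n):
    # Evaluate all children first, then the attribute at n, exactly as
    # in the postorder(N) pseudocode above.
    for c in n.children:
        postorder(c)
    n.val = n.rule(n) if n.rule else n.lexval

# Hand-built parse tree for "3 * 5 + 4 n".
child = lambda n: n.children[0].val                 # copy rules like F -> digit
d3, d5, d4 = (ParseNode("digit", lexval=v) for v in (3, 5, 4))
F3, F5, F4 = (ParseNode("F", [d], rule=child) for d in (d3, d5, d4))
T3  = ParseNode("T", [F3], rule=child)                             # T -> F
T35 = ParseNode("T", [T3, F5],
                rule=lambda n: n.children[0].val * n.children[1].val)  # T -> T * F
E35 = ParseNode("E", [T35], rule=child)                            # E -> T
T4  = ParseNode("T", [F4], rule=child)
E   = ParseNode("E", [E35, T4],
                rule=lambda n: n.children[0].val + n.children[1].val)  # E -> E + T
L   = ParseNode("L", [E], rule=child)                              # L -> E n

postorder(L)
print(L.val)  # 19
```

One postorder pass suffices precisely because every attribute here is synthesized: each node needs only its children's values, and postorder guarantees those are ready.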


3.1 Construction of Syntax Trees

Each node in a syntax tree represents a construct; the children of the node represent the meaningful components of the construct. A syntax-tree node representing an expression E1 + E2 has label + and two children representing the subexpressions E1 and E2.

We shall implement the nodes of a syntax tree by objects with a suitable number of fields. Each object will have an op field that is the label of the node. The objects will have additional fields as follows:
• If the node is a leaf, an additional field holds the lexical value for the leaf. A constructor function Leaf(op, val) creates a leaf object. Alternatively, if nodes are viewed as records, then Leaf returns a pointer to a new record for a leaf.
• If the node is an interior node, there are as many additional fields as the node has children in the syntax tree. A constructor function Node takes two or more arguments: Node(op, c1, c2, ..., ck) creates an object with first field op and k additional fields for the k children c1, c2, ..., ck.

Example 3.5.1: The S-attributed definition in Fig. 3.5.1 constructs syntax trees for a simple expression grammar involving only the binary operators + and -. As usual, these operators are at the same precedence level and are jointly left associative. All nonterminals have one synthesized attribute node, which represents a node of the syntax tree.

Figure 3.5.1: Constructing syntax trees for simple expressions

Every time the first production E → E1 + T is used, its rule creates a node with '+' for op and two children, E1.node and T.node, for the subexpressions. The second production has a similar rule.
For production 3, E → T, no node is created, since E.node is the same as T.node. Similarly, no node is created for production 4, T → ( E ). The value of T.node is the same as E.node, since parentheses are used only for grouping; they influence the structure of the parse tree and the syntax tree, but once their job is done, there is no further need to retain them in the syntax tree.
The last two T-productions have a single terminal on the right. We use the constructor Leaf to create a suitable node, which becomes the value of T.node.

Figure 3.5.2 shows the construction of a syntax tree for the input a - 4 + c.

Figure 3.5.2: Syntax tree for a - 4 + c

The nodes of the syntax tree are shown as records, with the op field first. Syntax-tree edges are shown as solid lines. The underlying parse tree, which need not actually be constructed, is shown with dotted edges. The third type of line, shown dashed, represents the values of E.node and T.node; each line points to the appropriate syntax-tree node.

At the bottom we see leaves for a, 4 and c, constructed by Leaf. We suppose that the lexical value id.entry points into the symbol table, and the lexical value num.val is the numerical value of a constant. These leaves, or pointers to them, become the value of T.node at the three parse-tree nodes labeled T, according to rules 5 and 6. Note that by rule 3, the pointer to the leaf for a is also the value of E.node for the leftmost E in the parse tree.

Rule 2 causes us to create a node with op equal to the minus sign and pointers to the first two leaves. Then, rule 1 produces the root node of the syntax tree by combining the node for - with the third leaf.

If the rules are evaluated during a postorder traversal of the parse tree, or with reductions during a bottom-up parse, then the sequence of steps shown in Fig. 3.5.3 ends with p5 pointing to the root of the constructed syntax tree.

Figure 3.5.3: Steps in the construction of the syntax tree for a - 4 + c
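The Leaf and Node constructors and the step sequence of Fig. 3.5.3 can be sketched as follows. The string "entry_a" and "entry_c" stand in for pointers to symbol-table entries, which this sketch does not model; the class shapes are otherwise a direct reading of Example 3.5.1.

```python
# Leaf(op, val) and Node(op, c1, ..., ck) as in Example 3.5.1, and the
# bottom-up construction steps p1..p5 for the input a - 4 + c.

class Leaf:
    def __init__(self, op, val):
        self.op = op      # token label, e.g. 'id' or 'num'
        self.val = val    # lexical value: symbol-table entry or number

class Node:
    def __init__(self, op, *children):
        self.op = op              # operator labeling the interior node
        self.children = children  # k children for a k-ary construct

p1 = Leaf("id", "entry_a")   # leaf for a   (rule 5/6: T.node = new Leaf)
p2 = Leaf("num", 4)          # leaf for 4
p3 = Node("-", p1, p2)       # a - 4        (rule 2: new Node('-', E1.node, T.node))
p4 = Leaf("id", "entry_c")   # leaf for c
p5 = Node("+", p3, p4)       # (a - 4) + c  (rule 1: new Node('+', E1.node, T.node))

print(p5.op)  # +
```

Note that, as in the text, the parse tree itself never exists as a data structure here; only the five syntax-tree objects are built, and p5 is the root.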


Example 3.5.2: The L-attributed definition in Fig. 3.5.4 performs the same translation as the S-attributed definition in Fig. 3.5.1. The attributes for the grammar symbols E, T, id, and num are as discussed in Example 3.5.1.

Figure 3.5.4: Constructing syntax trees during top-down parsing

The rules for building syntax trees in this example are similar to the rules for the desk calculator. In the desk-calculator example, a term x * y was evaluated by passing x as an inherited attribute, since x and * y appeared in different portions of the parse tree. Here, the idea is to build a syntax tree for x + y by passing x as an inherited attribute, since x and + y appear in different subtrees. Nonterminal E' is the counterpart of nonterminal T'.

Figure 3.5.5: Dependency graph for a - 4 + c, with the SDD of Fig. 3.5.4

Nonterminal E' has an inherited attribute inh and a synthesized attribute syn. Attribute E'.inh represents the partial syntax tree constructed so far. Specifically, it represents the root of the tree for the prefix of the input string that is to the left of the subtree for E'. At node 5 in the dependency graph in Fig. 3.5.5, E'.inh denotes the root of the partial syntax tree for the identifier a; that is, the leaf for a. At node 6, E'.inh denotes the root for the partial syntax tree for the input a - 4. At node 9, E'.inh denotes the syntax tree for a - 4 + c.

Since there is no more input, at node 9, E'.inh points to the root of the entire syntax tree. The syn attributes pass this value back up the parse tree until it becomes the value of E.node. Specifically, the attribute value at node 10 is defined by the rule E'.syn = E'.inh associated with the production E' → ε. The attribute value at node 11 is defined by the rule E'.syn = E1'.syn associated with production 2 in Fig. 3.5.4. Similar rules define the attribute values at nodes 12 and 13.

Syntax-Directed Translation Schemes

Syntax-directed translation schemes are a complementary notation to syntax-directed definitions. A syntax-directed translation scheme (SDT) is a context-free grammar with program fragments embedded within production bodies. The program fragments are called semantic actions and can appear at any position within a production body.
Any SDT can be implemented by first building a parse tree and then performing the actions in a left-to-right depth-first order; that is, during a preorder traversal.
Typically, SDT's are implemented during parsing, without building a parse tree. Two important classes of SDD's can be implemented this way:
1. The underlying grammar is LR-parsable, and the SDD is S-attributed.
2. The underlying grammar is LL-parsable, and the SDD is L-attributed.

1. Postfix Translation Schemes
The simplest SDD implementation occurs when we can parse the grammar bottom-up and the SDD is S-attributed. In that case, we can construct an SDT in which each action is placed at the end of the production and is executed along with the reduction of the body to the head of that production. SDT's with all actions at the right ends of the production bodies are called postfix SDT's.

Example 3.6.1: The postfix SDT in Fig. 3.6.1 implements the desk-calculator SDD of Fig. 3.3.1, with one change: the action for the first production prints a value. The remaining actions are exact counterparts of the semantic rules. Since the underlying grammar is LR, and the SDD is S-attributed, these actions can be correctly performed along with the reduction steps of the parser.

Figure 3.6.1: Postfix SDT implementing the desk calculator

2. Parser-Stack Implementation of Postfix SDT's
Postfix SDT's can be implemented during LR parsing by executing the actions when reductions occur. The attribute(s) of each grammar symbol can be put on the stack in a place where they can be found during the reduction. The best plan is to place the attributes along with the grammar symbols (or the LR states that represent these symbols) in records on the stack itself.


In Fig. 3.6.2, the parser stack contains records with a field for a grammar symbol (or parser state) and, below it, a field for an attribute. The three grammar symbols X Y Z are on top of the stack; perhaps they are about to be reduced according to a production like A → X Y Z. Here, we show X.x as the one attribute of X, and so on. In general, we can allow for more attributes, either by making the records large enough or by putting pointers to records on the stack. With small attributes, it may be simpler to make the records large enough, even if some fields go unused some of the time. However, if one or more attributes are of unbounded size (say, they are character strings), then it would be better to put a pointer to the attribute's value in the stack record and store the actual value in some larger, shared storage area that is not part of the stack.

Figure 3.6.2: Parser stack with a field for synthesized attributes

If the attributes are all synthesized, and the actions occur at the ends of the productions, then we can compute the attributes for the head when we reduce the body to the head. If we reduce by a production such as A → X Y Z, then we have all the attributes of X, Y, and Z available, at known positions on the stack, as in Fig. 3.6.2. After the action, A and its attributes are at the top of the stack, in the position of the record for X.

Example 3.6.2: Let us rewrite the actions of the desk-calculator SDT of Example 3.6.1 so that they manipulate the parser stack explicitly. Such stack manipulation is usually done automatically by the parser.

Figure 3.6.3: Implementing the desk calculator on a bottom-up parsing stack

Suppose that the stack is kept in an array of records called stack, with top a cursor to the top of the stack. Thus, stack[top] refers to the top record on the stack, stack[top - 1] to the record below that, and so on. Also, we assume that each record has a field called val, which holds the attribute of whatever grammar symbol is represented in that record. Thus, we may refer to the attribute E.val that appears at the third position on the stack as stack[top - 2].val. The entire SDT is shown in Fig. 3.6.3.

For instance, in the second production, E → E1 + T, we go two positions below the top to get the value of E1, and we find the value of T at the top. The resulting sum is placed where the head E will appear after the reduction, that is, two positions below the current top. The reason is that after the reduction, the three topmost stack symbols are replaced by one. After computing E.val, we pop two symbols off the top of the stack, so the record where we placed E.val will now be at the top of the stack.

In the third production, E → T, no action is necessary, because the length of the stack does not change, and the value of T.val at the stack top will simply become the value of E.val. The same observation applies to the productions T → F and F → digit. Production F → ( E ) is slightly different. Although the value does not change, two positions are removed from the stack during the reduction, so the value has to move to the position after the reduction.

3. SDT's With Actions inside Productions
An action may be placed at any position within the body of a production. It is performed immediately after all symbols to its left are processed. Thus, if we have a production B → X {a} Y, the action a is done after we have recognized X (if X is a terminal) or all the terminals derived from X (if X is a nonterminal). More precisely,
• If the parse is bottom-up, then we perform action a as soon as this occurrence of X appears on the top of the parsing stack.
• If the parse is top-down, we perform a just before we attempt to expand this occurrence of Y (if Y is a nonterminal) or check for Y on the input (if Y is a terminal).

SDT's that can be implemented during parsing include postfix SDT's and a class of SDT's that implements L-attributed definitions.

Example 3.6.3: As an extreme example of a problematic SDT, suppose that we turn our desk-calculator running example into an SDT that prints the prefix form of an expression, rather than evaluating the expression. The productions and actions are shown in Fig. 3.6.4.

Figure 3.6.4: Problematic SDT for infix-to-prefix translation during parsing

Unfortunately, it is impossible to implement this SDT during either top-down or bottom-up parsing, because the parser would have to perform critical actions, like printing instances of * or +, long before it knows whether these symbols will appear in its input.
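The explicit stack[top - 2].val style of Fig. 3.6.3 can be sketched for one reduction. The `Record` class and `top()` helper are illustrative; the action below is the one Example 3.6.2 gives for E → E1 + T, and the final pop of two records is the part a real parser would do as the reduction itself.

```python
# One explicit-stack action of Fig. 3.6.3: reduce E -> E1 + T.
# Each stack record holds a grammar symbol and its val attribute.

class Record:
    def __init__(self, symbol, val=None):
        self.symbol = symbol
        self.val = val

stack = []
def top():
    return len(stack) - 1

def reduce_E_plus_T():
    # stack[top - 2].val = stack[top - 2].val + stack[top].val,
    # placing the sum where the head E will sit after the reduction.
    stack[top() - 2].val = stack[top() - 2].val + stack[top()].val
    stack[top() - 2].symbol = "E"
    del stack[top() - 1:]   # pop two records; the E record is now the top

# Simulate the moment when E1, +, T sit on top of the stack:
stack.extend([Record("E", 15), Record("+", None), Record("T", 4)])
reduce_E_plus_T()
print(stack[-1].symbol, stack[-1].val)  # E 19
```

The copy productions E → T, T → F and F → digit need no code at all in this scheme, exactly as the text observes: the value is already in the record that will represent the head.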


Using marker nonterminals M2 and M4 for the actions in productions 2 and 4, respectively, on input 3, a shift-reduce parser has conflicts between reducing by M2 → ε, reducing by M4 → ε, and shifting the digit.

Any SDT can be implemented as follows:
1. Ignoring the actions, parse the input and produce a parse tree as a result.
2. Then, examine each interior node N, say one for production A → α. Add additional children to N for the actions in α, so the children of N from left to right have exactly the symbols and actions of α.
3. Perform a preorder traversal of the tree, and as soon as a node labeled by an action is visited, perform that action.

For instance, Fig. 3.6.5 shows the parse tree for expression 3 * 5 + 4 with actions inserted. If we visit the nodes in preorder, we get the prefix form of the expression: + * 3 5 4.

Figure 3.6.5: Parse tree with actions embedded

4. Eliminating Left Recursion from SDT's
Since no grammar with left recursion can be parsed deterministically top-down, we examined left-recursion elimination. When the grammar is part of an SDT, we also need to worry about how the actions are handled.

First, consider the simple case, in which the only thing we care about is the order in which the actions in an SDT are performed. For example, if each action simply prints a string, we care only about the order in which the strings are printed. In this case, the following principle can guide us:
• When transforming the grammar, treat the actions as if they were terminal symbols.

This principle is based on the idea that the grammar transformation preserves the order of the terminals in the generated string. The actions are therefore executed in the same order in any left-to-right parse, top-down or bottom-up. The "trick" for eliminating left recursion is to take two productions

A → A α | β

that generate strings consisting of a β and any number of α's, and replace them by productions that generate the same strings using a new nonterminal R (for "remainder") of the first production:

A → β R
R → α R | ε

If β does not begin with A, then A no longer has a left-recursive production. In regular-definition terms, with both sets of productions, A is defined by β(α)*.

Example 3.6.4: Consider the following E-productions from an SDT for translating infix expressions into postfix notation:

E → E1 + T { print('+'); }
E → T

If we apply the standard transformation to E, the remainder of the left-recursive production is

α = + T { print('+'); }

and the body of the other production is T. If we introduce R for the remainder of E, we get the set of productions:

E → T R
R → + T { print('+'); } R
R → ε

When the actions of an SDD compute attributes rather than merely printing output, we must be more careful about how we eliminate left recursion from a grammar. However, if the SDD is S-attributed, then we can always construct an SDT by placing attribute-computing actions at appropriate positions in the new productions.

5. SDT's for L-Attributed Definitions

We converted S-attributed SDD's into postfix SDT's, with actions at the right ends of productions. The rules for turning an L-attributed SDD into an SDT are as follows:
1. Embed the action that computes the inherited attributes for a nonterminal A immediately before that occurrence of A in the body of the production. If several inherited attributes for A depend on one another in an acyclic fashion, order the evaluation of attributes so that those needed first are computed first.
2. Place the actions that compute a synthesized attribute for the head of a production at the end of the body of that production.
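The transformed, non-left-recursive SDT of Example 3.6.4 can now be run top-down. The sketch below realizes E → T R and R → + T { print('+') } R | ε as a recursive-descent translator; tokenization is simplified to a list of strings, T is reduced to a single digit token, and "printing" collects into a list so the output is easy to inspect. All of those simplifications are mine, not the text's.

```python
# Recursive-descent implementation of the transformed SDT:
#   E -> T R        R -> + T { print('+') } R  |  ε
# Each procedure consumes its tokens and emits postfix output, with the
# embedded action executed exactly where it sits in the production body.

def parse_E(tokens, out):
    parse_T(tokens, out)
    parse_R(tokens, out)

def parse_T(tokens, out):
    out.append(tokens.pop(0))   # simplified T: a single digit, printed at once

def parse_R(tokens, out):
    if tokens and tokens[0] == "+":
        tokens.pop(0)           # match the terminal '+'
        parse_T(tokens, out)
        out.append("+")         # the embedded action { print('+') }
        parse_R(tokens, out)    # trailing R of the production body
    # otherwise R -> ε: no tokens consumed, no action

out = []
parse_E(["9", "+", "5", "+", "2"], out)
print(" ".join(out))  # 9 5 + 2 +
```

Treating the action as a terminal during the transformation is what guarantees this prints the '+' signs in the same order the original left-recursive SDT would have during a bottom-up parse.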

INTERMEDIATE CODE GENERATION 5. Conditional jumps such as if x relop y goto L. This instruction applies a relational
operator (<, =, >=, etc. ) to x and y, and executes the statement with label L next if x
stands in relation relop to y. If not, the three-address statement following if x relop y goto
4.1. INTRODUCTION L is executed next, as in the usual sequence.
What is Intermediate Code? 6. param x and call p, n for procedure calls and return y, where y representing a returned
Intermediate Code is a modified input source program which is stored in some data structure. valueis optional. For example,
The front end translates a source program into an intermediate representation from which the param x1
back end generates target code. param x2
...
Why Intermediate Code Generation is required? paramxn
Benefits of using a machine-independent intermediate form are: callp,n
1. Retargeting is facilitated. That is, a compiler for a different machine can be created by generated as part of a call of the procedure p(x1, x2, …. ,xn ).
attaching a back end for the new machine to an existing front end. 7. Indexed assignments of the form x:= y[i] and x[i] = y.
2. A machine-independent code optimization can be applied to the intermediate 8. Address and pointer assignments of the form x = &y, x= *y, and *x= y.
representation. Implementation of Three-Address Statements:
A three-address statement is an abstract form of intermediate code. In a compiler, these
statements can be implemented as records with fields for the operator and the operands.
Three such representations are:
 Quadruples
 Triples
Fig. 4.1: Position of Intermediate Code Generator  Indirect triples
Intermediate code can be represented by using the following: Quadruples:
i) Three Address Code Three-Address Code does not specify the internal representation of 3-Address instructions.
ii) Syntax Tree This limitation is overcome by Quadruple.
4.1.1 Three-Address Code:  A quadruple is a record structure with four fields, which are, op, arg1, arg2 and result.
 The op field contains an internal code for the operator. The three-address statement x
Three-address code is a sequence of statements of the general form
= y op z is represented by placing y in arg1, z in arg2 and x in result.
x = y op z
 The contents of the fields arg1, arg2 and result are normally pointers to the symbol-table
Where x, y and z are names, constants, or compiler-generated temporaries; op stands entries for the names represented by these fields. If so, temporary names must be
for any operator, such as a fixed- or floating-point arithmetic operator, or a logical entered into the symbol table as they are created.
operator on Boolean-valued data. Thus a source-language expression like x + y * z might Example: a = b * -c + b * -c. Represent the expression using Quadruples, Triples and
be translated into a sequence Indirect Triples.
t1 = y * z
t2 = x + t1
Where t1 and t2 are compiler-generated temporary names.
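The temporaries t1 and t2 above are produced mechanically. A minimal sketch of such an emitter (the helper names new_temp and emit are illustrative, not from any particular compiler):

```python
# Minimal sketch of a three-address code emitter (helper names are illustrative).
code = []
counter = 0

def new_temp():
    """Return a fresh compiler-generated temporary name: t1, t2, ..."""
    global counter
    counter += 1
    return "t" + str(counter)

def emit(result, left, op, right):
    """Record one three-address statement of the form: result = left op right."""
    code.append("{} = {} {} {}".format(result, left, op, right))

# Translating x + y * z: the multiplication is emitted first
t1 = new_temp(); emit(t1, "y", "*", "z")
t2 = new_temp(); emit(t2, "x", "+", t1)
```

Running the fragment leaves code holding exactly the two statements shown above.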
The reason for the term “three-address code” is that each statement usually
contains three addresses, two for the operands and one for the result.
An address can be one of the following:
• A name. For convenience, we allow source-program names to appear as addresses in
three-address code. In an implementation, a source name is replaced by a pointer to its
symbol-table entry, where all information about the name is kept.
• A constant. In practice, a compiler must deal with many different types of constants
and variables.
• A compiler-generated temporary. It is useful, especially in optimizing compilers, to
create a distinct name each time a temporary is needed. These temporaries can be Triples:
combined, if possible, when registers are allocated to variables.  To avoid entering temporary names into the symbol table, we might refer to a
Types of Three-Address Statements: temporary value by the position of the statement that computes it.
The common three-address statements are:  If we do so, three-address statements can be represented by records with only three
1. Assignment statements of the form x = y op z, where op is a binary arithmetic or fields:
op, arg1 and arg2.
logical operation.
 The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table
2. Assignment instructions of the form x = op y, where op is a unary operation.
or pointers into the triple structure ( for temporary values ).
Essential unary operations include unary minus, logical negation, shift operators,  Since three fields are used, this intermediate code format is known as triples.
and conversion operators that, for example, convert a fixed-point number to a
floating-point number.
3. Copy statements of the form x = y where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to
be executed.

3. Pointer3 = Makenode(‘*’, Pointer1, Pointer2);
4. Pointer4 = Makeleaf(identifier, entry c);
5. Pointer5 = Makeleaf(identifier, entry d);
6. Pointer6 = Makenode(‘+’, Pointer4, Pointer5);
7. Pointer7 = Makenode(‘-’, Pointer3, Pointer6);
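Steps 1-7 can be carried out directly with a small sketch of the three constructor functions, modeling each node as a dict (the field names are an assumed encoding):

```python
# Sketch: AST constructors, with each node a small dict (an assumed encoding).
def makeleaf_id(entry):
    return {"label": "id", "entry": entry}          # identifier leaf

def makeleaf_num(value):
    return {"label": "num", "value": value}         # number leaf

def makenode(op, left, right):
    return {"label": op, "left": left, "right": right}  # interior operator node

# AST for a*b - (c+d), following steps 1-7
p1 = makeleaf_id("a")
p2 = makeleaf_id("b")
p3 = makenode("*", p1, p2)
p4 = makeleaf_id("c")
p5 = makeleaf_id("d")
p6 = makenode("+", p4, p5)
p7 = makenode("-", p3, p6)
```

The root p7 is the - node, with the * subtree on its left and the + subtree on its right, matching the tree of Fig. 4.2.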

**Note: The benefit of quadruples over triples can be seen in an optimizing compiler, where
instructions are often moved around. With Quadruples, if we move an instruction that computes
a temporary t, then the instructions that use t require no change. With Triples, the result of an
operation is referred to by its position, so moving an instruction may require us to change all
references to that result. This problem does not occur in Indirect Triples.
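As a concrete sketch of the difference, the statement a = b * -c + b * -c can be held in either form; the tuple layouts below are an assumed encoding:

```python
# Quadruples: every intermediate result has an explicit temporary name.
quads = [
    ("minus", "c",  None, "t1"),
    ("*",     "b",  "t1", "t2"),
    ("minus", "c",  None, "t3"),
    ("*",     "b",  "t3", "t4"),
    ("+",     "t2", "t4", "t5"),
    ("=",     "t5", None, "a"),
]

# Triples: a result is named only by the index of the instruction computing it,
# so the integers below are references to earlier triples.
triples = [
    ("minus", "c", None),   # (0)
    ("*",     "b", 0),      # (1) = b * (0)
    ("minus", "c", None),   # (2)
    ("*",     "b", 2),      # (3) = b * (2)
    ("+",     1,   3),      # (4) = (1) + (3)
    ("=",     "a", 4),      # (5)
]
```

Moving quadruple 1 elsewhere needs no edits to other quadruples, because t2 keeps its name; moving triple 1 would force every reference to index 1 to be renumbered, which is exactly the problem indirect triples avoid.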
Indirect Triples:
 Another implementation of three-address code is that of listing pointers to triples,
rather than listing the triples themselves. This implementation is called indirect triples. Fig.4.2: Abstract Syntax Tree
 For example, let us use an array statement to list pointers to triples in the desired order. 4.1.3. Directed Acyclic Graph:
An important derivative of abstract syntax tree is known as Directed Acyclic Graph. It is
used to reduce the amount of memory used for storing the Abstract Syntax Tree data
structure.
Consider an expression:
k=k-7;
The AST and DAG are shown in the figures below.

4.1.2 ABSTRACT SYNTAX TREES: Fig. 4.3: AST Fig.4.4: DAG


Abstract Syntax tree is a condensed version of a Syntax Tree eliminating all syntactic elements Note: There are 2 nodes for the identifier ‘k’ in Fig. 4.3, one representing k on the LHS of the
of the language. Abstract Syntax Tree is also used to represent the Intermediate Code. The expression and the other representing the k on the RHS. The DAG identifies such common nodes
procedure for constructing an abstract syntax tree is the same as the procedure that we used to and eliminates their duplication in the AST. The DAG for the above expression is shown in Fig. 4.4.
convert an expression into a postfix notation. The operators act as the parent nodes and In a DAG, a node may have multiple parents. In Fig. 4.4 node ‘k’ has two parents (the - node and the = node).
variables, constants, identifiers as leaf nodes. An abstract syntax tree is constructed from bottom to The creation of the DAG is identical to that of the AST except for an extra check to determine whether a
top. node with identical properties already exists. If the node was already created before, it
 Every node in a syntax tree is a record with many fields. For example, an operator will is chained to the existing node, avoiding a duplicate node.
have two operands.
 The three functions makeleaf(identifier,entry), makeleaf(number,value) and Example 2: a + a * (b - c) + (b - c) * d
makenode(operator,operand1,operand2) are used while constructing the abstract The leaf for a has two parents, because a appears twice in the expression. More interestingly,
syntax trees. the two occurrences of the common sub expression b - c are represented by one node, the node
1. Makeleaf(identifier,entry) labeled —. That node has two parents, representing its two uses in the sub expressions a* (b -
This function creates an identifier node with the name or label “identifier” and a pointer c) and (b-c)*d. Even though b and c appear twice in the complete expression, their nodes each
to symbol table entry given by “entry”. have one parent, since both uses are in the common sub expression b - c.
2. Makeleaf(number,value)
This function creates a leaf node with label “number” and numeric value “value”.
3. Makenode(Operator,Operand1,Operand2)
This function creates an operator node with the name “operator” and a pointer to the left
child (Operand1) and a pointer to the right child (Operand2). The left and right child can
again be an operator node.
Example: Construct Abstract Syntax Tree for the Expression a*b-(c+d).
1. Pointer1 = Makeleaf(identifier, entry a);
2. Pointer2 = Makeleaf(identifier, entry b);

We shall use the following definition of type expressions:


 A basic type is a type expression. Typical basic types for a language include boolean, char,
integer, float, and void.
 A type name is a type expression.
 A type expression can be formed by applying the array type constructor to a number and a
type expression.
 A record is a data structure with named fields. A type expression can be formed by applying
the record type constructor to the field names and their types.
Fig 4.5: DAG for a + a * (b - c) + (b - c) * d  A type expression can be formed by using the type constructor → for function types. We
write s → t for "function from type s to type t."  If s and t are type expressions, then their Cartesian product s × t is a type expression.
The Value-Number Method for Constructing DAG's  If s and t are type expressions, then their Cartesian product s Xt is a type expression.
Often, the nodes of a syntax tree or DAG are stored in an array of records, as suggested by Fig.  Type expressions may contain variables whose values are type expressions.
4.6. Each row of the array represents one record, and therefore one node. In each record, the
first field is an operation code, indicating the label of the node. In Fig. 4.6(b), leaves have one 4.2.2 Type Equivalence
additional field, which holds the lexical value (either a symbol-table pointer or a constant, in this  The basic question is “when are two type expressions equivalent?”
case), and interior nodes have two additional fields indicating the left and right children.  Two expressions are structurally equivalent if there are two expressions of same basic type
or are formed by applying same constructor.

Fig. 4.6: Steps for constructing the DAG of Fig. 4.3

In this array, we refer to nodes by giving the integer index of the record for that node within the
array. This integer historically has been called the value number for the node or for the
expression represented by the node. For instance, in Fig. 4.6, the node labeled + has value
number 3, and its left and right children have value numbers 1 and 2, respectively. In practice,
 Example: int a, b
we could use pointers to records or references to objects instead of integer indexes, but we shall Here a and b are structurally equivalent
still refer to the reference to a node as its "value number." If stored in an appropriate data
structure, value numbers help us construct expression DAG's efficiently. 4.2.3 Declarations
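A small sketch of the value-number method, using a Python dict as the "appropriate data structure" (a hash table from node signature to value number; the encoding is an assumption):

```python
nodes = []   # row i of this array is the record with value number i
index = {}   # signature (label, left, right) -> value number

def node(label, left=None, right=None):
    """Return the value number for this node, reusing an existing record if
    a node with the same signature was already created."""
    key = (label, left, right)
    if key not in index:
        nodes.append(key)
        index[key] = len(nodes) - 1
    return index[key]

# Building the DAG for k = k - 7 (Fig. 4.4)
k     = node("id", "k")      # leaf; lexical value kept in the left field
seven = node("num", 7)
diff  = node("-", k, seven)
root  = node("=", k, diff)   # the leaf for k is shared, not duplicated
```

Requesting node("id", "k") a second time returns the same value number instead of creating a duplicate row, which is the extra check that distinguishes DAG construction from plain AST construction.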
Types and declarations using a simplified grammar that declares just one name at a time;
4.2 Types and Declarations declarations with lists of names can also be handled. The grammar is
The applications of types can be grouped under checking and translation: D → T id ; D | ε
 Type checking uses logical rules to reason about the behavior of a program at run time. T → B C | record ‘{‘ D ‘}’
Specifically, it ensures that the types of the operands match the type expected by an B → int | float
operator. C → ε | [ num ] C
 Translation Applications. From the type of a name, a compiler can determine the  Non terminal D generates a sequence of declarations.
storage that will be needed for that name at run time.  Non terminal T generates basic, array, or record types.
 Non terminal B generates one of the basic types int and float.
4.2.1 Type Expressions
 Non terminal C, for "component," generates strings of zero or more integers, each
Types have structure, which we shall represent using type expressions: a type expression is
integer surrounded by brackets.
either a basic type or is formed by applying an operator called a type constructor to a type
An array type consists of a basic type specified by B, followed by array components specified by
expression. The sets of basic types and constructors depend on the language to be checked.
non terminal C.
Example: The array type int [2] [3] can be read as "array of 2 arrays of 3 integers each" and
A record type (the second production for T) is a sequence of declarations for the fields of the
written as a type expression array (2, array (3, integer)). This type is represented by the tree in
record, all surrounded by curly braces.
Fig. 4.7. The operator array takes two parameters, a number and a type.
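The example can be written down directly if a type expression is encoded, say, as a nested tuple, with a basic type represented by its name (this encoding is an assumption for illustration):

```python
def array(number, elem_type):
    """Apply the array type constructor to a number and a type expression."""
    return ("array", number, elem_type)

# int [2][3] = "array of 2 arrays of 3 integers each"
int_2_3 = array(2, array(3, "integer"))
```

The nesting of the tuples mirrors the tree of the type expression: the outer constructor holds 2 and the inner array type.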
4.2.4 Storage Layout for Local Names
From the type of a name, we can determine the amount of storage that will be needed for the
name at run time. At compile time, we can use these amounts to assign each name a relative
address. The type and relative address are saved in the symbol-table entry for the name. Data of
varying length, such as strings, or data whose size cannot be determined until run time, such as
dynamic arrays, is handled by reserving a known fixed amount of storage for a pointer to the
data.
Figure 4.7: Type expression for int [2][3]

The width of a type is the number of storage units needed for objects of that type. A basic type, Conversion from one type to another is said to be implicit if it is done automatically by the
such as a character, integer, or float, requires an integral number of bytes. For easy access, compiler. Implicit type conversions, also called coercions, are limited in many languages to
storage for aggregates such as arrays and classes is allocated in one contiguous block of bytes. widening conversions. Conversion is said to be explicit if the programmer must write
something to cause the conversion. Explicit conversions are also called casts.
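The width and relative-address computation of Section 4.2.4 can be sketched as follows, encoding a type expression as a nested tuple; the basic-type widths (int = 4 bytes, float = 8) are an assumption, since widths are machine-dependent:

```python
BASIC_WIDTH = {"int": 4, "float": 8}   # assumed widths; machine-dependent

def width(t):
    """Number of storage units needed for an object of type t."""
    if isinstance(t, str):             # basic type
        return BASIC_WIDTH[t]
    _, num, elem = t                   # ("array", num, element_type)
    return num * width(elem)

def layout(decls):
    """Assign each declared name a relative address at compile time and
    save (type, relative address), as a symbol table would."""
    table, offset = {}, 0
    for name, t in decls:
        table[name] = (t, offset)
        offset += width(t)
    return table

# int x; int a[2][3];  -> x at offset 0, a at offset 4 needing 2*3*4 = 24 bytes
symtab = layout([("x", "int"), ("a", ("array", 2, ("array", 3, "int")))])
```

The type and relative address stored per name are exactly the two pieces of information the text says are saved in the symbol-table entry.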
4.3 Type Checking The semantic action for checking E → E1 + E2 uses two functions:
Type checking has the potential for catching errors in programs. In principle, any check can be 1. max(t1,t2) takes two types t1 and t2 and returns the maximum (or least upper bound) of the
done dynamically, if the target code carries the type of an element along with the value of the two types in the widening hierarchy. It declares an error if either t1 or t2 is not in the hierarchy;
element. A sound type system eliminates the need for dynamic checking for type errors, e.g., if either type is an array or a pointer type.
because it allows us to determine statically that these errors cannot occur when the target 2. widen(a, t, w) generates type conversions if needed to widen an address a of type t into a
program runs. An implementation of a language is strongly typed if a compiler guarantees that value of type w. It returns a itself if t and w are the same type. Otherwise, it generates an
the programs it accepts will run without type errors. instruction to do the conversion and place the result in a temporary t, which is returned as the
4.3.1 Rules for Type Checking result.
Type checking can take on two forms: synthesis and inference.
1. Type synthesis builds up the type of an expression from the types of its sub expressions. 4.4 Control Flow
It requires names to be declared before they are used. The type of E1 + E2 is defined in 4.4.1 Boolean Expressions
terms of the types of E1 and E2. Boolean expressions are composed of the boolean operators (which we denote &&, ||, and !
2. Type inference determines the type of a language construct from the way it is used. Let using the C convention for the operators AND, OR, and NOT, respectively) applied to elements
null be a function that tests whether a list is empty. Then, from the usage null(x), we can that are boolean variables or relational expressions.
tell that x must be a list. The type of the elements of x is not known; all we know is that x Relational expressions are of the form E1 rel E2, where E1 and E2 are arithmetic expressions. We
must be a list of elements of some type that is presently unknown. consider boolean expressions generated by the following grammar:
B B ||B | B && B | ! B |( B ) | E relE | true | false
4.3.2 Type Conversions We use the attribute relop to indicate which of the six comparison operators <, <= , =, ! =, >, or
Consider expressions like x + i, where x is of type float and i is of type integer. Since the >= is represented by rel. As is customary, we assume that || and && are left-associative, and
expression has two different types of operands, the compiler may need to convert one of the that || has lowest precedence, then &&, then !.
operands of + to ensure that both operands are of the same type when the addition occurs. Given the expression B1 || B2, if we determine that B1 is true, then we can conclude that the
Suppose that integers are converted to floats when necessary, using a unary operator (float). entire expression is true without having to evaluate B2. Similarly, given B1&&B2, if B1 is false,
For example, the integer 2 is converted to a float in the code for the expression 2*3.14: then the entire expression is false.
t1= (float) 2
t2 = t1 * 3.14 4.4.2 Short-Circuit Code
Type synthesis will be illustrated by extending the scheme for translating expressions. We In short-circuit (or jumping) code, the boolean operators &&, ||, and ! translate into jumps. The
introduce another attribute E.type, whose value is either integer or float. The rule associated operators themselves do not appear in the code; instead, the value of a boolean expression is
with E → E1 + E2 builds on the pseudocode represented by a position in the code sequence.
if ( E1.type = integer and E2.type = integer ) E.type = integer; Example: The statement
else if ( E1.type = float and E2.type = integer ) . . . if ( x <100 || x >200 && x != y ) x = 0;
Type conversion rules vary from language to language. The rules for Java in Fig. 4.8 distinguish might be translated into the code.
between widening conversions, which are intended to preserve information, and narrowing
conversions, which can lose information.

In this translation, the boolean expression is true if control reaches label L2. If the expression is
false, control goes immediately to L1, skipping L2 and the assignment x = 0.
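Consistent with this description (true reaches L2, false reaches L1) and with the backpatched result worked out in Section 4.5, the short-circuit translation reads:

```
        if x < 100 goto L2
        goto L3
L3:     if x > 200 goto L4
        goto L1
L4:     if x != y goto L2
        goto L1
L2:     x = 0
L1:
```

Note the boolean operators || and && appear nowhere in the code; only conditional and unconditional jumps remain.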

4.4.3. Flow-of-Control Statements


We now consider the translation of boolean expressions into three-address code in the context
of statements such as those generated by the following grammar:
S → if ( B ) S1
S → if ( B ) S1 else S2
S → while ( B ) S1
In these productions, non terminal B represents a boolean expression and non terminal S
represents a statement.
Figure 4.8: Conversions between primitive types in Java
The widening rules are given by the hierarchy in Fig. 4.8(a): any type lower in the hierarchy can The translation of if (B) S1 consists of B.code followed by S1.code, as illustrated in Fig. 4.9(a).
be widened to a higher type. Thus, a char can be widened to an int or to a float, but a char cannot Within B.code are jumps based on the value of B. If B is true, control flows to the first instruction
be widened to a short. The narrowing rules are illustrated by the graph in Fig. 4.8(b): a type s can of S1.code, and if B is false, control flows to the instruction immediately following S1.code.
be narrowed to a type t if there is a path from s to t. Note that char, short, and byte are pairwise
convertible to each other.
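A sketch of the max and widen functions described in Section 4.3 over this hierarchy. For simplicity the hierarchy is flattened to a chain of ranks, which is an approximation (in the real hierarchy char and short are incomparable); the ranks and helper names are assumptions:

```python
# Widening hierarchy flattened to a chain (an approximation; see lead-in).
RANK = {"byte": 0, "short": 1, "char": 2, "int": 3,
        "long": 4, "float": 5, "double": 6}

temps = 0
code = []

def new_temp():
    global temps
    temps += 1
    return "t%d" % temps

def max_type(t1, t2):
    """Least upper bound of t1 and t2 in the (flattened) widening hierarchy;
    declares an error if either type is not in the hierarchy."""
    if t1 not in RANK or t2 not in RANK:
        raise TypeError("type not in the widening hierarchy")
    return t1 if RANK[t1] >= RANK[t2] else t2

def widen(addr, t, w):
    """Return addr if t and w are the same type; otherwise emit a conversion
    into a fresh temporary and return that temporary."""
    if t == w:
        return addr
    tmp = new_temp()
    code.append("%s = (%s) %s" % (tmp, w, addr))
    return tmp

# For 2 * 3.14: the int operand is widened to float first
a = widen("2", "int", "float")
```

This reproduces the t1 = (float) 2 instruction shown earlier for the expression 2 * 3.14.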


Figure 4.9: Code for if-, if-else-, and while-statements


The labels for the jumps in B.code and S.code are managed using inherited attributes. With a
boolean expression B, we associate two labels: B.true, the label to which control flows if B is
true, and B.false, the label to which control flows if B is false. With a statement S, we associate an
inherited attribute S.next denoting a label for the instruction immediately after the code for S. In
Figure 4.10: Translation scheme for boolean expressions
some cases, the instruction immediately following S.code is a jump to some label L. A jump to a
jump to L from within S.code is avoided using S.next.
Consider semantic action (1) for the production B → B1 || M B2. If B1 is true, then B is also true, so
the jumps on B1.truelist become part of B.truelist. If B1 is false, however, we must next test B2, so
4.5 Backpatching:
the target for the jumps B1.falselist must be the beginning of the code generated for B2. This
It is the process of filling up the unspecified labels.
target is obtained using the marker nonterminal M. That nonterminal produces, as a synthesized
The following functions are required for backpatching:
attribute M.instr, the index of the next instruction, just before the code for B2 starts being generated.
1. makelist(i) creates a new list containing only i, an index into the array of instructions;
To obtain that instruction index, we associate with the production M → ε the semantic action
makelist returns a pointer to the newly created list.
{ M.instr = nextinstr; }
2. merge(p1,p2) concatenates the lists pointed to by p1 and p2, and returns a pointer to the
The variable nextinstr holds the index of the next instruction to follow. This value will be
concatenated list.
backpatched onto the B1.falselist (i.e., each instruction on the list B1.falselist will receive M.instr
3. backpatch(p, i) inserts i as the target label for each of the instructions on the list pointed to
as its target label) when we have seen the remainder of the production B → B1 || M B2.
by p.
We now construct a translation scheme suitable for generating code for boolean expressions
Semantic action (2) for B → B1 && M B2 is similar to (1). Action (3) for B → ! B1 swaps the true
during bottom-up parsing. A marker non terminal M in the grammar causes a semantic action to
and false lists. Action (4) ignores parentheses. For simplicity, semantic action (5) generates two
pick up, at appropriate times, the index of the next instruction to be generated. The grammar is
instructions, a conditional goto and an unconditional one. Neither has its target filled in. These
as follows:
instructions are put on new lists, pointed to by B.truelist and B.falselist, respectively.
B → B1 || M B2 | B1 && M B2 | ! B1 | ( B1 ) | E1 rel E2 | true | false
M → ε Example: x<100 || x>200 && x!=y
The translation scheme is in below figure 4.10. Solution: Generate TAC for the given expression
100 if x<100 goto _______
101 goto _______
102 if x>200 goto _______
103 goto _______
104 if x!=y goto _______
105 goto _______


100 if x<100 goto ___
101 goto 102
102 if x>200 goto 104
103 goto ___
104 if x!=y goto ___
105 goto ___
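The example can be reproduced with a direct sketch of the three functions, holding instructions as an array whose unfilled jump targets are None (this encoding is an assumption):

```python
instrs = []                        # instruction j is a pair [text, target]

def gen(text):
    instrs.append([text, None])    # target not yet filled in
    return len(instrs) - 1

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(p, i):
    for j in p:                    # fill i in as the target of each jump on p
        instrs[j][1] = i

base = 100                         # number instructions from 100, as above
i0 = gen("if x<100 goto"); i1 = gen("goto")
i2 = gen("if x>200 goto"); i3 = gen("goto")
i4 = gen("if x!=y goto");  i5 = gen("goto")

backpatch(makelist(i2), base + i4)   # && : true exit of x>200 -> instr 104
backpatch(makelist(i1), base + i2)   # || : false exit of x<100 -> instr 102
truelist  = merge(makelist(i0), makelist(i4))  # filled by enclosing statement
falselist = merge(makelist(i3), makelist(i5))
```

After the two backpatch calls, instructions 101 and 102 hold targets 102 and 104 respectively, while the instructions on truelist and falselist stay open until the enclosing statement supplies its labels.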

4.6. Intermediate Code for Procedures


Suppose that a is an array of integers, and that f is a function. Then the assignment
statement
n = f(a[i]);
can be translated into three-address code as follows:
1) t1 = i * 4
2) t2 = a [ t1 ]
3) param t2
4) t3 = call f, 1
5) n = t3
Procedures are an important and frequently used programming construct, so it is necessary for
a compiler to generate good code for procedure calls and returns.
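A sketch of how a code generator might emit the param/call sequence for a call with n arguments (the emitter interface is an assumption for illustration):

```python
code = []

def emit(stmt):
    code.append(stmt)

def emit_call(proc, args, result):
    """Emit param x1 ... param xn followed by result = call p, n."""
    for a in args:
        emit("param " + a)
    emit("%s = call %s, %d" % (result, proc, len(args)))

# n = f(a[i]) with 4-byte integers, as in the listing above
emit("t1 = i * 4")
emit("t2 = a [ t1 ]")
emit_call("f", ["t2"], "t3")
emit("n = t3")
```

The five emitted statements match listing (1)-(5) above: the argument is evaluated into a temporary, passed with param, and the returned value is copied out of t3.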

