Unit III Notes
UNIT-III-Basics
Syntax-Directed Translation
translate expr;
translate term;
handle +;
Postfix Notation
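This section deals with translation into postfix notation. The postfix notation for an
expression E can be defined inductively as follows:
1. If E is a variable or constant, then the postfix notation for E is E itself.
2. If E is an expression of the form E1 op E2, where op is any binary operator, then the postfix
notation for E is E1' E2' op, where E1' and E2' are the postfix notations for E1 and E2, respectively.
3. If E is a parenthesized expression of the form (E1), then the postfix notation for E is the same
as the postfix notation for E1.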
Example 2.8: The postfix notation for (9-5)+2 is 95-2+. That is, the translations of 9, 5, and 2
are the constants themselves, by rule (1). Then, the translation of 9-5 is 95- by rule (2). The
translation of (9-5) is the same by rule (3). Having translated the parenthesized subexpression,
we may apply rule (2) to the entire expression, with (9-5) in the role of E1 and 2 in the role of E2,
to get the result 95-2+.
As another example, the postfix notation for 9-(5+2) is 952+-. That is, 5+2 is first
translated into 52+, and this expression becomes the second argument of the minus sign.
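The three rules can be tried out with a small Python sketch. The class names Const, BinOp, and Paren are illustrative choices made here, not part of the notes; each one stands for the case handled by one rule.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Const:
    value: str            # rule (1): a variable or constant


@dataclass
class BinOp:
    op: str               # rule (2): E1 op E2
    left: "Expr"
    right: "Expr"


@dataclass
class Paren:
    inner: "Expr"         # rule (3): ( E1 )


Expr = Union[Const, BinOp, Paren]


def postfix(e: Expr) -> str:
    """Return the postfix notation of expression e."""
    if isinstance(e, Const):
        return e.value                                   # rule (1)
    if isinstance(e, BinOp):
        return postfix(e.left) + postfix(e.right) + e.op  # rule (2)
    return postfix(e.inner)                              # rule (3)


# (9-5)+2
first = BinOp("+", Paren(BinOp("-", Const("9"), Const("5"))), Const("2"))
# 9-(5+2)
second = BinOp("-", Const("9"), Paren(BinOp("+", Const("5"), Const("2"))))

print(postfix(first))   # 95-2+
print(postfix(second))  # 952+-
```

Note that the parentheses disappear in the output: in postfix notation the position of the operators alone fixes the grouping.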
Synthesized Attributes
A syntax-directed definition associates
1. With each grammar symbol, a set of attributes, and
2. With each production, a set of semantic rules for computing the values of the attributes
associated with the symbols appearing in the production.
Attributes can be evaluated as follows. For a given input string x, construct a parse tree for x.
Then, apply the semantic rules to evaluate attributes at each node in the parse tree.
A parse tree showing the attribute values at each node is called an annotated parse tree.
For example, Fig. 2.9 shows an annotated parse tree for 9-5+2 with an attribute t associated with
the nonterminals expr and term. The value 95-2+ of the attribute at the root is the postfix
notation for 9-5+2.
Informally, inherited attributes have their value at a parse-tree node determined from attribute
values at the node itself, its parent, and its siblings in the parse tree.
The postfix form of a digit is the digit itself; e.g., the semantic rule associated with the
production term -> 9 defines term.t to be 9 itself whenever this production is used at a node in a
parse tree. The other digits are translated similarly. As another example, when the production
expr -> term is applied, the value of term.t becomes the value of expr.t.
A syntax-directed definition specifies the values of attributes by associating semantic rules with
the grammar productions. For example, an infix-to-postfix translator might have a production
and rule
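PRODUCTION          SEMANTIC RULE
E -> E1 + T         E.code = E1.code || T.code || '+'
Both E and T have a string-valued attribute code. The semantic rule specifies that the string
E.code is formed by concatenating E1.code, T.code, and the character '+'.
A syntax-directed translation scheme instead embeds program fragments, called semantic actions,
within production bodies, as in
E -> E1 + T { print '+' }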
By convention, semantic actions are enclosed within curly braces. (If curly braces occur as
grammar symbols, we enclose them within single quotes, as in ' { ' and '}'.) The position of a
semantic action in a production body determines the order in which the action is executed.
Terminals can have synthesized attributes, but not inherited attributes. Attributes for terminals
have lexical values that are supplied by the lexical analyzer; there are no semantic rules in the
SDD itself for computing the value of an attribute for a terminal.
Example: The SDD is based on our familiar grammar for arithmetic expressions with operators
+ and *. It evaluates expressions terminated by an end marker n. In the SDD, each of the
nonterminals has a single synthesized attribute, called val. We also suppose that the terminal digit
has a synthesized attribute lexval, which is an integer value returned by the lexical analyzer.
The rule for production 1, L -> E n, sets L.val to E.val, which we shall see is the numerical
value of the entire expression.
Production 2, E -> E1 + T, also has one rule, which computes the val attribute for the head E as
the sum of the values at E1 and T. At any parse tree node N labeled E, the value of val for E is the
sum of the values of val at the children of node N labeled E and T.
Production 3, E -> T, has a single rule that defines the value of val for E to be the same as the
value of val at the child for T.
Production 4 is similar to the second production; its rule multiplies the values at the children
instead of adding them.
The rules for productions 5 and 6 copy values at a child, like that for the third production.
Production 7 gives F.val the value of a digit, that is, the numerical value of the token digit that
the lexical analyzer returned.
An SDD that involves only synthesized attributes is called S-attributed. In an S-attributed SDD,
each rule computes an attribute for the nonterminal at the head of a production from attributes
taken from the body of the production.
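Because every attribute in an S-attributed SDD is computed from the children, a single bottom-up (postorder) walk of the parse tree is enough. The following Python sketch assumes a made-up tree encoding: each node is a tuple whose first element names the production and whose remaining elements are the children that carry attributes. The production F -> ( E ) is assumed to be one of the two copy rules.

```python
def val(node):
    """Return the synthesized attribute val of a parse-tree node."""
    production, *children = node
    if production == "F -> digit":
        return children[0]                        # F.val = digit.lexval
    if production in ("L -> E n", "E -> T", "T -> F", "F -> ( E )"):
        return val(children[0])                   # copy rules
    if production == "E -> E1 + T":
        return val(children[0]) + val(children[1])
    if production == "T -> T1 * F":
        return val(children[0]) * val(children[1])
    raise ValueError(f"unknown production: {production}")


def digit(lexval):
    """Build a leaf node for F -> digit."""
    return ("F -> digit", lexval)


# Parse tree for the input 3 * 5 + 4 n
term = ("T -> T1 * F", ("T -> F", digit(3)), digit(5))
expr = ("E -> E1 + T", ("E -> T", term), ("T -> F", digit(4)))
tree = ("L -> E n", expr)

print(val(tree))  # 19
```

Each call returns only after the calls for the children have returned, so every rule sees the child values it needs.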
For SDD's with both inherited and synthesized attributes, there is no guarantee that there is even
one order in which to evaluate attributes at nodes.
For instance, consider nonterminals A and B, with synthesized and inherited attributes A.s and
B.i, respectively, along with the production and rules
PRODUCTION          SEMANTIC RULES
Inherited attributes are useful when the structure of a parse tree does not "match" the
abstract syntax of the source code.
Example: The SDD in Fig. 5.4 computes terms like 3 * 5 and 3 * 5 * 7. The top-down parse
of input 3*5 begins with the production T -> F T'. Here, F generates the digit 3, but the operator *
is generated by T'. Thus, the left operand 3 appears in a different subtree of the parse tree from
*. An inherited attribute will therefore be used to pass the operand to the operator.
Each of the nonterminals T and F has a synthesized attribute val; the terminal digit has a
synthesized attribute lexval. The nonterminal T' has two attributes: an inherited attribute inh and
a synthesized attribute syn.
To see how the semantic rules are used, consider the annotated parse tree for 3 * 5 in Fig.
5.5. The leftmost leaf in the parse tree, labeled digit, has attribute value lexval = 3, where the 3 is
supplied by the lexical analyzer. Its parent is for production 4, F -> digit. The only semantic rule
associated with this production defines F.val = digit.lexval, which equals 3.
At the second child of the root, the inherited attribute T'.inh is defined by the semantic
rule T'.inh = F.val associated with production 1. Thus, the left operand, 3, for the * operator is
passed from left to right across the children of the root.
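A minimal Python sketch of this idea follows, assuming the productions T -> F T', T' -> * F T1' | ε, and F -> digit with the rules quoted in this example. The inherited attribute inh becomes a function argument and the synthesized attribute syn becomes the return value. The token encoding (Python ints for digits, the string "*" for the operator) is an assumption made for illustration.

```python
def evaluate_term(tokens):
    """Evaluate a token list such as [3, "*", 5] top-down."""
    pos = 0

    def parse_F():
        # F -> digit            F.val = digit.lexval
        nonlocal pos
        if pos >= len(tokens) or not isinstance(tokens[pos], int):
            raise SyntaxError("digit expected")
        lexval = tokens[pos]
        pos += 1
        return lexval

    def parse_T_prime(inh):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "*":
            # T' -> * F T1'     T1'.inh = T'.inh * F.val ; T'.syn = T1'.syn
            pos += 1
            f_val = parse_F()
            return parse_T_prime(inh * f_val)
        # T' -> epsilon         T'.syn = T'.inh
        return inh

    # T -> F T'                 T'.inh = F.val ; T.val = T'.syn
    f_val = parse_F()
    t_val = parse_T_prime(f_val)
    if pos != len(tokens):
        raise SyntaxError("unexpected input after term")
    return t_val


print(evaluate_term([3, "*", 5]))          # 15
print(evaluate_term([3, "*", 5, "*", 7]))  # 105
```

The left operand travels downward as the argument inh, and the finished product travels back up as the return value.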
Dependency Graphs:
A dependency graph depicts the flow of information among the attribute instances in a particular
parse tree; an edge from one attribute instance to another means that the value of the first is
needed to compute the second. Edges express constraints implied by the semantic rules.
In more detail:
• For each parse-tree node, say a node labeled by grammar symbol X, the dependency graph has
a node for each attribute associated with X.
• Suppose that a semantic rule associated with a production p defines the value of synthesized
attribute A.b in terms of the value of X.c (the rule may define A.b in terms of other attributes in
addition to X.c). Then, the dependency graph has an edge from X.c to A.b.
• Suppose that a semantic rule associated with a production p defines the value of inherited
attribute B.c in terms of the value of X.a. Then, the dependency graph has an edge from X.a to
B.c.
Example: Consider the following production and rule:
Example 2:
An example of a complete dependency graph appears in Fig. 5.7. The nodes of the dependency
graph, represented by the numbers 1 through 9, correspond to the attributes in the annotated
parse tree in Fig. 5.5.
Figure 5.7: Dependency graph for the annotated parse tree of Fig. 5.5
Nodes 1 and 2 represent the attribute lexval associated with the two leaves labeled digit.
Nodes 3 and 4 represent the attribute val associated with the two nodes labeled F. The edges to
node 3 from 1 and to node 4 from 2 result from the semantic rule that defines F.val in terms of
digit.lexval.
Nodes 5 and 6 represent the inherited attribute T'.inh associated with each of the occurrences of
nonterminal T'. The edge to 5 from 3 is due to the rule T'.inh = F.val, which defines T'.inh at the
right child of the root from F.val at the left child. We see edges to 6 from node 5 for T'.inh and
from node 4 for F.val, because these values are multiplied to evaluate the attribute inh at node 6.
Nodes 7 and 8 represent the synthesized attribute syn associated with the occurrences of T'. The
edge to node 7 from 6 is due to the semantic rule T'.syn = T'.inh associated with production 3 in
Fig. 5.4. The edge to node 8 from 7 is due to a semantic rule associated with production 2.
Finally, node 9 represents the attribute T.val. The edge to 9 from 8 is due to the semantic rule,
T.val = T'.syn, associated with production 1.
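Any topological order of the dependency graph is a valid evaluation order. The sketch below encodes the nine nodes and the edges described above and evaluates them with Python's standard graphlib module (Python 3.9 or later). The dictionary layout and the lambda-per-node encoding of the semantic rules are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# node -> set of nodes whose values it needs (the sources of its incoming edges)
depends_on = {3: {1}, 4: {2}, 5: {3}, 6: {4, 5}, 7: {6}, 8: {7}, 9: {8}}

# One rule per node; v maps already-evaluated nodes to their values.
rules = {
    1: lambda v: 3,            # digit.lexval of the first leaf
    2: lambda v: 5,            # digit.lexval of the second leaf
    3: lambda v: v[1],         # F.val = digit.lexval
    4: lambda v: v[2],         # F.val = digit.lexval
    5: lambda v: v[3],         # T'.inh = F.val
    6: lambda v: v[5] * v[4],  # T1'.inh = T'.inh * F.val
    7: lambda v: v[6],         # T'.syn = T'.inh
    8: lambda v: v[7],         # T'.syn = T1'.syn
    9: lambda v: v[8],         # T.val = T'.syn
}

order = list(TopologicalSorter(depends_on).static_order())
values = {}
for node in order:
    values[node] = rules[node](values)

print(values[9])  # 15
```

If the graph contained a cycle, static_order() would raise graphlib.CycleError, which corresponds to an SDD instance with no valid evaluation order.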
Example-2:
1. Embed the action that computes the inherited attributes for a nonterminal A immediately
before that occurrence of A in the body of the production. If several inherited attributes for A
depend on one another in an acyclic fashion, order the evaluation of attributes so that those
needed first are computed first.
2. Place the actions that compute a synthesized attribute for the head of a production at the end of
the body of that production.
1. The inherited attribute S.next labels the beginning of the code that must be executed after S is
finished.
2. The synthesized attribute S.code is the sequence of intermediate-code steps that implements a
statement S and ends with a jump to S.next.
3. The inherited attribute C.true labels the beginning of the code that must be executed if C is
true.
4. The inherited attribute C.false labels the beginning of the code that must be executed if C is
false.
5. The synthesized attribute C.code is the sequence of intermediate-code steps that implements
the condition C and jumps either to C.true or to C.false, depending on whether C is true or false.
2. Build the parse tree, add actions, and execute the actions in preorder. This approach works
for any L-attributed definition.
In this section, we discuss the following methods for translation during parsing:
3. Use a recursive-descent parser with one function for each nonterminal. The function for
nonterminal A receives the inherited attributes of A as arguments and returns the synthesized
attributes of A.
5. Implement an SDT in conjunction with an LL-parser. The attributes are kept on the parsing
stack, and the rules fetch the needed attributes from known locations on the stack.
6. Implement an SDT in conjunction with an LR-parser. The SDT for an L-attributed SDD
typically has actions in the middle of productions, and we cannot be sure during
an LR parse that we are even in that production until its entire body has been constructed.
1. There is, for one or more nonterminals, a main attribute. For convenience, we shall assume
that the main attributes are all string valued. In Example 5.20, the attributes S.code and C.code
are main attributes; the other attributes are not.
2. The main attributes are synthesized.
3. The rules that evaluate the main attribute(s) ensure that
(a) The main attribute is the concatenation of main attributes of nonterminals appearing
in the body of the production involved, perhaps with other elements that are not main
attributes, such as the string label or the values of labels L1 and L2.
(b) The main attributes of nonterminals appear in the rule in the same order as the
nonterminals themselves appear in the production body.
Intermediate-Code Generation
Intermediate code is the interface between the front end and the back end of a compiler. Ideally,
the details of the source language are confined to the front end and the details of the target
machine to the back end, so that m front ends and n back ends can be combined into m*n compilers.
In this chapter we study intermediate representations, static type checking, and
intermediate code generation.
High-level representations are close to the source language and low-level representations are
close to the target machine. Syntax trees are high level; they depict the natural hierarchical
structure of the source program and are well suited to tasks like static type checking.
Example: a+a*(b-c)+(b-c)*d
2. Polish Notation:
• The linearization of a syntax tree is called Polish notation.
• It is also called prefix notation: the operator comes first, followed by its
operands.
Ex: (a+b)*(c-d) ------> *+ab-cd.
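The conversion is a preorder walk of the syntax tree. A minimal Python sketch, assuming a tree encoding chosen here for illustration (a leaf is a string, an interior node is a tuple of operator, left child, right child):

```python
def prefix(node):
    """Return the prefix (Polish) notation of a syntax tree."""
    if isinstance(node, str):       # leaf: an identifier
        return node
    op, left, right = node          # interior node: operator first
    return op + prefix(left) + prefix(right)


# Syntax tree for (a+b)*(c-d)
tree = ("*", ("+", "a", "b"), ("-", "c", "d"))
print(prefix(tree))  # *+ab-cd
```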
Quadruples
Triples:
A triple has only three fields, which we call op, arg1, and arg2. Note that the result field of a
quadruple is used primarily for temporary names.
Indirect triples
Indirect triples consist of a listing of pointers to triples, rather than a listing of triples themselves.
For example, let us use an array instruction to list pointers to triples in the desired order.
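The three representations can be compared on one statement. The sketch below uses x = (a + b) * (c - d) as an assumed example; the tuple layouts and the helper quads_to_triples are illustrative, not a standard API. In the triples, an integer argument refers to the position of the triple that computes the value.

```python
# Quadruples: (op, arg1, arg2, result)
quads = [
    ("+", "a", "b", "t1"),
    ("-", "c", "d", "t2"),
    ("*", "t1", "t2", "t3"),
    ("=", "t3", None, "x"),
]


def quads_to_triples(quads):
    """Drop the result field; refer to earlier results by position."""
    position_of = {}   # temporary name -> index of the triple computing it
    triples = []
    for op, arg1, arg2, result in quads:
        arg1 = position_of.get(arg1, arg1)
        arg2 = position_of.get(arg2, arg2)
        if op == "=":
            triples.append(("=", result, arg1))      # x = (value at position)
        else:
            position_of[result] = len(triples)
            triples.append((op, arg1, arg2))
    return triples


triples = quads_to_triples(quads)
for i, triple in enumerate(triples):
    print(i, triple)

# Indirect triples: the execution order is a separate list of pointers
# (here, indexes) into the triple table.
instruction = list(range(len(triples)))
# Reordering the two independent operations touches only this list;
# the triples and the positions they refer to stay unchanged.
instruction[0], instruction[1] = instruction[1], instruction[0]
print(instruction)  # [1, 0, 2, 3]
```

With plain triples, moving an instruction would change its position and force every reference to it to be updated; the extra level of indirection avoids that.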
Type Expressions:
A type expression is either a basic type or is formed by applying an operator called a type
constructor to a type expression.
Example: int[2][3]
array(2,array(3,integer))
The array type int[2][3] can be read as "array of 2 arrays of 3 integers each" and written
as a type expression array(2, array(3, integer)).
• A basic type is a type expression. Typical basic types for a language include boolean, char,
integer, float, and void; the latter denotes "the absence of a value."
• A type name is a type expression.
• A type expression can be formed by applying the array type constructor to a number and a
type expression.
• A record is a data structure with named fields.
• A type expression can be formed by using the type constructor -> for function types.
• If s and t are type expressions, then their Cartesian product s * t is a type expression.
• Type expressions may contain variables whose values are type expressions.
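As a small illustration of the array constructor, the following Python sketch builds the textual type expression for a declaration such as int[2][3]. The function name array_type and the choice to represent type expressions as strings are assumptions made for this sketch.

```python
def array_type(base, dims):
    """Apply the array constructor once per dimension, innermost first."""
    type_expr = base
    for size in reversed(dims):
        type_expr = f"array({size}, {type_expr})"
    return type_expr


print(array_type("integer", [2, 3]))  # array(2, array(3, integer))
```

The last dimension is wrapped first because it is the innermost array.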
Type Equivalence
Many type-checking rules have the form "if two type expressions are equal then return a certain
type else error."
• Two type expressions are structurally equivalent if either of the following conditions holds:
– They are the same basic type.
– They are formed by applying the same constructor to structurally equivalent
types.
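The two conditions translate directly into a recursive check. In this Python sketch the classes Basic, Array, and Func are illustrative stand-ins for type expressions; treating the array size as part of the type is an assumption, since languages differ on that point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Basic:
    name: str              # e.g. "integer", "float"


@dataclass(frozen=True)
class Array:
    size: int
    elem: object           # element type expression


@dataclass(frozen=True)
class Func:
    arg: object            # argument type expression
    result: object         # result type expression


def equivalent(s, t):
    """Structural equivalence of two type expressions."""
    # Condition 1: the same basic type.
    if isinstance(s, Basic) and isinstance(t, Basic):
        return s.name == t.name
    # Condition 2: the same constructor applied to equivalent types.
    if isinstance(s, Array) and isinstance(t, Array):
        return s.size == t.size and equivalent(s.elem, t.elem)
    if isinstance(s, Func) and isinstance(t, Func):
        return equivalent(s.arg, t.arg) and equivalent(s.result, t.result)
    return False


integer = Basic("integer")
a = Array(2, Array(3, integer))
b = Array(2, Array(3, Basic("integer")))
c = Array(3, Array(2, integer))

print(equivalent(a, b))  # True
print(equivalent(a, c))  # False
```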
Declarations
We study types and declarations using a simplified grammar that declares just one name at a time.
Translation of Expressions
• An expression with more than one operator, like a + b * c, will translate into instructions
with at most one operator per instruction.
• An array reference A[i][j] will expand into a sequence of three-address instructions that
calculate an address for the reference.
• The syntax-directed definition builds up the three-address code for an assignment
statement S using attribute code for S and attributes addr and code for an expression E.
• Attributes S.code and E.code denote the three-address code for S and E, respectively.
• Attribute E.addr denotes the address that will hold the value of E.
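The attributes can be mimicked in Python by letting the function for an expression return the pair (E.addr, E.code). This is a sketch under assumed conventions: the tuple encoding of expressions and the temporary names t1, t2, ... are choices made here for illustration.

```python
import itertools


def make_translator():
    """Return a function that translates `name = expression`."""
    counter = itertools.count(1)

    def new_temp():
        return f"t{next(counter)}"

    def expr(node):
        """Return (E.addr, E.code) for an expression tree."""
        if isinstance(node, str):          # E -> id: no code needed
            return node, []
        op, left, right = node             # E -> E1 op E2
        left_addr, left_code = expr(left)
        right_addr, right_code = expr(right)
        addr = new_temp()
        code = left_code + right_code + [f"{addr} = {left_addr} {op} {right_addr}"]
        return addr, code

    def assign(name, node):
        """Return S.code for S -> id = E."""
        addr, code = expr(node)
        return code + [f"{name} = {addr}"]

    return assign


assign = make_translator()
for line in assign("x", ("+", "a", ("*", "b", "c"))):
    print(line)
# t1 = b * c
# t2 = a + t1
# x = t2
```

Every emitted instruction has at most one operator on its right-hand side, as the first bullet requires.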
Incremental Translation
• Instead of building up code attributes, we can arrange to generate only the new three-address instructions.
• In the incremental approach, gen not only constructs a three-address instruction, it
appends the instruction to the sequence of instructions generated so far.
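A minimal sketch of the incremental approach, with an assumed class name Emitter: no code attribute is passed around, and each call to gen appends one instruction to a shared list.

```python
class Emitter:
    """Generates three-address code incrementally."""

    def __init__(self):
        self.instructions = []   # the sequence generated so far
        self.temp_count = 0

    def new_temp(self):
        self.temp_count += 1
        return f"t{self.temp_count}"

    def gen(self, instruction):
        # Construct the instruction and append it to the sequence.
        self.instructions.append(instruction)

    def expr(self, node):
        """Emit code for an expression tree and return only E.addr."""
        if isinstance(node, str):
            return node
        op, left, right = node
        left_addr = self.expr(left)
        right_addr = self.expr(right)
        temp = self.new_temp()
        self.gen(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    def assign(self, name, node):
        addr = self.expr(node)
        self.gen(f"{name} = {addr}")


emitter = Emitter()
emitter.assign("x", ("+", "a", ("*", "b", "c")))
print(emitter.instructions)
```

Only addresses are returned from expr, so no instruction list is ever copied or concatenated.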
Type Checking:
• A programming language is strongly typed if every program its compiler accepts will execute
without type errors.
Rules for Type Checking:
Type checking can take on two forms:
1. Synthesis
2. Inference.
Type synthesis builds up the type of an expression from the types of its subexpressions. It
requires names to be declared before they are used.
Type inference determines the type of a language construct from the way it is used.
Type Conversions:
Consider expressions like x + i, where x is of type float and i is of type integer. Since the
representation of integers and floating-point numbers is different within a computer and different
machine instructions are used for operations on integers and floats, the compiler may need to
convert one of the operands of + to ensure that both operands are of the same type when the
addition occurs.
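One way to picture the conversion is a helper that widens an operand to the larger of the two operand types before the addition is emitted. The sketch below assumes only the two types integer and float, an explicit (float) cast instruction, and temporary names t1, t2, ...; all of these are illustrative choices.

```python
import itertools

# Position in the widening order: integer can be widened to float.
RANK = {"integer": 0, "float": 1}


def translate_add(left, left_type, right, right_type):
    """Return (address, type, code) for left + right, widening if needed."""
    temps = (f"t{n}" for n in itertools.count(1))
    code = []

    def widen(addr, from_type, to_type):
        if from_type == to_type:
            return addr                        # no conversion needed
        temp = next(temps)                     # integer -> float
        code.append(f"{temp} = (float) {addr}")
        return temp

    result_type = max(left_type, right_type, key=RANK.__getitem__)
    left_addr = widen(left, left_type, result_type)
    right_addr = widen(right, right_type, result_type)
    result = next(temps)
    code.append(f"{result} = {left_addr} + {right_addr}")
    return result, result_type, code


addr, result_type, code = translate_add("x", "float", "i", "integer")
print(result_type)   # float
for line in code:
    print(line)
# t1 = (float) i
# t2 = x + t1
```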
Control Flow:
The translation of statements such as if-else-statements and while-statements is tied to the
translation of Boolean expressions. In programming languages, boolean expressions are often
used to
1. Alter the flow of control. Boolean expressions are used as conditional expressions in
statements that alter the flow of control. The value of such boolean expressions is implicit in a
position reached in a program. For example, in if (E) S, the expression E must be true if
statement S is reached.
2. Compute logical values. A boolean expression can represent true or false as values. Such
Boolean expressions can be evaluated in analogy to arithmetic expressions using three-address
instructions with logical operators.
Boolean Expressions:
Boolean expressions are composed of the boolean operators (which we denote &&, ||, and !,
using the C convention for the operators AND, OR, and NOT, respectively) applied to elements
that are boolean variables or relational expressions. Relational expressions are of the form E1 rel
E2, where E1 and E2 are arithmetic expressions. In this section, we consider boolean expressions
generated by the following grammar:
We use the attribute rel.op to indicate which of the six comparison operators <, <=, =, !=, >, or
>= is represented by rel. We assume that || and && are left-associative, and that || has lowest
precedence, then &&, then !.
Short-Circuit Code:
In short-circuit (or jumping) code, the boolean operators &&, ||, and ! translate into
jumps. The operators themselves do not appear in the code; instead, the value of a boolean
expression is represented by a position in the code sequence.
Example 6.21: The statement
if ( x < 100 || x > 200 && x != y ) x = 0;
might be translated into the code of Fig. 6.34. In this translation, the boolean expression is true if
control reaches label L2. If the expression is false, control goes immediately to L1, skipping L2
and the assignment x = 0.
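The following Python sketch generates jumping code for this statement. It is a simple, unoptimized scheme that passes a true label and a false label down to each subexpression, so it emits some redundant gotos and its output need not match the figure line for line. The tuple encoding of conditions and the function name jumping_code are assumptions made here.

```python
import itertools


def jumping_code(cond, true_label, false_label, new_label):
    """Return jumping code that reaches true_label or false_label."""
    kind = cond[0]
    if kind == "rel":                       # E1 rel E2
        return [f"if {cond[1]} goto {true_label}", f"goto {false_label}"]
    if kind == "!":                         # swap the two targets
        return jumping_code(cond[1], false_label, true_label, new_label)
    if kind == "||":                        # if B1 is false, try B2
        middle = new_label()
        return (jumping_code(cond[1], true_label, middle, new_label)
                + [f"{middle}:"]
                + jumping_code(cond[2], true_label, false_label, new_label))
    if kind == "&&":                        # if B1 is true, still need B2
        middle = new_label()
        return (jumping_code(cond[1], middle, false_label, new_label)
                + [f"{middle}:"]
                + jumping_code(cond[2], true_label, false_label, new_label))
    raise ValueError(f"unknown condition: {kind}")


labels = itertools.count(3)                 # L1 and L2 are already taken


def new_label():
    return f"L{next(labels)}"


# x < 100 || x > 200 && x != y   (&& binds tighter than ||)
cond = ("||", ("rel", "x < 100"),
        ("&&", ("rel", "x > 200"), ("rel", "x != y")))

code = jumping_code(cond, "L2", "L1", new_label) + ["L2: x = 0", "L1:"]
for line in code:
    print(line)
```

The operators ||, &&, and ! never appear in the output; only the comparisons and jumps do.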
Flow-of-Control Statements:
We now consider the translation of boolean expressions into three-address code in the context of
statements such as those generated by the following grammar:
Syntax-directed definition
Backpatching:
Backpatching is an approach in which lists of jumps are passed as synthesized attributes. Specifically, when a jump is
generated, the target of the jump is temporarily left unspecified. Each such jump is put on a list
of jumps whose labels are to be filled in when the proper label can be determined. All of the
jumps on a list have the same target label.
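A small Python sketch of the idea follows, using the expression x < 100 || x > 200 && x != y as an assumed example. Jumps are emitted with the placeholder "_" as target, and the truelist and falselist of each subexpression are plain Python lists of instruction indexes. The class name Code and the use of "_" as placeholder are illustrative choices, and instruction numbers start at 0.

```python
class Code:
    """Instruction array with jump targets filled in later."""

    def __init__(self):
        self.instrs = []

    def emit(self, text):
        self.instrs.append(text)
        return len(self.instrs) - 1          # index of the new instruction

    def next_instr(self):
        return len(self.instrs)

    def backpatch(self, jumps, target):
        # Every jump on the list receives the same target.
        for i in jumps:
            self.instrs[i] = self.instrs[i].replace("_", str(target))


def rel(code, test):
    """Emit code for a comparison; return (truelist, falselist)."""
    true_jump = code.emit(f"if {test} goto _")
    false_jump = code.emit("goto _")
    return [true_jump], [false_jump]


code = Code()
t1, f1 = rel(code, "x < 100")
m1 = code.next_instr()                       # where the right operand of || starts
t2, f2 = rel(code, "x > 200")
m2 = code.next_instr()                       # where the right operand of && starts
t3, f3 = rel(code, "x != y")

# B -> B2 && B3: if B2 is true, continue at B3
code.backpatch(t2, m2)
and_true, and_false = t3, f2 + f3            # merge the two falselists

# B -> B1 || (B2 && B3): if B1 is false, continue at the right operand
code.backpatch(f1, m1)
truelist, falselist = t1 + and_true, and_false

for i, instr in enumerate(code.instrs):
    print(i, instr)
print(truelist, falselist)                   # [0, 4] [3, 5]
```

The jumps still marked "_" are exactly those on truelist and falselist; they are filled in once the enclosing statement knows where the true and false exits lead.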
Switch-Statements:
The "switch" or "case" statement is available in a variety of languages. Our switch-statement
syntax is as follows:
Translation of Switch-Statements:
1. Evaluate the expression E.
2. Find the value Vj in the list of cases that is the same as the value of the expression.
Recall that the default value matches the expression if none of the values explicitly mentioned
in the cases does.
3. Execute the statement Sj associated with the value found.
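The three steps can be sketched as a generator of three-address code that tests the cases one after another. This is only one possible layout; the function name translate_switch, the label names, and the string form of the instructions are assumptions made for illustration.

```python
def translate_switch(expr, cases, default, next_label="next"):
    """cases is a list of (value, statement) pairs; returns a list of instructions."""
    code = [f"t = {expr}"]                              # step 1: evaluate E
    labels = [f"L{i + 1}" for i in range(len(cases))]
    for (value, _), label in zip(cases, labels):        # step 2: find the matching value
        code.append(f"if t == {value} goto {label}")
    code.append("goto Ldefault")                        # no case matched
    for (_, statement), label in zip(cases, labels):    # step 3: code for each Sj
        code.append(f"{label}: {statement}")
        code.append(f"goto {next_label}")
    code.append(f"Ldefault: {default}")
    code.append(f"{next_label}:")
    return code


switch_code = translate_switch("i + 1", [(1, "x = 10"), (2, "x = 20")], "x = 0")
for line in switch_code:
    print(line)
```

A sequence of conditional jumps suits a small number of cases; a compiler may prefer a jump table or a hash table when there are many.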
Example:
t1 = i * 4
t2 = a[t1]
param t2
t3 = call f, 1
n = t3