Syntax-Directed Definition
A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules. Attributes are associated with grammar symbols and rules are associated with productions. If X is a symbol and a is one of its attributes, then we write X.a to denote the value of a at a particular parse-tree node labeled X. For example, an infix-to-postfix translator might have a production and rule

PRODUCTION          SEMANTIC RULE
E -> E1 + T         E.code = E1.code || T.code || '+'

This production has two nonterminals, E and T; the subscript in E1 distinguishes the occurrence of E in the production body from the occurrence of E as the head. Both E and T have a string-valued attribute code. The semantic rule specifies that the string E.code is formed by concatenating E1.code, T.code, and the character '+'.

1.1 Inherited and Synthesized Attributes
Two kinds of attributes for nonterminals:
1. A synthesized attribute for a nonterminal A at a parse-tree node N is defined by a semantic rule associated with the production at N. Note that the production must have A as its head. A synthesized attribute at node N is defined only in terms of attribute values at the children of N and at N itself.
2. An inherited attribute for a nonterminal B at a parse-tree node N is defined by a semantic rule associated with the production at the parent of N. Note that the production must have B as a symbol in its body. An inherited attribute at node N is defined only in terms of attribute values at N's parent, N itself, and N's siblings.

Example 3.3.1: The SDD in Fig. 3.3.1 is based on our familiar grammar for arithmetic expressions with operators + and *. It evaluates expressions terminated by an end marker n. In the SDD, each of the nonterminals has a single synthesized attribute, called val. We also suppose that the terminal digit has a synthesized attribute lexval, which is an integer value returned by the lexical analyzer.

Production 3, E -> T, has a single rule that defines the value of val for E to be the same as the value of val at the child for T.
Production 4 is similar to the second production; its rule multiplies the values at the children instead of adding them.
The rules for productions 5 and 6 copy values at a child, like that for the third production. Production 7 gives F.val the value of a digit, that is, the numerical value of the token digit that the lexical analyzer returned.

An SDD that involves only synthesized attributes is called S-attributed; the SDD in Fig. 3.3.1 has this property. In an S-attributed SDD, each rule computes an attribute for the nonterminal at the head of a production from attributes taken from the body of the production.
An SDD without side effects is sometimes called an attribute grammar. The rules in an attribute grammar define the value of an attribute purely in terms of the values of other attributes and constants.

1.2 Evaluating an SDD at the Nodes of a Parse Tree
A parse tree showing the value(s) of its attribute(s) is called an annotated parse tree. For SDD's with both inherited and synthesized attributes, there is no guarantee that there is even one order in which to evaluate attributes at nodes.

Example 3.3.2: Figure 3.3.2 shows an annotated parse tree for the input string 3 * 5 + 4 n, constructed using the grammar and rules of Fig. 3.3.1. The values of lexval are presumed supplied by the lexical analyzer. Each of the nodes for the nonterminals has attribute val computed in a bottom-up order, and we see the resulting values associated with each node. For instance, at the node with a child labeled *, after computing T.val = 3 and F.val = 5 at its first and third children, we apply the rule that says T.val is the product of these two values, or 15.
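The bottom-up evaluation of Example 3.3.2 can be sketched in code. This is a minimal Python sketch under stated assumptions, not the notes' own implementation: the names Node, leaf and postorder_eval are hypothetical, and the productions (L -> E n, E -> E + T | T, T -> T * F | F, F -> ( E ) | digit) are assumed to be those of Fig. 3.3.1, which is not reproduced here. The val of a node is computed only after the val of each of its children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Parse-tree node (hypothetical helper type for this sketch)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    lexval: Optional[int] = None   # set only on digit leaves
    val: Optional[int] = None      # synthesized attribute


def leaf(label, lexval=None):
    return Node(label, lexval=lexval)


def postorder_eval(n):
    """Evaluate val at every node, children first (assumed grammar, see lead-in)."""
    for c in n.children:
        postorder_eval(c)
    body = [c.label for c in n.children]
    if body == ["digit"]:                      # F -> digit
        n.val = n.children[0].lexval
    elif body == ["T", "*", "F"]:              # T -> T1 * F
        n.val = n.children[0].val * n.children[2].val
    elif body == ["E", "+", "T"]:              # E -> E1 + T
        n.val = n.children[0].val + n.children[2].val
    elif body == ["(", "E", ")"]:              # F -> ( E )
        n.val = n.children[1].val
    elif body in (["T"], ["F"], ["E", "n"]):   # copy rules
        n.val = n.children[0].val
    return n.val


# Parse tree for the input 3 * 5 + 4 n, built by hand.
f3 = Node("F", [leaf("digit", 3)])
f5 = Node("F", [leaf("digit", 5)])
f4 = Node("F", [leaf("digit", 4)])
t15 = Node("T", [Node("T", [f3]), leaf("*"), f5])
e = Node("E", [Node("E", [t15]), leaf("+"), Node("T", [f4])])
root = Node("L", [e, leaf("n")])
postorder_eval(root)
print(t15.val, root.val)
```

In this sketch the node with the child labeled * ends up with val = 15, matching the text, and the root carries the value of the whole expression.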
UNIT-3- PART-2 COMPILER DESIGN
3, but the operator * is generated by T'. Thus, the left operand 3 appears in a different subtree of the parse tree from *. An inherited attribute will therefore be used to pass the operand to the operator.

With T'.inh = 3 and F.val = 5, we get T1'.inh = 15. At the lower node for T1', the production is T' -> ε. The semantic rule T'.syn = T'.inh defines T1'.syn = 15. The syn attributes at the nodes for T' pass the value 15 up the tree to the node for T, where T.val = 15.
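The way the inherited attribute carries the left operand to the operator can be sketched with a small recursive-descent evaluator. This is a hedged illustration only: the function names are hypothetical, and the productions T -> F T', T' -> * F T1' | ε and F -> digit are assumed from the discussion above. The inherited attribute T'.inh becomes a function parameter and the synthesized attribute T'.syn becomes the return value.

```python
def eval_T(tokens):
    """Evaluate a product such as ['3', '*', '5'] (hypothetical helper)."""
    pos = 0

    def parse_F():
        # F -> digit : F.val = digit.lexval
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("digit expected")
        value = int(tokens[pos])
        pos += 1
        return value

    def parse_Tprime(inh):
        # T' -> * F T1' : T1'.inh = T'.inh * F.val ; T'.syn = T1'.syn
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            f_val = parse_F()
            return parse_Tprime(inh * f_val)
        # T' -> ε : T'.syn = T'.inh
        return inh

    # T -> F T' : T'.inh = F.val ; T.val = T'.syn
    f_val = parse_F()
    return parse_Tprime(f_val)


print(eval_T(["3", "*", "5"]))
```

For the input 3 * 5, the first call receives inh = 3, the nested call receives inh = 15, and that value is returned unchanged up to T.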
Figure 3.4.2: Dependency graph for the annotated parse tree of Fig. 3.3.5
• Nodes 1 and 2 represent the attribute lexval associated with the two leaves labeled digit.
• Nodes 3 and 4 represent the attribute val associated with the two nodes labeled F. The edges to node 3 from 1 and to node 4 from 2 result from the semantic rule that defines F.val in terms of digit.lexval. In fact, F.val equals digit.lexval, but the edge represents dependence, not equality.
• Nodes 5 and 6 represent the inherited attribute T'.inh associated with each of the occurrences of nonterminal T'. The edge to 5 from 3 is due to the rule T'.inh = F.val, which defines T'.inh at the right child of the root from F.val at the left child. We see edges to 6 from node 5 for T'.inh and from node 4 for F.val, because these values are multiplied to evaluate the attribute inh at node 6.
• Nodes 7 and 8 represent the synthesized attribute syn associated with the occurrences of T'. The edge to node 7 from 6 is due to the semantic rule T'.syn = T'.inh associated with production 3 in Fig. 3.3.4. The edge to node 8 from 7 is due to a semantic rule associated with production 2.
• Finally, node 9 represents the attribute T.val. The edge to 9 from 8 is due to the semantic rule, T.val = T'.syn, associated with production 1.

The dependency graph of Fig. 3.4.2 has no cycles. One topological sort is the order in which the nodes have already been numbered: 1, 2, ..., 9. Notice that every edge of the graph goes from a node to a higher-numbered node, so this order is surely a topological sort. There are other topological sorts as well, such as 1, 3, 5, 2, 4, 6, 7, 8, 9.

2. S-Attributed Definitions
Given an SDD, it is very hard to tell whether there exist any parse trees whose dependency graphs have cycles. In practice, translations can be implemented using classes of SDD's that guarantee an evaluation order, since they do not permit dependency graphs with cycles. Moreover, the two classes can be implemented efficiently in connection with top-down or bottom-up parsing.
The first class is defined as follows:
An SDD is S-attributed if every attribute is synthesized.

Example 3.4.2: The SDD of Fig. 3.3.1 is an example of an S-attributed definition. Each attribute, L.val, E.val, T.val, and F.val, is synthesized.

When an SDD is S-attributed, we can evaluate its attributes in any bottom-up order of the nodes of the parse tree. It is often especially simple to evaluate the attributes by performing a postorder traversal of the parse tree and evaluating the attributes at a node N when the traversal leaves N for the last time. That is, we apply the function postorder, defined below, to the root of the parse tree:

    postorder(N) {
        for ( each child C of N, from the left ) postorder(C);
        evaluate the attributes associated with node N;
    }

S-attributed definitions can be implemented during bottom-up parsing, since a bottom-up parse corresponds to a postorder traversal. Specifically, postorder corresponds exactly to the order in which an LR parser reduces a production body to its head.

3. L-Attributed Definitions
The second class of SDD's is called L-attributed definitions. The idea behind this class is that, between the attributes associated with a production body, dependency-graph edges can go from left to right, but not from right to left (hence "L-attributed"). More precisely, each attribute must be either
1. Synthesized, or
2. Inherited, but with the rules limited as follows. Suppose that there is a production A -> X1 X2 ... Xn, and that there is an inherited attribute Xi.a computed by a rule associated with this production. Then the rule may use only:
   (a) Inherited attributes associated with the head A.
   (b) Either inherited or synthesized attributes associated with the occurrences of symbols X1, X2, ..., Xi-1 located to the left of Xi.
   (c) Inherited or synthesized attributes associated with this occurrence of Xi itself, but only in such a way that there are no cycles in a dependency graph formed by the attributes of this Xi.

Example 3.4.2: The SDD in Fig. 3.3.4 is L-attributed. To see why, consider the semantic rules for inherited attributes, which are repeated here for convenience:

PRODUCTION          SEMANTIC RULE
T -> F T'           T'.inh = F.val
T' -> * F T1'       T1'.inh = T'.inh × F.val

The first of these rules defines the inherited attribute T'.inh using only F.val, and F appears to the left of T' in the production body, as required. The second rule defines T1'.inh using the inherited attribute T'.inh associated with the head, and F.val, where F appears to the left of T1' in the production body.
In each of these cases, the rules use information "from above or from the left," as required by the class. The remaining attributes are synthesized. Hence, the SDD is L-attributed.

Applications of Syntax-Directed Translation
The main application is the construction of syntax trees. Since some compilers use syntax trees as an intermediate representation, a common form of SDD turns its input string into a tree. We consider two SDD's for constructing syntax trees for expressions. The first, an S-attributed definition, is suitable for use during bottom-up parsing. The second, L-attributed, is suitable for use during top-down parsing.
We shall implement the nodes of a syntax tree by objects with a suitable number of fields. Each object will have an op field that is the label of the node. The objects will have additional fields as follows:
• If the node is a leaf, an additional field holds the lexical value for the leaf. A constructor function Leaf(op, val) creates a leaf object. Alternatively, if nodes are viewed as records, then Leaf returns a pointer to a new record for a leaf.
• If the node is an interior node, there are as many additional fields as the node has children in the syntax tree. A constructor function Node takes two or more arguments: Node(op, c1, c2, ..., ck) creates an object with first field op and k additional fields for the k children c1, c2, ..., ck.

Example 3.5.1: The S-attributed definition in Fig. 3.5.1 constructs syntax trees for a simple expression grammar involving only the binary operators + and -. As usual, these operators are at the same precedence level and are jointly left associative. All nonterminals have one synthesized attribute node, which represents a node of the syntax tree.

Figure 3.5.1: Constructing syntax trees for simple expressions

Every time the first production E -> E1 + T is used, its rule creates a node with '+' for op and two children, E1.node and T.node, for the subexpressions. The second production has a similar rule.
For production 3, E -> T, no node is created, since E.node is the same as T.node. Similarly, no node is created for production 4, T -> ( E ). The value of T.node is the same as E.node, since parentheses are used only for grouping; they influence the structure of the parse tree and the syntax tree, but once their job is done, there is no further need to retain them in the syntax tree.
The last two T-productions have a single terminal on the right. We use the constructor Leaf to create a suitable node, which becomes the value of T.node.

Figure 3.5.2 shows the construction of a syntax tree for the input a - 4 + c.

Figure 3.5.2: Syntax tree for a - 4 + c

The nodes of the syntax tree are shown as records, with the op field first. Syntax-tree edges are now shown as solid lines. The underlying parse tree, which need not actually be constructed, is shown with dotted edges. The third type of line, shown dashed, represents the values of E.node and T.node; each line points to the appropriate syntax-tree node.
At the bottom we see leaves for a, 4 and c, constructed by Leaf. We suppose that the lexical value id.entry points into the symbol table, and the lexical value num.val is the numerical value of a constant. These leaves, or pointers to them, become the value of T.node at the three parse-tree nodes labeled T, according to rules 5 and 6. Note that by rule 3, the pointer to the leaf for a is also the value of E.node for the leftmost E in the parse tree.

Rule 2 causes us to create a node with op equal to the minus sign and pointers to the first two leaves. Then, rule 1 produces the root node of the syntax tree by combining the node for - with the third leaf.

If the rules are evaluated during a postorder traversal of the parse tree, or with reductions during a bottom-up parse, then the sequence of steps shown in Fig. 3.5.3 ends with p5 pointing to the root of the constructed syntax tree.
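The construction for a - 4 + c can be sketched with the two constructors described above. This is a minimal Python sketch, assuming that Leaf and Node are small classes and that the string "a" stands in for the symbol-table pointer id.entry; the show helper and the exact step order p1 to p5 are illustrative assumptions, since Fig. 3.5.3 is not reproduced here.

```python
class Leaf:
    """Syntax-tree leaf: op is the label (id or num), val the lexical value."""

    def __init__(self, op, val):
        self.op = op
        self.val = val


class Node:
    """Interior syntax-tree node with an operator label and k children."""

    def __init__(self, op, *children):
        self.op = op
        self.children = list(children)


def show(n):
    """Render a tree in prefix form (hypothetical helper for inspection)."""
    if isinstance(n, Leaf):
        return str(n.val)
    return "(" + n.op + " " + " ".join(show(c) for c in n.children) + ")"


# Steps in the order a postorder traversal or bottom-up parse would take.
p1 = Leaf("id", "a")      # "a" stands in for a symbol-table entry
p2 = Leaf("num", 4)
p3 = Node("-", p1, p2)    # rule for E -> E1 - T
p4 = Leaf("id", "c")
p5 = Node("+", p3, p4)    # rule for E -> E1 + T; p5 is the root
print(show(p5))
```

Because the operators are left associative, the node for - becomes the left child of the root, so the sketch prints the tree as (+ (- a 4) c).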
Example 3.5.2: The L-attributed definition in Fig. 3.5.4 performs the same translation as the S-attributed definition in Fig. 3.5.1. The attributes for the grammar symbols E, T, id, and num are as discussed in Example 3.5.1.

Figure 3.5.5: Dependency graph for a - 4 + c, with the SDD of Fig. 3.5.2

Nonterminal E' has an inherited attribute inh and a synthesized attribute syn. Attribute E'.inh represents the partial syntax tree constructed so far. Specifically, it represents the root of the tree for the prefix of the input string that is to the left of the subtree for E'. At node 5 in the dependency graph in Fig. 3.5.5, E'.inh denotes the root of the partial syntax tree for the identifier a; that is, the leaf for a. At node 6, E'.inh denotes the root for the partial syntax tree for the input a - 4. At node 9, E'.inh denotes the syntax tree for a - 4 + c.

Since there is no more input, at node 9, E'.inh points to the root of the entire syntax tree. The syn attributes pass this value back up the parse tree until it becomes the value of E.node. Specifically, the attribute value at node 10 is defined by the rule E'.syn = E'.inh associated with the production E' -> ε. The attribute value at node 11 is defined by the rule E'.syn = E1'.syn associated with production 2 in Fig. 3.5.4. Similar rules define the attribute values at nodes 12 and 13.

Example 3.6.1: The postfix SDT in Fig. 3.6.1 implements the desk calculator SDD of Fig. 3.3.1, with one change: the action for the first production prints a value. The remaining actions are exact counterparts of the semantic rules. Since the underlying grammar is LR, and the SDD is S-attributed, these actions can be correctly performed along with the reduction steps of the parser.

Figure 3.6.1: Postfix SDT implementing the desk calculator

2. Parser-Stack Implementation of Postfix SDT's
Postfix SDT's can be implemented during LR parsing by executing the actions when reductions occur. The attribute(s) of each grammar symbol can be put on the stack in a place where they can be found during the reduction. The best plan is to place the attributes along with the grammar symbols (or the LR states that represent these symbols) in records on the stack itself.
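As a rough illustration of this plan, the sketch below keeps one record per grammar symbol on a Python list and runs a semantic action at each reduction. It is only a sketch under stated assumptions: Record and reduce_by are hypothetical names, there are no LR tables, and the sequence of shifts and reductions for the input 3 * 5 + 4 is written out by hand in the order an LR parser would be expected to perform them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Record:
    """Stack record: a grammar symbol plus its synthesized attribute."""
    symbol: str
    val: Optional[int] = None


def reduce_by(stack: List[Record], head: str, body_len: int,
              action: Callable[[List[Record]], Optional[int]]) -> None:
    """Replace the top body_len records by one record for the head."""
    body = stack[-body_len:]       # attributes sit at known stack positions
    val = action(body)             # run the postfix action
    del stack[-body_len:]
    stack.append(Record(head, val))


def copy(body):
    return body[0].val


stack: List[Record] = []
stack.append(Record("digit", 3))                         # shift 3
reduce_by(stack, "F", 1, copy)                           # F -> digit
reduce_by(stack, "T", 1, copy)                           # T -> F
stack.append(Record("*"))                                # shift *
stack.append(Record("digit", 5))                         # shift 5
reduce_by(stack, "F", 1, copy)                           # F -> digit
reduce_by(stack, "T", 3, lambda b: b[0].val * b[2].val)  # T -> T1 * F
reduce_by(stack, "E", 1, copy)                           # E -> T
stack.append(Record("+"))                                # shift +
stack.append(Record("digit", 4))                         # shift 4
reduce_by(stack, "F", 1, copy)                           # F -> digit
reduce_by(stack, "T", 1, copy)                           # T -> F
reduce_by(stack, "E", 3, lambda b: b[0].val + b[2].val)  # E -> E1 + T
print(stack)
```

After the last reduction the stack holds a single record for E whose val is the value of the expression. A real parser would drive the same reductions from its parsing tables rather than from a hand-written list.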
In Fig. 3.6.2, the parser stack contains records with a field for a grammar symbol (or parser state) and, below it, a field for an attribute. The three grammar symbols X Y Z are on top of the stack; perhaps they are about to be reduced according to a production like A -> X Y Z. Here, we show X.x as the one attribute of X, and so on. In general, we can allow for more attributes, either by making the records large enough or by putting pointers to records on the stack. With small attributes, it may be simpler to make the records large enough, even if some fields go unused some of the time. However, if one or more attributes are of unbounded size (say, they are character strings), then it would be better to put a pointer to the attribute's value in the stack record and store the actual value in some larger, shared storage area that is not part of the stack.

Figure 3.6.2: Parser stack with a field for synthesized attributes

If the attributes are all synthesized, and the actions occur at the ends of the productions, then we can compute the attributes for the head when we reduce the body to the head. If we reduce by a production such as A -> X Y Z, then we have all the attributes of X, Y, and Z available, at known positions on the stack, as in Fig. 3.6.2. After the action, A and its attributes are at the top of the stack, in the position of the record for X.

Example 3.6.2: Let us rewrite the actions of the desk-calculator SDT of Example 3.6.1 so that they manipulate the parser stack explicitly. Such stack manipulation is usually done automatically by the parser.

Thus, we may refer to the attribute E.val that appears at the third position on the stack as stack[top - 2].val. The entire SDT is shown in Fig. 3.6.3.

For instance, in the second production, E -> E1 + T, we go two positions below the top to get the value of E1, and we find the value of T at the top. The resulting sum is placed where the head E will appear after the reduction, that is, two positions below the current top. The reason is that after the reduction, the three topmost stack symbols are replaced by one. After computing E.val, we pop two symbols off the top of the stack, so the record where we placed E.val will now be at the top of the stack.

In the third production, E -> T, no action is necessary, because the length of the stack does not change, and the value of T.val at the stack top will simply become the value of E.val. The same observation applies to the productions T -> F and F -> digit. Production F -> ( E ) is slightly different. Although the value does not change, two positions are removed from the stack during the reduction, so the value has to move to the position after the reduction.

3. SDT's With Actions inside Productions
An action may be placed at any position within the body of a production. It is performed immediately after all symbols to its left are processed. Thus, if we have a production B -> X {a} Y, the action a is done after we have recognized X (if X is a terminal) or all the terminals derived from X (if X is a nonterminal). More precisely,
• If the parse is bottom-up, then we perform action a as soon as this occurrence of X appears on the top of the parsing stack.
• If the parse is top-down, we perform a just before we attempt to expand this occurrence of Y (if Y is a nonterminal) or check for Y on the input (if Y is a terminal).
SDT's that can be implemented during parsing include postfix SDT's and a class of SDT's that implements L-attributed definitions.

Example 3.6.3: As an extreme example of a problematic SDT, suppose that we turn our desk-calculator running example into an SDT that prints the prefix form of an expression, rather than evaluating the expression. The productions and actions are shown in Fig. 3.6.4.
Using marker nonterminals M2 and M4 for the actions in productions 2 and 4, respectively, on input 3, a shift-reduce parser has conflicts between reducing by M2 -> ε, reducing by M4 -> ε, and shifting the digit.

Any SDT can be implemented as follows:
1. Ignoring the actions, parse the input and produce a parse tree as a result.
2. Then, examine each interior node N, say one for production A -> α. Add additional children to N for the actions in α, so the children of N from left to right have exactly the symbols and actions of α.
3. Perform a preorder traversal of the tree, and as soon as a node labeled by an action is visited, perform that action.

For instance, Fig. 3.6.5 shows the parse tree for expression 3 * 5 + 4 with actions inserted. If we visit the nodes in preorder, we get the prefix form of the expression: + * 3 5 4.

First, consider the simple case, in which the only thing we care about is the order in which the actions in an SDT are performed. For example, if each action simply prints a string, we care only about the order in which the strings are printed. In this case, the following principle can guide us:
• When transforming the grammar, treat the actions as if they were terminal symbols.
This principle is based on the idea that the grammar transformation preserves the order of the terminals in the generated string. The actions are therefore executed in the same order in any left-to-right parse, top-down or bottom-up. The "trick" for eliminating left recursion is to take two productions

    A -> Aα | β

that generate strings consisting of a β and any number of α's, and replace them by productions that generate the same strings using a new nonterminal R (for "remainder") of the first production:

    A -> βR
    R -> αR | ε

If β does not begin with A, then A no longer has a left-recursive production. In regular-definition terms, with both sets of productions, A is defined by β(α)*.

Example 3.6.4: Consider the following E-productions from an SDT for translating infix expressions into postfix notation:

    E -> E1 + T { print('+'); }
    E -> T

If we apply the standard transformation to E, the remainder of the left-recursive production is

    α = + T { print('+'); }

and the body of the other production is T. If we introduce R for the remainder of E, we get the set of productions:

    E -> T R
    R -> + T { print('+'); } R
    R -> ε

When the actions of an SDD compute attributes rather than merely printing output, we must be more careful about how we eliminate left recursion from a grammar. However, if the SDD is S-attributed, then we can always construct an SDT by placing attribute-computing actions at appropriate positions in the new productions.
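The productions E -> T R, R -> + T { print('+'); } R and R -> ε obtained by treating the action as a terminal can be turned into a recursive-descent translator almost line by line. The following is a minimal Python sketch, not a definitive implementation: the function names are hypothetical, the output is collected in a list instead of being printed, and the production T -> digit { print(digit) } is an assumption, since the T-productions are not given above.

```python
def to_postfix(tokens):
    """Translate e.g. ['9', '+', '5', '+', '2'] into postfix (sketch)."""
    out = []
    pos = 0

    def parse_T():
        # Assumed production: T -> digit { print(digit) }
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("digit expected")
        out.append(tokens[pos])
        pos += 1

    def parse_R():
        # R -> + T { print('+') } R | ε
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            parse_T()
            out.append("+")   # the embedded action, performed after T
            parse_R()

    # E -> T R
    parse_T()
    parse_R()
    if pos != len(tokens):
        raise SyntaxError("unexpected token")
    return " ".join(out)


print(to_postfix(["9", "+", "5", "+", "2"]))
```

The action sits between T and the recursive call for R, exactly where it appears in the production body, so each '+' is emitted right after its right operand.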
INTERMEDIATE CODE GENERATION

4.1. INTRODUCTION
What is Intermediate Code?
Intermediate code is a modified form of the input source program that is stored in some data structure. The front end translates a source program into an intermediate representation from which the back end generates target code.

Why Intermediate Code Generation is required?
Benefits of using a machine-independent intermediate form are:
1. Retargeting is facilitated. That is, a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimization can be applied to the intermediate representation.

Fig. 4.1: Position of Intermediate Code Generator

Intermediate code can be represented by using the following:
i) Three Address Code
ii) Syntax Tree

4.1.1 Three-Address Code:
Three-address code is a sequence of statements of the general form

    x = y op z

where x, y and z are names, constants, or compiler-generated temporaries; op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on Boolean-valued data. Thus a source language expression like x + y * z might be translated into a sequence

    t1 = y * z
    t2 = x + t1

where t1 and t2 are compiler-generated temporary names.
The reason for the term "three-address code" is that each statement usually contains three addresses, two for the operands and one for the result.
An address can be one of the following:
• A name. For convenience, we allow source-program names to appear as addresses in three-address code. In an implementation, a source name is replaced by a pointer to its symbol-table entry, where all information about the name is kept.
• A constant. In practice, a compiler must deal with many different types of constants and variables.
• A compiler-generated temporary. It is useful, especially in optimizing compilers, to create a distinct name each time a temporary is needed. These temporaries can be combined, if possible, when registers are allocated to variables.

Types of Three-Address Statements:
The common three-address statements are:
1. Assignment statements of the form x = y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions of the form x = op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x = y where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual sequence.
6. param x and call p, n for procedure calls and return y, where y, representing a returned value, is optional. For example,

    param x1
    param x2
    ...
    param xn
    call p, n

   is generated as part of a call of the procedure p(x1, x2, ..., xn).
7. Indexed assignments of the form x = y[i] and x[i] = y.
8. Address and pointer assignments of the form x = &y, x = *y, and *x = y.

Implementation of Three-Address Statements:
A three-address statement is an abstract form of intermediate code. In a compiler, these statements can be implemented as records with fields for the operator and the operands. Three such representations are:
• Quadruples
• Triples
• Indirect triples

Quadruples:
Three-address code does not specify the internal representation of three-address instructions. Quadruples are one way to fix that representation.
A quadruple is a record structure with four fields: op, arg1, arg2 and result.
The op field contains an internal code for the operator. The three-address statement x = y op z is represented by placing y in arg1, z in arg2 and x in result.
The contents of fields arg1, arg2 and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.
Example: Represent the expression a := b * -c + b * -c using quadruples, triples and indirect triples.

Triples:
To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it.
If we do so, three-address statements can be represented by records with only three fields: op, arg1 and arg2.
The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure (for temporary values).
Since three fields are used, this intermediate code format is known as triples.
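The relationship between quadruples and triples for the expression a := b * -c + b * -c can be sketched in code. This is a hedged Python illustration rather than a prescribed layout: the names Quad, Ref and quads_to_triples, the operator code "minus" for unary minus, and the temporaries t1 to t5 are all assumptions made for the example. The sketch also assumes every result of a non-copy instruction is a compiler-generated temporary.

```python
from typing import List, NamedTuple, Optional, Tuple


class Quad(NamedTuple):
    """Quadruple with the four fields op, arg1, arg2 and result."""
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


class Ref(NamedTuple):
    """Reference to the triple at a given position (replaces a temporary)."""
    index: int


def quads_to_triples(quads: List[Quad]) -> List[Tuple]:
    position = {}   # temporary name -> index of the triple that computes it
    triples = []

    def operand(x):
        if x is None:
            return None
        return Ref(position[x]) if x in position else x

    for q in quads:
        if q.op == "=":
            # A copy keeps the target name in arg1 and the value in arg2.
            triples.append(("=", q.result, operand(q.arg1)))
        else:
            triples.append((q.op, operand(q.arg1), operand(q.arg2)))
            position[q.result] = len(triples) - 1
    return triples


quads = [
    Quad("minus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]
triples = quads_to_triples(quads)
for i, t in enumerate(triples):
    print(i, t)
```

The quadruple list names each intermediate result explicitly, while the triple list refers to a result by the position of the instruction that produced it, so no temporary names are needed.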
Note: The benefit of quadruples over triples can be seen in an optimizing compiler, where instructions are often moved around. With quadruples, if we move an instruction that computes a temporary t, then the instructions that use t require no change. With triples, the result of an operation is referred to by its position, so moving an instruction may require us to change all references to that result. This problem does not occur with indirect triples.

Indirect Triples:
Another implementation of three-address code is that of listing pointers to triples, rather than listing the triples themselves. This implementation is called indirect triples.
For example, let us use an array statement to list pointers to triples in the desired order.

3. Pointer3 = Makenode('*', Pointer1, Pointer2);
4. Pointer4 = Makeleaf(identifier, entry c);
5. Pointer5 = Makeleaf(identifier, entry d);
6. Pointer6 = Makenode('+', Pointer4, Pointer5);
7. Pointer7 = Makenode('-', Pointer3, Pointer6);

Fig. 4.2: Abstract Syntax Tree

4.1.3. Directed Acyclic Graph:
An important derivative of the abstract syntax tree is the Directed Acyclic Graph (DAG). It is used to reduce the amount of memory used for storing the abstract syntax tree data structure.
Consider an expression:
k = k - 7;
The AST and DAG are shown in the figure below.
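How a DAG saves memory for k = k - 7 can be sketched by reusing a node whenever an identical one has already been created. This is a minimal Python sketch under stated assumptions: DagBuilder and its node method are hypothetical names, nodes are stored as records in an array, and a dictionary is used to find existing records.

```python
class DagBuilder:
    """Builds expression nodes, sharing identical ones (illustrative sketch)."""

    def __init__(self):
        self.nodes = []    # array of records (op, left, right)
        self.index = {}    # record -> its position in the array

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:      # create the record only once
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]


dag = DagBuilder()
k_right = dag.node("id", "k")          # the k on the right-hand side
seven = dag.node("num", 7)
minus = dag.node("-", k_right, seven)
k_left = dag.node("id", "k")           # the k being assigned: reused
assign = dag.node("=", k_left, minus)
print(len(dag.nodes), k_left == k_right)
```

A plain syntax tree for the same statement would hold five nodes, with two separate leaves for k. In the DAG both occurrences of k map to one record, so only four records are stored.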
In this array, we refer to nodes by giving the integer index of the record for that node within the array. This integer historically has been called the value number for the node or for the expression represented by the node. For instance, in Fig. 4.6, the node labeled + has value number 3, and its left and right children have value numbers 1 and 2, respectively. In practice, we could use pointers to records or references to objects instead of integer indexes, but we shall still refer to the reference to a node as its "value number." If stored in an appropriate data structure, value numbers help us construct expression DAG's efficiently.

4.2 Types and Declarations
The applications of types can be grouped under checking and translation:
• Type checking uses logical rules to reason about the behavior of a program at run time. Specifically, it ensures that the types of the operands match the type expected by an operator.
• Translation applications. From the type of a name, a compiler can determine the storage that will be needed for that name at run time.

4.2.1 Type Expressions
Types have structure, which we shall represent using type expressions: a type expression is either a basic type or is formed by applying an operator called a type constructor to a type expression. The sets of basic types and constructors depend on the language to be checked.
Example: The array type int [2][3] can be read as "array of 2 arrays of 3 integers each" and written as a type expression array(2, array(3, integer)). This type is represented by the tree in Fig. 4.7. The operator array takes two parameters, a number and a type.

Figure 4.7: Type expression for int [2][3]

Example: int a, b
Here a and b are structurally equivalent.

4.2.3 Declarations
We study types and declarations using a simplified grammar that declares just one name at a time; declarations with lists of names can also be handled. The grammar is

    D -> T id ; D | ε
    T -> B C | record '{' D '}'
    B -> int | float
    C -> ε | [ num ] C

• Nonterminal D generates a sequence of declarations.
• Nonterminal T generates basic, array, or record types.
• Nonterminal B generates one of the basic types int and float.
• Nonterminal C, for "component," generates strings of zero or more integers, each integer surrounded by brackets.
An array type consists of a basic type specified by B, followed by array components specified by nonterminal C.
A record type (the second production for T) is a sequence of declarations for the fields of the record, all surrounded by curly braces.

4.2.4 Storage Layout for Local Names
From the type of a name, we can determine the amount of storage that will be needed for the name at run time. At compile time, we can use these amounts to assign each name a relative address. The type and relative address are saved in the symbol-table entry for the name. Data of varying length, such as strings, or data whose size cannot be determined until run time, such as dynamic arrays, is handled by reserving a known fixed amount of storage for a pointer to the data.
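The link between a type expression and the storage a name needs can be sketched as follows. This is an illustrative Python sketch only: the sizes of 4 bytes for integer and 8 bytes for float are assumptions, the helper names array, width and layout are hypothetical, and alignment is ignored. Type expressions are written as nested tuples, so array(2, array(3, integer)) mirrors the example type int [2][3].

```python
# Assumed sizes in bytes for the basic types (not specified in the notes).
BASIC_WIDTH = {"integer": 4, "float": 8}


def array(n, elem):
    """Type constructor: array of n elements of type elem."""
    return ("array", n, elem)


def width(t):
    """Number of bytes needed for an object of type expression t."""
    if isinstance(t, str):
        return BASIC_WIDTH[t]
    _, n, elem = t
    return n * width(elem)


def layout(decls):
    """Assign each declared name a relative address, in declaration order."""
    table = {}
    offset = 0
    for name, t in decls:
        table[name] = (t, offset)   # type and relative address
        offset += width(t)
    return table, offset


int_2_3 = array(2, array(3, "integer"))
table, total = layout([("x", "float"), ("a", int_2_3), ("i", "integer")])
for name, (t, addr) in table.items():
    print(name, addr, width(t))
```

Under these assumed sizes the array occupies 2 * 3 * 4 = 24 bytes, so the three names are placed at relative addresses 0, 8 and 32, and the block of declarations needs 36 bytes in total.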
The width of a type is the number of storage units needed for objects of that type. A basic type,
such as a character, integer, or float, requires an integral number of bytes. For easy access,
storage for aggregates such as arrays and classes is allocated in one contiguous block of bytes.
4.3 Type Checking
Type checking has the potential for catching errors in programs. In principle, any check can be
done dynamically, if the target code carries the type of an element along with the value of the
element. A sound type system eliminates the need for dynamic checking for type errors,
because it allows us to determine statically that these errors cannot occur when the target
program runs. An implementation of a language is strongly typed if a compiler guarantees that
the programs it accepts will run without type errors.
4.3.1 Rules for Type Checking
Type checking can take on two forms: synthesis and inference.
1. Type synthesis builds up the type of an expression from the types of its subexpressions.
It requires names to be declared before they are used. The type of E1 + E2 is defined in
terms of the types of E1 and E2.
2. Type inference determines the type of a language construct from the way it is used. Let
null be a function that tests whether a list is empty. Then, from the usage null(x), we can
tell that x must be a list. The type of the elements of x is not known; all we know is that x
must be a list of elements of some type that is presently unknown.
4.3.2 Type Conversions
Consider expressions like x + i, where x is of type float and i is of type integer. Since the
expression has two different types of operands, the compiler may need to convert one of the
operands of + to ensure that both operands are of the same type when the addition occurs.
Suppose that integers are converted to floats when necessary, using a unary operator (float).
For example, the integer 2 is converted to a float in the code for the expression 2 * 3.14:
t1 = (float) 2
t2 = t1 * 3.14
Type synthesis will be illustrated by extending the scheme for translating expressions. We
introduce another attribute E.type, whose value is either integer or float. The rule associated
with E → E1 + E2 builds on the pseudocode
if ( E1.type = integer and E2.type = integer ) E.type = integer;
else if ( E1.type = float and E2.type = integer ) . . .
Type conversion rules vary from language to language. The rules for Java in Fig. 4.16 distinguish
between widening conversions, which are intended to preserve information, and narrowing
conversions, which can lose information.
Conversion from one type to another is said to be implicit if it is done automatically by the
compiler. Implicit type conversions, also called coercions, are limited in many languages to
widening conversions. Conversion is said to be explicit if the programmer must write
something to cause the conversion. Explicit conversions are also called casts.
The semantic action for checking E → E1 + E2 uses two functions:
1. max(t1, t2) takes two types t1 and t2 and returns the maximum (or least upper bound) of the
two types in the widening hierarchy. It declares an error if either t1 or t2 is not in the hierarchy;
e.g., if either type is an array or a pointer type.
2. widen(a, t, w) generates type conversions if needed to widen an address a of type t into a
value of type w. It returns a itself if t and w are the same type. Otherwise, it generates an
instruction to do the conversion and place the result in a temporary t, which is returned as the
result.
4.4 Control Flow
4.4.1 Boolean Expressions
Boolean expressions are composed of the boolean operators (which we denote &&, ||, and !,
using the C convention for the operators AND, OR, and NOT, respectively) applied to elements
that are boolean variables or relational expressions.
Relational expressions are of the form E1 rel E2, where E1 and E2 are arithmetic expressions. We
consider boolean expressions generated by the following grammar:
B → B || B | B && B | ! B | ( B ) | E rel E | true | false
We use the attribute rel.op to indicate which of the six comparison operators <, <=, =, !=, >, or
>= is represented by rel. As is customary, we assume that || and && are left-associative, and
that || has lowest precedence, then &&, then !.
Given the expression B1 || B2, if we determine that B1 is true, then we can conclude that the
entire expression is true without having to evaluate B2. Similarly, given B1 && B2, if B1 is false,
then the entire expression is false.
4.4.2 Short-Circuit Code
In short-circuit (or jumping) code, the boolean operators &&, ||, and ! translate into jumps. The
operators themselves do not appear in the code; instead, the value of a boolean expression is
represented by a position in the code sequence.
Example: The statement
if ( x < 100 || x > 200 && x != y ) x = 0;
might be translated into jumping code that uses two labels, L1 and L2.
In this translation, the boolean expression is true if control reaches label L2. If the expression is
false, control goes immediately to L1, skipping L2 and the assignment x = 0.
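One way to see how jumps replace the boolean operators is the small Python sketch below. It is
not the translation scheme of these notes, only a minimal illustration under stated assumptions:
the class names (Rel, Or, And, Not, JumpingCode) and the instruction syntax are hypothetical, and
no attempt is made to remove redundant gotos. Each boolean subexpression is translated with two
target labels, one to jump to when it is true and one when it is false.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Rel:          # E1 rel E2, with operands kept as plain strings
    left: str
    op: str
    right: str


@dataclass
class Or:           # B1 || B2
    left: object
    right: object


@dataclass
class And:          # B1 && B2
    left: object
    right: object


@dataclass
class Not:          # ! B
    operand: object


class JumpingCode:
    """Collects three-address-style lines for short-circuit evaluation."""

    def __init__(self):
        self.lines = []
        self._next_id = count(1)

    def new_label(self):
        return f"L{next(self._next_id)}"

    def emit(self, instr):
        self.lines.append("    " + instr)

    def place(self, label):
        self.lines.append(label + ":")

    def boolean(self, b, true, false):
        """Emit code that jumps to `true` or `false` depending on b."""
        if isinstance(b, Rel):
            self.emit(f"if {b.left} {b.op} {b.right} goto {true}")
            self.emit(f"goto {false}")
        elif isinstance(b, Or):
            mid = self.new_label()             # where B2 starts if B1 is false
            self.boolean(b.left, true, mid)
            self.place(mid)
            self.boolean(b.right, true, false)
        elif isinstance(b, And):
            mid = self.new_label()             # where B2 starts if B1 is true
            self.boolean(b.left, mid, false)
            self.place(mid)
            self.boolean(b.right, true, false)
        elif isinstance(b, Not):
            self.boolean(b.operand, false, true)   # swap the two targets
        else:
            raise TypeError(f"not a boolean expression: {b!r}")

    def if_statement(self, cond, body):
        """Translate `if (cond) body`; body is a ready-made instruction."""
        after = self.new_label()   # L1: reached when cond is false
        true = self.new_label()    # L2: reached when cond is true
        self.boolean(cond, true, after)
        self.place(true)
        self.emit(body)
        self.place(after)


gen = JumpingCode()
cond = Or(Rel("x", "<", "100"), And(Rel("x", ">", "200"), Rel("x", "!=", "y")))
gen.if_statement(cond, "x = 0")
print("\n".join(gen.lines))
```

In the output of this sketch, L2 labels the assignment x = 0 and L1 labels the point after it, so
the roles of the two labels agree with the description above. The extra labels L3 and L4 mark the
start of the second operand of || and of &&, respectively.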