

CHAPTER FOUR
4. SYNTAX-DIRECTED TRANSLATION
4.1. Introduction
This chapter develops the translation of languages guided by context-free grammars. We
associate information with a programming language construct by attaching attributes to the
grammar symbols representing the construct. Values for attributes are computed by "semantic
rules" associated with the grammar productions.
There are two notations for associating semantic rules with productions: syntax-directed
definitions and translation schemes. Syntax-directed definitions are high-level specifications for
translations. They hide many implementation details and free the user from having to specify
explicitly the order in which translation takes place. A syntax-directed definition specifies the
values of attributes by associating semantic rules with the grammar productions. For example, an
infix-to-postfix translator might have a production and rule
Production              Semantic Rule
E → E1 + T              E.code = E1.code || T.code || '+'
This production has two non-terminals, E and T; the subscript in E1 distinguishes the occurrence
of E in the production body from the occurrence of E as the head. Both E and T have a string-
valued attribute code. The semantic rule specifies that the string E.code is formed by
concatenating E1.code, T.code, and the character ‘+’. While the rule makes it explicit that the
translation of E is built up from the translations of E1, T, and ‘+’, it may be inefficient to
implement the translation directly by manipulating strings.
Translation schemes indicate the order in which semantic rules are to be evaluated, so they allow
some implementation details to be shown. A syntax-directed translation scheme embeds program
fragments called semantic actions within production bodies, as in
E → E1 + T   { print '+' }
By convention, semantic actions are enclosed within curly braces. (If curly braces occur as
grammar symbols, we enclose them within single quotes, as in ‘{‘ and ‘}’.) The position of a
semantic action in a production body determines the order in which the action is executed. In
general, semantic actions may occur at any position in a production body.
Between the two notations, syntax-directed definitions can be more readable, and hence more
useful for specifications. However, translation schemes can be more efficient, and hence more
useful for implementations.
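To make the idea of embedded semantic actions concrete, the following sketch (an illustration, not part of the chapter's formal development) implements the scheme E → E1 + T { print '+' } as a small recursive-descent translator in Python. Because top-down parsing cannot use the left-recursive production directly, the sketch assumes the equivalent iterative form E → T ( + T { print '+' } )* and single-character tokens.

# A minimal sketch: semantic actions embedded at their positions in the body.
# The action that emits '+' runs only after both of its operands have been
# parsed and emitted, exactly as its position at the end of the body dictates.
def to_postfix(tokens):
    pos = 0                                   # index of the current lookahead token
    out = []                                  # the emitted postfix string, token by token

    def match(expected):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == expected:
            pos += 1
        else:
            raise SyntaxError(f"expected {expected!r} at position {pos}")

    def E():                                  # E -> T ( '+' T { print '+' } )*
        T()
        while pos < len(tokens) and tokens[pos] == '+':
            match('+')
            T()
            out.append('+')                   # semantic action: emit '+' after its operands

    def T():                                  # T -> digit { print digit }
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            out.append(tokens[pos])           # semantic action: emit the digit
            pos += 1
        else:
            raise SyntaxError(f"expected a digit at position {pos}")

    E()
    return ' '.join(out)

print(to_postfix(list("9+5+2")))              # prints: 9 5 + 2 +

Note that out.append('+') is placed after the code that parses T, mirroring the position of the action at the end of the production body.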
The most general approach to syntax-directed translation is to construct a parse tree or a syntax
tree, and then to compute the values of attributes at the nodes of the tree by visiting the nodes of
the tree. In many cases, translation can be done during parsing, without building an explicit tree.
We shall therefore study a class of syntax-directed translations called "L-attributed translations"
(L for left-to-right), which encompass virtually all translations that can be performed during
parsing. We also study a smaller class, called "S-attributed translations" (S for synthesized),
which can be performed easily in connection with a bottom-up parse.


4.2. Syntax-Directed Definitions


A syntax-directed definition (SDD) is a generalization of a CFG in which each grammar symbol has
an associated set of attributes (synthesized and inherited) and rules. Attributes are associated
with grammar symbols and rules with productions. If X is a symbol and a is one of its attributes,
then we write X.a to denote the value of a at a particular parse-tree node labeled X. An attribute
can represent anything we choose (a string, a number, a type, a memory location, etc.).
4.2.1. Inherited and Synthesized Attributes
We shall deal with two kinds of attributes for non-terminals:
1. A synthesized attribute for a nonterminal A at a parse-tree node N is defined by a semantic
rule associated with the production at N. Note that the production must have A as its head.
A synthesized attribute at node N is defined only in terms of attribute values at the children
of N and at N itself; that is, the attributes of the parent are computed from the attributes of its
children. To illustrate, consider the production S → A B C. If an attribute of S takes its values
from its child nodes A, B, and C, it is a synthesized attribute: the values at A, B, and C are
synthesized up into S. Likewise, in the earlier example E → E1 + T, the head E gets its value
from its children. Synthesized attributes never take values from their parent node or from any
sibling node.
2. An inherited attribute for a nonterminal B at a parse-tree node N is defined by a semantic
rule associated with the production at the parent of N. Note that the production must have B
as a symbol in its body. An inherited attribute at node N is defined only in terms of
attribute values at N's parent, N itself, and N's siblings. The attributes of the children
depend on the attributes of the parent.
In contrast to synthesized attributes, inherited attributes can take values from the parent and/or
siblings. In the production S → A B C, for instance, A can get values from S, B, and C; B can
take values from S, A, and C; and likewise C can take values from S, A, and B.
The value of a synthesized attribute at a node is computed from the values of attributes at the
children of that node in the parse tree; the value of an inherited attribute is computed from the
values of attributes at the siblings and parent of that node in the parse tree.
While we do not allow an inherited attribute at node N to be defined in terms of attribute values
at the children of node N, we do allow a synthesized attribute at node N to be defined in terms of
inherited attribute values at node N itself.

(a) Synthesized at node n (b) Inherited at node n

Figure 5.1: Synthesized and inherited attributes


Terminals can have synthesized attributes, but not inherited attributes. Attributes for terminals
have lexical values that are supplied by the lexical analyzer; there are no semantic rules in the
SDD itself for computing the value of an attribute for a terminal.
Example 5.1: The SDD in Fig. 5.2 is based on our familiar grammar for arithmetic expressions
with operators + and *. It evaluates expressions terminated by an end marker n. In the SDD, each
of the non-terminals has a single synthesized attribute, called val. We also suppose that the
terminal digit has a synthesized attribute lexval, which is an integer value returned by the lexical
analyzer.
   Production           Semantic Rules
1) L → E n              L.val = E.val
2) E → E1 + T           E.val = E1.val + T.val
3) E → T                E.val = T.val
4) T → T1 * F           T.val = T1.val * F.val
5) T → F                T.val = F.val
6) F → ( E )            F.val = E.val
7) F → digit            F.val = digit.lexval
Figure 5.2: Syntax-directed definition of a simple desk calculator
The rule for production 1, L  E n, sets L.val to E.val, which we shall see is the numerical value
of the entire expression.
Production 2, E  E1 + T, also has one rule, which computes the val attribute for the head E as
the sum of the values at E1 and T. At any parse tree node N labeled E, the value of val for E is the
sum of the values of val at the children of node N labeled E and T.
Production 3, E  T, has a single rule that defines the value of val for E to be the same as the
value of val at the child for T. Production 4 is similar to the second production; its rule multiplies
the values at the children instead of adding them. The rules for productions 5 and 6 copy values
at a child, like that for the third production. Production 7 gives F.val the value of a digit, that is,
the numerical value of the token digit that the lexical analyzer returned.
 An SDD that involves only synthesized attributes is called S-attributed; the SDD in Fig. 5.2
has this property. In an S-attributed SDD, each rule computes an attribute for the nonterminal
at the head of a production from attributes taken from the body of the production.

4.2.2. Evaluating an SDD at the Nodes of a Parse Tree

To visualize the translation specified by an SDD, it helps to work with parse trees, even though a
translator need not actually build a parse tree. Imagine therefore that the rules of an SDD are
applied by first constructing a parse tree and then using the rules to evaluate all of the attributes
at each of the nodes of the parse tree.
 A parse tree, showing the value(s) of its attribute(s) is called an annotated parse tree or
decorated parse tree.
How do we construct an annotated parse tree? In what order do we evaluate attributes? Before
we can evaluate an attribute at a node of a parse tree, we must evaluate all the attributes upon


which its value depends. For example, if all attributes are synthesized, as in Example 5.1, then
we must evaluate the val attributes at all of the children of a node before we can evaluate the val
attribute at the node itself.
With synthesized attributes, we can evaluate attributes in any bottom-up order, such as that of a
postorder traversal of the parse tree.
Example 5.2: Figure 5.3 shows an annotated parse tree for the input string 3 * 5 + 4 n, constructed
using the grammar and rules of Fig. 5.2.

Figure 5.3: Annotated parse tree for 3 * 5 + 4 n


The values of lexval are presumed supplied by the lexical analyzer. Each of the nodes for the
nonterminals has attribute val computed in a bottom-up order, and we see the resulting values
associated with each node. For instance, at the node with a child labeled *, after computing T.val
= 3 and F.val = 5 at its first and third children, we apply the rule that says T.val is the product of
these two values, or 15.
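The same bottom-up evaluation can be sketched in Python. The Node class and the hand-built parse tree below are assumptions made for the illustration; the semantic rules are those of Fig. 5.2, applied in postorder so that children are evaluated before their parent.

# A minimal sketch: evaluating the synthesized attribute val of the SDD in
# Fig. 5.2 on a parse tree for 3 * 5 + 4 n, visiting children before parents.
class Node:
    def __init__(self, label, children=(), lexval=None):
        self.label = label              # grammar symbol at this parse-tree node
        self.children = list(children)
        self.lexval = lexval            # supplied by the lexer for digit leaves
        self.val = None                 # synthesized attribute, filled in below

def evaluate(node):
    for child in node.children:         # bottom-up: children first (postorder)
        evaluate(child)
    kids = node.children
    if node.label == 'digit':
        node.val = node.lexval
    elif node.label == 'F':             # F -> digit | ( E )
        node.val = kids[0].val if len(kids) == 1 else kids[1].val
    elif node.label == 'T':             # T -> F | T * F
        node.val = kids[0].val if len(kids) == 1 else kids[0].val * kids[2].val
    elif node.label == 'E':             # E -> T | E + T
        node.val = kids[0].val if len(kids) == 1 else kids[0].val + kids[2].val
    elif node.label == 'L':             # L -> E n
        node.val = kids[0].val

def F_of(d):                            # F -> digit, with digit.lexval = d
    return Node('F', [Node('digit', lexval=d)])

t_3  = Node('T', [F_of(3)])                                   # T -> F
t_35 = Node('T', [t_3, Node('*'), F_of(5)])                   # T -> T * F
e    = Node('E', [Node('E', [t_35]), Node('+'), Node('T', [F_of(4)])])
root = Node('L', [e, Node('n')])                              # L -> E n
evaluate(root)
print(root.val)                         # 19, as in Fig. 5.3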
Example 5.3: The SDD in Fig. 5.4 computes terms like 3 * 5 and 3 * 5 * 7. The top-down parse
of input 3 * 5 begins with the production T → F T'. Here, F generates the digit 3, but the
operator * is generated by T’. Thus, the left operand 3 appears in a different subtree of the parse
tree from *. An inherited attribute will therefore be used to pass the operand to the operator. The
grammar in this example is an excerpt from a non-left-recursive version of the familiar
expression grammar.

   Production           Semantic Rules
1) T → F T'             T'.inh = F.val
                        T.val = T'.syn
2) T' → * F T1'         T1'.inh = T'.inh * F.val
                        T'.syn = T1'.syn
3) T' → ε               T'.syn = T'.inh
4) F → digit            F.val = digit.lexval

Figure 5.4: An SDD based on a grammar suitable for top-down parsing


Each of the nonterminals T and F has a synthesized attribute val; the terminal digit has a
synthesized attribute lexval. The nonterminal T’ has two attributes: an inherited attribute inh and
a synthesized attribute syn.
The semantic rules are based on the idea that the left operand of the operator * is inherited. More
precisely, the head T' of the production T' → * F T1' inherits the left operand of * in the
production body. Given a term x * y * z, the root of the subtree for * y * z inherits x. Then, the
root of the subtree for * z inherits the value of x * y, and so on, if there are more factors in the
term. Once all the factors have been accumulated, the result is passed back up the tree using
synthesized attributes.
To see how the semantic rules are used, consider the annotated parse tree for 3 * 5 in Fig. 5.5.
The leftmost leaf in the parse tree, labeled digit, has attribute value lexval = 3, where the 3 is
supplied by the lexical analyzer. Its parent is for production 4, F → digit. The only semantic rule
associated with this production defines F.val = digit.lexval, which equals 3.

Figure 5.5: Annotated parse tree for 3 * 5

At the second child of the root, the inherited attribute T’.inh is defined by the semantic rule
T’.inh=F.val associated with production 1. Thus, the left operand, 3, for the * operator is passed
from left to right across the children of the root.
The production at the node for T' is T' → * F T1'. (We retain the subscript 1 in the annotated
parse tree to distinguish between the two nodes for T'.) The inherited attribute T1'.inh is defined
by the semantic rule T1'.inh = T'.inh * F.val associated with production 2.
With T'.inh = 3 and F.val = 5, we get T1'.inh = 15. At the lower node for T1', the production is
T' → ε. The semantic rule T'.syn = T'.inh defines T1'.syn = 15. The syn attributes at the nodes
for T' pass the value 15 up the tree to the node for T, where T.val = 15.
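A hedged sketch of this evaluation in Python: each nonterminal of Fig. 5.4 becomes a function, the inherited attribute inh is passed down as a parameter, and the synthesized attribute syn is the return value. The token handling is simplified to a list of factor values (the * tokens are implicit), which is an assumption of the example rather than part of the SDD.

# A minimal sketch of the SDD in Fig. 5.4 as recursive functions.
def T(factors):
    """T -> F T' :  T'.inh = F.val ;  T.val = T'.syn"""
    f_val, rest = F(factors)
    return T_prime(f_val, rest)

def T_prime(inh, factors):
    """T' -> * F T1'  { T1'.inh = T'.inh * F.val ; T'.syn = T1'.syn }
       T' -> epsilon  { T'.syn = T'.inh }"""
    if not factors:                        # T' -> epsilon
        return inh
    f_val, rest = F(factors)
    return T_prime(inh * f_val, rest)      # the left operand flows down as inh

def F(factors):
    """F -> digit : F.val = digit.lexval (here, the next number in the list)."""
    return factors[0], factors[1:]

print(T([3, 5]))        # 15, as in the annotated parse tree of Fig. 5.5
print(T([3, 5, 7]))     # 105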

4.2.3. Dependency Graphs

If an attribute b at a node in a parse tree depends on an attribute c, then the semantic rule for b at
that node must be evaluated after the semantic rule that defines c. The interdependencies among
the inherited and synthesized attributes at the nodes in a parse tree can be depicted by a directed
graph called a dependency graph.


Dependency graphs are a useful tool for determining an evaluation order for the attribute
instances in a given parse tree. While an annotated parse tree shows the values of attributes, a
dependency graph helps us determine how those values can be computed.

A dependency graph depicts the flow of information among the attribute instances in a particular
parse tree; an edge from one attribute instance to another means that the value of the first is
needed to compute the second. Edges express constraints implied by the semantic rules. In more
detail:

 For each parse-tree node, say a node labeled by grammar symbol X, the dependency
graph has a node for each attribute associated with X.

 Suppose that a semantic rule associated with a production p defines the value of
synthesized attribute A.b in terms of the value of X.c (the rule may define A.b in terms of
other attributes in addition to X.c). Then, the dependency graph has an edge from X.c to
A.b. More precisely, at every node N labeled A where production p is applied, create an
edge to attribute b at N, from the attribute c at the child of N corresponding to this
instance of the symbol X in the body of the production.

 Suppose that a semantic rule associated with a production p defines the value of inherited
attribute B.c in terms of the value of X.a. Then, the dependency graph has an edge from
X.a to B.c. For each node N labeled B that corresponds to an occurrence of this B in the
body of production p, create an edge to attribute c at N from the attribute a at the node M
that corresponds to this occurrence of X. Note that M could be either the parent or a
sibling of N.

For example, consider the following production and semantic rule:

Production              Semantic Rule
E → E1 + T              E.val = E1.val + T.val

At every node N labeled E, with children corresponding to the body of this production, the
synthesized attribute val at N is computed using the values of val at the two children, labeled E
and T. Thus, a portion of the dependency graph for every parse tree in which this production is
used looks like the following figure. As a convention, we shall show the parse tree edges as
dotted lines, while the edges of the dependency graph are solid.

Figure 5.6: E.val is synthesized from E1.val and T.val

An example of a complete dependency graph appears in the following figure (Fig: 5.7). The
nodes of the dependency graph, represented by the numbers 1 through 9, correspond to the
attributes in the annotated parse tree in Fig. 5.5.


Figure 5.7: Dependency graph for the annotated parse tree of Fig. 5.5

Nodes 1 and 2 represent the attribute lexval associated with the two leaves labeled digit. Nodes 3
and 4 represent the attribute val associated with the two nodes labeled F. The edges to node 3
from 1 and to node 4 from 2 result from the semantic rule that defines F.val in terms of
digit.lexval. In fact, F.val equals digit.lexval, but the edge represents dependence, not equality.

Nodes 5 and 6 represent the inherited attribute T’.inh associated with each of the occurrences of
nonterminal T'. The edge to 5 from 3 is due to the rule T'.inh = F.val, which defines T'.inh at the
right child of the root from F.val at the left child. We see edges to 6 from node 5 for T'.inh and
from node 4 for F.val, because these values are multiplied to evaluate the attribute inh at node 6.

Nodes 7 and 8 represent the synthesized attribute syn associated with the occurrences of T’. The
edge to node 7 from 6 is due to the semantic rule T’.syn = T’.inh associated with production 3 in
Fig. 5.4. The edge to node 8 from 7 is due to a semantic rule associated with production 2.

Finally, node 9 represents the attribute T.val. The edge to 9 from 8 is due to the semantic rule,
T.val = T’.syn, associated with production 1.
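The construction can be sketched in code. The edge list below transcribes the dependencies just described for Fig. 5.7, and a topological sort (Kahn's algorithm) produces one valid order in which the nine attribute instances can be evaluated; the node labels are only for readability.

# A minimal sketch: the dependency graph of Fig. 5.7 as an explicit edge list,
# plus a topological sort that yields one valid evaluation order.
from collections import defaultdict, deque

nodes = {
    1: "digit.lexval (3)", 2: "digit.lexval (5)",
    3: "F.val (left)",     4: "F.val (right)",
    5: "T'.inh (upper)",   6: "T1'.inh (lower)",
    7: "T1'.syn",          8: "T'.syn",          9: "T.val",
}
edges = [(1, 3), (2, 4),          # F.val = digit.lexval
         (3, 5),                  # T'.inh = F.val
         (5, 6), (4, 6),          # T1'.inh = T'.inh * F.val
         (6, 7),                  # T1'.syn = T1'.inh   (T' -> epsilon)
         (7, 8),                  # T'.syn = T1'.syn
         (8, 9)]                  # T.val = T'.syn

def topological_order(nodes, edges):
    succ, indeg = defaultdict(list), {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("the dependency graph has a cycle; no evaluation order exists")
    return order

print([nodes[n] for n in topological_order(nodes, edges)])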

A dependency graph for the input 5+3*4 constructed using the grammar and rules of Fig. 5.2 is
shown below.

Figure 5.8: Dependency graph for 5+3*4


4.3. Bottom-Up Evaluation of S-Attributed Definitions


Synthesized attributes can be evaluated by a bottom-up parser as the input is being parsed. The
parser can keep the values of the synthesized attributes associated with the grammar symbols on
its stack. Whenever a reduction is made, the values of the new synthesized attributes are
computed from the attributes appearing on the stack for the grammar symbols on the right side of
the reducing production.
An SDD is S-attributed if every attribute is synthesized. When an SDD is S-attributed, we can
evaluate its attributes in any bottom-up order of the nodes of the parse tree. It is often especially
simple to evaluate the attributes by performing a postorder traversal of the parse tree and
evaluating the attributes at a node N when the traversal leaves N for the last time. That is, we
apply the function postorder, defined below, to the root of the parse tree.
postorder(N)
{
    for (each child C of N, from the left)
        postorder(C);
    evaluate the attributes associated with node N;
}
S-attributed definitions can be implemented during bottom-up parsing, since a bottom-up parse
corresponds to a postorder traversal. Specifically, postorder corresponds exactly to the order in
which an LR parser reduces a production body to its head.
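As an illustration of this correspondence, the sketch below keeps synthesized attributes on a value stack and applies, by hand, the sequence of reductions an LR parser would perform for the input 3 * 5 + 4 under the grammar of Fig. 5.2 (the final reduction by L → E n is omitted). The parser itself is not shown; only the attribute bookkeeping is.

# A minimal sketch: each reduction pops the attribute values of the production
# body and pushes the value computed for the head, exactly in postorder.
def reduce_rule(stack, arity, compute):
    """Pop 'arity' values for the body (left to right) and push the head's value."""
    args = [stack.pop() for _ in range(arity)][::-1]
    stack.append(compute(*args))

values = []                                    # the parser's value stack
values.append(3)                               # shift digit 3   (digit.lexval = 3)
reduce_rule(values, 1, lambda d: d)            # F -> digit      F.val = digit.lexval
reduce_rule(values, 1, lambda f: f)            # T -> F          T.val = F.val
values.append('*')                             # shift *
values.append(5)                               # shift digit 5
reduce_rule(values, 1, lambda d: d)            # F -> digit
reduce_rule(values, 3, lambda t, _, f: t * f)  # T -> T * F      T.val = T1.val * F.val
reduce_rule(values, 1, lambda t: t)            # E -> T          E.val = T.val
values.append('+')                             # shift +
values.append(4)                               # shift digit 4
reduce_rule(values, 1, lambda d: d)            # F -> digit
reduce_rule(values, 1, lambda f: f)            # T -> F
reduce_rule(values, 3, lambda e, _, t: e + t)  # E -> E + T      E.val = E1.val + T.val
print(values)                                  # [19]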
 Preorder and postorder traversals are two important special cases of depth-first traversals in
which we visit the children of each node from left to right. Often, we traverse a tree to
perform some particular action at each node. If the action is done when we first visit a node,
then we may refer to the traversal as a preorder traversal. Similarly, if the action is done just
before we leave a node for the last time, then we say it is a postorder traversal of the tree.
4.4. L-Attributed Definitions
When translation takes place during parsing, the order of evaluation of attributes is linked to the
order in which nodes of a parse tree are "created" by the parsing method. A natural order that
characterizes many top-down and bottom-up translation methods is the one obtained by applying
the procedure dfvisit in Fig. 5.9 to the root of a parse tree. We call this evaluation order the
depth-first order. Even if the parse tree is not actually constructed, it is useful to study translation
during parsing by considering depth-first evaluation of attributes at the nodes of a parse tree.

procedure dfvisit (n : node);
begin
    for each child m of n, from left to right do begin
        evaluate inherited attributes of m;
        dfvisit(m)
    end;
    evaluate synthesized attributes of n
end

Fig. 5.9. Depth-first evaluation order for attributes in a parse tree.
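A hedged Python rendering of dfvisit, demonstrated on the parse tree for 3 * 5 under the SDD of Fig. 5.4; the Node class and the two helper functions stand in for the semantic rules attached to each production and are assumptions of the example.

# A minimal sketch: inherited attributes of a child are computed just before
# visiting it; synthesized attributes of a node are computed after all of its
# children have been visited.
class Node:
    def __init__(self, label, children=(), lexval=None):
        self.label, self.children, self.lexval = label, list(children), lexval
        self.attrs = {}                          # attribute name -> value

def eval_inherited(n, m):
    if n.label == 'T' and m.label == "T'":       # T  -> F T'    : T'.inh = F.val
        m.attrs['inh'] = n.children[0].attrs['val']
    elif n.label == "T'" and m.label == "T'":    # T' -> * F T1' : T1'.inh = T'.inh * F.val
        m.attrs['inh'] = n.attrs['inh'] * n.children[1].attrs['val']

def eval_synthesized(n):
    if n.label == 'F':                           # F -> digit    : F.val = digit.lexval
        n.attrs['val'] = n.lexval
    elif n.label == "T'":
        if n.children:                           # T' -> * F T1' : T'.syn = T1'.syn
            n.attrs['syn'] = n.children[2].attrs['syn']
        else:                                    # T' -> epsilon : T'.syn = T'.inh
            n.attrs['syn'] = n.attrs['inh']
    elif n.label == 'T':                         # T -> F T'     : T.val = T'.syn
        n.attrs['val'] = n.children[1].attrs['syn']

def dfvisit(n):
    for m in n.children:
        eval_inherited(n, m)     # inherited attributes of m, from above and from the left
        dfvisit(m)               # then visit m's subtree
    eval_synthesized(n)          # synthesized attributes of n, last

tree = Node('T', [Node('F', lexval=3),
                  Node("T'", [Node('*'), Node('F', lexval=5), Node("T'")])])
dfvisit(tree)
print(tree.attrs['val'])         # 15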


We now introduce a class of syntax-directed definitions, called L-attributed definitions, whose
attributes can always be evaluated in depth-first order. (The L is for "left", because attribute
information appears to flow from left to right.)
An L-attributed definition is the second class of SDD. The idea behind this class is that, among the
attributes associated with a production body, dependency-graph edges can go from left to right,
but not from right to left (hence "L-attributed"). More precisely, each attribute must be either:
1. Synthesized, or
2. Inherited, but with the rules limited as follows. Suppose that there is a production
A → X1 X2 … Xn, and that there is an inherited attribute Xi.a computed by a rule associated
with this production. Then the rule may use only:
(a) Inherited attributes associated with the head A.
(b) Either inherited or synthesized attributes associated with the occurrences of symbols
X1, X2, ..., Xi-1 located to the left of Xi.
(c) Inherited or synthesized attributes associated with this occurrence of Xi itself, but only in such
a way that there are no cycles in a dependency graph formed by the attributes of this Xi.
For example, the SDD in Fig. 5.4 is L-attributed. To see why, consider the semantic rules for
inherited attributes, which are repeated here for convenience:
Production              Semantic Rule
T → F T'                T'.inh = F.val
T' → * F T1'            T1'.inh = T'.inh * F.val
The first of these rules defines the inherited attribute T'.inh using only F.val, and F appears to
the left of T' in the production body, as required. The second rule defines T1'.inh using the
inherited attribute T'.inh associated with the head, and F.val, where F appears to the left of T1' in
the production body.

In each of these cases, the rules use information "from above or from the left", as required by the
class. The remaining attributes are synthesized. Hence, the SDD is L-attributed.

Any SDD containing the following production and rules cannot be L-attributed (s denotes a
synthesized attribute and i an inherited one):
Production              Semantic Rules
A → B C                 A.s = B.b
                        B.i = f(C.c, A.s)

The first rule, A.s = B.b, is a legitimate rule in either an S-attributed or L-attributed SDD. It
defines a synthesized attribute A.s in terms of an attribute at a child (that is, a symbol within the
production body).
The second rule defines an inherited attribute B.i, so the entire SDD cannot be S-attributed.
Further, although the rule is legal, the SDD cannot be L-attributed, because the attribute C.c is
used to help define B.i, and C is to the right of B in the production body. While attributes at
siblings in a parse tree may be used in L-attributed SDD's, they must be to the left of the symbol
whose attribute is being defined.


The syntax-directed definition in the following table (Table 5.2) is not L-attributed because the
inherited attribute Q.i of the grammar symbol Q depends on the attribute R.s of the grammar
symbol to its right.
Production              Semantic Rules
A → L M                 L.i = l(A.i)
                        M.i = m(L.s)
                        A.s = f(M.s)
A → Q R                 R.i = r(A.i)
                        Q.i = q(R.s)
                        A.s = f(Q.s)
Table 5.2: A non-L-attributed syntax-directed definition.

4.5. Introduction to Intermediate Code Generation


In a compiler, the front end translates a source program into an intermediate representation, and
the back end generates the target code from this intermediate representation. Although a source
program can be translated directly into the target language, some benefits of using a machine-
independent intermediate form are:
 Retargeting to another machine is facilitated; a compiler for a different machine can be
created by attaching a back end for the new machine to an existing front end.
 A machine-independent code optimizer can be applied to the intermediate representation
 If a compiler translates the source language to its target machine language without having
the option for generating intermediate code, then for each new machine, a full native
compiler is required.
 Intermediate code eliminates the need for a new full compiler for every machine by
keeping the analysis portion the same for all compilers.

4.6. Intermediate Languages


Intermediate code can be represented in a variety of ways. The intermediate code generator
receives the syntax-directed translation of the syntactic constructs and represents it in one of the
following intermediate forms: syntax trees, postfix notation, or three-address code.
 Syntax Tree
During parsing, syntax-tree nodes are created to represent significant programming constructs.
As analysis proceeds, information is added to the nodes in the form of attributes associated with
the nodes. The choice of attributes depends on the translation to be performed. A syntax tree
depicts the natural hierarchical structure of a source program.
A syntax tree (abstract syntax tree) is a condensed form of the parse tree, useful for representing language
constructs. For example, for the string a + b, the parse tree in (a) below will be represented by
the syntax tree shown in (b); in a syntax tree, operators and keywords do not appear as leaves,
but rather are associated with the interior node that would be the parent of those leaves in the
parse tree.


          E                             +
        / | \                          / \
       E  +  E                        a   b
       |     |
       a     b

 (a) Parse tree for a + b         (b) Syntax (abstract) tree for a + b

A syntax tree has operators as its interior nodes, whereas a parse tree has nonterminals as its
nodes. A CFG is needed to draw a parse tree, but a syntax tree can be built directly from the
expression itself (the input string). This type of intermediate representation is also not widely
used in compilers.

 Postfix Notation
Postfix notation, also called reverse Polish notation, is another type of intermediate code
representation. It is practical for an intermediate representation because the operands of an
operator appear immediately before the operator. In fact, postfix notation is a linearized
representation of a syntax tree.
 e.g., the infix expression 1 + 2 * 3 is written in postfix notation as 1 2 3 * +.
This type of intermediate representation is not generally used in compilers.
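A small sketch tying the two representations together: a syntax tree has operators at its interior nodes and operands at its leaves, and its postfix notation is simply a postorder listing of the node labels. The AST class below is an assumption of the example.

# A minimal sketch: postfix notation as a postorder walk of a syntax tree.
class AST:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right

def postfix(node):
    # postorder: operands first, then the operator at this node
    if node is None:
        return []
    return postfix(node.left) + postfix(node.right) + [node.label]

ab = AST('+', AST('a'), AST('b'))                        # syntax tree for a + b
print(' '.join(postfix(ab)))                             # a b +

expr = AST('+', AST('1'), AST('*', AST('2'), AST('3')))  # 1 + 2 * 3 (* binds tighter)
print(' '.join(postfix(expr)))                           # 1 2 3 * +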

 Three-Address code
Three-address code is the type of intermediate code most commonly used in compilers,
particularly in optimizing compilers.
Three-address code is a sequence of statements of the general form:
X := Y op Z
where X, Y, and Z are names, constants, or compiler-generated temporaries, and op is an
operator such as an integer or floating-point arithmetic operator, or a logical operator on
boolean-valued data.
Note that:
 No built-up arithmetic expressions are permitted
 Only one operator may appear on the right side of an assignment; i.e., x + y + z is not
possible as a single statement
 Like postfix notation, three-address code is a linearized representation of a syntax
tree. It has been given the name "three-address code" because each instruction usually
contains three addresses, two for the operands and one for the result.
A source language expression like x + y * z might be translated into the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names. This unraveling of complicated
arithmetic expressions and of nested flow-of-control statements makes three-address code
desirable for target code generation and optimization. The use of names for the intermediate


values computed by a program allows three-address code to be easily rearranged - unlike postfix
notation.
For the assignment statement a := b * -c + b * -c, the corresponding three-address code is the
following.
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
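A hedged sketch of how such code can be generated: walking an expression tree and creating a fresh temporary for every operator node. The Expr class and helper names are assumptions of the example; run on the tree for b * -c + b * -c, it reproduces the sequence above.

# A minimal sketch: three-address code generation from an expression tree.
class Expr:
    # leaves carry a name or constant in op; interior nodes carry an operator
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

code = []
temp_count = 0

def new_temp():
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def gen(e):
    # emit code for e and return the name that holds its value
    if e.left is None and e.right is None:         # leaf: a name or constant
        return e.op
    if e.right is None:                            # unary operator (e.g. unary minus)
        operand = gen(e.left)
        t = new_temp()
        code.append(f"{t} := {e.op} {operand}")
        return t
    left, right = gen(e.left), gen(e.right)        # binary operator
    t = new_temp()
    code.append(f"{t} := {left} {e.op} {right}")
    return t

def neg(name):                                     # unary minus applied to a name
    return Expr('-', Expr(name))

rhs = Expr('+', Expr('*', Expr('b'), neg('c')), Expr('*', Expr('b'), neg('c')))
code.append(f"a := {gen(rhs)}")
print('\n'.join(code))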

 Types of Three-Address Statements


Three-address statements are similar to assembly code. Statements can have symbolic labels and
there are statements for flow of control. A symbolic label represents the index of a three-address
statement in the array holding intermediate code.
Here are the common three-address code statements:
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or
logical operator.
2. Assignment instructions of the form x := op y, where op is a unary operation. Essential
unary operations include unary minus, logical negation, shift operators, and conversion
operators that, for example, convert an integer to a floating-point number.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be
executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational
operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x
stands in relation relop to y. If not, the three-address statement following if x relop y goto
L is executed next, as in the usual sequence.
6. param x and call p, n for procedure calls and return y, where y representing a returned
value is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n, indicating
the number of actual parameters in "call p, n", is not redundant because calls can be
nested.


7. Indexed assignments of the form x:= y[i] and x[i] := y. The first of these sets x to the
value in the location i memory units beyond location y. The statement x[i]:= y sets the
contents of the location i units beyond x to the value of y. In both these instructions, x, y,
and i refer to data objects.
8. Address and pointer assignments of the form x := &y, x := *y, and *x := y. The first of
these sets the value of x to be the location of y. Presumably y is a name, perhaps a
temporary, that denotes an expression with an l-value such as A[i, j], and x is a pointer
name or temporary. That is, the r-value of x is the l-value (location) of some object. In the
statement x := *y, presumably y is a pointer or a temporary whose r-value is a location.
The r-value of x is made equal to the contents of that location. Finally, *x := y sets the r-
value of the object pointed to by x to the r-value of y.
The choice of allowable operators is an important issue in the design of an intermediate form.
The operator set must clearly be rich enough to implement the operations in the source
language. A small operator set is easier to implement on a new target machine. However, a
restricted instruction set may force the front end to generate long sequences of statements for
some source language operations. The optimizer and code generator may then have to work
harder if good code is to be generated.

 Implementation of Three-Address Statements


A three-address statement is an abstract form of intermediate code. In a compiler, these statements
can be implemented as records with fields for the operator and the operands. Three such
representations are quadruples, triples, and indirect triples.

1) Quadruples
A quadruple is a record structure with four fields, which we call op, arg1, arg2, and result. The
op field contains an internal code for the operator. Note: op is operator, arg1 is argument1 and
arg2 is argument2. For instance, the three-address instruction x = y + z is represented by placing
+ in op, y in arg1, z in arg2, and x in result. The following are some exceptions to this rule:
 Instructions with unary operators like x = - y or x = y do not use arg2.
 Operators like param use neither arg2 nor result.
 Conditional and unconditional jumps put the target label in result.
Example: Three-address code for the assignment a = b * -c + b * -c appears in the following.
The special operator minus is used to distinguish the unary minus operator, as in - c, from the
binary minus operator, as in b - c. Note that the unary-minus "three-address" statement has only
two addresses, as does the copy statement a = t5.
t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5


The quadruple representation for this three-address code is the following.


         op       arg1    arg2    result
(0)      minus    c               t1
(1)      *        b       t1      t2
(2)      minus    c               t3
(3)      *        b       t3      t4
(4)      +        t2      t4      t5
(5)      =        t5              a
The contents of fields arg1, arg2, and result are normally pointers to the symbol-table entries for
the names represented by these fields. If so, temporary names must be entered into the symbol
table as they are created.
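As an illustration, the quadruples above can be held as a list of four-field records; in a real compiler the arg1, arg2, and result fields would normally point to symbol-table entries rather than hold the names themselves. The namedtuple representation below is an assumption of the example.

# A minimal sketch: quadruples as records with op, arg1, arg2, and result.
from collections import namedtuple

Quad = namedtuple('Quad', 'op arg1 arg2 result')

quads = [
    Quad('minus', 'c',  None, 't1'),
    Quad('*',     'b',  't1', 't2'),
    Quad('minus', 'c',  None, 't3'),
    Quad('*',     'b',  't3', 't4'),
    Quad('+',     't2', 't4', 't5'),
    Quad('=',     't5', None, 'a'),
]
for i, q in enumerate(quads):
    # None marks an unused field, as in the table above
    print(i, q.op, q.arg1 or '', q.arg2 or '', q.result)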

2) Triples
A triple is also a record structure used to represent three-address code; it has only three
fields: op, arg1, and arg2.
The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table (for
programmer-defined names or constants) or pointers into the triple structure (for temporary
values).
Note that the result field in quadruple representation is used primarily for temporary names.
Using triples, we refer to the result of an operation x op y by its position, rather than by an
explicit temporary name. Thus, instead of the temporary t1 in quadruple representation, a triple
representation would refer to position (0). Parenthesized numbers represent pointers into the
triple structure itself, while symbol-table pointers are represented by the names themselves. In
practice, the information needed to interpret the different kinds of entries in the arg1 and arg2
fields can be encoded into the op field or some additional fields.
The triples representation corresponding to the above quadruples is the following. Note that the
copy statement a = t5 is encoded in the triple representation by placing a in the arg1 field and
using the operator assign.
op arg1 arg2
(0) minus c
(1) * b (0)
(2) minus c
(3) * b (2)
(4) + (1) (3)
(5) assign a (4)

A ternary operation like x[i] = y requires two entries in the triple structure; for example, we can
put x and i in one triple and y in the next. Similarly, x = y[i] can be implemented by treating it as if


it were the two instructions t = y[i] and x = t, where t is a compiler-generated temporary. Note
that the temporary t does not actually appear in a triple, since temporary values are referred to by
their position in the triple structure.
     op        arg1    arg2
(0)  []=       x       i
(1)  assign    (0)     y

(a) x[i] = y

     op        arg1    arg2
(0)  =[]       y       i
(1)  assign    x       (0)

(b) x = y[i]
A benefit of quadruples over triples can be seen in an optimizing compiler, where instructions
are often moved around. With quadruples, if we move an instruction that computes a temporary
t, then the instructions that use t require no change. With triples, the result of an operation is
referred to by its position, so moving an instruction may require us to change all references to
that result. This problem does not occur with indirect triples, which we consider next.

3) Indirect triples
Another implementation of three-address code that has been considered is that of listing pointers
to triples, rather than listing the triples themselves. This implementation is naturally called
indirect triples. With indirect triples, an optimizing compiler can move an instruction by
reordering the instruction list, without affecting the triples themselves.

This representation is an enhancement over the triples representation: the program is kept as a
list of pointers to triples, so the optimizer can freely reorder the computations to produce
better code without invalidating references to results.

For example, let us use an instruction array to list pointers to triples in the desired order. Then,
the triples for the above example might be represented in indirect triples as follows.

     instruction                    op       arg1    arg2
 0:  (0)                       (0)  minus    c
 1:  (1)                       (1)  *        b       (0)
 2:  (2)                       (2)  minus    c
 3:  (3)                       (3)  *        b       (2)
 4:  (4)                       (4)  +        (1)     (3)
 5:  (5)                       (5)  assign   a       (4)
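The difference between the two triple-based forms can also be sketched in code: triples refer to a result by its position in the triple array, while indirect triples keep a separate instruction list of positions, so reordering instructions only permutes that list. The data layout below is an assumption of the example.

# A minimal sketch: triples versus indirect triples for a = b * -c + b * -c.
# Integers in the argument fields are positions in the triple array.
triples = [
    ('minus',  'c', None),   # (0)
    ('*',      'b', 0),      # (1)
    ('minus',  'c', None),   # (2)
    ('*',      'b', 2),      # (3)
    ('+',      1,   3),      # (4)
    ('assign', 'a', 4),      # (5)
]

# Indirect triples: the program is the instruction list, not the triple array.
instructions = [0, 1, 2, 3, 4, 5]

# "Moving an instruction" only permutes the instruction list; the positions
# stored inside the triples remain valid, so no triple needs to change.
instructions = [2, 3, 0, 1, 4, 5]   # e.g. evaluate the second b * -c first

for i in instructions:
    op, a1, a2 = triples[i]
    print(f"({i}) {op} {a1} {a2 if a2 is not None else ''}")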
