Unit 4 and 5

The document discusses parse trees, syntax trees, and syntax directed translation. It defines parse trees and syntax trees, noting that syntax trees are more compact representations of parse trees. It provides examples of constructing parse trees and syntax trees. It also discusses key aspects of syntax directed translation like attributes, expansion and reduction, S-attributed and L-attributed SDT, and three address code.

Uploaded by

Shoaib Sidd

Unit 4 and 5:

1. Parse tree:
● A parse tree is a graphical representation of a derivation. Each node is labeled
with a grammar symbol, which can be a terminal or a non-terminal.

● In parsing, the string is derived from the start symbol. The root of the parse tree
is that start symbol.

● A parse tree reflects the precedence of operators. The deepest sub-tree is
traversed first.

The parse tree follows these points:

● All leaf nodes have to be terminals.

● All interior nodes have to be non-terminals.

● Reading the leaves from left to right gives the original input string.

Example:

Production rules:

1. S → S + S | S * S
2. S → a | b | c

Input: a*b+c

Steps 1 to 5: [figures showing the step-by-step construction of the parse tree for a*b+c]

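The finished parse tree for a*b+c can also be written down in code. The following is only a sketch: the `Node` class and the helpers `s` and `leaf_string` are hypothetical names chosen for illustration, and the tree corresponds to one possible derivation, S ⇒ S + S ⇒ S * S + S ⇒ a * b + c. It checks the two properties listed above: leaves are terminals, and reading the leaves from left to right gives back the input.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-tree node: a grammar symbol plus its children (none for leaves)."""
    symbol: str
    children: list = field(default_factory=list)


def s(*children):
    """Helper: build an interior node labeled with the non-terminal S."""
    return Node("S", list(children))


def leaf_string(node):
    """Concatenate the leaves from left to right."""
    if not node.children:
        return node.symbol
    return "".join(leaf_string(child) for child in node.children)


# S => S + S => S * S + S => a * b + c
product = s(s(Node("a")), Node("*"), s(Node("b")))
parse_tree = s(product, Node("+"), s(Node("c")))

print(leaf_string(parse_tree))  # a*b+c
```

The sub-tree for a*b sits deeper than the + node, which is how the tree records that the multiplication is done first.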
2. Syntax Tree:

● A syntax tree is a tree in which each leaf node represents an operand,
while each interior node represents an operator.
● A syntax tree is a condensed form of the parse tree.
● Syntax trees are abstract, compact representations of parse trees.
● They are also called Abstract Syntax Trees.

Functions for constructing a syntax tree:

● mknode (op, left, right) − It generates an operator node with label op and two
fields containing pointers to left and right.
● mkleaf (id, entry) − It generates an identifier node with label id and a field
containing entry, a pointer to the symbol table entry for the identifier.
● mkleaf (num, val) − It generates a number node with label num and a field
containing val, the value of the number.

For example, construct a syntax tree for the expression a − 4 + c:

p1 = mkleaf (id, entry a);

p2 = mkleaf (num, 4);

p3 = mknode (′−′, p1, p2);

p4 = mkleaf (id, entry c);

p5 = mknode (′+′, p3, p4);

The tree is generated in a bottom-up fashion. The function calls mkleaf (id, entry a) and
mkleaf (num, 4) construct the leaves for a and 4. The pointers to these nodes are stored
in p1 and p2. The call mknode (′−′, p1, p2) then builds the interior node with these two
leaves as children. The remaining calls build the leaf for c and the root node ′+′ in the
same way.
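The five calls above can be tried out with a small sketch. The dictionary-based node layout, the split of mkleaf into `mkleaf_id` and `mkleaf_num`, the `symbol_table` contents, and the `to_text` helper are all assumptions made for illustration; the notes only fix the labels and fields of each node.

```python
# Hypothetical symbol table: one entry per identifier.
symbol_table = {"a": {"name": "a"}, "c": {"name": "c"}}


def mknode(op, left, right):
    """Operator node with label op and pointers to the left and right children."""
    return {"label": op, "left": left, "right": right}


def mkleaf_id(entry):
    """Identifier leaf holding a pointer to its symbol table entry."""
    return {"label": "id", "entry": entry}


def mkleaf_num(val):
    """Number leaf holding the value of the number."""
    return {"label": "num", "val": val}


def to_text(node):
    """Render the tree as a fully parenthesized expression."""
    if node["label"] == "id":
        return node["entry"]["name"]
    if node["label"] == "num":
        return str(node["val"])
    return f"({to_text(node['left'])} {node['label']} {to_text(node['right'])})"


# Bottom-up construction of a - 4 + c
p1 = mkleaf_id(symbol_table["a"])
p2 = mkleaf_num(4)
p3 = mknode("-", p1, p2)
p4 = mkleaf_id(symbol_table["c"])
p5 = mknode("+", p3, p4)

print(to_text(p5))  # ((a - 4) + c)
```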
3. Parse Trees Vs Syntax Trees-

Parse Tree | Syntax Tree

A parse tree is a graphical representation of the replacement process in a derivation. | A syntax tree is the compact form of a parse tree.

Each interior node represents a grammar rule. Each leaf node represents a terminal. | Each interior node represents an operator. Each leaf node represents an operand.

Parse trees keep every detail of the real syntax. | Syntax trees do not keep every detail of the real syntax.

Parse trees are comparatively less dense than syntax trees. | Syntax trees are comparatively more dense than parse trees.


4. Syntax directed translation

In syntax directed translation, we associate some informal notations with the
grammar. These notations are called semantic rules.

So we can say that

1. Grammar + semantic rule = SDT (syntax directed translation)

● Every non-terminal can have zero, one, or more attributes, depending on the
type of the attribute.

● The value of these attributes is evaluated by the semantic rules associated with
the production rule.

● In the semantic rules, a commonly used attribute is val. An attribute may hold
anything, such as a string, a number, a memory location, or a complex record.

● In syntax directed translation, whenever a construct is encountered in the
program, it is translated according to the semantic rules
defined for that programming language.

Example

Production       Semantic Rules

E → E1 + T       E.node = make_node(‘+’, E1.node, T.node)

E → T            E.node = T.node

T → T1 * F       T.node = make_node(‘*’, T1.node, F.node)

T → F            T.node = F.node

F → (E)          F.node = E.node

F → id           F.node = make_leaf(id, id.entry)

F → num          F.node = make_leaf(num, num.value)

E.node is one of the attributes of E.

num.value is the attribute returned by the lexical analyzer.

5. Attribute:
● Semantic information is stored in attributes associated with the terminal and
nonterminal symbols of the grammar.

● The attributes are divided into two groups: synthesized attributes and inherited
attributes.

● A → XY

1. Synthesized attributes:
● A synthesized attribute is an attribute of the non-terminal on the left-hand
side of a production.
● Synthesized attributes represent information that is passed up the
parse tree.
● The attribute can take its value only from its children.
● For example, if A → BC is a production of a grammar and A’s attribute
depends on B’s attributes or C’s attributes, then it is a synthesized
attribute.
● To illustrate, assume the following production:
S → ABC
● If S takes values from its child nodes (A, B, C), then it is said to be a
synthesized attribute, as the values of A, B, and C are synthesized to S.

2. Inherited attributes:
● An attribute of a nonterminal on the right-hand side of a production is
called an inherited attribute.
● The attribute can take its value either from its parent or from its siblings.
● For example, if A → BC is a production of a grammar and B’s
attribute depends on A’s attributes or C’s attributes, then it is an
inherited attribute.
● To illustrate, assume the following production:
S → ABC
● A can get values from S, B, and C. B can take values from S, A, and C. Likewise,
C can take values from S, A, and B.
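The two directions of information flow can be contrasted with a short sketch. Both examples are assumptions added for illustration and are not taken from the notes: `synthesized_val` computes a value attribute bottom-up over an expression tree written as nested tuples, and `declare` uses the commonly used declaration grammar D → T L, L → L1 , id | id, where the type is passed down to each identifier.

```python
def synthesized_val(node):
    """E.val flows upward: it is computed only from the children's values."""
    if isinstance(node, int):
        return node                        # leaf: value supplied by the lexer
    op, left, right = node
    left_val = synthesized_val(left)
    right_val = synthesized_val(right)
    return left_val + right_val if op == "+" else left_val * right_val


def declare(type_name, id_list):
    """D -> T L: the type flows downward to every identifier in L."""
    table = {}

    def visit(node, inherited_type):
        if isinstance(node, str):          # L -> id
            table[node] = inherited_type
        else:                              # L -> L1 , id
            left_list, name = node
            visit(left_list, inherited_type)   # L1.inh = L.inh
            table[name] = inherited_type

    visit(id_list, type_name)              # L.inh = T.type
    return table


print(synthesized_val(("+", 2, ("*", 3, 4))))   # 14
print(declare("int", (("a", "b"), "c")))        # all three names get type int
```

In the first function a parent never passes anything to a child; in the second, the child list receives its type from the parent before it is visited.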

5. Expansion and Reduction:

Expansion: a non-terminal is expanded into terminals as per a grammar rule.

Reduction: a terminal is reduced to its corresponding non-terminal according to
grammar rules.

Syntax trees are parsed top-down and left to right. Whenever a reduction
occurs, we apply its corresponding semantic rules (actions).

6. S-attributed and L-attributed SDT:

S-attributed SDT:
● If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
● S-attributed SDTs are evaluated in bottom-up parsing, as the values of the
parent nodes depend upon the values of the child nodes.

● Semantic actions are placed at the right end of the RHS.

L-attributed SDT:
● If an SDT uses both synthesized attributes and inherited attributes, with the
restriction that an inherited attribute can inherit values from its parent and
left siblings only, it is called an L-attributed SDT.

● In L-attributed SDTs, a non-terminal can get values from its parent, children, and
left siblings, as in the following production:

● S → ABC

● S can take values from A, B, and C (synthesized). A can take values from S only.
B can take values from S and A. C can get values from S, A, and B. No
non-terminal can get values from a sibling to its right.

● For example,
A → XYZ {Y.S = A.S, Y.S = X.S, Y.S = Z.S}
is not an L-attributed grammar: Y.S = A.S and Y.S = X.S are allowed,
but Y.S = Z.S violates the L-attributed SDT definition because the attribute
inherits its value from its right sibling.
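The restriction can be checked mechanically. The sketch below is a simplified illustration: the function name `violations` and the encoding of a rule as a (target, source) pair of symbols are assumptions, and it assumes that every symbol appears only once in the production body.

```python
def violations(head, body, rules):
    """Return the rules that break the L-attributed restriction.

    head  : non-terminal on the left-hand side, e.g. "A"
    body  : symbols on the right-hand side in order, e.g. ["X", "Y", "Z"]
    rules : (target, source) pairs meaning target.attr = source.attr
    """
    bad = []
    for target, source in rules:
        if target == head:
            continue      # synthesized attribute of the head: any child is allowed
        if source == head:
            continue      # inherited from the parent: allowed
        if body.index(source) >= body.index(target):
            bad.append((target, source))   # source is not a left sibling
    return bad


rules = [("Y", "A"), ("Y", "X"), ("Y", "Z")]
print(violations("A", ["X", "Y", "Z"], rules))  # [('Y', 'Z')]
```

Only Y.S = Z.S is reported, which matches the explanation above: Z is to the right of Y.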

7. Three Address code:
● Three-address code is an intermediate code. It is used by optimizing
compilers.

● In three-address code, the given expression is broken down into several separate
instructions. These instructions translate easily into assembly language.

● Each three-address code instruction has at most three addresses: two for the
operands and one for the result. It typically combines an assignment with a
binary operator.

The characteristics of three-address instructions are-

● They are generated by the compiler to support code optimization.

● They use at most three addresses to represent any statement.
● They are implemented as a record with the address fields.

Example−

The expression a = b + c + d can be converted into the following three-address code.

t1 = b + c

t2 = t1 + d

a = t2

where t1 and t2 are temporary variables generated by the compiler. Often a
statement includes fewer than three references, but it is still known as a three-address
statement.
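One way to obtain such a sequence is a post-order walk over the expression tree that creates a new temporary for every operator. This is a minimal sketch, not a complete code generator: the nested-tuple tree format and the function name `three_address_code` are assumptions made for illustration.

```python
import itertools


def three_address_code(target, expr):
    """Translate target = expr into a list of three-address statements.

    expr is either a variable name or a tuple (op, left, right).
    """
    counter = itertools.count(1)
    code = []

    def gen(node):
        if isinstance(node, str):
            return node                    # a variable is its own address
        op, left, right = node
        left_place = gen(left)
        right_place = gen(right)
        temp = f"t{next(counter)}"         # compiler-generated temporary
        code.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    result = gen(expr)
    code.append(f"{target} = {result}")
    return code


# a = b + c + d, with + grouping to the left
for line in three_address_code("a", ("+", ("+", "b", "c"), "d")):
    print(line)
```

For this input the sketch produces the same three statements as the example above.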

Types of Three-Address Code Statements

Following are the various types of three-address statements −

1. Assignment− The types of assignment statements are

x = y op z and x = op y

Here,

● x, y and z are the operands.

● op represents the operator.

The statement assigns the result of evaluating the right side of the assignment
operator to the left side operand.

The copy statement x = y assigns the value of y to x.

2. Unconditional Jump-

goto X

Here, X is the tag or label of the target statement.

On executing the statement,

● The control is sent directly to the location specified by label X.

● All the statements in between are skipped.

3. Conditional Jump-

if x relop y goto X

Here,

● x and y are the operands.

● X is the tag or label of the target statement.

● relop is a relational operator.

If the condition “x relop y” is satisfied, then-

● The control is sent directly to the location specified by label X.

● All the statements in between are skipped.

If the condition “x relop y” fails, then-

● The control is not sent to the location specified by label X.

● The next statement appearing in the usual sequence is executed.

4. Procedure Call-

A call to the procedure P(x1, x2, …, xn) with the parameters x1, x2, …, xn is written as

param x1

param x2

…

param xn

call P, n

Here, P is the procedure being called and n is the number of parameters.

5. Array Statements −

x = y[i] assigns the value at the ith location of array y to x.

x[i] = y assigns the value of y to the ith location of array x.

Problem-01:
Write Three Address Code for the following expression-

a=b+c+d
Three Address Code for the given expression is-

(1) T1 = b + c

(2) T2 = T1 + d

(3) a = T2

Problem-02:
Write Three Address Code for the following expression-

If A < B then 1 else 0

Solution-
Three Address Code for the given expression is-

(1) If (A < B) goto (4)

(2) T1 = 0

(3) goto (5)

(4) T1 = 1

(5)

Problem-03:
Write Three Address Code for the following expression-

If A < B and C < D then t = 1 else t = 0

Solution-
Three Address Code for the given expression is-
(1) If (A < B) goto (3)

(2) goto (4)

(3) If (C < D) goto (6)

(4) t = 0

(5) goto (7)

(6) t = 1

(7)
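The jumps in Problem-03 can be traced with a tiny interpreter. This is a sketch for checking the control flow only: the tuple encoding of the instructions, the `RELOPS` table, and the `run` function are assumptions made for illustration. Statement (7) is the exit, so execution stops when the program counter leaves the numbered statements.

```python
import operator

RELOPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
          "!=": operator.ne, ">": operator.gt, ">=": operator.ge}

# Problem-03: if A < B and C < D then t = 1 else t = 0
PROBLEM_03 = {
    1: ("if", "A", "<", "B", 3),
    2: ("goto", 4),
    3: ("if", "C", "<", "D", 6),
    4: ("assign", "t", 0),
    5: ("goto", 7),
    6: ("assign", "t", 1),
}


def run(program, env):
    """Execute numbered three-address statements and return the final variables."""
    env = dict(env)
    pc = 1
    while pc in program:
        instr = program[pc]
        if instr[0] == "if":               # conditional jump
            _, x, relop, y, label = instr
            pc = label if RELOPS[relop](env[x], env[y]) else pc + 1
        elif instr[0] == "goto":           # unconditional jump
            pc = instr[1]
        else:                              # ("assign", name, constant)
            _, name, value = instr
            env[name] = value
            pc += 1
    return env


print(run(PROBLEM_03, {"A": 1, "B": 2, "C": 3, "D": 4})["t"])  # 1
print(run(PROBLEM_03, {"A": 1, "B": 2, "C": 5, "D": 4})["t"])  # 0
```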

There are three implementations used for three-address code statements, which are as
follows −

● Quadruples

● Triples

● Indirect Triples

Quadruples

A quadruple is a structure that contains at most four fields, i.e., Operator, Argument 1,
Argument 2, and Result.

Operator    Argument 1    Argument 2    Result

For a statement a = b + c, the quadruple representation places + in the Operator field, b in
the Argument 1 field, c in the Argument 2 field, and a in the Result field.

For example− Consider the statement

a = b + c * d

First, convert this statement into three-address code.

∴ The three-address code will be

t1 = c ∗ d

t2 = b + t1

a = t2

After construction of the three-address code, it is changed to the quadruple
representation as follows−

Quadruple

Location   Operator   arg 1   arg 2   Result

(0)        ∗          c       d       t1

(1)        +          b       t1      t2

(2)        =          t2              a

The contents of the fields arg 1, arg 2 and Result are pointers to the symbol table entries for
the names represented by these fields.
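The conversion from three-address statements to quadruples is mostly a matter of splitting each statement into its fields. The sketch below assumes statements written as text in the two forms `x = y op z` and `x = y`, with spaces around the operator; the `Quad` type and the function name `to_quadruples` are hypothetical. For simplicity it stores names as strings rather than pointers to symbol table entries.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


def to_quadruples(tac_lines):
    """Turn statements of the form 'x = y op z' or 'x = y' into quadruples."""
    quads = []
    for line in tac_lines:
        result, rhs = [part.strip() for part in line.split("=", 1)]
        tokens = rhs.split()
        if len(tokens) == 3:               # x = y op z
            arg1, op, arg2 = tokens
            quads.append(Quad(op, arg1, arg2, result))
        elif len(tokens) == 1:             # copy statement x = y
            quads.append(Quad("=", tokens[0], None, result))
        else:
            raise ValueError(f"unsupported statement: {line}")
    return quads


for location, quad in enumerate(to_quadruples(["t1 = c * d", "t2 = b + t1", "a = t2"])):
    print(f"({location})", quad.op, quad.arg1, quad.arg2 or "", quad.result)
```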

Triples

This three-address code representation contains three (3) fields, i.e., one for the operator
and two for the arguments (i.e., Argument 1 and Argument 2).

Operator    Argument 1    Argument 2

In this representation, temporary variables are not used. Instead of a temporary variable,
we use a number in parentheses that refers to the position of the triple
which computes the value.

For example, consider the statement

a = b + c * d

First of all, it is converted to three-address code

∴ t1 = c ∗ d

t2 = b + t1

a = t2

The triples for this three-address code will be −

Triple

Location   Operator   arg 1   arg 2

(0)        ∗          c       d

(1)        +          b       (0)

(2)        =          a       (1)

Here (0) is a pointer that refers to the result of c * d, which can be used in further
statements, i.e., when c * d is added to b. The result of that addition is saved at the position
pointed to by (1). Pointer (1) is used further when the result is assigned to a.
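The replacement of temporaries by positions can be sketched as follows. The function name `to_triples` and the text format of the input statements are assumptions; the sketch also assumes that the result of every `x = y op z` statement is a compiler temporary, so its name can be replaced by the position of the triple that computes it.

```python
def to_triples(tac_lines):
    """Turn 'x = y op z' and 'x = y' statements into (op, arg1, arg2) triples."""
    position = {}     # temporary name -> index of the triple that computes it
    triples = []

    def ref(name):
        """Use '(i)' for a temporary, the plain name otherwise."""
        return f"({position[name]})" if name in position else name

    for line in tac_lines:
        result, rhs = [part.strip() for part in line.split("=", 1)]
        tokens = rhs.split()
        if len(tokens) == 3:               # temp = y op z
            arg1, op, arg2 = tokens
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
        else:                              # copy statement x = y
            triples.append(("=", result, ref(tokens[0])))
    return triples


for location, triple in enumerate(to_triples(["t1 = c * d", "t2 = b + t1", "a = t2"])):
    print(f"({location})", *triple)
```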

Indirect Triples

The indirect triple representation uses an extra array to list the pointers to the triples in
the desired order.

For the statement a = b + c * d, the indirect triples will be

Statement list

(0)   (11)
(1)   (12)
(2)   (13)

Triples

Location   Operator   arg 1   arg 2

(11)       ∗          c       d
(12)       +          b       (11)
(13)       =          a       (12)

Here the statement list entries (0), (1), (2) refer to the pointers (11), (12), (13)
respectively, and the pointers (11), (12), (13) point to the triples. That is
why this representation is called the indirect triple representation.
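The extra level of indirection can be sketched with a dictionary of triples and a separate statement list. The names `triples`, `statements`, and `in_execution_order` are assumptions for illustration, and the locations 11, 12, 13 are the ones mentioned in the text above.

```python
# Triples stored at arbitrary locations; arguments refer to those locations.
triples = {
    11: ("*", "c", "d"),
    12: ("+", "b", "(11)"),
    13: ("=", "a", "(12)"),
}

# The extra array: position in the program -> location of the triple.
statements = [11, 12, 13]


def in_execution_order(statements, triples):
    """Follow each pointer in the statement list to its triple."""
    return [triples[location] for location in statements]


for index, triple in enumerate(in_execution_order(statements, triples)):
    print(f"({index}) -> ({statements[index]})", *triple)
```

Because the order of execution lives only in `statements`, statements can be moved or removed by editing that list, while the triples and the references between them stay unchanged.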

INTERMEDIATE CODE GENERATION


INTRODUCTION

The front end translates a source program into an intermediate representation from
which the back end generates target code.

Benefits of using a machine-independent intermediate form are:


1. Compiler for a different machine can be created by attaching a back end for
the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate
representation.

Fig. 3.1 Intermediate code generator


Intermediate code can be either language specific (e.g., bytecode for Java) or
language independent (three-address code).

The following are commonly used intermediate code representations:


1. Postfix Notation –
The ordinary (infix) way of writing the sum of a and b is with the operator in the
middle: a + b
The postfix notation for the same expression places the operator at the right end, as
ab+.
No parentheses are needed in postfix notation.
In postfix notation the operator follows its operands.
Example – The postfix representation of the expression (a – b) * (c + d) + (a – b)
is derived in two steps:
(ab-) * (cd+) + (ab-)
ab-cd+*ab-+
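The conversion can be automated with an operator stack. The sketch below uses the well-known shunting-yard method, which the notes do not name; it assumes single-character operands, the ASCII operators + - * /, and left-associative operators. The names `PRECEDENCE` and `to_postfix` are chosen for illustration.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expression):
    """Convert an infix expression with single-character operands to postfix."""
    output, stack = [], []
    for token in expression.replace(" ", ""):
        if token.isalnum():
            output.append(token)           # operands go straight to the output
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":        # unwind to the matching parenthesis
                output.append(stack.pop())
            stack.pop()                    # discard the "("
        else:
            # Pop operators of higher or equal precedence (left associativity).
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]):
                output.append(stack.pop())
            stack.append(token)
    while stack:
        output.append(stack.pop())
    return "".join(output)


print(to_postfix("(a - b) * (c + d) + (a - b)"))  # ab-cd+*ab-+
```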

2. Three-Address Code –
A statement involving no more than three references (two for operands and one for
the result) is known as a three-address statement.
Example – The three-address code for the expression a + b * c + d:
T1 = b * c
T2 = a + T1
T3 = T2 + d
T1, T2, T3 are temporary variables.

3. Syntax Tree –
The operator and keyword nodes of the parse tree are moved to their parents, and a
chain of single productions is replaced by a single link. In a syntax tree the internal
nodes are operators and the child nodes are operands. To form a syntax tree, put
parentheses in the expression; this way it is easy to recognize which operand should
come first.
Example –
x = (a + b * c) / (a – b * c)

1. Translation of Assignment Statements

In syntax directed translation, an assignment statement mainly deals with
expressions. The expressions can be of type real, integer, array, or record.

Consider the grammar

1. S → id := E
2. E → E1 + E2
3. E → E1 * E2
4. E → (E1)
5. E → id
For this grammar, SDT = production rule + semantic action.
The translation scheme of the above grammar is given below.

The Emit function is used to generate the three-address code.

The newtemp( ) function is used to generate the temporary variables.

Production rule      Semantic actions

S → id := E          {p = look_up(id.name);
                     if p ≠ nil then
                         Emit (p = E.place) //GEN()
                     else
                         error;
                     }

E → E1 + E2          {E.place = newtemp();
                     Emit (E.place = E1.place '+' E2.place)
                     }

E → E1 * E2          {E.place = newtemp();
                     Emit (E.place = E1.place '*' E2.place)
                     }

E → (E1)             {E.place = E1.place}

E → id               {p = look_up(id.name);
                     if p ≠ nil then
                         E.place = p
                     else
                         error;
                     }

Consider the example x = a + b * c. Assuming that x, a, b, and c are all found by
look_up and that * binds tighter than +, the scheme emits:

t1 = b * c
t2 = a + t1
x = t2
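The scheme can be tried out with a small recursive-descent translator. This is a sketch under several assumptions: the grammar in the notes is ambiguous, so the code gives * a higher precedence than + and makes both left-associative; the assignment operator is written = as in the example; `declared` stands in for the symbol table; and an undeclared name raises `NameError` in place of the error action. The function name `translate_assignment` is hypothetical.

```python
import re


def translate_assignment(source, declared):
    """Emit three-address code for 'id = expression' over + , * and parentheses."""
    tokens = re.findall(r"[A-Za-z_]\w*|[=+*()]", source)
    pos = 0
    temps = 0
    code = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def newtemp():
        nonlocal temps
        temps += 1
        return f"t{temps}"

    def look_up(name):
        if name not in declared:
            raise NameError(f"undeclared identifier: {name}")
        return name

    def factor():                          # E -> (E1) | id
        token = advance()
        if token == "(":
            place = expr()
            advance()                      # skip ")"
            return place
        return look_up(token)              # E.place = p

    def binary(operand, op):               # shared action for + and *
        place = operand()
        while peek() == op:
            advance()
            right = operand()
            temp = newtemp()               # E.place = newtemp()
            code.append(f"{temp} = {place} {op} {right}")
            place = temp
        return place

    def term():                            # E -> E1 * E2
        return binary(factor, "*")

    def expr():                            # E -> E1 + E2
        return binary(term, "+")

    target = look_up(advance())            # S -> id = E
    advance()                              # skip "="
    place = expr()
    code.append(f"{target} = {place}")
    return code


print("\n".join(translate_assignment("x = a + b * c", {"x", "a", "b", "c"})))
```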

2. Boolean expressions

Boolean expressions have two primary purposes.

● They are used for computing logical values, true or false.

● They are also used as conditional expressions in statements that alter the
flow of control, such as if-then-else or while-do.

Consider the grammar

1. E → E OR E
2. E → E AND E
3. E → NOT E
4. E → (E)
5. E → id relop id
6. E → TRUE
7. E → FALSE

The comparison operators <, <=, =, !=, >, and >= are represented by relop.

We also assume that OR and AND are left-associative. OR has the lowest precedence,
then AND, then NOT.

The action for E → id1 relop id2 uses next_state, which gives the index of the next
three-address statement in the output sequence.

Production rule        Semantic actions

E → E1 OR E2           {E.place = newtemp();
                       Emit (E.place = E1.place OR E2.place)
                       }

E → E1 AND E2          {E.place = newtemp();
                       Emit (E.place = E1.place AND E2.place)
                       }

E → NOT E1             {E.place = newtemp();
                       Emit (E.place = NOT E1.place)
                       }

E → (E1)               {E.place = E1.place}

E → id1 relop id2      {E.place = newtemp();
                       Emit (if id1.place relop id2.place goto next_state + 3);
                       Emit (E.place = 0)
                       Emit (goto next_state + 2)
                       Emit (E.place = 1)
                       }

E → TRUE               {E.place = newtemp();
                       Emit (E.place = 1)
                       }

E → FALSE              {E.place = newtemp();
                       Emit (E.place = 0)
                       }

Numerical Representation

Here, 1 denotes true and 0 denotes false.

For example:
1. The translation for a or b and not c is the three-address sequence

t1 := not c
t2 := b and t1
t3 := a or t2

2. A relational expression a < b is equivalent to the conditional statement
if a < b then 1 else 0, which can be translated into the three-address code sequence
below (statement numbers start at 100):
Example 1:
100 : if a < b goto 103
101 : t := 0
102 : goto 104
103 : t := 1
104 :
Example 2:
Translation of a < b or c < d and e < f
100 : if a < b goto 103
101 : t1 := 0
102 : goto 104
103 : t1 := 1
104 : if c < d goto 107
105 : t2 := 0
106 : goto 108
107 : t2 := 1
108 : if e < f goto 111
109 : t3 := 0
110 : goto 112
111 : t3 := 1
112 : t4 := t2 AND t3
113 : t5 := t1 OR t4
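The numbered sequences above follow directly from the semantic actions, with next_state read at the moment each statement is emitted. The sketch below reproduces Example 2; the nested-tuple input format and the function name `translate_boolean` are assumptions made for illustration, and the expression is assumed to be already parsed with AND binding tighter than OR.

```python
def translate_boolean(expr, start=100):
    """Numerical-representation translation of a Boolean expression tree."""
    code = []
    temp_count = 0

    def next_state():
        return start + len(code)           # index of the next statement

    def newtemp():
        nonlocal temp_count
        temp_count += 1
        return f"t{temp_count}"

    def emit(text):
        code.append(f"{next_state()} : {text}")

    def gen(node):
        if isinstance(node, str):          # plain Boolean variable
            return node
        op = node[0]
        if op in ("OR", "AND"):
            left, right = gen(node[1]), gen(node[2])
            place = newtemp()
            emit(f"{place} := {left} {op} {right}")
        elif op == "NOT":
            operand = gen(node[1])
            place = newtemp()
            emit(f"{place} := NOT {operand}")
        else:                              # (relop, id1, id2)
            relop, x, y = node
            place = newtemp()
            emit(f"if {x} {relop} {y} goto {next_state() + 3}")
            emit(f"{place} := 0")
            emit(f"goto {next_state() + 2}")
            emit(f"{place} := 1")
        return place

    gen(expr)
    return code


example_2 = ("OR", ("<", "a", "b"), ("AND", ("<", "c", "d"), ("<", "e", "f")))
print("\n".join(translate_boolean(example_2)))
```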

3. Procedure calls

The procedure is an important and frequently used programming construct, so a
compiler must generate good code for procedure calls and returns.

Calling sequence:

The translation for a call includes a sequence of actions taken on entry to and exit
from each procedure. The following actions take place in a calling sequence:

● When a procedure call occurs, space is allocated for the activation
record.

● The arguments of the called procedure are evaluated.

● The environment pointers are established to enable the called procedure to access
data in enclosing blocks.

● The state of the calling procedure is saved so that it can resume execution after
the call.

● The return address is also saved. It is the address of the location to which the
called routine must transfer control after it is finished.

● Finally, a jump to the beginning of the code for the called procedure is generated.

Syntax:

Procedure call/Return − A call to the procedure P(x1, x2, …, xn) with the
parameters x1, x2, …, xn is written as

param x1

param x2

…

param xn

call P, n

Here each param statement passes one parameter, and call P, n calls procedure P with n
arguments.

Let us consider a grammar for a simple procedure call statement

1. S → call id(Elist)
2. Elist → Elist, E
3. Elist → E

A suitable translation scheme for a procedure call would be:

Production Rule          Semantic Action

S → call id(Elist)       for each item p on QUEUE do
                             GEN (param p)
                         GEN (call id.PLACE)

Elist → Elist, E         append E.PLACE to the end of QUEUE

Elist → E                initialize QUEUE to contain only E.PLACE
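The three actions can be sketched as three small functions, one per production. The function names are hypothetical, a Python list plays the role of QUEUE, and the argument count after the procedure name follows the `call P, n` form shown in the syntax above rather than the shorter GEN (call id.PLACE).

```python
def elist_single(e_place):
    """Elist -> E: initialize QUEUE to contain only E.PLACE."""
    return [e_place]


def elist_append(queue, e_place):
    """Elist -> Elist , E: append E.PLACE to the end of QUEUE."""
    queue.append(e_place)
    return queue


def call_statement(proc_place, queue):
    """S -> call id(Elist): one param per queued argument, then the call."""
    code = [f"param {place}" for place in queue]
    code.append(f"call {proc_place}, {len(queue)}")
    return code


# Reductions for: call P(x1, x2, x3)
queue = elist_single("x1")
queue = elist_append(queue, "x2")
queue = elist_append(queue, "x3")
print("\n".join(call_statement("P", queue)))
```

Collecting the arguments in a queue first keeps all param statements together, directly before the call, even if evaluating an argument emits code of its own.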

4. Switch Statement

● A switch statement in C tests the value of a variable and compares it with
multiple cases.

● Once a matching case is found, the block of statements associated with that
particular case is executed.

● Each case in a switch block has a different name or number, which
identifies the case.

● The value provided by the user is compared with all the cases inside the
switch block until a match is found.

● If no matching case is found, then the default statement is executed, and
control goes out of the switch block.

Switch Case

switch E
begin
case V1: S1
case V2: S2
…
case Vn-1: Sn-1
default: Sn
end

Three Address code for switch

Code to evaluate E into T
goto TEST
L1: code for S1
goto NEXT
L2: code for S2
goto NEXT
…
Ln-1: code for Sn-1
goto NEXT
Ln: code for Sn
goto NEXT
TEST: if T = V1 goto L1
if T = V2 goto L2
…
if T = Vn-1 goto Ln-1
goto Ln
NEXT:
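The layout above can be generated by a short routine that first emits the case bodies and then the sequence of tests. This is only a sketch: the function name `translate_switch` and its parameters are assumptions, the case bodies are passed in as already translated lists of statements, and the code that evaluates E into T is assumed to have been emitted before.

```python
def translate_switch(selector, cases, default_body):
    """Lay out three-address code for a switch statement.

    selector     : the temporary T that holds the value of E
    cases        : list of (value, body) pairs, body being a list of statements
    default_body : list of statements for the default case
    """
    code = ["goto TEST"]
    tests = []
    for index, (value, body) in enumerate(cases, start=1):
        label = f"L{index}"
        code.append(f"{label}:")
        code.extend(body)
        code.append("goto NEXT")
        tests.append(f"if {selector} = {value} goto {label}")

    default_label = f"L{len(cases) + 1}"
    code.append(f"{default_label}:")
    code.extend(default_body)
    code.append("goto NEXT")

    code.append("TEST:")
    code.extend(tests)
    code.append(f"goto {default_label}")   # no case matched
    code.append("NEXT:")
    return code


listing = translate_switch("T", [(1, ["x = 10"]), (2, ["x = 20"])], ["x = 0"])
print("\n".join(listing))
```

Placing the tests after the bodies means the generator can emit each body as soon as it is parsed and only needs to remember the (value, label) pairs until the end.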
