
CD Merged

Compiler merged notes

Uploaded by

sirireddy0606

LR Parsing

• The most prevalent type of bottom-up parser
• LR(k): we are mostly interested in parsers with k<=1
• Why LR parsers?
• Table driven
• Can be constructed to recognize virtually all programming language constructs for which a context-free grammar can be written
• Most general non-backtracking shift-reduce parsing method
• Can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input
• The class of grammars for which we can construct LR parsers is a proper superset of
those for which we can construct LL parsers
States of an LR parser
• States represent sets of items
• An LR(0) item of G is a production of G with the dot at some position
of the body:
• For A->XYZ we have following items
• A->.XYZ
• A->X.YZ
• A->XY.Z
• A->XYZ.
• In a state having A->.XYZ we hope to see a string derivable from XYZ next on
the input.
• What about A->X.YZ?
Constructing canonical LR(0) item sets
• Augmented grammar:
• G with the addition of a new start symbol S’ and the production S’->S
• Closure of item sets:
• If I is a set of items, closure(I) is the set of items constructed from I by the
following rules:
• Add every item in I to closure(I)
• If A->α.Bβ is in closure(I) and B->γ is a production, then add the item
B->.γ to closure(I); repeat until no more items can be added
• Example:
Grammar:
E’->E
E -> E + T | T
T -> T * F | F
F -> (E) | id
I0=closure({[E’->.E]}):
E’->.E
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id
Constructing canonical LR(0) item sets (cont.)
• Goto(I,X), where I is an item set and X is a grammar
symbol, is the closure of the set of all items [A-> αX. β] such that
[A-> α.X β] is in I
• Example: transitions out of I0=closure({[E’->.E]})
I1 = Goto(I0,E):
E’->E.
E->E.+T
I2 = Goto(I0,T):
E->T.
T->T.*F
I4 = Goto(I0,( ):
F->(.E)
E->.E+T
E->.T
T->.T*F
T->.F
F->.(E)
F->.id
Canonical LR(0) items
void items(G’) {
    C = CLOSURE({[S’->.S]});
    repeat
        for (each set of items I in C)
            for (each grammar symbol X)
                if (GOTO(I,X) is not empty and not in C)
                    add GOTO(I,X) to C;
    until no new sets of items are added to C in a round;
}
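The closure, Goto and items procedures above can be sketched in Python for the expression grammar used in the example. This is a minimal sketch, not a definitive implementation: the item encoding as a (production index, dot position) pair and the names GRAMMAR, closure, goto and canonical_collection are assumptions made for illustration.

```python
# Augmented expression grammar from the example; each entry is (head, body).
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = sorted({s for _, body in GRAMMAR for s in body})


def closure(items):
    """Closure of a set of items; an item is (production index, dot position)."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        # Dot in front of a nonterminal B: add B -> .gamma for every B-production.
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    """Worklist version of the items(G') procedure."""
    start = closure({(0, 0)})
    collection, seen = [start], {start}
    i = 0
    while i < len(collection):
        for x in SYMBOLS:
            target = goto(collection[i], x)
            if target and target not in seen:
                seen.add(target)
                collection.append(target)
        i += 1
    return collection
```

Calling canonical_collection() yields the item sets in discovery order, so the first element is I0; the numbering of the remaining sets depends on the order in which grammar symbols are tried and need not match the slides.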
LR-Parsing model

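[Figure: model of an LR parser. The input buffer a1 … ai … an $ feeds the LR parsing program, which keeps a stack of states (sm on top, then sm-1, …, with $ at the bottom), consults a parsing table with ACTION and GOTO parts, and produces the output.]

[Figures: LR(0) canonical collection, LR(0) parsing table, LR(0) parsing; SLR canonical collection, SLR parsing table, SLR parsing; CLR canonical collection, CLR parsing table.]

LALR Parsing Table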
With LALR (lookahead LR) parsing, we attempt to reduce the number
of states in an LR(1) parser by merging states that have the same core,
i.e. the same LR(0) items, differing only in their lookaheads. This reduces the
number of states to the same as SLR(1), but still retains some of the
power of the LR(1) lookaheads.
LALR Parsing
Syntax-Directed Translation
Module 3: Syntax Directed Definition – Evaluation Order -
Applications of Syntax Directed Translation - Syntax Directed
Translation Schemes - Implementation of L-attributed Syntax Directed
Definition.

Syntax-Directed Translation

1. We associate information with programming language constructs by attaching attributes to grammar symbols.

2. Values of these attributes are evaluated by the semantic rules associated with the
production rules.

3. Evaluation of these semantic rules:
– may generate intermediate code
– may put information into the symbol table
– may perform type checking
– may issue error messages
– may perform some other activities
– in fact, may perform almost any activity.

4. An attribute may hold almost anything:
– a string, a number, a memory location, a complex record.
Syntax-Directed Definitions and Translation Schemes
1. When we associate semantic rules with productions, we use two
notations:
– Syntax-Directed Definitions
– Translation Schemes

A. Syntax-Directed Definitions:
– give high-level specifications for translations
– hide many implementation details, such as the order of evaluation of semantic actions.
– We associate a production rule with a set of semantic actions, and we do not say when they
will be evaluated.

B. Translation Schemes:
– indicate the order of evaluation of the semantic actions associated with a production rule.
– In other words, translation schemes expose some implementation details.
Syntax-Directed Translation
• Conceptually, with both syntax-directed definitions and translation
schemes we
– Parse the input token stream
– Build the parse tree
– Traverse the tree to evaluate the semantic rules at the parse tree nodes.

Input string → parse tree → dependency graph → evaluation order for semantic rules

Conceptual view of syntax directed translation
Syntax-Directed Definitions
1. A syntax-directed definition is a generalization of a context-free
grammar in which:
– Each grammar symbol is associated with a set of attributes.
– This set of attributes for a grammar symbol is partitioned into two subsets called
• synthesized and
• inherited attributes of that grammar symbol.
– Each production rule is associated with a set of semantic rules.

2. The value of an attribute at a parse tree node is defined by a semantic rule
associated with the production used at that node.
3. The value of a synthesized attribute at a node is computed from the values of
attributes at the children of that node in the parse tree.
4. The value of an inherited attribute at a node is computed from the values of
attributes at the siblings and parent of that node in the parse tree.
Syntax-Directed Definitions
Examples:
Synthesized attribute: E → E1 + E2 { E.val = E1.val + E2.val }
Inherited attribute: A → X Y Z { Y.val = 2 * A.val }

1. Semantic rules set up dependencies between attributes, which can be represented by a dependency graph.

2. This dependency graph determines the evaluation order of these semantic rules.

3. Evaluation of a semantic rule defines the value of an attribute. But a semantic rule may also have side effects, such as printing a value.
Annotated Parse Tree
1. A parse tree showing the values of attributes at each node is called
an annotated parse tree.
2. Values of attributes at the nodes of an annotated parse tree are either
– initialized to constant values or supplied by the lexical analyzer, or
– determined by the semantic rules.

3. The process of computing the attribute values at the nodes is called annotating (or decorating) the parse tree.

4. The order of these computations depends on the dependency graph induced by the semantic rules.
Syntax-Directed Definition
In a syntax-directed definition, each production A→α is associated
with a set of semantic rules of the form:
b=f(c1,c2,…,cn)
where f is a function and one of the following holds:

• b is a synthesized attribute of A and c1,c2,…,cn are attributes of the grammar symbols in the production ( A→α ).
OR
• b is an inherited attribute of one of the grammar symbols in α (on the
right side of the production), and c1,c2,…,cn are attributes of the
grammar symbols in the production ( A→α ).
Attribute Grammar
• So, a semantic rule b=f(c1,c2,…,cn) indicates that the attribute b
depends on attributes c1,c2,…,cn.

• In a syntax-directed definition, a semantic rule may just evaluate the value of an attribute, or it may have side effects such as
printing values.

• An attribute grammar is a syntax-directed definition in which the functions in the semantic rules cannot have side effects (they can only
evaluate values of attributes).
Syntax-Directed Definition -- Example

Production        Semantic Rules
L → E n           print(E.val)
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval

(n denotes the end-of-line token.)
1. Symbols E, T, and F are associated with a synthesized attribute val.
2. The token digit has a synthesized attribute lexval (it is assumed that it is evaluated by
the lexical analyzer).
3. Terminals are assumed to have synthesized attributes only. Values for attributes of
terminals are usually supplied by the lexical analyzer.
4. The start symbol does not have any inherited attribute unless otherwise stated.
S-attributed definition
• A syntax directed definition that uses synthesized attributes exclusively
is said to be an S-attributed definition.

• A parse tree for an S-attributed definition can be annotated by evaluating the semantic rules for the attributes at each node, bottom up from the leaves
to the root.
Annotated Parse Tree -- Example
Input: 5+3*4 n

L
  E.val=17
    E.val=5
      T.val=5
        F.val=5
          digit.lexval=5
    +
    T.val=12
      T.val=3
        F.val=3
          digit.lexval=3
      *
      F.val=4
        digit.lexval=4
  n
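Annotating the parse tree for 5+3*4 bottom up can be sketched as a post-order walk. This is only an illustrative sketch: the tuple encoding of parse-tree nodes and the helper names val and leaf are assumptions, not part of the notes.

```python
def val(node):
    """Compute the synthesized attribute val of a parse-tree node, bottom up."""
    label, *kids = node
    if label == "digit":
        return kids[0]                    # digit.lexval, supplied by the lexer
    if len(kids) == 1:
        return val(kids[0])               # E -> T, T -> F, F -> digit
    first, middle, last = kids
    if first == "(":
        return val(middle)                # F -> ( E )
    if middle == "+":
        return val(first) + val(last)     # E -> E1 + T
    if middle == "*":
        return val(first) * val(last)     # T -> T1 * F
    raise ValueError(f"unexpected node {node!r}")


def leaf(n):
    """Hypothetical helper: the subtree F -> digit with lexval n."""
    return ("F", ("digit", n))


# Parse tree of the E node for the input 5+3*4.
tree = ("E",
        ("E", ("T", leaf(5))),
        "+",
        ("T", ("T", leaf(3)), "*", leaf(4)))

print(val(tree))   # the value L -> E n would print
```

Each recursive call returns only after the values of all children are known, which is exactly the leaves-to-root order an S-attributed definition needs.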
Dependency Graph
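Input: 5+3*4 n

[Figure: the annotated parse tree for 5+3*4 n with the dependency edges added. Each edge runs from the val (or lexval) of a child to the val of its parent, e.g. from digit.lexval=4 to F.val=4, and from E.val=5 and T.val=12 to E.val=17.]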
Inherited attributes
• An inherited attribute at a node in a parse tree is defined in terms of
attributes at the parent and/or siblings of the node.

• Inherited attributes are a convenient way to express the dependency of a programming language construct on the context in which it appears.

• We can use inherited attributes to keep track of whether an identifier appears on the left or right side of an assignment, in order to decide whether the
address or the value of the identifier is needed.

• Example: The inherited attribute distributes type information to the various identifiers in a declaration.
Syntax-Directed Definition – Inherited Attributes

Production        Semantic Rules
D → T L           L.in = T.type
T → int           T.type = integer
T → real          T.type = real
L → L1 , id       L1.in = L.in, addtype(id.entry,L.in)
L → id            addtype(id.entry,L.in)

1. Symbol T is associated with a synthesized attribute type.

2. Symbol L is associated with an inherited attribute in.
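The way L.in carries the declared type down to every identifier can be sketched as a recursive walk over the L subtree. This is a minimal sketch under stated assumptions: the tuple encoding of L nodes, the dictionary used as symbol table, and the function names eval_D and eval_L are all hypothetical.

```python
def addtype(symbol_table, entry, type_name):
    """Record the type of an identifier (stand-in for addtype(id.entry, L.in))."""
    symbol_table[entry] = type_name


def eval_L(node, inherited, symbol_table):
    """node is ("L", id) for L -> id, or ("L", L1, id) for L -> L1 , id."""
    if len(node) == 3:
        _, l1, ident = node
        eval_L(l1, inherited, symbol_table)       # L1.in = L.in
        addtype(symbol_table, ident, inherited)   # addtype(id.entry, L.in)
    else:
        _, ident = node
        addtype(symbol_table, ident, inherited)   # addtype(id.entry, L.in)


def eval_D(type_keyword, l_node):
    """D -> T L: compute T.type, then pass it to L as L.in."""
    t_type = {"int": "integer", "real": "real"}[type_keyword]   # T.type
    symbol_table = {}
    eval_L(l_node, t_type, symbol_table)                        # L.in = T.type
    return symbol_table


# Declaration: real p, q, r
print(eval_D("real", ("L", ("L", ("L", "p"), "q"), "r")))
```

The inherited attribute appears here as a function parameter: it is known before the subtree is entered, which is what distinguishes it from a synthesized attribute returned from the subtree.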
Annotated parse tree
Input: real p,q,r (id1 = p, id2 = q, id3 = r)

Parse tree:
D
  T
    real
  L
    L
      L
        id1
      ,
      id2
    ,
    id3

Annotated parse tree:
D
  T.type=real
    real
  L.in=real
    L.in=real
      L.in=real
        id1
      ,
      id2
    ,
    id3
Dependency Graph
• Directed Graph
• Shows interdependencies between attributes.
• If an attribute b at a node depends on an attribute c, then the semantic rule for b at that
node must be evaluated after the semantic rule that defines c.
• Construction:
– Put each semantic rule into the form b=f(c1,…,ck) by introducing a dummy
synthesized attribute b for every semantic rule that consists of a procedure call.
– E.g.,
• L → E n print(E.val)
• Becomes: dummy = print(E.val)

– The graph has a node for each attribute, and an edge to the node for b from the
node for c if attribute b depends on attribute c.
Dependency Graph Construction
for each node n in the parse tree do
    for each attribute a of the grammar symbol at n do
        construct a node in the dependency graph for a

for each node n in the parse tree do
    for each semantic rule b = f(c1,…,ck)
            associated with the production used at n do
        for i = 1 to k do
            construct an edge from
            the node for ci to the node for b
Dependency Graph Construction
• Example
• Production Semantic Rule
E→E1 + E2 E.val = E1.val + E2.val

[Figure: dependency graph with edges from E1.val and from E2.val to E.val, drawn over the parse tree for E → E1 + E2.]

• E.val is synthesized from E1.val and E2.val
• The dotted lines represent the parse tree, which is not part of the
dependency graph.
Dependency Graph
Production        Semantic Rules
D → T L           L.in = T.type
T → int           T.type = integer
T → real          T.type = real
L → L1 , id       L1.in = L.in,
                  addtype(id.entry,L.in)
L → id            addtype(id.entry,L.in)
Evaluation Order
• A topological sort of a directed acyclic graph is any ordering
m1,m2,…,mk of the nodes of the graph such that edges go from nodes
earlier in the ordering to later nodes.
• i.e. if there is an edge from mi to mj, then mi appears before mj in the ordering.
• Any topological sort of the dependency graph gives a valid order for
evaluating the semantic rules associated with the nodes of the parse tree.
• The attributes c1,c2,…,ck in b=f(c1,c2,…,ck) must be available before f
is evaluated.

• Translation specified by a syntax-directed definition:

Input string → parse tree → dependency graph → evaluation order for semantic rules
Evaluation Order

• a4=real;
• a5=a4;
• addtype(id3.entry,a5);
• a7=a5;
• addtype(id2.entry,a7);
• a9=a7;
• addtype(id1.entry,a9);
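An evaluation order like the one listed here can be derived mechanically with a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9 or later). The attribute names a4, a5, a7 and a9 follow the list above; the node names addtype1..addtype3 and id1.entry..id3.entry are assumed labels for the dummy attributes of the procedure calls and for the symbol-table entries.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set (edges run from dependency to key).
deps = {
    "a5": {"a4"},                      # L.in = T.type
    "addtype3": {"a5", "id3.entry"},   # addtype(id3.entry, a5)
    "a7": {"a5"},                      # L1.in = L.in
    "addtype2": {"a7", "id2.entry"},   # addtype(id2.entry, a7)
    "a9": {"a7"},                      # L1.in = L.in
    "addtype1": {"a9", "id1.entry"},   # addtype(id1.entry, a9)
}

# static_order() yields every node after all of the nodes it depends on.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Several orders are valid for this graph, so the printed sequence need not match the list above exactly; any of them evaluates a4 before a5, a5 before a7, and a7 before a9. If the dependency graph contained a cycle, static_order() would raise graphlib.CycleError, which corresponds to a definition whose rules cannot be ordered.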

Evaluating Semantic Rules
• Parse Tree methods
– At compile time, the evaluation order is obtained from a topological sort of the dependency
graph constructed from the parse tree of each input.
– Fails if the dependency graph has a cycle
• Rule Based Methods
– Semantic rules are analyzed by hand or by specialized tools at compiler construction
time
– The order of evaluation of the attributes associated with a production is predetermined at
compiler construction time
• Oblivious Methods
– The evaluation order is chosen without considering the semantic rules.
– This restricts the class of syntax directed definitions that can be implemented.
– If translation takes place during parsing, the order of evaluation is forced by the parsing
method.
Syntax Trees

Syntax-Tree
– an intermediate representation of the compiler’s input.
– A condensed form of the parse tree.
– Syntax tree shows the syntactic structure of the program while
omitting irrelevant details.
– Operators and keywords are associated with the interior nodes.
– Chains of simple productions are collapsed.
Syntax directed translation can be based on syntax trees as well as
parse trees.
Syntax Tree-Examples
Expression: 5 + 3 * 4
+
  5
  *
    3
    4
• Leaves: identifiers or constants
• Internal nodes: labelled with operations
• Children of a node are its operands

Statement: if B then S1 else S2
if-then-else
  B
  S1
  S2
• A node’s label indicates what kind of statement it is
• Children of a node correspond to the components of the statement
Constructing Syntax Tree for Expressions
• Each node can be implemented as a record with several fields.
• Operator node: one field identifies the operator (called label of the node) and
remaining fields contain pointers to operands.
• The nodes may also contain fields to hold the values (pointers to values) of
attributes attached to the nodes.

• Functions used to create nodes of syntax tree for expressions with binary
operator are given below.
– mknode(op,left,right)
– mkleaf(id,entry)
– mkleaf(num,val)

Each function returns a pointer to a newly created node.


Constructing Syntax Tree for Expressions-

Example: a-4+c
1. p1:=mkleaf(id,entrya);
2. p2:=mkleaf(num,4);
3. p3:=mknode(-,p1,p2);
4. p4:=mkleaf(id,entryc);
5. p5:=mknode(+,p3,p4);

Resulting syntax tree:
+
  -
    id (to entry for a)
    num 4
  id (to entry for c)

• The tree is constructed bottom up.
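The five construction steps for a-4+c can be tried out with a small record type. This is a sketch under assumptions: the Node dataclass, its field names, the dictionary standing in for the symbol table, and the show helper are illustrative choices, not the notes' own data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                       # operator, "id" or "num"
    left: Optional["Node"] = None    # pointers to operands (operator nodes only)
    right: Optional["Node"] = None
    value: object = None             # symbol-table entry for id, number for num


def mknode(op, left, right):
    """Create an operator node with label op and two operand pointers."""
    return Node(op, left, right)


def mkleaf(label, value):
    """Create a leaf: mkleaf("id", entry) or mkleaf("num", val)."""
    return Node(label, value=value)


# Assumed stand-in for symbol-table entries.
symtab = {"a": {"name": "a"}, "c": {"name": "c"}}

p1 = mkleaf("id", symtab["a"])
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", symtab["c"])
p5 = mknode("+", p3, p4)


def show(n):
    """Render a tree as a fully parenthesized expression."""
    if n.left is None:
        return n.value["name"] if n.label == "id" else str(n.value)
    return f"({show(n.left)} {n.label} {show(n.right)})"


print(show(p5))
```

Because every call returns a reference to the node it created, the operands of an operator always exist before the operator node is built, which is why the tree grows bottom up.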
A syntax Directed Definition for Constructing
Syntax Tree
1. It uses underlying productions of the grammar to schedule the calls of
the functions mkleaf and mknode to construct the syntax tree
2. Employment of the synthesized attribute nptr (pointer) for E and T to
keep track of the pointers returned by the function calls.
PRODUCTION        SEMANTIC RULE
E → E1 + T        E.nptr = mknode(“+”, E1.nptr, T.nptr)
E → E1 - T        E.nptr = mknode(“-”, E1.nptr, T.nptr)
E → T             E.nptr = T.nptr
T → ( E )         T.nptr = E.nptr
T → id            T.nptr = mkleaf(id, id.entry)
T → num           T.nptr = mkleaf(num, num.val)
Annotated parse tree depicting construction of
syntax tree for the expression a-4+c
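[Figure: annotated parse tree for a-4+c. The nptr attribute of each E and T node points to the syntax-tree node built for it: the root E.nptr points to the + node, whose children are the - node (with leaves id, pointing to the entry for a, and num 4) and the leaf id pointing to the entry for c.]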
S-Attributed Definitions
1. Syntax-directed definitions are used to specify syntax-directed translations.

2. Creating a translator for an arbitrary syntax-directed definition can be difficult.

3. We would like to evaluate the semantic rules during parsing (i.e. in a single pass, parsing
and evaluating the semantic rules at the same time).

4. We will look at two sub-classes of syntax-directed definitions:
– S-Attributed Definitions: only synthesized attributes are used in the syntax-directed
definition.
– All actions occur at the right end of the production body.
– L-Attributed Definitions: in addition to synthesized attributes, we may also use inherited
attributes in a restricted fashion.

5. S-Attributed and L-Attributed Definitions can be implemented by evaluating the semantic rules in a single pass during parsing.

6. Implementations of S-Attributed Definitions are a little easier than implementations of L-Attributed Definitions.
Bottom-Up Evaluation of S-Attributed Definitions
• A translator for an S-attributed definition can often be implemented with the
help of an LR parser.
• From an S-attributed definition, the parser generator can construct a translator
that evaluates attributes as it parses the input.
• We put the values of the synthesized attributes of the grammar symbols on a stack
that has extra fields to hold the values of attributes.
– The stack is implemented by a pair of arrays, state and val
– If the ith state symbol is A, then val[i] holds the value of the attribute
associated with the parse tree node corresponding to this A.
Bottom-Up Evaluation of S-Attributed Definitions
• We evaluate the values of the attributes during reductions.
A → XYZ A.a=f(X.x,Y.y,Z.z) where all attributes are synthesized.

Before reduction:          After reduction:
state  val                 state  val
Z      Z.z   ← top         A      A.a   ← top
Y      Y.y                 .      .
X      X.x
.      .

• Synthesized attributes are evaluated just before each reduction.
• Before XYZ is reduced to A, the value of Z.z is in val[top], that of Y.y in val[top-1]
and that of X.x in val[top-2].
• After the reduction, top is decremented by 2 and A.a is stored in val[top].
• If a symbol has no attribute, the corresponding entry in the array is undefined.
Bottom-Up Evaluation of S-Attributed Definitions
Production        Semantic Rules
L → E n           print(val[top-1])
E → E1 + T        val[ntop] = val[top-2] + val[top]
E → T
T → T1 * F        val[ntop] = val[top-2] * val[top]
T → F
F → ( E )         val[ntop] = val[top-1]
F → digit

Here top is the top of the stack before the reduction and ntop is the top after it (ntop = top - r + 1 for a production body of r symbols).

1. At each shift of digit, we also push digit.lexval onto the val-stack.

2. At all other shifts, we do not put anything onto the val-stack because other terminals do
not have attributes (but we increment the stack pointer for the val-stack).
Bottom-Up Evaluation -- Example
• At each shift of digit, we also push digit.lexval onto the val-stack.
• In the val column, _ marks an entry with no attribute value.

Input      state      val          production used
5+3*4n     -          -
+3*4n      5          5
+3*4n      F          5            F → digit
+3*4n      T          5            T → F
+3*4n      E          5            E → T
3*4n       E+         5 _
*4n        E+3        5 _ 3
*4n        E+F        5 _ 3        F → digit
*4n        E+T        5 _ 3        T → F
4n         E+T*       5 _ 3 _
n          E+T*4      5 _ 3 _ 4
n          E+T*F      5 _ 3 _ 4    F → digit
n          E+T        5 _ 12       T → T1 * F
n          E          17           E → E1 + T
           En         17 _
           L          17           L → E n
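The trace above can be replayed with two parallel stacks. This is a sketch under a strong assumption: the sequence of shift and reduce moves is written out by hand rather than produced by an LR parsing table, and the names RULES, run and MOVES are hypothetical.

```python
def print_result(v):
    print(v[0])          # L -> E n : print(val[top-1])
    return v[0]


# production -> (head, body length, rule applied to the popped values, leftmost first)
RULES = {
    "L->En":    ("L", 2, print_result),
    "E->E+T":   ("E", 3, lambda v: v[0] + v[2]),
    "E->T":     ("E", 1, lambda v: v[0]),
    "T->T*F":   ("T", 3, lambda v: v[0] * v[2]),
    "T->F":     ("T", 1, lambda v: v[0]),
    "F->(E)":   ("F", 3, lambda v: v[1]),
    "F->digit": ("F", 1, lambda v: v[0]),
}


def run(moves):
    """Replay parser moves, keeping the symbol stack and the val stack in step."""
    symbols, val = [], []
    for move in moves:
        if move[0] == "shift":
            _, token, lexval = move
            symbols.append(token)
            val.append(lexval)             # None for tokens without attributes
        else:
            head, n, rule = RULES[move[1]]
            args = val[len(val) - n:]      # val[top-n+1] .. val[top]
            del symbols[len(symbols) - n:]
            del val[len(val) - n:]
            symbols.append(head)
            val.append(rule(args))         # stored at val[ntop]
    return symbols, val


# Moves for the input 5+3*4n, in the order shown in the trace.
MOVES = [
    ("shift", "digit", 5), ("reduce", "F->digit"), ("reduce", "T->F"),
    ("reduce", "E->T"), ("shift", "+", None),
    ("shift", "digit", 3), ("reduce", "F->digit"), ("reduce", "T->F"),
    ("shift", "*", None),
    ("shift", "digit", 4), ("reduce", "F->digit"),
    ("reduce", "T->T*F"), ("reduce", "E->E+T"),
    ("shift", "n", None), ("reduce", "L->En"),
]

final_symbols, final_val = run(MOVES)
```

In a real translator the moves come from the ACTION and GOTO tables; only the handling of the val stack is the point of this sketch.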
L-Attributed Definitions
• When translation takes place during parsing, order of evaluation is linked to the order in which
the nodes of a parse tree are created by parsing method.
• A natural order can be obtained by applying the procedure dfvisit to the root of a parse tree.
• We call this evaluation order depth first order.
• L-attributed definition is a class of syntax directed definition whose attributes can always be
evaluated in depth first order( L stands for left since attribute information flows from left to
right).

dfvisit(node n)
{
    for each child m of n, from left to right
    {
        evaluate inherited attributes of m
        dfvisit(m)
    }
    evaluate synthesized attributes of n
}
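The order in which dfvisit touches attributes can be made visible by logging each step. In this sketch the Node class and the log of strings are assumptions for illustration; a real evaluator would run the semantic rules at the points where the log entries are appended.

```python
class Node:
    def __init__(self, name, *children):
        self.name = name
        self.children = children


def dfvisit(n, log):
    """Depth-first evaluation order for an L-attributed definition."""
    for m in n.children:                      # children from left to right
        log.append(f"inherited({m.name})")    # evaluate inherited attributes of m
        dfvisit(m, log)
    log.append(f"synthesized({n.name})")      # evaluate synthesized attributes of n


# Parse tree for a production A -> X Y
log = []
dfvisit(Node("A", Node("X"), Node("Y")), log)
print(log)
```

The inherited attributes of Y are evaluated after X has been fully visited, so they may use anything computed for X, but nothing from symbols to the right of Y.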
L-Attributed Definitions
A syntax-directed definition is L-attributed if each inherited attribute of Xj,
where 1≤j≤n, on the right side of A → X1X2...Xn depends only on
1. The attributes of the symbols X1,...,Xj-1 to the left of Xj in the
production
2. The inherited attribute of A

Every S-attributed definition is L-attributed, since the restrictions apply only to the inherited attributes (not to synthesized attributes).
A Definition which is not L-Attributed
Productions       Semantic Rules
A → L M           L.in = l(A.in)
                  M.in = m(L.s)
                  A.s = f(M.s)

A → Q R           R.in = r(A.in)
                  Q.in = q(R.s)
                  A.s = f(Q.s)

This syntax-directed definition is not L-attributed because the semantic rule Q.in=q(R.s)
violates the restrictions of L-attributed definitions.
• Q.in must be evaluated before we enter Q because it is an inherited attribute.
• But the value of Q.in depends on R.s, which becomes available only after we return from R. So
we are not able to evaluate Q.in before we enter Q.
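The two restrictions can be checked mechanically for a single production. The sketch below is an illustration under assumptions: attributes named "in" are taken to be inherited and all others synthesized, the symbols in a production body are assumed to be distinct, and the function name is_l_attributed and the rule encoding are hypothetical.

```python
INHERITED = {"in"}   # assumption: only attributes named "in" are inherited


def is_l_attributed(head, body, rules):
    """rules: list of ((symbol, attr), [(symbol, attr), ...]) meaning target <- dependencies."""
    for (target_sym, target_attr), deps in rules:
        # The restriction applies only to inherited attributes of body symbols.
        if target_attr not in INHERITED or target_sym == head:
            continue
        j = body.index(target_sym)
        for dep_sym, dep_attr in deps:
            from_left = dep_sym in body[:j]                        # X1 .. Xj-1
            from_head = dep_sym == head and dep_attr in INHERITED  # inherited attribute of A
            if not (from_left or from_head):
                return False
    return True


rules_LM = [
    (("L", "in"), [("A", "in")]),
    (("M", "in"), [("L", "s")]),
    (("A", "s"), [("M", "s")]),
]
rules_QR = [
    (("R", "in"), [("A", "in")]),
    (("Q", "in"), [("R", "s")]),   # depends on a symbol to the right of Q
    (("A", "s"), [("Q", "s")]),
]

print(is_l_attributed("A", ["L", "M"], rules_LM))
print(is_l_attributed("A", ["Q", "R"], rules_QR))
```

A whole definition is L-attributed only if every one of its productions passes this check.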
Translation Schemes
• In a syntax-directed definition, we do not say anything about the evaluation times of the
semantic rules (when the semantic rules associated with a production should be
evaluated).
• Translation schemes describe the order and timing of attribute computation.
• A translation scheme is a context-free grammar in which:
– attributes are associated with the grammar symbols and
– semantic actions enclosed between braces {} are inserted within the right sides of
productions.
• Each semantic action can only use the information computed by already executed semantic
actions.
• Ex: A → { ... } X { ... } Y { ... }, where each { ... } is a semantic action
Translation Schemes for S-attributed Definitions
• useful notation for specifying translation during parsing.
• Can have both synthesized and inherited attributes.
• If our syntax-directed definition is S-attributed, the construction of the corresponding
translation scheme will be simple.
• Each associated semantic rule in a S-attributed syntax-directed definition will be inserted
as a semantic action into the end of the right side of the associated production.

A production of a syntax directed definition:
E → E1 + T        E.val = E1.val + T.val

The production of the corresponding translation scheme:
E → E1 + T { E.val = E1.val + T.val }
A Translation Scheme Example
• A simple translation scheme that converts infix expressions to the
corresponding postfix expressions.
E → T R
R → + T { print(“+”) } R1
R → ε
T → id { print(id.name) }

Infix expression: a+b+c
Postfix expression: ab+c+
A Translation Scheme Example (cont.)

Parse tree for a+b+c with the semantic actions attached:
E
  T
    id {print(“a”)}
  R
    +
    T
      id {print(“b”)}
    {print(“+”)}
    R
      +
      T
        id {print(“c”)}
      {print(“+”)}
      R
        ε
The depth first traversal of the parse tree (executing the semantic actions in that order)
will produce the postfix representation of the infix expression.
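A predictive (recursive-descent) parser executes the embedded actions in exactly this depth-first order, so the translation scheme can be sketched with one function per non-terminal. The function name translate, the list-of-characters token format and the use of an output list instead of print are assumptions made for illustration.

```python
def translate(tokens):
    """Translate infix to postfix following E -> T R, R -> + T {print('+')} R | ε, T -> id {print(id.name)}."""
    out = []
    pos = 0

    def T():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isalnum():
            raise SyntaxError(f"identifier expected at position {pos}")
        out.append(tokens[pos])      # { print(id.name) }
        pos += 1

    def R():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            T()
            out.append("+")          # { print("+") }, placed between T and R1
            R()
        # otherwise R -> ε: nothing to do

    def E():
        T()
        R()

    E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return "".join(out)


print(translate(list("a+b+c")))
```

The position of out.append("+") inside R mirrors the position of the action in the production: after T has been parsed and before the recursive call for R1.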
Inherited Attributes in Translation Schemes

• If a translation scheme has to contain both synthesized and inherited attributes, we have
to observe the following rules to ensure that the attribute value is available when an
action refers to it.
1. An inherited attribute of a symbol on the right side of a production must be
computed in a semantic action before that symbol.
2. A semantic action must not refer to a synthesized attribute of a symbol to the right
of that semantic action.
3. A synthesized attribute for the non-terminal on the left can only be computed after
all attributes it references have been computed (we normally put this semantic action at
the end of the right side of the production).
• With an L-attributed syntax-directed definition, it is always possible to construct a
corresponding translation scheme which satisfies these three conditions (this may not
be possible for a general syntax-directed definition).
Inherited Attributes in Translation Schemes: Example

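S → A1 A2 {A1.in=1; A2.in=2}
A → a { print(A.in) }

Parse tree with the semantic actions attached:
S
  A1
    a {print(A.in)}
  A2
    a {print(A.in)}
  {A1.in=1; A2.in=2}

This scheme violates rule 1: in a depth first traversal, both print(A.in) actions run before A1.in and A2.in are assigned, so A.in is undefined when it is printed. Placing each assignment before its symbol fixes this: S → {A1.in=1} A1 {A2.in=2} A2.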
A Translation Scheme with Inherited Attributes

D → T {L.in = T.type } L
T → int { T.type = integer }
T → real { T.type = real }
L → {L1.in = L.in } L1, id {addtype(id.entry,L.in)}
L → id {addtype(id.entry,L.in)}
• This is a translation scheme for an L-attributed definition.
Bottom Up evaluation of Inherited Attributes
• Removing Embedding Semantic Actions
In a bottom-up evaluation scheme, the semantic actions are evaluated during reductions.
• During the bottom-up evaluation of S-attributed definitions, we have a parallel stack to
hold synthesized attributes.
• Problem: where are we going to hold inherited attributes?
• A Solution:
– We convert our grammar to an equivalent grammar that guarantees the following:
– All embedding semantic actions in our translation scheme are moved to the
end of production rules.
– All inherited attributes are copied into synthesized attributes (most of the
time synthesized attributes of new non-terminals).
– Thus we can evaluate all semantic actions during reductions, and we have a
place to store each inherited attribute.
Removing Embedding Semantic Actions

• To transform our translation scheme into an equivalent translation scheme:
1. Remove an embedding semantic action Si and put a new non-terminal Mi
in its place.
2. Put that semantic action Si at the end of a new production rule Mi→ε
for that non-terminal Mi.
3. The semantic action Si will be evaluated when this new production
rule is reduced.
Removing Embedding Semantic Actions

A→ {S1} X1 {S2} X2 ... {Sn} Xn



remove embedding semantic actions
A→ M1 X1 M2 X2 ... Mn Xn
M1→ε {S1}
M2→ε {S2}
.
.
Mn→ε {Sn}
Removing Embedding Semantic Actions
E→TR
R → + T { print(“+”) } R
R→ε
T → id { print(id.name) }

remove embedding semantic actions
E→TR
R→+TMR
R→ε
T → id { print(id.name) }
M → ε { print(“+”) }
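The marker non-terminal transformation can be sketched as a small rewriting function. The encoding is an assumption: a production is a (head, body) pair, an action is any body element that starts with "{", and marker names M1, M2, ... are generated on the fly; the function name remove_embedded_actions is hypothetical.

```python
def remove_embedded_actions(productions):
    """Replace every action that is not at the end of a body by a marker non-terminal."""
    result, markers = [], []
    for head, body in productions:
        new_body = []
        for i, sym in enumerate(body):
            is_action = sym.startswith("{")
            if is_action and i < len(body) - 1:
                marker = f"M{len(markers) + 1}"
                # New production M -> ε with the action at its end
                # (the body holds only the action, i.e. no grammar symbols).
                markers.append((marker, [sym]))
                new_body.append(marker)
            else:
                new_body.append(sym)     # ordinary symbol, or action already at the end
        result.append((head, new_body))
    return result + markers


scheme = [
    ("E", ["T", "R"]),
    ("R", ["+", "T", "{print('+')}", "R"]),
    ("R", []),                               # R -> ε
    ("T", ["id", "{print(id.name)}"]),
]

for head, body in remove_embedded_actions(scheme):
    print(head, "->", " ".join(body) or "ε")
```

After the transformation every action sits at the right end of a production, so a bottom-up parser can run it at the moment it reduces by that production.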
Inheriting attributes on parser stack
• A bottom-up parser reduces the RHS of a production A→XY by removing X and Y
from the top of the stack and replacing them by A.
• Suppose X has a synthesized attribute X.s, which is already on the stack.
• If the inherited attribute Y.i is defined by the copy rule Y.i=X.s, then the value of X.s
can be used where Y.i is called for.
• Copy rules play an important role in the evaluation of inherited attributes during
bottom-up parsing.
Productions Semantic Rules
D→TL
T → int val[ntop]=integer
T → real val[ntop]=real
L → L1, id addtype(val[top],val[top-3])
L → id addtype(val[top],val[top-1])
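The stack offsets in this table can be checked by replaying the moves for a declaration such as real p, q, r. This is a sketch under assumptions: the sequence of shifts and reductions is hard-coded for the shape "type id , id , ..." instead of being driven by an LR table, a Python list stands in for the val stack (val[-1] is val[top]), and the function name parse_declaration is hypothetical.

```python
def parse_declaration(type_keyword, names):
    """Replay the bottom-up moves for `type id1 , id2 , ...` and fill a symbol table."""
    symtab = {}
    val = []

    val.append(type_keyword)                                    # shift int / real
    val[-1] = {"int": "integer", "real": "real"}[type_keyword]  # T -> int | real: val[ntop] = type

    val.append(names[0])                  # shift id
    symtab[val[-1]] = val[-2]             # L -> id: addtype(val[top], val[top-1])
    val[-1:] = [None]                     # id is replaced by L (no attribute stored)

    for name in names[1:]:
        val.append(",")                   # shift ,
        val.append(name)                  # shift id
        symtab[val[-1]] = val[-4]         # L -> L1 , id: addtype(val[top], val[top-3])
        val[-3:] = [None]                 # L1 , id is replaced by L

    val[-2:] = [None]                     # D -> T L
    return symtab, val


table, stack = parse_declaration("real", ["p", "q", "r"])
print(table)
```

T.type is never copied into L.in: whenever a reduction to L takes place, T sits at a known position just below the handle, so its value can be read directly from the stack.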
Module 3 – Semantic Analysis
Syntax directed definitions
• Syntax directed definition is a generalization of context free grammar in
which each grammar symbol has an associated set of attributes.
• The attributes can be a number, type, memory location, return type etc….
• Types of attributes are:
1. Synthesized attribute
2. Inherited attribute

[Figure: a grammar symbol E with example attributes: value, memory location, type, return type.]
Syntax Directed Translation Scheme
• A syntax directed translation scheme is a context free
grammar with semantic actions attached.
• It specifies the order in which the semantic rules are evaluated.
• In a translation scheme, the semantic rules are embedded within
the right side of the production.
• The position at which an action is to be executed is shown by
enclosing the action between braces.
Synthesized attributes
• Value of synthesized attribute at a node can be computed from the value of attributes at the children of that
node in the parse tree.
• A syntax directed definition that uses synthesized attribute exclusively is said to be S-attribute definition.
• Example: Syntax directed definition of simple desk calculator

Production        Semantic rules
L → E n           Print(E.val)
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval

Applications of SDT
• Executing Arithmetic Expression
• Conversion from Infix to Postfix
• Conversion from Infix to Prefix
• Conversion from Binary to Decimal
• Counting No. of Reductions
• Creating Syntax Tree
• Generating Intermediate Code
• Type Checking
• Storing Type information into Symbol Table.
Example: Synthesized attributes
String: 3*5+4n

Production        Semantic rules
L → E n           Print(E.val)
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval

Annotated parse tree for 3*5+4n:
L
  E.val=19
    E.val=15
      T.val=15
        T.val=3
          F.val=3
            digit.lexval=3
        *
        F.val=5
          digit.lexval=5
    +
    T.val=4
      F.val=4
        digit.lexval=4
  n

• A parse tree showing the value of the attributes at each node is called an annotated parse tree.
• The process of computing the attribute values at the nodes is called annotating or decorating the parse tree.
Exercise
Draw Annotated Parse tree for following:
1. 7+3*2n
2. (3+4)*(5+6)n
Syntax directed definition to translate arithmetic expressions from infix to prefix
notation (juxtaposition in the semantic rules denotes string concatenation)

Production        Semantic rules
L → E             Print(E.val)
E → E1 + T        E.val = '+' E1.val T.val
E → E1 - T        E.val = '-' E1.val T.val
E → T             E.val = T.val
T → T1 * F        T.val = '*' T1.val F.val
T → T1 / F        T.val = '/' T1.val F.val
T → F             T.val = F.val
F → F1 ^ P        F.val = '^' F1.val P.val
F → P             F.val = P.val
P → ( E )         P.val = E.val
P → digit         P.val = digit.lexval
Inherited attribute
• An inherited value at a node in a parse tree is computed from the value
of attributes at the parent and/or siblings of the node.
Production Semantic rules
D→TL L.in = T.type
T → int T.type = integer
T → real T.type = real
L → L1 , id L1.in = L.in, addtype(id.entry,L.in)
L → id addtype(id.entry,L.in)

Syntax directed definition with inherited attribute L.in

• Symbol T is associated with a synthesized attribute type.


• Symbol L is associated with an inherited attribute in.
Example: Inherited attribute
Example: Pass data types to all identifiers
real id1,id2,id3

Production        Semantic rules
D → T L           L.in = T.type
T → int           T.type = integer
T → real          T.type = real
L → L1 , id       L1.in = L.in,
                  addtype(id.entry,L.in)
L → id            addtype(id.entry,L.in)

Annotated parse tree:
D
  T.type=real
    real
  L.in=real
    L.in=real
      L.in=real
        id1
      ,
      id2
    ,
    id3
Evaluation order
• A topological sort of a directed acyclic graph is any ordering
m1, m2, …, mk of the nodes of the graph such that edges go from
nodes earlier in the ordering to later nodes.
• If mi → mj is an edge from mi to mj, then mi appears before mj in the
ordering.

[Figure: dependency graph for real id1,id2,id3 with nodes numbered in a topological order: 1 T.type=real, 2 L.in=real (top L), 3 L.in=real (middle L), 4 id3, 5 L.in=real (bottom L), 6 id2, 7 id1.]
Construction of syntax tree
• Following functions are used to create the nodes of the syntax tree.
1. mknode(op,left,right): creates an operator node with label op and two fields
containing pointers to left and right.
2. mkleaf(id, entry): creates an identifier node with label id and a field
containing entry, a pointer to the symbol table entry for the identifier.
3. mkleaf(num, val): creates a number node with label num and a field
containing val, the value of the number.
Construction of syntax tree for expressions
Example: construct syntax tree for a-4+c
P1: mkleaf(id, entry for a);
P2: mkleaf(num, 4);
P3: mknode('-',P1,P2);
P4: mkleaf(id, entry for c);
P5: mknode('+',P3,P4);

Resulting syntax tree:
P5 +
  P3 -
    P1 id (entry for a)
    P2 num 4
  P4 id (entry for c)
Bottom up evaluation of S-attributed definitions
• S-attributed definition is one such class of syntax directed definition with
synthesized attributes only.
• Synthesized attributes can be evaluated by a bottom up parser as the input is being parsed.
Synthesized attributes on the parser stack
• Consider the production A → XYZ with the associated semantic action
A.a=f(X.x, Y.y, Z.z)

Before reduction:              After reduction:
        State  Value                   State  Value
top-2   X      X.x             top     A      A.a
top-1   Y      Y.y
top     Z      Z.z
Bottom up evaluation of S-attributed definitions
Implementation of a desk calculator with a bottom up parser
(top is the stack top before the reduction, ntop the stack top after it):

Production        Semantic rules
L → E n           Print(val[top-1])
E → E1 + T        val[ntop] = val[top-2] + val[top]
E → T
T → T1 * F        val[ntop] = val[top-2] * val[top]
T → F
F → ( E )         val[ntop] = val[top-1]
F → digit

Moves made by the translator on input 3*5n (_ marks an entry with no attribute value):

Input     State     Val       Production used
3*5n      -         -
*5n       3         3
*5n       F         3         F → digit
*5n       T         3         T → F
5n        T*        3 _
n         T*5       3 _ 5
n         T*F       3 _ 5     F → digit
n         T         15        T → T1 * F
n         E         15        E → T
          En        15 _
          L         15        L → E n
L-Attributed definitions
• A syntax directed definition is L-attributed if each inherited attribute of
Xj, 1 <= j <= n, on the right side of A → X1 X2 … Xn depends only on:
1. The attributes of the symbols X1, X2, …, Xj-1 to the left of Xj in the production and
2. The inherited attributes of A.
• Example:
Production        Semantic Rules
A → L M           L.i = l(A.i)          (L-attributed)
                  M.i = m(L.s)
                  A.s = f(M.s)
A → Q R           R.i = r(A.i)          (not L-attributed)
                  Q.i = q(R.s)
                  A.s = f(Q.s)

• The above syntax directed definition is not L-attributed because the inherited attribute Q.i of the grammar symbol Q depends on the attribute R.s of
the grammar symbol to its right.
Translation scheme
• Translation scheme is a context free grammar in which attributes are
associated with the grammar symbols and semantic actions enclosed
between braces { } are inserted within the right sides of productions.
• Attributes are evaluated while the input is being parsed.
• Whenever the parser reaches a semantic action enclosed in { }, it executes
that action to compute the attribute.
• A translation scheme generates its output by executing the semantic
actions in a fixed order.
• That order is given by a depth first traversal of the parse tree.
Example: Translation scheme (Infix to postfix notation)
E → T R
R → addop T {Print(addop.lexeme)} R1 | ε
T → num {Print(num.val)}
String: 9-5+2
Parse tree with semantic actions:
E
  T
    9 {Print(9)}
  R
    -
    T
      5 {Print(5)}
    {Print(-)}
    R
      +
      T
        2 {Print(2)}
      {Print(+)}
      R
        ε
Now, perform a depth first traversal: Postfix = 95-2+
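The same translation scheme can be sketched as a small recursive descent translator in Python. This is a minimal illustration under stated assumptions: numbers are single digits, the hypothetical function `to_postfix` collects the output in a list instead of printing, and the right-recursive nonterminal R is written as a loop.

```python
def to_postfix(text):
    """Translate an infix string such as '9-5+2' using
    E -> T R,  R -> addop T {print addop} R | eps,  T -> num {print num}."""
    tokens = list(text.replace(" ", ""))   # assumption: one character per token
    pos = 0
    out = []

    def parse_T():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"number expected at position {pos}")
        out.append(tokens[pos])            # action {Print(num.val)}
        pos += 1

    def parse_R():
        nonlocal pos
        # R -> addop T {Print(addop.lexeme)} R1 | eps, with the recursion as a loop
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            pos += 1
            parse_T()
            out.append(op)                 # action runs after T, before R1

    parse_T()
    parse_R()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected input at position {pos}")
    return "".join(out)

print(to_postfix("9-5+2"))
```

The operator is emitted after the second operand has been translated, which is exactly the position of the action inside the production for R.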


Types of errors
Errors
  Compile time
    Lexical phase error
    Syntactic phase error
    Semantic phase error
  Run time
Lexical error
• Lexical errors can be detected during lexical analysis phase.
• Typical lexical phase errors are:
1. Spelling errors
2. Exceeding length of identifier or numeric constants
3. Appearance of illegal characters
• Example:
fi ( )
{
}
• In the above code 'fi' cannot be recognized as a misspelling of the keyword if. Instead the
lexical analyzer treats it as an identifier and returns it as a
valid identifier.
• Thus a misspelling produces the wrong token, and the mistake surfaces only in a later phase.
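A toy lexer makes this behaviour concrete. The sketch below is an assumption-laden illustration, not the lexer of any particular compiler: the keyword set, the token names and the function `tokenize` are all hypothetical. It classifies words by table lookup, so 'fi' comes out as an ordinary identifier, while a character outside the alphabet is reported as a lexical error.

```python
import re

KEYWORDS = {"if", "else", "while"}       # assumed keyword table
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z_]\w*)|(?P<punct>[(){}])")

def tokenize(src):
    """Return a list of (kind, lexeme) pairs; raise ValueError on illegal characters."""
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(src, pos)
        if m is None:
            # lexical error: appearance of an illegal character
            raise ValueError(f"illegal character {src[pos]!r} at position {pos}")
        lexeme = m.group()
        if m.lastgroup == "word":
            kind = "keyword" if lexeme in KEYWORDS else "id"
        else:
            kind = "punct"
        tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

print(tokenize("fi ( ) { }"))
```

The lexer has no way to know that 'fi' was meant to be 'if'; only the parser can notice that an identifier followed by ( ) { } does not form a valid statement.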
Syntax error
• Syntax errors appear during the syntax analysis phase of the compiler.
• Typical syntax phase errors are:
1. Errors in structure
2. Missing operators
3. Unbalanced parentheses
• The parser requests tokens from the lexical analyzer, and if the tokens do
not satisfy the grammatical rules of the programming language, a
syntax error is raised.
• Example:
printf("Hello World !!!")     Error: Semicolon missing
Semantic error
• Semantic errors are detected during the semantic analysis phase.
• Typical semantic phase errors are:
1. Incompatible types of operands
2. Undeclared variable
3. Mismatch between actual and formal arguments
• Example:
id1=id2+id3*60 (Note: id1, id2, id3 are real)
(The multiplication cannot be performed directly because id3 is real while 60 is an integer; the integer must first be converted to real)
Module 4
Intermediate Code Generation
Outline
• Variants of Syntax Trees
• Three-address code
• Types and declarations
• Translation of expressions
• Type checking
• Control flow
• Backpatching
Introduction
• Intermediate code is the interface between the front end
and the back end in a compiler
• Ideally the details of the source language are confined to
the front end and the details of the target machine to the
back end, so that compilers for m languages and n machines can be
built from m front ends and n back ends instead of m*n separate compilers
• In this chapter we study intermediate representations,
static type checking and intermediate code generation
Front end: Parser → Static Checker → Intermediate Code Generator
Back end: Code Generator
Variants of syntax trees
• It is sometimes beneficial to create a DAG instead of a
tree for expressions.
• This way we can easily show the common sub-expressions
and then use that knowledge during code
generation
• Example: a+a*(b-c)+(b-c)*d
[Figure: DAG for a+a*(b-c)+(b-c)*d. The leaf a and the node for b-c each have two parents.]
SDD for creating DAG’s
Production      Semantic Rules
1) E -> E1+T    E.node = new Node('+', E1.node, T.node)
2) E -> E1-T    E.node = new Node('-', E1.node, T.node)
3) E -> T       E.node = T.node
4) T -> (E)     T.node = E.node
5) T -> id      T.node = new Leaf(id, id.entry)
6) T -> num     T.node = new Leaf(num, num.val)
Example: steps for constructing the DAG of a+a*(b-c)+(b-c)*d
1) p1=Leaf(id, entry-a)
2) p2=Leaf(id, entry-a)=p1
3) p3=Leaf(id, entry-b)
4) p4=Leaf(id, entry-c)
5) p5=Node('-',p3,p4)
6) p6=Node('*',p1,p5)
7) p7=Node('+',p1,p6)
8) p8=Leaf(id,entry-b)=p3
9) p9=Leaf(id,entry-c)=p4
10) p10=Node('-',p3,p4)=p5
11) p11=Leaf(id,entry-d)
12) p12=Node('*',p5,p11)
13) p13=Node('+',p7,p12)
Value-number method for
constructing DAG’s
Nodes of the DAG for i = i + 10, stored in an array of records (the index of a record is its value number):
1   id    → entry for i
2   num   10
3   +     1   2
4   =     1   3
• Algorithm
• Search the array for a node M with label op, left child l
and right child r
• If there is such a node, return the value number of M
• If not, create in the array a new node N with label op, left
child l, and right child r, and return its value number
• We may use a hash table to avoid searching the whole array
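The value-number method can be sketched in Python with a dictionary playing the role of the hash table. The class `Dag` and its method names are hypothetical; the only idea taken from the notes is that a node is created when no node with the same label and children exists, and that the existing value number is returned otherwise.

```python
class Dag:
    """Array of node records plus a hash table from node signature to value number."""

    def __init__(self):
        self.nodes = []     # nodes[k-1] is the record with value number k
        self.index = {}     # signature -> value number

    def _lookup(self, signature):
        if signature not in self.index:
            self.nodes.append(signature)
            self.index[signature] = len(self.nodes)   # value numbers start at 1
        return self.index[signature]

    def leaf(self, kind, value):
        return self._lookup((kind, value))

    def node(self, op, left, right):
        # left and right are value numbers of already constructed nodes
        return self._lookup((op, left, right))

def build_example():
    """Build the DAG for a + a*(b-c) + (b-c)*d and return (dag, root)."""
    dag = Dag()
    a = dag.leaf("id", "a")
    diff = dag.node("-", dag.leaf("id", "b"), dag.leaf("id", "c"))
    left = dag.node("+", a, dag.node("*", dag.leaf("id", "a"), diff))
    diff_again = dag.node("-", dag.leaf("id", "b"), dag.leaf("id", "c"))
    right = dag.node("*", diff_again, dag.leaf("id", "d"))
    return dag, dag.node("+", left, right)

dag, root = build_example()
for number, record in enumerate(dag.nodes, start=1):
    print(number, record)
```

Thirteen constructor calls are made for this expression, but only nine records are stored, because the repeated requests for a, b, c and b-c return existing value numbers.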
Three address code
• In three address code there is at most one operator on
the right side of an instruction
• Example: three address code for the DAG of a+a*(b-c)+(b-c)*d
t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4
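A short Python sketch shows how such a sequence can be produced from an expression. The representation is an assumption: an expression is either a variable name or a tuple (op, left, right), and the hypothetical function `gen_tac` remembers which subexpressions already have a temporary, which has the same effect as walking the DAG instead of the tree.

```python
def gen_tac(expr):
    """Return three address instructions for a nested-tuple expression."""
    code = []
    memo = {}     # subexpression -> temporary that already holds its value

    def visit(e):
        if isinstance(e, str):        # a variable name is its own address
            return e
        if e in memo:                 # common subexpression: reuse the temporary
            return memo[e]
        op, left, right = e
        l = visit(left)
        r = visit(right)
        temp = f"t{len(memo) + 1}"
        memo[e] = temp
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    visit(expr)
    return code

b_minus_c = ("-", "b", "c")
expr = ("+", ("+", "a", ("*", "a", b_minus_c)), ("*", b_minus_c, "d"))
print("\n".join(gen_tac(expr)))
```

Every instruction has exactly one operator on its right side, and b - c is computed once even though it occurs twice in the source expression.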
Forms of three address
instructions
• x = y op z
• x = op y
• x = y
• goto L
• if x goto L and ifFalse x goto L
• if x relop y goto L
• Procedure calls using:
• param x
• call p,n
• y = call p,n
• x = y[i] and x[i] = y
• x = &y and x = *y and *x = y
Example
• do i = i+1; while (a[i] < v);
Symbolic labels:
L:   t1 = i + 1
     i = t1
     t2 = i * 8
     t3 = a[t2]
     if t3 < v goto L
Position numbers:
100: t1 = i + 1
101: i = t1
102: t2 = i * 8
103: t3 = a[t2]
104: if t3 < v goto 100


Data structures for three
address codes
• Quadruples
• Has four fields: op, arg1, arg2 and result
• Triples
• Temporaries are not used and instead references to
instructions are made
• Indirect triples
• In addition to triples we use a list of pointers to triples
Three address code
Example
• a = b * minus c + b * minus c
t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5
Quadruples:
     op      arg1    arg2    result
     minus   c               t1
     *       b       t1      t2
     minus   c               t3
     *       b       t3      t4
     +       t2      t4      t5
     =       t5              a
Triples:
     op      arg1    arg2
0    minus   c
1    *       b       (0)
2    minus   c
3    *       b       (2)
4    +       (1)     (3)
5    =       a       (4)
Indirect Triples:
instruction list          triples
35   (0)                  op      arg1    arg2
36   (1)             0    minus   c
37   (2)             1    *       b       (0)
38   (3)             2    minus   c
39   (4)             3    *       b       (2)
40   (5)             4    +       (1)     (3)
                     5    =       a       (4)
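The relation between quadruples and triples can be sketched in Python. The tuple layout and the function `quads_to_triples` are assumptions made for illustration; the sketch also assumes that every result other than the target of a copy is a temporary that is defined exactly once.

```python
QUADS = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]

def quads_to_triples(quads):
    """Replace temporary names by references to the triple that computes them."""
    position = {}      # temporary name -> index of the defining triple
    triples = []

    def ref(arg):
        return f"({position[arg]})" if arg in position else arg

    for op, arg1, arg2, result in quads:
        if op == "=":
            # a copy x = y keeps the target in arg1 and the value in arg2
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples

for index, triple in enumerate(quads_to_triples(QUADS)):
    print(index, triple)

# Indirect triples add a separate list of positions; reordering instructions
# then means permuting this list while the triples themselves stay unchanged.
instruction_list = list(range(len(QUADS)))
```

Because a triple is identified by its position, moving a triple would invalidate every reference to it, which is the problem the extra list in indirect triples avoids.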
Type Expressions
Example: int[2][3]
array(2,array(3,integer))
• A basic type is a type expression
• A type name is a type expression
• A type expression can be formed by applying the array type
constructor to a number and a type expression.
• A record is a data structure with named fields
• A type expression can be formed by using the type constructor → for
function types
• If s and t are type expressions, then their Cartesian product s × t is a
type expression
• Type expressions may contain variables whose values are type
expressions
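Type expressions are naturally represented as small trees. The following Python sketch covers only basic types and the array constructor; the class names `Basic` and `Array` and the helpers `array_type` and `show` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Basic:
    name: str                 # e.g. "integer"

@dataclass(frozen=True)
class Array:
    size: int                 # number of elements
    elem: object              # type expression of one element

def array_type(base, dims):
    """Build the type expression for a declaration such as int[2][3]."""
    t = base
    for d in reversed(dims):  # the last dimension is applied first
        t = Array(d, t)
    return t

def show(t):
    """Print a type expression in the notation used above."""
    if isinstance(t, Basic):
        return t.name
    return f"array({t.size},{show(t.elem)})"

print(show(array_type(Basic("integer"), [2, 3])))
```

Frozen dataclasses compare by structure, so two separately built copies of the same type expression are equal, which is convenient when a type checker tests for structural equivalence.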
Three-Address Code IR
Where We Are
Source Code
→ Lexical Analysis
→ Syntax Analysis
→ Semantic Analysis
→ IR Generation
→ IR Optimization
→ Code Generation
→ Optimization
→ Machine Code
Three-Address Code
● Or “TAC”
● High-level assembly where each operation has
at most three operands.
● Uses explicit runtime stack for function calls.
● Uses vtables for dynamic dispatch.
Sample TAC Code
Source:
  int x;
  int y;
  int x2 = x * x;
  int y2 = y * y;
  int r2 = x2 + y2;
TAC:
  x2 = x * x;
  y2 = y * y;
  r2 = x2 + y2;
Sample TAC Code
Source:
  int a;
  int b;
  int c;
  int d;
  a = b + c + d;
  b = a * a + b * b;
TAC:
  _t0 = b + c;
  a = _t0 + d;
  _t1 = a * a;
  _t2 = b * b;
  b = _t1 + _t2;
Temporary Variables
● The “three” in “three-address code” refers to
the maximum number of operands in any instruction.
● Evaluating an expression with more than three
subexpressions requires the introduction of
temporary variables.
Sample TAC Code
Source:
  int a;
  int b;
  a = 5 + 2 * b;
TAC:
  _t0 = 5;
  _t1 = 2 * b;
  a = _t0 + _t1;
TAC allows for instructions with two operands, such as _t0 = 5.
Simple TAC
Instructions
● Variable assignment allows assignments of the form
● var = constant;
● var1 = var2;
● var1 = var2 op var3;
● var1 = constant op var2;
● var1 = var2 op constant;
● var = constant1 op constant2;
● Permitted operators are +, -, *, /, %.
● How would you compile y = -x; ?
  y = 0 - x;    or    y = -1 * x;
One More with
bools
Source:
  int x; int y;
  bool b1; bool b2; bool b3;
  b1 = x + x < y
  b2 = x + x == y
  b3 = x + x > y
TAC:
  _t0 = x + x;
  _t1 = y;
  b1 = _t0 < _t1;
  _t2 = x + x;
  _t3 = y;
  b2 = _t2 == _t3;
  _t4 = x + x;
  _t5 = y;
  b3 = _t5 < _t4;
TAC with bools
● Boolean variables are represented as integers
that have zero or nonzero values.
● In addition to the arithmetic operators, TAC
supports <, ==, ||, and &&.
● How might you compile b = (x <= y) ?
  _t0 = x < y;
  _t1 = x == y;
  b = _t0 || _t1;
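The same idea extends to the other comparison operators. The Python sketch below is a hypothetical helper, `lower_relop`, that rewrites a comparison using only <, == and ||. It assumes the operands are already variables or constants and, for brevity, always uses the temporaries _t0 and _t1; a real code generator would allocate fresh ones.

```python
def lower_relop(dest, x, op, y):
    """Return TAC lines computing dest = x op y with only <, == and ||."""
    if op == "<":
        return [f"{dest} = {x} < {y};"]
    if op == "==":
        return [f"{dest} = {x} == {y};"]
    if op == ">":                        # x > y is the same as y < x
        return [f"{dest} = {y} < {x};"]
    if op == "<=":                       # x < y or x == y
        first, second = f"{x} < {y}", f"{x} == {y}"
    elif op == ">=":                     # y < x or x == y
        first, second = f"{y} < {x}", f"{x} == {y}"
    elif op == "!=":                     # x < y or y < x
        first, second = f"{x} < {y}", f"{y} < {x}"
    else:
        raise ValueError(f"unknown operator {op!r}")
    return [f"_t0 = {first};", f"_t1 = {second};", f"{dest} = _t0 || _t1;"]

print("\n".join(lower_relop("b", "x", "<=", "y")))
```

Swapping the operands handles > without any extra instruction, which matches the way b3 = x + x > y is compiled with < in the earlier example.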
Control Flow
Statements
Source:
  int x;
  int y;
  int z;
  if (x < y)
    z = x;
  else
    z = y;
  z = z * z;
TAC:
  _t0 = x < y;
  IfZ _t0 Goto _L0;
  z = x;
  Goto _L1;
_L0:
  z = y;
_L1:
  z = z * z;
Labels
● TAC allows for named labels indicating
particular points in the code that can be
jumped to.
● There are two control flow instructions:
● Goto label;
● IfZ value Goto label;
● Note that IfZ is always paired with Goto.
Control Flow
Statements
Source:
  int x; int y;
  while (x < y) {
    x = x * 2;
  }
  y = x;
TAC:
_L0:
  _t0 = x < y;
  IfZ _t0 Goto _L1;
  x = x * 2;
  Goto _L0;
_L1:
  y = x;
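The label patterns for if-else and while can be captured in a small generator. This Python sketch rests on several assumptions: statements are given as tuples, conditions and right-hand sides are already simple enough to fit in one TAC instruction, and the function `generate` with its tuple format is invented for this illustration.

```python
import itertools

def generate(stmts):
    """Return TAC lines for a list of assign / if / while statements."""
    code = []
    labels = itertools.count()     # _L0, _L1, ...
    temps = itertools.count()      # _t0, _t1, ...

    def emit_block(block):
        for s in block:
            emit_stmt(s)

    def emit_stmt(s):
        kind = s[0]
        if kind == "assign":                       # ("assign", target, expr)
            code.append(f"{s[1]} = {s[2]};")
        elif kind == "if":                         # ("if", cond, then, else)
            _, cond, then_part, else_part = s
            t = f"_t{next(temps)}"
            else_label, end_label = f"_L{next(labels)}", f"_L{next(labels)}"
            code.append(f"{t} = {cond};")
            code.append(f"IfZ {t} Goto {else_label};")
            emit_block(then_part)
            code.append(f"Goto {end_label};")      # skip the else part
            code.append(f"{else_label}:")
            emit_block(else_part)
            code.append(f"{end_label}:")
        elif kind == "while":                      # ("while", cond, body)
            _, cond, body = s
            top_label, exit_label = f"_L{next(labels)}", f"_L{next(labels)}"
            code.append(f"{top_label}:")
            t = f"_t{next(temps)}"
            code.append(f"{t} = {cond};")          # condition is re-evaluated each time
            code.append(f"IfZ {t} Goto {exit_label};")
            emit_block(body)
            code.append(f"Goto {top_label};")
            code.append(f"{exit_label}:")
        else:
            raise ValueError(f"unknown statement kind {kind!r}")

    emit_block(stmts)
    return code

loop = [("while", "x < y", [("assign", "x", "x * 2")]), ("assign", "y", "x")]
print("\n".join(generate(loop)))
```

Both constructs need two labels: the if-else jumps over one of its branches, while the loop jumps back to re-test the condition and out when it fails.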
A Complete Decaf
Program
Source:
  void main() {
    int x, y;
    int m2 = x * x + y * y;
    while (m2 > 5) {
      m2 = m2 - x;
    }
  }
TAC:
main:
  BeginFunc 24;
  _t0 = x * x;
  _t1 = y * y;
  m2 = _t0 + _t1;
_L0:
  _t2 = 5 < m2;
  IfZ _t2 Goto _L1;
  m2 = m2 - x;
  Goto _L0;
_L1:
  EndFunc;
Control Flow
Boolean expressions are often used to:
• Alter the flow of control.
• Compute logical values.
Short-Circuit Code


Flow-of-Control Statements
Syntax-directed definition
Generating three-address code for booleans
translation of a simple if-statement


Backpatching
• The previous translations of Boolean expressions insert symbolic labels for
jumps
• They therefore need a separate pass to set the labels to appropriate addresses
• We can use a technique named backpatching to avoid this
• We assume we save instructions into an array and labels will be indices
in the array
• For nonterminal B we use two attributes B.truelist and B.falselist
together with the following functions:
• makelist(i): creates a new list containing only i, an index into the array
of instructions
• merge(p1,p2): concatenates the lists pointed to by p1 and p2 and returns a
pointer to the concatenated list
• backpatch(p,i): inserts i as the target label for each of the instructions
on the list pointed to by p
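The three functions can be sketched in Python and applied to the expression x < 100 || x > 200 && x != y. Everything beyond makelist, merge and backpatch is an assumption of this sketch: instructions are stored as [text, target] pairs starting at address 100, the helper `relop` emits the two jumps for a comparison, and the order of the calls at the bottom imitates by hand what a bottom up parser would do for B → B1 || M B2 and B → B1 && M B2.

```python
def translate_example(base=100):
    """Translate x < 100 || x > 200 && x != y; return ((truelist, falselist), listing)."""
    code = []                              # each entry: [text, jump target or None]

    def nextinstr():
        return base + len(code)

    def emit(text):
        code.append([text, None])          # target is left empty for now

    def makelist(i):
        return [i]

    def merge(p1, p2):
        return p1 + p2

    def backpatch(p, target):
        for i in p:
            code[i - base][1] = target

    def relop(x, op, y):
        # B -> E1 rel E2: a conditional and an unconditional jump, both unfilled
        truelist = makelist(nextinstr())
        falselist = makelist(nextinstr() + 1)
        emit(f"if {x} {op} {y} goto")
        emit("goto")
        return truelist, falselist

    b1 = relop("x", "<", "100")
    m1 = nextinstr()                       # marker M before x > 200
    b2 = relop("x", ">", "200")
    m2 = nextinstr()                       # marker M before x != y
    b3 = relop("x", "!=", "y")

    # B -> B1 && M B2: true exits of B1 continue at M
    backpatch(b2[0], m2)
    b_and = (b3[0], merge(b2[1], b3[1]))
    # B -> B1 || M B2: false exits of B1 continue at M
    backpatch(b1[1], m1)
    b = (merge(b1[0], b_and[0]), b_and[1])

    listing = [
        f"{base + k}: {text} {'_' if target is None else target}"
        for k, (text, target) in enumerate(code)
    ]
    return b, listing

(truelist, falselist), listing = translate_example()
print("\n".join(listing))
print("truelist:", truelist, "falselist:", falselist)
```

The jumps still marked with _ are the ones on the final truelist and falselist; they are filled in later, when the enclosing statement knows where control should go for a true or false result.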
Backpatching for Boolean Expressions


Backpatching for Boolean Expressions
• Annotated parse tree for x < 100 || x > 200 && x != y
Flow-of-Control Statements
Translation of a switch-statement
