Compiler Unit3
Ms. C.B.Thaokar
Assistant Professor
Department of Information Technology
RCOEM, Nagpur
Structure of Compiler
[Figure: Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation]
Two types of attributes are associated with grammar symbols
• Synthesized Attributes
The attribute value at a parse tree node is determined from the
attribute values at its child nodes or from constants.
eg. E -> E1 + T     E.val = E1.val + T.val
• Inherited Attributes
The attribute value at a parse tree node is determined from the
attribute values of its parent, its siblings, or the node's own
other attributes.
eg. D -> T L        L.in = T.type
S attributed definition:
A syntax directed definition that uses synthesized attributes
exclusively is said to be an S-attributed definition.
Production        Semantic rules
E -> E1 + T       E.val = E1.val + T.val
E -> T            E.val = T.val
T -> T1 * F       T.val = T1.val * F.val
T -> F            T.val = F.val
F -> num          F.val = num.lexval
W=2+2*3
Example S attributed definition
W=2+2*3
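The evaluation of W = 2+2*3 can be sketched in Python. This is a minimal sketch, not the method of any particular parser generator: the left-recursive productions are replaced by loops, the last production is read as F -> num, and the names `tokenize` and `evaluate` are made up for illustration. Every `val` is computed only from the values of the children, which is what makes the definition S-attributed.

```python
import re

def tokenize(text):
    """Split an arithmetic string into number, '+' and '*' tokens."""
    return re.findall(r"\d+|[+*]", text)

def evaluate(tokens):
    """Return E.val, computing every val from child vals only."""
    pos = 0

    def parse_f():
        nonlocal pos
        lexval = int(tokens[pos])       # F -> num      F.val = num.lexval
        pos += 1
        return lexval

    def parse_t():
        nonlocal pos
        val = parse_f()                 # T -> F        T.val = F.val
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            val = val * parse_f()       # T -> T1 * F   T.val = T1.val * F.val
        return val

    def parse_e():
        nonlocal pos
        val = parse_t()                 # E -> T        E.val = T.val
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            val = val + parse_t()       # E -> E1 + T   E.val = E1.val + T.val
        return val

    return parse_e()

print(evaluate(tokenize("2+2*3")))      # prints 8
```

The multiplication is reduced before the addition because T sits below E in the grammar, so 2*3 = 6 is synthesized first and then 2 + 6 = 8.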
Inherited attributed definition:
Production        Semantic rules
D -> T L          L.in = T.type
T -> int          T.type = integer
T -> real         T.type = real
L -> L1 , id      L1.in = L.in ; addtype(id.entry, L.in)
L -> id           addtype(id.entry, L.in)
w= real id1, id2, id3
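The flow of the inherited attribute for real id1, id2, id3 can be sketched in Python. This is only an illustration under assumptions: the symbol table is a plain dictionary, and `declare` and `visit_list` are hypothetical helper names. The type is computed once at T and then handed down the list L as L.in.

```python
def declare(type_name, identifiers):
    """D -> T L: pass T.type down the identifier list as the inherited L.in."""
    symbol_table = {}

    def addtype(entry, inherited_type):
        symbol_table[entry] = inherited_type

    def visit_list(ids, l_in):
        # L -> L1 , id : L1.in = L.in ; addtype(id.entry, L.in)
        if len(ids) > 1:
            visit_list(ids[:-1], l_in)
        # L -> id      : addtype(id.entry, L.in)
        addtype(ids[-1], l_in)

    visit_list(identifiers, type_name)      # L.in = T.type
    return symbol_table

print(declare("real", ["id1", "id2", "id3"]))
# prints {'id1': 'real', 'id2': 'real', 'id3': 'real'}
```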
Example Inherited attributed definition:
w = real id1, id2, id3
L-attributed definitions:
• A syntax directed definition is L-attributed if each inherited
attribute of Xj, 1<=j<=n, on the right side of
A -> X1 X2 … Xn depends only on
– the attributes of the symbols X1, X2, …, Xj-1 to its left, and
– the inherited attributes of A.
• L stands for Left, since attribute information flows from left to
right across the production.
• Example:
A -> L M { L.i = A.i; M.i = L.s; A.s = M.s }    (L-attributed)
A -> Q R { R.i = A.i; Q.i = R.s; A.s = Q.s }    (not L-attributed: Q.i depends on R.s, which is to its right)
• Given a syntax directed definition, how to build a
translator?
– For general definitions, to evaluate the semantic rules
correctly, we need to follow the dependencies among the
attributes (defined by the semantic rules).
• Build a dependency graph for the parse tree.
Topologically sort the graph, then evaluate the rules
in that order.
• Example: real id, id, id
– For some special definitions, we can perform
translation while parsing
• e.g. bottom-up evaluation of S-attributed
definitions.
• Most L-attributed definitions also work.
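The dependency graph approach for real id, id, id can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The node names below (`L_a.in`, `addtype(id1)` and so on) are made-up labels for the attribute instances in the parse tree, with `L_a` the topmost L node; they are assumptions for illustration only.

```python
from graphlib import TopologicalSorter

# Each key depends on the attribute instances listed in its value set.
dependencies = {
    "T.type": set(),
    "L_a.in": {"T.type"},          # D -> T L      : L.in = T.type
    "L_b.in": {"L_a.in"},          # L -> L1 , id  : L1.in = L.in
    "L_c.in": {"L_b.in"},          # L -> L1 , id  : L1.in = L.in
    "addtype(id3)": {"L_a.in"},    # each addtype call needs the L.in of its node
    "addtype(id2)": {"L_b.in"},
    "addtype(id1)": {"L_c.in"},
}

# Any order produced here evaluates an attribute only after everything
# it depends on has been evaluated.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several valid orders exist, so the exact printed sequence may vary; what matters is that T.type always comes first and each addtype call follows the L.in it uses.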
• Syntax directed translation scheme:
– a context-free grammar in which attributes are
associated with grammar symbols and the semantic
actions are enclosed between {} and are inserted
within the right side of productions to indicate the
order in which translation takes place.
– Example:
E -> T R
R -> + T { print('+') } R
R -> - T { print('-') } R | ε
T -> num { print(num.val) }
– S-attributed definitions can be directly translated into a
translation scheme by placing the semantic actions at the
end of each production.
• Perfect for bottom up parsing (LR parsing)
Example
E -> T R
R -> + T { print('+') } R
R -> - T { print('-') } R | ε
T -> num { print(num.val) }
String = 9 + 4
Parse tree:
E
├─ T
│  └─ num (9)
└─ R
   ├─ +
   ├─ T
   │  └─ num (4)
   └─ R
      └─ ε
Translation output: 9 4 +
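The translation scheme can be run as a small recursive descent translator. This is a minimal sketch, assuming the second R production reads R -> - T { print('-') } R and that the input is already split into tokens; `to_postfix` is a hypothetical name, and the output is collected in a list instead of being printed directly.

```python
def to_postfix(tokens):
    """E -> T R ; R -> + T {print '+'} R | - T {print '-'} R | ε ; T -> num {print num.val}"""
    output = []
    pos = 0

    def parse_t():
        nonlocal pos
        output.append(tokens[pos])      # T -> num { print(num.val) }
        pos += 1

    def parse_r():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            parse_t()
            output.append(op)           # the action runs after T and before the next R
            parse_r()
        # otherwise R -> ε, nothing to emit

    parse_t()                           # E -> T R
    parse_r()
    return " ".join(output)

print(to_postfix(["9", "+", "4"]))      # prints 9 4 +
```

The position of the action inside the production decides the output order: because print('+') sits after T, the operator is emitted after both operands.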
Intermediate Code Generation
• If the compiler translates source code directly into machine
code without generating intermediate code, then a full native
compiler is required for each new machine.
• Intermediate code keeps the analysis portion the same for all
target machines, so a full compiler is not needed for
every unique machine.
• The intermediate code generator receives its input from the
preceding phase, the semantic analyzer, in the form of an
annotated syntax tree.
• Using the intermediate code, only the second part of the compiler,
the synthesis phase, is changed according to the target machine.
Intermediate Code Generation
Intermediate code can be represented in the following ways
• Postfix Notation
• Syntax Tree
• Three Address Code
SDTS of Postfix Notation
• Postfix notation is a useful form of intermediate code when the
source language consists mainly of expressions.
• Postfix notation is a linear representation of a syntax tree.
Eg. D + Y  =>  D Y +
Syntax Tree
• In the parse tree, most leaf nodes are the only child of their
parent node.
• In the syntax tree, we can eliminate this extra information.
• A syntax tree is a condensed form of the parse tree. In the syntax tree,
interior nodes are operators and leaves are operands.
• A syntax tree is commonly used to represent a program in a tree structure.
SDTS of Syntax Tree
E -> E1 + T { E.ptr = mknode( ‘+’ ,E1.ptr,T.ptr) }
E -> T { E.ptr = T.ptr }
T -> T1 * F { T.ptr = mknode(‘*’, T1.ptr, F.ptr) }
T -> F {T.ptr = F.ptr }
F -> id { F.ptr = mkleaf(id, id.place) }
mknode(operator, left, right) - Creates an operator node with pointers to its left and right subtrees.
mkleaf(id, entry) - Creates an identifier node that holds a pointer to the
symbol table entry.
String: id + id * id
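The tree built for id + id * id can be sketched in Python. This is an illustration under assumptions: the three identifiers are called a, b and c, `mkleaf` takes just a name instead of a symbol table pointer, and `Node` and `postorder` are hypothetical helpers. A postorder walk of the result also shows why postfix notation is a linear form of the syntax tree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def mkleaf(entry):
    """Leaf for an identifier (here the name stands in for the symbol table entry)."""
    return Node(entry)

def mknode(operator, left, right):
    """Interior node for an operator with its two operand subtrees."""
    return Node(operator, left, right)

def postorder(node):
    """Visit left subtree, right subtree, then the node itself."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.label]

# a + b * c: the '*' node is built first and becomes the right child of '+'.
tree = mknode("+", mkleaf("a"), mknode("*", mkleaf("b"), mkleaf("c")))
print(" ".join(postorder(tree)))        # prints a b c * +
```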
Quadruples for a = -b * c + d:
      op      arg1  arg2  result
(0)   uminus  b     -     t1
(1)   *       t1    c     t2
(2)   +       t2    d     t3
(3)   =       t3    -     a
Three Address Code
• Triples: A triple has three fields to implement three address
code: the operator, the first source operand and the second
source operand.
• In triples, the result of a sub-expression is referred to by
the position of the triple that computes it, so no temporary names are needed.
Eg: a = -b * c + d
TAC : t1 = -b
      t2 = t1 * c
      t3 = t2 + d
      a = t3
      op      arg1  arg2
(0)   uminus  b     -
(1)   *       (0)   c
(2)   +       (1)   d
(3)   =       a     (2)
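The step from quadruples to triples can be sketched in Python. This is a minimal sketch with hypothetical names (`quads_to_triples`, `position_of`). It assumes one common convention for the final assignment, namely storing it as (=, a, (2)) so that the target name is kept; other layouts are possible.

```python
def quads_to_triples(quads):
    """Replace each temporary name by the position of the triple that computes it."""
    position_of = {}                    # temporary name -> "(index)" of its triple
    triples = []
    for op, arg1, arg2, result in quads:
        arg1 = position_of.get(arg1, arg1)
        arg2 = position_of.get(arg2, arg2)
        if op == "=":
            # Assumed convention: assignment keeps its target as the first operand.
            triples.append((op, result, arg1))
        else:
            position_of[result] = f"({len(triples)})"
            triples.append((op, arg1, arg2))
    return triples

quads = [
    ("uminus", "b", None, "t1"),
    ("*", "t1", "c", "t2"),
    ("+", "t2", "d", "t3"),
    ("=", "t3", None, "a"),
]

for index, triple in enumerate(quads_to_triples(quads)):
    print(f"({index})", *triple)
```

The temporaries t1, t2 and t3 disappear; each later use becomes a reference such as (0) or (1) to the triple that produced the value.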
Three Address Code
• Indirect Triples: This representation keeps a separate list of
pointers to the triples, in the order in which they are to be executed.
Statements can then be reordered by rearranging the pointer list without
renumbering the triples. It is similar in utility to the quadruple
representation.
Eg: a = -b * c + d
TAC : t1 = -b
      t2 = t1 * c
      t3 = t2 + d
      a = t3
SDTS of Three Address Code for Arithmetic
expression
E -> E1 + T   { t1 = gentemp();
                gencode(t1 '=' E1.place '+' T.place);
                E.place = t1; }
E -> T        { E.place = T.place; }
T -> T1 * F   { t2 = gentemp();
                gencode(t2 '=' T1.place '*' F.place);
                T.place = t2; }
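The roles of gentemp, gencode and the place attribute can be sketched in Python. This is an illustration under assumptions: the expression is given as a nested tuple (operator, left, right) instead of being parsed, the class name `TacGenerator` is made up, and gencode is given the result temporary explicitly so that each emitted line is a complete three address statement.

```python
import itertools

class TacGenerator:
    def __init__(self):
        self.code = []
        self._counter = itertools.count(1)

    def gentemp(self):
        """Return a fresh temporary name t1, t2, ..."""
        return f"t{next(self._counter)}"

    def gencode(self, result, left, op, right):
        self.code.append(f"{result} = {left} {op} {right}")

    def translate(self, node):
        """Return the place that holds the value of node."""
        if isinstance(node, str):
            return node                         # an identifier is its own place
        op, left, right = node
        left_place = self.translate(left)
        right_place = self.translate(right)
        temp = self.gentemp()
        self.gencode(temp, left_place, op, right_place)
        return temp                             # E.place = temp

gen = TacGenerator()
gen.translate(("+", "a", ("*", "b", "c")))      # a + b * c
print("\n".join(gen.code))
# t1 = b * c
# t2 = a + t1
```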
Data structures required for TAC
implementation (quadruples)
• Makelist(quad. no.) / Makelist(i) (written mklist in the rules that follow) :
This creates a list with the quad number i as its only element.
It returns a pointer to the list so created.
• Merge(P1, P2) / Merge(list1, list2) :
This concatenates the lists P1 and P2 and returns the
concatenated list.
• Backpatch(list, label) / Backpatch(p, i) :
This fills the label in as the GOTO target of every quad on the list.
For eg. Truelist = {100, 103} Falselist = {101, 104}
backpatch(Truelist, L1) fills L1 into quads 100 and 103.
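The three helpers can be sketched in Python. This is a minimal sketch, assuming quads are stored as text in a dictionary keyed by quad number and that "---" marks a jump target not yet known. The quad contents (a < b, c > d) are invented to match the Truelist and Falselist numbers in the example.

```python
quads = {}      # quad number -> instruction text; "---" marks an unfilled target

def makelist(i):
    """Create a list whose only element is quad number i."""
    return [i]

def merge(p1, p2):
    """Concatenate two lists of quad numbers."""
    return p1 + p2

def backpatch(p, label):
    """Fill label in as the jump target of every quad on list p."""
    for i in p:
        quads[i] = quads[i].replace("---", str(label))

# Hypothetical quads, two conditional jumps and two unconditional ones.
quads[100] = "if a < b goto ---"
quads[101] = "goto ---"
quads[103] = "if c > d goto ---"
quads[104] = "goto ---"

truelist = merge(makelist(100), makelist(103))
falselist = merge(makelist(101), makelist(104))
backpatch(truelist, "L1")

for number in sorted(quads):
    print(f"{number}: {quads[number]}")
```

After the call, quads 100 and 103 jump to L1 while the quads on the Falselist still wait for their target.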
SDTS for Boolean Expression Relational
Operator
(Short Circuit Code/ Jumping code)
E -> E1 relop E2
{
E.true = mklist(nextquad)
E.false = mklist(nextquad + 1)
gencode(if E1.place relop.val E2.place goto ---)
gencode(goto ---)
}
relop.val is one of '<', '>', '<=', '>=', '==', '!='
Eg : Write TAC for a < b
=> 100: if a < b goto ---
   101: goto ---
E.True = {100}
E.False = {101}
SDTS for Boolean Operator AND
(Short Circuit Code/ Jumping code)
E -> E1 AND M E2
{
backpatch( E1.true, M.quad);
E.true = E2.true;
E.false = merge( E1.false , E2.false)
}
M -> ɛ { M.quad = nextquad; }
Eg : Write TAC for a < b and c > d
=> 100: if a < b goto 102
   101: goto ---
   102: if c > d goto ---
   103: goto ---
E1.True = {100}    E1.False = {101}
E2.True = {102}    E2.False = {103}
E.True = {102}     E.False = {101, 103}
SDTS for Boolean Operator OR
E -> E1 OR M E2
{
backpatch( E1.false, M.quad);
E.false = E2.false;
E.true = merge( E1.true , E2.true );
}
M -> ɛ { M.quad = nextquad; }
Eg : Write TAC for a < b OR c > d
=> 100: if a < b goto ---
   101: goto 102
   102: if c > d goto ---
   103: goto ---
E1.True = {100}       E1.False = {101}
E2.True = {102}       E2.False = {103}
E.True = {100, 102}   E.False = {103}
SDTS for Boolean Operator NOT
E -> NOT E1
{
E.false = E1.true;
E.true = E1.false;
}
Eg : for NOT (a < b), with E1.True = {100} and E1.False = {101}
E.True = {101}
E.False = {100}
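The rules for relational operators, AND, OR and NOT can be tried out together in Python. This is a sketch under assumptions: expressions are given as nested tuples instead of being parsed, quads start at 100, an unfilled target is shown as "---", and the class name `BoolGen` is made up. The marker M is modelled by reading nextquad between the two operands.

```python
class BoolGen:
    def __init__(self, start=100):
        self.start = start
        self.quads = []                     # each quad is [text, target]

    @property
    def nextquad(self):
        return self.start + len(self.quads)

    def gencode(self, text):
        self.quads.append([text, None])     # target is filled in later

    def backpatch(self, quad_list, target):
        for i in quad_list:
            self.quads[i - self.start][1] = target

    def translate(self, e):
        """Return (truelist, falselist) for expression e."""
        kind = e[0]
        if kind == "relop":
            _, left, op, right = e
            true, false = [self.nextquad], [self.nextquad + 1]
            self.gencode(f"if {left} {op} {right} goto")
            self.gencode("goto")
            return true, false
        if kind == "not":
            true, false = self.translate(e[1])
            return false, true              # swap the two lists
        t1, f1 = self.translate(e[1])
        m_quad = self.nextquad              # M -> ɛ { M.quad = nextquad }
        t2, f2 = self.translate(e[2])
        if kind == "and":
            self.backpatch(t1, m_quad)
            return t2, f1 + f2
        if kind == "or":
            self.backpatch(f1, m_quad)
            return t1 + t2, f2
        raise ValueError(f"unknown operator: {kind}")

    def listing(self):
        return [f"{self.start + i}: {text} {'---' if target is None else target}"
                for i, (text, target) in enumerate(self.quads)]

gen = BoolGen()
true, false = gen.translate(("and", ("relop", "a", "<", "b"), ("relop", "c", ">", "d")))
print("\n".join(gen.listing()))
print("E.True =", true, " E.False =", false)
```

For AND only quad 100 is filled (with 102); for OR it would be quad 101 instead, since the second operand is evaluated only when the first one is false.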
SDTS for IF THEN
S -> if E then M S1
{
backpatch(E.true, M.quad);
S.next = merge(E.false, S1.next)
}
M -> ɛ { M.quad = nextquad; }
Eg : Write TAC for if a < b then x = x + 1
=> 100: if a < b goto 102
   101: goto ---
   102: t = x + 1
   103: x = t
E.True = {100}
E.False = {101}
S.Next = {101}
SDTS for IF THEN ELSE
S -> if E then M1 S1 N else M2 S2 {
backpatch(E.true, M1.quad);
backpatch(E.false, M2.quad);
S.next = merge(S1.next, merge(N.next, S2.next)) }
M1 -> ɛ { M1.quad = nextquad; }
M2 -> ɛ { M2.quad = nextquad; }
N -> ɛ { N.next = mklist(nextquad);
gencode(goto ---); }
SDTS for FOR stmt
S -> for ( E1 ; M1 E2 ; M2 E3 ) M3 S1 {
backpatch(E2.true, M3.quad);
backpatch(M3.next, M1.quad);
backpatch(S1.next, M2.quad);
gencode(goto M2.quad);
S.next = E2.false; }
M1 -> ɛ { M1.quad = nextquad; }
M2 -> ɛ { M2.quad = nextquad; }
M3 -> ɛ { M3.next = mklist(nextquad);
gencode(goto ---);
M3.quad = nextquad;
}
FOR stmt Example
Eg : Write TAC for
for (i = 1; i < 5; i++)
    x = y + 2
=> 100: i = 1
   101: if i < 5 goto 106
   102: goto ---
   103: t1 = i + 1
   104: i = t1
   105: goto 101
   106: t2 = y + 2
   107: x = t2
   108: goto 103
S.next = {102}
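The control flow of this listing can be checked by executing it with a tiny interpreter. This is a sketch under assumptions: the quads are hand-encoded as tuples, quad 104 is taken as i = t1 (the increment i++), the starting value y = 10 is arbitrary, and execution stops when control reaches quad 102, the unfilled S.next jump. The names `run`, `copy`, `add` and `iflt` are made up for illustration.

```python
def run(program, env, start, stop):
    """Execute quads from start until control reaches quad number stop."""
    def value(x):
        return env[x] if isinstance(x, str) else x

    pc = start
    while pc != stop:
        op, *args = program[pc]
        if op == "copy":                    # dst = src
            dst, src = args
            env[dst] = value(src)
            pc += 1
        elif op == "add":                   # dst = a + b
            dst, a, b = args
            env[dst] = value(a) + value(b)
            pc += 1
        elif op == "iflt":                  # if a < b goto target
            a, b, target = args
            pc = target if value(a) < value(b) else pc + 1
        elif op == "goto":
            pc = args[0]
        else:
            raise ValueError(f"unknown op: {op}")
    return env

program = {
    100: ("copy", "i", 1),
    101: ("iflt", "i", 5, 106),
    # 102 is the exit jump (S.next), still unfilled
    103: ("add", "t1", "i", 1),
    104: ("copy", "i", "t1"),
    105: ("goto", 101),
    106: ("add", "t2", "y", 2),
    107: ("copy", "x", "t2"),
    108: ("goto", 103),
}

env = run(program, {"y": 10}, start=100, stop=102)
print(env["i"], env["x"])                   # prints 5 12
```

The body at 106 to 108 runs for i = 1, 2, 3, 4, the increment at 103 to 105 runs after each pass, and the loop leaves through quad 102 once i reaches 5.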
Array Reference
• How to refer to an array element?
1D – a[i] = 2;
Here 2 is assigned to the location referred to by a[i], where a
is the base and i is the index.
Array Reference contd.
• How to refer to an array element?
2D – a[i][j] = 2;
Here 2 is assigned to the location referred to by a[i][j],
where a is the base and i and j are the row and column indices.
Row major matrix representation
Loc of a[i][j] = base address of a + offset
= base address of a + [(i - lb1)(ub2 - lb2 + 1) + (j - lb2)] * w
= [base + (-ub2 - 1) * w] + [(i * ub2) + j] * w      (taking lb1 = lb2 = 1)
   constant part            variable part
where
lb1 and lb2 = lower bounds of row and column = 1
ub1 and ub2 = upper bounds of row and column
w = width of one element, assumed to be 1 in the example
Eg : a[2][2] with base = 1000 and ub2 = 2
= [1000 + (-2 - 1) * 1] + [(2 * 2) + 2] * 1
= 997 + 6
= 1003
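The address formula and its split into a constant and a variable part can be checked in Python. The function names are made up for illustration; the numbers are the ones from the example (base 1000, ub2 = 2, w = 1, lower bounds 1).

```python
def row_major_address(base, i, j, lb1, lb2, ub2, w):
    """General row major formula for a[i][j]."""
    return base + ((i - lb1) * (ub2 - lb2 + 1) + (j - lb2)) * w

def split_address(base, i, j, ub2, w):
    """Same address for lb1 = lb2 = 1, split into constant and variable parts."""
    constant = base + (-ub2 - 1) * w        # known at compile time
    variable = (i * ub2 + j) * w            # depends on the run-time indices
    return constant, variable

print(row_major_address(1000, 2, 2, 1, 1, 2, 1))    # prints 1003
print(split_address(1000, 2, 2, 2, 1))              # prints (997, 6)
```

Both forms agree: 997 + 6 = 1003. The constant part can be folded by the compiler, so only the variable part has to be computed by generated code.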
SDTS for Array Reference
    Production       Translation rule
1   A → L = E        If L.offset = null then
                         Gencode(L.value = E.value)
                     else
                         Gencode(L.value [ L.offset ] = E.value)
2   E → E1 + E2      E.value = newtemp()
                     Gencode(E.value '=' E1.value '+' E2.value)
3   E → E1 * E2      E.value = newtemp()
                     Gencode(E.value '=' E1.value '*' E2.value)
4   E → L            If L.offset = null then
                         E.value = L.value
                     else
                         E.value = newtemp()
                         Gencode(E.value = L.value [ L.offset ])
                     End
Array Reference
    Production          Translation rule
5   L → Alist ]         L.value = newtemp()
                        L.offset = newtemp()
                        Gencode(L.value = Alist.array - C)
                        Gencode(L.offset = Alist.value * width(Alist.array))
6   L → id              L.value = id.value
                        L.offset = null
7   Alist → Alist1 , E  t = newtemp()
                        m = Alist1.ndim + 1
                        Gencode(t = Alist1.value * limit(Alist1.array, m))
                        Gencode(t = t + E.value)
                        Alist.array = Alist1.array
                        Alist.value = t
                        Alist.ndim = m
8   Alist → id [ E      Alist.array = id.value
                        Alist.value = E.value
                        Alist.ndim = 1
Array Reference Example
Eg: Write TAC for the given code with bpw = w = 4, dim = 20
sum = 0
for (i = 0; i <= 20; i++)
{ sum = sum + a[i] + b[i] }
Array Reference Annotated Parse tree
Example
X = A[ y,z]
SDTS for Switch stmt
TAC Switch stmt
Code to evaluate E into t
goto test
L1: code for S1
goto next
L2: code for S2
goto next
…
Ln-1 : code for Sn-1
goto next
Ln: code for Sn
goto next
Ld: code for default
goto next
test: if t = V1 goto L1
…
if t = Vn-1 goto Ln-1
if t = Vn goto Ln
goto Ld
next:
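The layout above can be produced mechanically. The following Python sketch emits it for a list of (value, body) cases; the function name `switch_tac`, the placeholder line "t = E" and the sample case bodies are assumptions for illustration, and every case value gets its own test before the final jump to the default.

```python
def switch_tac(cases, default_body):
    """cases is a list of (value, body) pairs; returns the TAC lines in order."""
    lines = ["t = E", "goto test"]
    for n, (_, body) in enumerate(cases, start=1):
        lines += [f"L{n}: {body}", "goto next"]
    lines += [f"Ld: {default_body}", "goto next", "test:"]
    for n, (value, _) in enumerate(cases, start=1):
        lines.append(f"if t = {value} goto L{n}")
    lines += ["goto Ld", "next:"]           # no case matched: run the default
    return lines

code = switch_tac([(1, "x = 10"), (2, "x = 20")], "x = 0")
print("\n".join(code))
```

Placing the tests after the case bodies lets the compiler emit each body as soon as it is parsed and generate the whole test sequence once all case values are known.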
Questions
• Write an SDTS to count the number of operators in a given input
expression.
• Write TAC for the given code
i = 0; j = 0;
while (i <= 5) {
sum[i][j] = a[i] + 5;
i++; j++;
}
• Why are S-attributed definitions also L-attributed definitions?
• Write TAC for a switch case example. (Assume suitable code)