
Acknowledgements

The slides for this lecture are a modified version of the offering by
Prof. Sanjeev K Aggarwal
Semantic Analysis

• Check semantics
• Error reporting
• Disambiguate overloaded operators
• Type coercion
• Static checking
– Type checking
– Control flow checking
– Uniqueness checking
– Name checks

2
Beyond syntax analysis
• Parser cannot catch all the program errors

• There is a level of correctness that is deeper than syntax analysis

• Some language features cannot be modeled using the context free grammar formalism

  – Whether an identifier has been declared before use

  – This language is not context free


3
Beyond syntax …
• Example 1
  string x; int y;
  y = x + 3
  the use of x is a type error

• int a, b;
  a = b + c
  c is not declared

• An identifier may refer to different variables in different parts of the program

• An identifier may be usable in one part of the program but not another
4
What does the compiler need to know?
• Whether a variable has been declared

• Are there variables which have not been declared?

• What is the type of the variable?

• Whether a variable is a scalar, an array, or a function

• What declaration of the variable does each reference use?

• Is an expression type consistent?

• Is an array use like A[i,j,k] consistent with the declaration? Does it have three dimensions?

5
• How many arguments does a function take?

• Are all invocations of a function consistent with the declaration?

• If an operator/function is overloaded, which function is being invoked?

• Inheritance relationship

• Classes not multiply defined

• Methods in a class are not multiply defined

• The exact requirements depend upon the language

6
How to answer these questions?
• These issues are part of the semantic analysis phase

• Answers to these questions depend upon values like type information, number of parameters, etc.

• The compiler will have to do some computation to arrive at answers

• The information required by the computations may be non-local in some cases

7
How to … ?
• Use formal methods
  – Context sensitive grammars
  – Extended attribute grammars

• Use ad-hoc techniques
  – Symbol table
  – Ad-hoc code

• Something in between !!!
  – Use attributes
  – Do analysis along with parsing
  – Use code for attribute value computation
  – However, code is developed in a systematic way
8
What???
• The nodes in a parse tree (non-terminals in a CFG) are annotated with information
• Each production has some associated code that dictates the computation on these attributes
Why attributes ?
• For lexical analysis and syntax analysis formal techniques were used.

• However, we still had code in the form of actions along with regular expressions and context free grammars

• The attribute grammar formalism is important
  – However, it is very difficult to implement
  – But it makes many points clear
  – Makes “ad-hoc” code more organized
  – Helps in doing non-local computations
10
Attribute Grammar Framework
• Generalization of CFG where each grammar symbol has an associated set of attributes

• Values of attributes are computed by semantic rules

• Two notations for associating semantic rules with productions

  – Syntax directed definition
    • high level specifications
    • hides implementation details
    • explicit order of evaluation is not specified

  – Translation schemes
    • indicate order in which semantic rules are to be evaluated
    • allow some implementation details to be shown
11
• Conceptually both:
– parse input token stream
– build parse tree
– traverse the parse tree to evaluate the
semantic rules at the parse tree nodes

• Evaluation may:
– generate code
– save information in the symbol table
– issue error messages
– perform any other activity
12
Example
• Consider a grammar for signed binary numbers

  Number → sign list
  sign → + | -
  list → list bit | bit
  bit → 0 | 1

• Build an attribute grammar that annotates Number with the value it represents

• Associate attributes with grammar symbols

  symbol    attributes
  Number    value
  sign      negative
  list      position, value
  bit       position, value
13
production           Attribute rule

number → sign list   list.position = 0
                     if sign.negative
                     then number.value = - list.value
                     else number.value = list.value

sign → +             sign.negative = false

sign → -             sign.negative = true

list → bit           bit.position = list.position
                     list.value = bit.value

list0 → list1 bit    list1.position = list0.position + 1
                     bit.position = list0.position
                     list0.value = list1.value + bit.value

bit → 0              bit.value = 0
bit → 1              bit.value = 2^bit.position
14
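To make the flow of the position and value attributes concrete, here is a minimal C sketch (not part of the slides) that evaluates a signed binary string such as "-101" directly; the function name list_value and the string representation are assumptions for illustration only.

#include <stdio.h>
#include <string.h>

/* list.position is the inherited position attribute;
   the return value is the synthesized value attribute. */
static int list_value(const char *bits, int n) {
    int value = 0, position = 0;
    /* the rightmost bit has position 0, as in bit.position = list.position */
    for (int i = n - 1; i >= 0; i--, position++)
        if (bits[i] == '1')
            value += 1 << position;      /* bit.value = 2^bit.position */
    return value;
}

int main(void) {
    const char *s = "-101";
    int neg = (s[0] == '-');             /* sign.negative */
    int v = list_value(s + 1, (int)strlen(s) - 1);
    printf("%d\n", neg ? -v : v);        /* prints -5 */
    return 0;
}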
Evaluating Attributes
• In which order should the attributes
be computed?
Parse tree and the dependence graph for input -101

Number  Val=-5
  sign  neg=true
    -
  list  Pos=0 Val=5
    list  Pos=1 Val=4
      list  Pos=2 Val=4
        bit  Pos=2 Val=4
          1
      bit  Pos=1 Val=0
        0
    bit  Pos=0 Val=1
      1
16
Dependence Graph

• If an attribute b depends on an
attribute c then the semantic rule
for b must be evaluated after the
semantic rule for c

• The dependencies among the nodes can be depicted by a directed graph called a dependency graph

17
Algorithm to construct dependency graph

for each node n in the parse tree do
  for each attribute a of the grammar symbol at n do
    construct a node in the dependency graph for a

for each node n in the parse tree do
  for each semantic rule b = f(c1, c2, ..., ck)
      { associated with the production at n } do
    for i = 1 to k do
      construct an edge from the node for ci to the node for b
18
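The slides note later that any topological sort of this graph gives a valid evaluation order. A minimal C sketch of one way to obtain such an order (Kahn-style selection of ready nodes); the fixed-size arrays, node numbering, and the tiny example in main are assumptions for illustration.

#include <stdio.h>

#define MAXN 64

static int indegree[MAXN];
static int adj[MAXN][MAXN];              /* adj[c][b] = 1 if b depends on c */

static void add_edge(int c, int b) { adj[c][b] = 1; indegree[b]++; }

static void topo_order(int n) {
    int done[MAXN] = {0};
    for (int k = 0; k < n; k++) {
        int i = 0;
        while (i < n && (done[i] || indegree[i] > 0)) i++;   /* a ready node */
        if (i == n) { printf("cycle: no valid order\n"); return; }
        printf("evaluate attribute node %d\n", i);
        done[i] = 1;
        for (int j = 0; j < n; j++)      /* its successors lose one pending dependency */
            if (adj[i][j]) indegree[j]--;
    }
}

int main(void) {
    add_edge(0, 1);                      /* attribute 1 depends on attribute 0 */
    add_edge(1, 2);                      /* attribute 2 depends on attribute 1 */
    topo_order(3);                       /* prints 0, 1, 2 */
    return 0;
}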
Example
• Suppose A.a = f(X.x , Y.y) is a semantic rule
for A → X Y
A A.a

X Y X.x Y.y

• If production A → X Y has the semantic


rule X.x = g(A.a, Y.y)

A A.a

X Y X.x Y.y
19
Example
• Whenever following production is used in a parse
tree
E→ E1 + E2 E.val = E1.val + E2.val
we create a dependency graph

E.val

E1.val E2.val

20
Example

D → T L        L.in = T.type

T → real       T.type = real

T → int        T.type = int

L → L1 , id    L1.in = L.in; addtype(id.entry, L.in)

L → id         addtype(id.entry, L.in)
Example
• dependency graph for real id1, id2, id3
• put a dummy node for a semantic rule that
consists of a procedure call
[Figure: parse tree for the declaration with T.type = real copied into L.in at every node of the L chain, and a dummy node for each addtype(id.entry, real) call]

22
Evaluation Order
• Any topological sort of dependency graph
gives a valid order in which semantic rules
must be evaluated

a4 = real
a5 = a4
addtype(id3.entry, a5)
a7 = a5
addtype(id2.entry, a7)
a9 = a7
addtype(id1.entry, a9)
23
Attributes …
• attributes fall into two classes:
synthesized and inherited

• value of a synthesized attribute is computed from the values at its children nodes

• value of an inherited attribute is computed from the sibling and parent nodes

24
Attributes …
• Each grammar production A → α has associated
with it a set of semantic rules of the form

b = f (c1, c2, ..., ck)

where f is a function, and either

– b is a synthesized attribute of A
OR
– b is an inherited attribute of one of the grammar
symbols on the right

• attribute b depends on attributes c 1, c2, ..., ck


25
Synthesized Attributes

• value of a synthesized attribute is


computed from the values of its children
nodes

26
Syntax Directed Definitions for a desk calculator program
L → E n      Print(E.val)
E → E1 + T   E.val = E1.val + T.val
E → T        E.val = T.val
T → T1 * F   T.val = T1.val * F.val
T → F        T.val = F.val
F → (E)      F.val = E.val
F → digit    F.val = digit.lexval

• terminals are assumed to have only synthesized attributes, whose values are supplied by the lexical analyzer

• the start symbol does not have any inherited attribute
27


Parse tree for 3 * 4 + 5 n

L                 Print 17
  E  Val=17
    E  Val=12
      T  Val=12
        T  Val=3
          F  Val=3
            digit (3)
        *
        F  Val=4
          digit (4)
    +
    T  Val=5
      F  Val=5
        digit (5)
  n
28
Inherited Attributes
• an inherited attribute is one whose value is defined in terms
of attributes at the parent and/or siblings

• Used for finding out the context in which it appears

• possible to use only S-attributes but more natural to use inherited attributes

29
Example

D → T L        L.in = T.type

T → real       T.type = real

T → int        T.type = int

L → L1 , id    L1.in = L.in; addtype(id.entry, L.in)

L → id         addtype(id.entry, L.in)
Parse tree for real x, y, z

D
  T  type=real
    real
  L  in=real      addtype(z, real)
    L  in=real    addtype(y, real)
      L  in=real  addtype(x, real)
        x
      ,
      y
    ,
    z

31
production           Attribute rule

number → sign list   list.position = 0
                     if sign.negative
                     then number.value = - list.value
                     else number.value = list.value

sign → +             sign.negative = false

sign → -             sign.negative = true

list → bit           bit.position = list.position
                     list.value = bit.value

list0 → list1 bit    list1.position = list0.position + 1
                     bit.position = list0.position
                     list0.value = list1.value + bit.value

bit → 0              bit.value = 0
bit → 1              bit.value = 2^bit.position
32
Spot the synthesized and inherited attributes
Important Compiler Data-Structures
• Symbol-table
• Intermediate Representations
– Abstract Syntax Tree
– Three Address Code
Symbol Table
• Stores information for subsequent phases

• Interface to the symbol table


  – Insert(s,t): save lexeme s and token t and return a pointer
  – Lookup(s): return the index of the entry for lexeme s, or 0 if s is not found

Implementation of symbol table

• Fixed amount of space to store lexemes. Not advisable as it wastes space.

• Store lexemes in a separate array. Each lexeme is separated by eos. The symbol table has pointers to the lexemes.
[Figure: two layouts — (1) fixed space for each lexeme (usually 32 bytes) plus other attributes; (2) a pointer (usually 4 bytes) plus other attributes, with lexemes stored in a separate array as lexeme1 eos lexeme2 eos lexeme3 …]


How to handle keywords?
• Consider tokens DIV and MOD with lexemes div and mod.

• Initialize the symbol table with insert("div", DIV) and insert("mod", MOD).

• Any subsequent lookup returns a nonzero value; therefore div and mod cannot be used as identifiers.
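A minimal C sketch of the insert/lookup interface and the keyword trick described above; the token codes, the fixed table size, and the reserved slot 0 are assumptions for illustration, not the layout used in the slides.

#include <stdio.h>
#include <string.h>

enum { DIV = 1, MOD = 2 };               /* assumed token codes */

struct entry { const char *lexeme; int token; };
static struct entry symtab[100];
static int nentries = 1;                 /* slot 0 reserved so lookup can return 0 */

static int lookup(const char *s) {
    for (int i = 1; i < nentries; i++)
        if (strcmp(symtab[i].lexeme, s) == 0) return i;
    return 0;                            /* 0 means "not found" */
}

static int insert(const char *s, int tok) {
    symtab[nentries].lexeme = s;
    symtab[nentries].token = tok;
    return nentries++;
}

int main(void) {
    insert("div", DIV);                  /* reserve the keywords up front ...  */
    insert("mod", MOD);                  /* ... so they can never be identifiers */
    printf("%d %d\n", lookup("mod"), lookup("count"));   /* nonzero, then 0 */
    return 0;
}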
Abstract Syntax Tree
• Condensed form of parse tree,
• useful for representing language constructs.
• The production S → if B then s1 else s2
may appear as

if-then-else

B s1 s2

37
Abstract Syntax tree …
• Chain of single productions may be collapsed, and
operators move to the parent nodes

[Figure: the parse tree for id1 + id2 * id3 beside its abstract syntax tree, + (id1, * (id2, id3))]

38
Constructing Abstract Syntax tree for expression
• Each node can be represented as a record

• operators: one field for the operator, remaining fields are pointers to the operands
  mknode(op, left, right)

• identifier: one field with label id and another pointer to the symbol table
  mkleaf(id, entry)

• number: one field with label num and another to keep the value of the number
  mkleaf(num, val)
39
C prototype
#include <stdlib.h>     /* for malloc */

struct node {
    char op;
    struct node *left;
    struct node *right;
};

struct node *mknode(char op, struct node *left, struct node *right)
{
    struct node *ptr = (struct node *) malloc(sizeof(struct node));
    ptr->op = op;
    ptr->left = left;
    ptr->right = right;
    return ptr;
}
Example
The following sequence of function calls creates a parse tree for a - 4 + c

P1 = mkleaf(id, entry.a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id, entry.c)
P5 = mknode(+, P3, P4)

[Figure: the resulting tree — P5 is + with children P3 and P4; P3 is - with children P1 (id, entry of a) and P2 (num 4); P4 is id, entry of c]
41
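A C sketch of the leaf constructors described above and of the call sequence P1 … P5; the tagged-union node layout and the function names mkleaf_id/mkleaf_num are assumptions for illustration, not the representation used in the slides.

#include <stdlib.h>

enum kind { OP, ID, NUM };

struct ast {
    enum kind tag;
    union {
        struct { char op; struct ast *left, *right; } node;   /* mknode      */
        struct { void *entry; } id;                           /* mkleaf(id)  */
        struct { int val; } num;                              /* mkleaf(num) */
    } u;
};

static struct ast *mknode(char op, struct ast *l, struct ast *r) {
    struct ast *p = malloc(sizeof *p);
    p->tag = OP; p->u.node.op = op; p->u.node.left = l; p->u.node.right = r;
    return p;
}

static struct ast *mkleaf_id(void *entry) {
    struct ast *p = malloc(sizeof *p);
    p->tag = ID; p->u.id.entry = entry;
    return p;
}

static struct ast *mkleaf_num(int val) {
    struct ast *p = malloc(sizeof *p);
    p->tag = NUM; p->u.num.val = val;
    return p;
}

/* Building the tree for a - 4 + c, mirroring the call sequence P1 ... P5. */
static struct ast *build_example(void *entry_a, void *entry_c) {
    struct ast *p1 = mkleaf_id(entry_a);
    struct ast *p2 = mkleaf_num(4);
    struct ast *p3 = mknode('-', p1, p2);
    struct ast *p4 = mkleaf_id(entry_c);
    return mknode('+', p3, p4);          /* p5 */
}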
A syntax directed definition
for constructing syntax tree
E → E1 + T E.ptr = mknode(+, E1.ptr, T.ptr)
E →T E.ptr = T.ptr
T → T1 * F T.ptr := mknode(*, T1.ptr, F.ptr)
T →F T.ptr := F.ptr
F → (E) F.ptr := E.ptr
F → id F.ptr := mkleaf(id, entry.id)
F → num F.ptr := mkleaf(num,val)

42
DAG for Expressions
Expression: a + a * ( b – c ) + ( b – c ) * d
make a leaf or node if not present, otherwise return a pointer to the existing node

P1 = makeleaf(id, a)
P2 = makeleaf(id, a)        (returns the existing P1)
P3 = makeleaf(id, b)
P4 = makeleaf(id, c)
P5 = makenode(-, P3, P4)
P6 = makenode(*, P2, P5)
P7 = makenode(+, P1, P6)
P8 = makeleaf(id, b)        (returns the existing P3)
P9 = makeleaf(id, c)        (returns the existing P4)
P10 = makenode(-, P8, P9)   (returns the existing P5)
P11 = makeleaf(id, d)
P12 = makenode(*, P10, P11)
P13 = makenode(+, P7, P12)

[Figure: the resulting DAG — the leaves a, b, c, d and the subexpression b – c are shared]
43
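A C sketch of "make a node only if not already present": before creating a node, search the nodes built so far for one with the same label and children and reuse it. The linear search over a fixed-size pool is an assumption for illustration; real compilers typically hash.

#include <stdlib.h>
#include <string.h>

struct dnode { char op; struct dnode *left, *right; const char *id; };

static struct dnode *pool[256];
static int npool;

static struct dnode *makeleaf(const char *id) {
    for (int i = 0; i < npool; i++)
        if (pool[i]->id && strcmp(pool[i]->id, id) == 0) return pool[i];
    struct dnode *p = calloc(1, sizeof *p);
    p->id = id;
    return pool[npool++] = p;
}

static struct dnode *makenode(char op, struct dnode *l, struct dnode *r) {
    for (int i = 0; i < npool; i++)
        if (pool[i]->op == op && pool[i]->left == l && pool[i]->right == r)
            return pool[i];              /* common subexpression: reuse the node */
    struct dnode *p = calloc(1, sizeof *p);
    p->op = op; p->left = l; p->right = r;
    return pool[npool++] = p;
}

/* With these, the two occurrences of b - c in a + a*(b-c) + (b-c)*d
   map to the same node, as in the DAG above. */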
Three address code
• It is a sequence of statements of the general form X := Y op Z where

  – X, Y or Z are names, constants or compiler generated temporaries

  – op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator
44
Three address code …
• Only one operator on the right hand side is allowed

• A source expression like x + y * z might be translated into
  t1 := y * z
  t2 := x + t1
  where t1 and t2 are compiler generated temporary names

• Unraveling of complicated arithmetic expressions and of control flow makes 3-address code desirable for code generation and optimization

• The use of names for intermediate values allows 3-address code to be easily rearranged

• Three address code is a linearized representation of a syntax tree where explicit names correspond to the interior nodes of the graph
45
Three address instructions
• Assignment
  – x = y op z
  – x = op y
  – x = y
• Jump
  – goto L
  – if x relop y goto L
• Indexed assignment
  – x = y[i]
  – x[i] = y
• Function
  – param x
  – call p, n
  – return y
• Pointer
  – x = &y
  – x = *y
  – *x = y
46
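A C sketch of one common in-memory form for three-address instructions (quadruples); the field names and the small opcode set are assumptions for illustration, not a prescribed representation.

enum opcode { ADD, MUL, COPY, GOTO, IFRELOP, PARAM, CALL, RET };

struct quad {
    enum opcode op;      /* the single operator allowed per instruction */
    const char *arg1;    /* names, constants or compiler temporaries    */
    const char *arg2;
    const char *result;  /* x in  x := y op z, or the jump target       */
};

/* x + y * z as two quadruples: */
static const struct quad code[] = {
    { MUL, "y", "z",  "t1" },
    { ADD, "x", "t1", "t2" },
};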
float a[20][10];
use of a[i][j+2]

HIR               MIR              LIR
t1 ← a[i, j+2]    t1 ← j + 2       r1 ← [fp-4]
                  t2 ← i * 20      r2 ← r1 + 2
                  t3 ← t1 + t2     r3 ← [fp-8]
                  t4 ← 4 * t3      r4 ← r3 * 20
                  t5 ← addr a      r5 ← r4 + r2
                  t6 ← t4 + t5     r6 ← 4 * r5
                  t7 ← *t6         r7 ← fp - 216
                                   f1 ← [r7 + r6]

47
Some thoughts...
• Do we really need to build the whole
parse tree?
• Can the computation on the
attributes be done on-the-fly (along
with parsing)?
• Can we do this at least for some
restricted SDDs?
A thought...
• For constructing the parse tree, the parser traverses the nodes in the parse tree in some order...
• Let us add the semantic actions to the parse tree, so that the parser executes these actions when it visits them...
• When will the above constitute a correct translation scheme?
• When these actions, if traversed in the same order that the parser uses to traverse the tree, form a valid topological ordering on the dependencies.
New Questions...
• What is the order in which the parser
traverses the parse tree?
– See next slide

• How should a translation scheme be represented so that semantic actions appear in the parse tree?
  – Add the semantic actions as new symbols to the production rules
Order in which parser
"creates" the parse tree nodes
• When translation takes place during parsing,
order of evaluation is linked to the order in
which nodes are created
• What is the order in which nodes are created in
LL? LR?
• A natural order in both top-down and bottom-up
parsing is depth first-order
– LL parsing expands the leftmost non-terminal first
• (A->BC): B is expanded before C
– LR parsing reduces the leftmost non-terminal first (thus
mimicking the rightmost derivation in reverse)
• (A->BC): the non-terminal B is generated before C
Translation schemes (SDT)
• A CFG where semantic actions occur within the rhs of productions

• A translation scheme to map infix to postfix

  E → T R
  R → addop T {print(addop)} R | ε
  T → num {print(num)}

  parse tree for 9 – 5 + 2


52
Parse tree for 9-5+2

E
  T
    num (9)
    {print(num)}
  R
    addop (-)
    T
      num (5)
      {print(num)}
    {print(addop)}
    R
      addop (+)
      T
        num (2)
        {print(num)}
      {print(addop)}
      R
        Є
53
• Assume actions are terminal symbols

• Perform a depth first order traversal to obtain 9 5 – 2 +

• When designing a translation scheme, ensure the attribute value is available when referred to

• In case of synthesized attributes it is trivial (why ?)
54
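A C sketch of the infix-to-postfix scheme above run by a predictive parser, with each action executed at exactly the point where it appears in the rule. Single-digit numbers and a global input pointer are simplifying assumptions.

#include <ctype.h>
#include <stdio.h>

static const char *in = "9-5+2";

static void T(void) {
    if (isdigit((unsigned char)*in))
        printf("%c ", *in++);            /* T -> num {print(num)} */
}

static void R(void) {
    if (*in == '+' || *in == '-') {      /* R -> addop T {print(addop)} R */
        char addop = *in++;
        T();
        printf("%c ", addop);
        R();
    }                                    /* R -> epsilon: do nothing */
}

static void E(void) { T(); R(); }        /* E -> T R */

int main(void) { E(); putchar('\n'); return 0; }   /* prints 9 5 - 2 + */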
S-attributed Grammar: where should the actions go?
E → E1 + T E.ptr = mknode(+, E1.ptr, T.ptr)
E →T E.ptr = T.ptr
T → T1 * F T.ptr := mknode(*, T1.ptr, F.ptr)
T →F T.ptr := F.ptr
F → (E) F.ptr := E.ptr
F → id F.ptr := mkleaf(id, entry.id)
F → num F.ptr := mkleaf(num,val)

55
S-attributed definition
• a syntax directed definition that uses only
synthesized attributes is said to be an S-
attributed definition
• A parse tree for an S-attributed definition
can be annotated by evaluating semantic rules
for attributes
• A translation scheme for an S-attributed
SDD can be obtained simply by appending the
semantic actions to the right-hand-side of
each production rule

56
• In case of both inherited and synthesized attributes

• An inherited attribute for a symbol on the rhs of a production must be computed in an action before that symbol

  S → A1 A2   {A1.in = 1, A2.in = A1.in}
  A → a       {print(A.in)}

  S
    A1
      a
      {print(A1.in)}
    A2
      a
      {print(A2.in)}
    {A1.in = 1; A2.in = A1.in}

  a depth first order traversal gives error: undefined (print(A1.in) runs before A1.in is set)

• A synthesized attribute for the non terminal on the lhs can be computed after all attributes it references have been computed. The action normally should be placed at the end of the rhs
57
Example: Translation scheme
for EQN
S→B B.pts = 10
S.ht = B.ht

B → B1 B2 B1.pts = B.pts
B2.pts = B.pts
B.ht = max(B1.ht,B2.ht)

B → B1 sub B2   B1.pts = B.pts
                B2.pts = shrink(B.pts)
                B.ht = disp(B1.ht, B2.ht)

B → text B.ht = text.h * B.pts

58
after putting actions in the right place

S → {B.pts = 10} B
{S.ht = B.ht}

B → {B1.pts = B.pts} B1
{B2.pts = B.pts} B2
{B.ht = max(B1.ht,B2.ht)}

B → {B1.pts = B.pts} B1 sub
    {B2.pts = shrink(B.pts)} B2
    {B.ht = disp(B1.ht, B2.ht)}

B → text {B.ht = text.h * B.pts}


59
How to allocate memory for
attributes
• The lifetime of an attribute
is dictated by the time a respective
symbol is present in the parser stack
• Why not allocate them on the parser
stack?
– saves on expensive malloc/free
operations
– allows for optimizations. Ex:
• E->F {E.ptr = F.ptr}
Bottom-up evaluation of S-attributed definitions
• Can be evaluated while parsing

• Whenever a reduction is made, the value of the new synthesized attribute is computed from the attributes on the stack

• Extend the stack to hold the values also

  [Figure: parser stack with two parallel columns, a state stack and a value stack; the current top of stack is indicated by ptr top]

61
• Suppose the semantic rule A.a = f(X.x, Y.y, Z.z) is associated with the production A → XYZ

• Before reducing XYZ to A, the value of Z is in val(top), the value of Y is in val(top-1) and the value of X is in val(top-2)

• If a symbol has no attribute then the entry is undefined

• After the reduction, top is decremented by 2 and the state covering A is put in val(top)
62
Example: desk calculator
L → En print(val(top))
E→E+T val(ntop) = val(top-2) + val(top)
E→T val(ntop) = val(top)
T→T*F val(ntop) = val(top-2) * val(top)
T→F val(ntop) = val(top)
F → (E) val(ntop) = val(top-1)
F → digit val(ntop) = val(top)

Before reduction ntop = top - r +1


After code reduction top = ntop
63
Example: desk calculator
L → En print(val(top))
E→E+T val(ntop) = val(top-2) + val(top)
E→T
T→T*F val(ntop) = val(top-2) * val(top)
T→F
F → (E) val(ntop) = val(top-1)
F → digit

Before reduction ntop = top - r +1


After code reduction top = ntop
64
INPUT STATE Val PRODUCTION

3*5+4n
*5+4n digit 3
*5+4n F 3 F → digit
*5+4n T 3 T→F
5+4n T* 3–
+4n T*digit 3–5
+4n T*F 3–5 F → digit
+4n T 15 T→T*F
+4n E 15 E→ T
4n E+ 15 –
n E+digit 15 – 4
n E+F 15 – 4 F → digit
n E+T 15 – 4 T→F
n E 19 E → E +T 65
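A C sketch of the value-stack manipulation on a single reduction, for the rule E → E + T with action val(ntop) = val(top-2) + val(top). The array and index handling mirror the slides; this is not a full LR parser.

#define STACK_MAX 128

static int val[STACK_MAX];
static int top;                          /* index of the current stack top */

static void reduce_E_plus_T(void) {
    int r = 3;                           /* length of the handle E + T        */
    int ntop = top - r + 1;              /* where E will sit after reduction  */
    val[ntop] = val[top - 2] + val[top]; /* E.val = E1.val + T.val            */
    top = ntop;                          /* pop the handle, push E            */
}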
L-attributed definitions

• A natural order in both top-down and bottom-up parsing is depth-first order

• L-attributed definition
– where attributes can be evaluated in depth-
first order
– can have both synthesized and inherited
attributes
66
L attributed definitions …
• A syntax directed definition is L-attributed if each
inherited attribute of Xj (1 ≤ j ≤ n) at the right hand side of
A→X1 X2…Xn depends only on

– Attributes of symbols X1 X2…Xj-1 and

– Inherited attribute of A

• Consider the translation schemes

  A → LM    L.i = f1(A.i)
            M.i = f2(L.s)
            A.s = f3(M.s)

  A → QR    R.i = f4(A.i)
            Q.i = f5(R.s)
            A.s = f6(Q.s)

  (the first is L-attributed; the second is not, since Q.i depends on R.s, an attribute of a symbol to its right)
67
Left recursion
• A top down parser with production
A → A α may loop forever

• From the grammar
  A → A α | β
  left recursion may be eliminated by transforming the grammar to
  A → β R
  R → α R | ε
68
Parse tree corresponding Parse tree corresponding
to a left recursive grammar to the modified grammar

[Figure: left-recursive tree — a spine of A nodes with β at the bottom and an α at each level; modified grammar — A → β R with a right-recursive spine of R nodes, each producing an α, ending in Є]

Both the trees generate string βα*


69
Removal of left recursion
Suppose we have the translation scheme:

A → A1 Y   {A = g(A1, Y)}
A → X      {A = f(X)}

After removal of left recursion it becomes

A → X   {R.i = f(X)}
    R   {A.s = R.s}
R → Y   {R1.i = g(Y, R)}
    R1  {R.s = R1.s}
R → ε   {R.s = R.i}
70
Top down Translation
Use predictive parsing to implement L-
attributed definitions

E → E1 + T E.val := E1.val + T.val

E → E1 – T E.val := E1.val – T.val

E→T E.val := T.val

T → (E) T.val := E.val

T → num T.val := num.lexval


71
Eliminate left recursion
E → T   {R.i = T.val}
    R   {E.val = R.s}

R → + T {R1.i = R.i + T.val}
    R1  {R.s = R1.s}

R → - T {R1.i = R.i - T.val}
    R1  {R.s = R1.s}

R → ε   {R.s = R.i}

T → (E) {T.val = E.val}

T → num {T.val = num.lexval}
72


Parse tree for 9-5+2

E  (E.val = R.s)
  T  (T.val = 9)
    num (9)
  R  (R.i = T.val)
    -
    T  (T.val = 5)
      num (5)
    R1  (R1.i = R.i - T.val, R.s = R1.s)
      +
      T  (T.val = 2)
        num (2)
      R1  (R1.i = R.i + T.val, R.s = R1.s)
        Є  (R.s = R.i)
73
E  (E.val = R.s = 6)
  T  (Val = 9)
    num (9)
  R  (R.i = T.val = 9, R.s = R1.s = 6)
    -
    T  (Val = 5)
      num (5)
    R  (R.i = R.i - T.val = 4, R.s = R1.s = 6)
      +
      T  (Val = 2)
        num (2)
      R  (R.i = R.i + T.val = 6, R.s = R.i = 6)
        Є
74
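A C sketch of the same L-attributed scheme run top-down: the inherited attribute R.i becomes a parameter and the synthesized attribute R.s becomes the return value. Single-digit numbers and a global input pointer are simplifying assumptions.

#include <ctype.h>
#include <stdio.h>

static const char *src = "9-5+2";

static int T(void) {                     /* T -> num   {T.val = num.lexval}      */
    return isdigit((unsigned char)*src) ? *src++ - '0' : 0;
}

static int R(int i) {                    /* i is the inherited attribute R.i     */
    if (*src == '+') { src++; return R(i + T()); }   /* R1.i = R.i + T.val */
    if (*src == '-') { src++; return R(i - T()); }   /* R1.i = R.i - T.val */
    return i;                            /* R -> epsilon: R.s = R.i              */
}

static int E(void) {                     /* E -> T {R.i = T.val} R {E.val = R.s} */
    return R(T());
}

int main(void) { printf("%d\n", E()); return 0; }    /* prints 6, as in the tree */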
When is a semantic action executed?
B → X {a} Y

• [Top-Down parsing]
  – if Y is a non-terminal: perform 'a' just before we attempt to expand this occurrence of Y (i.e. just before we pop off Y to expand it)
  – if Y is a terminal: perform 'a' just before we check for Y in the input (i.e. just before we pop off Y from the stack, matching it with the input)

• [Bottom-Up parsing]
  – perform action 'a' as soon as this occurrence of X appears on top of the parsing stack (i.e. some handle is reduced to X)
Bottom up evaluation of L-attributed SDD: when to apply the semantic actions embedded inside rules
A → BC { B.i = f(A.i) }
A → BD { B.i = g(A.i) }
• When the viable prefix on the stack is "B", it is a possibility to develop into one of the handles "BC" or "BD"; the respective semantic action needs to be applied
• When "B" is on the stack, we don't know what the incoming terminals will be... so, we cannot apply the action now ...
• Defer it till we get the handle "BC"
• But, we don't know what rule "A" gets reduced to:
  – X → ...A... { A.i = h(X) }
  – Y → ...A... { A.i = k(Y) }
• Defer the decision further till we have reduced to X or Y?
• Reasoning this way, we may have to wait till the entire input is seen --- the same as building the whole parse tree before semantic analysis
• Note that there is no problem with actions appearing at the end, as a unique handle is identified by then
76
Two main problems
• "conflict" on semantic actions
– Consider the state in the parser DFA
• S -> . {S.val = "+"} aA
• S -> . {S.val = "-"} bB
– The parser could have simply done a shift, but
now there is a conflict on which action to
perform
– Solution: use markers to "delegate" the problem
to the parser (Caveat: may not always work as an
LR grammar may not remain LR after introducing
markers)
• No slot on the value stack for the parent
  – In the above example, the question is where to store "S.val", as "S" will be pushed on the stack only after "aA" or "bB" reduces to "S".
  – Solution: Introduce a marker symbol M in the parent rule of "S" and alias this slot in the value stack to store the inherited attribute of S
Bottom up evaluation of inherited attributes
• Remove embedded actions from the translation scheme

• Make transformations so that embedded actions occur only at the ends of their productions

• Replace each embedded action by a distinct marker non-terminal M and attach the action at the end of M → ε

78
Therefore,

E → T R
R → + T {print(+)} R
R → - T {print(-)} R
R → Є
T → num {print(num.val)}

transforms to

E → T R
R → + T M R
R → - T N R
R → Є
T → num {print(num.val)}
M → Є {print(+)}
N → Є {print(-)}

79
Markers
• Markers are non-terminals that
  – derive only Є
  – appear only once among all bodies of all productions

Theorem: When a grammar is LL, marker non-terminals can be added at any position in the body, and the resulting grammar will still be LR. (Why?: Homework -- see book)
Inheriting attributes on parser stacks
• A bottom up parser reduces the rhs of A → XY by removing XY from the stack and putting A on the stack

• Synthesized attributes of X can be inherited by Y by using the copy rule Y.i = X.s

Example: take the string real p,q,r

D → T {L.in = T.type} L

T → int {T.type = integer}

T → real {T.type = real}

L → {L1.in = L.in} L1 , id {addtype(id.entry, L.in)}

L → id {addtype(id.entry, L.in)}
81
State stack INPUT PRODUCTION
real p,q,r
real p,q,r
T p,q,r T → real
Tp ,q,r
TL ,q,r L → id
TL, q,r
TL,q ,r
TL ,r L → L,id
TL, r
TL,r -
TL - L → L,id
D - D →TL

Every time a string is reduced to L, T.val is just below it on the stack
82
Example …
• Every time a reduction to L is made, the value of T.type is just below it

• Use the fact that T.val (the type information) is at a known place in the stack

• When the production L → id is applied, id.entry is at the top of the stack and T.type is just below it, therefore,
  addtype(id.entry, L.in) ⇔ addtype(val[top], val[top-1])

• Similarly when the production L → L1 , id is applied, id.entry is at the top of the stack and T.type is three places below it, therefore,
  addtype(id.entry, L.in) ⇔ addtype(val[top], val[top-3])

83
Example …
Therefore, the translation scheme
becomes

D→TL

T → int val[top] =integer

T → real val[top] =real

L → L,id addtype(val[top], val[top-3])

L → id addtype(val[top], val[top-1])
84
Simulating the evaluation of inherited attributes
• The scheme works only if the grammar allows the position of the attribute to be predicted.

• Consider the grammar

  S → aAC    C.i = A.s
  S → bABC   C.i = A.s
  C → c      C.s = g(C.i)

• C inherits A.s
• There may or may not be a B between A and C on the stack when reduction by the rule C → c takes place

• When reduction by C → c is performed, the value of C.i is either in val[top-1] or val[top-2]
85
Simulating the evaluation …
• Insert a marker M just before C in the second rule and change the rules to

  S → aAC    C.i = A.s
  S → bABMC  M.i = A.s; C.i = M.s
  C → c      C.s = g(C.i)
  M → ε      M.s = M.i

• When the production M → ε is applied we have M.s = M.i = A.s

• Therefore the value of C.i is always at val[top-1]
86
Simulating the evaluation …
• Markers can also be used to simulate rules that are not copy rules

  S → aAC    C.i = f(A.s)

• using a marker

  S → aANC   N.i = A.s; C.i = N.s
  N → ε      N.s = f(N.i)
87
General algorithm
• Algorithm: Bottom up parsing and translation with inherited attributes

• Input: L-attributed definitions

• Output: A bottom up parser

• Assume every non terminal has one inherited attribute and every grammar symbol has a synthesized attribute

• For every production A → X1 … Xn introduce n markers M1 … Mn and replace the production by
  A → M1 X1 ….. Mn Xn
  M1 … Mn → Є

• The synthesized attribute Xj.s goes into the value entry of Xj

• The inherited attribute Xj.i goes into the value entry of Mj
88
Algorithm …
• If the reduction is to a marker Mj and the marker belongs to a production
  A → M1 X1 … Mn Xn, then

  A.i is in position top-2j+2
  X1.i is in position top-2j+3
  X1.s is in position top-2j+4

• If the reduction is to a non terminal A by the production A → M1 X1 … Mn Xn,
  then compute A.s and push it on the stack

89
Space for attributes at compile time
• The lifetime of an attribute begins when it is first computed

• The lifetime of an attribute ends when all the attributes depending on it have been computed

• Space can be conserved by assigning space for an attribute only during its lifetime
90
Example
• Consider following definition

D →T L L.in := T.type
T → real T.type := real
T → int T.type := int
L → L1,I L1.in :=L.in; I.in=L.in
L→I I.in = L.in
I → I1[num] I1.in=array(numeral, I.in)
I → id addtype(id.entry,I.in)
91
Consider string int x[3], y[5]
its parse tree and dependence graph
[Figure: parse tree for int x[3], y[5] annotated with nine attribute instances (numbered 1–9) and the dependence edges among them]

92
Resource requirement

1 2 3 4 5 6 7 8 9

Allocate resources using life time information

R1 R1 R2 R3 R2 R1 R1 R2 R1

Allocate resources using life time and copy information

R1 =R1 =R1 R2 R2 =R1 =R1 R2 R1


93
Space for attributes at compiler construction time
• Attributes can be held on a single stack. However, a lot of attributes are copies of other attributes

• For a rule like A → B C the stack grows up to a height of five (assuming each symbol has one inherited and one synthesized attribute)

• Just before reduction by the rule A → B C the stack contains I(A) I(B) S(B) I(C) S(C)

• After the reduction the stack contains I(A) S(A)

94
Example
• Consider the rule B → B1 B2 with inherited attribute ps and synthesized attribute ht

• The parse tree for this rule and a snapshot of the stack at each node appear as

  [Figure: the node B with children B1 and B2, and the single attribute stack shown at each point — just before the reduction it holds B.ps B1.ps B1.ht B2.ps B2.ht, and after the reduction B.ps B.ht]
95
Example …
• However, if different stacks are maintained for the inherited and synthesized attributes, the stacks will normally be smaller

  [Figure: the same tree with a separate inherited-attribute stack (holding B.ps) and synthesized-attribute stack (holding B1.ht, B2.ht, then B.ht)]
96
Type system
• A type is a set of values

• Certain operations are legal for values of each type

• A language’s type system specifies which operations are valid for a type

• The aim of type checking is to ensure that operations are used on variables/expressions of the correct types

97
Type system …
• Languages can be divided into three
categories with respect to the type:

– “untyped”
• No type checking needs to be done
• Assembly languages

– Statically typed
• All type checking is done at compile time
• Algol class of languages
• Also, called strongly typed

– Dynamically typed
• Type checking is done at run time
• Mostly functional languages like Lisp, Scheme etc.
98
Type systems …
• Static typing
– Catches most common programming errors at compile
time
– Avoids runtime overhead
– May be restrictive in some situations
– Rapid prototyping may be difficult

• Most code is written using statically typed languages

• In fact, most people insist that code be strongly type checked at compile time even if the language is not strongly typed (use of Lint for C code, code compliance checkers)
99
Type System
• A type system is a collection of rules for assigning type expressions to various parts of a program (often shown as logical inference rules)

• Different type systems may be used by different compilers for the same language

• In Pascal the type of an array includes the index set. Therefore, a function with an array parameter can only be applied to arrays with that index set

• Many Pascal compilers allow the index set to be left unspecified when an array is passed as a parameter
100
Utility of a (static) type system
• Type-checking is performed at compile
time to prevent something "bad" from
happening at run-time!

• For example,
– A pointer should not be added to another
pointer
– An uninitialized variable must not be used
– A null-pointer should not be dereferenced
– A closed file is not written into
Properties of a type-system
• Soundness: A type-correct program cannot violate the property

• Completeness: All correct programs will be declared type-correct (or, if a program is found to be not type-correct, it will surely violate the property)
Typing Rules
• If both the operands of the arithmetic operators +, -, x are integers then the result is of type integer

  Γ ≻ e1 : int    Γ ≻ e2 : int
  Γ ≻ e1 + e2 : int

• The result of the unary & operator is a pointer to the object referred to by the operand.

  – If the type of the operand is X, the type of the result is pointer to X

  – We may need to construct such types from the basic types

  Γ ≻ x : ptr(τ)    Γ ≻ y : τ
  Γ ≻ x = &y : stmt
103
Type system and type checking

Please note: read ˫ for ≻

• Basic types: integer, char, float, boolean

• Subrange type: 1 … 100

• Enumerated type: (violet, indigo, red)

• Constructed types: array, record, pointers, functions

104
Type expression
• The type of a language construct is denoted by a type expression

• It is either a basic type, or it is formed by applying operators called type constructors to other type expressions

• A type constructor applied to a type expression is a type expression

• A basic type is a type expression. There are two other special basic types:
  – type error: error during type checking (failed typing derivation)
  – void: no type value (shown as α in the typing rules)
105
Type Constructors
• Array: if T is a type expression then array(I, T) is a type expression denoting the type of an array with elements of type T and index set I

  var A: array [1 .. 10] of integer

  A has the type expression array(1 .. 10, integer)

• Product: if T1 and T2 are type expressions then their Cartesian product T1 x T2 is a type expression

106
Type constructors …
• Records: it applies to a tuple formed from field names and
field types. Consider the declaration
type row = record
addr : integer;
lexeme : array [1 .. 15] of char
end;

var table: array [1 .. 10] of row;

The type row has type expression

record ((addr x integer) x (lexeme x array(1 .. 15, char)))

and type expression of table is array(1 .. 10, row)

107
Type constructors …
• Pointer: if T is a type expression then pointer( T )
is a type expression denoting type pointer to an
object of type T
• Function: function maps domain set to range set.
It is denoted by type expression D → R

– For example mod has type expression int x int → int

– function f( a, b: char ) : ^ integer; is denoted by

    char x char → pointer( integer )

108
Specifications of a type
checker
• Consider a language which consists
of a sequence of declarations (D)
followed by a sequence of
statements (P)

S→D;P
D → D ; D | id : T
T → char | integer | array [ num] of T | ^ T
E → literal | num | E mod E | E [E] | E ^
P → id:=E | if E then P | while E P | P; P
109
Specifications of a type
checker …
• A program generated by this grammar is

key : integer;
key=key mod 1999

• Assume following:
– basic types are char, int, type-error
– all arrays start at 1
– array[256] of char has type expression
array(1 .. 256, char)

110
Type System
• A program is a set of declarations (D) followed by a
set of statements (P)

• Check if the program type-checks under an empty


context (symbol-table)

. ≻ {D; P} : α
Type System
Process the declarations to build the
symbol table (typing assumptions)

Γ, x : τ ≻ {D; P} : α
Γ ≻ {x : τ ; D; P} : α
Type System
Γ ≻ stm : α    Γ ≻ P : α
Γ ≻ {stm; P} : α

Γ ≻ x : τ    Γ ≻ e : τ
Γ ≻ {x := e} : α          [ stm ≡ {x := e} ]

x : τ ∈ Γ
Γ ≻ x : τ

Γ ≻ e1 : int    Γ ≻ e2 : int
Γ ≻ e1 mod e2 : int
Rules for Symbol Table entry
D → id : T                addtype(id.entry, T.type)
T → char                  T.type = char
T → integer               T.type = int
T → ^T1                   T.type = pointer(T1.type)
T → array [ num ] of T1   T.type = array(1..num, T1.type)

Type checking of functions

E → E1 ( E2 )   E.type = if E2.type == s and E1.type == s → t
                         then t
                         else type-error
114
Type checking for expressions
E → literal E.type = char
E → num E.type = integer
E → id E.type = lookup(id.entry)
E → E1 mod E2 E.type = if E1.type == integer and
E2.type==integer
then integer
else type_error
E → E1[E2] E.type = if E2.type==integer and
E1.type==array(s,t)
then t
else type_error
E → E1^ E.type = if E1.type==pointer(t)
then t
else type_error
115
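A minimal C sketch of one of the expression rules above, E → E1 mod E2, as a checking function; the type codes and the function name are assumptions for illustration, not a prescribed API.

enum texpr { TCHAR, TINT, TBOOL, TTYPE_ERROR };

/* E.type = if E1.type == integer and E2.type == integer
             then integer else type_error */
static enum texpr check_mod(enum texpr e1, enum texpr e2) {
    return (e1 == TINT && e2 == TINT) ? TINT : TTYPE_ERROR;
}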
Type checking for statements
• Statements typically do not have values. Special basic type
void can be assigned to them.
S → id := E S.Type = if id.type == E.type
then void
else type_error

S → if E then S1 S.Type = if E.type == boolean


then S1.type
else type_error

S → while E do S1 S.Type = if E.type == boolean


then S1.type
else type_error

S → S1 ; S2 S.Type = if S1.type == void


and S2.type == void
then void
else type_error 116
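A C sketch of the statement rules: statements carry the special type void, and a construct type-checks only when its parts do. The type codes and helper names are assumptions for illustration.

enum ty { T_INT, T_CHAR, T_BOOL, T_VOID, T_ERROR };

static enum ty check_assign(enum ty id, enum ty e)   /* S -> id := E      */
{ return id == e ? T_VOID : T_ERROR; }

static enum ty check_if(enum ty e, enum ty s1)       /* S -> if E then S1 */
{ return e == T_BOOL ? s1 : T_ERROR; }

static enum ty check_seq(enum ty s1, enum ty s2)     /* S -> S1 ; S2      */
{ return (s1 == T_VOID && s2 == T_VOID) ? T_VOID : T_ERROR; }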
Example
• {int x; x = x mod 2;}

• Interesting observation: A typing


derivation follows the shape of the
abstract syntax tree
Homework
• For the following questions, provide
both the changed typing rule and the
required modification in the
implementation (SDD)
– How to handle implicit type conversions?
– Is addition of pointers prevented?
Equivalence of Type expressions
• Structural equivalence: Two type expressions are equivalent if
  • either they are the same basic types
  • or they are formed by applying the same constructor to equivalent types
• Name equivalence: types can be given names
  • Two type expressions are equivalent if they have the same name
119
Function to test structural equivalence

function sequiv(s, t) : boolean;
  if s and t are the same basic type
  then return true
  elseif s == array(s1, s2) and t == array(t1, t2)
  then return sequiv(s1, t1) && sequiv(s2, t2)
  elseif s == s1 x s2 and t == t1 x t2
  then return sequiv(s1, t1) && sequiv(s2, t2)
  elseif s == pointer(s1) and t == pointer(t1)
  then return sequiv(s1, t1)
  elseif s == s1 → s2 and t == t1 → t2
  then return sequiv(s1, t1) && sequiv(s2, t2)
  else return false;
120
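A C sketch of sequiv over one possible encoding of type expressions; the struct layout and constructor tags are assumptions for illustration, not the representation used in the slides.

#include <stdbool.h>

enum ctor { BASIC, ARRAY, PRODUCT, POINTER, FUNCTION };

struct type {
    enum ctor ctor;
    int basic;                   /* which basic type, when ctor == BASIC      */
    struct type *t1, *t2;        /* operands of the constructor, when present */
};

static bool sequiv(const struct type *s, const struct type *t) {
    if (s->ctor != t->ctor) return false;
    switch (s->ctor) {
    case BASIC:    return s->basic == t->basic;
    case POINTER:  return sequiv(s->t1, t->t1);
    case ARRAY:                           /* array(s1, s2) vs array(t1, t2) */
    case PRODUCT:                         /* s1 x s2      vs t1 x t2        */
    case FUNCTION:                        /* s1 -> s2     vs t1 -> t2       */
        return sequiv(s->t1, t->t1) && sequiv(s->t2, t->t2);
    }
    return false;
}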
Efficient implementation
• Bit vectors can be used to represent type expressions. Refer to: A Tour Through the Portable C Compiler: S. C. Johnson, 1979.

  Basic type   Encoding        Type constructor   Encoding
  boolean      0000            pointer            01
  char         0001            array              10
  integer      0010            function           11
  real         0011
121
Efficient implementation …
Type expression encoding
char 000000 0001
function( char ) 000011 0001
pointer( function( char ) ) 000111 0001
array( pointer( function( char) ) ) 100111 0001

This representation saves space and keeps track of the constructors

122
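A C sketch of this encoding: the four low-order bits hold the basic type and each constructor application places its two-bit code above the bits already used, reproducing the table above. The struct and function names are assumptions for illustration.

#include <stdio.h>

enum { T_BOOLEAN = 0x0, T_CHAR = 0x1, T_INTEGER = 0x2, T_REAL = 0x3 };
enum { C_POINTER = 0x1, C_ARRAY = 0x2, C_FUNCTION = 0x3 };

struct enc { unsigned bits; int nctors; };

static struct enc basic(unsigned b) { return (struct enc){ b, 0 }; }

static struct enc apply(unsigned c, struct enc t) {
    t.bits |= c << (4 + 2 * t.nctors);   /* prepend the constructor code */
    t.nctors++;
    return t;
}

int main(void) {
    /* array( pointer( function( char ) ) )  ->  10 01 11 0001 */
    struct enc t = apply(C_ARRAY, apply(C_POINTER, apply(C_FUNCTION, basic(T_CHAR))));
    printf("%#x\n", t.bits);             /* prints 0x271 == 10 01 11 0001 */
    return 0;
}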
Checking name equivalence
• Consider following declarations
type link = ^cell;
var next, last : link;
p, q, r : ^cell;

• Do the variables next, last, p, q and r have identical types?

• Type expressions have names and names appear in type expressions.

• Name equivalence views each type name as a distinct type
123
Name equivalence …
variable type expression
next link
last link
p pointer(cell)
q pointer(cell)
r pointer(cell)

• Under name equivalence next = last and p = q = r, however, next ≠ p

• Under structural equivalence all the variables are of the same type

124
Name equivalence …
• Some compilers allow type expressions to have names.
• However, some compilers assign implicit type names to each
declared identifier in the list of variables.
• Consider
type link = ^ cell;
var next : link;
last : link;
p : ^ cell;
q : ^ cell;
r : ^ cell;

• In this case the type expressions of p, q and r are given different names and therefore they are not of the same type

125
Name equivalence …
The code is similar to
type link = ^ cell
np = ^ cell;
nq = ^ cell;
nr = ^ cell;
var next : link;
last : link;
p : np;
q : nq;
r : nr;

126
Cycles in representation of types
• Data structures like linked lists are defined recursively

• Implemented through structures which contain pointers to structures

• Consider the following code

  type link = ^cell;
  cell = record
           info : integer;
           next : link
         end;

• The type name cell is defined in terms of link and link is defined in terms of cell (recursive definitions)
127
Cycles in representation of …
• Recursively defined type names can be substituted
by definitions

• However, it introduces cycles into the type graph

[Figure: the type graph for cell — a record node with fields (info x integer) and (next x pointer); substituting the definition of link makes the pointer edge point back to the record node, introducing a cycle]
128
Cycles in representation of …
• C uses structural equivalence for all types except records

• It uses the acyclic structure of the type graph

• Type names must be declared before they are used
  – However, pointers to undeclared record types are allowed
  – All potential cycles are due to pointers to records

• The name of a record is part of its type
  – Testing for structural equivalence stops when a record constructor is reached
129
Type conversion
• Consider an expression like x + i where x is of type real and i is of type integer

• The internal representations of integers and reals are different in a computer
  – different machine instructions are used for operations on integers and reals

• The compiler has to convert both the operands to the same type

• The language definition specifies what conversions are necessary.
130
Type conversion …
• Usually the conversion is to the type of the left hand side

• The type checker is used to insert conversion operations:
  x + i  →  x real+ inttoreal(i)

• Type conversion is called implicit/coercion if done by the compiler.

• It is limited to situations where no information is lost

• Conversions are explicit if the programmer has to write something to cause the conversion
131
Type checking for expressions
E → num E.type = int
E → num.num E.type = real
E → id E.type = lookup( id.entry )

E → E1 op E2 E.type = if E1.type == int && E2.type == int


then int
elseif E1.type == int && E2.type == real
then real
elseif E1.type == real && E2.type == int
then real
elseif E1.type == real && E2.type==real
then real
132
Overloaded functions and operators
• An overloaded symbol has different meanings depending upon the context

• In maths + is overloaded; used for integers, reals, complex numbers, matrices

• In Ada () is overloaded; used for array access, function call, type conversion

• Overloading is resolved when a unique meaning for an occurrence of a symbol is determined

133
Overloaded functions and operators …
• In Ada the standard interpretation of * is multiplication

• However, it may be overloaded by saying

  function “*” (i, j: integer) return complex;
  function “*” (i, j: complex) return complex;

• Possible type expressions for “*” are

  integer x integer → integer
  integer x integer → complex
  complex x complex → complex

134
Overloaded function resolution
• Suppose only possible type for 2, 3 and 5
is integer and Z is a complex variable
– then 3*5 is either integer or complex depending
upon the context
– in 2*(3*5)
3*5 is integer because 2 is integer
– in Z*(3*5)
3*5 is complex because Z is complex
135
Type resolution
• Try all possible types of each overloaded function
(possible but brute force method!)

• Keep track of all possible types

• Discard invalid possibilities

• At the end, check if there is a single unique type

• Overloading can be resolved in two passes:
  – Bottom up: compute the set of all possible types for each expression
  – Top down: narrow the set of possible types based on what could be used in an expression

136
Determining the set of possible types
E’ → E       E’.types = E.types
E → id       E.types = lookup(id)
E → E1(E2)   E.types = { t | there exists an s in E2.types and s→t is in E1.types }

        E  {i, c}
  E {i}    *    E {i}
    3            5
  where * has the set of types { i x i → i,  i x i → c,  c x c → c }
137
Narrowing the set of possible types
• Ada requires a complete expression to have a unique type

• Given a unique type from the context we can narrow down the type choices for each expression

• If this process does not result in a unique type for each sub expression then a type error is declared for the expression

138
Narrowing the set of …
E’ → E       E’.types = E.types
             E.unique = if E’.types == {t} then t else type_error

E → id       E.types = lookup(id)

E → E1(E2)   E.types = { t | there exists an s in E2.types and s→t is in E1.types }
             t = E.unique
             S = { s | s Є E2.types and (s→t) Є E1.types }
             E2.unique = if S == {s} then s else type_error
             E1.unique = if S == {s} then s→t else type_error

139
Is the grammar L-attributed?
E’ → E       E’.types = E.types
             E.unique = if E’.types == {t} then t else type_error

E → id       E.types = lookup(id)

E → E1(E2)   E.types = { t | there exists an s in E2.types and s→t is in E1.types }
             t = E.unique
             S = { s | s Є E2.types and (s→t) Є E1.types }
             E2.unique = if S == {s} then s else type_error
             E1.unique = if S == {s} then s→t else type_error

140
Polymorphic functions
• A function can be invoked with arguments of different types

• Built-in operators for indexing arrays, applying functions, and manipulating pointers are usually polymorphic

• Extend type expressions to include expressions with type variables

• Facilitate the implementation of algorithms that manipulate data structures (regardless of the types of the elements)
  – Determine the length of a list without knowing the types of the elements

141
Polymorphic functions …
• Strongly typed languages can make programming very tedious

• Consider the identity function written in a language like Pascal
  function identity (x: integer): integer;

• This function is the identity on integers
  identity: int → int

• In Pascal types must be explicitly declared

• If we want to write the identity function on char then we must write
  function identity (x: char): char;

• This is the same code; only the types have changed. However, in Pascal a new identity function must be written for each type
142
Type variables
• Variables can be used in type expressions to represent unknown types

• Important use: check consistent use of an identifier in a language that does not require identifiers to be declared

• An inconsistent use is reported as an error

• If the variable is always used as the same type then the use is consistent and leads to type inference

• Type inference: determine the type of a variable/language construct from the way it is used
  – Infer the type of a function from its body
143
• Consider
  function deref(p);
  begin
    return p^
  end;

• When the first line of the code is seen, nothing is known about the type of p
  – Represent it by a type variable

• The operator ^ takes a pointer to an object and returns the object

• Therefore, p must be a pointer to an object of unknown type α
  – If the type of p is represented by β then β = pointer(α)
  – The expression p^ has type α

• The type expression for the function deref is
  for any type α:  pointer(α) → α

• For the identity function, for any type α:  α → α


144
Assignment: Extend the scheme with a rule
number → sign list . list
replacing number → sign list (DUE 5 days from today)

number → sign list   list.position = 0
                     if sign.negative
                     then number.value = - list.value
                     else number.value = list.value

sign → +             sign.negative = false

sign → -             sign.negative = true

list → bit           bit.position = list.position
                     list.value = bit.value

list0 → list1 bit    list1.position = list0.position + 1
                     bit.position = list0.position
                     list0.value = list1.value + bit.value

bit → 0              bit.value = 0
bit → 1              bit.value = 2^bit.position
145
