Acknowledgements: The slides for this lecture are a modified version of the offering by
• Check semantics
• Error reporting
• Disambiguate overloaded operators
• Type coercion
• Static checking
– Type checking
– Control flow checking
– Uniqueness checking
– Name checks
Beyond syntax analysis
• Parser cannot catch all the program errors
• Example:
    int a, b;
    a = b + c
  Here c is not declared
• How many arguments does a function take?
• Inheritance relationship
How to answer these questions?
• These issues are part of semantic analysis phase
How to … ?
• Use formal methods
– Context sensitive grammars
– Extended attribute grammars
– Translation schemes
• indicate order in which semantic rules are to be evaluated
• allow some implementation details to be shown
• Conceptually both:
– parse input token stream
– build parse tree
– traverse the parse tree to evaluate the
semantic rules at the parse tree nodes
• Evaluation may:
– generate code
– save information in the symbol table
– issue error messages
– perform any other activity
Example
• Consider a grammar for signed binary numbers
symbol    attributes
Number    value
sign      negative
list      position, value
bit       position, value
production    attribute rule
bit → 0       bit.value ← 0
bit → 1       bit.value ← 2^bit.position
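The two bit rules above can be tried out with a minimal Python sketch. Only the bit productions appear in this excerpt, so the remaining structure is an assumption: a number is an optional sign followed by a list of bits, the rightmost bit has position 0, and the value of a list is the sum of its bit values. The names bit_value and number_value are hypothetical.

```python
def bit_value(bit: str, position: int) -> int:
    """bit -> 0 gives value 0; bit -> 1 gives value 2 ** position."""
    return 0 if bit == "0" else 2 ** position


def number_value(text: str) -> int:
    """Evaluate a signed binary string such as '-101'.

    Assumed (not shown in the rules above): number -> sign list,
    the rightmost bit has position 0, and list.value is the sum
    of the values of its bits.
    """
    negative = text[0] == "-"                      # sign.negative
    bits = text[1:] if text[0] in "+-" else text
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += bit_value(bit, position)
    return -total if negative else total


print(number_value("-101"))  # -5
```

Here position plays the role of an inherited attribute (it is handed down to each bit) while value is synthesized (it is passed back up).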
Evaluating Attributes
• In which order should the attributes
be computed?
Parse tree and the dependence graph
[Figure: parse tree and dependence graph for the input - 1 0 1; Number has Val = -5]
Dependence Graph
• If an attribute b depends on an
attribute c then the semantic rule
for b must be evaluated after the
semantic rule for c
Algorithm to construct
dependency graph
for each node n in the parse tree do
    for each attribute a of the grammar symbol do
        construct a node in the dependency graph for a
[Figure: dependency graph nodes A.a, X.x and Y.y for grammar symbols A, X and Y]
Example
• Whenever the following production is used in a parse
tree
E → E1 + E2      E.val = E1.val + E2.val
we create a dependency graph with edges from E1.val and E2.val to E.val
Example
L → id addtype (id.entry,L.in)
Example
• dependency graph for real id1, id2, id3
• put a dummy node for a semantic rule that
consists of a procedure call
[Figure: dependency graph for real id1, id2, id3; T.type = real is copied into L.in at every L node, and each identifier has a dummy node for its addtype(id.entry, real) call]
Evaluation Order
• Any topological sort of the dependency graph
gives a valid order in which the semantic rules
can be evaluated
a4 = real
a5 = a4
addtype(id3.entry, a5)
a7 = a5
addtype(id2.entry, a7 )
a9 = a7
addtype(id1.entry, a9 )
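The evaluation order listed above can be reproduced with a small Python sketch. It uses the standard library graphlib module (Python 3.9 or later). The dependency edges are read off the listed rules; treating each addtype call as a graph node follows the dummy-node convention mentioned earlier.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key must be evaluated after every node in its set.
depends_on = {
    "a5": {"a4"},
    "addtype(id3.entry, a5)": {"a5"},
    "a7": {"a5"},
    "addtype(id2.entry, a7)": {"a7"},
    "a9": {"a7"},
    "addtype(id1.entry, a9)": {"a9"},
}

# static_order() yields the nodes so that dependencies come first.
order = list(TopologicalSorter(depends_on).static_order())
for step in order:
    print(step)
```

Several orders are valid; for example, addtype(id3.entry, a5) and a7 may appear in either order because neither depends on the other.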
Attributes …
• attributes fall into two classes:
synthesized and inherited
Attributes …
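• Each grammar production A → α has associated
with it a set of semantic rules of the form
b = f(c1, c2, ..., ck)
where f is a function, c1, ..., ck are attributes of the
grammar symbols of the production, and either
– b is a synthesized attribute of A
OR
– b is an inherited attribute of one of the grammar
symbols on the right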
Syntax Directed Definitions for a
desk calculator program
L → E n        print(E.val)
E → E1 + T     E.val = E1.val + T.val
E → T          E.val = T.val
T → T1 * F     T.val = T1.val * F.val
T → F          T.val = F.val
F → ( E )      F.val = E.val
F → digit      F.val = digit.lexval
[Figure: annotated parse tree for 3 * 4 + 5 n; the leaves have Val = 3, 4 and 5, the subtree for 3 * 4 has Val = 12, the top E has Val = 17, and L prints 17]
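The desk calculator rules can be sketched as a post-order walk that computes the synthesized attribute val at each parse tree node. This is a minimal illustration in Python, not the lecture's implementation; the Node class and the helpers val and digit are hypothetical, and the tree is built by hand rather than by a parser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # "E", "T", "F", "digit", "+", ...
    children: list = field(default_factory=list)
    lexval: int = 0                               # only used by digit leaves


def val(node: Node) -> int:
    """Synthesized attribute val, computed from the children only."""
    kids = node.children
    if node.symbol == "digit":                    # digit.lexval
        return node.lexval
    if len(kids) == 1:                            # E -> T, T -> F, F -> digit
        return val(kids[0])
    if kids[0].symbol == "(":                     # F -> ( E )
        return val(kids[1])
    left, op, right = kids
    if op.symbol == "+":                          # E -> E1 + T
        return val(left) + val(right)
    return val(left) * val(right)                 # T -> T1 * F


def digit(n: int) -> Node:
    return Node("F", [Node("digit", lexval=n)])


# Parse tree for 3 * 4 + 5
t = Node("T", [Node("T", [digit(3)]), Node("*"), digit(4)])
e = Node("E", [Node("E", [t]), Node("+"), Node("T", [digit(5)])])
print(val(e))  # 17
```

Because every rule only reads attributes of the children, any bottom-up traversal gives the same result.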
Inherited Attributes
• an inherited attribute is one whose value is defined in terms
of attributes at the parent and/or siblings
Example
L → id addtype (id.entry,L.in)
Parse tree for
real x, y, z
[Figure: annotated parse tree for real x, y, z; T.type = real, L.in = real at every L node, with addtype(x, real), addtype(y, real) and addtype(z, real) attached to the identifiers]
production    attribute rule
bit → 0       bit.value ← 0
bit → 1       bit.value ← 2^bit.position
Spot the synthesized and inherited attributes
Important Compiler Data Structures
• Symbol-table
• Intermediate Representations
– Abstract Syntax Tree
– Three Address Code
Symbol Table
• Stores information for subsequent phases
Usually 32 bytes
[Figure: abstract syntax tree node if-then-else with children B, s1 and s2]
Abstract Syntax tree …
• Chain of single productions may be collapsed, and
operators move to the parent nodes
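[Figure: parse tree for id1 * id2 + id3 next to its abstract syntax tree, in which + is the root with children * and id3, and * has children id1 and id2]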
Constructing Abstract Syntax
tree for expression
• Each node can be represented as a record
A syntax directed definition
for constructing syntax tree
E → E1 + T     E.ptr = mknode(+, E1.ptr, T.ptr)
E → T          E.ptr = T.ptr
T → T1 * F     T.ptr = mknode(*, T1.ptr, F.ptr)
T → F          T.ptr = F.ptr
F → ( E )      F.ptr = E.ptr
F → id         F.ptr = mkleaf(id, id.entry)
F → num        F.ptr = mkleaf(num, num.val)
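A minimal Python sketch of mknode and mkleaf follows. The Tree record and the show helper are assumptions made for illustration; the calls at the bottom are the ones a bottom-up parser would make, in order, for the input a + b * 2 under the rules above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tree:
    label: str                        # operator, or "id" / "num" for leaves
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None
    value: object = None              # symbol-table entry or numeric value


def mknode(op: str, left: Tree, right: Tree) -> Tree:
    return Tree(op, left, right)


def mkleaf(kind: str, value: object) -> Tree:
    return Tree(kind, value=value)


def show(t: Tree) -> str:
    """Fully parenthesized rendering, used only to inspect the tree."""
    if t.left is None:
        return str(t.value)
    return f"({show(t.left)} {t.label} {show(t.right)})"


# a + b * 2: leaves first, then * (higher precedence), then +
p1 = mkleaf("id", "a")
p2 = mkleaf("id", "b")
p3 = mkleaf("num", 2)
p4 = mknode("*", p2, p3)
p5 = mknode("+", p1, p4)
print(show(p5))  # (a + (b * 2))
```

Single productions such as E → T only copy the pointer, which is why they leave no node in the resulting tree.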
DAG for Expressions
Expression a + a * ( b - c ) + ( b - c ) * d
make a leaf or node if not present,
otherwise return pointer to the existing node
P1 = makeleaf(id,a)
P2 = makeleaf(id,a)
P3 = makeleaf(id,b)
P4 = makeleaf(id,c)
P5 = makenode(-,P3,P4)
P6 = makenode(*,P2,P5)
P7 = makenode(+,P1,P6)
P8 = makeleaf(id,b)
P9 = makeleaf(id,c)
P10 = makenode(-,P8,P9)
P11 = makeleaf(id,d)
P12 = makenode(*,P10,P11)
P13 = makenode(+,P7,P12)
[Figure: resulting DAG; P1 and P2 denote the same leaf a, P3 and P8 the same leaf b, P4 and P9 the same leaf c, and P5 and P10 the same node b - c]
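The "make if not present, otherwise return the existing node" rule can be sketched with a dictionary keyed by the node's label and children. This is one possible implementation, assumed for illustration; node identifiers are simply integers handed out in creation order.

```python
nodes = {}  # (label, child or name, ...) -> node id


def make(label, *parts):
    """Return the id of the node with this label and parts, creating it if new."""
    key = (label, *parts)
    if key not in nodes:
        nodes[key] = len(nodes) + 1
    return nodes[key]


def makeleaf(kind, name):
    return make(kind, name)


def makenode(op, left, right):
    return make(op, left, right)


# a + a * (b - c) + (b - c) * d
p1 = makeleaf("id", "a")
p2 = makeleaf("id", "a")
p3 = makeleaf("id", "b")
p4 = makeleaf("id", "c")
p5 = makenode("-", p3, p4)
p6 = makenode("*", p2, p5)
p7 = makenode("+", p1, p6)
p8 = makeleaf("id", "b")
p9 = makeleaf("id", "c")
p10 = makenode("-", p8, p9)
p11 = makeleaf("id", "d")
p12 = makenode("*", p10, p11)
p13 = makenode("+", p7, p12)

print(p1 == p2, p5 == p10, len(nodes))  # True True 9
```

Thirteen calls produce only nine distinct nodes, because the repeated leaves and the common subexpression b - c are shared.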
Three address code
• It is a sequence of statements of the
general form X := Y op Z where
• Jump
– goto L
– if x relop y goto L
• Pointer
– x = &y
– x = *y
– *x = y
• Indexed assignment
– x = y[i]
– x[i] = y
float a[20][10];
use a[i][j+2]

HIR                 MIR              LIR
t1 ← a[i,j+2]       t1 ← j+2         r1 ← [fp-4]
                    t2 ← i*20        r2 ← r1+2
                    t3 ← t1+t2       r3 ← [fp-8]
                    t4 ← 4*t3        r4 ← r3*20
                    t5 ← addr a      r5 ← r4+r2
                    t6 ← t4+t5       r6 ← 4*r5
                    t7 ← *t6         r7 ← fp-216
                                     f1 ← [r7+r6]
Some thoughts...
• Do we really need to build the whole
parse tree?
• Can the computation on the
attributes be done on-the-fly (along
with parsing)?
• Can we do this at least for some
restricted SDDs?
A thought...
• For constructing the parse tree, the parser
traverses the nodes in the parse tree in some
order...
• Let us add the semantic actions to the parse tree,
so that the parser executes these actions when it
visits them...
• When will the above constitute a correct
translation scheme?
• When these actions, traversed in the same
order that the parser uses to traverse the tree,
form a valid topological ordering on the
dependencies.
New Questions...
• What is the order in which the parser
traverses the parse tree?
– See next slide
E → T R
R → addop T {print(addop)} R | ε
T → num {print(num)}
[Figure: parse tree with the semantic actions, such as print(num), attached as extra leaves]
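The translation scheme for E, R and T can be sketched as a recursive descent parser in Python, with each embedded action executed at the point where it appears in the production body. The function name to_postfix is hypothetical, addop is assumed to mean + or -, and output is collected in a list instead of being printed directly.

```python
def to_postfix(tokens):
    """Translate infix tokens such as ["9", "-", "5", "+", "2"] to postfix."""
    out = []
    pos = 0

    def T():
        # T -> num {print(num)}
        nonlocal pos
        out.append(tokens[pos])
        pos += 1

    def R():
        # R -> addop T {print(addop)} R | ε
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            addop = tokens[pos]
            pos += 1
            T()
            out.append(addop)      # the action sits between T and R
            R()

    T()                            # E -> T R
    R()
    return " ".join(out)


print(to_postfix(["9", "-", "5", "+", "2"]))  # 9 5 - 2 +
```

Moving the action to the end of the R body would emit the operators in the wrong order, which is why the position of an action inside the production matters.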
• Assume actions are terminal symbols
S-attributed definition
• a syntax directed definition that uses only
synthesized attributes is said to be an S-
attributed definition
• A parse tree for an S-attributed definition
can be annotated by evaluating semantic rules
for attributes
• A translation scheme for an S-attributed
SDD can be obtained simply by appending the
semantic actions to the right-hand-side of
each production rule
• In case of both inherited and synthesized attributes
[Figure: parse tree with children A1 and A2, rules A1.in = 1 and A2.in = A1.in, and leaves a followed by print(A1.in) and print(A2.in)]
B → B1 B2 B1.pts = B.pts
B2.pts = B.pts
B.ht = max(B1.ht,B2.ht)
after putting actions in the right place
S → {B.pts = 10} B
{S.ht = B.ht}
B → {B1.pts = B.pts} B1
{B2.pts = B.pts} B2
{B.ht = max(B1.ht,B2.ht)}
Input     State      Val        Production used
3*5+4n    -          -
*5+4n     digit      3
*5+4n     F          3          F → digit
*5+4n     T          3          T → F
5+4n      T*         3 -
+4n       T*digit    3 - 5
+4n       T*F        3 - 5      F → digit
+4n       T          15         T → T*F
+4n       E          15         E → T
4n        E+         15 -
n         E+digit    15 - 4
n         E+F        15 - 4     F → digit
n         E+T        15 - 4     T → F
n         E          19         E → E+T
L-attributed definitions
• L-attributed definition
– where attributes can be evaluated in depth-
first order
– can have both synthesized and inherited
attributes
L attributed definitions …
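• A syntax directed definition is L-attributed if each
inherited attribute of Xj (1 ≤ j ≤ n) at the right hand side of
A → X1 X2 … Xn depends only on
– Attributes of the symbols X1, X2, …, Xj-1 to the left of Xj
– Inherited attributes of A
• Example (L-attributed):
A → L M    L.i = f1(A.i)
           M.i = f2(L.s)
           A.s = f3(M.s)
• Example (not L-attributed, because Q.i depends on R.s, which lies to its right):
A → Q R    R.i = f4(A.i)
           Q.i = f5(R.s)
           A.s = f6(Q.s)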
Left recursion
• A top down parser with production
A → A α may loop forever
• Removing left recursion from A → A α | β gives
A → β R
R → α R | ε
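The rewriting of A → A α | β into A → β R, R → α R | ε can be sketched as a small Python function. It handles only immediate left recursion on a single non-terminal; the function name, the list-of-symbols representation and the default fresh name R are assumptions for illustration.

```python
def remove_left_recursion(nonterminal, productions, fresh="R"):
    """Split A -> A α | β into A -> β R and R -> α R | ε.

    productions is a list of right-hand sides, each a list of symbols;
    the empty list stands for ε.
    """
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    return {
        nonterminal: [beta + [fresh] for beta in betas],
        fresh: [alpha + [fresh] for alpha in alphas] + [[]],
    }


grammar = remove_left_recursion("A", [["A", "α"], ["β"]])
print(grammar)  # {'A': [['β', 'R']], 'R': [['α', 'R'], []]}
```

The new grammar generates the same strings β α α … α, but the recursion is now on the right, so a top down parser always consumes input before recursing.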
[Figure: parse tree corresponding to a left recursive grammar, and parse tree corresponding to the modified grammar]
A → A1 Y {A = g(A1,Y)}
A→X {A = f(X)}
R → + T {R1.i = R.i + T.val} R1 {R.s = R1.s}
R → - T {R1.i = R.i - T.val} R1 {R.s = R1.s}
R → ε {R.s = R.i}
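In a recursive descent parser the R rules map naturally onto a function whose parameter carries the inherited attribute R.i and whose return value carries the synthesized attribute R.s. The Python sketch below assumes a start production E → T {R.i = T.val} R {E.val = R.s} and T → num with T.val = num.val, neither of which is listed with the R rules; evaluate is a hypothetical name.

```python
def evaluate(tokens):
    """Evaluate tokens such as ["9", "-", "5", "+", "2"] left to right."""
    pos = 0

    def T():
        # T -> num, T.val = num.val (assumed)
        nonlocal pos
        val = int(tokens[pos])
        pos += 1
        return val

    def R(i):
        # R.i arrives as the parameter, R.s leaves as the return value
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            t_val = T()
            r1_i = i + t_val if op == "+" else i - t_val   # R1.i
            return R(r1_i)                                  # R.s = R1.s
        return i                                            # R -> ε: R.s = R.i

    return R(T())   # E -> T {R.i = T.val} R {E.val = R.s} (assumed)


print(evaluate(["9", "-", "5", "+", "2"]))  # 6
```

The running total travels down the tree through R.i and the final result is copied back up through R.s, which keeps subtraction left associative even though the grammar is right recursive.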
[Figure: annotated parse tree for E → T {R.i = T.val} R {E.val = R.s} with T.val = 9, T.val = 5 and T.val = 2; at the last R, R.s = R.i, and at the root E.val = R.s = 6]
When is a semantic action executed?
B -> X {a} Y
• [Top-down parsing]
– if Y is a non-terminal: perform 'a' just before we
attempt to expand this occurrence of Y (i.e. just before
we pop Y off the stack to expand it)
– if Y is a terminal: perform 'a' just before we check for Y
in the input (i.e. just before we pop Y off the stack,
matching it with the input)
• [Bottom-up parsing]
– perform action 'a' as soon as this occurrence of X appears
on top of the parsing stack (i.e. some handle is reduced
to X)
Bottom up evaluation of L-attributed
SDD: when to apply the semantic
actions embedded inside rules
A-> BC { B.i = f(A.i) }
A-> BD { B.i = g(A.i) }
• When the viable prefix on the stack is "B", it may
develop into either of the handles "BC" or "BD"; the
respective semantic action needs to be applied
• When "B" is on the stack, we don't know what the
incoming terminals will be... so, we cannot
apply the action now ...
• Defer it till we get the handle "BC"
• But, we don't know which rule "A" will be reduced into:
– X -> ...A... { A.i = h(X) }
– Y -> ...A... { A.i = k(Y) }
• Defer the decision further till we have reduced to X or Y?
• Reasoning this way, we may have to wait till the entire input is
seen --- the same as building the whole parse tree before semantic
analysis
• Note that there is no problem with actions appearing at the end, as a
unique handle is identified by then
Two main problems
• "conflict" on semantic actions
– Consider the state in the parser DFA
• S -> . {S.val = "+"} aA
• S -> . {S.val = "-"} bB
– The parser could have simply done a shift, but
now there is a conflict on which action to
perform
– Solution: use markers to "delegate" the problem
to the parser (Caveat: may not always work as an
LR grammar may not remain LR after introducing
markers)
• No slot on the value stack for the parent
– In the above example, the question is where to store "S.val", as
"S" will be pushed on the stack only after "aA" or "bB"
reduces to "S".
– Solution: Introduce a marker symbol M in the parent rule of
"S" and alias this slot in the value table to store the inherited
attribute of S
Bottom up evaluation of
inherited attributes
• Remove embedded actions from translation
scheme
Therefore,
E → T R
R → + T {print(+)} R
R → - T {print(-)} R
R → Є
T → num {print(num.val)}
transforms to
E→TR
R→+TMR
R→-TNR
R→Є
T → num {print(num.val)}
M→Є {print(+)}
N→Є {print(-)}
Markers
• Markers are non-terminals that
– derive only Є
– appear only once among all bodies of all
productions
L → {L1.in = L.in} L1 , id {addtype(id.entry, L.in)}
L → id {addtype(id.entry, L.in)}
State stack    INPUT         PRODUCTION
-              real p,q,r
real           p,q,r
T              p,q,r         T → real
T p            ,q,r
T L            ,q,r          L → id
T L ,          q,r
T L , q        ,r
T L            ,r            L → L , id
T L ,          r
T L , r        -
T L            -             L → L , id
D              -             D → T L
Example …
Therefore, the translation scheme
becomes
D→TL
L → id addtype(val[top], val[top-1])
Simulating the evaluation of
inherited attributes
• The scheme works only if the grammar allows the position of
the attribute to be predicted.
S → aAC      C.i = A.s
S → bABC     C.i = A.s
C → c        C.s = g(C.i)
• C inherits A.s
• there may or may not be a B between A and C on the stack
when reduction by rule C → c takes place
S → aAC      C.i = A.s
S → bABMC    M.i = A.s; C.i = M.s
C → c        C.s = g(C.i)
M → ε        M.s = M.i
• When production M → ε is applied we have
M.s = M.i = A.s
S → aAC      C.i = f(A.s)
• using a marker
A.i is in position top-2j+2
X1.i is in position top-2j+3
X1.s is in position top-2j+4
Space for attributes at
compile time
• Lifetime of an attribute begins when it is
first computed
D → T L           L.in = T.type
T → real          T.type = real
T → int           T.type = int
L → L1 , I        L1.in = L.in; I.in = L.in
L → I             I.in = L.in
I → I1 [ num ]    I1.in = array(numeral, I.in)
I → id            addtype(id.entry, I.in)
Consider string int x[3], y[5]
its parse tree and dependence graph
[Figure: parse tree and dependence graph for int x[3], y[5], with the attribute nodes numbered 1 to 9]
Resource requirement
node        1   2   3   4   5   6   7   8   9
register    R1  R1  R2  R3  R2  R1  R1  R2  R1
Example
• Consider rule B →B1 B2 with inherited attribute
ps and synthesized attribute ht
[Figure: contents of a single attribute stack at successive points while B → B1 B2 is parsed, holding B.ps, B1.ps, B1.ht, B2.ps and B2.ht]
Example …
• However, if different stacks are maintained for
the inherited and synthesized attributes, the
stacks will normally be smaller
[Figure: stack contents at successive points while B → B1 B2 is parsed when B.ps is kept on one stack and B1.ht, B2.ht on another]
Type system
• A type is a set of values
Type system …
• Languages can be divided into three
categories with respect to the type:
– “untyped”
• No type checking needs to be done
• Assembly languages
– Statically typed
• All type checking is done at compile time
• Algol class of languages
• Also called strongly typed
– Dynamically typed
• Type checking is done at run time
• Mostly functional languages like Lisp, Scheme etc.
Type systems …
• Static typing
– Catches most common programming errors at compile
time
– Avoids runtime overhead
– May be restrictive in some situations
– Rapid prototyping may be difficult
• For example,
– A pointer should not be added to another
pointer
– An uninitialized variable must not be used
– A null-pointer should not be dereferenced
– A closed file is not written into
Properties of a type-system
• Soundness: A type-correct program
cannot violate the property
• Completeness: All correct programs
will be declared type-correct (or, if a
program is found to be not type-correct,
it will surely violate the property)
Typing Rules
• If both the operands of arithmetic operators +, -, x are
integers then the result is of type integer
Γ ≻ e1 : int    Γ ≻ e2 : int
----------------------------
Γ ≻ e1 + e2 : int

Γ ≻ x : ptr(τ)    Γ ≻ y : τ
---------------------------
Γ ≻ x = &y : stmt
Type system and type checking
Note: read ⊢ for ≻
• Basic types: integer, char, float, boolean
Type expression
• Type of a language construct is denoted by a type
expression
Type constructors …
• Records: it applies to a tuple formed from field names and
field types. Consider the declaration
type row = record
addr : integer;
lexeme : array [1 .. 15] of char
end;
Type constructors …
• Pointer: if T is a type expression then pointer( T )
is a type expression denoting type pointer to an
object of type T
• Function: function maps domain set to range set.
It is denoted by type expression D → R
Specifications of a type
checker
• Consider a language which consists
of a sequence of declarations (D)
followed by a sequence of
statements (P)
S→D;P
D → D ; D | id : T
T → char | integer | array [ num] of T | ^ T
E → literal | num | E mod E | E [E] | E ^
P → id:=E | if E then P | while E P | P; P
Specifications of a type
checker …
• A program generated by this grammar is
key : integer;
key := key mod 1999
• Assume following:
– basic types are char, int, type-error
– all arrays start at 1
– array[256] of char has type expression
array(1 .. 256, char)
Type System
• A program is a set of declarations (D) followed by a
set of statements (P)
. ≻ {D; P} : α
Type System
Process the declarations to build the
symbol table (typing assumptions)
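Γ, x : τ ≻ {D; P} : α
------------------------
Γ ≻ {x : τ ; D; P} : α

Type System
Γ ≻ stm : α    Γ ≻ P : α
------------------------
Γ ≻ {stm; P} : α

Γ ≻ x : τ    Γ ≻ e : τ
----------------------   [ stm ≡ {x := e} ]
Γ ≻ {x := e} : α

x : τ ∈ Γ
---------
Γ ≻ x : τ

Γ ≻ e1 : int    Γ ≻ e2 : int
----------------------------
Γ ≻ e1 mod e2 : int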
Rules for Symbol Table entry
D → id : T                   addtype(id.entry, T.type)
T → char                     T.type = char
T → integer                  T.type = int
T → ^T1                      T.type = pointer(T1.type)
T → array [ num ] of T1      T.type = array(1..num, T1.type)
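The symbol table rules can be sketched in Python by representing type expressions as small immutable records and letting addtype store them in a dictionary. The class names Pointer and Array, the dictionary symtab and the helper type_of_mod are assumptions for illustration; type_of_mod follows the mod typing rule (both operands int gives int), and the string "type_error" stands for the type-error basic type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    to: object                 # pointer(T1.type)


@dataclass(frozen=True)
class Array:
    low: int                   # arrays are assumed to start at 1
    high: int
    elem: object               # array(1..num, T1.type)


symtab = {}


def addtype(name, type_expr):
    """D -> id : T  records T.type for the identifier."""
    symtab[name] = type_expr


# key : integer;  buf : array [256] of char;  p : ^integer
addtype("key", "int")
addtype("buf", Array(1, 256, "char"))
addtype("p", Pointer("int"))


def type_of_mod(t1, t2):
    """E -> E1 mod E2 has type int only if both operands are int."""
    return "int" if t1 == "int" and t2 == "int" else "type_error"


print(type_of_mod(symtab["key"], "int"))   # int
print(type_of_mod(symtab["buf"], "int"))   # type_error
```

Frozen dataclasses compare by structure, so two separately built Array(1, 256, "char") values are equal, which is the behaviour structural equivalence needs.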
Efficient implementation …
Type expression                           encoding
char                                      000000 0001
function( char )                          000011 0001
pointer( function( char ) )               000111 0001
array( pointer( function( char) ) )       100111 0001
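The encoding in the table can be reproduced with a short Python sketch. The two-bit constructor codes (pointer 01, array 10, function 11) and the four-bit code 0001 for char are read off the table rows; the six-bit width of the constructor field and the function name encode are assumptions taken from the same rows.

```python
BASIC = {"char": 0b0001}                       # only char appears in the table
CONSTRUCTOR = {"pointer": 0b01, "array": 0b10, "function": 0b11}


def encode(constructors, basic):
    """Encode a type expression as 'cccccc bbbb'.

    constructors lists the type constructors outermost first,
    e.g. ["array", "pointer", "function"] for
    array(pointer(function(char))).
    """
    bits = 0
    for name in constructors:
        bits = (bits << 2) | CONSTRUCTOR[name]  # innermost ends up rightmost
    return f"{bits:06b} {BASIC[basic]:04b}"


print(encode([], "char"))                                  # 000000 0001
print(encode(["function"], "char"))                        # 000011 0001
print(encode(["pointer", "function"], "char"))             # 000111 0001
print(encode(["array", "pointer", "function"], "char"))    # 100111 0001
```

With this representation, checking structural equivalence of two such types reduces to comparing two bit strings.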
Checking name equivalence
• Consider following declarations
type link = ^cell;
var next, last : link;
p, q, r : ^cell;
Name equivalence …
variable type expression
next link
last link
p pointer(cell)
q pointer(cell)
r pointer(cell)
Name equivalence …
• Some compilers allow type expressions to have names.
• However, some compilers assign implicit type names to each
declared identifier in the list of variables.
• Consider
type link = ^ cell;
var next : link;
last : link;
p : ^ cell;
q : ^ cell;
r : ^ cell;
Name equivalence …
The code is similar to
type link = ^ cell;
np = ^ cell;
nq = ^ cell;
nr = ^ cell;
var next : link;
last : link;
p : np;
q : nq;
r : nr;
Cycles in representation of
types
• Data structures like linked lists are defined recursively
[Figure: two type graphs for the recursively defined record type cell]
Cycles in representation of …
• C uses structural equivalence for all types except
records
Type conversion
• Consider expression like x + i where x is of type
real and i is of type integer
Overloaded functions and
operators …
• In Ada standard interpretation of * is
multiplication
Overloaded function resolution
• Suppose only possible type for 2, 3 and 5
is integer and Z is a complex variable
– then 3*5 is either integer or complex depending
upon the context
– in 2*(3*5)
3*5 is integer because 2 is integer
– in Z*(3*5)
3*5 is complex because Z is complex
Type resolution
• Try all possible types of each overloaded function
(possible but brute force method!)
Determining set of possible
types
E’ → E          E’.types = E.types
E → id          E.types = lookup(id)
E → E1(E2)      E.types = { t | there exists an s in E2.types
                            and s → t is in E1.types }
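[Figure: annotated tree for 3 * 5; the operands 3 and 5 have types {i}, the operator * has types {i × i → i, i × i → c, c × c → c}, and the root E has types {i, c}]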
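The rule for E → E1(E2) can be sketched in Python with sets. Function types are modelled as (argument types, result type) pairs, i stands for integer and c for complex; the names star_types and apply_types are hypothetical, and the three overloads of * are the ones shown for 3 * 5.

```python
I, C = "i", "c"

# Overloads of *: ((left type, right type), result type)
star_types = {((I, I), I), ((I, I), C), ((C, C), C)}


def apply_types(fun_types, arg_types):
    """E -> E1(E2): { t | some s in arg_types with s -> t in fun_types }."""
    return {t for (s, t) in fun_types if s in arg_types}


three = {I}
five = {I}
args = {(a, b) for a in three for b in five}
product = apply_types(star_types, args)
print(sorted(product))  # ['c', 'i']

# In Z * (3 * 5) with Z complex, only c × c → c applies,
# so the inner 3 * 5 must be taken as complex.
z = {C}
outer = apply_types(star_types, {(a, b) for a in z for b in product})
print(sorted(outer))  # ['c']
```

The first pass collects every possible type bottom-up; the context then narrows the choice, as in the 2 * (3 * 5) and Z * (3 * 5) cases.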
Narrowing the set of possible
types
• Ada requires a complete expression to have
a unique type
Narrowing the set of …
E’ → E          E’.types = E.types
                E.unique = if E’.types == {t} then t
                           else type_error
E → id          E.types = lookup(id)
Is the grammar L-attributed?
E’ → E          E’.types = E.types
                E.unique = if E’.types == {t} then t
                           else type_error
E → id          E.types = lookup(id)
Polymorphic functions
• A function can be invoked with arguments of
different types
Polymorphic functions …
• Strongly typed languages can make programming very
tedious