UNIT – II PPT
Dr.S.Selvi
Professor/CSE
UNIT II
SYNTAX ANALYSIS
Outline
⚫ Role of parser
⚫ Context free grammars
⚫ Top down parsing
⚫ Bottom up parsing
⚫ Parser generators
⚫ Type Checking
⚫ Type Conversions
⚫ Run – Time Environments
⚫ Storage Allocation Strategies
⚫ Activation Records
The role of parser

[Diagram: the source program enters the Lexical Analyzer, which supplies
a token to the Parser on each getNextToken request; the Parser builds a
parse tree for the rest of the front end, which produces the intermediate
representation. Both phases consult the symbol table.]
Uses of grammars
E -> E + T | T
T -> T * F | F
F -> (E) | id

The same language, after left recursion elimination and left factoring:
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
Error handling
⚫ Common programming errors
⚫ Lexical errors
⚫ Syntactic errors
⚫ Semantic errors
⚫ Logical errors
⚫ Error handler goals
⚫ Report the presence of errors clearly and accurately
⚫ Recover from each error quickly enough to detect subsequent
errors
⚫ Add minimal overhead to the processing of correct programs
Error-recovery strategies
⚫ Panic mode recovery
⚫ Discard input symbol one at a time until one of designated set of
synchronization tokens is found
⚫ Phrase level recovery
⚫ Replacing a prefix of remaining input by some string that allows
the parser to continue
⚫ Error productions
⚫ Augment the grammar with productions that generate the
erroneous constructs
⚫ Global correction
⚫ Choosing minimal sequence of changes to obtain a globally
least-cost correction
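As a concrete illustration, panic-mode recovery can be sketched in a few lines (the token list, error position, and synchronizing set below are illustrative choices, not from the slides):

```python
# Minimal sketch of panic-mode recovery: on an error, discard input
# tokens one at a time until a designated synchronizing token is found.
SYNC_TOKENS = {';', '}'}

def panic_mode_recover(tokens, pos):
    """Skip tokens until a synchronizing token; return the new position."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos

# An error detected at position 1 skips ahead to the ';' at position 4.
print(panic_mode_recover(['x', '@', '+', 'y', ';', 'z'], 1))  # → 4
```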
Context free grammars
⚫ Terminals
⚫ Nonterminals
⚫ Start symbol
⚫ Productions

expression -> expression + term
expression -> expression – term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id
Derivations
⚫ Productions are treated as rewriting rules to generate a
string
⚫ Rightmost and leftmost derivations
⚫ E -> E + E | E * E | -E | (E) | id
⚫ Derivations for –(id+id)
⚫ E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)
Parse trees
⚫ -(id+id)
⚫ E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)
Ambiguity
⚫ For some strings there exist more than one parse tree
⚫ Or more than one leftmost derivation
⚫ Or more than one rightmost derivation
⚫ Example: id+id*id
Elimination of ambiguity
Elimination of ambiguity (cont.)
⚫ Idea:
⚫ A statement appearing between a then and an else must be
matched
Elimination of left recursion
⚫ A grammar is left recursive if it has a non-terminal A
such that there is a derivation A=>+Aα
⚫ Top down parsing methods can't handle left-recursive
grammars
⚫ A simple rule for direct left recursion elimination:
⚫ For a rule like:
⚫ A -> A α|β
⚫ We may replace it with
⚫ A -> β A’
⚫ A’ -> α A’ | ɛ
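The transformation above can be applied mechanically; here is a minimal sketch (the tuple-based grammar representation and the function name are choices made here, not notation from the slides):

```python
def eliminate_direct_left_recursion(head, bodies):
    """For A -> A a1 | ... | b1 | ..., return the equivalent
    right-recursive rules A -> b A' and A' -> a A' | epsilon.
    Each body is a tuple of grammar symbols; () represents epsilon."""
    new_head = head + "'"
    recursive = [b[1:] for b in bodies if b and b[0] == head]         # the alphas
    others    = [b for b in bodies if not (b and b[0] == head)]       # the betas
    a_rules  = [beta + (new_head,) for beta in others]                # A  -> b A'
    ap_rules = [alpha + (new_head,) for alpha in recursive] + [()]    # A' -> a A' | eps
    return {head: a_rules, new_head: ap_rules}

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | epsilon
print(eliminate_direct_left_recursion('E', [('E', '+', 'T'), ('T',)]))
```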
Recursive descent parsing
void A() {
  choose an A-production, A -> X1 X2 .. Xk;
  for (i = 1 to k) {
    if (Xi is a nonterminal)
      call procedure Xi();
    else if (Xi equals the current input symbol a)
      advance the input to the next symbol;
    else /* an error has occurred */
      error();
  }
}
Recursive descent parsing (cont)
⚫ General recursive descent may require backtracking
⚫ The previous code needs to be modified to allow
backtracking
⚫ In general form it can't choose an A-production easily.
⚫ So we need to try all alternatives
⚫ If one failed the input pointer needs to be reset and another
alternative should be tried
⚫ Recursive descent parsers can't be used for left-recursive
grammars
Example
S -> cAd
A -> ab | a
Input: cad

[Three snapshots of the top-down parse of cad: S is expanded to c A d;
A is first expanded to ab, which fails to match the input; after
backtracking, A is expanded to a, which succeeds.]
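The backtracking behaviour on this grammar can be sketched as follows (a hypothetical Python sketch: parse_A returns the alternatives in the order tried, and parse_S resets the input position between tries):

```python
def parse_A(s, i):
    """Return every position where an A could end, in the order tried."""
    ends = []
    if s[i:i+2] == 'ab':
        ends.append(i + 2)   # alternative A -> a b
    if s[i:i+1] == 'a':
        ends.append(i + 1)   # alternative A -> a
    return ends

def parse_S(s):
    """S -> c A d: try each alternative for A, backtracking on failure."""
    if not s.startswith('c'):
        return False
    for j in parse_A(s, 1):          # input pointer reset between tries
        if s[j:] == 'd':
            return True
    return False

print(parse_S('cad'))   # → True
print(parse_S('cabd'))  # → True
print(parse_S('cab'))   # → False
```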
First and Follow
⚫ First(α) is the set of terminals that begin strings derived from α
⚫ If α =>* Ɛ then Ɛ is also in First(α)
⚫ In predictive parsing, when we have A -> α | β, if First(α) and
First(β) are disjoint sets then we can select the appropriate
A-production by looking at the next input symbol
⚫ Follow(A), for any nonterminal A, is the set of terminals a that can
appear immediately after A in some sentential form
⚫ If we have S =>* αAaβ for some α and β, then a is in Follow(A)
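The First sets can be computed by the usual fixed-point iteration; the sketch below (names and representation are choices made here) runs it on the left-factored expression grammar from the earlier slide:

```python
EPS = 'ε'

def compute_first(grammar, terminals):
    """Iteratively compute First(X) for every grammar symbol.
    grammar maps each nonterminal to a list of bodies (tuples)."""
    first = {t: {t} for t in terminals}
    for nt in grammar:
        first[nt] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                nullable = True          # does the body derive epsilon so far?
                for sym in body:
                    add = first[sym] - {EPS}
                    if not add <= first[head]:
                        first[head] |= add
                        changed = True
                    if EPS not in first[sym]:
                        nullable = False
                        break
                if nullable and EPS not in first[head]:
                    first[head].add(EPS)
                    changed = True
    return first

g = {
    'E':  [('T', "E'")],
    "E'": [('+', 'T', "E'"), (EPS,)],
    'T':  [('F', "T'")],
    "T'": [('*', 'F', "T'"), (EPS,)],
    'F':  [('(', 'E', ')'), ('id',)],
}
first = compute_first(g, {'+', '*', '(', ')', 'id', EPS})
print(first['E'])   # → {'(', 'id'}
```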
Predictive parsing table (dangling-else grammar):

Non-        Input Symbol
terminal    a        b        e                   i              t    $
S           S -> a                                S -> iEtSS’
S’                            S’ -> Ɛ                                 S’ -> Ɛ
                              S’ -> eS
E                    E -> b
Non-recursive predictive parsing

[Diagram: the input buffer a + b $ feeds the predictive parsing program,
which uses a stack (X Y Z $, with X on top) and the parsing table M to
produce the output.]
Predictive parsing algorithm
set ip to point to the first symbol of w;
set X to the top stack symbol;
while (X <> $) { /* stack is not empty */
  if (X is a) pop the stack and advance ip;
  else if (X is a terminal) error();
  else if (M[X,a] is an error entry) error();
  else if (M[X,a] = X -> Y1Y2..Yk) {
    output the production X -> Y1Y2..Yk;
    pop the stack;
    push Yk, …, Y2, Y1 onto the stack, with Y1 on top;
  }
  set X to the top stack symbol;
}
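The algorithm above can be sketched as a table-driven parser. The table fragment below covers only the entries needed to parse id+id with the left-factored expression grammar (a sketch, not the full LL(1) table):

```python
EPS = ()  # the empty body represents an epsilon-production

# Fragment of the LL(1) table M for the left-factored expression grammar.
M = {
    ('E',  'id'): ('T', "E'"),
    ("E'", '+'):  ('+', 'T', "E'"),
    ("E'", '$'):  EPS,
    ('T',  'id'): ('F', "T'"),
    ("T'", '+'):  EPS,
    ("T'", '$'):  EPS,
    ('F',  'id'): ('id',),
}
NONTERMINALS = {'E', "E'", 'T', "T'", 'F'}

def predictive_parse(tokens):
    """Table-driven LL(1) parsing; returns the productions used."""
    stack, out, i = ['$', 'E'], [], 0
    while stack[-1] != '$':
        X, a = stack[-1], tokens[i]
        if X == a:                      # X matches the current terminal
            stack.pop(); i += 1
        elif X not in NONTERMINALS:     # terminal mismatch
            raise SyntaxError(a)
        elif (X, a) not in M:           # empty (error) table entry
            raise SyntaxError(a)
        else:                           # expand X by the table entry
            body = M[(X, a)]
            out.append((X, body))
            stack.pop()
            stack.extend(reversed(body))  # push body with Y1 on top
    return out

print(len(predictive_parse(['id', '+', 'id', '$'])))  # → 9
```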
Example
⚫ Input: id+id*id$

[Snapshots of the parse tree as it is built top-down for id+id*id.]
Shift-reduce parser
⚫ The general idea is to shift some symbols of input to the
stack until a reduction can be applied
⚫ At each reduction step, a specific substring matching the
body of a production is replaced by the nonterminal at the
head of the production
⚫ The key decisions during bottom-up parsing are about
when to reduce and about what production to apply
⚫ A reduction is the reverse of a step in a derivation
⚫ The goal of a bottom-up parser is to construct a rightmost
derivation in reverse:
Handle pruning
⚫ A Handle is a substring that matches the body of a
production and whose reduction represents one step along
the reverse of a rightmost derivation
Shift/reduce conflict example (the dangling-else grammar):

Stack                        Input
… if expr then stmt          else …$
Reduce/reduce conflict
stmt -> id(parameter_list)
stmt -> expr:=expr
parameter_list -> parameter_list, parameter
parameter_list -> parameter
parameter -> id
expr -> id(expr_list)
expr -> id
expr_list -> expr_list, expr
expr_list -> expr

Stack              Input
… id(id            ,id) …$
LR Parsing
⚫ The most prevalent type of bottom-up parsers
⚫ LR(k); we are mostly interested in parsers with k<=1
⚫ Why LR parsers?
⚫ Table driven
⚫ Can be constructed to recognize all programming language
constructs
⚫ Most general non-backtracking shift-reduce parsing method
⚫ Can detect a syntactic error as soon as it is possible to do so
⚫ The class of grammars for which we can construct LR parsers is a
superset of the class for which we can construct LL parsers
States of an LR parser
⚫ States represent set of items
⚫ An LR(0) item of G is a production of G with the dot at
some position of the body:
⚫ For A->XYZ we have following items
⚫ A->.XYZ
⚫ A->X.YZ
⚫ A->XY.Z
⚫ A->XYZ.
Example
E -> E + T | T
T -> T * F | F
F -> (E) | id

LR(0) automaton (states as item sets; transitions given in parentheses):

I0 = closure({[E’->.E]}):
  E’->.E, E->.E+T, E->.T, T->.T*F, T->.F, F->.(E), F->.id
I1 (I0 on E): E’->E., E->E.+T   (accept on $)
I2 (I0 on T): E->T., T->T.*F
I3 (I0 on F): T->F.
I4 (I0 on ( ): F->(.E), E->.E+T, E->.T, T->.T*F, T->.F, F->.(E), F->.id
I5 (I0 on id): F->id.
I6 (I1 on +): E->E+.T, T->.T*F, T->.F, F->.(E), F->.id
I7 (I2 on *): T->T*.F, F->.(E), F->.id
I8 (I4 on E): F->(E.), E->E.+T
I9 (I6 on T): E->E+T., T->T.*F
I10 (I7 on F): T->T*F.
I11 (I8 on )): F->(E).
Use of LR(0) automaton
⚫ Example: id*id
Line Stack Symbols Input Action
(1) 0 $ id*id$ Shift to 5
(2) 05 $id *id$ Reduce by F->id
(3) 03 $F *id$ Reduce by T->F
(4) 02 $T *id$ Shift to 7
(5) 027 $T* id$ Shift to 5
(6) 0275 $T*id $ Reduce by F->id
(7) 02710 $T*F $ Reduce by T->T*F
(8) 02 $T $ Reduce by E->T
(9) 01 $E $ accept
LR-Parsing model

[Diagram: the input buffer a1 … ai … an $ is read by the LR parsing
program, which maintains a stack of states (sm on top, $ at the bottom)
and consults the ACTION and GOTO tables to produce the output.]
LR parsing algorithm
let a be the first symbol of w$;
while(1) { /*repeat forever */
let s be the state on top of the stack;
if (ACTION[s,a] = shift t) {
push t onto the stack;
let a be the next input symbol;
} else if (ACTION[s,a] = reduce A->β) {
pop |β| symbols of the stack;
let state t now be on top of the stack;
push GOTO[t,A] onto the stack;
output the production A->β;
} else if (ACTION[s,a]=accept) break; /* parsing is done */
else call error-recovery routine;
}
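The driver loop above can be sketched in Python, using the SLR(1) table of the expression grammar (the dictionary encoding of ACTION and GOTO is a choice made for this sketch):

```python
# ACTION maps (state, terminal) to ('s', t), ('r', prod) or ('acc',).
PRODS = {1: ('E', 3), 2: ('E', 1), 3: ('T', 3),   # E->E+T, E->T, T->T*F
         4: ('T', 1), 5: ('F', 3), 6: ('F', 1)}   # T->F, F->(E), F->id
ACTION = {}
for s in (0, 4, 6, 7):                             # states that shift id / (
    ACTION[(s, 'id')] = ('s', 5); ACTION[(s, '(')] = ('s', 4)
for a in ('+', '*', ')', '$'):                     # reduce rows
    for s, p in ((3, 4), (5, 6), (10, 3), (11, 5)):
        ACTION[(s, a)] = ('r', p)
    ACTION[(2, a)] = ('r', 2); ACTION[(9, a)] = ('r', 1)
ACTION[(2, '*')] = ('s', 7); ACTION[(9, '*')] = ('s', 7)   # * binds tighter
ACTION[(1, '+')] = ('s', 6); ACTION[(1, '$')] = ('acc',)
ACTION[(8, '+')] = ('s', 6); ACTION[(8, ')')] = ('s', 11)
GOTO = {(0, 'E'): 1, (0, 'T'): 2, (0, 'F'): 3, (4, 'E'): 8,
        (4, 'T'): 2, (4, 'F'): 3, (6, 'T'): 9, (6, 'F'): 3, (7, 'F'): 10}

def lr_parse(tokens):
    """The LR driver loop from the slides; returns reductions made."""
    stack, out, i = [0], [], 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            raise SyntaxError(tokens[i])
        if act[0] == 's':                 # shift: push state, advance input
            stack.append(act[1]); i += 1
        elif act[0] == 'r':               # reduce: pop |body| states, push GOTO
            head, size = PRODS[act[1]]
            del stack[-size:]
            stack.append(GOTO[(stack[-1], head)])
            out.append(act[1])
        else:                             # accept
            return out

print(lr_parse(['id', '*', 'id', '+', 'id', '$']))  # → [6, 4, 6, 3, 2, 6, 4, 1]
```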
Example (0) E’->E
(1) E -> E + T
(2) E-> T
STATE ACTON GOTO
(3) T -> T * F
id + * ( ) $ E T F (4) T-> F
0 S5 S4 1 2 3
(5) F -> (E) id*id+id?
(6) F->id
1 S6 Acc
Line Stack Symbols Input Action
2 R2 S7 R2 R2
(1) 0 id*id+id$ Shift to 5
3 R4 R7 R4 R4
(2) 05 id *id+id$ Reduce by F->id
4 S5 S4 8 2 3 (3) 03 F *id+id$ Reduce by T->F
5 R6 R6 R6 R6 (4) 02 T *id+id$ Shift to 7
(5) 027 T* id+id$ Shift to 5
6 S5 S4 9 3 (6) 0275 T*id +id$ Reduce by F->id
7 S5 S4 10 (7) 02710 T*F +id$ Reduce by
T->T*F
8 S6 S11 (8) 02 T +id$ Reduce by E->T
9 R1 S7 R1 R1 (9) 01 E +id$ Shift
(10) 016 E+ id$ Shift
10 R3 R3 R3 R3
(11) 0165 E+id $ Reduce by F->id
11 R5 R5 R5 R5 (12) 0163 E+F $ Reduce by T->F
(13) 0169 E+T` $ Reduce by
E->E+T
(14) 01 E $ accept
Constructing SLR parsing table
⚫ Method
⚫ Construct C={I0,I1, … , In}, the collection of LR(0) items for G’
⚫ State i is constructed from state Ii:
⚫ If [A->α.aβ] is in Ii and Goto(Ii,a)=Ij, then set ACTION[i,a] to “shift j”
⚫ If [A->α.] is in Ii, then set ACTION[i,a] to “reduce A->α” for all a in
follow(A)
⚫ If [S’->.S] is in Ii, then set ACTION[i,$] to “accept”
⚫ If any conflict appears, then we say that the grammar is not
SLR(1).
⚫ If GOTO(Ii,A) = Ij then GOTO[i,A]=j
⚫ All entries not defined by above rules are made “error”
⚫ The initial state of the parser is the one constructed from the set of
items containing [S’->.S]
Example grammar which is not SLR(1)
S -> L=R | R
L -> *R | id
R -> L

I0: S’->.S, S->.L=R, S->.R, L->.*R, L->.id, R->.L
I1: S’->S.
I2: S->L.=R, R->L.
I3: S->R.
I4: L->*.R, R->.L, L->.*R, L->.id
I5: L->id.
I6: S->L=.R, R->.L, L->.*R, L->.id
I7: L->*R.
I8: R->L.
I9: S->L=R.

In state 2 on input '=', the parser can either shift to 6 or reduce by
R -> L (since '=' is in Follow(R)): a shift/reduce conflict.
More powerful LR parsers
⚫ Canonical-LR or just LR method
⚫ Use lookahead symbols for items: LR(1) items
⚫ Results in a large collection of items
⚫ LALR: lookaheads are introduced in LR(0) items
Canonical LR(1) items
⚫ In LR(1) items each item is of the form: [A->α.β,a]
⚫ An LR(1) item [A->α.β,a] is valid for a viable prefix γ if
there is a rightmost derivation S =>*rm δAw =>rm δαβw, where
⚫ γ = δα
⚫ Either a is the first symbol of w, or w is Ɛ and a is $
⚫ Example:
⚫ S -> BB
⚫ B -> aB | b
⚫ S =>*rm aaBab =>rm aaaBab
⚫ Item [B->a.B,a] is valid for γ=aaa and w=ab
Constructing LR(1) sets of items
SetOfItems Closure(I) {
repeat
for (each item [A->α.Bβ,a] in I)
for (each production B->γ in G’)
for (each terminal b in First(βa))
add [B->.γ, b] to set I;
until no more items are added to I;
return I;
}
SetOfItems Goto(I,X) {
initialize J to be the empty set;
for (each item [A->α.Xβ,a] in I)
add item [A->αX.β,a] to set J;
return closure(J);
}
void items(G’){
initialize C to Closure({[S’->.S,$]});
repeat
for (each set of items I in C)
for (each grammar symbol X)
if (Goto(I,X) is not empty and not in C)
add Goto(I,X) to C;
until no new sets of items are added to C;
}
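The Closure function above can be sketched concretely for the example grammar that follows (S’->S, S->CC, C->cC|d); first_of is specialized to this grammar, in which no nonterminal derives Ɛ:

```python
GRAMMAR = {"S'": [('S',)], 'S': [('C', 'C')], 'C': [('c', 'C'), ('d',)]}
TERMINALS = {'c', 'd', '$'}

def first_of(seq):
    """First(seq), specialized: First(C) = First(S) = {c, d} here."""
    sym = seq[0]
    return {sym} if sym in TERMINALS else {'c', 'd'}

def closure(items):
    """LR(1) closure: items are (head, body, dot, lookahead) tuples."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot, la in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:  # dot before nonterminal B
                B, beta = body[dot], body[dot + 1:] + (la,)
                for prod in GRAMMAR[B]:
                    for b in first_of(beta):              # lookaheads: First(beta a)
                        new = (B, prod, 0, b)
                        if new not in items:
                            items.add(new); changed = True
    return items

I0 = closure({("S'", ('S',), 0, '$')})
# I0 contains [S'->.S,$], [S->.CC,$], [C->.cC,c/d], [C->.d,c/d]
print(len(I0))  # → 6
```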
Example
S’->S
S->CC
C->cC
C->d
Canonical LR(1) parsing table
⚫ Method
⚫ Construct C={I0,I1, … , In}, the collection of LR(1) items for G’
⚫ State i is constructed from state Ii:
⚫ If [A->α.aβ, b] is in Ii and Goto(Ii,a)=Ij, then set ACTION[i,a] to “shift j”
⚫ If [A->α., a] is in Ii, then set ACTION[i,a] to “reduce A->α”
⚫ If [S’->.S,$] is in Ii, then set ACTION[i,$] to “accept”
⚫ If any conflict appears, then we say that the grammar is not
LR(1).
⚫ If GOTO(Ii,A) = Ij then GOTO[i,A]=j
⚫ All entries not defined by above rules are made “error”
⚫ The initial state of the parser is the one constructed from the set of
items containing [S’->.S,$]
Example
S’->S
S->CC
C->cC
C->d
LALR Parsing Table
⚫ For the previous example we had:
I4: C->d. , c/d
I7: C->d. , $
⚫ These states have the same core, so they are merged into:
I47: C->d. , c/d/$
Example: the ambiguous expression grammar
E -> E + E
E -> E * E
E -> (E)
E -> id

I0: E’->.E     I1: E’->E.     I2: E->(.E)
    E->.E+E        E->E.+E        E->.E+E
    E->.E*E        E->E.*E        E->.E*E
    E->.(E)                       E->.(E)
    E->.id                        E->.id

         ACTION                          GOTO
STATE    id    +     *     (     )    $     E
0        S3                S2               1
1              S4    S5               Acc
2        S3                S2               6
3              R4    R4          R4   R4
4        S3                S2               7
5        S3                S2               8
6              S4    S5          S9
7              R1    S5          R1   R1
8              R2    R2          R2   R2
9              R3    R3          R3   R3
TYPE SYSTEMS
⚫ The type of a language construct is denoted by a type expression.
⚫ A type expression can be:
⚫ A basic type
⚫a primitive data type such as integer, real, char, boolean, …
⚫ type-error to signal a type error
⚫ void : no type
⚫ A type name
⚫ a name can be used to denote a type expression.
⚫ A type constructor applies to other type expressions.
⚫ arrays: If T is a type expression, then array(I,T) is a type expression where I denotes
an index range. Ex: array(0..99,int)
⚫ products: If T1 and T2 are type expressions, then their cartesian product T1 x T2 is a
type expression. Ex: int x int
⚫ pointers: If T is a type expression, then pointer(T) is a type expression. Ex:
pointer(int)
⚫ functions: We may treat functions in a programming language as mappings from a
domain type D to a range type R. So, the type of a function can be denoted by the type
expression D→R where D and R are type expressions. Ex: int→int represents the type of
a function which takes an int value as parameter, and its return type is also int.
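One possible concrete encoding of these type expressions is nested tuples (the representation is a choice made for this sketch; the slides define the constructors only abstractly):

```python
# Basic types as strings, constructed types as tagged tuples.
INT, REAL, CHAR = 'int', 'real', 'char'

def array(index_range, elem):  return ('array', index_range, elem)
def product(t1, t2):           return ('x', t1, t2)
def pointer(t):                return ('pointer', t)
def function(domain, range_):  return ('->', domain, range_)

# Examples from the slide: array(0..99,int) and int x int -> int
t1 = array((0, 99), INT)
t2 = function(product(INT, INT), INT)
print(t2)  # → ('->', ('x', 'int', 'int'), 'int')
```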
SPECIFICATION OF SIMPLE TYPE CHECKER
P → D;E
D → D;D
D → id:T { addtype(id.entry,T.type) }
T → char { T.type=char }
T → int { T.type=int }
T → real { T.type=real }
T → ↑T1 { T.type=pointer(T1.type) }
T → array[intnum] of T1 {T.type=array(1..intnum.val,T1.type) }
Type Checking of Expressions
E → id { E.type=lookup(id.entry) }
E → charliteral { E.type=char }
E → intliteral { E.type=int }
E → realliteral { E.type=real }
E → E1 + E2 { if (E1.type=int and E2.type=int) then E.type=int
else if (E1.type=int and E2.type=real) then E.type=real
else if (E1.type=real and E2.type=int) then E.type=real
else if (E1.type=real and E2.type=real) then E.type=real
else E.type=type-error }
E → E1 [E2] { if (E2.type=int and E1.type=array(s,t)) then E.type=t
else E.type=type-error }
E → E1 ↑ { if (E1.type=pointer(t)) then E.type=t
else E.type=type-error }
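The rules above can be sketched as plain functions over type values (a hypothetical encoding: basic types as strings, constructed types as tuples):

```python
ERR = 'type-error'

def add_type(t1, t2):
    """The E -> E1 + E2 rule: int+int is int, any int/real mix is real."""
    if t1 == t2 == 'int':
        return 'int'
    if t1 in ('int', 'real') and t2 in ('int', 'real'):
        return 'real'
    return ERR

def index_type(t1, t2):
    """The E -> E1[E2] rule: indexing array(s,t) with an int yields t."""
    if t2 == 'int' and isinstance(t1, tuple) and t1[0] == 'array':
        return t1[2]
    return ERR

def deref_type(t1):
    """The E -> E1^ rule: dereferencing pointer(t) yields t."""
    if isinstance(t1, tuple) and t1[0] == 'pointer':
        return t1[1]
    return ERR

print(add_type('int', 'real'))                       # → real
print(index_type(('array', (1, 10), 'int'), 'int'))  # → int
print(deref_type('int'))                             # → type-error
```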
Type Checking of Statements
S → id = E { if (id.type=E.type) then S.type=void
else S.type=type-error }
Type Checking of Functions
E → E1 ( E2 ) { if (E2.type=s and E1.type=s→t) then E.type=t
else E.type=type-error }
Structural Equivalence of Type Expressions
⚫ How do we know that two type expressions are equal?
⚫ As long as type expressions are built from basic types (no
type names), we may use structural equivalence between two
type expressions
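A structural-equivalence test can be sketched as a recursion over a tuple encoding of type expressions (the encoding itself is an assumption of this sketch):

```python
def struct_equiv(s, t):
    """Two type expressions are structurally equivalent if they are the
    same basic type, or the same constructor applied to equivalent parts."""
    if not isinstance(s, tuple) or not isinstance(t, tuple):
        return s == t                          # basic types / atoms
    return (len(s) == len(t) and s[0] == t[0]  # same constructor
            and all(struct_equiv(a, b) for a, b in zip(s[1:], t[1:])))

a = ('pointer', ('array', (0, 99), 'int'))
b = ('pointer', ('array', (0, 99), 'int'))
print(struct_equiv(a, b))                   # → True
print(struct_equiv(a, ('pointer', 'int')))  # → False
```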
Cycles in Type Expressions
type link = ↑ cell;
type cell = record
x : int,
next : link
end;
⚫ We cannot use structural equivalence if there are cycles in
type expressions.
⚫ We have to treat type names as basic types,
but this means that the type expression link is different from the
type expression ↑cell.
Type Conversions
⚫ x + y : what is the type of this expression (int or double)?
⚫ If x is real and y is int, the compiler inserts a conversion before
the addition, e.g. as three-address code:
inttoreal y, , t1
real+ x, t1, t2
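The insertion of the coercion can be sketched as follows (gen_add and the temp-naming scheme are illustrative; the emitted strings mirror the three-address code above):

```python
def gen_add(x, x_type, y, y_type):
    """Emit three-address code for x + y, inserting an int-to-real
    coercion when the operand types differ."""
    code, n = [], [0]
    def new_temp():
        n[0] += 1
        return f't{n[0]}'
    if x_type != y_type:                 # widen the int operand to real
        t = new_temp()
        if y_type == 'int':
            code.append(f'inttoreal {y}, , {t}'); y = t
        else:
            code.append(f'inttoreal {x}, , {t}'); x = t
        x_type = 'real'
    op = 'real+' if x_type == 'real' else 'int+'
    t = new_temp()
    code.append(f'{op} {x}, {y}, {t}')
    return code

print(gen_add('x', 'real', 'y', 'int'))
# → ['inttoreal y, , t1', 'real+ x, t1, t2']
```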
RUN TIME ENVIRONMENTS
SOURCE LANGUAGE ISSUES
⚫ A compiler must implement the abstractions presented in the source language
definition
⚫ Examples include
⚫ Names/id
⚫ Scopes
⚫ Binding
⚫ Data types
⚫ Operators
⚫ Procedures/functions,
⚫ Control flow constructs
STORAGE ORGANIZATION
⚫ A typical organization of run-time storage for Pascal/C includes the
following sub-division of memory (from low memory to high memory)
⚫ Program code
⚫ Static data
⚫ Heap
⚫ Free memory
⚫ Stack
STORAGE ALLOCATION STRATEGIES
Stack
⚫ Stack
⚫ Last-in First-out (LIFO)
⚫ Used to implement the behavior of function calls
⚫ Stack uses the following operations
⚫ Push (or call) the node for an activation onto the stack when
it begins
⚫ Pop (or return) the node when the activation ends
Stack allocation of space
⚫ Stack allocation would not be practical if
procedure calls did not nest in time
⚫ Activation tree
⚫ used to establish the order in which activations begin along a
path from the root; activations return in the reverse of that order
Activation Records: Fields
⚫ the fields of an Activation Record (AR)
⚫ Temporary values: values generated by expression evaluations
that cannot be kept in registers
⚫ Local data: data local to the procedure
⚫ Saved machine status: the context (registers, PC, etc.)
⚫ Access link: points to non-local data in other ARs
⚫ Control link: points to the caller’s activation record
⚫ Returned value: space for the called function’s return value, if any
⚫ Actual parameters: used by the calling procedure to pass
parameters to the called procedure. Registers are often used to
pass this information
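The fields listed above can be sketched as a record type (field names follow the slide; the layout and types are illustrative only, not a real AR layout):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:
    actual_parameters: list                              # passed by the caller
    returned_value: Any = None                           # space for the result, if any
    control_link: Optional['ActivationRecord'] = None    # caller's AR
    access_link: Optional['ActivationRecord'] = None     # for non-local data
    saved_machine_status: dict = field(default_factory=dict)  # PC, registers
    local_data: dict = field(default_factory=dict)
    temporaries: list = field(default_factory=list)

# A call pushes a new AR whose control link points at the caller's AR.
main_ar = ActivationRecord(actual_parameters=[])
callee = ActivationRecord(actual_parameters=[42], control_link=main_ar)
print(callee.control_link is main_ar)  # → True
```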
Activation Trees
❑ Activation trees
▪ Represents the activations of procedures during program
execution using a tree
❑ Activation of procedure
▪ Execution of a procedure body
▪ Lifetime of activation
▪ Sequence of steps between first and last statements
▪ Activations are either non-overlapping or nested
▪ Recursive activations are possible
Symbol Tables