
Compiler Design Syntax Analysis Top Down

The document outlines the role of parsers in syntax analysis, detailing context-free grammars, error handling strategies, and types of parsers. It discusses methods for eliminating ambiguity and left recursion, as well as techniques for predictive parsing and error recovery. Key concepts such as First and Follow sets, LL(1) grammars, and the construction of predictive parsing tables are also covered.

SYNTAX ANALYSIS

Outline

• Role of parser
• Context free grammars
• Top down parsing
The role of parser

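[Figure: Source program -> Lexical Analyzer -> Parser -> Rest of Front End -> Intermediate representation. The lexical analyzer hands a token to the parser on each getNextToken request, the parser passes a parse tree to the rest of the front end, and a Symbol table is shown alongside these phases.]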
Error handling

• Common programming errors
  • Lexical errors
  • Syntactic errors
  • Semantic errors

• Error handler goals
  • Report the presence of errors clearly and accurately
  • Recover from each error quickly enough to detect subsequent errors
  • Add minimal overhead to the processing of correct programs
Error-recovery strategies

• Panic mode recovery
  • Discard input symbols one at a time until one of a designated set of synchronizing tokens is found
• Phrase level recovery
  • Replace a prefix of the remaining input by some string that allows the parser to continue
• Error productions
  • Augment the grammar with productions that generate the erroneous constructs
• Global correction
  • Choose a minimal sequence of changes to obtain a globally least-cost correction
Types of parsers
Context free grammars
• A context free grammar is a tuple <T, N, P, S>
  • T: a set of tokens (terminal symbols)
  • N: a set of nonterminal symbols
  • P: a set of productions of the form nonterminal -> string of terminals and nonterminals
  • S: a start symbol
• A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the right-hand side of a production for that nonterminal.
• The strings of terminals that can be derived from the start symbol of a grammar G form the language L(G) defined by the grammar.
Derivations

• Productions are treated as rewriting rules to generate a string
• Rightmost and leftmost derivations
• E -> E + E | E * E | -E | (E) | id
• Leftmost derivation for -(id+id):
  • E => -E
  •   => -(E)
  •   => -(E+E)
  •   => -(id+E)
  •   => -(id+id)
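
The derivation steps above can be replayed mechanically. The following is a minimal sketch, assuming a dictionary encoding of the grammar and a hypothetical helper `leftmost_derivation` (neither comes from the slides): each step rewrites the leftmost nonterminal with the chosen production body.

```python
# Grammar E -> E + E | E * E | -E | (E) | id, encoded as head -> list of bodies.
GRAMMAR = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["-", "E"], ["(", "E", ")"], ["id"]],
}

def leftmost_derivation(start, bodies):
    """Apply each chosen body to the leftmost nonterminal; return all sentential forms."""
    form = [start]
    steps = ["".join(form)]
    for body in bodies:
        # Position of the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        if body not in GRAMMAR[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(body)} is not a production")
        form[i:i + 1] = body
        steps.append("".join(form))
    return steps

steps = leftmost_derivation(
    "E", [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
)
print(" => ".join(steps))
```

Choosing the rightmost nonterminal instead of the leftmost one in the same loop would give a rightmost derivation.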
Parse trees

• -(id+id)
• E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)
Ambiguity

• For some strings there exists more than one parse tree
  • Or more than one leftmost derivation
  • Or more than one rightmost derivation
• Example: id+id*id
Elimination of ambiguity
Elimination of ambiguity (cont.)

• Idea:
  • A statement appearing between a then and an else must be matched, that is, it must not end with an unmatched (open) then
Elimination of left recursion

• A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α
• Top-down parsing methods can't handle left-recursive grammars
• A simple rule for direct left recursion elimination:
  • For a rule like:
    • A -> A α | β
  • We may replace it with
    • A -> β A’
    • A’ -> α A’ | ɛ
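
The replacement rule above can be sketched in a few lines. This is an illustration only: the function name `eliminate_direct_left_recursion`, the tuple encoding of production bodies, and the use of the empty tuple for ɛ are assumptions made for the example, and it assumes there is no production A -> A.

```python
def eliminate_direct_left_recursion(head, bodies):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon.

    `bodies` is a list of tuples of symbols; the empty tuple () stands for epsilon.
    Returns a dict mapping A (and A', if one is needed) to the new bodies.
    """
    new_head = head + "'"
    # The alphas: what follows A in each left-recursive alternative.
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    # The betas: alternatives that do not start with A.
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: list(bodies)}  # nothing to do
    return {
        head: [beta + (new_head,) for beta in betas],
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }

# E -> E + T | T
result = eliminate_direct_left_recursion("E", [("E", "+", "T"), ("T",)])
print(result)
```

For E -> E + T | T this yields E -> T E’ and E’ -> + T E’ | ɛ, the same shape as the expression grammar used later in these slides.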
Left recursion elimination (cont.)

• There are cases like the following, where the left recursion is indirect:
  • S -> Aa | b
  • A -> Ac | Sd | ɛ
• Left recursion elimination algorithm:
  • Arrange the nonterminals in some order A1,A2,…,An.
  • For (each i from 1 to n) {
  •   For (each j from 1 to i-1) {
  •     Replace each production of the form Ai -> Aj γ by the productions Ai -> δ1 γ | δ2 γ | … | δk γ, where Aj -> δ1 | δ2 | … | δk are all current Aj productions
  •   }
  •   Eliminate left recursion among the Ai-productions
  • }
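
A minimal sketch of the algorithm above, applied to the S/A grammar from this slide. The function name `eliminate_left_recursion` and the encoding (tuples of symbols, empty tuple for ɛ, a prime appended for new nonterminals) are assumptions for illustration. The algorithm is only guaranteed for grammars without cycles or ɛ-productions; the ɛ-production in this particular example happens to be harmless.

```python
def eliminate_left_recursion(grammar, order):
    """grammar: dict nonterminal -> list of bodies (tuples; () is epsilon)."""
    g = {head: list(bodies) for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Ai -> Aj gamma becomes Ai -> delta gamma for every Aj -> delta.
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai-productions.
        alphas = [b[1:] for b in g[ai] if b and b[0] == ai]
        betas = [b for b in g[ai] if not b or b[0] != ai]
        if alphas:
            new = ai + "'"
            g[ai] = [beta + (new,) for beta in betas]
            g[new] = [alpha + (new,) for alpha in alphas] + [()]
    return g

grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ()],
}
result = eliminate_left_recursion(grammar, ["S", "A"])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) or "eps" for b in bodies))
```

With the order S, A the substitution step first turns A -> Sd into A -> Aad | bd, and the final grammar is S -> Aa | b, A -> bdA’ | A’, A’ -> cA’ | adA’ | ɛ.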
Left factoring

• Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive or top-down parsing.
• Consider the following grammar:
  • stmt -> if expr then stmt else stmt
  •       | if expr then stmt
• On seeing the input token if, it is not clear to the parser which production to use
• We can easily perform left factoring:
  • If we have A -> αβ1 | αβ2 then we replace it with
    • A -> αA’
    • A’ -> β1 | β2
Left factoring (cont.)

• Algorithm
  • For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ɛ, then replace all of the A-productions A -> αβ1 | αβ2 | … | αβn | γ, where γ represents all alternatives that do not begin with α, by
    • A -> αA’ | γ
    • A’ -> β1 | β2 | … | βn
• Example:
  • S -> i E t S | i E t S e S | a
  • E -> b
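
The left factoring algorithm can be sketched as follows and run on the example grammar above. The names `common_prefix` and `left_factor` and the tuple encoding (empty tuple for ɛ) are assumptions made for this illustration; new nonterminals are named by appending primes.

```python
def common_prefix(a, b):
    """Longest common prefix of two production bodies (tuples of symbols)."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(grammar):
    """Repeat the factoring step until no two alternatives share a prefix."""
    g = {head: list(bodies) for head, bodies in grammar.items()}
    work = list(g)
    while work:
        head = work.pop()
        bodies = g[head]
        # Longest prefix shared by at least two alternatives of `head`.
        best = ()
        for i in range(len(bodies)):
            for j in range(i + 1, len(bodies)):
                p = common_prefix(bodies[i], bodies[j])
                if len(p) > len(best):
                    best = p
        if not best:
            continue
        new = head + "'"
        while new in g:          # pick an unused name
            new += "'"
        g[new] = [b[len(best):] for b in bodies if b[:len(best)] == best]
        g[head] = [best + (new,)] + [b for b in bodies if b[:len(best)] != best]
        work.extend([head, new])  # both may need further factoring
    return g

grammar = {
    "S": [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)],
    "E": [("b",)],
}
factored = left_factor(grammar)
print(factored)
```

The result is S -> i E t S S’ | a, S’ -> ɛ | e S, E -> b.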
Top Down Parsing
Introduction
• A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right
• It can also be viewed as finding a leftmost derivation for an input string
• Example: id+id*id

E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

[Figure: the sequence of partial parse trees built for id+id*id, each obtained from the previous one by a leftmost derivation step (lm), beginning with E => TE’.]
Recursive descent parsing

• Consists of a set of procedures, one for each nonterminal
• Execution begins with the procedure for the start symbol
• A typical procedure for a nonterminal:

void A() {
    choose an A-production, A -> X1 X2 … Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else /* an error has occurred */;
    }
}
Recursive descent parsing (cont)
• General recursive descent may require backtracking
• The previous code needs to be modified to allow backtracking
• In its general form it can't choose an A-production easily
• So we need to try all alternatives
• If one fails, the input pointer needs to be reset and another alternative should be tried
• Recursive descent parsers can't be used for left-recursive grammars
Example

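S -> cAd
A -> ab | a
Input: cad

[Figure: three parse-tree snapshots. S is expanded to c A d; A is first expanded to a b, which fails to match the input; the parser backtracks and expands A to a, which succeeds.]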
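
The backtracking in this example can be sketched with Python generators. This is one possible illustration, not the slides' own code: `parse`, `match_body` and `accepts` are hypothetical names, terminals are single characters, and each generator yields every input position at which a symbol could end, so that a failed alternative is simply abandoned and the next one is tried from the saved position.

```python
# S -> cAd, A -> ab | a
GRAMMAR = {"S": [("c", "A", "d")], "A": [("a", "b"), ("a",)]}

def parse(symbol, text, pos):
    """Yield every position reachable after deriving `symbol` from text[pos:]."""
    if symbol not in GRAMMAR:               # terminal: must match the next character
        if text[pos:pos + 1] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:            # try the alternatives in order
        yield from match_body(body, text, pos)

def match_body(body, text, pos):
    """Yield every position reachable after matching all symbols of `body`."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], text, pos):   # each way to match the first symbol
        yield from match_body(body[1:], text, mid)

def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))

print(accepts("cad"), accepts("cabd"), accepts("cab"))
```

On input cad the alternative A -> ab matches a, fails on b, and the parser falls back to A -> a with the input position reset to just after c. As noted above, a left-recursive grammar would make `parse` recurse forever.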
First and Follow

• First(α) is the set of terminals that begin strings derived from α
• If α =>* ɛ then ɛ is also in First(α)
• In predictive parsing, when we have A -> α | β, if First(α) and First(β) are disjoint sets then we can select the appropriate A-production by looking at the next input symbol
• Follow(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form
  • If we have S =>* αAaβ for some α and β then a is in Follow(A)
• If A can be the rightmost symbol in some sentential form, then $ is in Follow(A)
Computing First

• To compute First(X) for all grammar symbols X, apply the following rules until no more terminals or ɛ can be added to any First set:
  1. If X is a terminal then First(X) = {X}.
  2. If X is a nonterminal and X -> Y1Y2…Yk is a production for some k>=1, then place a in First(X) if for some i a is in First(Yi) and ɛ is in all of First(Y1),…,First(Yi-1), that is, Y1…Yi-1 =>* ɛ. If ɛ is in First(Yj) for all j=1,…,k then add ɛ to First(X).
  3. If X -> ɛ is a production then add ɛ to First(X).
• Example!
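
As a worked example, the rules for First can be run as a fixed-point iteration on the expression grammar E -> TE’, E’ -> +TE’ | ɛ, T -> FT’, T’ -> *FT’ | ɛ, F -> (E) | id. This is a sketch under assumed conventions: the name `compute_first`, the string "eps" standing for ɛ, and the empty tuple for an ɛ-body are all choices made for this illustration.

```python
EPS = "eps"  # stands for the empty string

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(sym):
        # Rule 1: First of a terminal is the terminal itself.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:                      # repeat until nothing can be added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                for sym in body:
                    # Rule 2: add First(Yi) while Y1..Yi-1 can all derive eps.
                    first[head] |= first_of(sym) - {EPS}
                    if EPS not in first_of(sym):
                        break
                else:
                    # Every symbol can vanish, or the body is empty (rule 3).
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first

first = compute_first(GRAMMAR)
for nt, symbols in first.items():
    print(f"First({nt}) = {sorted(symbols)}")
```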
Computing follow

• To compute Follow(A) for all nonterminals A, apply the following rules until nothing can be added to any Follow set:
  1. Place $ in Follow(S), where S is the start symbol and $ is the input right endmarker.
  2. If there is a production A -> αBβ then everything in First(β) except ɛ is in Follow(B).
  3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ɛ, then everything in Follow(A) is in Follow(B).
• Example!
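
The three Follow rules can be sketched the same way on the expression grammar E -> TE’, E’ -> +TE’ | ɛ, T -> FT’, T’ -> *FT’ | ɛ, F -> (E) | id. The function names, the string "eps" for ɛ and the tuple encoding are assumptions for this illustration; a compact First computation is included only so that the block runs on its own.

```python
EPS, END = "eps", "$"

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}

def first_of_string(symbols, first):
    """First set of a sequence of grammar symbols (eps if all can vanish)."""
    out = set()
    for sym in symbols:
        f = first.get(sym, {sym})       # a terminal's First set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                              # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                        # only nonterminals have Follow
                    tail = first_of_string(body[i + 1:], first)
                    new = tail - {EPS}                  # rule 2
                    if EPS in tail:
                        new |= follow[head]             # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, "E", first)
for nt, symbols in follow.items():
    print(f"Follow({nt}) = {sorted(symbols)}")
```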
LL(1) Grammars

• Predictive parsers are those recursive descent parsers needing no backtracking
• Grammars for which we can create predictive parsers are called LL(1)
  • The first L means scanning the input from left to right
  • The second L means producing a leftmost derivation
  • And 1 stands for using one input symbol of lookahead
• A grammar G is LL(1) if and only if whenever A -> α | β are two distinct productions of G, the following conditions hold:
  • For no terminal a do both α and β derive strings beginning with a
  • At most one of α and β can derive the empty string
  • If α =>* ɛ then β does not derive any string beginning with a terminal in Follow(A). Likewise, if β =>* ɛ then α does not derive any string beginning with a terminal in Follow(A).
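
A predictive (backtrack-free) recursive descent parser for the expression grammar E -> TE’, E’ -> +TE’ | ɛ, T -> FT’, T’ -> *FT’ | ɛ, F -> (E) | id might look like the sketch below. The class name `Parser`, the method names `E_` and `T_` (for E’ and T’) and the token list input are assumptions for illustration. Each procedure picks its production from the single lookahead token; the ɛ-alternatives are taken only when the lookahead is in the Follow set of the nonterminal.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.look != terminal:
            raise SyntaxError(f"expected {terminal}, found {self.look}")
        self.pos += 1

    def E(self):                             # E -> T E'
        self.T()
        self.E_()

    def E_(self):                            # E' -> + T E' | eps
        if self.look == "+":
            self.match("+")
            self.T()
            self.E_()
        elif self.look not in (")", "$"):    # Follow(E')
            raise SyntaxError(f"unexpected {self.look}")

    def T(self):                             # T -> F T'
        self.F()
        self.T_()

    def T_(self):                            # T' -> * F T' | eps
        if self.look == "*":
            self.match("*")
            self.F()
            self.T_()
        elif self.look not in ("+", ")", "$"):   # Follow(T')
            raise SyntaxError(f"unexpected {self.look}")

    def F(self):                             # F -> ( E ) | id
        if self.look == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.look == "id":
            self.match("id")
        else:
            raise SyntaxError(f"unexpected {self.look}")

def parse(tokens):
    parser = Parser(tokens)
    parser.E()
    parser.match("$")                        # all input must be consumed
    return True

print(parse(["id", "+", "id", "*", "id"]))
```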
Construction of predictive parsing table
• For each production A -> α in the grammar do the following:
  • For each terminal a in First(α), add A -> α to M[A,a]
  • If ɛ is in First(α), then for each terminal b in Follow(A) add A -> α to M[A,b]. If ɛ is in First(α) and $ is in Follow(A), add A -> α to M[A,$] as well
• If after performing the above there is no production in M[A,a], then set M[A,a] to error
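
The construction can be sketched for the expression grammar E -> TE’, E’ -> +TE’ | ɛ, T -> FT’, T’ -> *FT’ | ɛ, F -> (E) | id. In this illustration the First and Follow sets are written out by hand rather than computed, and the names `first_of_string` and `build_table`, the string "eps" for ɛ and the dictionary layout of M are all assumptions. Missing keys play the role of error entries, and each entry is kept as a list so that a conflict (more than one production in a cell) would be visible.

```python
EPS = "eps"

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_string(symbols):
    """First set of a production body; contains EPS if the body can vanish."""
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})        # a terminal's First set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def build_table(grammar):
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_string(body)
            lookaheads = f - {EPS}
            if EPS in f:
                # FOLLOW already contains $ where it applies.
                lookaheads |= FOLLOW[head]
            for a in lookaheads:
                table.setdefault((head, a), []).append(body)
    return table

table = build_table(GRAMMAR)
is_ll1 = all(len(entries) == 1 for entries in table.values())
print(len(table), "non-error entries; LL(1):", is_ll1)
```

A grammar is LL(1) exactly when no cell ends up with more than one production, which is what `is_ll1` checks here.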
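Example

E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

Symbol | First   | Follow
F      | {(, id} | {+, *, ), $}
T      | {(, id} | {+, ), $}
E      | {(, id} | {), $}
E’     | {+, ɛ}  | {), $}
T’     | {*, ɛ}  | {+, ), $}

Parsing table M (rows: nonterminals, columns: input symbols; blank entries are errors)

Nonterminal | id       | +          | *          | (        | )       | $
E           | E -> TE’ |            |            | E -> TE’ |         |
E’          |          | E’ -> +TE’ |            |          | E’ -> Ɛ | E’ -> Ɛ
T           | T -> FT’ |            |            | T -> FT’ |         |
T’          |          | T’ -> Ɛ    | T’ -> *FT’ |          | T’ -> Ɛ | T’ -> Ɛ
F           | F -> id  |            |            | F -> (E) |         |
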
Another example

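S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b

Parsing table M (rows: nonterminals, columns: input symbols; blank entries are errors)

Nonterminal | a      | b      | e                 | i           | t | $
S           | S -> a |        |                   | S -> iEtSS’ |   |
S’          |        |        | S’ -> Ɛ, S’ -> eS |             |   | S’ -> Ɛ
E           |        | E -> b |                   |             |   |

The entry M[S’,e] contains two productions, so this grammar is not LL(1).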
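Non-recursive predictive parsing

[Figure: model of a table-driven predictive parser. The input buffer (a + b $), the stack (X Y Z $, with X on top) and the parsing table M feed the predictive parsing program, which produces the output.]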
Predictive parsing algorithm

Set ip to point to the first symbol of w;
Set X to the top stack symbol;
While (X <> $) { /* stack is not empty */
    let a be the input symbol that ip points to;
    if (X is a) pop the stack and advance ip;
    else if (X is a terminal) error();
    else if (M[X,a] is an error entry) error();
    else if (M[X,a] = X->Y1Y2..Yk) {
        output the production X->Y1Y2..Yk;
        pop the stack;
        push Yk,…,Y2,Y1 on to the stack with Y1 on top;
    }
    set X to the top stack symbol;
}
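
A runnable sketch of this algorithm for the expression grammar E -> TE’, E’ -> +TE’ | ɛ, T -> FT’, T’ -> *FT’ | ɛ, F -> (E) | id. The table is written out by hand as a dictionary, the empty tuple stands for an ɛ-body, and the name `predictive_parse` is hypothetical. One addition that is not in the pseudocode: after the loop the sketch also checks that no input is left over.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[X, a] -> production body; missing keys are error entries.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}

def predictive_parse(tokens, start="E"):
    """Return the list of productions used, in leftmost-derivation order."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]              # top of the stack is the end of the list
    ip = 0
    output = []
    while stack[-1] != "$":
        x, a = stack[-1], tokens[ip]
        if x == a:                    # terminal on top matches the input
            stack.pop()
            ip += 1
        elif x not in NONTERMINALS:
            raise SyntaxError(f"expected {x}, found {a}")
        elif (x, a) not in TABLE:
            raise SyntaxError(f"no entry M[{x}, {a}]")
        else:
            body = TABLE[(x, a)]
            output.append((x, body))
            stack.pop()
            stack.extend(reversed(body))   # leaves Y1 on top
    if tokens[ip] != "$":             # extra check, not part of the pseudocode
        raise SyntaxError(f"unexpected {tokens[ip]} after end of expression")
    return output

productions = predictive_parse(["id", "+", "id", "*", "id"])
for head, body in productions:
    print(head, "->", " ".join(body) or "eps")
```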
Example

• id+id*id$

Matched | Stack | Input     | Action
        | E$    | id+id*id$ |
Error recovery in predictive parsing
• Panic mode
  • Place all symbols in Follow(A) into the synchronization set for nonterminal A: skip tokens until an element of Follow(A) is seen and pop A from the stack.
  • Add to the synchronization set of a lower level construct the symbols that begin higher level constructs
  • Add symbols in First(A) to the synchronization set of nonterminal A
  • If a nonterminal can generate the empty string then the production deriving ɛ can be used as a default
  • If a terminal on top of the stack cannot be matched, pop the terminal and issue a message saying that the terminal was inserted
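Example

Parsing table M with synch entries (rows: nonterminals, columns: input symbols)

Nonterminal | id       | +          | *          | (        | )       | $
E           | E -> TE’ |            |            | E -> TE’ | synch   | synch
E’          |          | E’ -> +TE’ |            |          | E’ -> Ɛ | E’ -> Ɛ
T           | T -> FT’ | synch      |            | T -> FT’ | synch   | synch
T’          |          | T’ -> Ɛ    | T’ -> *FT’ |          | T’ -> Ɛ | T’ -> Ɛ
F           | F -> id  | synch      | synch      | F -> (E) | synch   | synch
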

Stack   | Input    | Action
E$      | )id*+id$ | Error, skip )
E$      | id*+id$  | id is in First(E)
TE’$    | id*+id$  |
FT’E’$  | id*+id$  |
idT’E’$ | id*+id$  |
T’E’$   | *+id$    |
*FT’E’$ | *+id$    |
FT’E’$  | +id$     | Error, M[F,+] = synch
T’E’$   | +id$     | F has been popped
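
The recovery steps in this trace can be sketched as follows. The name `parse_with_recovery`, the dictionary form of the table and the error message strings are assumptions for illustration. One further assumption is labeled in the code: the trace skips ) in its first step although the table lists synch for M[E, )], so the sketch skips the input token instead of popping whenever the nonterminal is the only symbol above $ on the stack.

```python
SYNCH = "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parsing table with synch entries; () is an eps-body, missing keys are blank.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}

def parse_with_recovery(tokens, start="E"):
    """Parse with panic-mode recovery; return the list of recovery actions."""
    tokens = list(tokens) + ["$"]
    stack, ip, errors = ["$", start], 0, []
    while stack[-1] != "$":
        x, a = stack[-1], tokens[ip]
        if x == a:
            stack.pop()
            ip += 1
        elif x not in NONTERMINALS:
            # Unmatched terminal on top: pop it and report it as inserted.
            errors.append(f"insert {x}")
            stack.pop()
        else:
            entry = TABLE.get((x, a))
            if entry is not None and entry != SYNCH:
                stack.pop()
                stack.extend(reversed(entry))
            elif a != "$" and (entry is None or len(stack) == 2):
                # Blank entry: skip the input symbol. ASSUMPTION: also skip on
                # synch when popping would leave only $ on the stack.
                errors.append(f"skip {a}")
                ip += 1
            else:
                # synch entry (or nothing left to skip): pop the nonterminal.
                errors.append(f"pop {x}")
                stack.pop()
    # Any tokens left before $ are reported as skipped.
    errors.extend(f"skip {t}" for t in tokens[ip:-1])
    return errors

actions = parse_with_recovery([")", "id", "*", "+", "id"])
print(actions)
```

For the input )id*+id the sketch reports the same two recovery actions as the trace: the leading ) is skipped and F is popped when + is seen.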
Thank You
