Unit 2-Part B

➢ PART B : TOP DOWN PARSING

1. Recursive Descent Parsing
2. LL(1) Parsing
3. First and Follow Sets
4. Recursive Descent Parser
5. Error Recovery in Top Down Parsers
➢A parser checks whether a given string can be generated from a given
grammar.
➢Parser for grammar G : a program that takes as input a string w and
produces as output either a parse tree for w, if w is a sentence of G, or an
error message indicating that w is not a sentence of G.
➢The parser obtains a string of tokens from the lexical analyzer and
verifies that the string of token names can be generated by the grammar
for the source language.
 Three types of parsers for grammars
◦ 1. Universal (Cocke-Younger-Kasami algorithm & Earley's algorithm)
◦ 2. Top down
◦ 3. Bottom up
 In both top down and bottom up parsing, the input is scanned from left to right, one
symbol at a time
 These methods work only for subclasses of grammars
 Parsers implemented by hand use LL grammars (predictive
parsing)
 Parsers for the larger class of LR grammars are constructed using
automated tools
 Tasks conducted :
◦ 1. collecting information about various tokens into the symbol table
◦ 2. performing type checking and other kinds of semantic analysis
◦ 3. generating intermediate code
 Representative Grammar
E → E + T | T
T → T * F | F
F → ( E ) | id
◦ E represents expressions consisting of terms separated by the + symbol
◦ T represents terms consisting of factors separated by the * sign
◦ F represents factors
◦ This grammar belongs to the class of LR grammars used for bottom up parsing
◦ It cannot be used for top down parsing as it is left recursive
Left Recursion

Rule : A → Aα | β is replaced by
A → βA'
A' → αA' | ε

For E → E + T | T : here A = E, α = +T and β = T. After removing left recursion
the new productions are E → TE' and E' → +TE' | ε.

Final grammar after removing left recursion :
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | id
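The rewriting rule above is mechanical enough to code. A minimal sketch (not from the notes; the grammar encoding as a dict of nonterminal → list of right-hand sides, the spelling "ε" for the empty string, and the function name are all assumptions of this sketch):

```python
# Removes immediate left recursion: A -> Aα | β  becomes  A -> βA' ; A' -> αA' | ε
def eliminate_immediate_left_recursion(grammar):
    new_grammar = {}
    for nt, prods in grammar.items():
        alphas = [p[1:] for p in prods if p[0] == nt]       # left-recursive tails α
        betas = [p for p in prods if p[0] != nt]            # non-recursive alternatives β
        if not alphas:
            new_grammar[nt] = prods                         # nothing to rewrite
            continue
        nt2 = nt + "'"                                      # fresh nonterminal A'
        new_grammar[nt] = [b + [nt2] for b in betas]        # A  -> βA'
        new_grammar[nt2] = [a + [nt2] for a in alphas] + [["ε"]]   # A' -> αA' | ε
    return new_grammar

g = {"E": [["E", "+", "T"], ["T"]],
     "T": [["T", "*", "F"], ["F"]],
     "F": [["(", "E", ")"], ["id"]]}
print(eliminate_immediate_left_recursion(g))
```

Running it on the representative grammar reproduces the final grammar listed above.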
 Syntax Error Handling
◦ Programs may contain errors at different levels
◦ Lexical :- misspelling an identifier, keyword or operator (e.g. use of identifier
elipseSize instead of ellipseSize)
◦ Syntactic :- an arithmetic expression with unbalanced parentheses, or appearance
of a case statement without an enclosing switch
◦ Semantic :- an operator applied to an incompatible operand (e.g. a return statement in a
Java method with result type void)
◦ Logical :- use of the assignment operator = instead of the comparison operator ==

◦ A parser with the viable-prefix property detects that an error has occurred as soon as it
sees a prefix of the input that cannot be completed to form a string of the language.

◦ Syntactic errors are detected when the stream of tokens coming from the lexical
analyzer does not match the grammatical rules.
 Functions of the error handler in the parser :
 1. It should report the presence of errors clearly and accurately, specifying the line
number of the error in the source program
 2. It should recover from each error quickly enough that subsequent errors
can be detected
 3. It should not slow down the processing of correct programs.

 Error Recovery Strategies

 1. Panic Mode Recovery :
 Discards input symbols one at a time until one of a designated set of synchronizing tokens is found
 Synchronizing tokens are delimiters like semicolon or }
 Advantage : easy to implement and guaranteed not to go into an infinite loop
 Disadvantage : a considerable amount of input is skipped without checking it for
additional errors
 2. Phrase Level Recovery :
◦ In this method, when the parser encounters an error, it performs a local correction on the
remaining input so that the rest of the input allows the parser to continue.
◦ The correction can be deleting an extra semicolon, replacing a comma by a semicolon, or
inserting a missing semicolon.
◦ While performing a correction, utmost care must be taken that the replacement does not
lead to an infinite loop.
◦ Used in error-repairing compilers, as it corrects the input string.
◦ Disadvantage : it has difficulty handling situations where the actual error occurred
before the point of detection.

 3. Error Productions
◦ If the user has knowledge of common errors that may be encountered, these errors can
be anticipated by augmenting the grammar with error productions that generate the
erroneous constructs.
◦ During parsing, appropriate error messages can then be generated and
parsing can continue.
◦ Disadvantage : such a grammar is difficult to maintain.

 4. Global Correction
◦ The parser examines the whole program and tries to find the closest error-free
match for it.
◦ The closest match is the program requiring the fewest insertions, deletions and changes of
tokens to recover from the erroneous input.
◦ Due to its high time and space complexity, this method is not implemented in practice.
 Top down parsing constructs the parse tree for the input string starting from the root and
creating the nodes of the parse tree in preorder (depth first)
 It finds a leftmost derivation for the input string
 Terminals are the basic symbols from which strings are formed
Recursive Descent Parsing
 Recursive descent is a top-down parsing technique that constructs the parse tree
from the top while the input is read from left to right.
 It uses a procedure for every terminal and non-terminal entity.
 This parsing technique recursively parses the input to build a parse tree, which may
or may not require back-tracking.
 The grammar associated with it (if not left factored) cannot avoid back-tracking.
A form of recursive-descent parsing that does not require any back-tracking is
known as predictive parsing.
 This parsing technique is regarded as recursive because it uses a context-free grammar,
which is recursive in nature.

Back-tracking
 Top-down parsers start from the root node (start symbol) and match the input
string against the production rules, replacing non-terminals as they match. If a chosen
production fails to match, the parser backs up and tries another alternative.
Predictive Parser
 A predictive parser is a recursive descent parser that can predict
which production is to be used to replace the input string.
 The predictive parser does not suffer from backtracking.
 To accomplish its tasks, the predictive parser uses a look-ahead pointer, which
points to the next input symbol.
 To make the parser back-tracking free, the predictive parser puts some constraints
on the grammar and accepts only a class of grammars known as LL(k) grammars.
 Predictive parsing uses a stack and a parsing table to parse the input and generate a
parse tree.
 Both the stack and the input contain an end symbol $ to denote that the stack is
empty and the input is consumed.
 The parser refers to the parsing table to take any decision on the input and stack
element combination.
 In recursive descent parsing, the parser may have more than one production to
choose from for a single instance of input, whereas in a predictive parser, each step
has at most one production to choose. There might be instances where no
production matches the input string, causing the parsing procedure to fail.
LL Parser
 An LL parser accepts LL grammars. LL grammars are a subset of context-free
grammars, with some restrictions imposed to obtain a simplified form that
allows an easy implementation. An LL grammar can be implemented by either of two
methods: recursive-descent or table-driven.
 An LL parser is denoted LL(k). The first L in LL(k) means the input is parsed from left to
right, the second L in LL(k) stands for left-most derivation, and k represents
the number of lookahead symbols. Generally k = 1, so LL(k) is commonly written LL(1).
 Grammar
 E → T E’
 E’ → +T E’ | ε
 T → F T’
 T’ → * F T’ | ε
 F → ( E) | id
Top down parser for id+id*id
 Advantage
◦ Parser can be constructed easily by hand using top down methods

 Methods for performing top down parsing

◦ 1. Brute Force Method : presented informally and formally, and
accompanied by a parsing algorithm
◦ 2. Recursive Descent Parsing : does not allow backup
◦ 3. Predictive and non-predictive parsing techniques : allow top down
parsing with limited or no backup

Ex : Consider the grammar :

S → cAd
A → aB
B → b | ε

Given the input string cad, derive it from the grammar :
S → cAd → caBd → cad    (using A → aB, then B → ε)

Thus the string cad is accepted by the language, as it is derivable from the
start symbol of the grammar.
 Top Down Construction of Parse Tree
 Start with the root labelled with the starting non terminal and then perform the
following steps
 1. At a node n labelled with non terminal A, select one of the productions for A
and construct children of n for the symbols on the right side of the production
 2. Find the next node at which a subtree is to be constructed. The above steps can be
implemented during a single left to right scan of the input string.
The current token being scanned in the input is referred to as the lookahead symbol.
[Parse tree figure : successive steps in the top down construction of a parse tree for
the string array [ num dot dot num ] of integer, using the productions
type → array [ simple ] of type and simple → num dot dot num, with the final
type deriving integer as the lookahead symbol advances.]
 Difficulties in Top Down Parsing
 1. Left Recursion
◦ Grammar G is left recursive if it has a non terminal A such that there is a derivation A => Aα for
some α
◦ Left recursion causes a top down parser to go into an infinite loop
◦ Left recursion must be eliminated from the grammar before parsing with a top down parser

 2. Backtracking
◦ Caused by choosing a wrong alternative
◦ If erroneous expansions are made and a mismatch is subsequently discovered, we have to undo
all these erroneous expansions
◦ Involves exponential time complexity with respect to the length of the input
◦ To overcome this, consider top down parsers that do not backtrack,
e.g. predictive parsers

 3. Order of Alternatives
◦ The order in which alternatives are tried affects the language accepted
◦ S → cAd
◦ A → ab | a
◦ String is cabd
If the alternative A → a is tried first, the parser matches ca and then fails to match b with d ;
without backtracking the grammar fails to accept the string cabd even though it is in the language.
 4. Reporting of failures
◦ When a failure is reported, there is very little indication of where the error actually occurred.
◦ A top down parser with backtracking returns failure no matter what the error is

 5. Cannot parse ambiguous grammars

 Top down parsing with full backup is the brute force method of parsing
 Operations are :
◦ 1. Given a particular non terminal that is to be expanded, the first production for this
non terminal is applied.

◦ 2. Within the newly expanded string, the next (leftmost) nonterminal is selected for expansion
and its first production is applied

◦ 3. Step 2 is repeated for all subsequent non terminals that are selected, until
step 2 cannot be continued

◦ 4. Termination can be a) no non terminal is present, i.e. the string was successfully parsed, or
b) an incorrect expansion was made

◦ 5. If no other productions are available to replace the production that caused the error, the
error-causing expansion is replaced by the non terminal itself, and the process backs up again to
undo the next most recently applied production.
 ex : Consider the grammar
S → aAd | aB
A → b | c
B → ccd | ddc
String :- accd ; S is the start symbol

1. Start with S.
2. Select the first production, S → aAd. The symbol a matches the first symbol of the string
to be parsed.
3. Choose the first production of A, giving the sentential form abd. A mismatch occurs between
the second symbol c of the input string and the second symbol b of the sentential form abd.
4. Back up and apply the next production of A, A → c, giving the sentential form acd. The leftmost
two characters are now matched.
5. The third symbol c of the input is compared with the last symbol d of the current
sentential form. A mismatch occurs.
6. Back up ; A has no more alternatives, so the next production of S is selected, S → aB.
Choosing the first production of B, B → ccd, gives accd. The string is accepted as it matches.
Brute force parsing is rarely used to parse programming language constructs.
 Recursive descent parsing is a top down method of syntax analysis in which a set of recursive
procedures is executed to process the input
 Requires no backtracking (backtracking is rarely needed)
 A procedure is associated with each non terminal of the grammar (predictive
parser)
 Eliminating left recursion and left factoring the grammar results in a
grammar that can be parsed by a recursive descent parser that needs no
backtracking, i.e. a predictive parser
 The lookahead symbol unambiguously determines the procedure selected for each
non terminal
 A parser that uses a set of recursive procedures to recognize its input with no
backtracking is called a recursive descent parser
Grammar to recognize arithmetic expressions :
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Procedure E( )
begin
    T( );
    EPRIME( );
end;

Procedure EPRIME( )
if input-symbol = '+' then
begin
    ADVANCE( );
    T( );
    EPRIME( );
end;

Procedure T( )
begin
    F( );
    TPRIME( );
end;

Procedure TPRIME( )
if input-symbol = '*' then
begin
    ADVANCE( );
    F( );
    TPRIME( );
end;

Procedure F( )
if input-symbol = 'id' then
    ADVANCE( );
else if input-symbol = '(' then
begin
    ADVANCE( );
    E( );
    if input-symbol = ')' then
        ADVANCE( );
    else ERROR( );
end;
else ERROR( );
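The procedures translate almost line for line into a runnable program. A sketch under stated assumptions (Python as the host language, a token list as input, and class/function names of my choosing; ADVANCE( ) becomes moving an index over the tokens):

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]   # end marker
        self.pos = 0

    def look(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def E(self):                       # E -> TE'
        self.T()
        self.Eprime()

    def Eprime(self):                  # E' -> +TE' | ε (ε: fall through)
        if self.look() == "+":
            self.advance()
            self.T()
            self.Eprime()

    def T(self):                       # T -> FT'
        self.F()
        self.Tprime()

    def Tprime(self):                  # T' -> *FT' | ε
        if self.look() == "*":
            self.advance()
            self.F()
            self.Tprime()

    def F(self):                       # F -> (E) | id
        if self.look() == "id":
            self.advance()
        elif self.look() == "(":
            self.advance()
            self.E()
            if self.look() == ")":
                self.advance()
            else:
                raise ParseError("expected )")
        else:
            raise ParseError("expected id or (")

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.look() == "$"         # whole input must be consumed
    except ParseError:
        return False

print(accepts(["id", "+", "id", "*", "id"]))
```

Because the grammar is left-recursion free and left factored, each procedure commits on a single lookahead symbol and no backtracking ever occurs.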
input : ID + (ID + ID)

Build the parse tree, starting from the start symbol, with the implied grammar
input → expression $
expression → term rest_expression
rest_expression → + expression | ε
term → ID | ( expression )

1. Invoke input( ) : expand input → expression $
2. Invoke expression( ) : expand expression → term rest_expression
3. Invoke term( ) : select term → ID (matching the input token ID)
4. Invoke rest_expression( ) : select rest_expression → + expression, match +,
and parse (ID + ID) by invoking expression( ) recursively
Left Factoring

Ex : Left factoring the grammar

S → iEtS | iEtSeS | a
E → b
Here α = iEtS, β1 = ε and β2 = eS
(Rule : A → αβ1 | αβ2 becomes A → αA' and A' → β1 | β2)
The equivalent left factored grammar :

S → iEtSS' | a
S' → eS | ε
E → b
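One left-factoring step can be sketched in code. A sketch only (the helper names are mine, and only the simple common-prefix case needed for this example is handled; the empty tail is spelled "ε"):

```python
def common_prefix(a, b):
    # longest shared leading run of symbols between two right-hand sides
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return a[:i]

def left_factor(nt, prods):
    # find the longest prefix shared by at least two alternatives of nt
    best = []
    for i in range(len(prods)):
        for j in range(i + 1, len(prods)):
            p = common_prefix(prods[i], prods[j])
            if len(p) > len(best):
                best = p
    if not best:
        return {nt: prods}               # nothing to factor
    nt2 = nt + "'"                       # fresh nonterminal A'
    factored, rest = [], []
    for p in prods:
        if p[:len(best)] == best:
            tail = p[len(best):]
            factored.append(tail if tail else ["ε"])   # A' -> β1 | β2
        else:
            rest.append(p)
    return {nt: rest + [best + [nt2]], nt2: factored}  # A -> αA'

g = left_factor("S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]])
print(g)
```

On the dangling-else grammar above this yields S → a | iEtSS' and S' → ε | eS, matching the left factored grammar in the notes (alternative order aside).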
Left Recursion Example
Rule
A-> Aα | β
Then
A -> βA’
A’ -> αA’ | ε
 Preprocessing steps are required for predictive parsing
 Predictive parsing is associated with two functions defined on a grammar
G
 FIRST and FOLLOW :- used by both top down and bottom up parsers
 In top down parsing, FIRST and FOLLOW allow us to choose which
production to apply based on the next input symbol
 To compute the FIRST set
 If α is any string of grammar symbols, then FIRST(α) is the set of terminals that
begin strings derived from α
 If α → ε then ε is also in FIRST(α)
 To compute FIRST(X) for all grammar symbols X, apply the following rules until
no more terminals or ε can be added to any FIRST set

◦ 1. If X is a terminal then FIRST(X) is {X}
◦ 2. If X is a non terminal and X → aα is a production then add a to FIRST(X)
◦ 3. If X → ε is a production then add ε to FIRST(X)
◦ 4. If X → Y1Y2……Yk is a production, then for all i such that Y1Y2……Yi-1 are all
non terminals and FIRST(Yj) contains ε for j = 1,2,……i-1 (i.e. Y1Y2……Yi-1 ⇒* ε),
add every non-ε symbol in FIRST(Yi) to FIRST(X). If ε is in FIRST(Yj) for all
j = 1,2,……k, add ε to FIRST(X).
 To compute the FOLLOW set
 For a non terminal A, FOLLOW(A) is the set of terminals a that can appear
immediately to the right of A in some sentential form, i.e. S ⇒* αAaβ for some α and β
 To compute FOLLOW(A) for all non terminals A, apply the following rules

◦ 1. $ is in FOLLOW(S), where S is the start symbol

◦ 2. If there is a production A → αBβ then everything in FIRST(β) except ε is in
FOLLOW(B)

◦ 3. If there is a production A → αB, or a production A → αBβ where FIRST(β)
contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
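The rule sets above can be run as a fixed-point iteration: keep applying the rules until no set changes. A sketch (not from the notes; the grammar encoding, the "ε" spelling, and all function names are assumptions of this sketch):

```python
EPS = "ε"

def first_of_string(symbols, FIRST):
    # FIRST of a string Y1 Y2 ... Yk (rule 4); terminals act as FIRST(X) = {X}
    result = set()
    for X in symbols:
        f = FIRST.get(X, {X})
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                      # every Yi can derive ε
    return result

def compute_first(grammar):
    FIRST = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for p in prods:
                body = [] if p == [EPS] else p
                f = first_of_string(body, FIRST)
                if not f <= FIRST[nt]:
                    FIRST[nt] |= f
                    changed = True
    return FIRST

def compute_follow(grammar, start, FIRST):
    FOLLOW = {nt: set() for nt in grammar}
    FOLLOW[start].add("$")               # rule 1
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for p in prods:
                body = [] if p == [EPS] else p
                for i, X in enumerate(body):
                    if X not in grammar:
                        continue         # X is a terminal
                    f = first_of_string(body[i + 1:], FIRST)
                    add = f - {EPS}                          # rule 2
                    if EPS in f:
                        add |= FOLLOW[nt]                    # rule 3
                    if not add <= FOLLOW[X]:
                        FOLLOW[X] |= add
                        changed = True
    return FOLLOW

g = {"E":  [["T", "E'"]],
     "E'": [["+", "T", "E'"], [EPS]],
     "T":  [["F", "T'"]],
     "T'": [["*", "F", "T'"], [EPS]],
     "F":  [["(", "E", ")"], ["id"]]}
FIRST = compute_first(g)
FOLLOW = compute_follow(g, "E", FIRST)
print(FIRST)
print(FOLLOW)
```

The output matches the FIRST and FOLLOW sets computed by hand in the worked example that follows.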
 Example :
Grammar :                      terminals = { +, *, (, ), id, ε }
E → TE'                        non-terminals = { E, T, F, E', T' }
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

To compute FIRST
1. FIRST(E)
As E is a non terminal symbol, the 4th rule has to be applied. The production for E is E → TE'

Rule : If X → Y1Y2……Yk is a production, then for all i such that Y1Y2……Yi-1 are all
non terminals and FIRST(Yj) contains ε for j = 1,2,……i-1, add every non-ε symbol
in FIRST(Yi) to FIRST(X)

FIRST(E) = FIRST(T)
Again T is a non terminal, so the production for T, T → FT', is considered
FIRST(T) = FIRST(F)
Again F is a non terminal ; consider F → (E)
FIRST(F) = FIRST( ( )
Here '(' is a terminal symbol. FIRST applied to a terminal is the terminal itself

RULE : If X is a terminal then FIRST(X) is {X}

FIRST( ( ) = { ( }
The production for F has another alternative, F → id
So FIRST(F) also contains FIRST(id) = { id }

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }


As T does not derive ε, this is the resultant set.

2. FIRST(E')
E' is a non terminal symbol. Consider the production E' → +TE' | ε
For the first alternative, FIRST(E') = FIRST(+) = { + }

But it has another alternative, E' → ε

RULE : If X → ε is a production then add ε to FIRST(X)

So ε is added, and

FIRST(E') = { +, ε }

3. FIRST(T')
T' is a non terminal symbol. Consider the production T' → *FT' | ε
For the first alternative, FIRST(T') = FIRST(*) = { * }
For the alternative production T' → ε, ε is added as well

FIRST SET
FIRST(E) = { (, id }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(T') = { *, ε }
FIRST(F) = { (, id }
To compute FOLLOW
1. FOLLOW(E)
Check for the presence of the non terminal symbol E on the right side of a production.
It is present in
F → (E)

RULE : If there is a production A → αBβ then everything in FIRST(β) except ε is
in FOLLOW(B)

Here A = F, α = ( , B = E, β = ) ; applying FIRST(β), i.e. FIRST( ) ) = { ) }

** As E is the start symbol we also add $

RULE : $ is in FOLLOW(S) where S is the start symbol

FOLLOW(E) = { ), $ }
2. FOLLOW(E')
Search for E' on the right side of the productions.
E → TE' is of the form A → αB, where A = E, α = T and B = E'

RULE : If there is a production A → αB, or a production A → αBβ where FIRST(β)
contains ε, then everything in FOLLOW(A) is in FOLLOW(B)

As no symbol is present after E', nothing else follows E'.

Applying FOLLOW(A), i.e. FOLLOW(E) = { ), $ }

FOLLOW(E') = FOLLOW(E) = { ), $ }
3. FOLLOW(T)
Search for T on the right side of the productions.
E → TE' is of the form A → αBβ, where A = E, α = ε, B = T, β = E' ; applying FIRST(β),
i.e. FIRST(E') = { +, ε }

RULE : If there is a production A → αBβ then everything in FIRST(β) except ε is
in FOLLOW(B)

So FOLLOW(T) contains { + } (ε is never placed in a FOLLOW set)
** As FIRST(E') contains ε, i.e. E' → ε, also add FOLLOW(E) to the set

RULE : If there is a production A → αB, or a production A → αBβ where FIRST(β)
contains ε, then everything in FOLLOW(A) is in FOLLOW(B)

FOLLOW(T) = { +, ), $ }
4. FOLLOW(T')
Search for T' on the right side of the productions.
T → FT' is of the form A → αB, where A = T, α = F and B = T'

RULE : If there is a production A → αB, or a production A → αBβ where FIRST(β)
contains ε, then everything in FOLLOW(A) is in FOLLOW(B)

FOLLOW(T) = { +, ), $ }
So
FOLLOW(T') = FOLLOW(T) = { +, ), $ }
5. FOLLOW(F)
Search for F on the right side of the productions.
T → FT' is of the form A → αBβ, where A = T, α = ε, B = F, β = T' ; applying FIRST(β),
i.e. FIRST(T') = { *, ε }

RULE : If there is a production A → αBβ then everything in FIRST(β) except ε is
in FOLLOW(B)

So FOLLOW(F) contains { * } (ε is never placed in a FOLLOW set)
** As FIRST(T') contains ε, i.e. T' → ε, also add FOLLOW(T) to the set
FOLLOW(F) = { +, *, ), $ }

FOLLOW SET
Follow(E) = { ), $ }
Follow(E') = { ), $ }
Follow(T) = { +, ), $ }
Follow(T') = { +, ), $ }
Follow(F) = { *, +, ), $ }
Consider the Grammar
E → E + T | T
T → T * F | F
F → (E) | id

STEP 1 : Removal of left recursion
E → TE'                          terminals = { +, *, (, ), id, ε }
E' → +TE' | ε                    non-terminals = { E, T, F, E', T' }
T → FT'
T' → *FT' | ε
F → (E) | id

Step 2 : Compute FIRST SET          Step 3 : Compute FOLLOW SET
FIRST(E) = { (, id }                Follow(E) = { ), $ }
FIRST(E') = { +, ε }                Follow(E') = { ), $ }
FIRST(T) = { (, id }                Follow(T) = { +, ), $ }
FIRST(T') = { *, ε }                Follow(T') = { +, ), $ }
FIRST(F) = { (, id }                Follow(F) = { *, +, ), $ }
 Example : Compute First and Follow for the following grammar
 LL(1)
 No multiple entries in the parsing table
◦ L :- stands for scanning the input from left to right
◦ L :- stands for producing a leftmost derivation
◦ 1 :- stands for using one symbol of lookahead at each step to make parsing action
decisions
 A context-free grammar whose Predict sets are always disjoint (for the
same non-terminal) is said to be LL(1). LL(1) grammars are ideally suited
for top-down parsing because it is always possible to correctly predict the
expansion of any non-terminal. No backup is ever needed.
 A grammar that is ambiguous or has left recursion is not an LL(1) grammar
 Grammar G is said to be LL(1) iff for every pair of productions A → α | β
the following conditions hold
◦ 1. For no terminal a do both α and β derive strings beginning with a
◦ 2. At most one of α and β can derive the empty string
◦ 3. If β ⇒* ε then α does not derive any string beginning with a terminal in
FOLLOW(A) ;
if α ⇒* ε then β does not derive any string beginning with a terminal in
FOLLOW(A).
When a parsing table has multiply-defined entries, the remedy is to eliminate all left
recursion and left factor the grammar

 An LL parser is called an LL(k) parser if it uses k tokens of lookahead when


parsing a sentence.
 A grammar is called an LL(k) grammar if an LL(k) parser can be constructed
from it.
 A formal language is called an LL(k) language if it has an LL(k) grammar.
 The set of LL(k) languages is properly contained in that of LL(k+1) languages,
for each k ≥ 0.
LL(1) Parsing Table Algorithm

Input : Grammar G
Output : Parsing table M
Method : For each production A → α of the grammar, do the following

1. For each terminal a in FIRST(α), add A → α to M[A, a]

2. If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α
to M[A, b]. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to
M[A, $] as well

3. Afterwards, if there is no production at all in M[A, a], set M[A, a] to
error
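The two rules translate directly into code. A sketch (not the notes' algorithm verbatim: the FIRST and FOLLOW sets are hard-coded from the values computed earlier to keep the block short, and the encoding and names are mine):

```python
EPS = "ε"
grammar = {"E":  [["T", "E'"]],
           "E'": [["+", "T", "E'"], [EPS]],
           "T":  [["F", "T'"]],
           "T'": [["*", "F", "T'"], [EPS]],
           "F":  [["(", "E", ")"], ["id"]]}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}

def first_of_body(body):
    # FIRST of a right-hand side; terminals act as FIRST(X) = {X}
    out = set()
    for X in body:
        f = FIRST.get(X, {X})
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def build_table():
    M = {}
    for A, prods in grammar.items():
        for p in prods:
            body = [] if p == [EPS] else p
            f = first_of_body(body)
            for a in f - {EPS}:
                M.setdefault((A, a), []).append(p)      # rule 1
            if EPS in f:
                for b in FOLLOW[A]:                     # rule 2 ($ included)
                    M.setdefault((A, b), []).append(p)
    return M                # missing keys play the role of error entries

M = build_table()
print(M)
```

For an LL(1) grammar every cell of M holds exactly one production; a cell with two or more entries (as in the dangling-else grammar below) is the signature of a grammar that is not LL(1).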
FIRST(S) = { a }
FIRST(B) = { b, ε }
FOLLOW (S ) = {$ }
FOLLOW( B) ={ a }
Grammar that is not LL(1)
Why the grammar is not LL(1) :
S → iEtSS' | a
S' → eS | ε
E → b
FIRST(S) = { i, a }    FIRST(S') = { e, ε }    FIRST(E) = { b }
FOLLOW(S) = FOLLOW(S') = { e, $ }    FOLLOW(E) = { t }

PARSING TABLE M

Non terminal                       Input Symbol
               a        b        e                   i             t        $
S              S → a                                 S → iEtSS'
S'                               S' → eS , S' → ε                           S' → ε
E                       E → b

Two production rules in M[S', e] :

 The entry M[S', e] contains both S' → eS and S' → ε.
 The grammar is ambiguous, and the ambiguity shows up as a choice of which production
to use when an e (else) is seen
 It is resolved by choosing S' → eS
 The grammar can still be parsed with a predictive parser by arbitrarily making
M[S', e] = { S' → eS }
 There is no universal rule by which multiply-defined entries can be made single valued
without affecting the language recognized by the parser.
 A predictive parser can predict which production is to be
used to replace the input string
 A predictive parser does not suffer from backtracking
 A predictive parser uses a lookahead pointer which points to the next input
symbol
 To make the parser back-tracking free, the predictive parser puts some
constraints on the grammar
 It accepts only a class of grammars known as LL(k) grammars
 The predictive parser with k = 1 is also known as an LL(1) parser
 A predictive parser has an input, a stack, a parsing table and an output

 The input contains the string to be parsed followed by $, the right end marker
 The stack contains a sequence of grammar symbols preceded by $, the bottom of stack marker
 Initially the stack contains the start symbol of the grammar preceded by $
 The parsing table is a 2-D array M[A, a] where A is a non terminal and a is a terminal or
the symbol $
Behavior of parser
➢ Controlled by a program. The program considers X, the symbol on top of the stack, and a,
the current input symbol
➢ 1. If X = a = $, the parser halts and announces successful completion of parsing
➢ 2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next
input symbol
➢ 3. If X is a non terminal, the program consults entry M[X, a] of parsing table M. This
entry will be either an X-production of the grammar or an error entry
➢ a) if M[X, a] = { X → UVW }, the parser replaces X on top of the stack by WVU (with U on top)
➢ b) if M[X, a] = error, the parser calls an error recovery routine

➢ Initially the parser configuration is

Stack === $ S
Input === w $

where S is the start symbol of the grammar and w is the string to be parsed


Algorithm to construct the Predictive Parsing table
Input : Grammar G
Output : Parsing table M
Method :
1. For each production A → α of the grammar, do steps 2 and 3

2. If FIRST(α) contains a terminal a, then add A → α to M[A, a]

3. If FIRST(α) contains ε, then add A → α to M[A, b] for each terminal b in
FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to
M[A, $]

4. Make each undefined entry of M be error


 E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E) | id
Applying the FIRST and FOLLOW functions :
First(E) = First(T) = First(F) = { (, id }
First(E') = { +, ε }
First(T') = { *, ε }
Follow(E) = Follow(E') = { ), $ }
Follow(T) = Follow(T') = { +, ), $ }
Follow(F) = { +, *, ), $ }

Non terminal                          Input Symbol
               id         +          *           (          )          $
E
E'
T
T'
F
 1. Consider the first production of the grammar
E → TE'
First(TE') = First(T) = { (, id }
First(T) does not contain ε, so First(E') is not considered.
As per rule 2 of the algorithm, since First(TE') = { (, id }, the production E → TE' is added to
M[E, ( ] and M[E, id] :

M[E, ( ] = E → TE'
M[E, id] = E → TE'

Non terminal                          Input Symbol
               id         +          *           (          )          $
E              E → TE'                           E → TE'
E'
T
T'
F
2. Consider the second production
E' → +TE' and E' → ε
For E' → +TE', apply First(+TE') = First(+) = { + } :
M[E', +] = E' → +TE'
For E' → ε, First is not applied ; instead Follow is applied to E' : Follow(E') = { ), $ }
M[E', ) ] = E' → ε
M[E', $ ] = E' → ε

Non terminal                          Input Symbol
               id         +          *           (          )          $
E              E → TE'                           E → TE'
E'                        E' → +TE'                         E' → ε     E' → ε
T
T'
F
3. Consider the third production
T → FT' ; apply First(FT') = First(F) = { (, id } :
M[T, ( ] = T → FT'
M[T, id] = T → FT'

Non terminal                          Input Symbol
               id         +          *           (          )          $
E              E → TE'                           E → TE'
E'                        E' → +TE'                         E' → ε     E' → ε
T              T → FT'                           T → FT'
T'
F
4. Consider the fourth production
T' → *FT' and T' → ε
For T' → *FT', apply First(*FT') = First(*) = { * } :
M[T', *] = T' → *FT'
For T' → ε, First is not applied ; instead Follow is applied to T' : Follow(T') = { +, ), $ }
M[T', + ] = T' → ε
M[T', ) ] = T' → ε
M[T', $ ] = T' → ε

Non terminal                          Input Symbol
               id         +          *           (          )          $
E              E → TE'                           E → TE'
E'                        E' → +TE'                         E' → ε     E' → ε
T              T → FT'                           T → FT'
T'                        T' → ε     T' → *FT'              T' → ε     T' → ε
F
5. Consider the last production
F → (E) and F → id
For F → (E), apply First( (E) ) = First( ( ) = { ( } :
M[F, ( ] = F → (E)
For F → id, apply First(id) = { id } :
M[F, id] = F → id
** Blanks in the parsing table indicate errors. Non blank entries indicate the production with
which to expand the top non terminal on the stack.

Non terminal                          Input Symbol
               id         +          *           (          )          $
E              E → TE'                           E → TE'
E'                        E' → +TE'                         E' → ε     E' → ε
T              T → FT'                           T → FT'
T'                        T' → ε     T' → *FT'              T' → ε     T' → ε
F              F → id                            F → (E)
 At the beginning the stack contains the start symbol with $ at the bottom. The input string is
parsed symbol by symbol. For input id + id * id $ :
1. Top element on stack = E
first input symbol = id
The parsing table is searched for M[E, id], which gives the production E → TE'.
E is replaced by TE', pushed onto the stack in reverse order (E' first, then T). The top
element of the stack is now the non terminal T.

2. Top element = T
input symbol = id is searched for a production
M[T, id] = T → FT'
Thus FT' is pushed, replacing T
Top element is F

3. Top element = F
input symbol = id
Thus M[F, id] = F → id
The terminal id is pushed onto the stack, replacing F
4. Top element = id
input symbol = id
They match and are popped, and the input pointer advances.

The remaining input string is + id * id $

Thus a production is searched for in the parsing table for each grammar symbol on the given
input symbol. When $ is the top element and the current input symbol is also $, the given
input string is accepted by the grammar.
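The stack moves traced above run as a short loop. A sketch (the table is the one constructed above, hard-coded here with [] standing for an ε body; the encoding and function names are mine):

```python
# Table M: (nonterminal, lookahead) -> right-hand side to push ([] means ε)
M = {("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
     ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
     ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
     ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
     ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"]}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    stack = ["$", "E"]                  # start symbol on top of $
    inp = tokens + ["$"]
    i = 0
    while stack:
        X, a = stack.pop(), inp[i]
        if X == a == "$":
            return True                 # successful completion
        if X not in NONTERMINALS:
            if X != a:
                return False            # terminal mismatch
            i += 1                      # pop X, advance the input pointer
            continue
        body = M.get((X, a))
        if body is None:
            return False                # error entry
        stack.extend(reversed(body))    # replace X -> UVW by pushing W V U
    return False

print(parse(["id", "+", "id", "*", "id"]))
```

Each iteration performs exactly one of the moves listed in the behavior description: match-and-advance, expand via M[X, a], accept on $/$, or fail.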
 An error is detected during predictive parsing when the terminal on top of the
stack does not match the next input symbol, or when nonterminal A is on top of the
stack, a is the next input symbol, and parsing table entry M[A, a] is empty.

 1. Panic-mode error recovery is based on the idea of skipping symbols on the input
until a token in a selected set of synchronizing tokens appears.
 Synchronizing sets are chosen in such a way that the parser recovers quickly from
errors.
 The panic mode recovery mechanism is simple and effective.

How to select the synchronizing set (methods) :

 A. Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If
we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is
likely that parsing can continue.
 B. We might add the keywords that begin statements to the synchronizing sets for the
nonterminals generating expressions.
 e.g. if semicolons terminate statements, then the keywords that begin the next statement
may not be in the FOLLOW set of the nonterminal generating expressions. A missing ; after
an assignment may then result in the keyword beginning the next statement being skipped.

 C. If FIRST(A) is added to the synchronizing set for A, it may be possible to resume parsing
according to A if a symbol in FIRST(A) appears in the input.

 D. If a nonterminal can generate the empty string, then the production deriving ε
can be used as a default. This may postpone some error detection, but cannot cause
an error to be missed. This approach reduces the number of nonterminals that have
to be considered during error recovery.

 E. If a terminal on top of the stack cannot be matched, a simple idea is to pop the
terminal and issue a message saying that the terminal was inserted.
"synch" entries indicate synchronizing tokens obtained from the FOLLOW set of the
nonterminal in question.
If the parser looks up entry M[A, a] and finds that it is blank, the input symbol a is skipped.
If the entry is synch, then the nonterminal on top of the stack is popped.
If a token on top of the stack does not match the input symbol, then we pop the token from
the stack.
These are the parsing and error recovery moves in a predictive parser.
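The three moves (blank entry = skip the input symbol, synch entry = pop the nonterminal, unmatched terminal = pop it and report an insertion) can be sketched on top of the same parsing table. A sketch only: the synch entries are derived from the FOLLOW sets as in method A, and the error-message wording and function name are mine:

```python
M = {("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
     ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
     ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
     ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
     ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"]}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}
NONTERMINALS = set(FOLLOW)

def parse_with_recovery(tokens):
    errors = []
    stack = ["$", "E"]
    inp = tokens + ["$"]
    i = 0
    while stack:
        X, a = stack[-1], inp[i]
        if X == a == "$":
            break                                   # done (with or without errors)
        if X not in NONTERMINALS:
            stack.pop()
            if X == a:
                i += 1                              # normal match
            else:
                errors.append(f"inserted missing '{X}'")   # pop unmatched terminal
            continue
        body = M.get((X, a))
        if body is not None:
            stack.pop()
            stack.extend(reversed(body))            # normal expansion
        elif a in FOLLOW[X] or a == "$":
            stack.pop()                             # synch entry: pop nonterminal
            errors.append(f"synch: popped {X}")
        else:
            i += 1                                  # blank entry: skip input token
            errors.append(f"skipped input '{a}'")
    return errors

print(parse_with_recovery(["id", "+", "*", "id"]))
```

A correct input yields an empty error list; an erroneous input such as id + * id produces one recovery message and parsing still reaches the end of the input, as panic mode promises.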
 2. Phrase level recovery

 Blank entries in the predictive parsing table are filled with pointers to error routines
 These error routines may change, insert or delete symbols on the input and issue
appropriate error messages
RECURSIVE PREDICTIVE DESCENT PARSER      |  NON-RECURSIVE PREDICTIVE DESCENT PARSER

May or may not require backtracking.     |  Does not require any kind of backtracking.

Uses a procedure for every non terminal  |  Finds the production to use from a parsing
to parse strings.                        |  table, driven by the input string.

A type of top-down parsing built from a  |  A type of top-down approach, also a kind of
set of mutually recursive procedures,    |  recursive parsing, that does not use the
where each procedure implements one of   |  technique of backtracking.
the non-terminals of the grammar.        |

Contains several small functions, one    |  Uses a lookahead pointer which points to the
for each non-terminal in the grammar.    |  next input symbol ; to stay backtracking
                                         |  free it puts some constraints on the grammar.

Accepts all kinds of grammars.           |  Accepts only the class of grammars known as
                                         |  LL(k) grammars.
TOP DOWN PARSING                         |  BOTTOM UP PARSING

A parsing strategy that first looks at   |  A parsing strategy that first looks at the
the highest level of the parse tree and  |  lowest level of the parse tree and works up
works down the parse tree by using the   |  the parse tree by using the rules of
rules of grammar.                        |  grammar.

Attempts to find the leftmost            |  Attempts to reduce the input string to the
derivation for an input string.          |  start symbol of the grammar.

We start parsing from the top (start     |  We start parsing from the bottom (leaf
symbol of the parse tree) down to the    |  nodes of the parse tree) up to the start
leaf nodes, in top-down manner.          |  symbol, in bottom-up manner.

Uses Left Most Derivation.               |  Uses Right Most Derivation (in reverse).

Its main decision is to select which     |  Its main decision is to select when to use a
production rule to use in order to       |  production rule to reduce the string to get
construct the string.                    |  the starting symbol.
