
Chapter 3

Syntax Analysis

Chapter – 3 : Syntax Analysis 1 Bahir Dar Institute of Technology


Contents (Session-1)
▪ Introduction
▪ Context-free grammar
▪ Derivation
▪ Parse Tree
▪ Ambiguity
• Resolving Ambiguity
▪ Immediate & Indirect Left Recursion
• Eliminating Immediate & Indirect Left Recursion

▪ Left Factoring
▪ Context-Free Grammars versus Regular Expressions
Introduction
▪ Syntax analysis is the second phase of the compiler.
▪ The parser takes the tokens produced by lexical analysis and
builds the syntax tree (parse tree).
▪ The syntax tree can be easily constructed from a Context-Free
Grammar.
▪ The parser reports syntax errors in an intelligible/understandable
fashion and recovers from commonly occurring errors to
continue processing the remainder of the program.
▪ The process of syntax analysis is performed using syntax
analyzer/parser.

The goal of the parser is to determine the
syntactic validity of a source string. If the string is valid,
a tree is built for use by the subsequent
phases of the compiler.
Role of the syntax analyzer/Parser:
▪ It verifies the structure generated by the tokens based on the
grammar
▪ Parser builds the parse tree.
▪ Parser Performs context free syntax analysis.
▪ Parser helps to construct intermediate code.
▪ Parser produces appropriate error messages.
▪ Parser attempts to recover from a few errors.
❖ Some Issues related to parser :
Parser cannot detect errors such as:
1. Variable re-declaration.
2. Use of a variable before initialization.
3. Data type mismatch for an operation.
The above issues are handled by the Semantic Analysis phase.


Some Issues related to parser:
▪ Syntax error handling :
Programs can contain errors at many different levels. For
example :
1. Lexical, such as misspelling a keyword.
2. Syntactic, such as an arithmetic expression with unbalanced
parentheses.
3. Semantic, such as an operator applied to an incompatible
operand.
4. Logical, such as an infinitely recursive call.
Functions of error handler :
1. It should report the presence of errors clearly and accurately.
2. It should recover from each error quickly enough to be able
to detect subsequent errors.
3. It should not significantly slow down the processing of
correct programs.


Error recovery strategies:
▪ The different strategies that a parser uses to recover from a
syntactic error are:
▪ Panic mode recovery: On discovering an error, the parser
discards input symbols one at a time until a synchronizing
token is found. The synchronizing tokens are usually
delimiters, such as semicolon or end. It has the advantage
of simplicity and does not go into an infinite loop. When
multiple errors in the same statement are rare, this method
is quite useful.

▪ Phrase level recovery: On discovering an error, the
parser performs local correction on the remaining input
that allows it to continue. Example: insert a missing
semicolon, or delete an extraneous semicolon, etc.


Error recovery strategies:
▪ Error productions: The parser is constructed using
augmented grammar with error productions. If an error
production is used by the parser, appropriate error
diagnostics can be generated to indicate the erroneous
constructs recognized in the input.

▪ Global correction: Given an incorrect input string x and
grammar G, certain algorithms can be used to find a parse
tree for a string y, such that the number of insertions,
deletions and changes of tokens required to transform x into y
is as small as possible. However, these methods are in general
too costly in terms of time and space.


Types of parsers for grammars:
▪ Universal parsers
• Universal parsing methods such as the Cocke-Younger-Kasami
(CYK) algorithm and Earley's algorithm can parse any grammar.
• These general methods are too inefficient to use in
production compilers, and are not commonly used.
▪ Top-down parsers
• Top-down methods build parse trees from the top (root) to
the bottom (leaves).
▪ Bottom-up parsers
• Bottom-up methods start from the leaves and work their way
up to the root.


Context Free Grammars(CFG)
▪ CFG is used to specify the structure of legal programs.
• The design of the grammar is an initial phase of the design
of a programming language.
▪ Formally a CFG G = (V, Ʃ,S,P), where:
• V = non-terminals, are variables that denote sets of (sub)strings
occurring in the language. These impose a structure on the
grammar.
• Ʃ is the set of terminal symbols in the grammar
(i.e., the set of tokens returned by the scanner)
• S is the start/goal symbol, a distinguished non-terminal in V
denoting the entire set of strings in L(G).
• P is a finite set of productions specifying how terminals and non-
terminals can be combined to form strings in the language.
Each production must have a single non-terminal on its left hand
side.
▪ The set V ∪ Ʃ is called the vocabulary of G
Context Free Grammars(CFG)…
▪ Example (G1):
E→ E+E | E–E | E*E | E/E | -E
E→ (E)
E → id
• Where
• Vt = {+, -, *, /, (, ), id}, Vn = {E}
• S = E
• Productions are shown above
• Sometimes → is written as ::=
▪ CFG is more expressive than RE - every language that can be
described by regular expressions can also be described by a CFG,
but not vice-versa.
• L = {aⁿbⁿ | n ≥ 1} is an example of a language that can be expressed
by a CFG but not by an RE.
▪ Context-free grammar is sufficient to describe most programming
languages.
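Grammar G1 above can also be written down directly as a data structure; a minimal Python sketch (the dict-of-productions representation and the names G1, START, NONTERMINALS, TERMINALS are conventions of this sketch, not from the slides):

```python
# Grammar G1: E -> E+E | E-E | E*E | E/E | -E | (E) | id
# Each production's right-hand side is a list of vocabulary symbols.
G1 = {
    'E': [['E', '+', 'E'], ['E', '-', 'E'], ['E', '*', 'E'],
          ['E', '/', 'E'], ['-', 'E'], ['(', 'E', ')'], ['id']],
}
START = 'E'
NONTERMINALS = set(G1)                        # V: the dict keys
TERMINALS = {s for prods in G1.values()       # Ʃ: every RHS symbol
             for p in prods for s in p} - NONTERMINALS
```

Every production has a single non-terminal on its left-hand side by construction: the left-hand side is simply the dict key.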



Derivation
▪ A sequence of replacements of non-terminal symbols to obtain
strings/sentences is called a derivation
• If we have a production E → E+E then we can replace E by
E+E
• In general, a derivation step is αAβ ⇒ αγβ if there is a
production rule A → γ in the grammar
• where α and β are arbitrary strings of terminal and non-terminal
symbols
▪ Derivation of a string should start from a production with the start
symbol on the left

▪ If S ⇒* α, then α is a sentential form (a mix of terminals and
non-terminals).
▪ α is a sentence if it contains only terminal symbols.
Derivation…
▪ The derivations are classified into two types based on the order of
replacement of production. They are:
❖ Leftmost derivation (LMD): if the leftmost non-terminal is
replaced by one of its productions at each derivation step, the
derivation is called a leftmost derivation.
❖ Rightmost derivation (RMD): if the rightmost non-terminal is
replaced by one of its productions at each derivation step, it is
called a rightmost derivation.
▪ E.g., given grammar G : E → E+E | E*E | ( E ) | - E | id
▪ LMD for - ( id + id ):
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
▪ RMD for - ( id + id ):
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)


Parse Tree
▪ Parsing is the process of analyzing a continuous stream
of input in order to determine its grammatical structure
with respect to a given formal grammar.
▪ A parse tree is a graphical representation of a derivation.
It is convenient to see how strings are derived from the
start symbol. The start symbol of the derivation becomes
the root of the parse tree.
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.



Parse Tree
▪ Example: the parse tree for -(id+id) is grown step by step
alongside the derivation (tree figures omitted):

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)


Ambiguity

▪ An ambiguous grammar is one that produces more than
one LMD or more than one RMD for the same sentence.
▪ Example: for the grammar E → E+E | E*E | id, the sentence
id+id*id has two leftmost derivations (and hence two parse
trees, omitted here):

E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id


Ambiguity

▪ For most parsers, the grammar must be unambiguous.
▪ If a grammar is unambiguous, then there is a unique
parse tree for each sentence.
• We should eliminate the ambiguity in the grammar
during the design phase of the compiler.
▪ An equivalent unambiguous grammar should be written to
eliminate the ambiguity.
• We have to prefer one of the parse trees of a sentence
(generated by an ambiguous grammar) and disambiguate
the grammar to restrict it to this choice.


Left Recursion
▪ A grammar is left recursive if it has a non-terminal A such that
there is a derivation
A ⇒+ Aα for some string α
▪ Top-down parsing techniques cannot handle left-recursive
grammars.
▪ So, we have to convert our left-recursive grammar into an
equivalent grammar which is not left-recursive.
▪ Two types of left-recursion:
• immediate left-recursion - appears in a single step of the
derivation (A ⇒ Aα)
• indirect left-recursion - appears in more than one step of the
derivation.
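The standard transformation for immediate left recursion (A → Aα | β becomes A → βA', A' → αA' | ε) can be sketched in Python as follows; the list-of-symbols representation and the primed-name convention are assumptions of this sketch:

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a1 | ... | b1 | ...  as
    A  -> b1 A' | ...
    A' -> a1 A' | ... | epsilon      (epsilon = empty list)."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    others = [p for p in productions if not p or p[0] != nt]
    if not recursive:                  # no immediate left recursion
        return {nt: productions}
    new_nt = nt + "'"                  # fresh primed non-terminal
    return {
        nt: [p + [new_nt] for p in others],
        new_nt: [p + [new_nt] for p in recursive] + [[]],
    }
```

For example, E → E+T | T becomes E → T E' and E' → + T E' | ε, matching the transformed expression grammar used later in the chapter.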



Context-Free Grammars versus Regular Expressions
▪ Every regular language is a context-free language, but not vice-
versa.
▪ Example: a grammar can be written for the regular expression
(a|b)*abb; both describe the same language, the set of strings of
a's and b's ending in abb. So we can describe such languages
either by finite automata or by PDAs.
▪ On the other hand, the language L = {aⁿbⁿ | n ≥ 1}, with an equal
number of a's and b's, is a prototypical example of a language that
can be described by a grammar but not by a regular expression.
▪ We can say that "finite automata cannot count", meaning that a
finite automaton cannot accept a language like {aⁿbⁿ | n ≥ 1} that
would require it to keep count of the number of a's before it sees
the b's.
▪ So these kinds of languages (Context-Free Grammars) are accepted
by PDA as PDA uses stack as its memory.
Context-Free Grammars versus Regular Expressions
▪ The general comparison of Regular Expressions vs. Context-Free
Grammars is summarized in a table (omitted here).


Parsing
▪ Top Down Parsing
▪ Recursive-Descent Parsing
▪ Predictive Parser
• Recursive Predictive Parsing
• Non-Recursive Predictive Parsing
▪ LL(1) Parser – Parser Actions
▪ Constructing LL(1) - Parsing Tables
▪ Computing FIRST and FOLLOW functions
▪ LL(1) Grammars
▪ Properties of LL(1) Grammars
Parsing methods
▪ Top-down parsing
• Recursive descent (involves backtracking)
• Predictive parsing (parsing without backtracking)
– Recursive predictive
– Non-recursive predictive, or LL(1)
▪ Bottom-up parsing (shift-reduce)
• Operator precedence
• LR parsing: SLR, CLR, LALR


Top Down Parsing
▪ Top-down parsing involves constructing a parse tree for the input
string, starting from the root
• Basically, top-down parsing can be viewed as finding a leftmost
derivation (LMD) for an input string.
▪ How it works? Start with the tree of one node labeled with the start
symbol and repeat the following steps until the fringe of the parse tree
matches the input string
• 1. At a node labeled A, select a production with A on its LHS and
for each symbol on its RHS, construct the appropriate child
• 2. When a terminal is added to the fringe that doesn't match the
input string, backtrack
• 3. Find the next node to be expanded
▪ NB: minimize the number of backtracks as much as possible



Top Down Parsing…
▪ Two types of top-down parsing
• Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does
not work, we backtrack to try other alternatives.)
• It is a general parsing technique, but not widely used
because it is not efficient
• Predictive Parsing
• no backtracking and hence efficient
• needs a special form of grammars (LL(1) grammars).
• Two types
– Recursive Predictive Parsing is a special form of
Recursive Descent Parsing without backtracking.
– Non-Recursive (Table Driven) Predictive Parser is also
known as LL(1) parser.



Recursive-Descent Parsing
▪ As the name indicates, recursive descent uses recursive functions
to implement predictive parsing.
▪ It tries to find the left-most derivation.
▪ Backtracking is needed
▪ Example
S → aBc
B → bc | b
▪ input: abc (tree figure omitted)


Recursive-Descent Parsing
▪ A left-recursive grammar can cause a recursive-descent parser
to go into an infinite loop.
▪ Hence, elimination of left-recursion must be done before
parsing.
▪ Consider the grammar for arithmetic expressions:
E → E+T | T
T → T*F | F
F → (E) | id
After eliminating the left-recursion the grammar becomes:

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id



Recursive-Descent Parsing

▪ Then write a recursive procedure for each non-terminal of the grammar
as follows (remember the grammar: E → TE', E' → +TE' | ε, T → FT',
T' → *FT' | ε, F → (E) | id):

Procedure E( )
begin
    T( ); EPRIME( );
end
Procedure EPRIME( )
begin
    If input_symbol = '+' then
        ADVANCE( ); T( ); EPRIME( );
end
Procedure T( )
begin
    F( ); TPRIME( );
end
Procedure TPRIME( )
begin
    If input_symbol = '*' then
        ADVANCE( ); F( ); TPRIME( );
end
Procedure F( )
begin
    If input_symbol = 'id' then
        ADVANCE( );
    else if input_symbol = '(' then
        ADVANCE( ); E( );
        if input_symbol = ')' then ADVANCE( );
        else ERROR( );
    else ERROR( );
end
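The procedures above translate almost line for line into a working program; a minimal executable sketch in Python (the token-list input, class name, and error handling are this sketch's own conventions):

```python
class RecursiveDescentParser:
    """Predictive recursive-descent parser for
    E -> T E', E' -> + T E' | eps, T -> F T',
    T' -> * F T' | eps, F -> ( E ) | id."""
    def __init__(self, tokens):
        self.tokens = list(tokens) + ['$']   # '$' marks end of input
        self.pos = 0
    def peek(self):
        return self.tokens[self.pos]
    def advance(self):
        self.pos += 1
    def E(self):                             # E -> T E'
        self.T(); self.Eprime()
    def Eprime(self):                        # E' -> + T E' | eps
        if self.peek() == '+':
            self.advance(); self.T(); self.Eprime()
    def T(self):                             # T -> F T'
        self.F(); self.Tprime()
    def Tprime(self):                        # T' -> * F T' | eps
        if self.peek() == '*':
            self.advance(); self.F(); self.Tprime()
    def F(self):                             # F -> ( E ) | id
        if self.peek() == 'id':
            self.advance()
        elif self.peek() == '(':
            self.advance(); self.E()
            if self.peek() != ')':
                raise SyntaxError("expected ')'")
            self.advance()
        else:
            raise SyntaxError('expected id or (')
    def parse(self):
        self.E()
        return self.peek() == '$'            # True if all input consumed
```

Because this grammar is LL(1), each procedure can decide which alternative to use by looking only at the current token, so no backtracking is needed.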
Predictive Parser
• Predictive parsers always build the syntax tree from the root down
to the leaves and are hence also called (deterministic) top-down
parsers
Predictive Parsing can be recursive or non-recursive
• In recursive predictive parsing, each non-terminal corresponds to
a procedure/function.



Recursive Predictive Parsing…
▪ When to apply ε-productions?
A → aA | bB | ε
• If all other productions fail, we should apply an
ε-production. For example, if the current token is not a or b,
we may apply the ε-production A → ε.
• Most correct choice: we should apply an ε-production for
a non-terminal A when the current token is in the FOLLOW
set of A (the terminals that can follow A in sentential
forms).




Non-Recursive Predictive Parsing
▪ A non-recursive predictive parser can be built by
maintaining a stack explicitly, rather than implicitly via
recursive calls
▪ Non-Recursive predictive parsing is a table-driven top-down
parser.

Model of a table-driven predictive parser



Non-Recursive Predictive Parsing…
▪ Input buffer
• our string to be parsed. We will assume that its end is
marked with a special symbol $.
▪ Output
• a production rule representing a step of the derivation
sequence (left-most derivation) of the string in the input
buffer.
▪ Stack
• contains the grammar symbols
• at the bottom of the stack, there is a special end marker
symbol $.
• initially the stack contains only the symbol $ and the
starting symbol S.
• when the stack is emptied (i.e. only $ left in the stack), the
parsing is completed.
Non-Recursive Predictive Parsing…
▪ Parsing table
• a two-dimensional array M[A,a]
• each row is a non-terminal symbol
• each column is a terminal symbol or the special
symbol $
• each entry holds a production rule.



Non-Recursive Predictive /LL(1) Parser – Parser Actions
▪ The symbol at the top of the stack (say X) and the current symbol
in the input string (say a) determine the parser action.
▪ There are four possible parser actions.
1. If X and a are both $ ➔ the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $)
➔ the parser pops X from the stack and advances to the next symbol
in the input buffer.
3. If X is a non-terminal
➔ the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a
production rule X → Y1Y2...Yk, it pops X from the stack and pushes
Yk, Yk-1, ..., Y1 onto the stack. The parser also outputs the production
rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above ➔ error
• all empty entries in the parsing table are errors.
• if X is a terminal symbol different from a, this is also an
error case.


Non-Recursive Predictive /LL(1) Parser – Parser Actions
General algorithm of LL (1) parser:
METHOD: Initially, the parser is in a configuration with w$ in the
input buffer and the start symbol S of G on top of the stack, above $.
The following procedure uses the predictive parsing table M to
produce a predictive parse for the input.
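A compact Python sketch of this predictive-parsing procedure (the table maps (non-terminal, lookahead) pairs to right-hand sides, with ε encoded as the empty list; these representation conventions are the sketch's own):

```python
def ll1_parse(table, start, tokens):
    """Table-driven predictive parse; returns the list of productions
    used (the leftmost derivation) or raises SyntaxError."""
    stack = ['$', start]                 # $ below the start symbol
    toks = list(tokens) + ['$']
    i, output = 0, []
    while stack[-1] != '$':
        X, a = stack[-1], toks[i]
        if (X, a) in table:              # X is a non-terminal: expand it
            stack.pop()
            rhs = table[(X, a)]
            output.append((X, rhs))
            stack.extend(reversed(rhs))  # push RHS in reverse order
        elif X == a:                     # matching terminal: consume it
            stack.pop()
            i += 1
        else:                            # empty entry or mismatch
            raise SyntaxError(f'unexpected {a!r}')
    if toks[i] != '$':
        raise SyntaxError('input left over')
    return output

# Grammar S -> aBa, B -> bB | eps (Example 1 on the next slide):
table = {('S', 'a'): ['a', 'B', 'a'],
         ('B', 'b'): ['b', 'B'],
         ('B', 'a'): []}
steps = ll1_parse(table, 'S', list('abba'))
```

Running it on the string abba reproduces the four derivation steps shown in Example 1.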



LL(1) Parser – Example1
Grammar: S → aBa
         B → bB | ε

LL(1) parsing table (we will see how to construct it next):

       a          b         $
S      S → aBa
B      B → ε      B → bB

Parse of the input string abba:

stack    input    output
$S       abba$    S → aBa
$aBa     abba$
$aB      bba$     B → bB
$aBb     bba$
$aB      ba$      B → bB
$aBb     ba$
$aB      a$       B → ε
$a       a$
$        $        accept, successful completion


LL(1) Parser – Example2
E → TE'
E' → +TE' | ε        (E is the start symbol)
T → FT'
T' → *FT' | ε
F → (E) | id

       id         +            *            (          )          $
E      E → TE'                              E → TE'
E'                E' → +TE'                            E' → ε     E' → ε
T      T → FT'                              T → FT'
T'                T' → ε       T' → *FT'               T' → ε     T' → ε
F      F → id                               F → (E)
LL(1) Parser – Example2…
Parse of the input id+id (notice: the parsing is done by looking up
the parse table):

stack      input     output
$E         id+id$    E → TE'
$E'T       id+id$    T → FT'
$E'T'F     id+id$    F → id
$E'T'id    id+id$
$E'T'      +id$      T' → ε
$E'        +id$      E' → +TE'
$E'T+      +id$
$E'T       id$       T → FT'
$E'T'F     id$       F → id
$E'T'id    id$
$E'T'      $         T' → ε
$E'        $         E' → ε
$          $         accept


non-recursive predictive /LL(1) Parsing steps
▪ Steps to be involved in non-recursive predictive /LL(1)
Parsing Method:
1. Stack is pushed with $.
2. Construction of parsing table T.
1. Computation of FIRST set.
2. Computation of FOLLOW set.
3. Making entries into the parsing table.
3. Parsing by the help of parsing routine.

Step 1: push $ onto the stack, ready to start the parsing process.


Step 2: Constructing LL(1) Parsing Tables
▪ Two functions are used in the construction of LL(1) parsing tables:
FIRST and FOLLOW.
▪ These can provide (with their sets) the actual position of any
terminal in the derivation.

▪ FIRST(α) is the set of terminal symbols which occur as first
symbols in strings derived from α, where α is any string of
grammar symbols.
• if α derives ε, then ε is also in FIRST(α).
▪ FOLLOW(A) is the set of terminals which occur immediately
after the non-terminal A in strings derived from the start
symbol.
• a terminal a is in FOLLOW(A) if S ⇒* αAaβ


Compute FIRST for a String X

1. If X is a terminal symbol, then FIRST(X) = {X}.
2. If X is ε, then FIRST(X) = {ε}.
3. If X is a non-terminal symbol and X → ε is a
production rule, then add ε to FIRST(X).
4. If X is a non-terminal symbol and X → Y1Y2...Yn is a
production rule, then:
• if a terminal a is in FIRST(Yi) and ε is in all
FIRST(Yj) for j = 1,...,i-1, then a is in FIRST(X).
• if ε is in all FIRST(Yj) for j = 1,...,n, then ε is in
FIRST(X).


Compute FIRST for a String X…
Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

From Rule 1:       FIRST(id) = {id}
From Rule 2:       FIRST(ε) = {ε}
From Rules 3 & 4:  FIRST(F) = {(, id}
                   FIRST(T') = {*, ε}
                   FIRST(T) = {(, id}
                   FIRST(E') = {+, ε}
                   FIRST(E) = {(, id}
Others:            FIRST(TE') = {(, id}
                   FIRST(+TE') = {+}
                   FIRST(FT') = {(, id}
                   FIRST(*FT') = {*}
                   FIRST((E)) = {(}
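These rules translate into a fixed-point computation; a minimal Python sketch (the grammar is encoded as a dict of productions with ε as the empty list, a convention of this sketch):

```python
EPS = 'ε'

def first_of_seq(seq, first, nonterminals):
    """FIRST of a string of grammar symbols (rule 4 over a sequence)."""
    out = set()
    for X in seq:
        fx = first[X] if X in nonterminals else {X}
        out |= fx - {EPS}
        if EPS not in fx:
            return out        # X cannot vanish: later symbols don't matter
    out.add(EPS)              # every symbol can derive eps (covers seq == [])
    return out

def compute_first(grammar):
    """Fixed-point computation of FIRST for every non-terminal."""
    nts = set(grammar)
    first = {A: set() for A in nts}
    changed = True
    while changed:            # repeat until no FIRST set grows
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:               # rhs == [] encodes A -> eps
                add = first_of_seq(rhs, first, nts)
                if not add <= first[A]:
                    first[A] |= add
                    changed = True
    return first

# The transformed expression grammar from the slides:
G = {'E': [['T', "E'"]],
     "E'": [['+', 'T', "E'"], []],
     'T': [['F', "T'"]],
     "T'": [['*', 'F', "T'"], []],
     'F': [['(', 'E', ')'], ['id']]}
```

Running `compute_first(G)` reproduces the FIRST sets listed above.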


Compute FOLLOW (for non-terminals)

1. $ is in FOLLOW(S), if S is the start symbol.
2. Look at each occurrence of a non-terminal on the RHS of a
production that is followed by something:
• if A → αBβ is a production rule, then everything in FIRST(β)
except ε is in FOLLOW(B).
3. Look at each B on the RHS that is not followed by anything:
• if A → αB is a production rule, or (A → αBβ is a
production rule and ε is in FIRST(β)), then everything in
FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any
FOLLOW set.
Compute FOLLOW (for non-terminals)
Example
i. E → TE'
ii. E' → +TE' | ε
iii. T → FT'
iv. T' → *FT' | ε
v. F → (E) | id

FOLLOW(E) = { $, ) }, because
• from Rule 1, FOLLOW(E) contains $
• from Rule 2, ) is in FOLLOW(E), from the production F → (E)
FOLLOW(E') = { $, ) } …. Rule 3
FOLLOW(T) = { +, ), $ }
• from Rule 2, + is in FOLLOW(T), since + is in FIRST(E')
• from Rule 3, everything in FOLLOW(E) is in FOLLOW(T), since
FIRST(E') contains ε
FOLLOW(F) = { +, *, ), $ } …same reasoning as above
FOLLOW(T') = { +, ), $ } …. Rule 3
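The FOLLOW rules can be coded the same way; a minimal Python sketch (it takes precomputed FIRST sets as input; the dict-of-productions representation with ε as the empty list is this sketch's own convention):

```python
EPS = 'ε'

def compute_follow(grammar, start, first):
    """FOLLOW sets by fixed-point iteration; `first` maps each
    non-terminal to its FIRST set (terminals are their own FIRST)."""
    nts = set(grammar)

    def first_of(seq):                    # FIRST of a symbol string
        out = set()
        for X in seq:
            fx = first[X] if X in nts else {X}
            out |= fx - {EPS}
            if EPS not in fx:
                return out
        out.add(EPS)
        return out

    follow = {A: set() for A in nts}
    follow[start].add('$')                # rule 1
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                for i, B in enumerate(rhs):
                    if B not in nts:
                        continue
                    fb = first_of(rhs[i + 1:])
                    add = fb - {EPS}      # rule 2
                    if EPS in fb:         # rule 3 (beta absent or nullable)
                        add |= follow[A]
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow

# Expression grammar and its FIRST sets (from the previous slides):
G = {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []],
     'T': [['F', "T'"]], "T'": [['*', 'F', "T'"], []],
     'F': [['(', 'E', ')'], ['id']]}
FIRST = {'E': {'(', 'id'}, "E'": {'+', EPS},
         'T': {'(', 'id'}, "T'": {'*', EPS}, 'F': {'(', 'id'}}
FOLLOW = compute_follow(G, 'E', FIRST)
```

The result matches the FOLLOW sets worked out by hand above.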
Constructing LL(1) Parsing Table -- Algorithm
▪ For each production rule A → α of a grammar G:
1. for each terminal a in FIRST(α)
➔ add A → α to M[A,a]
2. if ε is in FIRST(α)
➔ for each terminal a in FOLLOW(A), add A → α to M[A,a]
3. if ε is in FIRST(α) and $ is in FOLLOW(A)
➔ add A → α to M[A,$]
▪ All other undefined entries of the parsing table are error entries.
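A minimal Python sketch of this table-construction algorithm (rules 2 and 3 collapse into one case here because $ is already a member of the FOLLOW sets; the representation conventions are this sketch's own):

```python
EPS = 'ε'

def build_ll1_table(grammar, first, follow):
    """M[(A, a)] = rhs; raises ValueError on a multiply-defined
    entry, i.e. when the grammar is not LL(1)."""
    nts = set(grammar)

    def first_of(seq):
        out = set()
        for X in seq:
            fx = first[X] if X in nts else {X}
            out |= fx - {EPS}
            if EPS not in fx:
                return out
        out.add(EPS)
        return out

    table = {}
    for A, prods in grammar.items():
        for rhs in prods:
            f = first_of(rhs)
            lookaheads = f - {EPS}            # rule 1
            if EPS in f:                      # rules 2 and 3
                lookaheads |= follow[A]       # $ is in follow[A] already
            for a in lookaheads:
                if (A, a) in table:
                    raise ValueError(f'not LL(1): conflict at M[{A},{a}]')
                table[(A, a)] = rhs
    return table

G = {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []],
     'T': [['F', "T'"]], "T'": [['*', 'F', "T'"], []],
     'F': [['(', 'E', ')'], ['id']]}
FIRST = {'E': {'(', 'id'}, "E'": {'+', EPS},
         'T': {'(', 'id'}, "T'": {'*', EPS}, 'F': {'(', 'id'}}
FOLLOW = {'E': {'$', ')'}, "E'": {'$', ')'},
          'T': {'+', '$', ')'}, "T'": {'+', '$', ')'},
          'F': {'+', '*', '$', ')'}}
M = build_ll1_table(G, FIRST, FOLLOW)
```

The resulting table matches the one shown for Example 2 earlier, and the conflict check anticipates the non-LL(1) case discussed below.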


Constructing LL(1) Parsing Table -- Example
The grammar:
E → E+T | T
T → T*F | F
F → (E) | id

Step 1: remove left recursion
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Step 2: compute FIRST and FOLLOW
FIRST(E) = { (, id }     FOLLOW(E) = { $, ) }
FIRST(E') = { +, ε }     FOLLOW(E') = { $, ) }
FIRST(T) = { (, id }     FOLLOW(T) = { +, $, ) }
FIRST(T') = { *, ε }     FOLLOW(T') = { +, $, ) }
FIRST(F) = { (, id }     FOLLOW(F) = { +, *, $, ) }

Step 3: construct the predictive parsing table, filling each entry
M[A, a] (table figure omitted).


Constructing LL(1) Parsing Table -- Example
Stack implementation to parse the string id+id*id$ using the above
parsing table (trace figure omitted).


Constructing LL(1) Parsing Table -- Example
Example 2: Consider the following grammar:
S → iEtS | iEtSeS | a
E → b
After left factoring, we have:
S → iEtSS' | a
S' → eS | ε
E → b
(ε appears here because the original production S → iEtS has
nothing after S.)
To construct a parsing table, we need FIRST() and FOLLOW() for all the
non-terminals.
FIRST(S) = { i, a }
FIRST(S') = { e, ε }
FIRST(E) = { b }
FOLLOW(S) = { $, e }
FOLLOW(S') = { $, e }
FOLLOW(E) = { t }
And the parsing table is: (table omitted)


LL(1) Grammars
▪ A grammar whose parsing table has no multiply-defined entries is said to
be an LL(1) grammar.
• The first L means the input is scanned from left to right, the
second L means a leftmost derivation is produced, and 1 means
one input symbol of lookahead is used to determine the parser
action.
▪ A grammar G is LL(1) if and only if the following conditions hold for
any two distinct production rules A → α and A → β:
1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a terminal
in FOLLOW(A).
▪ From 1 & 2, we can say that FIRST(α) ∩ FIRST(β) = ∅.
▪ From 3: if ε is in FIRST(β), then FIRST(α) ∩ FOLLOW(A) = ∅,
and likewise with α and β swapped.


A Grammar which is not LL(1)

▪ The parsing table of a grammar may contain more than one
production rule in a single entry.
• In this case, we say that it is not an LL(1) grammar.
S → iCtSE | a
E → eS | ε
C → b

FIRST(iCtSE) = {i}      FOLLOW(S) = { $, e }
FIRST(a) = {a}          FOLLOW(E) = { $, e }
FIRST(eS) = {e}         FOLLOW(C) = { t }
FIRST(ε) = {ε}
FIRST(b) = {b}

Parsing table (non-empty entries only):
       a        b        e               i            t    $
S      S → a                             S → iCtSE
E                        E → eS, E → ε                     E → ε
C               C → b

Two production rules for M[E,e]. Problem ➔ ambiguity.


A Grammar which is not LL(1)
▪ What do we have to do if the resulting parsing table contains multiply
defined entries?
• Eliminate left recursion in the grammar, if it has not been eliminated:
• A → Aα | β
➔ any terminal that appears in FIRST(β) also appears in
FIRST(Aα), because Aα ⇒ βα.
➔ if β is ε, any terminal that appears in FIRST(α) also
appears in FIRST(Aα) and FOLLOW(A).
• Left factor the grammar, if it is not left factored.
• If a grammar is not left factored, it cannot be an LL(1) grammar:
• A → αβ1 | αβ2
➔ any terminal that appears in FIRST(αβ1) also appears
in FIRST(αβ2).
• If the new grammar's parsing table still contains multiply defined entries,
the grammar is ambiguous or is inherently not LL(1).
• An ambiguous grammar can never be an LL(1) grammar.
Error Recovery in Predictive Parsing
▪ An error may occur in predictive (LL(1)) parsing:
• if the terminal symbol on the top of the stack does not match
the current input symbol, or
• if the top of the stack is a non-terminal A, the current input
symbol is a, and the parsing table entry M[A,a] is empty.
▪ What should the parser do in an error case?
• The parser should be able to give an error message (as
meaningful an error message as possible).
• It should recover from the error, and it should be able
to continue parsing the rest of the input.


Contents (Session-2)
▪ Bottom Up Parsing
▪ Handle Pruning
▪ Implementation of A Shift-Reduce Parser
▪ LR Parsers
▪ LR Parsing Algorithm
▪ Actions of an LR-Parser
▪ Constructing SLR Parsing Tables
▪ SLR(1) Grammar
▪ Error Recovery in LR Parsing



Bottom-Up Parsing
▪ A bottom-up parser creates the parse tree of the given
input starting from leaves towards the root.
• A bottom-up parser tries to find the RMD of the given input in the
reverse order.
▪ Bottom-up parsing is also known as shift-reduce parsing
because its two main actions are shift and reduce.
• At each shift action, the current symbol in the input string is
pushed onto a stack.
• At each reduce action, symbols at the top of the stack matching
the right side of a production are replaced by the non-terminal
on the left side of that production.
• Accept: successful completion of parsing.
• Error: the parser discovers a syntax error and calls an error
recovery routine.


Bottom-Up Parsing…
▪ A shift-reduce parser tries to reduce the given input string to
the start symbol:
a string ➔ (reduced to) ➔ the start symbol
▪ At each reduction step, a substring of the input matching the
right side of a production rule is replaced by the non-terminal
on the left side of that production rule.
▪ If the substring is chosen correctly, the rightmost derivation of
that string is created in reverse order.
Rightmost derivation: S ⇒*rm ω
The shift-reduce parser finds: ω ⇐rm ... ⇐rm S


Shift-Reduce Parsing -- Example

S → aABb input string: aaabb


A → aA | a aaAbb
B → bB | b aAbb  reduction
aABb
S
S  aABb  aAbb  aaAbb  aaabb

Right Sentential Forms

▪ How do we know which substring to be replaced at each


reduction step?
Chapter – 3 : Syntax Analysis 55 Bahir Dar Institute of Technology
Handle & Handle pruning
▪ Handle: a "handle" of a string is a substring that matches the right
side of a production, and whose reduction to the non-terminal of that
production is one step along the reverse of a rightmost derivation.
▪ Handle pruning: the process of discovering a handle and reducing it to
the appropriate left-hand-side non-terminal is known as handle pruning.

Grammar:         String: id1+id2*id3
E → E+E
E → E*E          Rightmost derivation:
E → id           E ⇒ E+E ⇒ E+E*E ⇒ E+E*id3 ⇒ E+id2*id3 ⇒ id1+id2*id3

Right sentential form    Handle    Reducing production
id1+id2*id3              id1       E → id
E+id2*id3                id2       E → id
E+E*id3                  id3       E → id
E+E*E                    E*E       E → E*E
E+E                      E+E       E → E+E
E


Handle Pruning

▪ A rightmost derivation in reverse can be obtained by handle
pruning.

S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (the input string)

▪ Start from γn, find a handle An → βn in γn, and replace βn by An to
get γn-1.
▪ Then find a handle An-1 → βn-1 in γn-1, and replace βn-1 by An-1 to
get γn-2.
▪ Repeat this until we reach S.


Shift reduce parser
▪ Shift-reduce parsing is a type of bottom-up parsing that attempts to
construct a parse tree for an input string beginning at the leaves (the
bottom) and working up towards the root (the top).
▪ The shift-reduce parser performs the following basic operations/actions:
1. Shift: moving a symbol from the input buffer onto the stack is
called a shift.
2. Reduce: if a handle appears on top of the stack, it is reduced
by the appropriate rule. This is called a reduce action.
3. Accept: if the stack contains only the start symbol and the input
buffer is empty at the same time, that action is called accept.
4. Error: a situation in which the parser can neither shift nor reduce
the symbols, and cannot perform the accept action, is called an
error.


A Shift-Reduce Parser - example
E → E+T | T Right-Most Derivation of id+id*id
T → T*F | F E  E+T  E+T*F  E+T*id  E+F*id
F → (E) | id  E+id*id  T+id*id  F+id*id 
id+id*id

Right-Most Sentential Form    Reducing Production
id+id*id                      F → id
F+id*id                       T → F
T+id*id                       E → T
E+id*id                       F → id
E+F*id                        T → F
E+T*id                        F → id
E+T*F                         T → T*F
E+T                           E → E+T
E

(In the original slides, handles are shown in red and underlined in
the right-sentential forms.)


A Stack Implementation of a Shift-Reduce Parser
▪ The initial stack contains only the end-marker $, and the end of the
input string is also marked by $. (Parse tree figure omitted.)

Stack      Input        Action
$          id+id*id$    shift
$id        +id*id$      reduce by F → id
$F         +id*id$      reduce by T → F
$T         +id*id$      reduce by E → T
$E         +id*id$      shift
$E+        id*id$       shift
$E+id      *id$         reduce by F → id
$E+F       *id$         reduce by T → F
$E+T       *id$         shift
$E+T*      id$          shift
$E+T*id    $            reduce by F → id
$E+T*F     $            reduce by T → T*F
$E+T       $            reduce by E → E+T
$E         $            accept
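The trace above can be reproduced by a tiny hand-written shift-reduce recognizer; this sketch hard-codes the shift/reduce decisions for this one grammar (with F → (E) omitted for brevity), whereas a real parser reads them from an LR table, as covered next:

```python
def shift_reduce(tokens):
    """Shift-reduce recognizer for E -> E+T | T, T -> T*F | F, F -> id.
    The decisions (e.g. 'do not reduce T when * follows') are hard-coded
    for this grammar only."""
    stack, toks, i = ['$'], list(tokens) + ['$'], 0
    actions = []
    while True:
        la = toks[i]                             # lookahead symbol
        if stack[-3:] == ['T', '*', 'F']:        # handle T*F on top
            stack[-3:] = ['T']; actions.append('reduce T->T*F')
        elif stack[-1] == 'id':                  # handle id
            stack[-1] = 'F'; actions.append('reduce F->id')
        elif stack[-1] == 'F':                   # handle F
            stack[-1] = 'T'; actions.append('reduce T->F')
        elif stack[-1] == 'T' and la != '*':     # keep T if * follows
            if stack[-3:] == ['E', '+', 'T']:
                stack[-3:] = ['E']; actions.append('reduce E->E+T')
            else:
                stack[-1] = 'E'; actions.append('reduce E->T')
        elif stack == ['$', 'E'] and la == '$':
            return actions                       # accept
        elif la != '$':
            stack.append(la); i += 1             # shift
        else:
            raise SyntaxError('parse error')
```

On id+id*id it performs exactly the reduction sequence shown in the trace above.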


Conflicts in shift-reduce parsing
▪ There are two conflicts that occur in shift-reduce parsing:
▪ Shift-reduce conflict: The parser cannot decide whether to shift or to
reduce.
▪ Example: Consider the input id+id*id generated from the grammar G:
E->E+E | E*E | id

▪ NB: If a shift-reduce parser cannot be used for a grammar, that grammar is called
non-LR(k) grammar. An ambiguous grammar can never be an LR grammar.



Conflicts in shift-reduce parsing …
▪ Reduce-reduce conflict: The parser cannot decide which of
several reductions to make.
Example: Consider the input c+c generated from the grammar:
M → R+R | R+c | R, R→c



Operator precedence parser
▪ An efficient way of constructing a shift-reduce parser is called
operator-precedence parsing.
▪ An operator precedence parser can be constructed from a grammar
called an operator grammar.
▪ Operator Grammar: a grammar in which no production has ε on the
right side and no production has two adjacent non-terminals is called
an operator grammar.
▪ Example: the grammar
E → EAE | (E) | id
A → + | * | -
is not an operator grammar because the right side EAE has
consecutive non-terminals.
▪ In operator precedence parsing we define the following disjoint
relations: <· , ≐ , ·>
a ·> b   operator a has higher precedence than operator b.
a <· b   operator a has lower precedence than operator b.
a ≐ b    operators a and b have equal/same precedence.
Operator precedence parser
▪ If a grammar is not an operator grammar, you must first change it to
an equivalent operator grammar.
▪ E.g.-1: E → EAE | (E) | id, A → + | * | - is not an operator grammar.
▪ Solution: remove the adjacent non-terminals by substituting A’s productions:
E → E+E | E*E | E-E | (E) | id
▪ E.g.-2: S ➔ aSb | SbaS | ε is not an operator grammar.
▪ Solution: eliminate ε by substituting every possible combination:
S ➔ aSb | SbaS
S ➔ ab      // substitute S with ε in aSb
S ➔ baS     // substitute the first S with ε in SbaS
S ➔ Sba     // substitute the second S with ε in SbaS
S ➔ ba      // substitute both S with ε in SbaS



Rules for binary operations
▪ 1. If operator θ1 has higher precedence than operator θ2, then make
θ1 ·> θ2 and θ2 <· θ1
▪ 2. If operators θ1 and θ2 are of equal precedence, then make
θ1 ·> θ2 and θ2 ·> θ1   if the operators are left-associative
θ1 <· θ2 and θ2 <· θ1   if they are right-associative
▪ 3. Make the following for all operators θ:
θ <· id ,  id ·> θ
θ <· ( ,   ( <· θ
) ·> θ ,   θ ·> )
θ ·> $ ,   $ <· θ
▪ Also make ( ≐ ) , ( <· ( , ) ·> ) , ( <· id , id ·> ) ,
$ <· id , id ·> $ , $ <· ( , ) ·> $
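The rules above can be encoded mechanically. The sketch below builds the relation table for binary operators from a precedence/associativity listing (lower level number = higher precedence, as in the table on the next slide), then adds the id and $ rules; the parenthesis rules are omitted for brevity, and the function name and table encoding are illustrative.

```python
# Build the operator-precedence relations ("<" and ">") from precedence
# levels (lower number = higher precedence) and associativity, then add
# the id/$ rules.  Parenthesis rules are omitted for brevity.
def build_relations(ops, assoc, prec):
    rel = {}
    for a in ops:
        for b in ops:
            if prec[a] < prec[b]:                 # a binds tighter: a .> b
                rel[a, b] = ">"; rel[b, a] = "<"
            elif prec[a] == prec[b]:              # same level: associativity
                rel[a, b] = rel[b, a] = ">" if assoc[a] == "left" else "<"
    for t in ops:                                 # every operator vs id and $
        rel[t, "id"], rel["id", t] = "<", ">"
        rel[t, "$"], rel["$", t] = ">", "<"
    rel["$", "id"], rel["id", "$"] = "<", ">"
    return rel
```

For example, with `prec = {"+": 3, "*": 2}` and both operators left-associative, the result has `+ <· *`, `* ·> +` and `+ ·> +`.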



Precedence & associativity of operators

Operator    Precedence    Associativity
↑           1             right
*, /        2             left
+, -        3             left



Operator Precedence example
Operator-precedence relations for the grammar
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id is given in the
following table assuming
1. ↑ is of highest precedence and right-associative
2. * and / are of next higher precedence and left-associative, and
3. + and - are of lowest precedence and left-associative
Note that the blanks in the table denote error entries.

Operator-precedence relations table


Operator precedence parsing algorithm
Method: Initially the stack contains $ and the input buffer the string w$.
To parse, we execute the following program:
1. Set ip to point to the first symbol of w$;
2. repeat forever
3.   if $ is on top of the stack and ip points to $ then
4.     return /* accept */
     else begin
5.     let a be the topmost terminal symbol on the stack and let b be
       the symbol pointed to by ip;
6.     if a <· b or a ≐ b then begin /* shift */
7.       push b onto the stack;
8.       advance ip to the next input symbol;
       end;
9.     else if a ·> b then /* reduce */
10.      repeat
11.        pop the stack
12.      until the top stack terminal is related by <· to the terminal
         most recently popped;
13.    else error( )
     end
Stack implementation of operator precedence parsing
▪ Operator precedence parsing uses a stack and the precedence relation table to
implement the above algorithm. It is a shift-reduce parser containing all
four actions: shift, reduce, accept and error.
▪ Example: grammar: E → E+E | E-E | E*E | E/E | E↑E | (E) | id. Input
string is id+id*id.
Stack    Relation    Input        Comment
$        <·          id+id*id$    shift id
$id      ·>          +id*id$      pop id (the top of the stack)
$        <·          +id*id$      shift +
$+       <·          id*id$       shift id
$+id     ·>          *id$         pop id
$+       <·          *id$         shift *
$+*      <·          id$          shift id
$+*id    ·>          $            pop id
$+*      ·>          $            pop *
$+       ·>          $            pop +
$                    $            accept
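A minimal Python driver for this stack algorithm, using the relations among +, *, id and $ from the trace above. Note that this skeleton tracks only terminals and does not check reductions against actual productions, so it validates the precedence structure only; names are illustrative.

```python
# Operator-precedence driver over the relations for {+, *, id, $}.
REL = {("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
       ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
       ("+", "id"): "<", ("+", "*"): "<", ("+", "+"): ">", ("+", "$"): ">",
       ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">"}

def op_precedence_parse(tokens):
    stack, buf, actions = ["$"], tokens + ["$"], []
    while True:
        a, b = stack[-1], buf[0]
        if a == "$" and b == "$":        # $ on stack, $ in input: accept
            actions.append("accept")
            return actions
        r = REL[a, b]                    # a missing entry is a syntax error
        if r == "<" or r == "=":         # shift
            stack.append(buf.pop(0)); actions.append("shift " + b)
        else:                            # a .> b : reduce (pop the handle)
            popped = stack.pop()
            while REL[stack[-1], popped] != "<":
                popped = stack.pop()
            actions.append("pop " + popped)
```

`op_precedence_parse(["id", "+", "id", "*", "id"])` reproduces the shift/pop sequence of the table above.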
Pros and cons of operator precedence parsing
▪ Advantages of operator precedence parsing:
1. It is easy to implement.
2. Once the operator precedence relations are made between all pairs of
terminals of a grammar, the grammar can be ignored. The grammar is
not referred to anymore during implementation.
Disadvantages of operator precedence parsing:
1. It is hard to handle tokens like the minus sign (-), which has two
different precedences (unary and binary).
2. Only a small class of grammars can be parsed using an operator-
precedence parser.
▪ NB: The worst case of computing the relation table (on the previous slide)
is O(n²), because the table has n×n entries.
▪ To decrease this cost, an operator precedence function table can be
constructed instead, which needs only 2n entries (one f value and one
g value per terminal).
Algorithm for constructing operator precedence functions

1. Create functions fa and ga for each a that is a terminal or $.
2. Partition the symbols into as many groups as possible, in such a way
that fa and gb are in the same group if a ≐ b.
3. Create a directed graph whose nodes are the groups; next, for each
pair of symbols a and b do:
a) if a <· b, place an edge from the group of gb to the group of fa
b) if a ·> b, place an edge from the group of fa to the group of gb
4. If the constructed graph has a cycle then no precedence functions
exist (i.e., if there is a cycle, the function table cannot be generated).
When there are no cycles, let fa and gb be the lengths of the longest paths
from the groups of fa and gb respectively.
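Under the assumption that no two terminals are related by ≐ (so every fa and ga forms its own group, as in the example that follows), the construction reduces to a longest-path computation over the graph; cycle detection is omitted for brevity (a cycle would cause unbounded recursion here), and names are illustrative.

```python
from functools import lru_cache

# Precedence functions from the relation table: if a <. b, add an edge
# g_b -> f_a; if a .> b, add an edge f_a -> g_b.  Each function value is
# the length of the longest path leaving its node (no cycle detection).
def precedence_functions(rel, terminals):
    edges = {("f", t): [] for t in terminals}
    edges.update({("g", t): [] for t in terminals})
    for (a, b), r in rel.items():
        if r == "<":
            edges[("g", b)].append(("f", a))
        elif r == ">":
            edges[("f", a)].append(("g", b))

    @lru_cache(maxsize=None)
    def longest(node):                   # longest path length from node
        return max((1 + longest(n) for n in edges[node]), default=0)

    f = {t: longest(("f", t)) for t in terminals}
    g = {t: longest(("g", t)) for t in terminals}
    return f, g
```

On the relations for {+, *, id, $} this yields f = {+: 2, *: 4, id: 4, $: 0} and g = {+: 1, *: 3, id: 5, $: 0}, matching the fid = 4 and gid = 5 values discussed in the example.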



Example: constructing operator precedence functions
▪ Example: Grammar with terminals +, ∗ and id (plus the end-marker $).
▪ Step 1: Create functions fa and ga for each a that is a terminal or $, i.e.
a ∈ {+, ∗, id, $}



Example: constructing operator precedence functions
▪ Step 2: Partition the symbols into as many groups as possible, in such
a way that fa and gb are in the same group if a ≐ b.
▪ Step 3: If a <· b, place an edge from the group of gb to the group of
fa; if a ·> b, place an edge from the group of fa to the group of gb.



Example: constructing operator precedence functions
▪ From this graph we can extract the following precedence function table:
find the longest path leaving each node and count its edges. Since there
is no outgoing edge from the $ nodes, f$ = g$ = 0.
▪ Hence, this table contains the same information as the precedence
relation table, and you can implement parsing using this table and a
stack as before.
▪ When you compare fid and gid in the precedence relation table, the
entry is only a blank (it is impossible to compare them).
▪ But in the operator function table, fid and gid compare as 4 <· 5; this
is a disadvantage of the operator function table.
Shift-Reduce Parsers
▪ The most prevalent type of bottom-up parser today is based on a
concept called LR(k) parsing;

left-to-right scan, right-most derivation in reverse, k lookahead
symbols (when k is omitted, it is 1 or 0)

▪ LR parsers cover a wide range of grammars (SLR ⊂ LALR ⊂ LR ⊂ CFG):
• Simple LR parser (SLR)
• Look-Ahead LR (LALR)
• most general LR parser (LR)

▪ SLR, LR and LALR work the same way; only their parsing tables are
different.
LR Parsers
▪ LR parsing is attractive because:
• LR parsers can be constructed to recognize virtually
(effectively) all programming-language constructs for which
context-free grammars can be written.
• LR parsing is most general non-backtracking shift-reduce
parsing, yet it is still efficient.
• The class of grammars that can be parsed using LR methods
is a proper superset of the class of grammars that can be
parsed with predictive parsers.
• LL(1)-Grammars  LR(1)-Grammars
• An LR parser can detect a syntactic error as soon as it is
possible to do so on a left-to-right scan of the input.
▪ Drawback of the LR method is that it is too much work
to construct an LR parser by hand.
• Use tools e.g. yacc



An overview of LR Parsing model

input:  a1 ... ai ... an $

The parser holds a stack of alternating grammar symbols and states
(S0 X1 S1 ... Xm Sm, with Sm on top), runs the LR parsing algorithm, and
produces output; it consults two tables:
• Action table: rows are states, columns are terminals and $;
  each entry is one of four different actions.
• Goto table: rows are states, columns are non-terminals;
  each entry is a state number.
A Configuration of LR Parsing Algorithm

▪ A configuration of an LR parsing is:

( $ S0 S1 ... Sm , ai ai+1 ... an $ )
      Stack          Rest of Input

▪ Sm and ai decide the parser action by consulting the parsing action
table. (Initially the stack contains just $ S0.)

▪ A configuration of an LR parsing represents the right-sentential form:

X1 ... Xm ai ai+1 ... an $
• Xi is the grammar symbol represented by state Si
Actions of An LR-Parser
1. If ACTION[Sm, ai] = shift s, the parser executes a shift move; it
shifts the next state s onto the stack, entering the configuration
( $ S0 S1 ... Sm , ai ai+1 ... an $ ) ➔ ( $ S0 S1 ... Sm s , ai+1 ... an $ )

2. If ACTION[Sm, ai] = reduce A→β, then the parser executes a
reduce move, changing the configuration from
( $ S0 S1 ... Sm , ai ... an $ ) to ( $ S0 S1 ... Sm-r s , ai ... an $ ), where r is
the length of β and s = GOTO[Sm-r, A].
The output is the reducing production A→β.
• Here the parser first popped r state symbols off the stack, exposing
state Sm-r, then pushed s.

3. If ACTION[Sm, ai ] = Accept, parsing completed successfully.


4. If ACTION[Sm, ai ] = Error, parser detected an error (an empty
entry in the action table)
LR-parsing algorithm
METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state,
and w$ in the input buffer.
Let a be the first symbol of w$;
while(1)
{ /* repeat forever */
let S be the state on top of the stack;
if ( ACTION[S, a] = shift t )
{ push t onto the stack;
let a be the next input symbol;
}
else if ( ACTION[S, a] = reduce A→β) //reduce previous input symbol to head
{ pop 2*|β| symbols off the stack;
let state t now be on top of the stack;
push A and GOTO[t, A] onto the stack;
output the production A→β;
} else if ( ACTION[S, a] = accept ) break; /* parsing is done */
else call error-recovery routine;
}
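The driver above can be written directly in Python. The ACTION/GOTO entries below transcribe the SLR(1) table for the expression grammar E → E+T | T, T → T*F | F, F → (E) | id; the encoding of table entries as tuples is an implementation choice.

```python
# LR-parsing driver: ("s", n) = shift state n, ("r", p) = reduce by
# production p, ("acc",) = accept.  PROD maps production number to
# (head, body length): 1 E->E+T, 2 E->T, 3 T->T*F, 4 T->F, 5 F->(E), 6 F->id.
PROD = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
        4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}
ACTION = {
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): ("acc",),
    (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
    (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
    (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    stack, buf, output = [0], list(tokens) + ["$"], []
    while True:
        act = ACTION.get((stack[-1], buf[0]))
        if act is None:                          # empty entry = error
            raise SyntaxError("unexpected " + buf[0])
        if act[0] == "s":                        # shift: push state, advance
            stack.append(act[1]); buf.pop(0)
        elif act[0] == "r":                      # reduce by production act[1]
            head, size = PROD[act[1]]
            del stack[len(stack) - size:]        # pop |body| states
            stack.append(GOTO[stack[-1], head])  # then push the GOTO state
            output.append(act[1])
        else:                                    # accept
            return output
```

On `id*id+id` this emits the production numbers 6, 4, 6, 3, 2, 6, 4, 1, i.e. the reduction sequence of the worked trace that follows.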
SLR Parsing Tables for Expression Grammar

Expression Grammar:            Action Table                    Goto Table
1) E → E+T      state | id    +    *    (    )    $    |   E    T    F
2) E → T          0   | s5              s4             |   1    2    3
3) T → T*F        1   |       s6                  acc  |
4) T → F          2   |       r2   s7        r2   r2   |
5) F → (E)        3   |       r4   r4        r4   r4   |
6) F → id         4   | s5              s4             |   8    2    3
                  5   |       r6   r6        r6   r6   |
                  6   | s5              s4             |        9    3
                  7   | s5              s4             |             10
                  8   |       s6             s11       |
                  9   |       r1   s7        r1   r1   |
                 10   |       r3   r3        r3   r3   |
                 11   |       r5   r5        r5   r5   |
Actions of An SLR-Parser – Example input: id*id+id
▪ Stack       Input       Action             Output
$0            id*id+id$   shift 5
$0id5         *id+id$     reduce by F→id     F→id   (goto(0,F)=3)
$0F3          *id+id$     reduce by T→F      T→F    (goto(0,T)=2)
$0T2          *id+id$     shift 7
$0T2*7        id+id$      shift 5
$0T2*7id5     +id$        reduce by F→id     F→id   (goto(7,F)=10)
$0T2*7F10     +id$        reduce by T→T*F    T→T*F  (pop 2*|β| = 6 symbols
                                                     T2*7F10; goto(0,T)=2,
                                                     so T2 is pushed)
$0T2          +id$        reduce by E→T      E→T    (goto(0,E)=1)
$0E1          +id$        shift 6
$0E1+6        id$         shift 5
$0E1+6id5     $           reduce by F→id     F→id   (goto(6,F)=3)
$0E1+6F3      $           reduce by T→F      T→F    (goto(6,T)=9)
$0E1+6T9      $           reduce by E→E+T    E→E+T  (E1+6T9 popped and
                                                     E1 pushed)
$0E1          $           accept


Steps for Constructing SLR Parsing Tables

1. Augment G and produce G' (The purpose of this new


starting production is to indicate to the parser when it should
stop parsing and announce acceptance of the input)
2. Construct the canonical collection of sets of LR(0) items C for
G' (finding closure / LR(0) items)
3. Compute goto(I, X), where I is a set of items and X is a
grammar symbol.
4. Construct the DFA using the GOTO results and find the
FOLLOW sets.
5. Construct the LR(0)/SLR parsing table using the DFA results
and the FOLLOW sets.



Constructing SLR Parsing Tables – LR(0) Item
▪ An LR parser makes shift-reduce decisions by maintaining states
to keep track of where we are in a parse.
▪ An LR(0) item of a grammar G is a production of G with a dot at
some position of the right side.
• Ex: A → aBb    Possible LR(0) items (four different possibilities):
A → ∙aBb
A → a∙Bb
A → aB∙b
A → aBb∙
▪ Sets of LR(0) items will be the states of action and goto table of
the SLR parser.
• i.e. States represent sets of "items"
▪ A collection of sets of LR(0) items (the canonical LR(0)
collection) is the basis for constructing a deterministic finite
automaton that is used to make parsing decisions.
Such an automaton is called an LR(0) automaton.
Constructing SLR Parsing Table:- closure & GOTO operation
▪ Closure operation:
▪ If I is a set of items for a grammar G, then closure(I) is the set of items
constructed from I by two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α∙Bβ is in closure(I) and B → γ is a production, then add the
item B → ∙γ to closure(I), if it is not already there.
We apply this rule until no more new items can be added to closure(I).
▪ Goto operation: Goto(I, X), where I is a closure set (set of items) and
X is any grammar symbol.
Goto(I, X) is defined to be the closure of the set of all items [A → αX∙β]
such that [A → α∙Xβ] is in I.
GOTO(I, X) is performed based on the following rules:
1. If the moved item has the form A → αX∙bβ with b a terminal, include
only this item in the resulting set.
2. If the moved item has the form A → αX∙Bβ with B a non-terminal,
include this item along with the CLOSURE of B's productions.
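The closure and goto definitions translate almost line-for-line into code. This sketch represents an item as a (head, body, dot) triple and uses the grammar S' → S, S → AA, A → aA | b from the LR(0) example that follows; names are illustrative.

```python
# LR(0) items represented as (head, body, dot) triples, for the grammar
# S' -> S, S -> AA, A -> aA | b used in the LR(0) example.
GRAMMAR = {"S'": ["S"], "S": ["AA"], "A": ["aA", "b"]}

def closure(items):
    items = set(items)
    while True:
        new = set()
        for head, body, dot in items:
            if dot < len(body) and body[dot] in GRAMMAR:  # dot before a non-terminal
                for prod in GRAMMAR[body[dot]]:
                    new.add((body[dot], prod, 0))
        if new <= items:                                  # nothing new: done
            return items
        items |= new

def goto(items, symbol):
    # Move the dot over `symbol`, then take the closure of the result.
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)

def canonical_collection():
    start = frozenset(closure({("S'", "S", 0)}))
    states, work = {start}, [start]
    while work:
        state = work.pop()
        for symbol in "SAab":
            nxt = frozenset(goto(state, symbol))
            if nxt and nxt not in states:
                states.add(nxt); work.append(nxt)
    return states
```

For this grammar the canonical collection contains the seven states I0 through I6 computed in the example.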
Algorithm for construction of SLR parsing Table
▪ Input: An augmented grammar G’
Output: The SLR parsing table functions action and goto for G’
Method:
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G’.
2. State i is constructed from Ii.
The parsing functions for state i are determined as follows:
➔ If [A→α∙aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to “shift j”.
Here a must be a terminal.
➔ If [A→α∙] is in Ii, then set action[i, a] to “reduce A→α” for all a in
FOLLOW(A).
➔ If [S’→S∙] is in Ii, then set action[i, $] to “accept”.
▪ If any conflicting actions are generated by the above rules, we say the
grammar is not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using
the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made “error”.
5. The initial state of the parser is the one constructed from the set of items
containing [S’→∙S].
Example for LR(0) parser
▪ Example of LR(0): Let the grammar G1: S ➔ AA, A ➔ aA | b
Step 1: augment the grammar:
S’ ➔ S
S ➔ AA
A ➔ aA
A ➔ b
Step 2: closure(I0):
S’ ➔ ∙S
S ➔ ∙AA
A ➔ ∙aA
A ➔ ∙b
Step 3: compute GOTO:
GOTO(I0, S) = I1: S’ ➔ S∙
GOTO(I0, A) = I2: S ➔ A∙A, A ➔ ∙aA, A ➔ ∙b
(A after the dot is a non-terminal; hence write all its productions)
GOTO(I0, a) = I3: A ➔ a∙A, A ➔ ∙aA, A ➔ ∙b
GOTO(I0, b) = I4: A ➔ b∙
GOTO(I2, A) = I5: S ➔ AA∙
GOTO(I2, a) = I3, GOTO(I2, b) = I4
GOTO(I3, A) = I6: A ➔ aA∙
GOTO(I3, a) = I3, GOTO(I3, b) = I4
NB: I1, I4, I5, I6 are called final items.
Example of LR(0) parsing Table
▪ Step 4: Construct the DFA from the GOTO results above (states
I0–I6, with the computed transitions).
▪ NB: I1, I4, I5, I6 are final items. They lead to filling the
‘reduce’/Ri actions in the corresponding rows of the action part of the table.



Example of LR(0) parsing Table
▪ Step 5: construct the LR(0) parsing table.
First, number the productions of the grammar, i.e.
S → AA ---1
A → aA ---2
A → b  ---3

NB: design the LR parsing table from the DFA:
I. All shifts Si come from GOTO(I, X); fill them in by reading the DFA.
II. All reduces Ri come from the final items; locate them in the DFA;
the value of i in Ri comes from the production numbers above.

LR(0) parsing table:
state |  a    b    $   |  S   A
  0   |  S3   S4       |  1   2
  1   |            acc |
  2   |  S3   S4       |      5
  3   |  S3   S4       |      6
  4   |  R3   R3   R3  |
  5   |  R1   R1   R1  |
  6   |  R2   R2   R2  |

NB: In the LR(0) table construction, whenever a state contains a final
item, put Ri across that state's entire row of the action part.
e.g. in row 4, put R3 everywhere; 3 is the number of the production A → b.
Example of LR(0) parsing Table
▪ Step 6: check the parser by implementing it with a stack for the string abb$

Stack     Input buffer   Action table   Goto table   Parsing action
$0        abb$           [0, a]=S3                   Shift
$0a3      bb$            [3, b]=S4                   Shift
$0a3b4    b$             [4, b]=R3      [3, A]=6     Reduce A ➔ b
$0a3A6    b$             [6, b]=R2      [0, A]=2     Reduce A ➔ aA
$0A2      b$             [2, b]=S4                   Shift
$0A2b4    $              [4, $]=R3      [2, A]=5     Reduce A ➔ b
$0A2A5    $              [5, $]=R1      [0, S]=1     Reduce S ➔ AA
$0S1      $              [1, $]=accept               Accept



Notice:
▪ SLR(1) parsers use the same LR(0) configuration sets and have the same
table structure and parser operation.
▪ The difference comes in assigning table actions, where we are going to use
one token of lookahead to help arbitrate among the conflicts.
▪ The fundamental limitation of LR(0) is the zero, meaning no lookahead
tokens are used.
▪ It is a stifling constraint to have to make decisions using only what has
already been read, without even glancing at what comes next in the input.
▪ Therefore, conflicts occur in an LR(0) parsing table and cannot be
resolved.
▪ The simple improvement that SLR(1) makes on the basic LR(0) parser is to
reduce only if the next input token is a member of the follow set of the non-
terminal being reduced.
▪ When filling in the table, we don't assume a reduce on all inputs as we did in
LR(0); we selectively choose the reduction only when the next input symbol
is a member of the follow set.
▪ This avoids many shift-reduce and reduce-reduce conflicts.



Example of SLR(1) parsing Table
▪ Example of an SLR(1) parsing table: use the grammar previously used
in the LR(0) example.
▪ Constructing an SLR(1) parser is the same as constructing an LR(0)
parser; the only difference is in the construction of the parsing table:
to fill in a reduce entry (Ri), we must use the FOLLOW set.
▪ If the input terminal belongs to the FOLLOW set of the head of the
production, fill the corresponding cell; otherwise, leave the cell empty.
Hence apply only this difference.
Find the FOLLOW set of each non-terminal:
S → AA ---1
A → aA ---2
A → b  ---3
▪ FOLLOW(S) = {$}, FOLLOW(A) = {$, a, b}
▪ A → b∙ is a final item. Since FOLLOW(A) = {$, a, b}, fill row 4 of the
action part with R3 in the a, b and $ columns.
▪ S → AA∙ is a final item. Since FOLLOW(S) = {$}, fill only cell [5, $] with R1.
▪ A → aA∙ is a final item. Since FOLLOW(A) = {$, a, b}, fill row 6 of the
action part with R2 in the a, b and $ columns.
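The FOLLOW sets quoted above can be checked with the usual fixed-point computation. This sketch is specialized to grammars without ε-productions (so FIRST never contains ε), which holds for this grammar; names are illustrative.

```python
# Fixed-point computation of FOLLOW for S -> AA, A -> aA | b
# (no eps-productions, so FIRST is a simple recursion).
GRAMMAR = [("S", "AA"), ("A", "aA"), ("A", "b")]
NONTERMS = {"S", "A"}

def first(symbol):
    if symbol not in NONTERMS:
        return {symbol}                # FIRST of a terminal is itself
    out = set()
    for head, body in GRAMMAR:
        if head == symbol:
            out |= first(body[0])
    return out

def follow_sets(start="S"):
    follow = {n: set() for n in NONTERMS}
    follow[start].add("$")             # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            for i, sym in enumerate(body):
                if sym in NONTERMS:
                    # FIRST of what follows, or FOLLOW(head) at the end
                    tail = first(body[i + 1]) if i + 1 < len(body) else follow[head]
                    if not tail <= follow[sym]:
                        follow[sym] |= tail; changed = True
    return follow
```

The result agrees with the sets used above: FOLLOW(S) = {$} and FOLLOW(A) = {a, b, $}.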



Example of SLR(1) parsing Table

SLR(1) parsing table:
state |  a    b    $   |  S   A
  0   |  S3   S4       |  1   2
  1   |            acc |
  2   |  S3   S4       |      5
  3   |  S3   S4       |      6
  4   |  R3   R3   R3  |
  5   |            R1  |
  6   |  R2   R2   R2  |

S → AA ---1
A → aA ---2
A → b  ---3
▪ The stack implementation is the same as for LR(0).
LALR and CLR parser

▪ NB:
• LR(0) and SLR(1) used LR(0) items to create a parsing table
• But LALR and CLR parsers used LR(1) items in order to
construct a parsing table.

▪ Reading assignment
• LALR parser and
• CLR parser

