Chapter 3 - Syntax Analysis
Syntax Analysis
Basic Topics of Chapter Three
Ambiguity
Eliminating ambiguity
3.1. Introduction of syntax analysis
Syntax analysis creates the syntactic structure of the given source program.
Parser: a program that takes tokens and a grammar (a CFG) as input and validates the
stream of tokens against the grammar.
The syntax is specified by a context-free grammar.
[Figure: position of the parser in the compiler model. The source program feeds the Lexical Analyzer; the Parser requests tokens with getNextToken; the Parser produces a Parse tree for the Rest of the Front End, which emits the Intermediate representation; the lexical analyzer and the parser both consult the Symbol table.]
The Main Responsibilities of Syntax Analysis
Major tasks conducted during parsing (syntax analysis):
The parser obtains a stream of tokens from the lexical analyzer and verifies that the
stream of token names can be generated by the grammar for the source language.
It determines the syntactic validity of a source string; if the string is valid, a parse tree is built for use by the rest of the front end.
It collects information about various tokens into the symbol table and performs type checking.
Syntactic errors: include misplaced semicolons or extra or missing braces; that is, "{" or "}".
Logical errors: can be anything from incorrect reasoning on the part of the
programmer to the use, in a C program, of the assignment operator = instead of the
comparison operator ==.
Error-Recovery Strategies
Once an error is detected, how should the parser recover?
a. Panic-mode recovery
b. Phrase-level recovery
c. Error-productions, and
d. Global-correction.
i. Panic Mode Recovery
Once an error is found, the parser intends to find designated set of synchronizing tokens
by discarding input symbols one at a time.
Synchronizing tokens are delimiters, semicolon or } whose role in source program is clear.
When parser finds an error in the statement, it ignores the rest of the statement by not
processing the input.
The compiler will discard all subsequent tokens till a semi-colon is encountered.
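The skip-to-a-synchronizing-token idea can be sketched in a few lines of Python. The token list and the error position here are hypothetical; a real parser would obtain both from its scanning loop.

```python
def panic_mode_recover(tokens, pos, sync_tokens=(";", "}")):
    """Skip input symbols one at a time until a synchronizing token
    (e.g. ';' or '}') is found; resume parsing just past it."""
    while pos < len(tokens) and tokens[pos] not in sync_tokens:
        pos += 1                    # discard one input symbol
    return pos + 1                  # position just past the synchronizing token

# hypothetical token stream; an error is detected at the stray '@' (index 2)
tokens = ["x", "=", "@", "1", ";", "y", "=", "2", ";"]
resume = panic_mode_recover(tokens, 2)
print(tokens[resume:])              # parsing resumes at ['y', '=', '2', ';']
```

The price of this strategy is that everything between the error and the synchronizing token is skipped, so errors inside the discarded region go unreported.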
ii. Phrase-Level Recovery
On discovering an error, a parser may perform local correction on the remaining
input; that is, it may
o replace a prefix of the remaining input by some string that allows the parser to continue.
This works only for the most common mistakes, which can be easily identified.
Basically there are a number of grammar types, but for compiler design, that is, for the syntactic structure of
a programming language, we use the CFG.
CFG: used to define the syntactic structure of programming language constructs, like:
algebraic expressions, if-else statements, while loops, array representations, etc.
Example (natural language): I am going.
(This is a pronoun, a helping verb, and a verb followed by -ing, terminated by a
full stop, respectively.)
In a programming language, suppose we write a sentence in this form:
int a, b, c;
a data type, then variable names separated by commas, terminated by a semicolon.
Therefore, a CFG is used to check the syntax of a programming language.
Formal Definition of a CFG
A CFG is a 4-tuple G = (T, N, S, P): a set of terminals T, a set of non-terminals N, a start symbol S (one of the non-terminals), and a set of productions P.
Conventionally, the productions for the start symbol are listed first.
iv. Productions
The productions of a grammar specify the manner in which the terminals and non-terminals
can be combined to form strings.
Each Production Consists of:
a. A non-terminal called the head or left side of the production;
this production defines some of the strings denoted by the head.
b. The symbol →. Sometimes ::= has been used in place of the arrow.
c. A body or right side consisting of zero or more terminals and non-terminals.
Example #1: Simple Arithmetic Expressions
The nonterminal symbols are expression, term and factor, and expression is the
start symbol .
Example #2: CFG (Algebraic grammar)
• G: S → AB
A → aAA
A → aA
A → a
B → bB
B → b
Q1. Identify S, T, N and P.
Q2. Check whether each of the following input strings is accepted by the given G:
ab, aab, aaab, aabba
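Q2 can be checked mechanically. The sketch below is a naive exhaustive-search membership test, not how real parsers work, but enough to answer the question for a tiny grammar like this one:

```python
def derives(symbols, tokens, grammar):
    """True iff the sentential form `symbols` derives exactly `tokens`.
    Naive exhaustive search; it terminates here because every alternative
    of this grammar starts with a terminal, so each step consumes input."""
    if not symbols:
        return not tokens                     # both exhausted: success
    head, rest = symbols[0], symbols[1:]
    if head in grammar:                       # non-terminal: try each body
        return any(derives(list(body) + rest, tokens, grammar)
                   for body in grammar[head])
    # terminal: must match the next input token
    return bool(tokens) and tokens[0] == head and derives(rest, tokens[1:], grammar)

G = {"S": ["AB"], "A": ["aAA", "aA", "a"], "B": ["bB", "b"]}
for w in ["ab", "aab", "aaab", "aabba"]:
    print(w, derives(["S"], list(w), G))
# ab, aab and aaab are accepted; aabba is not (a 'b' may not precede an 'a')
```

Here A derives only strings of a's and B only strings of b's, so L(G) is a^m b^n with m, n ≥ 1, which is why aabba is rejected.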
Example #3: CFG (Algebraic grammar)
G: E → E+E | E-E | E*E | E/E | (E) | id
sentence: id+id*id
(this sentence can be derived from the above grammar, hence it is a valid sentence)
(a sentence that cannot be generated by the above grammar is an invalid sentence)
NB: so the job of a grammar is to validate the correctness of a sentence or find the
errors of the sentence.
Example #4: CFG (“If else” Grammar)
• S → if expression then statement
| if expression then statement else statement
Q. How can we write a sentence with the help of this grammar?
- If the statement can be derived with the help of the grammar, the statement is a valid sentence.
Example sentences:
if (a<b) then printf("yes");
if (a<b) then printf("yes")
else
printf("no");
CFG - Terminology
L(G) is the language of G (the language generated by G) which is a set of
sentences.
Associativity and precedence are captured in the following grammar of expressions, terms,
and factors.
E represents expressions consisting of terms separated by
+ signs,
T represents terms consisting of factors separated by * signs, and
F represents factors that can be either parenthesized expressions or identifiers:
Cont’d
G: E → E + T | T
T→T*F|F
F → (E) | id
Expression grammar above belongs to the class of LR grammars that are suitable for
bottom-up parsing.
This grammar can be adapted to handle additional operators and additional levels of
precedence.
E → TE’
E’ →+TE’ | Ɛ
T →FT’
T’ → *FT’ | Ɛ
F →(E) | id
Cont’d
The following grammar treats + and * alike, so it is useful for illustrating techniques
for handling ambiguities during parsing:
Grammar above permits more than one parse tree for expressions like: a + b*c.
3.6. Derivations and Parse Trees
• Definition: Let G = (N, T, P, S) be a CFG.
– If a vertex with label A has children with labels X1, X2, …, Xn from left to right, then
A → X1X2…Xn must be a production in P.
– If a vertex has label ε, then that vertex is a leaf and the only child of its parent.
– More generally, a derivation tree can be defined with any non-terminal as the root of the tree.
Derivations
A derivation is a sequence of applications of production rules.
It is used to generate the input string from the start symbol through these production rules.
We have to decide:
• which non-terminal to replace
• Production rule by which the Non-terminals will be replaced
If there is a production A → α then we say that A derives α, denoted A ⇒ α.
αAβ ⇒ αγβ if A → γ is a production.
If α1 ⇒ α2 ⇒ … ⇒ αn then α1 ⇒* αn.
Given a grammar G and a string w of terminals in L(G), we can write S ⇒* w.
If S ⇒* α, where α is a string of terminals and non-terminals of G, then we say that α is a
sentential form of G.
There are two options for Derivation
a. Left-Most Derivation (LMD) and b. Right-Most Derivation (RMD)
a. Left-Most Derivations (LMD)
• If we always choose the left-most non-terminal in each derivation step, this
derivation is called as left-most derivation.
• In LMD- input string scanned and replaced with the production rule from left to
right.
• In a sentential form only the leftmost non terminal is replaced then it becomes
leftmost derivation.
Every leftmost step can be written as
wAγ ⇒lm wδγ
where w is a string of terminals and A → δ is a production.
LMD: Example #1
G: E → E+E | (E) | -E | id   Input string: id+id
E ⇒ E+E ⇒ id+E ⇒ id+id (E+E derives from E; the leftmost E is replaced first)
LMD: Example #2
G: E → E+E | (E) | -E | id   Input string: -(id+id)
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
b. Right-Most Derivation (RMD)
In RMD, the input string is scanned and replaced with the production rules from
right to left.
We will see that the top-down parsers try to find the left-most derivation of the
given source program.
We will see that the bottom-up parsers try to find the right-most derivation of
the given source program in the reverse order.
RMD: Example #1
G: E → E+E | (E) | -E | id   Input string: -(id+id)
Right-Most Derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
Parse Trees
A parse tree shows how the start symbol of a grammar derives a string in the language.
If A is a non-terminal labeling an internal node and x1, x2, …, xn are the labels of the
children of that node, then A → x1x2…xn is a production.
Parse Tree: Example #1
G: E → E+E | (E) | -E | id   Input string: -(id+id)
Parse tree:
Parse Tree: Example #2
Construct parse tree for the given grammar:
Parse tree:
Exercise #2
• Construct parse tree for the following grammar:
1. G: T → T+T
| T*T
X → a
Y → b
Z → c | d
3.7. Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be
ambiguous; equivalently, an ambiguous grammar has more than one leftmost derivation
(or more than one rightmost derivation) for some sentence.
Drawback of Ambiguity:
o Parsing complexity
Consider the grammar:
G: E → E+E | E*E | id   Input string: id+id*id
Two different leftmost derivations exist:
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Two parse trees: one grouping the string as id+(id*id), the other as (id+id)*id.]
Ambiguity : Example #2
string → string + string
| string - string
| 0 | 1 | … | 9
• The string 9-5+2 has two parse trees.
Ambiguity (cont.)
For most parsers, the grammar must be unambiguous.
An unambiguous grammar gives a unique parse tree for each sentence.
We should eliminate the ambiguity in the grammar during the design phase of the
compiler.
Ambiguity is resolved with:
Associativity rules
Precedence rules
Associativity of Operators
If an operand has an operator on both sides, the side whose operator takes this
operand gives the associativity of that operator.
In a+b+c, b is taken by the left +.
+, -, *, / are left associative.
^, = are right associative.
e.g. 1+2+3: first we evaluate (1+2)+3 (left associative)
1^2^3 = 1^(2^3) (right associative)
a=b=c (right associative)
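These conventions are easy to confirm in Python, whose own operators follow the same rules (Python writes exponentiation as ** rather than ^):

```python
# Left-associative '-' groups left to right:
assert 10 - 4 - 3 == (10 - 4) - 3 == 3      # not 10 - (4 - 3), which is 9
# Right-associative exponentiation groups right to left:
assert 2 ** 3 ** 2 == 2 ** (3 ** 2) == 512  # not (2 ** 3) ** 2, which is 64
# '*' has higher precedence than '+', so a + b*c means a + (b*c):
assert 2 + 3 * 4 == 2 + (3 * 4) == 14
print("associativity and precedence behave as described")
```

A language's grammar encodes exactly these groupings, which is why the unambiguous expression grammar below puts * one level deeper than +.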
Grammar to generate strings with right associative operators
right → letter = right | letter
letter → a | b | … | z
Precedence of Operator
Whenever an operator has a higher precedence than the other operators,
it means that the first operator gets its operands before the operators with lower
precedence do.
Since multiplication and division have the same precedence, we must use
associativity,
which means they are grouped left to right, as if the expression a*b/c were (a*b)/c.
3.10 Eliminating Ambiguity
AMBIGUITY. The context-free grammar G = (T, N, S, P) is
unambiguous if every sentence of G has a unique parse tree.
We shall eliminate the ambiguity from the following "dangling else" grammar; of the
two possible parse trees for a sentence with two thens and one else, the tree that
matches the else with the closest then is preferred.
Hence the general rule is: match each else with the closest previous unmatched then.
This disambiguating rule can be incorporated directly into a grammar by using the
following observations.
Eliminating Ambiguity(cont’d)
A statement appearing between a then and an else must be matched. (Otherwise there
will be an ambiguity.)
A matched statement is
o an if-then-else statement where unmatched statements are allowed in the else-part (but
not in the then-part).
Top-down parsing is one of the methods that we will study for generating parse trees.
A top-down parser cannot handle a left-recursive grammar G, so we must transform G into an equivalent grammar
which is not left recursive and which generates the same language as G.
How to eliminate left recursion?
A simple rule for direct left recursion elimination:
A → Aα | β
– We may replace it with
A → βA'
A' → αA' | ε
where A' is a new non-terminal.
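The replacement rule above can be sketched as a small Python transformation. A production body is represented here as a list of symbols, with [] standing for ε; this representation is an assumption for illustration.

```python
def eliminate_direct_left_recursion(head, bodies):
    """Apply  A -> A a | b   =>   A -> b A',  A' -> a A' | epsilon.
    A body is a list of symbols; [] denotes an epsilon body."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # left-recursive tails
    betas  = [b for b in bodies if not b or b[0] != head]    # the rest
    new = head + "'"
    return {
        head: [b + [new] for b in betas],          # A  -> b1 A' | b2 A' | ...
        new:  [a + [new] for a in alphas] + [[]],  # A' -> a1 A' | ... | epsilon
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
print(eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Running it on E → E + T | T reproduces exactly the transformed grammar used in Example #1 below.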
Eliminate left recursion: Example #1
Consider the following grammar, which generates arithmetic expressions.
E →E + T | T
T→T*F | F
F→(E) | id
has two left recursive productions. Applying the above trick leads to
E →TE’
E’ →+TE’ |∈
T →FT’
T’ →*FT’ |∈
F→(E) | id
Elimination of left recursion(cont’d)
The Case of Several Left Recursive A-productions.
S→A
A → Ad / Ae / aB / ac
B → bBc / f
• Solution:
• The grammar after eliminating left recursion is:
S → A
A → aBA' | acA'
A' → dA' | eA' | ε
B → bBc | f
ii. Right Recursion
A production of a grammar is said to have right recursion if the rightmost variable of
its RHS is the same as the variable on its LHS.
A grammar containing a production having RR is called as Right Recursive Grammar.
Example: S → aS / ∈ (Right Recursive Grammar)
Note: Right recursion does not create any problem for the Top down parsers.
Therefore, there is no need of eliminating right recursion from the grammar.
The recursion which is neither left recursion nor right recursion is called as general recursion.
Example: S → aSb / ∈
Left factoring
Left factoring is a process by which the grammar with common prefixes is
transformed to make it useful for Top down parsers.
If the RHS of more than one production starts with the same symbol, then such a
grammar is called as grammar with common prefixes.
• Ex: A → αβ1 / αβ2 / αβ3 (Grammar with common prefixes)
This kind of grammar creates a problematic situation for Top down parsers.
Top down parsers can not decide which production must be chosen to parse the string in
hand.
The grammar obtained after the process of left factoring is called as left factored
grammar.
Example: A → αβ1 / αβ2 / αβ3 becomes A → αA', A' → β1 / β2 / β3.
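Left factoring can likewise be sketched as a transformation on production bodies. The list-of-symbols representation and the helper names are assumptions for illustration; the sketch factors a single prefix shared by all alternatives.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the front of every body."""
    prefix = []
    for column in zip(*bodies):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(head, bodies):
    """Apply  A -> a b1 | a b2 | ...  =>  A -> a A',  A' -> b1 | b2 | ...
    [] denotes an epsilon body."""
    alpha = common_prefix(bodies)
    if not alpha:
        return {head: bodies}                 # nothing to factor
    new = head + "'"
    return {
        head: [alpha + [new]],
        new:  [b[len(alpha):] for b in bodies],
    }

# A -> a b1 | a b2 | a b3   becomes   A -> a A',  A' -> b1 | b2 | b3
print(left_factor("A", [["a", "b1"], ["a", "b2"], ["a", "b3"]]))
```

After factoring, the top-down parser can consume the shared prefix first and postpone the choice between alternatives until the distinguishing symbol is seen.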
Parsing is a process that constructs a syntactic structure (i.e. parse tree) from the stream
of tokens.
The way the production rules are implemented (derivation) divides parsing into two types:
top-down parsing and bottom-up parsing.
Backtracking means (If a choice of a production rule does not work, we backtrack to
try other alternatives.)
Not efficient
Recursive descent parsing: Example #1
Consider the grammar with input string “cad”:
S→cAd
A→ab | a
Backtracking is needed.
Recursive Descent Parsing: Algorithm
A typical procedure for a non-terminal:
Procedure A() {
  choose an A-production, A → X1X2…Xk;
  for (i = 1 to k) {
    if (Xi is a non-terminal)
      call procedure Xi();
    else if (Xi equals the current input symbol a)
      advance the input to the next symbol;
    else
      /* an error has occurred: report it and/or backtrack */
  }
}
Recursive Descent Parsing: Algorithm (cont’d)
If one failed the input pointer needs to be reset and another alternative should be
tried.
E → TE’
E’→+TE’ | Ɛ
T → FT’
T’ →*FT’ | Ɛ
F → (E) | id
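A recursive-descent parser for this grammar can be sketched directly in Python, one procedure per non-terminal. Because the grammar is left-factored and not left recursive, each choice is decided by the current token and no backtracking is needed.

```python
class Parser:
    """Recursive-descent parser for:
         E -> T E'     E' -> + T E' | epsilon
         T -> F T'     T' -> * F T' | epsilon
         F -> ( E ) | id"""
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]   # '$' marks the end of input
        self.pos = 0

    def look(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.look() != t:
            raise SyntaxError(f"expected {t}, got {self.look()}")
        self.pos += 1

    def E(self):
        self.T(); self.Eprime()

    def Eprime(self):
        if self.look() == "+":
            self.match("+"); self.T(); self.Eprime()   # E' -> + T E'
        # else: E' -> epsilon

    def T(self):
        self.F(); self.Tprime()

    def Tprime(self):
        if self.look() == "*":
            self.match("*"); self.F(); self.Tprime()   # T' -> * F T'
        # else: T' -> epsilon

    def F(self):
        if self.look() == "(":
            self.match("("); self.E(); self.match(")") # F -> ( E )
        else:
            self.match("id")                           # F -> id

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.look() == "$"     # all input must be consumed
    except SyntaxError:
        return False

print(accepts(["id", "+", "id", "*", "id"]))   # True
print(accepts(["id", "+", "*", "id"]))         # False
```

Each procedure mirrors one production; the ε-alternatives of E' and T' are taken simply by returning without consuming input.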
Recursive descent parsing: Exercise #1
Q. Construct a recursive descent parser for the following grammar.
A → abC | aBd | aAD
B → bB | ∈
C → d | ∈
D → a | b | ∈
First(α) is the set of terminals that begin the strings derived from α.
Follow (A), for a nonterminal A, to be the set of terminals a that can appear immediately to the
right of A in a sentential form.
Example: #2
Rules of Computing FOLLOW
Rules in computing FOLLOW ( X) where X is a nonterminal
1) If X is a part of a production and is succeeded by a terminal,
for example:
A → Xa; then Follow(X) = { a }
2) If X is the start symbol for a grammar, for ex:
X → AB
A→a
B → b;
then add $ to FOLLOW (X); FOLLOW(X)= { $ }
Rules of Computing FOLLOW(cont’d)
3) If X is a part of a production and followed by another non terminal, get the FIRST of that
succeeding nonterminal.
Ex: A →XD D → aB ;
S→ABCDE
A →a|∈
B →b|∈
C →c
D →d|∈
E →e|∈
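FIRST and FOLLOW for a grammar like this one can be computed by iterating the rules above to a fixed point. The sketch below uses a dict of productions with [] for an ε-body; the representation is an assumption for illustration.

```python
EPS = "ε"

def first_follow(grammar, start):
    """Compute FIRST and FOLLOW for every non-terminal by iterating
    to a fixed point. grammar maps head -> list of bodies."""
    nonterms = set(grammar)
    FIRST = {A: set() for A in grammar}
    FOLLOW = {A: set() for A in grammar}
    FOLLOW[start].add("$")                  # rule 2: $ follows the start symbol

    def first_of(seq):                      # FIRST of a symbol sequence
        out = set()
        for X in seq:
            if X not in nonterms:           # terminal: it begins the string
                out.add(X)
                return out
            out |= FIRST[X] - {EPS}
            if EPS not in FIRST[X]:
                return out
        out.add(EPS)                        # the whole sequence can vanish
        return out

    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                f = first_of(body)
                if not f <= FIRST[A]:
                    FIRST[A] |= f; changed = True
                for i, X in enumerate(body):        # rules 1 and 3 for FOLLOW
                    if X not in nonterms:
                        continue
                    trailer = first_of(body[i + 1:])
                    add = trailer - {EPS}
                    if EPS in trailer:              # everything after X can vanish
                        add |= FOLLOW[A]
                    if not add <= FOLLOW[X]:
                        FOLLOW[X] |= add; changed = True
    return FIRST, FOLLOW

G = {"S": [["A", "B", "C", "D", "E"]],
     "A": [["a"], []], "B": [["b"], []], "C": [["c"]],
     "D": [["d"], []], "E": [["e"], []]}
FIRST, FOLLOW = first_follow(G, "S")
print(FIRST["S"], FOLLOW["A"], FOLLOW["C"])
# FIRST(S) = {a, b, c}; FOLLOW(A) = {b, c}; FOLLOW(C) = {d, e, $}
```

Because A and B are nullable while C is not, FIRST(S) stops collecting at c, and FOLLOW(C) picks up $ since everything after C can vanish.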
A grammar G is LL(1) if and only if whenever A→α|β are two distinct productions
of G, the following conditions hold:
– For no terminal a do α and β both derive strings beginning with a
– At most one of α or β can derive empty string
– If α=> ɛ then β does not derive any string beginning with a terminal in Follow(A).
How LL(1) Parser works?
How LL(1) Parser works?…
input buffer
– our string to be parsed.
– We will assume that its end is marked with a special symbol $.
output
– a production rule representing a step of the derivation sequence (left-most derivation) of
the string in the input buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S.
$S initial stack
– when the stack is emptied (i.e. only $ is left in the stack), the parsing is completed.
Predictive Parsing Tables Construction
The general idea is to use FIRST and FOLLOW to construct the parsing table M:
1. For each production A → α, add A → α to M[A, a] for every terminal a in FIRST(α).
2. If FIRST(α) contains ε, add A → α to M[A, b] for every terminal b in FOLLOW(A).
3. If FIRST(α) contains ε and $ is in FOLLOW(A), add A → α to M[A, $] as well.
4. If after performing the above, there is no production in M[A,a] then set M[A,a] to error.
Predictive Parsing:
Tables Construction– Example #1
Consider grammar G:
E → TE’
E’ → +TE’ | Ɛ
T →FT’
T’ →*FT’ | Ɛ
F → (E) | id
and their First and Follow
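The table-driven parse loop that uses such a table can be sketched as follows; the entries below are the ones the FIRST/FOLLOW construction yields for this grammar, written as a dict for illustration.

```python
# LL(1) table: (non-terminal, lookahead) -> body to push; [] is epsilon.
TABLE = {
    ("E",  "id"): ["T", "E'"],      ("E",  "("): ["T", "E'"],
    ("E'", "+"):  ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T",  "id"): ["F", "T'"],      ("T",  "("): ["F", "T'"],
    ("T'", "+"):  [],               ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"):  [],               ("T'", "$"): [],
    ("F",  "id"): ["id"],           ("F",  "("): ["(", "E", ")"],
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Stack starts as [$, S]; pop a terminal when it matches the
    lookahead, or replace a non-terminal using the table entry."""
    stack = ["$", "E"]
    tokens = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMS:
            body = TABLE.get((top, tokens[i]))
            if body is None:
                return False                  # blank (error) entry
            stack.extend(reversed(body))      # push body, leftmost symbol on top
        elif top == tokens[i]:
            i += 1                            # terminal matched
        else:
            return False
    return i == len(tokens)

print(ll1_parse(["id", "+", "id", "*", "id"]))   # True
print(ll1_parse(["id", "+", "+"]))               # False
```

Each table lookup corresponds to emitting one step of the leftmost derivation, which is exactly the output described for the LL(1) parser above.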
Predictive Parsing:
Tables Construction– Example #1…
• In bottom-up parsing, we start from a sentence and then apply production rules in reverse
in order to reach the start symbol.
Top-down parsing attempts to find the leftmost derivation for a given string.
Bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
We can think of bottom-up parsing as the process of "reducing" a token string to the start
symbol of the grammar.
At each reduction, the token string matching the RHS of a production is replaced
by the non-terminal on the LHS of that production.
The key decisions during bottom-up parsing are about when to reduce and about
what production to apply.
When the stack contains only the start symbol and the input is fully consumed, the parser
halts and announces successful completion of parsing.
Solution 1
Shift-reduce parsing: Example #1 (grammar E → E+E | E*E | id, input id*id+id)
Stack    Input        Action
$        id*id+id$    shift id
$id      *id+id$      reduce by E → id
$E       *id+id$      shift *
$E*      id+id$       shift id
$E*id    +id$         reduce by E → id
$E*E     +id$         reduce by E → E*E
$E       +id$         shift +
$E+      id$          shift id
$E+id    $            reduce by E → id
$E+E     $            reduce by E → E+E
$E       $            accept
Shift reduce parsing: Exercise #1
S→S+S|S-S|(S) |a Parse the input string a-(a+a) using shift reduce parsing.
E→ 2E2|3E3|4
Let w = γn, where γn is the nth right-sentential form of some as-yet-unknown rightmost
derivation S = γ0 ⇒ γ1 ⇒ … ⇒ γn = w.
Replace the handle βn in γn by the LHS An of the production An → βn to get the (n-1)th
right-sentential form γn-1.
Stack contents and the next input symbol may not decide the action:
– shift/reduce conflict: The parser cannot decide whether to shift or to reduce.
– reduce/reduce conflict: The parser cannot decide which of several reductions to make.
If a shift-reduce parser cannot be used for a grammar, that grammar is called as non-LR(k)
grammar.
Operator grammar
No production body is ε, and no production body has two adjacent non-terminals.
E → E + E | E * E | id is an operator grammar.
– if two operators have equal precedence, then we check the Associativity of that
particular operator.
Using Operator - Precedence Relations
E → E+E| E*E | id
PRECEDENCE TABLE
Then the input string id+id*id with the precedence relations inserted will be:
$ <· id ·> + <· id ·> * <· id ·> $
Basic principle
Scan the input string from left to right, try to detect ·>, and put a pointer on its location.
Grammar:
Q2. Consider the following grammar and construct the operator precedence parser.
E → EAE | id
A → + | *
It cannot handle the unary minus (the lexical analyzer should handle the unary minus).
Operator precedence parsers use precedence functions that map terminal symbols to
integers.
1. Create function symbols fa and ga for each terminal a and for $.
2. Partition the symbols into groups so that fa and gb are in the same group
if a =· b (there can be symbols in the same group even if they are not connected by this relation).
3. Create a directed graph whose nodes are the groups; then for each pair of symbols a and
b: place an edge from the group of gb to the group of fa if a <· b; otherwise, if a ·> b,
place an edge from the group of fa to that of gb.
4. If the graph constructed has a cycle, then no precedence functions exist.
5. When there are no cycles, take f(a) and g(b) to be the lengths of the longest paths from the groups of fa
and gb, respectively.
Consider the following table:
We can make the look-ahead parameter explicit and discuss LR(k) parsers,
where k is the look-ahead size.
LR(k) parsers are of interest in that they are the most powerful class of
deterministic bottom-up parsers using at most K look-ahead tokens.
Deterministic parsers must uniquely determine the correct parsing action at each
step; the possible actions are to:
1) shift (S),
2) reduce (R),
3) accept (A) the source code, or
4) signal a syntactic error (E).
LR Parsers (Cont.)
An LR parser makes shift-reduce decisions by maintaining states to keep track of
where we are in a parse.
States represent sets of items.
LR(k) Parsers:
4 types of LR(k) parsers:
i. LR(0)
ii. SLR(1) –Simple LR
iii. LALR(1) – Look Ahead LR and
iv. CLR(1) – Canonical LR
LR Parsers (Cont.)
In order to construct parsing table of LR(0) and SLR(1) we use canonical collection of
LR(0) items
In order to construct parsing table of LALR(1) and CLR(1) we use canonical
collection of LR(1) items.
i. LR(0) Item
LR(0) and all other LR-style parsing are based on the idea of: an item of the form:
A→X1…Xi.Xi+1…Xj
The dot symbol . in an item may appear anywhere in the right-hand side of a
production.
It marks how much of the production has already been matched.
An LR(0) item (item for short) of a grammar G is a production of G with a dot at
some position of the RHS.
The production A → XYZ yields the four items:
A → .XYZ   we have not seen any of the RHS yet
A → X.YZ   we have seen X
A → XY.Z   we have seen X and Y
A → XYZ.   we have seen everything
The production A → ε generates only one item, A → . .
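Generating the items of a production is mechanical; a small sketch, using the string "." as a hypothetical marker symbol:

```python
def items_of(head, body):
    """All LR(0) items of one production: the marker '.' placed at
    every possible position in the right-hand side."""
    return [(head, body[:i] + ["."] + body[i:]) for i in range(len(body) + 1)]

for head, dotted in items_of("A", ["X", "Y", "Z"]):
    print(head, "->", " ".join(dotted))
# A -> . X Y Z
# A -> X . Y Z
# A -> X Y . Z
# A -> X Y Z .
```

Note that `items_of("A", [])` yields the single item A → . , matching the ε case above.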
Constructing Canonical LR(0) item sets
• Augmented Grammar
– If G is a grammar with start symbol S, then G', the augmented grammar for G, is the grammar with
new start symbol S' and a production S' → S.
– The purpose of this new starting production is to indicate to the parser when it should stop parsing
and announce acceptance of the input.
– Example: let a grammar be
E → BB
B → cB | d
Its augmented grammar adds the production E' → E.
Closure of a state
Closure of a state adds items for all productions whose LHS occurs immediately after the dot in some item.
• Closure operation
– Let I be a set of items for a grammar G.
– If A → α.Bβ is in I and B → γ is a production, then B → .γ is in closure(I).
• Intuitively, A → α.Bβ indicates that we expect to see a string derivable from Bβ in the input.
• If B → γ is a production, then we might see a string derivable from γ at this point.
Example
• For the grammar
E’ →E
E→E+T|T
T→T*F|F
F → ( E ) | id
If I is { E' → .E }, then closure(I) is
• E’ →.E
E → .E + T
E → .T
T → .T * F
T → .F
F →.id
F → .(E)
Constructing canonical LR(0) item sets…
Goto operation
Example: for the grammar E → BB, B → cB | d:
Step 1. Augment the grammar:
E' → E
E → BB
B → cB | d
Step 2. Draw the canonical collection of LR(0) items (apply closure and goto).
Step 3. Number the productions.
Find CLOSURE(I).
Constructing canonical LR(0) item sets: Example#2
(Cont.)
First, E' → .E is put in CLOSURE(I) by rule 1.
Then, the E-productions with dots at the left end are added: E → .E+T and E → .T.
Now there is a T immediately to the right of a dot in E → .T, so we add T → .T*F and T → .F.
Next, T → .F forces us to add F → .(E) and F → .id.
Thus I0 = closure({[E' → .E]}) = { E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }.
Goto Next State
Given an item set (state) s, we can compute its next state, s', under a symbol X:
move the dot past X in every item of s in which X immediately follows the dot, then take the closure.
For example, goto on + from the item E → E.+T yields:
E → E+.T
T → .T*F (by closure)
T → .F (by closure)
F → .(E) (by closure)
F → .id (by closure)
We can build all the states of the Transition Diagram this way.
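Both operations can be sketched compactly in Python for the expression grammar. An item is represented as a (head, body, dot-position) triple; this representation is an assumption for illustration.

```python
GRAMMAR = {
    "E'": [["E"]],
    "E":  [["E", "+", "T"], ["T"]],
    "T":  [["T", "*", "F"], ["F"]],
    "F":  [["(", "E", ")"], ["id"]],
}

def closure(items):
    """If A -> alpha . B beta is in the set, add B -> . gamma for every
    B-production, until nothing new appears."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:   # dot before a non-terminal
                for gamma in GRAMMAR[body[dot]]:
                    item = (body[dot], tuple(gamma), 0)
                    if item not in items:
                        items.add(item); changed = True
    return items

def goto(items, X):
    """Next state under symbol X: advance the dot over X, then close."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})
print(len(I0))               # 7 items, matching the worked closure above
I1 = goto(I0, "E")           # { E' -> E.,  E -> E.+T }
I6 = goto(I1, "+")           # E -> E+.T plus the T- and F-items by closure
```

Repeatedly applying goto to every state under every symbol, starting from I0, yields the whole canonical LR(0) collection.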
LR(0) Transition Diagram (Cont.)
Each state in the Transition Diagram,
C = CLOSURE({[S' → .S]});
repeat
  for each set of items I in C and each grammar symbol X
    such that GOTO(I, X) is not empty and not already in C:
      add GOTO(I, X) to C;
until no new sets of items are added to C;
Example: for the grammar
E' → E
E → E + T | T
T → T * F | F
F → (E) | id
[Transition diagram of the canonical LR(0) collection, states I0 to I11:
I0 = closure({[E' → .E]}) = { E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
I1 = goto(I0, E) = { E' → E., E → E.+T } (accept on $)
I2 = goto(I0, T) = { E → T., T → T.*F }
I3 = goto(I0, F) = { T → F. }
I4 = goto(I0, () = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
I5 = goto(I0, id) = { F → id. }
I6 = goto(I1, +) = { E → E+.T, T → .T*F, T → .F, F → .(E), F → .id }
I7 = goto(I2, *) = { T → T*.F, F → .(E), F → .id }
I8 = goto(I4, E) = { F → (E.), E → E.+T }
I9 = goto(I6, T) = { E → E+T., T → T.*F }
I10 = goto(I7, F) = { T → T*F. }
I11 = goto(I8, )) = { F → (E). }]
LR(0) Parsing Table
LR(0) Stack Implementation: Example: id*id
SLR(1) has the same Transition Diagram and Goto table as LR(0)
BUT with different Action table because it looks ahead 1 token.
SLR(1) Look-ahead
SLR(1) parsers are built by first constructing the:
• Transition Diagram and Goto table (as for LR(0)), and then the Action table using FOLLOW sets.
Slow construction.
• LL(k) ≤ LR(k)
Exercise
Q1. Construct LL(1) parse table for the expression grammar
Steps to be followed
Q2. Given the grammar
S → (L) | a
L → L,S | S
parse the input string (a,(a,a)) using shift-reduce parsing.